ArchiveItAPI Reference
The main client class for interacting with the Archive-It API.
First time using pyarchiveit?
See the Getting Started guide for installation and initialization instructions.
A client for interacting with the Archive-It API.
__init__(account_name, account_password, base_url='https://partner.archive-it.org/api/', default_timeout=None)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| account_name | str | The account name for authentication. | required |
| account_password | str | The account password for authentication. | required |
| base_url | str | The base URL for the API endpoints. Defaults to the Archive-It API base URL. | 'https://partner.archive-it.org/api/' |
| default_timeout | float \| None | Default timeout in seconds. Defaults to None, which means no timeout. | None |
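A minimal sketch of collecting the constructor arguments from environment variables, so credentials stay out of source code. The variable names (ARCHIVEIT_USER, ARCHIVEIT_PASSWORD, ARCHIVEIT_TIMEOUT), both helper names, and the `from pyarchiveit import ArchiveItAPI` import path are assumptions made for illustration; only the parameter names come from the signature above.

```python
import os


def client_kwargs_from_env(env=None):
    """Build ArchiveItAPI constructor arguments from environment variables.

    The variable names used here are hypothetical, not part of pyarchiveit.
    """
    env = os.environ if env is None else env
    timeout = env.get("ARCHIVEIT_TIMEOUT")
    return {
        "account_name": env["ARCHIVEIT_USER"],
        "account_password": env["ARCHIVEIT_PASSWORD"],
        # None means "no timeout", matching the documented default.
        "default_timeout": float(timeout) if timeout else None,
    }


def make_client(env=None):
    """Create a client from the environment (import path is an assumption)."""
    from pyarchiveit import ArchiveItAPI  # assumed import path

    return ArchiveItAPI(**client_kwargs_from_env(env))
```

Keeping the argument-building step separate from the constructor call makes it possible to check the configuration without network access or real credentials.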
create_seed(url, collection_id, crawl_definition_id, other_params=None, metadata=None)
Create a new seed in a specified collection with given crawl definition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The URL of the seed to create. | required |
| collection_id | str \| int | The ID of the collection to add the seed to. | required |
| crawl_definition_id | str \| int | The ID of the crawl definition to associate with the seed. | required |
| other_params | dict \| None | Additional parameters for the seed creation. | None |
| metadata | dict \| None | Metadata to set for the seed after creation. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict | The validated created seed data returned by the API. |
Raises:
| Type | Description |
|---|---|
| ValidationError | If the input data or metadata structure is invalid. |
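As a rough illustration of the call shape, the sketch below creates one seed per URL in a single collection. The `create_seeds` helper and the `StubClient` class are hypothetical: the stub only mimics the documented `create_seed` signature so the example runs offline, and the fields in its fake response are not the real API schema. With a real client, pass the `ArchiveItAPI` instance instead of the stub.

```python
def create_seeds(client, urls, collection_id, crawl_definition_id, metadata=None):
    """Create one seed per URL and return the seed dicts in input order."""
    created = []
    for url in urls:
        seed = client.create_seed(
            url=url,
            collection_id=collection_id,
            crawl_definition_id=crawl_definition_id,
            metadata=metadata,
        )
        created.append(seed)
    return created


class StubClient:
    """Hypothetical stand-in for ArchiveItAPI so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def create_seed(self, url, collection_id, crawl_definition_id,
                    other_params=None, metadata=None):
        self.calls.append((url, collection_id, crawl_definition_id))
        # Field names in this fake response are illustrative only.
        return {"id": len(self.calls), "url": url, "collection": collection_id}


stub = StubClient()
seeds = create_seeds(stub, ["https://example.org/", "https://example.com/"], 1234, 5678)
print([seed["url"] for seed in seeds])
```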
delete_seed(seed_id)
Delete a seed by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seed_id | str \| int | The ID of the seed to delete. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict | The validated seed data from the API after deletion. The 'deleted' flag should be True. |
Raises:
| Type | Description |
|---|---|
| ValidationError | If the API returns invalid seed data. |
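Because the returned seed data carries a 'deleted' flag, a caller can check it explicitly rather than assume the deletion succeeded. The following is a small sketch of that check; `delete_and_confirm` and `StubClient` are hypothetical names, and the stub's response shape is an assumption based only on the description above.

```python
def delete_and_confirm(client, seed_id):
    """Delete a seed and report whether the response marks it as deleted."""
    seed = client.delete_seed(seed_id)
    # Treat a missing or non-True flag as "not confirmed".
    return seed.get("deleted") is True


class StubClient:
    """Hypothetical stand-in for ArchiveItAPI so the sketch runs offline."""

    def delete_seed(self, seed_id):
        # Illustrative response; the real payload has more fields.
        return {"id": int(seed_id), "deleted": True}


confirmed = delete_and_confirm(StubClient(), "42")
print(confirmed)
```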
get_seed_by_id(seed_id)
Get a seed by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seed_id | str \| int | The ID of the seed to retrieve. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict | The validated seed data returned by the API. |
Raises:
| Type | Description |
|---|---|
| HTTPStatusError | If the API request fails. |
| TimeoutException | If the request times out. |
| ValidationError | If the API returns invalid seed data. |
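Timeouts are often transient, so one possible pattern is to retry the lookup a few times before giving up. The sketch below takes the exception class to retry on as a parameter instead of importing it. The names HTTPStatusError and TimeoutException match exception classes in the httpx library, but whether pyarchiveit raises exactly those classes is an assumption. `get_seed_with_retry`, `FakeTimeout`, and `FlakyClient` are hypothetical.

```python
import time


def get_seed_with_retry(client, seed_id, retry_on, attempts=3, delay=0.0):
    """Call get_seed_by_id, retrying when an exception in retry_on is raised."""
    for attempt in range(1, attempts + 1):
        try:
            return client.get_seed_by_id(seed_id)
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the error.
            time.sleep(delay)


class FakeTimeout(Exception):
    """Hypothetical stand-in for the library's timeout exception."""


class FlakyClient:
    """Hypothetical client whose first call times out, then succeeds."""

    def __init__(self):
        self.calls = 0

    def get_seed_by_id(self, seed_id):
        self.calls += 1
        if self.calls == 1:
            raise FakeTimeout("simulated timeout")
        return {"id": seed_id}


flaky = FlakyClient()
seed = get_seed_with_retry(flaky, 7, retry_on=FakeTimeout)
print(seed, flaky.calls)
```

With a real client, `retry_on` would be whichever timeout exception the library actually raises, for example `httpx.TimeoutException` if the assumption above holds.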
get_seed_list(collection_id, limit=-1, sort=None, pluck=None, format='json', additional_query=None)
Get seeds for a given collection ID or list of collection IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection_id | str \| int \| list[str \| int] | Collection ID or list of collection IDs. | required |
| limit | int | Maximum number of seeds to retrieve per collection. Defaults to -1 (no limit). | -1 |
| sort | str \| None | Sort order for the results. A leading minus sign (-) indicates ascending order. Defaults to None. | None |
| pluck | str \| None | Specific field to extract from each seed object (e.g. "url", "id"). Defaults to None (returns full seed objects). | None |
| format | str | The format of the response ("json" or "xml"). Defaults to "json". | 'json' |
| additional_query | dict \| None | Additional query parameters to include in the request. | None |
Returns:
| Type | Description |
|---|---|
| list[SeedKeys] \| list | If pluck is None, a list of validated seed objects. If pluck is specified, a list of the plucked field values. |
Raises:
| Type | Description |
|---|---|
| HTTPStatusError | If the API request fails. |
| TimeoutException | If the request times out. |
| ValidationError | If the API returns invalid seed data. |
| ValueError | If the |
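To illustrate how collection_id and pluck combine, the sketch below requests only the "url" field across several collections and removes duplicates. `unique_seed_urls` and `StubClient` are hypothetical; the stub imitates the documented signature with made-up data so the example runs without network access, and it does not reproduce the real API's sorting or validation.

```python
def unique_seed_urls(client, collection_ids, limit=-1):
    """Return the sorted, de-duplicated seed URLs across the given collections."""
    urls = client.get_seed_list(collection_ids, limit=limit, pluck="url")
    return sorted(set(urls))


class StubClient:
    """Hypothetical stand-in for ArchiveItAPI with made-up seed data."""

    _seeds = {
        1: [{"id": 10, "url": "https://a.example/"},
            {"id": 11, "url": "https://b.example/"}],
        2: [{"id": 12, "url": "https://a.example/"}],
    }

    def get_seed_list(self, collection_id, limit=-1, sort=None, pluck=None,
                      format="json", additional_query=None):
        ids = collection_id if isinstance(collection_id, list) else [collection_id]
        seeds = []
        for cid in ids:
            rows = self._seeds.get(int(cid), [])
            # limit applies per collection; -1 means no limit.
            seeds.extend(rows if limit < 0 else rows[:limit])
        return [s[pluck] for s in seeds] if pluck else seeds


urls = unique_seed_urls(StubClient(), [1, 2])
print(urls)
```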
get_seed_with_metadata(metadata_field=None, metadata_value=None, limit=-1, pluck=None)
Get seeds that match a specific metadata field and value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata_field | str \| None | The metadata field to search (e.g., "Title", "Author"). | None |
| metadata_value | str \| None | The value to search for within the specified metadata field. | None |
| limit | int | Maximum number of seeds to retrieve. Defaults to -1 (no limit). | -1 |
| pluck | str \| None | Specific field to extract from each seed object (e.g. "collection"). Defaults to None (returns full seed objects). | None |
search_seed_metadata(metadata_field=None, metadata_value=None, limit=-1, pluck=None)
Search seeds by metadata field and value.
Note
You do not need to specify metadata_field to search for a value. To look up a value across all metadata fields, pass it as metadata_value and leave metadata_field as None.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| metadata_field | str \| list \| None | The metadata field to search (e.g., "Title", "Author"). If a list is provided, searches within any of the fields. | None |
| metadata_value | str \| list \| None | The value to search for within the specified metadata field. If a list is provided, searches for any of the values. | None |
| limit | int | Maximum number of seeds to retrieve. Defaults to -1 (no limit). | -1 |
| pluck | str \| None | Specific field to extract from each seed object (e.g. "seed", "name_control"). Defaults to None (returns full seed objects). | None |
Returns:
| Name | Type | Description |
|---|---|---|
| list | list | A list of seeds matching the search criteria. |
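The difference between searching one field and searching all fields can be sketched as follows. `find_seed_ids` and `StubClient` are hypothetical, the stub's metadata layout and exact-match behaviour are assumptions made so the example runs offline, and plucking "id" is borrowed from the get_seed_list examples rather than confirmed for this method.

```python
def find_seed_ids(client, value, field=None):
    """Return IDs of seeds whose metadata contains value (in field, if given)."""
    return client.search_seed_metadata(
        metadata_field=field,
        metadata_value=value,
        pluck="id",
    )


class StubClient:
    """Hypothetical stand-in for ArchiveItAPI with made-up metadata."""

    _seeds = [
        {"id": 1, "metadata": {"Title": "City Council", "Author": "Clerk"}},
        {"id": 2, "metadata": {"Title": "Library News", "Author": "City Council"}},
        {"id": 3, "metadata": {"Title": "Parks", "Author": "Staff"}},
    ]

    def search_seed_metadata(self, metadata_field=None, metadata_value=None,
                             limit=-1, pluck=None):
        hits = []
        for seed in self._seeds:
            # With no field given, look through every metadata field.
            fields = [metadata_field] if metadata_field else list(seed["metadata"])
            if any(seed["metadata"].get(f) == metadata_value for f in fields):
                hits.append(seed)
        if limit >= 0:
            hits = hits[:limit]
        return [s[pluck] for s in hits] if pluck else hits


stub = StubClient()
any_field = find_seed_ids(stub, "City Council")
title_only = find_seed_ids(stub, "City Council", field="Title")
print(any_field, title_only)
```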
update_seed_metadata(seed_id, metadata)
Update metadata for a specific seed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seed_id | str \| int | The ID of the seed to update. | required |
| metadata | dict | The metadata to update for the seed. | required |
Raises:
| Type | Description |
|---|---|
| ValidationError | If the metadata structure is invalid. |