ArchiveItAPI Reference
The main client class for interacting with the Archive-it API.
First time using pyarchiveit?
See the Getting Started guide for installation and initialization instructions.
A client for interacting with the Archive-it API.
__init__(account_name, account_password, base_url='https://partner.archive-it.org/api/', default_timeout=None)
Initialize the ArchiveItAPI client with authentication and base URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `account_name` | `str` | The account name for authentication. | required |
| `account_password` | `str` | The account password for authentication. | required |
| `base_url` | `str` | The base URL for the API endpoints. Defaults to Archive-it API base URL. | `'https://partner.archive-it.org/api/'` |
| `default_timeout` | `float \| None` | Default timeout in seconds. Defaults to None. Use None for no timeout. | `None` |
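
As a rough illustration of the constructor parameters above, the sketch below gathers them from environment variables so that credentials stay out of source code. The variable names `ARCHIVEIT_USER`, `ARCHIVEIT_PASSWORD`, and `ARCHIVEIT_TIMEOUT`, the helper `client_kwargs_from_env`, and the top-level import path shown in the trailing comment are assumptions made for this example, not part of the documented API.

```python
import os


def client_kwargs_from_env(env=None):
    """Build keyword arguments for ArchiveItAPI.__init__ from environment variables.

    The variable names used here are hypothetical; choose whatever suits
    your deployment.
    """
    env = os.environ if env is None else env
    raw_timeout = env.get("ARCHIVEIT_TIMEOUT")
    return {
        "account_name": env["ARCHIVEIT_USER"],
        "account_password": env["ARCHIVEIT_PASSWORD"],
        # None means "no timeout", matching the documented default.
        "default_timeout": float(raw_timeout) if raw_timeout else None,
    }


# Usage, assuming pyarchiveit is installed and exposes ArchiveItAPI at the
# top level of the package (an assumption, see the Getting Started guide):
# from pyarchiveit import ArchiveItAPI
# client = ArchiveItAPI(**client_kwargs_from_env())
```

`base_url` is left out so that the documented default applies.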
get_seed_list(collection_id, limit=-1, format='json', timeout=None)
Get seeds for a given collection ID or list of collection IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collection_id` | `str \| int \| list[str \| int]` | Collection ID or list of Collection IDs. | required |
| `limit` | `int` | Maximum number of seeds to retrieve per collection. Defaults to -1 (no limit). | `-1` |
| `format` | `str` | The format of the response (json or xml). Defaults to "json". | `'json'` |
| `timeout` | `float \| None` | Timeout in seconds for this request. Uses client default if not specified. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[dict]` | List of seeds from all requested collections. |
Raises:
| Type | Description |
|---|---|
| `HTTPStatusError` | If the API request fails. |
| `TimeoutException` | If the request times out. |
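
One possible way to cope with the documented timeout error is to retry with a longer per-request `timeout`. The sketch below is not part of the library: `get_seeds_with_retry` is a hypothetical helper, and because this page does not say which module `TimeoutException` comes from, the exception class is passed in by the caller rather than imported.

```python
def get_seeds_with_retry(client, collection_id, timeout_error, attempts=3, timeout=10.0):
    """Call client.get_seed_list, doubling the timeout after each timed-out attempt.

    `client` is any object with a get_seed_list method shaped like the one
    documented above. `timeout_error` is the exception class to treat as a
    timeout (supplied by the caller, since its import path is an assumption).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return client.get_seed_list(
                collection_id, limit=-1, format="json", timeout=timeout
            )
        except timeout_error:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the timeout
            timeout *= 2  # give the next attempt more time
```

Errors other than `timeout_error`, such as the documented `HTTPStatusError`, propagate immediately, since retrying a failed request with a longer timeout is unlikely to help.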
update_seed_metadata(seed_id, metadata)
Update metadata for a specific seed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `seed_id` | `str \| int` | The ID of the seed to update. | required |
| `metadata` | `dict` | The metadata to update for the seed. | required |
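
Since `update_seed_metadata` takes one seed at a time, applying the same metadata to several seeds needs a loop. The following is a minimal sketch under stated assumptions: `update_seeds_metadata` is a hypothetical helper, and the `metadata` dict is passed through unchanged because this page does not specify its expected shape or the method's return value.

```python
def update_seeds_metadata(client, seed_ids, metadata):
    """Apply the same metadata dict to every seed ID and return the IDs processed.

    `client` is any object with an update_seed_metadata(seed_id, metadata)
    method. The return value of that method is not documented here, so it
    is ignored.
    """
    updated = []
    for seed_id in seed_ids:
        client.update_seed_metadata(seed_id, metadata)
        updated.append(seed_id)
    return updated
```

If one call raises, the loop stops at that seed, so earlier seeds have been updated while later ones have not.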
create_seed(url, collection_id, crawl_definition_id, other_params=None, metadata=None)
Create a new seed in a specified collection with given crawl definition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `url` | `str` | The URL of the seed to create. | required |
| `collection_id` | `str \| int` | The ID of the collection to add the seed to. | required |
| `crawl_definition_id` | `str \| int` | The ID of the crawl definition to associate with the seed. | required |
| `other_params` | `dict \| None` | Additional parameters for the seed creation. | `None` |
| `metadata` | `dict \| None` | Metadata to set for the seed after creation. | `None` |
Returns:
| Type | Description |
|---|---|
| `dict` | The created seed data returned by the API. |
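
To show how the `create_seed` parameters fit together, here is a small sketch that adds several URLs to one collection under the same crawl definition. `create_seeds` is a hypothetical helper, and skipping repeated URLs is a choice made for this example rather than documented library behavior.

```python
def create_seeds(client, urls, collection_id, crawl_definition_id, metadata=None):
    """Create one seed per distinct URL and return the API responses in order.

    `client` is any object with a create_seed method shaped like the one
    documented above.
    """
    created = []
    seen = set()
    for url in urls:
        if url in seen:
            continue  # avoid submitting the same URL twice in one batch
        seen.add(url)
        created.append(
            client.create_seed(url, collection_id, crawl_definition_id, metadata=metadata)
        )
    return created
```

Each element of the returned list is the `dict` that `create_seed` returned for the corresponding URL.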
delete_seed(seed_id)
Delete a seed by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `seed_id` | `str \| int` | The ID of the seed to delete. | required |
Returns:
| Type | Description |
|---|---|
| `dict` | The seed data from the API after deletion. If successful, the 'deleted' flag should be True. |
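
Because success is signalled by the `'deleted'` flag in the returned data, a caller may want to check it explicitly. The sketch below assumes the flag appears as a top-level `deleted` key in the returned `dict`; `delete_seed_checked` is a hypothetical helper, not part of the library.

```python
def delete_seed_checked(client, seed_id):
    """Delete a seed and raise if the response does not report it as deleted.

    `client` is any object with a delete_seed(seed_id) method that returns
    a dict, as documented above.
    """
    result = client.delete_seed(seed_id)
    if result.get("deleted") is not True:
        raise RuntimeError(
            f"seed {seed_id} was not reported as deleted: {result!r}"
        )
    return result
```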