ArchiveItAPI Reference

The main client class for interacting with the Archive-it API.

First time using pyarchiveit?

See the Getting Started guide for installation and initialization instructions.

A client for interacting with the Archive-it API.

__init__(account_name, account_password, base_url='https://partner.archive-it.org/api/', default_timeout=None)

Parameters:

Name Type Description Default
account_name str

The account name for authentication.

required
account_password str

The account password for authentication.

required
base_url str

The base URL for the API endpoints. Defaults to the Archive-it API base URL.

'https://partner.archive-it.org/api/'
default_timeout float | None

Default request timeout in seconds. Defaults to None, which means no timeout.

None
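
A minimal sketch of assembling the constructor arguments without hard-coding credentials. The helper `client_kwargs_from_env` and the environment variable names `ARCHIVEIT_ACCOUNT` and `ARCHIVEIT_PASSWORD` are hypothetical and not part of pyarchiveit. The import path in the trailing comment is also an assumption; check the Getting Started guide for the actual one.

```python
import os

DEFAULT_BASE_URL = "https://partner.archive-it.org/api/"


def client_kwargs_from_env(env=None, default_timeout=30.0):
    """Build keyword arguments matching the documented __init__ signature.

    ARCHIVEIT_ACCOUNT and ARCHIVEIT_PASSWORD are hypothetical variable
    names; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    try:
        name = env["ARCHIVEIT_ACCOUNT"]
        password = env["ARCHIVEIT_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None
    return {
        "account_name": name,
        "account_password": password,
        "base_url": DEFAULT_BASE_URL,
        # A finite timeout in seconds; None would mean no timeout.
        "default_timeout": default_timeout,
    }


# Assumed usage (import path not confirmed by this page):
# from pyarchiveit import ArchiveItAPI
# client = ArchiveItAPI(**client_kwargs_from_env())
```

Passing an explicit `default_timeout` is a design choice for scripts that should fail rather than hang; the documented default of None leaves requests without a timeout.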

create_seed(url, collection_id, crawl_definition_id, other_params=None, metadata=None)

Create a new seed in the specified collection with the given crawl definition.

Parameters:

Name Type Description Default
url str

The URL of the seed to create.

required
collection_id str | int

The ID of the collection to add the seed to.

required
crawl_definition_id str | int

The ID of the crawl definition to associate with the seed.

required
other_params dict | None

Additional parameters for the seed creation.

None
metadata dict | None

Metadata to set for the seed after creation.

None

Returns:

Name Type Description
dict dict

The validated created seed data returned by the API.

Raises:

Type Description
ValidationError

If the input data or metadata structure is invalid.
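
As an illustration, the sketch below adds several URLs to one collection by calling `create_seed` once per URL. The helper `add_seeds` and the `_StandInClient` class are hypothetical; the stand-in exists only so the snippet runs without credentials, and the keys in its return value are illustrative rather than the real seed schema. The metadata structure is not described on this page, so the sketch passes it through unchanged.

```python
def add_seeds(client, urls, collection_id, crawl_definition_id, metadata=None):
    """Create one seed per URL and return the dicts that create_seed returns."""
    created = []
    for url in urls:
        created.append(
            client.create_seed(
                url=url,
                collection_id=collection_id,
                crawl_definition_id=crawl_definition_id,
                metadata=metadata,  # passed through as-is; structure not assumed
            )
        )
    return created


class _StandInClient:
    """Hypothetical stand-in for ArchiveItAPI; returns made-up seed dicts."""

    def __init__(self):
        self._next_id = 1

    def create_seed(self, url, collection_id, crawl_definition_id,
                    other_params=None, metadata=None):
        seed = {"id": self._next_id, "url": url, "collection": collection_id}
        self._next_id += 1
        return seed


seeds = add_seeds(
    _StandInClient(),
    ["https://example.org/", "https://example.com/"],
    collection_id=1234,
    crawl_definition_id=5678,
)
```

With a real client, any `ValidationError` raised by `create_seed` would propagate out of `add_seeds` at the first offending URL.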

delete_seed(seed_id)

Delete a seed by its ID.

Parameters:

Name Type Description Default
seed_id str | int

The ID of the seed to delete.

required

Returns:

Name Type Description
dict dict

The validated seed data from the API after deletion. The 'deleted' flag should be True.

Raises:

Type Description
ValidationError

If the API returns invalid seed data.
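
Since the returned data carries a 'deleted' flag, a caller can check it rather than assume success. The following is a minimal sketch; `delete_and_confirm` and `_StandInClient` are hypothetical names, and the stand-in's return value only imitates the documented flag.

```python
def delete_and_confirm(client, seed_id):
    """Delete a seed and report whether the returned data marks it as deleted."""
    result = client.delete_seed(seed_id)
    # Treat a missing flag the same as False.
    return bool(result.get("deleted"))


class _StandInClient:
    """Hypothetical stand-in for ArchiveItAPI so the sketch runs offline."""

    def delete_seed(self, seed_id):
        return {"id": seed_id, "deleted": True}


confirmed = delete_and_confirm(_StandInClient(), 98765)
```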

get_seed_by_id(seed_id)

Get a seed by its ID.

Parameters:

Name Type Description Default
seed_id str | int

The ID of the seed to retrieve.

required

Returns:

Name Type Description
dict dict

The validated seed data returned by the API.

Raises:

Type Description
HTTPStatusError

If the API request fails.

TimeoutException

If the request times out.

ValidationError

If the API returns invalid seed data.

get_seed_list(collection_id, limit=-1, sort=None, pluck=None, format='json', additional_query=None)

Get seeds for a given collection ID or list of collection IDs.

Parameters:

Name Type Description Default
collection_id str | int | list[str | int]

Collection ID or list of Collection IDs.

required
limit int

Maximum number of seeds to retrieve per collection. Defaults to -1 (no limit).

-1
sort str | None

Field to sort the results by. A leading minus sign (-) on the field name indicates ascending order. Defaults to None.

See the available fields in the API documentation (Data Models > Seed).

Example values: "id", "-id", "last_updated_date", "-last_updated_date".

None
pluck str | None

Specific field to extract from each seed object (e.g. "url", "id"). Defaults to None (returns full seed objects).

None
format str

The format of the response (json or xml). Defaults to "json".

'json'
additional_query dict | None

Additional query parameters to include in the request.

Each value can be either a string or a list. A list queries for multiple values of that parameter (combined with OR).

Format: {"param_name": value}, e.g. {"last_updated_by": "PersonA"} or {"last_updated_by": ["PersonA", "PersonB"]}.

None

Returns:

Type Description
list

list[SeedKeys] | list: If pluck is None, returns list of validated seed objects. If pluck is specified, returns list of the plucked field values.

Raises:

Type Description
HTTPStatusError

If the API request fails.

TimeoutException

If the request times out.

ValidationError

If the API returns invalid seed data.

ValueError

If the sort parameter is invalid.
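
The parameters above can be combined in a single call. The sketch below asks for seed URLs from several collections, filtered to seeds last updated by any of a set of people. `seed_urls_edited_by` and `_RecordingClient` are hypothetical; the stand-in records the arguments it receives instead of contacting the API, and its return value is made up.

```python
def seed_urls_edited_by(client, collection_ids, editors, limit=50):
    """Return seed URLs from the given collections last updated by any editor."""
    return client.get_seed_list(
        collection_id=collection_ids,      # a single ID or a list of IDs
        limit=limit,                       # applies per collection
        sort="-last_updated_date",         # one of the documented example values
        pluck="url",                       # return field values, not seed objects
        additional_query={"last_updated_by": list(editors)},  # list means OR
    )


class _RecordingClient:
    """Hypothetical stand-in that records the call for inspection."""

    def __init__(self):
        self.last_call = None

    def get_seed_list(self, collection_id, limit=-1, sort=None, pluck=None,
                      format="json", additional_query=None):
        self.last_call = {
            "collection_id": collection_id,
            "limit": limit,
            "sort": sort,
            "pluck": pluck,
            "format": format,
            "additional_query": additional_query,
        }
        return ["https://example.org/"]


recorder = _RecordingClient()
urls = seed_urls_edited_by(recorder, [1234, 5678], ("PersonA", "PersonB"))
```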

get_seed_with_metadata(metadata_field=None, metadata_value=None, limit=-1, pluck=None)

Get seeds that match a specific metadata field and value.

Parameters:

Name Type Description Default
metadata_field str | None

The metadata field to search (e.g., "Title", "Author").

None
metadata_value str | None

The value to search for within the specified metadata field.

None
limit int

Maximum number of seeds to retrieve. Defaults to -1 (no limit).

-1
pluck str | None

Specific field to extract from each seed object (e.g. "collection"). Defaults to None (returns full seed objects).

None

search_seed_metadata(metadata_field=None, metadata_value=None, limit=-1, pluck=None)

Search seeds by metadata field and value.

Note

You do not need to specify metadata_field in order to search for a value. To look up a value across all metadata fields, pass it as metadata_value and leave metadata_field as None.

Parameters:

Name Type Description Default
metadata_field str | list | None

The metadata field to search (e.g., "Title", "Author"). If a list is provided, searches within any of the fields.

None
metadata_value str | list | None

The value to search for within the specified metadata field. If a list is provided, searches for any of the values.

None
limit int

Maximum number of seeds to retrieve. Defaults to -1 (no limit).

-1
pluck str | None

Specific field to extract from each seed object (e.g. "seed", "name_control"). Defaults to None (returns full seed objects).

None

Returns:

Name Type Description
list list

A list of seeds matching the search criteria.
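
The two search modes described above (a value across all fields, or a value restricted to one or more fields) can be sketched as follows. `find_seeds_by_value` and `_RecordingClient` are hypothetical names; the stand-in records each call and returns an empty list instead of querying the API.

```python
def find_seeds_by_value(client, value, fields=None, pluck=None):
    """Search seed metadata for `value`, optionally restricted to `fields`.

    `fields` may be None (all metadata fields), a single field name,
    or a list of field names, mirroring the documented parameter types.
    """
    return client.search_seed_metadata(
        metadata_field=fields,
        metadata_value=value,
        pluck=pluck,
    )


class _RecordingClient:
    """Hypothetical stand-in that records each call for inspection."""

    def __init__(self):
        self.calls = []

    def search_seed_metadata(self, metadata_field=None, metadata_value=None,
                             limit=-1, pluck=None):
        self.calls.append((metadata_field, metadata_value, limit, pluck))
        return []


recorder = _RecordingClient()
# Look the value up across all metadata fields.
find_seeds_by_value(recorder, "PersonA")
# Restrict the search to either of two fields.
find_seeds_by_value(recorder, "PersonA", fields=["Title", "Author"])
```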

update_seed_metadata(seed_id, metadata)

Update metadata for a specific seed.

Parameters:

Name Type Description Default
seed_id str | int

The ID of the seed to update.

required
metadata dict

The metadata to update for the seed.

required

Raises:

Type Description
ValidationError

If the metadata structure is invalid.