retriever

FileRetriever classes

class retriever.FileRetriever

Base class for retrieving metadata from a source

async add(files: list, dids: list | None = None) -> dict

Add the metadata for a list of files to the set.

Parameters:
  • files – list of dictionaries with file metadata

  • dids – optional list of DIDs requested, used to check for missing files

Returns:

dict of MergeFile objects that were added
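The role of `dids` can be illustrated with a minimal sketch. This is not the library's implementation: the helper name `split_missing` and the `did` key on each metadata dictionary are hypothetical, assumed here only to show how a list of requested DIDs could be compared against the files actually supplied.

```python
from __future__ import annotations


def split_missing(files: list[dict], dids: list[str] | None = None) -> tuple[dict, list[str]]:
    """Index file metadata by DID and report requested DIDs that are absent.

    Assumption: every metadata dictionary carries a 'did' key.
    """
    found = {meta["did"]: meta for meta in files}
    if dids is None:
        # Nothing was explicitly requested, so nothing can be missing.
        return found, []
    missing = [did for did in dids if did not in found]
    return found, missing


found, missing = split_missing(
    [{"did": "ns:a.root", "size": 10}, {"did": "ns:b.root", "size": 20}],
    dids=["ns:a.root", "ns:b.root", "ns:c.root"],
)
```

Here `missing` holds the one requested DID that has no matching metadata, which is the kind of information the `missing` property presumably exposes.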

abstract async connect() -> None

Connect to the metadata source

property dupes: dict

Return the set of duplicate files from the source

property files: MergeSet

Return the set of files from the source

abstract async input_batches() -> AsyncGenerator[dict, None]

Asynchronously retrieve metadata for the next batch of files.

Returns:

yields a dict of MergeFile objects for each batch
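Because the return type is `AsyncGenerator[dict, None]`, callers consume batches with `async for`. The following stand-alone sketch shows that pattern. The function `demo_batches`, its `batch_size` parameter, and the shape of the yielded dictionaries are assumptions for illustration, not part of the documented API.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncGenerator


async def demo_batches(names: list[str], batch_size: int = 2) -> AsyncGenerator[dict, None]:
    """Stand-in async generator: yield one dict of metadata per batch."""
    for start in range(0, len(names), batch_size):
        await asyncio.sleep(0)  # placeholder for a real metadata query
        yield {name: {"name": name} for name in names[start:start + batch_size]}


async def collect(names: list[str]) -> list[dict]:
    """Consume every batch with 'async for' and return them as a list."""
    return [batch async for batch in demo_batches(names)]


batches = asyncio.run(collect(["a", "b", "c"]))
```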

property missing: dict

Return the set of missing files from the source

output_chunks() -> Generator[MergeChunk, None, None]

Yield chunks of files for merging.

Returns:

yields a series of MergeChunk objects
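The generator shape of this method can be sketched as follows. The grouping rule used by the real `output_chunks` is not described here, so the fixed-size split below is purely an assumption, and `fixed_size_chunks` is a hypothetical name that yields plain lists rather than `MergeChunk` objects.

```python
from __future__ import annotations

from collections.abc import Generator


def fixed_size_chunks(files: list[str], chunk_size: int) -> Generator[list[str], None, None]:
    """Yield consecutive slices of at most chunk_size files (assumed rule)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(files), chunk_size):
        yield files[start:start + chunk_size]


chunks = list(fixed_size_chunks(["a", "b", "c", "d", "e"], 2))
```

Yielding chunks lazily lets the caller start merging the first chunk before the rest have been computed.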

run() -> None

Retrieve metadata for all files.
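Taken together, the methods above suggest a base class with two abstract coroutines and a synchronous entry point. The sketch below illustrates that pattern under stated assumptions: `SketchRetriever` and `ListRetriever` are hypothetical stand-ins, and the idea that `run()` drives `connect()` and `input_batches()` through `asyncio.run` is a guess, not something this reference confirms.

```python
import asyncio
from abc import ABC, abstractmethod
from collections.abc import AsyncGenerator


class SketchRetriever(ABC):
    """Hypothetical stand-in for the base class; not the real FileRetriever."""

    def __init__(self) -> None:
        self.files: dict = {}

    @abstractmethod
    async def connect(self) -> None:
        """Connect to the metadata source."""

    @abstractmethod
    def input_batches(self) -> AsyncGenerator[dict, None]:
        """Yield one dict of metadata per batch."""

    async def _loop(self) -> None:
        await self.connect()
        async for batch in self.input_batches():
            self.files.update(batch)

    def run(self) -> None:
        # Assumption: the synchronous entry point drives the async steps.
        asyncio.run(self._loop())


class ListRetriever(SketchRetriever):
    """Toy subclass that 'retrieves' metadata from an in-memory list."""

    def __init__(self, filelist: list) -> None:
        super().__init__()
        self.filelist = filelist
        self.connected = False

    async def connect(self) -> None:
        self.connected = True  # nothing to connect to in this toy example

    async def input_batches(self):
        for name in self.filelist:
            yield {name: {"name": name}}  # one file per batch


retriever = ListRetriever(["a.root", "b.root"])
retriever.run()
```

A subclass only has to supply `connect()` and `input_batches()`, and the shared driver logic stays in the base class.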

class retriever.LocalRetriever(filelist: list, meta_dirs: list | None = None)

FileRetriever for local files

Initialize the LocalRetriever with a list of files and optional metadata directories.

Parameters:
  • filelist – list of input data files

  • meta_dirs – optional list of directories to search for metadata files

async connect() -> None

No need to connect to the local filesystem, but we can do some preprocessing.

async get_metadata(file: str) -> dict

Retrieve metadata for a single file
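How a per-file lookup across `meta_dirs` might work can be sketched as below. Everything specific here is an assumption: the function name `find_metadata`, the `<file name>.json` sidecar convention, and the JSON format are hypothetical and are not taken from the library.

```python
from __future__ import annotations

import asyncio
import json
import tempfile
from pathlib import Path


async def find_metadata(file: str, meta_dirs: list[str]) -> dict:
    """Return metadata from the first '<file name>.json' found in meta_dirs.

    Assumption: sidecar naming and JSON encoding are illustrative only.
    """
    sidecar = Path(file).name + ".json"
    for directory in meta_dirs:
        candidate = Path(directory) / sidecar
        if candidate.is_file():
            return json.loads(candidate.read_text())
    return {}  # no metadata file found in any directory


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data.root.json").write_text(json.dumps({"events": 100}))
    found_meta = asyncio.run(find_metadata("/data/data.root", [tmp]))
    absent_meta = asyncio.run(find_metadata("/data/other.root", [tmp]))
```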

async input_batches() -> AsyncGenerator[dict, None]

Retrieve metadata for local files in batches