retriever

FileRetriever classes

class retriever.MetaRetriever

Base class for retrieving metadata from a source

async add(files: list, dids: list | None = None) -> dict

Add the metadata for a list of files to the set.

Parameters:
  • files – list of dictionaries with file metadata

  • dids – optional list of DIDs requested, used to check for missing files

Returns:

dict of MergeFile objects that were added
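The bookkeeping described for add() can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: SketchRetriever is a hypothetical stand-in, plain dicts replace MergeFile and MergeSet, and the metadata keys "namespace" and "name" (joined into a "namespace:name" DID) are assumed rather than taken from the source.

```python
import asyncio


class SketchRetriever:
    """Hypothetical stand-in that mimics the documented add() contract."""

    def __init__(self):
        self.files = {}    # DID -> metadata accepted so far
        self.dupes = {}    # DID -> number of repeat sightings
        self.missing = {}  # DID -> times requested but not returned

    async def add(self, files, dids=None):
        added = {}
        for meta in files:
            # Assumed DID layout: "namespace:name"
            did = f"{meta['namespace']}:{meta['name']}"
            if did in self.files:
                self.dupes[did] = self.dupes.get(did, 0) + 1
                continue
            self.files[did] = meta
            added[did] = meta
        # Any requested DID absent from the returned metadata counts as missing
        returned = {f"{m['namespace']}:{m['name']}" for m in files}
        for did in dids or []:
            if did not in returned:
                self.missing[did] = self.missing.get(did, 0) + 1
        return added


sketch = SketchRetriever()
batch = [
    {"namespace": "ns", "name": "a.root"},
    {"namespace": "ns", "name": "a.root"},  # repeated entry
]
added = asyncio.run(sketch.add(batch, dids=["ns:a.root", "ns:b.root"]))
```

In this sketch the repeated entry lands in dupes and the requested but unreturned "ns:b.root" lands in missing, which is one plausible reading of how the dupes and missing properties get populated.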

async connect() -> None

Connect to the metadata source

property dupes: dict

Return the set of duplicate files from the source

property files: MergeSet

Return the set of files from the source

abstract async input_batches() -> AsyncGenerator[dict, None]

Asynchronously retrieve metadata for the next batch of files. Subclasses must implement this method.

Returns:

yields a dict of the MergeFile objects added in each batch
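Because input_batches() is abstract, each concrete retriever supplies its own asynchronous generator. A minimal sketch of that pattern, assuming an in-memory list as the metadata source: ListRetriever, its batch_size parameter, and the "name" key are hypothetical, and plain dicts stand in for MergeFile objects.

```python
import asyncio
from typing import AsyncGenerator


class ListRetriever:
    """Hypothetical retriever that serves metadata from an in-memory list."""

    def __init__(self, records, batch_size=2):
        self.records = records
        self.batch_size = batch_size

    async def input_batches(self) -> AsyncGenerator[dict, None]:
        for start in range(0, len(self.records), self.batch_size):
            batch = self.records[start:start + self.batch_size]
            await asyncio.sleep(0)  # placeholder for a real network query
            # Yield one dict per batch, keyed by file name
            yield {rec["name"]: rec for rec in batch}


async def collect(retriever):
    # Drain the asynchronous generator into a list of batches
    return [batch async for batch in retriever.input_batches()]


records = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
batches = asyncio.run(collect(ListRetriever(records)))
```

Consumers iterate with `async for`, so a slow metadata service does not block other work scheduled on the same event loop.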

property missing: dict

Return the set of missing files from the source

output_chunks() -> Generator[MergeChunk, None, None]

Yield chunks of files for merging.

Returns:

yields a series of MergeChunk objects
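The generator shape of output_chunks() can be illustrated with a simple fixed-size grouping. The source does not say how files are actually grouped into a MergeChunk, so the chunk_size parameter and the use of plain lists below are assumptions made only to show the yield pattern.

```python
from typing import Generator


def output_chunks(files: dict, chunk_size: int) -> Generator[list, None, None]:
    """Yield lists of file names, at most chunk_size names per list."""
    names = sorted(files)  # deterministic order for the illustration
    for start in range(0, len(names), chunk_size):
        yield names[start:start + chunk_size]


chunks = list(output_chunks({"c": 3, "a": 1, "b": 2}, chunk_size=2))
```

A caller can also consume the generator lazily in a for loop, handling one chunk at a time instead of materializing the whole list.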

run() -> None

Retrieve metadata for all files.
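Since run() is synchronous while connect() and input_batches() are coroutines, one plausible arrangement is for run() to drive an event loop until every batch has been consumed. The sketch below assumes that arrangement; LoopRetriever, its _loop helper, and the seen list are hypothetical and do not come from the library.

```python
import asyncio


class LoopRetriever:
    """Hypothetical class showing a synchronous run() over async methods."""

    def __init__(self):
        self.connected = False
        self.seen = []

    async def connect(self):
        self.connected = True  # a real class would open a connection here

    async def input_batches(self):
        for batch in ({"a": 1}, {"b": 2}):
            yield batch

    async def _loop(self):
        # Connect once, then drain every batch from the source
        await self.connect()
        async for batch in self.input_batches():
            self.seen.extend(batch)

    def run(self):
        # Synchronous entry point that owns the event loop
        asyncio.run(self._loop())


retriever_sketch = LoopRetriever()
retriever_sketch.run()
```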

class retriever.PathFinder(meta: MetaRetriever)

Base class for finding paths to files

async connect() -> None

Connect to the file source

property files: MergeSet

Return the set of files from the source

async input_batches() -> AsyncGenerator[dict, None]

Asynchronously retrieve paths for the next batch of files.

Returns:

yields a dict of the MergeFile objects added in each batch

abstract async process(files: dict) -> None

Process a batch of files to find their physical locations.

Parameters:

files – dictionary of files to process
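A concrete path finder would implement process() to fill in physical locations for each batch. The sketch below assumes that input_batches() pulls batches from the wrapped metadata retriever and passes each one through process(), which the source does not confirm. StubMeta, PrefixPathFinder, the prefix argument, and the "path" key are hypothetical names used for illustration.

```python
import asyncio


class StubMeta:
    """Hypothetical metadata retriever yielding two single-file batches."""

    async def input_batches(self):
        yield {"ns:a.root": {"path": None}}
        yield {"ns:b.root": {"path": None}}


class PrefixPathFinder:
    """Hypothetical finder that builds each path from a fixed prefix."""

    def __init__(self, meta, prefix):
        self.meta = meta
        self.prefix = prefix

    async def process(self, files):
        # Fill in a location for every file in the batch, in place
        for did, info in files.items():
            name = did.split(":", 1)[1]
            info["path"] = f"{self.prefix}/{name}"

    async def input_batches(self):
        # Assumed flow: take metadata batches, resolve paths, pass them on
        async for batch in self.meta.input_batches():
            await self.process(batch)
            yield batch


async def collect(finder):
    found = {}
    async for batch in finder.input_batches():
        found.update(batch)
    return found


paths = asyncio.run(collect(PrefixPathFinder(StubMeta(), "/data")))
```

Keeping process() as the only abstract piece means that a different storage backend needs only a new lookup routine, while the batching logic stays shared.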