Metadata categories and samweb->metacat conversion

Samweb and metacat turn out to need quite a lot of information to describe a file.

The information falls into four groups:

  • basic information about the file as a file - size, creation, creator, checksum

  • core information about the file contents such as run_number, data tier, and trigger stream that are needed to define data samples. Most of these are now in the core category in metacat. If it is a core item, you should probably be filling it in.

  • data that is useful to know - provenance (what it was created from, how, and by whom)

  • additional information that may be specific to a given physics group or extra details on reconstruction/simulation parameters.

Template for minimal metadata for a file shows a metacat example for a raw data file. The sam fields are similar, with the core. prefix removed.

Template for minimal metadata for a Monte Carlo file shows a metacat example for a Monte Carlo file.

Both sam and metacat put the more detailed information in objects with <type>.<subtype> format. Metacat extends this to <category>.<sub>.<sub …>.<name>
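As a rough illustration of the dotted naming scheme, the sketch below splits a metacat key into its category, any intermediate sub-levels, and the final name. The helper names split_key and group_by_category are hypothetical and are not part of metacat or samweb; keys without a dot are assumed to be basic file attributes with no category.

```python
def split_key(key):
    """Split a dotted key into (category, sub-levels, name).

    'core.application.version' -> ('core', ['application'], 'version')
    """
    parts = key.split(".")
    if len(parts) < 2:
        # Assumption: undotted keys (e.g. 'size') are basic file attributes.
        return None, [], key
    return parts[0], parts[1:-1], parts[-1]


def group_by_category(metadata):
    """Group a flat metadata dictionary by the leading category of each key."""
    grouped = {}
    for key, value in metadata.items():
        category, _, _ = split_key(key)
        grouped.setdefault(category, {})[key] = value
    return grouped


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    example = {
        "core.data_tier": "raw",
        "core.application.name": "placeholder_app",
        "size": 1024,
    }
    print(group_by_category(example))
```

Grouping by the leading category is one simple way to check which core items a file already carries.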

Here is a table to convert fields

metacat equivalents for sam fields

Metacat                           | type                 | Samweb
----------------------------------+----------------------+-----------------------------------------
Basics                            |                      |
fid                               | int                  | file_id
namespace                         | string               |
name                              | string               | file_name
creator                           | string               | user
created_timestamp                 | timestamp            | create_date
size                              | int                  | file_size
checksums                         | dictionary           | check_sum (dict)
retired                           | bool                 |
retired_by                        | string               |
retired_timestamp                 | timestamp            |
updated_by                        | string               | update_user
updated_timestamp                 | timestamp            | update_date
update_comment                    | blob                 |
Core attributes                   |                      |
core.application.version          | string               | app_version
core.application.family           | string               | app_family
core.application.name             | string               | app_name
core.event_count                  | int                  | event_count
core.first_event_number           | int                  | first_event
core.last_event_number            | int                  | last_event
core.start_time                   | timestamp            | start_time
core.end_time                     | timestamp            | end_time
core.file_content_status          | string               | content_status
core.data_stream                  | string               | data_stream
core.data_tier                    | string               | data_tier
core.events                       | array                |
core.file_type                    | text                 | file_type
core.file_format                  | text                 | file_format
core.run_type                     | text                 | run_type
core.runs                         | array                | run_number (integer part)
core.runs_subruns                 | array                | run_number (part past the decimal point)
core.raw_timestamp                | timestamp            |
dune_mc.*                         | all types            | DUNE_MC.*
Retention/access keys             |                      |
retention.status                  | string               |
retention.class                   | string               |
Additional attributes             |                      |
<category>.<subcategory>.<name> … | string, int, bool, … | <category>.<name>
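The renaming in the table can be sketched as a simple dictionary lookup. This is only an illustration under stated assumptions: the function name sam_to_metacat is hypothetical, the sam key names are taken as written in the table (not checked against a live samweb record), only one-to-one renames are covered, and the run_number split into core.runs and core.runs_subruns is deliberately left out. Keys with no entry in the mapping are passed through unchanged.

```python
# Subset of the field table: sam key -> metacat key (names as listed in the table).
SAM_TO_METACAT = {
    "file_id": "fid",
    "file_name": "name",
    "user": "creator",
    "create_date": "created_timestamp",
    "file_size": "size",
    "check_sum": "checksums",
    "update_user": "updated_by",
    "update_date": "updated_timestamp",
    "app_name": "core.application.name",
    "event_count": "core.event_count",
    "first_event": "core.first_event_number",
    "last_event": "core.last_event_number",
    "start_time": "core.start_time",
    "end_time": "core.end_time",
    "content_status": "core.file_content_status",
    "data_stream": "core.data_stream",
    "data_tier": "core.data_tier",
    "file_type": "core.file_type",
    "file_format": "core.file_format",
    "run_type": "core.run_type",
}

SAM_MC_PREFIX = "DUNE_MC."
METACAT_MC_PREFIX = "dune_mc."


def sam_to_metacat(sam_metadata):
    """Rename the keys of a flat sam-style dictionary to metacat-style keys."""
    converted = {}
    for key, value in sam_metadata.items():
        if key in SAM_TO_METACAT:
            converted[SAM_TO_METACAT[key]] = value
        elif key.startswith(SAM_MC_PREFIX):
            # Assumption: only the prefix changes case; the rest of the key is kept.
            converted[METACAT_MC_PREFIX + key[len(SAM_MC_PREFIX):]] = value
        else:
            # Unknown keys are passed through untouched.
            converted[key] = value
    return converted


if __name__ == "__main__":
    # Placeholder record, for illustration only.
    print(sam_to_metacat({"file_name": "example.root", "data_tier": "raw"}))
```

Values are copied as they are; any type conversion (for example timestamps or the checksum dictionary) would need to be handled separately.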

Here is a table of common command translations

metacat equivalents for sam commands

metacat                                                  | samweb                            | comment
---------------------------------------------------------+-----------------------------------+--------------------
metacat query 'files from dnamespace:dataset where x=y'  | samweb list-files 'x y'           |
??                                                       | samweb list-files --summary 'x y' |
metacat file show -m <namespace>:<filename>              | samweb get-metadata <filename>    |
metacat query 'files from dune:all with name=<filename>' |                                   | finds the namespace