Metadata categories and samweb->metacat conversion
Samweb and metacat both need quite a lot of information to describe a file. This includes:
- basic information about the file as a file: size, creation time, creator, checksum.
- core information about the file contents, such as run_number, data tier, and trigger stream, that is needed to define data samples. Most of these fields are now in the core category in metacat. If a field is a core item, you should probably fill it in.
- data that is useful to know, such as provenance (what the file was created from, how, and by whom).
- additional information that may be specific to a given physics group, or extra details on reconstruction/simulation parameters.
Template for minimal metadata for a file shows a metacat example for a raw data file. The sam fields are similar, with the "core." prefix removed.
Template for minimal metadata for a Monte Carlo file shows a metacat example for a Monte Carlo file.
Both sam and metacat put the more detailed information in objects with <type>.<subtype> format. Metacat extends this to <category>.<sub>.<sub …>.<name>.
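As an illustration of the dotted naming scheme, here is a minimal Python sketch that flattens a nested dictionary into <category>.<sub>.<name> keys. It assumes the metadata are held as a flat dictionary keyed by dotted names; the function name `flatten` and the example values ("reco", "v1", "raw") are hypothetical and not taken from any real file.

```python
def flatten(nested, prefix=""):
    """Turn {"core": {"data_tier": "raw"}} into {"core.data_tier": "raw"}."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend one level and extend the dotted prefix.
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Hypothetical values, for illustration only.
nested = {
    "core": {
        "application": {"name": "reco", "version": "v1"},
        "data_tier": "raw",
    }
}
flat = flatten(nested)
print(flat)
```

Fields whose value is itself a dictionary (checksums, for example) would have to be exempted from flattening; the sketch does not handle that case.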
Here is a table to convert fields
| Metacat | type | Samweb |
|---|---|---|
| Basics | | |
| fid | int | file_id |
| namespace | string | |
| name | string | file_name |
| creator | string | user |
| created_timestamp | timestamp | create_date |
| size | int | file_size |
| checksums | dictionary | check_sum (dict) |
| retired | bool | |
| retired_by | string | |
| retired_timestamp | timestamp | |
| updated_by | string | update_user |
| updated_timestamp | timestamp | update_date |
| update_comment | blob | |
| Core attributes | | |
| core.application.version | string | app_version |
| core.application.family | string | app_family |
| core.application.name | string | app_name |
| core.event_count | int | event_count |
| core.first_event_number | int | first_event |
| core.last_event_number | int | last_event |
| core.start_time | timestamp | start_time |
| core.end_time | timestamp | end_time |
| core.file_content_status | string | content_status |
| core.data_stream | string | data_stream |
| core.data_tier | string | data_tier |
| core.events | array | |
| core.file_type | text | file_type |
| core.file_format | text | file_format |
| core.run_type | text | run_type |
| core.runs | array | run_number (integer part) |
| core.runs_subruns | array | run_number (part past the decimal point) |
| core.raw_timestamp | timestamp | |
| dune_mc.* | all types | DUNE_MC.* |
| Retention/access keys | | |
| retention.status | string | |
| retention.class | string | |
| Additional attributes | | |
| <category>.<subcategory>.<name> … | string, int, bool, … | <category>.<name> |
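The renaming in the table can be sketched as a simple lookup. The Python below is a minimal sketch, not an official conversion tool: it covers only a subset of the rows, the names `SAM_TO_METACAT` and `sam_to_metacat` are hypothetical, and the example record is invented. It does not attempt the run_number split into core.runs and core.runs_subruns, which needs its own handling.

```python
# Subset of the field table: samweb field name -> metacat field name.
SAM_TO_METACAT = {
    "file_name": "name",
    "file_size": "size",
    "user": "creator",
    "event_count": "core.event_count",
    "first_event": "core.first_event_number",
    "last_event": "core.last_event_number",
    "start_time": "core.start_time",
    "end_time": "core.end_time",
    "content_status": "core.file_content_status",
    "data_stream": "core.data_stream",
    "data_tier": "core.data_tier",
    "file_type": "core.file_type",
    "file_format": "core.file_format",
    "run_type": "core.run_type",
}


def sam_to_metacat(sam_md):
    """Rename known samweb keys; return (converted, unmapped)."""
    converted = {}
    unmapped = {}
    for key, value in sam_md.items():
        if key in SAM_TO_METACAT:
            converted[SAM_TO_METACAT[key]] = value
        else:
            # Keep anything we cannot translate so it can be reviewed by hand.
            unmapped[key] = value
    return converted, unmapped


# Invented record, for illustration only.
example = {
    "file_name": "example.root",
    "file_size": 1024,
    "data_tier": "raw",
    "some_group.parameter": "x",
}
converted, unmapped = sam_to_metacat(example)
print(converted)
print(unmapped)
```

Returning the unmapped keys separately, rather than dropping them, makes it easier to spot fields that still need a manual decision.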
Here is a table of common command translations
| metacat | samweb | comment |
|---|---|---|
| metacat query 'files from <namespace>:<dataset> where x=y' | samweb list-files 'x y' | |
| ?? | samweb list-files --summary 'x y' | |
| metacat file show -m <namespace>:<filename> | samweb get-metadata <filename> | |
| metacat query 'files from dune:all with name=<filename>' | | finds the namespace |
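When these commands are driven from a script, it can help to build the argument list in one place so the quoting of the query string is handled consistently. The following Python is a minimal sketch under that assumption; the helper names are hypothetical, `x=y` is the same placeholder condition used in the table, and only the command forms listed in the table are used.

```python
import shlex


def metacat_query_cmd(dataset, condition):
    """Argument list for: metacat query 'files from <dataset> where <condition>'."""
    mql = f"files from {dataset} where {condition}"
    return ["metacat", "query", mql]


def metacat_show_cmd(namespace, filename):
    """Argument list for: metacat file show -m <namespace>:<filename>."""
    return ["metacat", "file", "show", "-m", f"{namespace}:{filename}"]


query_cmd = metacat_query_cmd("dune:all", "x=y")
show_cmd = metacat_show_cmd("<namespace>", "<filename>")

# shlex.join renders the list as a shell-safe command line for logging.
print(shlex.join(query_cmd))
print(shlex.join(show_cmd))
```

The lists can be passed directly to `subprocess.run` without a shell, which avoids nested-quote problems in the query string.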