Metacat Query Examples
This document collects examples of metacat queries.
Example: Get the raw data from given protodune-sp detector runs
metacat query "files from dune:all where core.file_type=detector \
and core.run_type='protodune-sp' and core.data_tier=raw \
and core.data_stream=physics and core.runs[any] in (5141,5143)"
Add --summary or -s after query if you only want the number of files.
Notes:
Queries run faster if you ask for files from a known dataset such as `dune:all`.
`core.runs[any] = 5141` matches a file if any of the runs associated with it is 5141. `5141 in core.runs` is equivalent.
`core.runs[any] in (5141, 5142, 5147)` matches a file if any of its runs is one of these 3 runs. Use the `in (X,Y)` syntax whenever you want to ask for multiple runs.
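The run conditions in the notes above can be generated from a list of run numbers. The following is a minimal sketch that only assembles the query text; the function names `runs_clause` and `raw_protodune_query` are hypothetical helpers for illustration and are not part of metacat.

```python
def runs_clause(runs):
    """Return an MQL condition matching files that contain any of the given runs.

    Hypothetical helper: it only formats text and never contacts metacat.
    """
    runs = [int(r) for r in runs]
    if not runs:
        raise ValueError("at least one run number is required")
    if len(runs) == 1:
        # Single run: the form shown in the notes above.
        return f"core.runs[any] = {runs[0]}"
    # Several runs: the in (X,Y) form.
    return "core.runs[any] in (" + ",".join(str(r) for r in runs) + ")"


def raw_protodune_query(runs, dataset="dune:all"):
    """Build the raw-data query from the example above for the given runs."""
    conditions = [
        "core.file_type=detector",
        "core.run_type='protodune-sp'",
        "core.data_tier=raw",
        "core.data_stream=physics",
        runs_clause(runs),
    ]
    return f"files from {dataset} where " + " and ".join(conditions)


if __name__ == "__main__":
    # Print the query text so it can be passed to: metacat query "<text>"
    print(raw_protodune_query([5141, 5143]))
```

The printed text can be pasted into `metacat query "..."`. Keeping the run list in one place makes it easier to reuse the same conditions against a different dataset.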
Example: Save a dataset or definition query
If you are interested in all physics data from protodune-sp, you might want to save a generic dataset or query that you can reuse in further filtered queries. Then, as you narrow things down, you can build additional datasets.
To run an MQL query and create a new dataset with the query results:
metacat dataset create -f "files from dune:all where \
..." <dataset_namespace>:<dataset_name>
metacat dataset create -f @file_with_mql_query.txt \
<dataset_namespace>:<dataset_name> <dataset description>
You will likely need to request your own namespace or use the usertests namespace.
To run a query and add matching files to an existing dataset:
metacat dataset add-files -q "files from dune:all where ..." <dataset_namespace>:<dataset_name>
metacat dataset add-files -q @file_with_mql_query.txt <dataset_namespace>:<dataset_name>
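When these commands are driven from a script, it can help to build them as argument lists so the quotes inside the query need no extra shell escaping. The sketch below is a rough illustration that uses only the `-f` and `-q` options shown above; the function names and the dataset name `my-physics-files` are hypothetical, and nothing is executed.

```python
import shlex


def dataset_create_cmd(query, namespace, name, description=None):
    """Argument list for 'metacat dataset create -f <query> <ns>:<name> [<description>]'."""
    cmd = ["metacat", "dataset", "create", "-f", query, f"{namespace}:{name}"]
    if description:
        cmd.append(description)
    return cmd


def dataset_add_files_cmd(query, namespace, name):
    """Argument list for 'metacat dataset add-files -q <query> <ns>:<name>'."""
    return ["metacat", "dataset", "add-files", "-q", query, f"{namespace}:{name}"]


if __name__ == "__main__":
    # Illustrative query and dataset name (assumptions, not from a real catalog).
    query = "files from dune:all where core.data_stream=physics"
    cmd = dataset_create_cmd(query, "usertests", "my-physics-files", "physics files")
    # shlex.join shows a copy-pasteable, correctly quoted command line.
    print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where metacat is installed and you are authenticated.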
Check the result by querying the files in the dataset:
metacat query -s "files from schellma:protodune-sp-physics-generic"
metacat dataset show schellma:protodune-sp-physics-generic
children :
created_timestamp : 2022-10-08 11:41:54
creator : schellma
description : files from dune:all where core.file_type=detector and core.run_type='protodune-sp' and core.data_stream=physics
file_count : 772631
file_meta_requirements : {}
frozen : False
metadata : {}
monotonic : False
name : protodune-sp-physics-generic
namespace : schellma
parents :
You can then ask for the subset from a particular data tier and run number.
metacat query "files from schellma:protodune-sp-physics-generic \
where core.runs[all]=5141 and core.data_tier=raw"
Find only the files not processed with a given version of the code
metacat query -s "files from schellma:protodune-sp-physics-generic \
where core.data_tier=raw and 5141 in core.runs - parents(files \
from schellma:protodune-sp-physics-generic where 5141 in core.runs \
and core.data_tier='full-reconstructed' and core.application.version~'v08_27_.*')"
12 files
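The query above is a set difference: the raw files for the run, minus the parents of the reconstructed files whose version matches the pattern. The toy model below sketches that logic with plain Python sets. The catalog contents and file names are made up for illustration, and `re.search` is used here as an assumed stand-in for the `~` operator.

```python
import re

# Toy stand-in for the catalog: file name -> metadata.
# Every name and value here is invented for illustration.
CATALOG = {
    "raw_001": {"tier": "raw", "parents": []},
    "raw_002": {"tier": "raw", "parents": []},
    "raw_003": {"tier": "raw", "parents": []},
    "reco_001": {"tier": "full-reconstructed", "version": "v08_27_01",
                 "parents": ["raw_001"]},
    "reco_002": {"tier": "full-reconstructed", "version": "v09_00_00",
                 "parents": ["raw_002"]},
}


def parents(files):
    """Set of all parents of the given files, like parents(...) in the query."""
    return {p for name in files for p in CATALOG[name]["parents"]}


def unprocessed(version_pattern):
    """Raw files with no reconstructed child whose version matches the pattern."""
    raw = {name for name, md in CATALOG.items() if md["tier"] == "raw"}
    reco = {
        name for name, md in CATALOG.items()
        if md["tier"] == "full-reconstructed"
        and re.search(version_pattern, md["version"])
    }
    # Same shape as the query: raw files - parents(matching reconstructed files)
    return raw - parents(reco)


if __name__ == "__main__":
    print(sorted(unprocessed(r"v08_27_.*")))
```

In this toy catalog only `raw_001` has a child reconstructed with a `v08_27_` version, so the other two raw files remain in the result.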