Campaigns
You can set up and run campaigns over multiple datasets and configurations using the campaign folders:
cd $MERGE_UTILS_DIR
cd campaigns
mkdir <campaign_name>
source setup_campaign.sh <campaign_name>
cd <campaign_name>
The directory name is stored in $CAMPAIGN and the full directory path in $CAMPAIGN_DIR.
In that directory, create a csv file, $CAMPAIGN.csv, with one tagged row for each dataset you want to run over. Its header is:
TAG,CONFIG,CAMPAIGN,NAMESPACE,BATCH,DATASET
TAG is a unique tag for this run. If you put TEST in the tag, jobs will run interactively.
CONFIG is the merge yaml or json override of the defaults. It should have the same name as the fcl file if you are using lar.
CAMPAIGN is the campaign name.
NAMESPACE is the output namespace (for example, 'usertests').
BATCH is how many input files are sent to the merger at once. For production, batches of 2000-5000 files work well.
DATASET is the metacat dataset you want to run over. It should generally be an official dataset.
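A minimal sketch of reading and sanity-checking $CAMPAIGN.csv with Python's standard csv module. load_campaign_rows is a hypothetical helper, not part of src/prod_utils; it assumes the header is exactly the one shown above and only enforces what the column descriptions state: TAG is unique, BATCH is a positive integer, and a TAG containing TEST runs interactively.

```python
import csv
import io

# Column order documented for $CAMPAIGN.csv
FIELDS = ["TAG", "CONFIG", "CAMPAIGN", "NAMESPACE", "BATCH", "DATASET"]


def load_campaign_rows(text):
    """Parse campaign csv text into dicts and check the documented rules.

    Hypothetical helper for illustration; not part of src/prod_utils.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != FIELDS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows, seen = [], set()
    for row in reader:
        tag = row["TAG"].strip()
        if tag in seen:
            raise ValueError(f"duplicate TAG: {tag}")
        seen.add(tag)
        # BATCH = number of input files sent to the merger at once
        row["BATCH"] = int(row["BATCH"])
        if row["BATCH"] <= 0:
            raise ValueError(f"BATCH must be positive for {tag}")
        # Per the column description, TEST in the tag means interactive jobs
        row["INTERACTIVE"] = "TEST" in tag
        rows.append(row)
    return rows
```

Reading the file is then load_campaign_rows(open(f"{campaign}.csv").read()), where campaign holds the value of $CAMPAIGN.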
Production scripts
The production scripts are in src/prod_utils. Running setup_campaign.sh adds them to your path.
build_jobs.py takes the original csv file, figures out how many files you will be running over, and produces <jobs csv>.
make_pass1.py <tag> makes a script that submits the pass1 jobs for <tag>.
Utilities
get_tasks.py: internal utility that scripts use to match a tag with its tasks
pass1_check.py
pass1_summary.py
workflow_check.py
make_safe_query.py
Sequence of actions
Make your campaign directory.
Internally, setup_campaign.sh adds the production utils to your path:
export PYTHONPATH=$MERGE_UTILS_DIR/src/prod_utils:$PYTHONPATH
Set up your base csv file, $CAMPAIGN.csv, based on the examples. Each row should reference a yaml file that contains the correct fcl file. Rows can share yaml files if you are running the same config on different datasets.
Each campaign directory should have a unique csv with the same suffix.
Use the base csv to build the jobs csv by doing metacat queries:
python -m build_jobs
This makes $CAMPAIGN_jobs.csv.
You can then use it to generate sub-campaigns for each fcl file:
python -m make_pass1 <tag>
This makes a long list of potential merge submissions, each of which has length --limit and is offset by --skip.
Those merge submission commands are stored in <TAG>.sh so you can use them later.
This is what they look like:
merge -l -vv -c /Users/schellma/Dropbox/merge-utils/campaigns/trigprim-2026-03/triggerana_tree_1x2x2_simpleThr_production.yaml --skip=0 --limit=2 --tag="TEST-TRGSIM_CC_v1" dataset fardet-hd:fardet-hd__trg_mc_2025a__detector-simulated__v10_06_00d01__detsim_dune10kt_1x2x2_notpcsigproc__prodmarley_nue_flat_cc_dune10kt_1x2x2__out1__v1_official > TEST-TRGSIM_CC_v1_20260321T161849_0.log 2>&1
-l means run locally (remove it to run as batch).
--retry checks every single file to see if it is a duplicate. This is not done by default.
-vv means run verbose.
-c points to the config file.
--skip skips n input files.
--limit limits the number of files.
--tag should be a unique tag for this workflow. You use it to check for duplicates and to make the final merged file dataset.
dataset (could also be query or list) specifies what you will run over.
The commands are set up to write their output to log files.
Run some of those merge scripts.
The merge scripts check your metadata for validity and set up job scripts to run locally or through justIN.
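The shape of that command list can be sketched in Python. This is a hypothetical illustration modelled on the example command above, not the code in make_pass1.py: merge_commands and its parameters are made-up names, the number of input files is assumed to be known already (for example from the metacat queries behind the jobs csv), the log index is assumed to be the window number, and adding -l for TEST tags is an assumption drawn from the csv column description.

```python
import shlex
from datetime import datetime


def merge_commands(config, tag, dataset, n_files, limit, local=None, stamp=None):
    """Build one merge command per window of `limit` input files.

    Hypothetical helper that mirrors the example command; not make_pass1.py.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if local is None:
        local = "TEST" in tag  # assumption: TEST tags run interactively
    stamp = stamp or datetime.now().strftime("%Y%m%dT%H%M%S")
    commands = []
    for i, skip in enumerate(range(0, n_files, limit)):
        parts = ["merge"]
        if local:
            parts.append("-l")  # run locally; omit to run as batch
        parts += [
            "-vv",
            "-c", shlex.quote(config),
            f"--skip={skip}",
            f"--limit={limit}",
            f"--tag={shlex.quote(tag)}",
            "dataset", shlex.quote(dataset),
        ]
        # assumption: the log index is the window number
        log = f"{tag}_{stamp}_{i}.log"
        commands.append(" ".join(parts) + f" > {shlex.quote(log)} 2>&1")
    return commands
```

Joining the result with newlines and writing it to <TAG>.sh gives a file of the same general form; shlex.quote leaves ordinary paths, tags, and dataset names unquoted, so each line reads like the hand-written example.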
The last 2-3 lines of the merge scripts have the actual submission commands.
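Each command in <TAG>.sh covers the window of input files set by --skip and --limit, so before running more of them it may be worth confirming that the windows cover the dataset with no gaps or overlaps. windows_from_commands and check_coverage are hypothetical helpers, not part of src/prod_utils, and n_files is assumed to be known (for example from the jobs csv). This is only a pre-check: duplicate detection through --tag and --retry is still done by merge itself.

```python
import shlex


def windows_from_commands(lines):
    """Return sorted (skip, limit) pairs parsed from merge command lines."""
    windows = []
    for line in lines:
        tokens = shlex.split(line)
        if not tokens or tokens[0] != "merge":
            continue  # ignore blank lines, comments, other commands
        # Collect --key=value options; shlex has already removed the quotes
        opts = dict(t.split("=", 1) for t in tokens if t.startswith("--") and "=" in t)
        windows.append((int(opts.get("--skip", 0)), int(opts["--limit"])))
    return sorted(windows)


def check_coverage(windows, n_files):
    """List gaps and overlaps between consecutive windows over n_files inputs."""
    problems, expected = [], 0
    for skip, limit in windows:
        if skip > expected:
            problems.append(f"gap: files {expected}-{skip - 1} not covered")
        elif skip < expected:
            problems.append(f"overlap: window at {skip} repeats earlier files")
        expected = max(expected, skip + limit)
    if expected < n_files:
        problems.append(f"gap: files {expected}-{n_files - 1} not covered")
    return problems
```

For a real campaign, pass the lines of <TAG>.sh, for example Path(script).read_text().splitlines() using pathlib; an empty list from check_coverage means the windows line up.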