Campaigns
You can set up and run campaigns over multiple datasets/configurations with a single setup per campaign.
Each campaign may have sub-campaigns that run different fcls over different datasets.
A shift operation procedure is given after the main setup instructions and explanations.
There is also an example of this procedure in
/exp/dune/data/users/schellma/merge/hd_atmos_202604
Setup
To log in with the production role:
ssh -l duneproshift dunegpvmXX.fnal.gov
We suggest that you make a subdirectory for each campaign, for example under your production area, when you first log in.
Get into an apptainer:
/cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/bin/apptainer shell --shell=/bin/bash \
-B /cvmfs,/exp,/nashome,/pnfs/dune,/opt,/run/user,/etc/hostname,/etc/krb5.conf --ipc --pid \
/cvmfs/singularity.opensciencegrid.org/fermilab/fnal-dev-sl7:latest
Make an area that you will use for merging projects; you can put several campaigns there.
Set the campaign environment variables to create a unique campaign directory:
export TOP_MERGE_DIR=<where you want your merge stuff to go>
export CAMPAIGN=<campaign_name for a particular campaign>
mkdir $TOP_MERGE_DIR/$CAMPAIGN
cd $TOP_MERGE_DIR/$CAMPAIGN
git clone https://github.com/dune/merge-utils.git
Then make a script called setup.sh; make certain it uses the code version you want, and put it in
$TOP_MERGE_DIR/$CAMPAIGN
Text of setup.sh:
echo "this is setup.sh for $CAMPAIGN"
export RUCIO_ACCOUNT=justin_readonly # need this to access rucio
export CAMPAIGN=<campaign name>
export DUNE_VERSION=<version>
export DUNE_QUALIFIER=<qualifier>
cd $TOP_MERGE_DIR/$CAMPAIGN/merge-utils
source setup_prod.sh
cd campaigns
mkdir -p $CAMPAIGN
source setup_campaign.sh $CAMPAIGN
cd $CAMPAIGN
You need to run setup.sh every time you log in, i.e., after the apptainer command:
export TOP_MERGE_DIR=$HOME/merge # change to the location you want
source $TOP_MERGE_DIR/$CAMPAIGN/setup.sh
This will set up merge-utils and put you into the subdirectory where you can configure your campaign.
The full directory path to your $CAMPAIGN configuration will be in $CAMPAIGN_DIR
In that directory you need to make a csv file with the same name as the directory, $CAMPAIGN.csv, that stores tagged rows for each dataset you want to run over.
TAG,FCL,CONFIG,CAMPAIGN,NAMESPACE,BATCH,DATASET
TAG is a unique tag for this run of the merger - if you put TEST in the tag, jobs will be set up to run interactively.
FCL is the name of the fcl file you want to use. A single campaign can have different fcls.
CONFIG is the merge yaml or json override of defaults. It should have the same name as the fcl file if using lar. You may have to edit it to make certain that the cfg option points to the correct fcl and the metadata field contains the correct dune.campaign field. The build_jobs script will complain if these are not set consistently.
CAMPAIGN is the campaign - same as the directory and csv file.
NAMESPACE is the output namespace (default is 'usertests'; for production you need to change it to the right namespace).
BATCH is how many input files are sent to the merger at once - for large production, batches of 2000-5000 are good.
DATASET is the metacat dataset you want to run over. It should generally be official.
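As an illustration of this layout, the sketch below writes a hypothetical $CAMPAIGN.csv with a single TEST row; the fcl, yaml, and dataset names are made up for illustration, so substitute your own:

```shell
# Write an example $CAMPAIGN.csv with one sub-campaign row.
# All values below are hypothetical placeholders, not real files or datasets.
CAMPAIGN=hd_atmos_example
cat > "$CAMPAIGN.csv" <<EOF
TAG,FCL,CONFIG,CAMPAIGN,NAMESPACE,BATCH,DATASET
TEST-atmos_v1,myana_tree.fcl,myana_tree.yaml,$CAMPAIGN,usertests,2000,fardet-hd:some_official_dataset
EOF

# Quick sanity check of the header before handing the file to build_jobs
head -1 "$CAMPAIGN.csv"
```

Because the tag contains TEST, this row would be set up to run interactively.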
Production scripts
The production scripts are in src/prod_utils.
When you run the setup_campaign.sh script (called in the overall setup script), they should be added to your path.
build_jobs.py takes the original csv file, figures out how many files you will be running over for each sub-campaign, and produces $CAMPAIGN_jobs.csv and $CAMPAIGN_checklist.csv. It checks that you are set up with the right larsoft version and campaign name, and that your config files contain the right fcl file names.
You generally only have to do this once if you don't change $CAMPAIGN.csv, but if you have to rerun a sub-campaign you may decide to redo it.
You copy the $CAMPAIGN_checklist.csv into a Google sheet and use it to keep track of your submissions.
When you are ready to run some jobs:
make_pass1.py <tag> makes a script that submits the pass1 jobs for <tag>. If you run it without a tag, it lists the available tags.
Shifter instructions
Make your campaign directory
Run the setup procedure described in detail above and set up a campaign directory under merge-utils/campaigns.
Once that is done, every time you log in: get an apptainer, set TOP_MERGE_DIR, go to $TOP_MERGE_DIR/$CAMPAIGN, and run setup.sh; you should then be ready to go.
Set up your base csv file
$CAMPAIGN.csv. Each row represents a sub-campaign, which can have different fcl, yaml, and datasets but not different code versions. Each sub-campaign needs a unique tag. The yaml file needs to contain the correct fcl file. Rows can share yaml files if you are running the same config on different datasets.
Each campaign directory should contain a unique $CAMPAIGN.csv with the same base name as the directory.
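A quick sanity check of that naming convention - a sketch using a throwaway example directory (in practice you would run the check inside your actual campaign directory):

```shell
# Check that the campaign csv has the same base name as its directory.
# hd_atmos_example is a throwaway name used only for illustration.
mkdir -p hd_atmos_example
cd hd_atmos_example
touch hd_atmos_example.csv

dir=$(basename "$PWD")
if [ -f "$dir.csv" ]; then
    echo "OK: found $dir.csv"
else
    echo "ERROR: expected $dir.csv in $PWD"
fi
```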
Reminder from above - fields are:
TAG,DUNESW,FCL,CONFIG,CAMPAIGN,NAMESPACE,BATCH,DATASET
Use build_jobs to build the $CAMPAIGN_jobs.csv and $CAMPAIGN_checklist.csv:
python -m build_jobs
This makes $CAMPAIGN_jobs.csv and $CAMPAIGN_checklist.csv.
It will complain if you have not set up the right DUNESW version or if your yaml files are inconsistent with the fcl and campaign names.
When you are ready to run jobs, you can then use this to generate sub-campaigns for each fcl file/dataset:
python -m make_pass1 <tag>
will make a long list of potential merge submissions, each of which has length --limit and skips by --skip.
Those merge submission commands are stored in <TAG>.sh so you can use them later.
This is what they look like:
merge -l -vv -c /Users/schellma/Dropbox/merge-utils/campaigns/trigprim-2026-03/triggerana_tree_1x2x2_simpleThr_production.yaml --skip=0 --limit=2 --tag="TEST-TRGSIM_CC_v1" dataset
produces:
fardet-hd:fardet-hd__trg_mc_2025a__detector-simulated__v10_06_00d01__detsim_dune10kt_1x2x2_notpcsigproc__prodmarley_nue_flat_cc_dune10kt_1x2x2__out1__v1_official > TEST-TRGSIM_CC_v1_20260321T161849_0.log 2>&1
-l means run locally (remove it to run as batch)
--retry checks every single file to see if it is a duplicate; not done by default
-vv means run verbose
-c points to the config file
--skip skips n input files
--limit limits the # of files
--tag should be a unique tag for this workflow - you use this to check for duplicates and make the final merged file dataset
dataset (could also be query or list) specifies what you will run over
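The --skip/--limit pair simply carves the input list into consecutive windows. The sketch below prints the windows that would cover a 980-file dataset in batches of 500 (numbers chosen to echo the examples in this document; the rest of the merge command is elided):

```shell
# Print the skip/limit windows needed to cover NFILES inputs in batches of LIMIT.
# Pure arithmetic sketch - nothing is submitted.
NFILES=980
LIMIT=500
skip=0
while [ "$skip" -lt "$NFILES" ]; do
    echo "merge ... --skip=$skip --limit=$LIMIT --tag=<tag> <dataset>"
    skip=$((skip + LIMIT))
done
```

This prints two windows (skip 0 and skip 500); the last window is allowed to run past the end of the dataset.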
I have set this up to write to log files.
Run some of those merge scripts. You will likely have to get a new token:
justin time
justin get-token
The merge scripts will check your metadata for validity and set up jobscripts to run locally or through justIN.
The last 2-3 lines of the merge scripts have the actual submission commands, which look like:
interactive:
/exp/dune/data/users/schellma/prod/merge-utils/tmp/TEST_hd_atmos_l000002_20260408T235613/run.sh
batch:
/exp/dune/data/users/schellma/prod/merge-utils/tmp/TEST_hd_atmos_v4_l000500_20260417T174112/submit.sh
/exp/dune/data/users/schellma/prod/merge-utils/tmp/TEST_hd_atmos_v4_s000500_l000500_20260417T174145/submit.sh
Note the timestamp 20260417T174145; you will need to use it.
You can pull them out by doing
grep submit *<timestamp>*.log
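As a sketch of that step, the snippet below fabricates a merge log and then pulls the submit command back out with the same grep (the log name and tmp path are made up for illustration):

```shell
# Fabricate a merge log containing a submit command, then grep it back out,
# mirroring: grep submit *<timestamp>*.log
TS=20260417T174145
cat > "TEST_example_${TS}.log" <<EOF
... merge output ...
/some/merge/tmp/TEST_example_s000500_l000500_${TS}/submit.sh
EOF

grep submit *"${TS}"*.log
```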
and then issue those commands and record the workflow numbers that come back from each one.
Track your jobs. Each campaign has a spreadsheet, $CAMPAIGN_checklist.csv, which was generated when you ran build_jobs.
Upload the partially filled spreadsheet into Google Sheets. It should have a line for every workflow you submitted.
Fill in
TAG - the sub-campaign, already filled
SKIP - already filled, index within the group of sub-campaigns - you can get it from the submit command:
TEST_hd_atmos_v4_s000500_l000500_20260417T174145/submit.sh
TIMESTAMP - the timestamp for the submit commands - from the submit command:
TEST_hd_atmos_v4_s000500_l000500_20260417T174145/submit.sh
PASS - pass1 or 2
WORKFLOW ID - returned when you submit
# of jobs - look in the justIN workflow page to see the # of jobs
status - once the jobs run
#event/job
#files
success fraction
total # of events
volume (GB)
comments
DUNESW - already filled
NFILES - already filled
SIZE_GB - already filled
BATCH - already filled
FCL - already filled
CONFIG - already filled
CAMPAIGN - already filled
NAMESPACE - already filled
DATASET - already filled
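The SKIP and TIMESTAMP fields can be read straight off the submit path. A parsing sketch using the example name above (it assumes the path contains the _s<skip>_l<limit>_<timestamp> pattern; pass1 paths without an _s field would need a different expression):

```shell
# Parse the skip index and timestamp out of a submit.sh path of the form
# <TAG>_s<skip>_l<limit>_<timestamp>/submit.sh
path="TEST_hd_atmos_v4_s000500_l000500_20260417T174145/submit.sh"
dir=${path%/submit.sh}     # drop the trailing /submit.sh
timestamp=${dir##*_}       # last underscore-separated field
skip=$(echo "$dir" | sed 's/.*_s\([0-9]*\)_l.*/\1/')
echo "SKIP=$skip TIMESTAMP=$timestamp"
```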
What if I made a mistake or things went very badly?
You can track your jobs at the justIN workflow page.
You can check pass1 by using:
python -m pass1_check <tag>
You should see something like this if you check while the jobs are still running:
python -m pass1_check TEST_hd_atmos_v3
nfiles 980
pass1 this tag had 600 parents and 63500 events, spread across 62 pass1 files
600 980
ERROR: final number of files 600 is not = the input 980
If a workflow got paused, you can run:
justin restart-workflow --workflow-id <workflow-id>
If it still doesn't complete properly, wait a few hours and rerun make_pass1 for the workflows that did not finish.
The scripts have a --retry option that will ignore files that are already processed.
If all else fails, you can increment the tag version and rerun the whole sub-campaign. Generally it is better to copy the relevant lines in the jobs and checklist files and increment the version there rather than replace the tag.