PLEASE USE THE NEW JUSTIN SYSTEM INSTEAD OF POMS
The justIN tutorial is currently in DocDB at: justIN Tutorial
The justIN system is described in detail at:
Note: More documentation coming soon.
justIN
- is the new workflow system replacing POMS
- can be used to process many input files by submitting batch jobs to the grid
- processes data by bringing together the data catalog and data location services, the rapid code distribution service, and job submission to the grid
justIN ties together:
- MetaCat search queries that obtain lists of files to process
- Rucio knowledge of where replicas of files are
- a table of site-to-storage distances to make the best choices about where to run each type of job
To process data using justIN:
You need to provide a jobscript (shell script) that performs some basic tasks:
- Set up the software environment
- Use Rucio to find where the data is
- Process the data
- Save the output in a defined location
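The four tasks above can be sketched as a jobscript skeleton. This is a minimal sketch under stated assumptions, not the official template: in a real jobscript the allocation line would come from `$JUSTIN_PATH/justin-get-file`, which according to the justIN documentation prints the DID, PFN and RSE of the next file separated by spaces. Here a made-up sample line stands in for it so the parsing can be tried anywhere. The helper name `split_allocation`, the sample values and the output file naming are all hypothetical.

```shell
# Hypothetical helper: split one "DID PFN RSE" line into three variables.
split_allocation() {
  # read splits the line on whitespace into the three named fields
  read -r did pfn rse <<EOF
$1
EOF
}

# 1. Set up the software environment (commented out so the sketch runs anywhere):
# source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh

# 2. Find where the data is. In a real jobscript (assumption, see lead-in):
#    allocation=$("$JUSTIN_PATH/justin-get-file")
allocation="myscope:myfile.root root://example.invalid//path/myfile.root EXAMPLE_RSE"  # made-up sample
split_allocation "$allocation"

# 3. Process the data (placeholder for the real lar command).
echo "would process $pfn (DID $did, from $rse)"

# 4. Save the output: choose a file name that the output pattern given at
#    submission time would match (naming here is illustrative only).
out_file="$(basename "$pfn" .root)_ana.root"
echo "would write $out_file"
```

The real example jobscripts in the production repository remain the reference for how each step is actually done.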
Submit the workflow with:
justin simple-workflow <args...>
Once you run the command, you get the workflow ID.
In case of any problem, you can stop your workflow by running:
justin finish-workflow --workflow-id <ID>
Next topics:
- Understand how a jobscript is structured
- Process data using standard code
- Process data using customized fcl files and/or customized code
- Select the input dataset
- Specify where your output should go (jobs writing to scratch)
Examples of jobscripts are provided in the GitHub production repository.
A jobscripts checklist is available in the backup
Two general remarks:
Note: ALWAYS test your code and jobscript before sending jobs to the grid.
For any large processing (MC or DATA) producing large output that has to be shared within the Collaboration, please contact the production group.
Things you can do
- Process data (submit a job to the grid) if you are using code from the base release and you don't actually modify any of it
- Once you have identified what data you want to process, you can find the most recent official datasets at:
https://wiki.dunescience.org/wiki/Data_Collections_Manager/data_sets
Example: Let's say you want to run mergeana for electron neutrinos.
First: Where is the data?
- In DUNE we provide datasets to easily identify a collection of files,
for example:
fardet-hd:fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nue_dune10kt_1x2x6__out1__validation
Dataset names tend to be self-explanatory and include the type of detector, the fcl files used to produce the data, the software version, the data tier, and a tag; in this case, the tag is validation.
- Let's try to run mergeana on the first 100 files in the dataset.
- MetaCat relies on MetaCat Query Language (MQL) queries to select a collection of files. In this case, to select the first 100 files of a given dataset, the query would be something like:
"files from fardet-hd:fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nue_dune10kt_1x2x6__out1__validation ordered limit 100 "
- The 'ordered' flag is crucial to ensure reproducibility: without it, the same query is not guaranteed to return the same files each time.
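Because dataset names are long, it can help to build the query from variables rather than typing it by hand. The following is a small sketch; the helper name `build_mql` is hypothetical, and only the query form already shown above is used.

```shell
# Hypothetical helper: build an MQL query for the first N files of a dataset.
build_mql() {
  dataset=$1
  limit=$2
  # "ordered" is kept in the query so that repeated runs select the same files
  printf 'files from %s ordered limit %s' "$dataset" "$limit"
}

DATASET='fardet-hd:fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nue_dune10kt_1x2x6__out1__validation'
MQL=$(build_mql "$DATASET" 100)
echo "$MQL"
```

Assuming the MetaCat client is set up, the resulting string can be tried with `metacat query "$MQL"` before it is passed to a workflow.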
Example jobscript:
https://github.com/DUNE/dune-prod-utils/blob/main/justIN-examples/submit_ana.jobscript
# fcl file and DUNE software version/qualifier to be used
FCL_FILE=${FCL_FILE:-standard_ana_dune10kt_1x2x6.fcl}
DUNE_VERSION=${DUNE_VERSION:-v09_81_00d02}
DUNE_QUALIFIER=${DUNE_QUALIFIER:-e26:prof}
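The `${VAR:-default}` form used in these lines keeps a value that is already set in the environment and falls back to the default otherwise. That is what allows a setting to be overridden at submission time without editing the jobscript. A minimal illustration, where `my_custom.fcl` is a hypothetical file name:

```shell
# Case 1: nothing set in the environment, so the default is used.
unset FCL_FILE
FCL_FILE=${FCL_FILE:-standard_ana_dune10kt_1x2x6.fcl}
default_value=$FCL_FILE
echo "without override: $default_value"

# Case 2: a value is already present (hypothetical name), so it is kept.
FCL_FILE=my_custom.fcl
FCL_FILE=${FCL_FILE:-standard_ana_dune10kt_1x2x6.fcl}
override_value=$FCL_FILE
echo "with override: $override_value"
```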
A bit further down:
# Setup DUNE environment
source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
setup dunesw "$DUNE_VERSION" -q "$DUNE_QUALIFIER"
and here is how you do the actual processing:
# Here is where the LArSoft command is called
(
  # Do the scary preload stuff in a subshell!
  export LD_PRELOAD=${XROOTD_LIB}/libXrdPosixPreload.so
  echo "$LD_PRELOAD"
  lar -c $FCL_FILE $events_option -o $outFile "$pfn" > ${fname}_ana_${now}.log 2>&1
)
The scary preload allows hdf5 files to be read over xroot.
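The reason for wrapping the preload in a subshell is scoping: a variable exported inside `( ... )` is visible to commands started there but disappears when the subshell ends, so the rest of the jobscript runs without the preloaded library. The sketch below shows this with a harmless stand-in variable; `DEMO_PRELOAD` and its value are hypothetical and nothing is actually preloaded.

```shell
unset DEMO_PRELOAD
(
  # Exported only for commands started inside this subshell
  export DEMO_PRELOAD=/hypothetical/libExample.so
  inside=$(sh -c 'echo "$DEMO_PRELOAD"')
  echo "inside subshell: $inside"
)
# Back in the parent shell the variable was never set
after=${DEMO_PRELOAD:-unset}
echo "after subshell: $after"
```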
Process data (submit a job to the grid) if you are just using code from the base release and you don't actually modify any of it:
$ USERF=$USER
$ FNALURL='https://fndcadoor.fnal.gov:2880/dune/scratch/users'
$ justin simple-workflow --mql "files from fardet-hd:fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nu_dune10kt_1x2x6__out1__validation skip 5 limit 5 ordered" --jobscript submit_ana.jobscript --rss-mb 4000 --output-pattern "*_ana*.root:$FNALURL/$USERF"
You can look at your job status using the justIN dashboard: https://justin.dune.hep.ac.uk/dashboard/?method=list-workflows
Custom fcl file
- Process data (submit a job to the grid) if you are using code from the base release and you want to use a customized FCL file.
- To do that, the best approach is to use the Rapid Code Distribution Service (RCDS) via cvmfs, as explained in the tutorial.
- Let's say you have a customized FCL file that you need to run over some datasets. Following the instructions in the DUNE justIN tutorial, you need to tar the files needed and put them in cvmfs:
$ tar cvzf my_fcls.tar my_fcls
$ source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
$ setup justin
$ rm -f /tmp/x509up_u`id -u`
$ kx509
$ INPUT_TAR_DIR_LOCAL=`justin-cvmfs-upload my_fcls.tar`
Wait a few minutes, then check that the files are there:
$ ls -l $INPUT_TAR_DIR_LOCAL
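Rather than re-running `ls` by hand, the wait can be scripted. The following is a generic polling sketch, not part of justIN: the function name `wait_for_dir` and its arguments (directory, number of attempts, pause in seconds) are hypothetical.

```shell
# Hypothetical helper: poll until a directory exists and is non-empty.
# Usage: wait_for_dir DIR TRIES PAUSE_SECONDS
wait_for_dir() {
  dir=$1
  tries=$2
  pause=$3
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Succeed as soon as the directory is there and has content
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  return 1
}

# Example call (commented out; needs the variable set by the upload step):
# wait_for_dir "$INPUT_TAR_DIR_LOCAL" 30 20 && ls -l "$INPUT_TAR_DIR_LOCAL"
```

With 30 attempts and a 20 second pause, the example call would give up after about ten minutes.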
You can look at the example at https://github.com/DUNE/dune-prod-utils/blob/main/justIN-examples/submit_local_fcl.jobscript
The key part is the following:
justin simple-workflow --mql "files from fardet-hd:fardet-hd__fd_mc_2023a_reco2__full-reconstructed__v09_81_00d02__standard_reco2_dune10kt_nu_1x2x6__prodgenie_nu_dune10kt_1x2x6__out1__validation skip 5 limit 5 ordered" --jobscript submit_local_fcl.jobscript --rss-mb 4000 --env INPUT_TAR_DIR_LOCAL="$INPUT_TAR_DIR_LOCAL"
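Inside the jobscript, the uploaded directory then has to be made visible to `lar`. One way to do this, shown here as an assumption rather than as what the linked example does, is to prepend the unpacked `my_fcls` directory to `FHICL_FILE_PATH`, the search path used to look up fcl files. The fallback paths below are hypothetical stand-ins so the sketch runs outside a job.

```shell
# Stand-in values so the sketch runs anywhere (both paths are hypothetical);
# in a real job INPUT_TAR_DIR_LOCAL arrives through --env and
# FHICL_FILE_PATH is set by the dunesw setup.
INPUT_TAR_DIR_LOCAL=${INPUT_TAR_DIR_LOCAL:-/cvmfs/hypothetical/rcds/path}
FHICL_FILE_PATH=${FHICL_FILE_PATH:-/hypothetical/release/fcl}

# Put the custom fcl directory first so it wins over the release copies
export FHICL_FILE_PATH="${INPUT_TAR_DIR_LOCAL}/my_fcls:${FHICL_FILE_PATH}"
echo "$FHICL_FILE_PATH"
```

Whether the `my_fcls` subdirectory is present depends on how the tar file was made, so it is worth checking the uploaded layout with `ls` first.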
Things you can do
Process data (submit a job to the grid) if you are NOT using code from the base release and you want to use customized code
- Probably you are developing a reconstruction algorithm and you want to check the results on a large sample before committing your software to GitHub.
- You can use your customized software (e.g. a local installation of dunereco) and use justIN to process the data with your new LArSoft module.
- As in the previous part, you will need to provide all the pieces in a tar file and put them in cvmfs:
$ tar cvzf my_code.tar my_code
Here my_code.tar includes a directory with my_fcls files and one with my local products (e.g. localProducts_larsoft_v09_85_00_e26_prof). This is similar to what you used to do when using jobsub with customized code.
Things you can do
How to navigate the justIN dashboard. Example: you want to check outputs/logs for jobs from workflow 1850. From the dashboard you can:
- access full statistics: sites where jobs ran and storage used for input/output
- access the details of each job
- access log files
- see, for each file, where it was processed and which Rucio Storage Element it came from
- see what it looks like if there are failed jobs
- list storage elements (where data can be)
backup
How to set up MetaCat, Rucio and justIN (on dunegpvm)
First run:
/cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/bin/apptainer shell --shell=/bin/bash -B /cvmfs,/exp,/nashome,/pnfs/dune,/opt,/run/user,/etc/hostname,/etc/hosts,/etc/krb5.conf --ipc --pid /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-dev-sl7:latest
Then:
source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh
setup python v3_9_15
setup rucio
kx509
export RUCIO_ACCOUNT=$USER
export METACAT_SERVER_URL=https://metacat.fnal.gov:9443/dune_meta_prod/app
export METACAT_AUTH_SERVER_URL=https://metacat.fnal.gov:8143/auth/dune
setup metacat
setup justin
justin version
rm -f /var/tmp/justin.session.`id -u`
justin time
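Before running any of the client commands, it can be useful to confirm that the exported variables are actually set in the current shell. This is a generic sketch, not part of MetaCat, Rucio or justIN; the function name `missing_vars` is hypothetical.

```shell
# Hypothetical helper: print the names of the given variables that are
# unset or empty, separated by spaces.
missing_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable called $name
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  echo "${missing# }"
}

# Check the variables exported in the setup steps above
unset_list=$(missing_vars RUCIO_ACCOUNT METACAT_SERVER_URL METACAT_AUTH_SERVER_URL)
if [ -n "$unset_list" ]; then
  echo "not set: $unset_list"
else
  echo "all setup variables are set"
fi
```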
Links
MetaCat web interface: https://metacat.fnal.gov:9443/dune_meta_prod/app/auth/login
justIN: https://justin.dune.hep.ac.uk/docs/
Slack channels: #workflow