justIN Grid Job Submission (UNDER CONSTRUCTION)
Overview
Teaching: 65 min
Exercises: 0 min
Questions
How do I submit grid jobs?
Objectives
Submit a basic batch job and understand what’s happening behind the scenes
Monitor the job and look at its outputs
Review best practices for submitting jobs (including what NOT to do)
The video from the two-day version of this training in May 2022 is provided here as a reference.
Once you have practiced basic justIN commands, please look at the instructions for running your own code below:
First, learn the basics of justIN.
Submit a job
Go to The justIN Tutorial and work up to “run some hello world jobs”.
Quiz
- What is your workflow ID?
Then work through:
- View your workflow on the justIN web dashboard
- Jobs with inputs and outputs
- Fetching files from Rucio managed storage
- (skip for now) Jobs using GPUs
- Jobs writing to scratch
Submit a job using the tarball containing custom code
First off, a very important point: for running analysis jobs, you may not actually need to pass an input tarball, especially if you are just using code from the base release and don’t modify any of it. In that case it is much more efficient to use everything from the release and refrain from using a tarball. All you need to do is set up any required software from CVMFS (e.g. dunetpc and/or protoduneana), and you are ready to go. If you are modifying only a fcl file, for example, but no code, it is more efficient to copy just the fcl file(s) you are changing into the job’s working directory and edit them as part of your job script (copies of a fcl file in the current working directory take priority over others by default).
Sometimes, though, we need to run custom code that isn’t in a release, so we need a way to get that code into jobs efficiently without overwhelming our data transfer systems. We have to make a few minor changes to the scripts from the previous tutorial section, generate a tarball, and invoke the proper submission options to get it into the job. There are many ways of doing this, but by far the best is to use the Rapid Code Distribution Service (RCDS), as shown in our example.
Temporary short version of an example for custom code.
We’re working on a long version of this but please look at these instructions for running a justIN workflow using your own code for now.
Cool justIN feature
justIN has a very useful interactive test command, justin-test-jobscript.
Here is a test run from the short submission example.
# interactive test of the submission
# tarball is in a local area on my machine (could also be set to a cvmfs location)
export INPUT_TAR_DIR_LOCAL=$DUNEDATA
source ./job_config.sh
# these are things you need to set ahead of time to run/create metadata - see job_config.sh
export NUM_EVENTS=2
echo "DIRECTORY=$DIRECTORY"
echo "DUNE_VERSION=$DUNE_VERSION"
echo "DUNE_QUALIFIER=$DUNE_QUALIFIER"
echo "FCL_FILE=$FCL_FILE"
echo "MQL=${MQL}"
echo "APP_TAG=$APP_TAG"
echo "USERF=$USERF"
echo "NUM_EVENTS=$NUM_EVENTS"
echo "DESCRIPTION=$DESCRIPTION"
echo "INPUT_TAR_DIR_LOCAL=$INPUT_TAR_DIR_LOCAL"
echo "tardir $INPUT_TAR_DIR_LOCAL"
export HERE=$PWD
justin-test-jobscript \
  --mql "$MQL" \
  --jobscript submit_local_code.jobscript.sh \
  --env PROCESS_TYPE=${PROCESS_TYPE} \
  --env DIRECTORY=${DIRECTORY} \
  --env INPUT_TAR_DIR_LOCAL=${INPUT_TAR_DIR_LOCAL} \
  --env DUNE_VERSION=${DUNE_VERSION} \
  --env DUNE_QUALIFIER=${DUNE_QUALIFIER} \
  --env FCL_FILE=${FCL_FILE} \
  --env NUM_EVENTS=${NUM_EVENTS} \
  --env USERF=${USER} \
  --env APP_TAG=${APP_TAG} \
  --env NAMESPACE=${NAMESPACE}
It reads in a tarball from the area $DUNEDATA and writes output to a tmp area on your interactive machine. It works very well at emulating a grid job.
Did your job work?
If not, please ask in #computing-questions on Slack.
Key Points
When in doubt, ask! Understand that policies and procedures that seem annoying, overly complicated, or unnecessary (especially when compared to running an interactive test) are there to ensure efficient operation and scalability. They are also often the result of someone breaking something in the past, or of simpler approaches not scaling well.
Send test jobs after creating new workflows or making changes to existing ones. If things don’t work, don’t blindly resubmit and expect things to magically work the next time.
Only copy what you need in input tar files. In particular, avoid copying log files, .git directories, temporary files, etc. from interactive areas.
Take care to follow best practices when setting up input and output file locations.
Always, always, always prestage input datasets. No exceptions.