DUNE Computing Training December 2021 edition: Glossary

Key Points

Workshop Welcome and Introduction
  • This workshop is brought to you by the DUNE Computing Consortium.

  • The goal is to give you the computing basics needed to work on DUNE.

Storage Spaces
  • Home directories are centrally managed by the Computing Division and are meant to store setup scripts and text files.

  • Home directories are NOT for storage of certificates or tokens.

  • Network attached storage (NAS) /dune/app is primarily for code development.

  • The NAS /dune/data is for storing ntuples and small datasets.

  • dCache volumes (tape, resilient, scratch, persistent) offer large storage with varying retention lifetimes.

  • The tool suites ifdh and XRootD provide scalable access to data with the appropriate transfer method (see the sketch after this list).
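
As a minimal sketch of that last point, assuming ifdh and the XRootD client are already set up; the paths, host, and username below are placeholders, not verified endpoints:

    # Copy a file out of dCache with ifdh (from the ifdhc product).
    ifdh cp /pnfs/dune/persistent/users/<username>/myfile.root ./myfile.root

    # The same file fetched through the dCache XRootD door.
    xrdcp root://fndca1.fnal.gov:1094//pnfs/fnal.gov/usr/dune/persistent/users/<username>/myfile.root .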

Data Management
  • SAM and Rucio are data handling systems used by the DUNE collaboration to retrieve data (see the sketches after this list).

  • Staging is a necessary step to make sure files are on disk in dCache (as opposed to only on tape).

  • XRootD allows users to stream data files over the network instead of copying them locally.

  • The Unix Product Setup (UPS) is a tool that ensures consistency and reproducibility across different software versions (see the second sketch after this list).

  • The multi-repository build tool (mrb) allows code modification across multiple repositories, which is relevant for a large project like LArSoft, where different use cases (end users and developers) demand consistency between builds.

  • CVMFS (the CernVM File System) distributes software and related files without installing them on the target computer: directories are mounted read-only over the network and cached locally.
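
A hedged sketch of the data-handling points above, assuming a SAM dataset definition named my_dataset already exists; all names here are placeholders:

    # List the files in a SAM dataset definition.
    samweb list-files "defname: my_dataset"

    # Prestage the dataset so its files are on dCache disk, not only on tape.
    samweb prestage-dataset --defname=my_dataset

    # Obtain an XRootD URL for a file, to stream it instead of copying it.
    samweb get-file-access-url --schema=root myfile.root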
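
And a sketch of the software-setup side; the version and qualifier strings are illustrative only, so check what is current before copying:

    # CVMFS: the DUNE software area is mounted read-only; just source its setup.
    source /cvmfs/dune.opensciencegrid.org/products/dune/setup_dune.sh

    # UPS: set up one specific, reproducible version of the DUNE software.
    setup dunetpc v09_22_02 -q e19:prof

    # mrb: create a development area and check out code to modify.
    export MRB_PROJECT=dune
    mrb newDev
    source localProducts*/setup
    mrb g dunetpc                    # clone the repository into srcs/
    cd $MRB_BUILDDIR && mrbsetenv    # set up the build environment
    mrb i -j4                        # build and install into localProducts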

SAM by Schellman
  • SAM is a data catalog originally designed for the D0 and CDF experiments at FNAL and is now used widely by HEP experiments.

Quiz on Storage Spaces and Data Management
  • Practice makes perfect.

Grid Job Submission and Common Errors
  • When in doubt, ask! Understand that policies and procedures that seem annoying, overly complicated, or unnecessary (especially when compared to running an interactive test) are there to ensure efficient operation and scalability. They are also often the result of someone breaking something in the past, or of simpler approaches not scaling well.

  • Send test jobs after creating new workflows or making changes to existing ones. If things don’t work, don’t blindly resubmit and expect things to magically work the next time.

  • Only copy what you need in input tar files; in particular, avoid copying log files, .git directories, and temporary files from interactive areas (see the sketch after this list).

  • Take care to follow best practices when setting up input and output file locations.

  • Always, always, always prestage input datasets. No exceptions.
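
A sketch tying the last three points together; the dataset name, tarball contents, and resource requests are placeholders, and a single test job is submitted before any large-scale submission:

    # Prestage the input dataset first. No exceptions.
    samweb prestage-dataset --defname=my_input_dataset

    # Build a lean tarball: no .git directories, logs, or temporary files.
    tar czf myjob.tar.gz --exclude='.git' --exclude='*.log' my_analysis/

    # Submit one test job; scale up only after it succeeds.
    jobsub_submit -G dune -N 1 \
        --memory=2000MB --disk=10GB --expected-lifetime=8h \
        --tar_file_name=dropbox://myjob.tar.gz \
        file://myscript.sh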

Quiz on Grid Job Submission
  • Practice makes perfect.

Code-makeover - Submit with POMS
  • Always, always, always prestage input datasets. No exceptions.

Closing Remarks
  • The DUNE Computing Consortium presented this workshop to broaden the use of software tools for analysis.

Glossary

FIXME