This lesson is still being designed and assembled (Pre-Alpha version)

Conditions Database Tutorial

Introduction To The Conditions Database

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • What is the conditions database?

  • What kind of data is stored in this database?

Objectives
  • Learn the basic structure and what information is stored on the conditions database

The Conditions Database

HEP experiments take vast amounts of data with their main detectors, but they also collect a lot of additional data, known as metadata, from other sources. This metadata describes the data coming from the read-out of the primary detectors.

Conditions data

The conditions data is a subset of the experimental metadata, specifically referring to the metadata required for offline data analysis and reconstruction. The conditions data of ProtoDUNE is stored in a dedicated database, the Conditions Database.

The conditions database consists of two PostgreSQL databases.

  1. The master store of metadata (UConDB) database is the centralized place where all the information is stored as blobs. Adding new information is straightforward, and there is no need to have a predefined database schema.
  2. The run conditions database stores a subset of the UConDB metadata in a table. This facilitates querying the metadata with conditions on the table values, for example to get all the runs with certain characteristics, like runs with High Voltage = 175 kV.

Conditions database architecture

An illustration of both ProtoDUNE’s metadata stream and the design of the conditions database. The direction of the metadata stream is shown by the arrows. Users may obtain the metadata by interacting with the conditions database APIs.

The conditions database has the following characteristics:

The master store of metadata (UConDB) database

ProtoDUNE UConDB has a collection of folders which in turn can have a collection of objects. Each object has a unique key that should be used to retrieve the information. The key can be the run number or a timestamp.

The UConDB stores all ProtoDUNE metadata in the folder protodune_conditions. Each kind of metadata has its own object.

Getting started

There are different ways to access and upload/download the contents of the ucondb.

A REST API was created by Igor V Mandrichenko. It also contains a command line interface. Instructions on how to install it and how to use it can be found in the UConDB documentation.

The UConDB URL server must be provided, for ProtoDUNE data use:

export UCONDB_SERVER_URL='https://dbdata0vm.fnal.gov:9443/protodune_ucon_prod/app'

The ProtoDUNE folder should also be provided which is: protodune_conditions

Quick Access

Alternatively to the REST API, the curl command can be used to retrieve the data:

curl -o output.file "{UCONDB_SERVER_URL}/get?folder={folder_name}&object={object_name}&tv={key}"

where {UCONDB_SERVER_URL}, {folder_name}, {object_name}, and {key} must be replaced with the desired information. An example that retrieves data from the folder named test, the object named test, using the run key 12008 is provided below.

curl -o output.file "https://dbdata0vm.fnal.gov:9443/protodune_ucon_prod/app/get?folder=test&object=test&tv=12008"

For ProtoDUNE data the folder name protodune_conditions should be used.

Remember

ProtoDUNE data is stored in the folder: protodune_conditions
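The same GET request can also be issued from Python. The sketch below is only an illustration: the helper name build_get_url is ours, and the endpoint pattern comes from the curl example above; the actual network call is left commented out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen  # only needed for the actual fetch

# UConDB server URL for ProtoDUNE data (from the export above)
UCONDB_SERVER_URL = 'https://dbdata0vm.fnal.gov:9443/protodune_ucon_prod/app'

def build_get_url(server_url, folder, obj, key):
    """Assemble the UConDB GET URL: {server}/get?folder=...&object=...&tv=..."""
    query = urlencode({'folder': folder, 'object': obj, 'tv': key})
    return f'{server_url}/get?{query}'

# Reproduces the curl example: folder 'test', object 'test', run key 12008
url = build_get_url(UCONDB_SERVER_URL, 'test', 'test', 12008)
print(url)

# Uncomment to actually fetch the object over the network:
# with urlopen(url) as response:
#     data = response.read()
```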

Upload data to the database

UConDB implements strong client authentication for all requests which modify the state of the database. A username and a password must be used to upload content to the DB. Contact Ana Paula Vizcaya or Norm Buchanan from the database group to get them.

The command line interface of the python web API can be used to upload data, as explained in its documentation webpage.

Alternatively, the curl command can also be used:

curl -T data.file --digest -u user:password -X PUT "{UCONDB_SERVER_URL}/data/{folder_name}/{object_name}/key={key}"

where {UCONDB_SERVER_URL}, {folder_name}, {object_name}, and {key} must be replaced with the desired information. The key can be a float, usually corresponding to the run number or timestamp.
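The upload can be scripted the same way. The sketch below only assembles the PUT URL from the pattern above (the helper build_put_url and the object name my_object are ours, purely for illustration); the authenticated network call is left commented out.

```python
UCONDB_SERVER_URL = 'https://dbdata0vm.fnal.gov:9443/protodune_ucon_prod/app'

def build_put_url(server_url, folder, obj, key):
    """Assemble the UConDB PUT URL: {server}/data/{folder}/{object}/key={key}"""
    return f'{server_url}/data/{folder}/{obj}/key={key}'

# 'my_object' is a hypothetical object name used only for this example
url = build_put_url(UCONDB_SERVER_URL, 'protodune_conditions', 'my_object', 12008)
print(url)

# Uncomment to upload with the requests library (digest auth, valid credentials required):
# import requests
# from requests.auth import HTTPDigestAuth
# with open('data.file', 'rb') as payload:
#     r = requests.put(url, data=payload, auth=HTTPDigestAuth('user', 'password'))
#     r.raise_for_status()
```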

Key Points

  • There are several APIs available to access the information in the conditions database. Which to use depends on your specific needs


Run Conditions Database (Condb2)

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • What is the Run Conditions Database (Condb2)?

  • How to access it?

Objectives
  • Learn how to upload and download data from the Run Conditions Database

The Run Conditions Database (Condb2)

The Run Conditions Database is a PostgreSQL relational database that records and keeps track of the conditions parameters of ProtoDUNE. It is based on the conditions database developed at FNAL.

It stores a subset of the ProtoDUNE metadata, organized in tables.

Quick look at the data

The Conditions Database web interface shows all the table names of the conditions database. There, it is also possible to plot values from certain tables, as shown in the following picture.

Example plot in conditions database web interface

As an alternative to the web interface, the curl command can be used to quickly access the data in the tables:

curl "{CONDB_SERVER_URL}/get?folder={folder_name}&t={key}"

where {CONDB_SERVER_URL}, {folder_name}, and {key} must be replaced with the desired information. The folder_name must follow the format schema.table_name (like pdunesp.test), and the key is usually a run number or a timestamp (t >= 0). ProtoDUNE tables are located in the schema pdunesp. An example that retrieves data from the table pdunesp.test for run 23300 is provided below.

curl "https://dbdata0vm.fnal.gov:9443/dune_runcon_prod/get?folder=pdunesp.test&t=23300"

Getting started with the condb2 api

There are two methods for installing the condb2 python API used to interact with the database.

  1. Follow the instructions on the condb2 webpage
    • Where the ConDB URL server for ProtoDUNE data is:
      CONDB_SERVER_URL='https://dbdata0vm.fnal.gov:9443/dune_runcon_prod'
      
  2. Install it using Spack
    • On a dunegpvm machine running Alma9, run the following commands:
       source /cvmfs/larsoft.opensciencegrid.org/spack-packages/setup-env.sh
       spack load py-condb2@2.1.4
       condb2
      

      and it should give the following output

      condb 
        create [options] <database name> <folder_name> <column>:<type> [...]
        write  [options] <database name> <folder_name> < <CSV file>
        read   [options] <database name> <folder_name>
        put    [options] <folder_name> < <CSV file>
        get    [options] <folder_name>
        tag    [options] <folder_name> <tag name>
    

How to upload data

Alert

Condb2 implements strong client authentication for all requests which modify the state of the database. A username and a password must be provided to create a table and upload content to the DB. Contact Ana Paula Vizcaya or Norm Buchanan to get them.

Depending on what information the user wants to store, there are different ways to create and fill out the tables.

Creating tables

When creating a table, the condb2 API will automatically create the following columns:

  1. channel: a channel number, used to give the table another dimension.
  2. tv: the key of the table; its data type is a float. There are different ways to use it, described in more detail below, but it usually holds run numbers, the timestamp of when the row was uploaded, or the same value for all rows.
  3. tr: the timestamp (unix time) of when the row data was uploaded; this is used to create the tags, or version pointers, of the tables.
  4. data_type: an optional key of the table. With it, it is possible to specify an extra condition, for example detector type, or data vs Monte Carlo.

The next table shows the four columns of all conditions tables, in the order in which they are created and must be filled, plus a 'user_defined' column that represents the additional columns that users can add.

channel tv tr data_type user_defined
0 1 30000 TEST other
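As a sketch of this layout in code, the following assembles the example row above as a CSV in the described column order. This is only an illustration: whether tr must appear in an uploaded CSV depends on the API, since the text notes it is filled automatically.

```python
import csv
import io

# Columns in the order described above, plus one user-defined column
header = ['channel', 'tv', 'tr', 'data_type', 'user_defined']
rows = [
    [0, 1, 30000, 'TEST', 'other'],  # mirrors the example row above
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```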

Table with run number as tv_key

In this type of table it is possible to store information concerning a run. The column tv is used to store the run number. The column channel can be left at a constant value (zero) if there are no multiple channels, or it can be used as intended, with one number per channel. The column tr is filled automatically with the timestamp (unix time) at which the row was uploaded. The column data_type is optional and can be used as an extra key, so the user can choose whether to specify a string.

The next table is an example of a table with run number as key. The Monte Carlo data has just one channel, so if the data is retrieved for those run numbers it will return just one value. Since the raw data has multiple channels, retrieving data for run 3 will return 3 rows.

channel tv tr data_type user_defined
0 1 1712184741 mc_data other
0 2 1712184742 mc_data other
0 3 1712184743 raw_data other
1 3 1712184744 raw_data other
2 3 1712184745 raw_data other

Run numbers are interpolated

It is important to note that when using the run number as tv_key to extract data, the function interpolates the tv_key values. So, if a run does not have any row, the 'get' function will return the row with the run number closest to the value given.
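As a toy model of this rule (this is not the condb2 implementation, only an illustration of returning the closest run when the requested one has no row):

```python
def closest_row(rows, run):
    """Return the row whose tv (run number) is closest to the requested run.

    rows: list of dicts, each with a 'tv' key; ties resolve to the lower run.
    """
    return min(rows, key=lambda row: (abs(row['tv'] - run), row['tv']))

rows = [
    {'tv': 1, 'user_defined': 'a'},
    {'tv': 2, 'user_defined': 'b'},
    {'tv': 6, 'user_defined': 'c'},
]

# Run 5 has no row of its own, so the closest run, 6, is returned
print(closest_row(rows, 5))  # {'tv': 6, 'user_defined': 'c'}
```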

Table with timestamp as tv_key

In this type of table it is possible to store information that changes with time, for example slow control parameters.

The column tv is used to store the timestamp (unix time) at which the data was taken. The column channel can be left at a constant value if there are no multiple channels, or it can be used as intended, with one number per channel. The column tr is filled automatically with the timestamp (unix time) at which the row was uploaded. The column data_type is optional, so the user can choose whether to specify a string.

The next table shows an example with timestamp as tv_key.

channel tv tr data_type user_defined
0 1712184001 1712184741 mc_data other
0 1712184002 1712184742 mc_data other
0 1712184003 1712184743 raw_data other
1 1712184003 1712184744 raw_data other
2 1712184003 1712184745 raw_data other
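When preparing or reading such rows, the unix timestamps in tv and tr can be converted to readable dates with Python's standard datetime module; a quick sketch using a tv value from the table above:

```python
from datetime import datetime, timezone

tv = 1712184001  # timestamp key taken from the example table above

# Convert the unix timestamp to a readable UTC date, and back
when = datetime.fromtimestamp(tv, tz=timezone.utc)
print(when.isoformat())  # 2024-04-03T22:40:01+00:00

assert int(when.timestamp()) == tv  # round trip
```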

Example of how to create a table using the condb2 python API

The following is an example of how to create a table using the condb2 API:

import subprocess

# Table information ('###' placeholders must be replaced with real values)
table_name = 'pdunesp.test_prueba'  # schema.table_name
host = '###'          # database host
port = '###'          # database port
user = '###'          # database username
passw = '###'         # database password
r_permission = '###'  # DB users to grant read permissions to
w_permission = '###'  # DB users to grant write permissions to
database = '###'

payloads = 'start_time:float run_type:text extra_condition:hstore'
com = (f'condb create -h {host} -p {port} -U {user} -w {passw} -s '
       f'-R {r_permission} -W {w_permission} {database} {table_name} {payloads}')

try:
    subprocess.run(com.split(), check=True)
except subprocess.CalledProcessError:
    print('something did not work with the creation of the table')

Uploading data to the table

Now it is time to upload data to the previously created table.

The following examples use the condb2 API.

Using the command line interface

condb2 put [options] <folder_name> < <CSV file>

  Options:
      -s <server URL>             the CONDB_SERVER_URL environment variable can be used instead
      -U <username>
      -w <password>
      -d <data type>

Using the web access python API

In the following example, it is possible to fill out all the columns of the table or to specify which ones to fill:

from condb2 import ConDBClient

client = ConDBClient('https://dbdata0vm.fnal.gov:9443/dune_runcon_prod')
folder = 'pdune.ExampleTable'

data = [0, 18000, "test"]                    # values for channel, tv, user_defined
columns = ["channel", "tv", "user_defined"]  # specify columns to fill, or leave blank for all
client.put_data(folder, data, columns, data_type="pdune_hd")  # data_type can be used as another key in the table

Tag a table

It is possible to tag a table to create different versions. The API will always return values from the newest version unless otherwise specified. The tag is related to the tr column, i.e. the time when the data was uploaded. It is also possible to leave comments when creating a new table version.

The following is an example of how to create a new version (v1.001) of the table "ExampleTable" in the namespace "pdune" using the condb2 web access python API.

from condb2 import ConDBClient

client = ConDBClient('https://dbdata0vm.fnal.gov:9443/dune_runcon_prod')
folder = 'pdune.ExampleTable'

client.tag_state(folder, tag='v1.001', tr=None, copy_from=None, override=False)

It is also possible to look at the tags that a table already has. For that, we need to use the direct access python API. In general, it is better to use the web access python API; more information can be found on the condb2 webpage. The following is an example of how to look at the tags.

from condb2 import ConDB

def connect():
    # '###' placeholders must be replaced with real values
    host    = '###'  # database host
    port    = '###'  # database port
    db_name = '###'  # database name
    user    = '###'  # database username
    passw   = '###'  # database password
    connstr = f'host={host} port={port} dbname={db_name} user={user} password={passw}'
    db = ConDB(connstr=connstr)
    return db.openFolder('pdune.ExampleTable')

folder = connect()

tags = folder.tags()
for tag in tags:
    print(tag, 'This is the tag')

More in depth look at the data

It is possible to examine the data by applying conditions to the table columns to filter results.

from condb2 import ConDBClient

client = ConDBClient('https://dbdata0vm.fnal.gov:9443/dune_runcon_prod')
folder = 'pdune.ExampleTable'

# get_data example - get data from a given run
run = 18000
columns, data = client.get_data(folder, t0=run)
print("columns:", ','.join(columns))
for line in data:
    print(line)

# search_data example - Search data, or runs that comply with the following conditions
con = [("run_type","=",'PROD'),("buffer",">=",0)] # Example conditions on the data
col, data1 = client.search_data(folder, conditions=con)
print("columns:", ','.join(col))
for line in data1:
    print(line)

Key Points

  • There are several APIs available to access the information in the conditions database. Which to use depends on your specific needs


Run Conditions Table

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • What is the Run Conditions Table?

  • What data is stored in this table?

  • How to access the data?

Objectives
  • Learn how to access and stream the data from tables in the conditions database by using the run conditions table as an example

The Run Conditions Table

The run conditions table is stored at the ProtoDUNE conditions database. It contains the conditions for each run, and the data can be accessed through various methods to suit different user needs.

Table information

As of now, the table contains the following metadata:

  Column                Unit    Example                                          Comment
  tv (run)              N/A     25034                                            Run number
  tr                    Unix    1713497099.738875                                Used for versioning
  data_type             N/A     np02_coldbox or np04_hd
  upload_time           Unix    1713497099.7388604
  start_time            Unix    1713268519.0
  stop_time             Unix    1713269109.0
  run_type              N/A     PROD                                             data_stream
  detector_id           N/A     np02_coldbox, np04_hd or np02_hermes_WIB_conf
  software_version      N/A     fddaq-v4.4.0-rc3-a9

  data_quality          N/A     good or bad                                      offline good runs
  ac_couple             N/A     dc_coupling or ac_coupling
  baseline              N/A     2                                                0 (900 mV), 1 (200 mV), 2 (200 mV collection, 900 mV induction)
  buffering             N/A     0                                                0 (no buffer), 1 (se buffer), 2 (sedc buffer)
  enabled               N/A     True                                             True if FEMB should be configured and read out by WIB
  gain                  mV/fC   14.0                                             Options: 14, 25, 7.8, 4.7 mV/fC
  gain_match            N/A     True                                             Enable pulser DAC gain matching
  leak                  pA      500.0
  leak_10x              N/A     False                                            Multiply leak current by 10 if true
  leak_f                pA      None                                             final leak value
  peak_time             us      2.0                                              Channel peak time selector
  pulse_dac             N/A     0                                                Pulser DAC setting [0-63]

  strobe_delay          N/A     255                                              64 MHz periods to skip after 2 MHz edge for strobe (pulser offset 0-255)
  strobe_length         N/A     255                                              Length of strobe in 64 MHz periods (pulser length 0-255)
  strobe_skip           N/A     255                                              2 MHz periods to skip after strobe (pulser period 0-255)
  test_cap              N/A     False                                            Enable the test capacitor
  adc_test_pattern      N/A     False                                            True if the COLDADC test pattern should be enabled
  cold                  N/A     False                                            True if the front end electronics are COLD (77 K)
  detector_type         N/A     wib_default                                      Options: WIB default, upper APA, lower APA, CRP
  pulser                N/A     False                                            True if the calibration pulser should be enabled

  beam_momentum         GeV/c   +5                                               indirectly calculated using magnet MBPL.022.692 current, and rounded up
  beam_polarity         N/A     positive or negative                             looking at magnet MBPL.022.692 current
  detector_hv           V       175000                                           from sensorID: 47894774153498, which is NP04_DCS_01:Heinz_V
  wire_bias_g
  wire_bias_u
  wire_bias_x
  lar_purity
  lar_top_temp_mean
  lar_bottom_temp_mean

How to access the data

Curl command

There are several methods available for users to access the data of the run conditions table. The most straightforward way is to access the data with a curl command on the terminal or on the web.

What you need to know

  1. The database url. For our example: https://dbdata0vm.fnal.gov:9443/dune_runcon_prod
  2. The table name. For our example: pdunesp.run_conditionstest
curl "https://dbdata0vm.fnal.gov:9443/dune_runcon_prod/get?folder=pdunesp.run_conditionstest&t=28650"

The above example queries the conditions of run 28650, or the closest run to 28650.

Remember

  1. If the given run does not exist, the previous query will return the closest run.
  2. The column representing run number is called tv.

To get the conditions of a range of runs, for example [28650, 28655], use the following query:

curl "https://dbdata0vm.fnal.gov:9443/dune_runcon_prod/get?folder=pdunesp.run_conditionstest&t0=28650&t1=28655"
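The same range query can be assembled in Python (the helper name build_range_url is ours; the t0/t1 parameters come from the curl example above):

```python
from urllib.parse import urlencode

CONDB_SERVER_URL = 'https://dbdata0vm.fnal.gov:9443/dune_runcon_prod'

def build_range_url(server_url, folder, run0, run1):
    """Assemble a conditions GET URL for a range of runs [run0, run1]."""
    query = urlencode({'folder': folder, 't0': run0, 't1': run1})
    return f'{server_url}/get?{query}'

url = build_range_url(CONDB_SERVER_URL, 'pdunesp.run_conditionstest', 28650, 28655)
print(url)
```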

Python API (Condb2)

Detailed documentation about the conditions database python REST API can be found in the condb2 webpage.

To use the API, it is necessary to install it, as explained on the webpage. Once that is done, it is recommended to use the condb2 web access python API instead of the direct access one. For the run conditions table, the following information must be provided:

  1. The database url: https://dbdata0vm.fnal.gov:9443/dune_runcon_prod
  2. The table/folder name: pdunesp.run_conditionstest

The web access python API includes functions for retrieving and uploading data, as well as creating tables and table versions. Detailed documentation on how to use these functions can be found on the web access python API webpage. To use the functions get_data or search_data it is not necessary to provide a username or password; nevertheless, they are required to use the function put_data. To get a valid username and password, contact Ana Paula Vizcaya or Norm Buchanan from the database team.

C++ interface and art service

The run conditions table uses the C++ interface and art service of the conditions database mentioned on the conditions database of ProtoDUNE webpage.

It is provided by the dunecalib package; the GitHub repository is dunecalib. A presentation with information on how to use the run conditions C++ interface and service can be found here.

C++ interface

To start using run conditions parameters, include the following header files in your C++ file:

#include "dunecalib/ConInt/RunConditionsProtoDUNE.h"
#include "nuevdb/IFDatabase/Table.h"

and set up the table to load:

condb::RunConditionsProtoDUNE* runCond = new condb::RunConditionsProtoDUNE();
runCond->SetTableURL("https://dbdata0vm.fnal.gov:9443/dune_runcon_prod/");
runCond->SetTableName("pdunesp.run_conditionstest");
runCond->SetVerbosity(0); // How much output, (0,3) - (none, more)
runCond->SetRunNumber1(0); //Change if a range of runs is desired
runCond->UpdateRN(25034); //Run Number
//runCond->SetTag(gDBTag); // If database has more than one version
runCond->LoadConditionsT();

Now you are ready to load and use the conditions parameters! The following is just one example to output some conditions on the terminal:

condb::RunCond_t rc = runCond->GetRunConditions(run);
std::cout << "\tStart time = " << rc.start_time
          << "\n\tdata type = " << rc.data_type
          << "\n\trun number = " << rc.run_number
          << "\n\tupload time = " << rc.upload_t
          << "\n\tsoftware version = " << rc.software_version
          << "\n\tstop_time = " << rc.stop_time
          << "\n\tbuffer = " << rc.buffer
          << "\n\tac_couple = " << rc.ac_couple
          << "\n\trun type = " << rc.run_type << std::endl;

A complete example script can be found in dunecalib/dunecalib/ConInt/getRunConditionsPDUNE.cc, and it can be run as follows:

getRunConditionsPDUNE -r 25016

art service

You can also opt to use an art service. A presentation with some information on how to use it can be found here. The source scripts are located in the dunecalib git repository: dunecalib/dunecalib/ConIntServices.

Alert

This section needs more details

To set up the art service in your fhicl file, include the header file:

#include "runconditions_pdune.fcl"

And then in services:

services:
{
   servicerun: @local::pdune_runconditions
}

That should run the run conditions table service!

The fcl file with the variables that can change, such as the run number, is dunecalib/dunecalib/ConIntServices/runconditions_pdune.fcl. The URL and name of the run conditions table are also specified there:

BEGIN_PROLOG

pdune_runconditions :
{
  service_provider: "RunConditionsServicePDUNE"

  TableURL: "https://dbdata0vm.fnal.gov:9443/dune_runcon_prod/" 
  TableName: "pdunesp.test"
  RunNumber: 23302.0
  RunNumber1: 0
  DBTag:     "v1.1"
  Verbosity: 1
}

END_PROLOG

Metacat filter

The run conditions table also works as a filter in ProtoDUNE’s metadata catalog (Metacat) queries. With it, you can search for run files using parameters from the run conditions table as filters.

What you need to know

The run conditions table filter name, for metacat, is dune_runshistdb.

To use the filter in metacat to query files, follow the next steps.

  1. Go to Metacat web interface
  2. Click on the query option at the top of the webpage
  3. Enter the query and run

The following is an example of a Metacat query to look for files of run 25016 where the run_type is PROD. This last condition looks at the data from the run conditions table.

filter dune_runshistdb() (files from hd-protodune:hd-protodune_25016) where runs_history.run_type = PROD

The first part contains the filter name dune_runshistdb. The part in parentheses specifies the dataset of files to search: (files from hd-protodune:hd-protodune_25016). The final part contains the filter condition using the run conditions parameter: runs_history.run_type = PROD.

Key Points

  • There are several APIs available to access the information in the conditions database. Which to use depends on your specific needs.