Output format

larnd-sim can generate a realistic datastream saved in the HDF5 file format. HDF5 is a widely used, high-performance file format; see the HDF5 website for more details. Internally, larnd-sim uses h5py, an open-source Cython-based Python package, to manage access to the output file, and we highly recommend it if you are getting started with HDF5.

Charge data

For the charge simulation output, larnd-sim uses the same datasets as generated by larpix-control, so that downstream analysis code needs only minimal modification to run on either data or simulation.

You can read more about this format here.

In addition to the datasets defined in the above link, larnd-sim adds true particle association data within the mc_packets_assn dataset. This dataset is a 2-dimensional array with the same first dimension as the packets dataset and a second dimension corresponding to the edep-sim track segments that contributed to the ADC value. The dataset has two fields: track_ids, the index into the tracks dataset for each entry, and fraction, the fraction of the ADC value that can be attributed to that edep-sim segment. Because an arbitrary number of track segments can contribute to each trigger, this dataset is a “ragged” array in which null entries are marked by track_ids == -1.
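As an illustration of how such a ragged association array can be unpacked, here is a minimal sketch on synthetic data. Only the field names track_ids and fraction and the -1 null marker come from the description above; the compound dtype, the width of three contributors per packet, and the values are assumptions made for the example, so check the dtype of your own file before relying on it.

```python
import numpy as np

# Hypothetical stand-in for f['mc_packets_assn'][:]: 3 packets, each with room
# for up to 3 contributing segments (the real width may differ).
assn_dtype = np.dtype([('track_ids', 'i8', (3,)), ('fraction', 'f8', (3,))])
mc_packets_assn = np.array(
    [
        ([4, 7, -1], [0.75, 0.25, 0.0]),
        ([2, -1, -1], [1.0, 0.0, 0.0]),
        ([-1, -1, -1], [0.0, 0.0, 0.0]),  # packet with no true contribution
    ],
    dtype=assn_dtype,
)

track_ids = mc_packets_assn['track_ids']  # shape (n_packets, n_max_contributors)
fraction = mc_packets_assn['fraction']

# Null entries are flagged by track_ids == -1, so mask them out first
valid = track_ids != -1
n_contributors = valid.sum(axis=1)

# Segment with the largest fraction for each packet (-1 if there is none)
masked_fraction = np.where(valid, fraction, -np.inf)
best = masked_fraction.argmax(axis=1)
dominant_track = np.where(
    n_contributors > 0, track_ids[np.arange(len(track_ids)), best], -1
)
```

Masking before any reduction matters here: without it, the -1 placeholders would be treated as real segment indices and silently pick up the last row of the tracks dataset.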

Light data

For the light simulation output, larnd-sim uses a data structure analogous to the ADC64 format generated by the DUNE ND-LAr light system readout electronics. The datasets are summarized below:

  • light_trig, shape (n_triggers,): metadata associated with each light system self-trigger

    • op_channel, shape (n_optical_channels_per_trig,): 32-bit integer indicating the optical channel ids included in this trigger

    • ts_s: 64-bit double indicating the global timestamp of the trigger in seconds

    • ts_sync: 64-bit unsigned integer indicating the larpix timestamp of the trigger

  • light_wvfm, shape (n_triggers, n_optical_channels_per_trig, n_adc_samples): the ADC samples of each light system self-trigger

  • light_dat, shape (n_edepsim_segments, n_optical_channels): the true number of photoelectrons and first photon arrival time on each optical channel generated by each edep-sim track segment

    • n_photons_det: number of photoelectrons generated by the track segment

    • t0_det: arrival time of first photon from the track segment

  • light_wvfm_mc_assn, shape (n_triggers, n_optical_channels_per_trig, n_adc_samples) (only generated if full MC truth propagation is enabled): the true contribution of each edep-sim track segment to each ADC sample in the light waveform data

    • track_ids, shape (n_max_tracks,): index of true edep-sim segment contributing to ADC value

    • pe_current, shape (n_max_tracks,): true equivalent photocurrent generated by the edep-sim segment
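To make the light truth layout listed above more concrete, the following sketch builds a small synthetic array with the light_dat shape and fields and reduces it over the segment axis. The field dtypes and the numbers are assumptions for illustration; only the shape convention (n_edepsim_segments, n_optical_channels) and the field names n_photons_det and t0_det are taken from the list.

```python
import numpy as np

# Hypothetical stand-in for f['light_dat'][:]: 3 segments, 2 optical channels.
# The float32 field types are assumed, not taken from larnd-sim.
light_dtype = np.dtype([('n_photons_det', 'f4'), ('t0_det', 'f4')])
light_dat = np.zeros((3, 2), dtype=light_dtype)
light_dat['n_photons_det'] = [[10.0, 0.0], [5.0, 2.0], [0.0, 0.0]]
light_dat['t0_det'] = [[1.5, 0.0], [0.5, 3.0], [0.0, 0.0]]

# Total true photoelectrons on each channel, summed over all segments
pe_per_channel = light_dat['n_photons_det'].sum(axis=0)

# Earliest first-photon arrival per channel, counting only segments that
# actually produced photoelectrons on that channel
seen = light_dat['n_photons_det'] > 0
first_arrival = np.where(seen, light_dat['t0_det'], np.inf).min(axis=0)
```

A channel that no segment illuminates ends up with an infinite first_arrival in this sketch, which keeps it distinguishable from a genuine arrival at time zero.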

Charge/light matching

Because there is no common trigger for the charge and light digitizations, association between the two datastreams must be done using the timestamp. A short example using numpy is provided here:

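import h5py
import numpy as np

# '<larnd-sim file>' is a placeholder: substitute the path to your output file
f = h5py.File('<larnd-sim file>', 'r')

# Select the trigger packets (packet_type == 7) in the charge datastream
charge_packets = f['packets'][:]
charge_trigger_index = np.flatnonzero(charge_packets['packet_type'] == 7)
charge_trigger_timestamp = charge_packets['timestamp'][charge_trigger_index]

light_trigger_timestamp = f['light_trig'][:]['ts_sync']

# return_indices=True is needed to also get the matching indices into each input
timestamp, trigger_index, light_index = np.intersect1d(
    charge_trigger_timestamp, light_trigger_timestamp, return_indices=True)
# Map indices in the trigger-only array back to rows of the full packets dataset
packet_index = charge_trigger_index[trigger_index]

# Packets and waveforms between the first and second matched triggers
event0_packets = charge_packets[packet_index[0]:packet_index[1]]
event0_waveforms = f['light_wvfm'][light_index[0]:light_index[1]]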
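The timestamp matching can also be tried without an output file. The sketch below runs the same logic on a small synthetic packet stream; the two-field packet dtype, the timestamps, and the use of packet type 0 for ordinary data packets are assumptions for the example. It also assumes that matching triggers carry exactly equal, unique timestamps in both streams.

```python
import numpy as np

# Hypothetical charge stream: type 7 marks trigger packets, type 0 data packets
packet_dtype = np.dtype([('packet_type', 'u1'), ('timestamp', 'u8')])
charge_packets = np.array(
    [(7, 100), (0, 105), (0, 110), (7, 200), (0, 204), (7, 300), (0, 301)],
    dtype=packet_dtype,
)
# Hypothetical light trigger timestamps (the ts_sync field of light_trig)
light_ts_sync = np.array([100, 300, 400], dtype='u8')

# Rows of the full packet array that hold trigger packets
trigger_rows = np.flatnonzero(charge_packets['packet_type'] == 7)
trigger_ts = charge_packets['timestamp'][trigger_rows]

# Timestamps present in both streams, with their positions in each input
matched_ts, trig_idx, light_idx = np.intersect1d(
    trigger_ts, light_ts_sync, return_indices=True
)

# trig_idx indexes the trigger-only array, so convert it to packet rows
packet_rows = trigger_rows[trig_idx]
```

Here the charge trigger at timestamp 200 and the light trigger at 400 have no partner and drop out of the match, which is the behaviour to expect whenever one system triggers without the other.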