Output

Todo

  • optical output scheme

  • track output scheme

  • isotope, energy filtering

remage supports all output formats supported by G4AnalysisManager (HDF5, ROOT, CSV, XML), plus LH5. The file type to use is selected by the specified output file name (.h5, .root, .csv, .xml, .lh5).

Note

LH5, HDF5 and ROOT output formats require Geant4 to be explicitly compiled with support for the HDF5 or ROOT libraries, respectively.

The contents of the output files is determined by output schemes. An output scheme does not only contain functionality for the actual output description, but also might have parts of Geant4’s stacking action functionality. Output schemes, in general, are remage’s way to implement pluggable event selection, persistency and track stacking.

Selection of output schemes

Adding a sensitive detector of any type (see Registering sensitive detectors) will add the corresponding main output scheme to the list of active output schemes.

Additional output schemes might be used for filtering output. Optional output schemes can be enabled with the /RMG/Output/ActivateOutputScheme macro command:

/RMG/Output/ActivateOutputScheme [name].

Note

Adding output schemes with C++ code is possible using the RMGUserInit system of remage (access it with auto user_init = RMGManager::Instance()->GetUserInit():

  • user_init->AddOutputScheme<T>(...); adds and enables the output scheme new T(...) on each worker thread,

  • user_init->AddOptionalOutputScheme<T>("name", ...); adds a name-tag to an output scheme, that will not be enabled right away, and

  • user_init->ActivateOptionalOutputScheme("name") enables such a registered output scheme.

Output file types

The selection of the output file type depends on the file extension of the specified output file. Possible output file types include lh5, hdf5, or root—or any other file format that G4AnalysisManager can write; but these are not tested regularly. In general, we recommend choosing the LH5 format (more below).

Note

remage will not produce an output file, if no output file name is provided by the user. Specify -o none to acknowledge the warning that is emitted when output schemes are registered, but no file will be created.

In case a multithreaded simulation is requested with the -t or --threads option (see Running simulations), the output file names will be appended with the thread number. remage will produce one output file per thread appending _t$id, where $id is the thread number, before the file extension.

For example running remage with:

remage -o OUTPUT.lh5 -t 8

will result in output files OUTPUT_t0.lh5,..., OUTPUT_t7.lh5.

Geant4 automatically merges these files into a single one at the end of a run for all supported formats, except for HDF5. For the LH5 output format, remage can merge the output files before saving to disk. This feature can be enabled with the --merge-output-files (or -m) command line option.

Warning

Merging involves some additional I/O operations so for some simulations may increase run time! remage will report the amount of time spent merging the files.

Physical units

In LH5 output files, units are attached as attributes to the table columns, as specified in the LH5 spec.

For any other output format (HDF5, ROOT, etc), remage is not able to attach metadata to columns. The ntuple columns created by remage contain physical units in their names, encoded as in the legend-metadata spec (i.e. adding _in_<units> at the end of the name), where the units are expressed in the typical physical unit symbols. Unfortunately, column names cannot contain forward slashes, so units like m/s cannot be represent directly. Instead, a backslash (\) is used to encode the division symbol (for example: velocity_in_m\s).

Built-in output schemes

remage currently implements schemes to read out Germanium, Scintillator and Optical detector types.

Germanium: Germanium (HPGe) detectors

The Germanium output scheme handles the output from germanium (HPGe) detectors, but would also work for other solid state detectors (calorimeters).

HPGes have sensitivity to the topology of event interactions via the pulse shape and they also have a different response close to the detector electrodes. So when simulating HPGe’s it is advisable to save the information of the steps of particles within the detector. Then “post-processing” software such as reboost can apply the detector response model without repeating the computationally intensive simulation.

By default this output scheme writes out all steps in the registered sensitive HPGe detectors. The following properties of each hit are recorded (by default):

  • time: The global time of the hit,

  • particle: the PDG code of the particle,

  • xloc, yloc, zloc: the global position,

  • evtid: the index of the Geant4 event,

  • edep: the deposited energy,

  • dist_to_surf: the distance of the hit from the detector surface.

By default all floating point fields are saved with 64-bit (double) precision. The precision of the energy and or position / distance fields can be reduced to 32-bit with the macro commands /RMG/Output/Germanium/StoreSinglePrecisionPosition and /RMG/Output/Germanium/StoreSinglePrecisionEnergy.

It is possible to also store the track id (see link for details) and parent track id of each step with /RMG/Output/Germanium/StoreTrackID.

As mentioned earlier output schemes also provide a mechanism for filtering events. One useful option is to only write out events in which energy was deposited in a germanium detector. This is used since the other detector systems (liquid argon, water Cherenkov etc.) often act a “vetos’s”, so we are not interested in the energy deposited or steps if an event in the germanium did not occur. The macro commands /RMG/Output/Germanium/AddDetectorForEdepThreshold, /RMG/Output/Germanium/EdepCutLow and /RMG/Output/Germanium/EdepCutHigh:

/RMG/Output/Germanium/AddDetectorForEdepThreshold {UID}
/RMG/Output/Germanium/EdepCutLow {ELOW}
/RMG/Output/Germanium/EdepCutHigh {EHIGH}

implement this functionality, for every event the total energy deposited is computed. This is based on summing the energy deposited in each {UID} added, or across all registered sensitive Germanium detectors (if this macro command is not used). The event is then discarded if the energy is less than or equal to ELOW or more than EHIGH.

Note

This mechanism will remove the data from the event across all output schemes, not only the Germanium! However, OutputSchemes with their StoreAlways() function returning true, like the RMGTrackOutputScheme or the RMGVertexOutputScheme will always store their output.

Similarly, for simulations involving optical photons it is possible to discard all optical photon tracks before simulating them if no energy was deposited in germanium. This can be enabled with /RMG/Output/Germanium/DiscardPhotonsIfNoGermaniumEdep.

By default, the position saved for each step is the average of the pre and post-step point. This can be controlled with /RMG/Output/Germanium/StepPositionMode, which can be set to Average (the default), Both (saves) also the pre and post steps, or Pre/Post.

Important

For gammas the position saved is always that of the post-step, since all gamma interactions are discrete.

Typically only steps where some energy was deposited are written out to disk, to control this behaviour there is /RMG/Output/Germanium/DiscardZeroEnergyHits.

Finally, it is possible to “pre-cluster” the steps, this is used to reduce the amount of data written out to disk by combining steps very close together. Since the surface region of a HPGe detector has different properties to the bulk this clustering can be performed differently for surface and bulk hits (see data-reduction for more details).

Scintillator: calorimeters

This output scheme records stepping data in scintillating materials (e.g. liquid argon), following a calometric approach. This scheme is useful for applications where an optical detector response is applied on the simulated particle interactions. Detection of optical photons is handled by the Optical output scheme (see later). Most functionality is similar to the Germanium output scheme with a few exceptions:

  • Unlike for germanium detectors the distance to the detector surface is not calculated,

  • Stacking of optical tracks is not implemented,

  • The velocity of the particles can be saved using the /RMG/Output/Scintillator/StoreParticleVelocities command.

Optical: optical photon detectors

Todo

coming soon…

Single- versus multi-detector table layout

remage will store sensitive volume hits in separate output tables by default, one per detector. While this layout is useful if analyzing data from each detector independently, sometimes having all hits stored in the same output table can be more beneficial. The multi-table layout can be disabled by setting /RMG/Output/NtuplePerDetector to false. In this scenario, remage will organize hits by detector type in separate tables (a table named germanium for the Germanium detectors, scintillator for Scintillators, etc.).

Table naming

By default, remage will name output tables by their internal unique identifier (UID), prefixed with det (i.e. the table for a detector with UID 39 will be saved as det039). These tables are located in a directory named stp (as for “stepping data”). For example, an output LH5 file will look like:

/
└── stp · struct{det1104005,det1105600,...}
    ├── det1104005 · table{...}
    │   └── ...
    ├── det1105600 · table{...}
    │   └── ...
    └── ...

Note

The directory name can be customized with the /RMG/Output/NtupleDirectory command.

Non-stepping-data (auxiliary) tablers are stored in the same directory, due to limitations of the Geant4 analysis manager, except for the LH5 output (see later).

remage offers the possibility to name stepping data tables after their Geant4 logical volume name, instead of the UID. This feature can be enabled with the /RMG/Output/NtupleUseVolumeName command. In this case, for example, an output LH5 file will look like:

/
└── stp · struct{B00000C,B00000D,...}
    ├── B00000C · table{...}
    │   └── ...
    ├── B00000D · table{...}
    │   └── ...
    └── ...

Note

If the LH5 output format is selected, detector tables can be still accessed by UID through the symbolic stored in the /stp/__by_uid__ group.

/
└── stp · struct{B00000C,B00000D,...}
    ├── __by_uid__ · struct{det1104005,det1105600,...}
    │   ├── det1104005 -> /stp/B00000C
    │   ├── det1105600 -> /stp/B00000D
    │   └── ...
    ├── B00000C · table{...}
    │   └── ...
    ├── B00000D · table{...}
    │   └── ...
    └── ...

Data reduction methods

Often Geant4 takes steps much shorter than those that are meaningful in a HPGe or a scintillation detector. For example the typical dimension of charge clouds produced by interactions in germanium are 1-2 mm, so we are not sensitive to tracking at the micrometer level. To reduce the file size while retaining the useful information for computing observables of interest we have implemented some “pre-clustering” routines. These routines combine together steps that are very close together.

Note

The aim of this (pre)-clustering is only to make a minimal reduction of information which cannot be useful! Further, more aggressive clustering may be needed for some applications.

In order to have an efficient algorithm for pre-clustering we take use a “within-track” approach, this clusters only steps in the same G4Track, with some exceptions for very low energy tracks. In this way we only have to iterate through the steps in each event once. This also means the rows in our output are still interpretable with steps in the detector (just with a larger step length). The clustering is handled by the function RMGOutputTools::pre_cluster_hits(). This takes in the pointer to the original RMGDetectorHitsCollection returning a pointer to a new collection of clustered hits.

Note

  • The function returns a shared pointer to the hit collection, for some applications it may be necessary to extract also an unmanaged pointer, for example to make this collection look identical to that obtained directly from Geant4.

  • This design makes it easy to include additional clustering algorithms, a similar function just needs to be written.

Pre-clustering is enabled by default for the Scintillator and Germanium output schemes, it can be disabled with the command /RMG/Output/Germanium/Cluster/PreClusterOutputs and similarly for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/PreClusterOutputs.

This clustering works by first organizing the hits by track id (the index of the G4Track within the event). Some processes in Geant4 produce a large number of secondary tracks due to atomic de-excitation, these tracks typically have a very low energy and range (however they are still produced since production cuts are not applied for most gamma interactions). Thus they are not expected to impact observables of interest. In many cases, after pre-clustering of high energy electrons, these tracks could form the majority of the output.

We implemented the possibility to merge these tracks prior to pre-clustering which can be enabled with /RMG/Output/Germanium/Cluster/CombineLowEnergyElectronTracks and similarly for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/CombineLowEnergyElectronTracks.

Warning

This means in some cases there are steps in the output that are the combination of steps in different Geant4 tracks.

This command will select electron tracks with energy lower than a threshold, which is by default 10 keV, but can be changed with /RMG/Output/Germanium/Cluster/ElectronTrackEnergyThreshold.

and similar for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/ElectronTrackEnergyThreshold. For each track, we search for tracks which have a first pre-step point within the cluster radius of the first pre-step point of the low energy track. The low energy track is then merged with the neighbour track with the highest energy. In addition, Geant4 will sometimes associated some deposited energy with gamma tracks (due to atomic binding energy), optionally the user can request instead redistributing this energy to the secondary electron tracks with /RMG/Output/Germanium/Cluster/RedistributeGammaEnergy.

this then means the gamma tracks would not have energy deposits and do not need to be written out in the output file (unless this is explicitly requested). Or similarly for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/RedistributeGammaEnergy.

After these two pre-processing steps the pre-clustering proceeds by looping through the steps in each track. For each step the distance to the first step in the current cluster is calculated, if this distance is less than the user defined distance, and the time difference is less than the time threshold, the step is added to the current cluster.

The distance/time thresholds used for pre-clustering can be set with the commands /RMG/Output/Germanium/Cluster/PreClusterDistance and /RMG/Output/Germanium/Cluster/PreClusterTimeThreshold, and similar for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/PreClusterDistance and /RMG/Output/Scintillator/Cluster/PreClusterTimeThreshold

For Germanium detectors, where the surface region has substantially different properties to the bulk, we give the possibility to cluster with a different threshold for the surface region of the detector. This is by default the region within 2 mm of the detector surface but can be changed with /RMG/Output/Germanium/Cluster/SurfaceThickness. Then, a threshold can be set specifically for this region with /RMG/Output/Germanium/Cluster/PreClusterDistanceSurface. This will apply this threshold for any step where the distance to surface is less than the surface thickness. With this option a new cluster will also be formed if a step moves from the surface to bulk region of the germanium (or vice-versa).

Note

By default pre-clustering is performed for both the Germanium and Scintillator output schemes with 50 \(\mu\)m distance for Germanium. By default clustering is not applied to the surface for Germanium (within the surface thickness set by default as 2 mm). For the Scintillator output scheme we use 500 \(\mu\) m cluster distance by default. For both outputs a 10\(\mu\) m time threshold is used by default.

These options provide a sophisticated mechanism for handling the surface of HPGe detectors!

For each cluster, we then compute an “effective” step:

  • the time, pre-step position, distance to surface, velocity is taken from the first step.

  • the post-step position, distance are evalauted from the last step

  • while the energy deposit is summed over all steps.

  • the average of the pre-step position and post-step position is computed.

All other fields are constant within a track and are taken from the first step.

Note

In this way the output still represents a step, just with a longer effective step length.

LH5 output

LH5 (LEGEND-HDF5) is a simple, open-source HDF5-based data format specification initially developed by the LEGEND collaboration. Good support for reading and writing LH5 files is available in Python (through legend-pydataobj) and in Julia (through LegendHDF5IO.jl). Alternatively, since the data is stored in HDF5, it can be read by any other HDF5 I/O tool. Given the advantages of the LH5 format over the others, remage adopts it as primary output format and recommends it to all users.

It is possible to directly write a LH5 file from remage, to facilitate reading output ntuples as a LH5 Table. To use this feature, simply specify an output file with a .lh5 extension, and remage will perform the file conversion automatically.

Note

If the LH5 output is selected, remage performs some post-processing steps at the end of a simulation run such as re-organizing data into more meaningful structures or adding useful information.

Reshaping output tables

For the LH5 format and Germanium or Scintillator outputs we implemented a “reshaping” of the output tables. This groups together rows in the same output table that have the same simulated Geant4 evtid and also with times with the user defined time window (more later). In this way the resulting rows of the output table represent physical interactions in a sensitive volume occurring in a window compatible with the time resolution of the detector. In the following, we will often refer to these as “hits”.

This means the columns of the output table are converted from LH5 Array’s objects to LH5 VectorOfVectors’s. However, this grouping is lossless.

Without reshaping, the output table is flat: each column (evtid, edep, …) is a one-dimensional array:

[{evtid: 0, particle: 11, edep: 20.1, time: 0.158, xloc: -0.0222, ...},
 {evtid: 0, particle: 11, edep: 50.1, time: 0.251, xloc: -0.0178, ...},
 {evtid: 0, particle: 11, edep: 74.9, time: 0.522, xloc: -0.0767, ...},
 {evtid: 2, particle: 11, edep: 0.431, time: 0.344, xloc: -0.0335, ...},
 {evtid: 2, particle: 11, edep: 109, time: 0.423, xloc: -0.0542, ...},
 {evtid: 3, particle: 11, edep: 70.2, time: 0.0545, xloc: -0.0128, ...},
 ...,
 {evtid: 44, particle: 11, edep: 88.4, time: 0.484, xloc: -0.0313, ...},
 {evtid: 49, particle: 11, edep: 33.1, time: 0.848, xloc: -0.126, ...},
 {evtid: 49, particle: 11, edep: 115, time: 0.85, xloc: -0.125, yloc: ..., ...}]

With reshaping, columns acquire one additional dimension:

[{edep: [20.1, ..., 74.9], evtid: [0, ..., 0], particle: ..., ...},
 {edep: [0.431, 109], evtid: [2, 2], particle: [11, 11], ...},
 {edep: [70.2], evtid: [3], particle: [11], time: [...], ...},
 ...,
 {edep: [88.4], evtid: [44], particle: [...], ...},
 {edep: [33.1, 115], evtid: [49, 49], particle: ..., ...}]

This behavior is enabled by default for .lh5 file outputs, it can be suppressed with the --flat-output flag to the remage executable. The time window used to group together rows can be set with the --time-window-in-us flag, the units are \(\mu\)s and by default a window of 10\(\mu\)s is used.

Warning

Reshaping involves some additional I/O operations so for some simulations may increase run time! remage will report the amount of time spent reshaping the files.

It is possible to supply both the -m and -r flags to simultaneously merge and reshape the output files.

Time-coincidence map

In the presence of multiple sensitive detectors written out as separate output tables, reconstructing event information can be a tedious operation when analyzing the simulation output. To simplify the task, remage computes a so-called “time-coincidence map” (TCM) table at the end of a simulation run with pygama.evt.tcm.build_tcm() and stores it in the output file as /tcm.

/
├── stp · struct{B00000C,B00000D,...}
│   └── ...
└── tcm · table{row_in_table,table_key}
    ├── row_in_table · array<1>{array<1>{real}}
    └── table_key · array<1>{array<1>{real}}

Every row of the TCM corresponds to a simulated event, defined by the same time window used for reshaping the output tables. For each event, the table_key column specifies the list of detectors that had hits, while the row_in_table column specifies which rows (through the row index) need to be read from the respective output tables. The detectors are labeled in row_in_table by their UID assigned at registration time (see Registering sensitive detectors). If output tables are keyed by UID, querying them based on the TCM data is straightforward. In case Geant4 logical volume names are used (see Table naming), It’s still possible to query data by UID with the symbolic links stored in /stp/__by_uid__, keyed by UID as in the default remage convention.

More information on how to use the TCM is provided in Analyzing simulation output, while more documentation about how the TCM is generated is available at pygama.evt.tcm.generate_tcm_cols().

The vertex table

remage stores data about the simulated event vertex in a table named vtx. In the LH5 output, it can be found in the HDF5 root group (/vtx).

/
├── stp · struct{B00000C,B00000D,...}
│   └── ...
├── tcm · table{row_in_table,table_key}
│   └── ...
└── vtx · table{evtid,n_part,time,xloc,yloc,zloc}
    ├── evtid · array<1>{real}
    ├── n_part · array<1>{real}
    ├── time · array<1>{real} ── {'units': 'ns'}
    ├── xloc · array<1>{real} ── {'units': 'm'}
    ├── yloc · array<1>{real} ── {'units': 'm'}
    └── zloc · array<1>{real} ── {'units': 'm'}

Each row in the table corresponds to a vertex. The columns are:

  • evtid: Geant4 event identifier

  • time: time relative to the start of the event (zero, most of the times)

  • n_part: number of generated particles

  • xloc, yloc, zloc: global coordinates of the vertex position.

The table always lists data from all vertices, i.e. the length of the table is always equal to the number of simulated events.

The vertex table is useful to reconstruct the event vertex of hits recorded in sensitive detectors by matching the information stored in the evtid column.