Output

Todo

  • one-table versus multi-table configurations

  • vertex table

  • track output scheme

  • isotope, energy filtering

remage supports all output formats supported by G4AnalysisManager (HDF5, ROOT, CSV, XML), plus LH5. The file type to use is selected by the specified output file name (.h5, .root, .csv, .xml, .lh5).

Note

LH5, HDF5 and ROOT output formats require Geant4 to be explicitly compiled with support for the HDF5 or ROOT libraries, respectively.

The contents of the output files is determined by output schemes. An output scheme does not only contain functionality for the actual output description, but also might have parts of Geant4’s stacking action functionality. Output schemes, in general, are remage’s way to implement pluggable event selection, persistency and track stacking.

Selection of output schemes

Adding a sensitive detector of any type will add the corresponding main output scheme to the list of active output schemes.

Additional output schemes might be used for filtering output. Optional output schemes can be enabled with the /RMG/Output/ActivateOutputScheme macro command:

/RMG/Output/ActivateOutputScheme [name].

Note

Adding output schemes with C++ code is possible using the RMGUserInit system of remage (access it with auto user_init = RMGManager::Instance()->GetUserInit():

  • user_init->AddOutputScheme<T>(...); adds and enables the output scheme new T(...) on each worker thread,

  • user_init->AddOptionalOutputScheme<T>("name", ...); adds a name-tag to an output scheme, that will not be enabled right away, and

  • user_init->ActivateOptionalOutputScheme("name") enables such a registered output scheme.

Output schemes are often coupled to sensitive detector types. At present, it is not possible to register detector types at runtime.

Output file types

The selection of the output file type depends on the file extension of the specified output file. Possible output file types include lh5, hdf5, or root—or any other file format that G4AnlasisManager can write; but these are not tested regularly.

Note

remage will not produce an output file, if no output file name is provided by the user. Specify -o none to acknowledge the warning that is emitted when output schemes are registered, but no file will be created.

In case a multithreaded simulation is requested with the -t or --threads option (see Running simulations), the output file names will be appended with the thread number. remage will produce one output file per thread appending _t$id, where $id is the thread number, before the file extension.

For example running remage with:

remage -o OUTPUT.lh5 -t 8

will result in output files OUTPUT_t0.lh5,..., OUTPUT_t7.lh5.

Geant4 automatically merges these files into a single one at the end of a run for all supported formats, except for HDF5. For the LH5 output format, remage can merge the output files before saving to disk. This feature can be enabled with the --merge-output-files (or -m) option.

Warning

Merging involves some additional I/O operations so for some simulations may increase run time! remage will report the amount of time spent merging the files.

LH5 output

It is possible to directly write a LH5 file from remage, to facilitate reading output ntuples as a LH5 Table. To use this feature, simply specify an output file with a .lh5 extension, and remage will perform the file conversion automatically.

Note

Additionally, the standalone tool remage-to-lh5 is provided to convert a default Geant4 HDF5 file to a LH5 file. With this, executing remage -o output.lh5 [...] is roughly equivalent to the combination of commands:

$ remage -o output.hdf5 [...]
$ remage-to-lh5 output.hdf5
$ mv output.{hdf5,lh5}

Reshaping output tables

For the LH5 output and Germanium or Scintillator output we implemented a “reshaping” of the output tables. This groups together rows in the same output table that have the same simulated Geant4 evtid and also with times with the user defined time window (more later). In this way the rows of the output table represent physical interactions in each output table.

The means the columns of the output table are converted from LH5 Array’s objects to LH5 VectorOfVectors’s. However, this grouping is lossless.

Without reshaping, the output table is flat: each column (evtid, edep, …) is a one-dimensional array:

[{evtid: 0, particle: 11, edep: 20.1, time: 0.158, xloc: -0.0222, ...},
 {evtid: 0, particle: 11, edep: 50.1, time: 0.251, xloc: -0.0178, ...},
 {evtid: 0, particle: 11, edep: 74.9, time: 0.522, xloc: -0.0767, ...},
 {evtid: 2, particle: 11, edep: 0.431, time: 0.344, xloc: -0.0335, ...},
 {evtid: 2, particle: 11, edep: 109, time: 0.423, xloc: -0.0542, ...},
 {evtid: 3, particle: 11, edep: 70.2, time: 0.0545, xloc: -0.0128, ...},
 ...,
 {evtid: 44, particle: 11, edep: 88.4, time: 0.484, xloc: -0.0313, ...},
 {evtid: 49, particle: 11, edep: 33.1, time: 0.848, xloc: -0.126, ...},
 {evtid: 49, particle: 11, edep: 115, time: 0.85, xloc: -0.125, yloc: ..., ...}]

With reshaping, columns acquire one additional dimension:

[{edep: [20.1, 50.1, ..., 74.9], evtid: [0, ..., 0], particle: ..., ...},
 {edep: [0.431, 109], evtid: [2, 2], particle: [11, 11], ...},
 {edep: [70.2], evtid: [3], particle: [11], time: [...], ...},
 ...,
 {edep: [88.4], evtid: [44], particle: [...], ...},
 {edep: [33.1, 115], evtid: [49, 49], particle: ..., ...}]

This behavior is enabled by default for .lh5 file outputs, it can be suppressed with the --flat-output flag to the remage executable. The time window used to group together rows can be set with the --time-window-in-us flag, the units are \(\mu\)s and by default a window of 10\(\mu\)s is used.

Warning

Reshaping involves some additional I/O operations so for some simulations may increase run time! remage will report the amount of time spent reshaping the files.

It is possible to supply both the -m and -r flags to simultaneously merge and reshape the output files.

Physical units

In LH5 output files, units are attached as attributes to the table columns, as specified in the LH5 spec.

For any other output format (HDF5, ROOT, etc), remage is not able to attach metadata to columns. The ntuple columns created by remage contain physical units in their names, encoded as in the legend-metadata spec (i.e. adding _in_<units> at the end of the name), where the units are expressed in the typical physical unit symbols. Unfortunately, column names cannot contain forward slashes, so units like m/s cannot be represent directly. Instead, a backslash (\) is used to encode the division symbol (for example: velocity_in_m\s).

Germanium (HPGe) detectors

The Germanium output scheme handles the output from germanium (HPGe) detectors, but would also work for other solid state detectors (calorimeters).

HPGes have sensitivity to the topology of event interactions via the pulse shape and they also have a different response close to the detector electrodes. So when simulating HPGe’s it is advisable to save the information of the steps of particles within the detector. Then “post-processing” software such as reboost can apply the detector response model without repeating the computationally intensive simulation.

By default this output scheme writes out all steps in the registered sensitive HPGe detectors. The following properties of each hit are recorded (by default):

  • time: The global time of the hit,

  • particle: the PDG code of the particle,

  • xloc, yloc, zloc: the global position,

  • evtid_: the index of the Geant4 event,

  • edep: the deposited energy,

  • dist_to_surf: the distance of the hit from the detector surface.

By default all floating point fields are saved with 64-bit (double) precision. The precision of the energy and or position / distance fields can be reduced to 32-bit with the macro commands /RMG/Output/Germanium/StoreSinglePrecisionPosition and /RMG/Output/Germanium/StoreSinglePrecisionEnergy.

It is possible to also store the track id (see link for details) and parent track id of each step with /RMG/Output/Germanium/StoreTrackID.

As mentioned earlier output schemes also provide a mechanism for filtering events. One useful option is to only write out events in which energy was deposited in a germanium detector. This is used since the other detector systems (liquid argon, water Cherenkov etc.) often act a “vetos’s”, so we are not interested in the energy deposited or steps if an event in the germanium did not occur. The macro commands /RMG/Output/Germanium/AddDetectorForEdepThreshold, /RMG/Output/Germanium/EdepCutLow and /RMG/Output/Germanium/EdepCutHigh:

/RMG/Output/Germanium/AddDetectorForEdepThreshold {DET_UID}
/RMG/Output/Germanium/EdepCutLow {ELOW}
/RMG/Output/Germanium/EdepCutHigh {EHIGH}

implement this functionality, for every event the total energy deposited is computed. This is based on summing the energy deposited in each {DET_UID} added, or across all registered sensitive Germanium detectors (if this macro command is not used). The event is then discarded if the energy is less than or equal to ELOW or less than EHIGH.

Note

This mechanism will remove the data from the event across all output schemes, not only the Germanium!

Similarly, for simulations involving optical photons it is possible to discard all optical photon tracks before simulating them if no energy was deposited in germanium. This can be enabled with /RMG/Output/Germanium/DiscardPhotonsIfNoGermaniumEdep.

By default, the position saved for each step is the average of the pre and post-step point. This can be controlled with /RMG/Output/Germanium/StepPositionMode, which can be set to Average (the default), Both (saves) also the pre and post steps, or Pre/Post.

Important

For gammas the position saved is always that of the post-step, since all gamma interactions are discrete.

Typically only steps where some energy was deposited are written out to disk, to control this behaviour there is /RMG/Output/Germanium/DiscardZeroEnergyHits.

Finally, it is possible to “pre-cluster” the steps, this is used to reduce the amount of data written out to disk by combining steps very close together. Since the surface region of a HPGe detector has different properties to the bulk this clustering can be performed differently for surface and bulk hits (see data-reduction for more details).

Scintillator detectors

This output scheme is used to record the steps in scintillation detectors (typically liquid argon), this is a calometric approach recording the energy deposited and steps. While the Optical output scheme is instead used for recording the detected optical photons. Most functionality is similar to the Germanium output scheme with a few exceptions.

  • Unlike for germanium detectors the distance to the detector surface is not calculated,

  • The stacking possibility for optical tracks is not implemented,

  • The velocity of the particles can be saved using the /RMG/Output/Scintillator/StoreParticleVelocities command.

Data reduction methods

Often Geant4 takes steps much shorter than those that are meaningful in a HPGe or a scintillation detector. For example the typical dimension of charge clouds produced by interactions in germanium are 1-2 mm, so we are not sensitive to tracking at the micrometer level. To reduce the file size while retaining the useful information for computing observables of interest we have implemented some “pre-clustering” routines. These routines combine together steps that are very close together.

Note

The aim of this (pre)-clustering is only to make a minimal reduction of information which cannot be useful! Further, more aggressive clustering may be needed for some applications.

In order to have an efficient algorithm for pre-clustering we take use a “within-track” approach, this clusters only steps in the same G4Track, with some exceptions for very low energy tracks. In this way we only have to iterate through the steps in each event once. This also means the rows in our output are still interpretable with steps in the detector (just with a larger step length). The clustering is handled by the function RMGOutputTools::pre_cluster_hits(). This takes in the pointer to the original RMGDetectorHitsCollection returning a pointer to a new collection of clustered hits.

Note

  • The function returns a shared pointer to the hit collection, for some applications it may be necessary to extract also an unmanaged pointer, for example to make this collection look identical to that obtained directly from Geant4.

  • This design makes it easy to include additional clustering algorithms, a similar function just needs to be written.

Pre-clustering is enabled by default for the Scintillator and Germanium output schemes, it can be disabled with the command /RMG/Output/Germanium/Cluster/PreClusterOutputs and similarly for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/PreClusterOutputs.

This clustering works by by first organise the hits by track id (the index of the G4Track within the event). Some processes in Geant4 produce a large number of secondary tracks due to atomic de-excitation, these tracks typically have a very low energy and range (however they are still produced since production cuts are not applied for most gamma interactions). Thus they are not expected to impact observables of interest. In many cases, after pre-clustering of high energy electrons, these tracks could form the majority of the output.

We implemented the possibility to merge these tracks prior to pre-clustering which can be enabled with /RMG/Output/Germanium/Cluster/CombineLowEnergyElectronTracks and similarly for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/CombineLowEnergyElectronTracks.

Warning

This means in some cases there are steps in the output that are the combination of steps in different Geant4 tracks.

This command will select electron tracks with energy lower than a threshold, which is by default 10 keV, but can be changed with /RMG/Output/Germanium/Cluster/ElectronTrackEnergyThreshold.

and similar for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/ElectronTrackEnergyThreshold. For each track, we search for tracks which have a first pre-step point within the cluster radius of the first pre-step point of the low energy track. The low energy track is then merged with the neighbour track with the highest energy. In addition, Geant4 will sometimes associated some deposited energy with gamma tracks (due to atomic binding energy), optionally the user can request instead redistributing this energy to the secondary electron tracks with /RMG/Output/Germanium/Cluster/RedistributeGammaEnergy.

this then means the gamma tracks would not have energy deposits and do not need to be written out in the output file (unless this is explicitly requested). Or similarly for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/RedistributeGammaEnergy.

After these two pre-processing steps the pre-clustering proceeds by looping through the steps in each track. For each step the distance to the first step in the current cluster is calculated, if this distance is less than the user defined distance, and the time difference is less than the time threshold, the step is added to the current cluster.

The distance/time thresholds used for pre-clustering can be set with the commands /RMG/Output/Germanium/Cluster/PreClusterDistance and /RMG/Output/Germanium/Cluster/PreClusterTimeThreshold, and similar for the Scintillator output scheme: /RMG/Output/Scintillator/Cluster/PreClusterDistance and /RMG/Output/Scintillator/Cluster/PreClusterTimeThreshold

For Germanium detectors, where the surface region has substantially different properties to the bulk, we give the possibility to cluster with a different threshold for the surface region of the detector. This is by default the region within 2 mm of the detector surface but can be changed with /RMG/Output/Germanium/Cluster/SurfaceThickness. Then, a threshold can be set specifically for this region with /RMG/Output/Germanium/Cluster/PreClusterDistanceSurface. This will apply this threshold for any step where the distance to surface is less than the surface thickness. With this option a new cluster will also be formed if a step moves from the surface to bulk region of the germanium (or vice-versa).

Note

By default pre-clustering is performed for both the Germanium and Scintillator output schemes with 50 \(\mu\)m distance for Germanium. By default clustering is not applied to the surface for Germanium (within the surface thickness set by default as 2 mm). For the Scintillator output scheme we use 500 \(\mu\) m cluster distance by default. For both outputs a 10\(\mu\) m time threshold is used by default.

These options provide a sophisticated mechanism for handling the surface of HPGe detectors!

For each cluster, we then compute an “effective” step:

  • the time, pre-step position, distance to surface, velocity is taken from the first step.

  • the post-step position, distance are evalauted from the last step

  • while the energy deposit is summed over all steps.

  • the average of the pre-step position and post-step position is computed.

All other fields are constant within a track and are taken from the first step.

Note

In this way the output still represents a step, just with a longer effective step length.