kernels.cu

This page documents the functions defined in kernels.cu. kernels.cu contains the CUDA kernels used in the simulation and some extra functions.

Functions

Functions

__global__ void generate_photons(const struct of_geom *d_geom, const double *d_p, const time_t time, unsigned long long *generated_photons_arr, double *dnmax_arr, cudaTextureObject_t besselTexObj)

Kernel that will divide the work of generating superphotons among the GPU threads.

Initialize the curand state for each thread based on a unique seed derived from the current time and thread ID.
Calls init_zone to initialize the photon generation for each zone in the simulation grid.
The loop follows a grid-stride pattern to ensure all zones are covered. Each thread \( n \) processes a sequence of zones defined by:
\[ {\rm i_n} = n + k \cdot N_{\text{total}}\]
where \( {\rm i_n}\) is the current zone \( \rm i \) that thread \( \rm n \) is analyzing, \( n \) is the global thread ID, \( N_{\text{total}} \) is the total number of threads in the grid, and \( k \in \{0, 1, 2, \dots\} \).

Parameters:

d_geom – Pointer to the device geometry structure that carries the metric information \( g_{μν}, g^{μν},\ \rm{and}\ \sqrt{- g} \) .
d_p – Pointer to the device array containing the plasma parameters of the simulation.
time – Current time used to seed the random number generator for photon generation.
generated_photons_arr – Pointer to the device array that will store the number of photons generated per zone.
dnmax_arr – Pointer to the device array that will store the maximum value of \( \frac{dN}{d\ln\nu} \) per zone.
besselTexObj – CUDA texture object for Bessel function lookups.

Returns:

void

__device__ void init_zone(const int i, const int j, const int k, unsigned long long *n2gen, double *dnmax, const struct of_geom *d_geom, const double *d_p, const int d_Ns_par, curandState *localState, cudaTextureObject_t besselTexObj)

Initializes the photon generation parameters for a specific grid zone.

This function retrieves local fluid properties (density, temperature, and magnetic field) for the zone \((i, j, k)\) and calculates the expected number of superphotons to be generated based on the local thermal synchrotron emissivity.

The expected number of superphotons \( N_{s,i} \) in zone \( i \) is governed by:

\[ N_{s,i} = \Delta t \Delta^3 x \sqrt{-g} \int \int d\nu d\Omega \frac{1}{\omega} \frac{j_\nu}{h\nu} \]

where \( \sqrt{-g} \) is the metric determinant, \( j_\nu \) is the emissivity, and \( \omega \) is the superphoton weight.

The final number of superphotons \( n_{2gen} \) is determined stochastically to handle the fractional part of the expected count \( n_z \):

\[\begin{split} n_{2gen} = \begin{cases} \lfloor n_z \rfloor + 1 & \text{if } \text{rand} < (n_z \pmod 1) \\ \lfloor n_z \rfloor & \text{otherwise} \end{cases} \end{split}\]

Parameters:

i – Radial grid index.
j – Poloidal grid index.
k – Azimuthal grid index.
n2gen – Pointer to store the integer number of superphotons to be generated.
dnmax – Pointer to store the maximum of the distribution \( dN_s/d\ln\nu \), required for subsequent rejection sampling.
d_geom – Pointer to the device geometry structure that carries the metric information \( g_{μν}, g^{μν},\ \rm{and}\ \sqrt{- g} \) .
d_p – Pointer to the device simulation plasma properties.
d_Ns_par – Target superphoton count parameter.
localState – Pointer to the curandState for the current thread.
besselTexObj – Texture object for accelerated Modified Bessel function \( K_2 \) lookups.

Returns:

void

__global__ void sample_photons_batch(struct of_photonSOA ph_init, const struct of_geom *d_geom, const double *d_p, const unsigned long long *generated_photons_arr, const double *dnmax_arr, const int max_partition_ph, const unsigned long long photons_processed_sofar, const unsigned long long *index_to_ijk, cudaTextureObject_t besselTexObj)

Samples superphoton properties for the current batch using dynamic workload allocation.

This kernel performs the rejection sampling and initial coordinate assignment for superphotons. Unlike a standard grid-stride pattern where thread work is predetermined by ID, this kernel utilizes a dynamic workload allocation strategy since some zones may generate significantly more superphotons than others, leading to extreme load imbalance. To mitigate this, threads fetch work dynamically using an atomic counter. This ensures that as soon as a thread finishes sampling a photon, it immediately pulls the next available index, keeping GPU execution units saturated regardless of the variation in workload per zone.
The kernel maps the superphoton to the grid zone that generated it \( (i, j, k) \) by performing a binary search on the cumulative distribution array index_to_ijk. Once the zone_index is retrieved, it is decomposed into 3D grid coordinates:

Parameters:

ph_init – The Structure of Arrays (SoA) where sampled photon data is stored.
d_geom – Pointer to the device geometry structure that carries the metric information \( g_{μν}, g^{μν},\ \rm{and}\ \sqrt{- g} \) .
d_p – Pointer to the device simulation plasma properties.
generated_photons_arr – Array containing the number of photons to be generated per zone.
dnmax_arr – The maximum of the distribution \( dN_s/d\ln\nu \) for each zone, used for rejection sampling.
max_partition_ph – The maximum number of photons assigned to this batch.
photons_processed_sofar – Offset for the current batch relative to the total simulation count.
index_to_ijk – Cumulative sum array mapping global photon indices to zone indices.
besselTexObj – Texture object for Modified Bessel function \( K_2 \) lookups.

Returns:

void

__device__ void sample_zone_photon(const int i, const int j, const int k, const double dnmax, struct of_photonSOA ph, const struct of_geom *d_geom, const double *d_p, const int zone_flag, const unsigned long long ph_arr_index, double (*Econ)[NDIM], double (*Ecov)[NDIM], curandState *localState, cudaTextureObject_t besselTexObj)

Samples the physical properties of a single superphoton within a zone.

It uses a series of rejection sampling loops to determine the frequency and angular distribution of emission, then transforms the resulting momentum vector from the local fluid tetrad into the global coordinate frame.

Frequency Sampling: The photon frequency \( \nu \) is sampled using rejection sampling over the range \( [\nu_{min}, \nu_{max}] \). The selection is weighted by the local distribution \( dN_s/d\ln\nu \):

Angular Distribution: The polar angle \( \theta \) (relative to the local magnetic field) is sampled based on the angular dependence of the synchrotron emissivity \( j_\nu(\theta) \). A second rejection sampling loop ensures that:

\[ \text{rand} < \frac{j_\nu(\theta)}{j_{\nu, max}} \]

where \( j_{\nu, max} \) is the emissivity at \( \theta = \pi/2 \).

Tetrad Transformation: The momentum vector \( K \) is initially constructed in a local orthonormal tetrad frame aligned with the fluid four-velocity \( u^\mu \) and the magnetic field \( b^\mu \). It is then transformed to the coordinate frame using:

\[ k^\mu = e^\mu_{(\alpha)} k^{(\alpha)} \]

where \( e^\mu_{(\alpha)} \) is the tetrad basis built by make_tetrad.

Parameters:

i – Radial grid index.
j – Poloidal grid index.
k – Azimuthal grid index.
dnmax – The maximum of the distribution \( dN_s/d\ln\nu \) for this zone.
ph – The Structure of Arrays (SoA) used for photon storage.
d_geom – Pointer to the device geometry structure that carries the metric information \( g_{μν}, g^{μν},\ \rm{and}\ \sqrt{- g} \) .
d_p – Pointer to plasma properties array.
zone_flag – Boolean flag; if true, this is the first photon in this zone for this particular thread and a tetrad must be constructed, otherwise reuse the existing tetrad.
ph_arr_index – The specific index in the SoA to store this photon.
Econ – Tetrad basis (coordinate to orthonormal).
Ecov – Dual tetrad basis (orthonormal to coordinate).
localState – Pointer to the curandState for the current thread.
besselTexObj – Texture object for \(K_2\) Bessel function lookups.

Returns:

void

__global__ void track(struct of_photonSOA ph, cudaTextureObject_t d_p, const double *d_table_ptr, struct of_photonSOA scat_ofphoton, const unsigned long long max_partition_ph, cudaTextureObject_t besselTexObj)

Assign each superphoton to a thread through dynamic load balancing to be evolved through the geodesic.

To handle the variable computational cost of different geodesics (where some photons escape quickly and others undergo multiple scatterings), this kernel uses dynamic load balancing Threads fetch a global photon index \( n \) dynamically, ensuring that as soon as a thread finishes processing a photon, it immediately pulls the next available index. This keeps all GPU execution units busy regardless of the variation in workload per photon.

Note

Progress Monitoring: Thread with global index 0 acts as a monitor, calculating and printing the simulation progress to the console every 5% completion.

Parameters:

ph – The Structure of Arrays (SoA) containing the informations of the superphotons.
d_p – Pointer or Texture Object to the device plasma parameters, depending on the DO_NOT_USE_TEXTURE_MEMORY flag.
d_table_ptr – Pointer to the scattering probability and cross-section lookup tables.
scat_ofphoton – A secondary SoA used to store the relevant local and instantaneous information of the scattered photon to be processed later.
max_partition_ph – Total number of superphotons assigned to this specific tracking batch.
besselTexObj – CUDA texture object for accelerated Modified Bessel function \( K_2 \) lookups.

Returns:

void

__global__ void track_scat(struct of_photonSOA ph, cudaTextureObject_t d_p, const double *d_table_ptr, struct of_photonSOA scat_ofphoton, const int n, cudaTextureObject_t besselTexObj, unsigned long long round_num_scat_init, unsigned long long round_num_scat_end)

Assign each scattered superphoton to a thread through dynamic load balancing to be evolved through the geodesic.

To handle the variable computational cost of different geodesics (where some photons escape quickly and others undergo multiple scatterings), this kernel uses dynamic load balancing Threads fetch a global photon index \( n \) dynamically, ensuring that as soon as a thread finishes processing a photon, it immediately pulls the next available index. This keeps all GPU execution units busy regardless of the variation in workload per photon.

Note

Progress Monitoring: Thread with global index 0 acts as a monitor, calculating and printing the simulation progress to the console every 5% completion.

Parameters:

ph – The Structure of Arrays (SoA) containing the informations of the superphotons.
d_p – Pointer or Texture Object to the device plasma parameters, depending on the DO_NOT_USE_TEXTURE_MEMORY flag.
d_table_ptr – Pointer to the scattering probability and cross-section lookup tables.
scat_ofphoton – A secondary SoA used to store the relevant local and instantaneous information of the scattered photon to be processed later.
n – Current scattering layer index.
besselTexObj – CUDA texture object for accelerated Modified Bessel function \( K_2 \) lookups.
round_num_scat_init – The starting global photon index for this scattering batch.
round_num_scat_end – The ending global photon index for this scattering batch.

Returns:

void

__global__ void record(struct of_photonSOA ph, struct of_spectrum *d_spect, const unsigned long long max_partition_ph, const int nblocks)

Final processing kernel to record escaped superphotons into the observed spectrum by dynamically allocating each superphoton to each thread.

This kernel iterates through the processed full evolved batch of photons and identifies those that have satisfied the “escape” criteria. For each successful photon, it updates the global spectrum object.

Parameters:

ph – The Structure of Arrays (SoA) containing the final state of the photons.
d_spect – Pointer to the device spectrum structure where intensity is accumulated.
max_partition_ph – The total number of photons to be checked in this batch.
nblocks – The number of CUDA blocks used in the grid.

Returns:

void

__global__ void record_scattering(struct of_photonSOA ph, struct of_spectrum *d_spect, const unsigned long long max_partition_ph, const int nblocks, const int n)

Recording kernel for photons that have been created due to scattering events by dynamically allocating each superphoton to each thread.

This kernel iterates through the processed full evolved batch of photons and identifies those that have satisfied the “escape” criteria. For each successful photon, it updates the global spectrum object.

Note

Unlike the primary record kernel, this function must calculate the correct memory offsets into the scattering SoA, as photons from different scattering orders are stored sequentially.

Parameters:

ph – The Structure of Arrays (SoA) containing the scattered superphotons.
d_spect – Pointer to the device spectrum object for intensity accumulation.
max_partition_ph – Maximum photon capacity (included for signature consistency).
nblocks – Number of CUDA blocks in the grid.
n – The current scattering order (round) being processed.

Returns:

void

__host__ void mainFlowControl(time_t time, double *p)

The main host-side controller for the GPU-accelerated Monte Carlo simulation.

This function manages the high-level execution flow, coordinating memory management, data transfers, and kernel launches. It implements a multi-pass approach to handle large photon populations through batching and manages iterative scattering rounds.

Execution Workflow:

a) Initialization: Allocates device memory for plasma properties, geometry, scattering tables, and the final spectrum.

b) Generation Phase: Determines the total number of grid superphotons to be emitted based on local fluid conditions.

c) Batch Processing Loop: Divides the total photon count into manageable partitions to prevent GPU memory overflow. For each batch:

Sampling: Initializes photon positions and momenta.
Tracking: Integrates geodesics in the curved spacetime.
Recording: Bins escaped “primary” photons into the spectrum.

d) Scattering Iterations: Enters a secondary loop to process Inverse Compton scattering. It tracks and records photons through multiple scattering “rounds” until the population of scattered photons is depleted or the MAX_LAYER_SCA limit is reached.

e) Finalization: Transfers the accumulated spectrum back to the host and prints I/O output diagnostics.

Parameters:

time – Current epoch time used for seeding the random number generators.
p – Pointer to the host-side array of primitive fluid variables.

Returns:

void