memory.cu
This section documents the functions defined in memory.cu that handle GPU memory allocation, host-to-device transfer of global data, and serialized workload calculations.
Functions
-
__host__ void transferParams(cudaStream_t stream)
Synchronizes global host simulation parameters with device memory symbols.
Copies physics constants, grid dimensions, lookup tables, and model-specific parameters from the host to the device using cudaMemcpyToSymbol.
- Parameters:
stream – CUDA stream to perform asynchronous memory transfers, allowing overlap with kernel execution for improved performance.
- Returns:
void
-
__host__ int setMaxBlocks()
Configures the optimal number of blocks to be used by CUDA kernels and generates GPU diagnostics.
It multiplies the maximum number of blocks per multiprocessor by the total number of streaming multiprocessors (SMs) to determine the total maximum number of blocks that can be launched concurrently on the GPU.
- Returns:
The calculated maximum number of blocks for kernel launches.
-
__host__ void cummulativePhotonsPerZone(unsigned long long *generated_photons_arr, unsigned long long *d_index_to_ijk)
Computes a cumulative sum of superphoton counts for zone-based photon sampling.
Generates a cumulative sum array where the value at zone i equals the total number of superphotons to be generated up to and including zone i-1.
- Parameters:
generated_photons_arr – Array containing the number of photons to generate in each zone.
d_index_to_ijk – Array storing the cumulative sum of photons per zone.
- Returns:
void
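The cumulative sum described above is an exclusive prefix sum. The following is a minimal host-only sketch of that idea, not the memory.cu implementation: the helper name `exclusiveCumulativeSum` and the use of `std::vector` are assumptions made for illustration, whereas the documented function works on raw `unsigned long long` arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical host-only helper (not the memory.cu implementation).
// Returns an exclusive cumulative sum: out[i] = sum of photons_per_zone[0..i-1].
std::vector<unsigned long long> exclusiveCumulativeSum(
    const std::vector<unsigned long long>& photons_per_zone) {
    std::vector<unsigned long long> cumulative(photons_per_zone.size(), 0ULL);
    unsigned long long running_total = 0ULL;
    for (std::size_t i = 0; i < photons_per_zone.size(); ++i) {
        cumulative[i] = running_total;  // photons generated before zone i
        // Guard against wrap-around of the 64-bit counter.
        assert(photons_per_zone[i] <=
               std::numeric_limits<unsigned long long>::max() - running_total);
        running_total += photons_per_zone[i];
    }
    return cumulative;
}
```

For per-zone counts {3, 0, 5, 2} this sketch returns {0, 3, 3, 8}. With such an array, a global photon index g belongs to zone i when cumulative[i] <= g < cumulative[i] + photons_per_zone[i].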
-
void symbolToDevice(const void *symbol, const void *src, size_t size, cudaStream_t stream)
-
void symbolFromDevice(void *dst, const void *symbol, size_t size, cudaStream_t stream)
-
__host__ unsigned long long photonsPerBatch(unsigned long long tot_nph, int *batch_divisions)
Calculates the batch size for GPU photon processing.
Evaluates the available GPU memory to determine the number of partitions. This avoids exhausting device memory by scaling the batch size against the footprint of the of_photon structure.
- Parameters:
tot_nph – Total number of photons to be processed in the simulation.
batch_divisions – Output pointer that receives the calculated number of batches.
- Returns:
The number of photons assigned to each individual GPU batch.
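The batching arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the actual photonsPerBatch code: the function name `photonsPerBatchSketch` and the explicit `free_bytes` and `bytes_per_photon` parameters are hypothetical. In a real CUDA build the free byte count would typically come from `cudaMemGetInfo`, and the per-photon footprint from the size of the photon structure plus any safety margin the code applies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: split tot_nph photons into the smallest number of
// equal-sized batches such that one batch fits into free_bytes.
unsigned long long photonsPerBatchSketch(unsigned long long tot_nph,
                                         std::size_t free_bytes,
                                         std::size_t bytes_per_photon,
                                         int* batch_divisions) {
    assert(batch_divisions != nullptr);
    assert(bytes_per_photon > 0 && free_bytes >= bytes_per_photon);

    // Largest number of photons that fits in memory at once.
    const unsigned long long capacity = free_bytes / bytes_per_photon;

    // Ceiling division; at least one batch even when tot_nph is zero.
    unsigned long long divisions = (tot_nph + capacity - 1) / capacity;
    if (divisions == 0) divisions = 1;

    *batch_divisions = static_cast<int>(divisions);

    // Spread the photons evenly; the last batch may be partially filled.
    return (tot_nph + divisions - 1) / divisions;
}
```

With 1001 photons, 4000 free bytes, and 16 bytes per photon, the capacity is 250 photons, so the sketch reports 5 batches of 201 photons each.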
-
__host__ void allocatePhotonData(struct of_photonSOA *ph, unsigned long long size)
Allocates device memory for photon data using a Structure of Arrays (SoA) layout.
Initializes an of_photonSOA structure by allocating separate memory buffers on the GPU for every photon property, for both original and scattered photons in each batch.
Using a Structure of Arrays (SoA) instead of an Array of Structures (AoS) enables coalesced memory access patterns, which can substantially increase memory throughput for CUDA kernels.
- Parameters:
ph – Pointer to the of_photonSOA structure to be initialized.
size – The number of photon slots to allocate in each array.
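To illustrate why the SoA layout helps coalescing, the host-only sketch below compares the byte distance between the same field of two consecutive photons in each layout. The types `PhotonAoS` and `PhotonSoA` and their three fields are hypothetical stand-ins, not the real of_photonSOA members, and `std::vector` replaces the device buffers that the real function allocates on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical photon with three illustrative fields (Array of Structures).
struct PhotonAoS {
    double x;
    double k;
    double weight;
};

// Hypothetical Structure of Arrays: one separate buffer per field.
struct PhotonSoA {
    std::vector<double> x, k, weight;
    explicit PhotonSoA(std::size_t n) : x(n), k(n), weight(n) {}
};

// Byte distance between `weight` of photon 0 and photon 1 in the AoS layout.
std::size_t aosStrideBytes(const std::vector<PhotonAoS>& p) {
    assert(p.size() >= 2);
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&p[1].weight) -
        reinterpret_cast<const char*>(&p[0].weight));
}

// Same distance in the SoA layout: neighbouring photons are adjacent.
std::size_t soaStrideBytes(const PhotonSoA& p) {
    assert(p.weight.size() >= 2);
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&p.weight[1]) -
        reinterpret_cast<const char*>(&p.weight[0]));
}
```

In the AoS layout the stride equals the size of the whole structure, while in the SoA layout it equals the size of one field. When consecutive GPU threads read the same field of consecutive photons, the SoA addresses are adjacent, which is the access pattern the hardware can coalesce.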
-
__host__ void freePhotonData(struct of_photonSOA *ph)
Deallocates device memory for an of_photonSOA structure.
This function releases all individual GPU memory buffers that were previously allocated by allocatePhotonData.
- Parameters:
ph – Pointer to the of_photonSOA structure whose device members are to be freed.
-
__host__ void createdPTextureObj(cudaTextureObject_t *texObj, double *dP, cudaArray_t *cuArray)
Creates a 3D texture object from a 4D data grid for the plasma primitive properties.
Converts input double data to float, uploads it to a 3D CUDA array, and initializes a texture object with point filtering and clamp addressing.
Note
Maps 4D data into a 3D extent by combining the nx and ny dimensions.
- Parameters:
texObj – Pointer to the resulting texture object that stores the plasma primitive properties.
dP – Input plasma properties.
cuArray – Pointer to the allocated 3D CUDA array resource.
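The two host-side steps described above, narrowing double to float and folding a 4D index into three coordinates, can be sketched as follows. The index convention is an assumption: this documentation only states that nx and ny are combined, so the ordering `i * ny + j`, the choice of which folded axis holds which quantity, and the names `FoldedCoord`, `foldIndex`, and `toFloat` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D coordinate inside the folded texture extent.
struct FoldedCoord {
    std::size_t x;  // combined (i, j) index, assumed to be i * ny + j
    std::size_t y;  // assumed to hold the third spatial index k
    std::size_t z;  // assumed to hold the primitive-variable index
};

// Folds a 4D index (var, i, j, k) into three coordinates by merging i and j.
FoldedCoord foldIndex(std::size_t var, std::size_t i, std::size_t j,
                      std::size_t k, std::size_t nx, std::size_t ny) {
    assert(i < nx && j < ny);
    return FoldedCoord{i * ny + j, k, var};
}

// Narrows the input data to single precision before upload.
std::vector<float> toFloat(const std::vector<double>& src) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (double v : src) dst.push_back(static_cast<float>(v));
    return dst;
}
```

The narrowing step loses precision for values that are not exactly representable in single precision, which is worth keeping in mind when comparing texture fetches against the original double data.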
-
__host__ void transferPhotonDataDevtoDev(struct of_photonSOA to, struct of_photonSOA from, unsigned long long size, cudaStream_t stream)
Transfers the contents of one of_photonSOA structure to another, both resident on the device.
- Parameters:
to – Destination of_photonSOA structure that receives the data.
from – Source of_photonSOA structure that the data is copied from.
size – Number of elements in each array of the SoA.
stream – CUDA stream to perform asynchronous memory transfers, allowing overlap with kernel execution for improved performance.
-
__host__ struct of_spectrum ***Malloc3D_Contiguous(int dim1, int dim2, int dim3)
Allocates a contiguous 3D array of of_spectrum structures on the host and initializes it to zero.
- Parameters:
dim1 – Size of the first dimension.
dim2 – Size of the second dimension.
dim3 – Size of the third dimension.
- Returns:
A pointer to the allocated spectrum 3D array.
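One common way to build such an array is to allocate a single zero-initialized data block and then set up pointer tables so that `a[i][j][k]` indexing works. The sketch below shows that pattern under stated assumptions; it is not the memory.cu implementation. `SpectrumStub` is a hypothetical stand-in for of_spectrum, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct SpectrumStub { double value; };  // hypothetical stand-in for of_spectrum

// Allocates one contiguous, zeroed data block plus pointer tables for a[i][j][k].
SpectrumStub*** malloc3DContiguousSketch(int dim1, int dim2, int dim3) {
    assert(dim1 > 0 && dim2 > 0 && dim3 > 0);
    const std::size_t n1 = static_cast<std::size_t>(dim1);
    const std::size_t n2 = static_cast<std::size_t>(dim2);
    const std::size_t n3 = static_cast<std::size_t>(dim3);

    // calloc zero-fills the block that holds every element.
    SpectrumStub* data = static_cast<SpectrumStub*>(
        std::calloc(n1 * n2 * n3, sizeof(SpectrumStub)));
    SpectrumStub*** planes = static_cast<SpectrumStub***>(
        std::malloc(n1 * sizeof(SpectrumStub**)));
    if (data == nullptr || planes == nullptr) {
        std::free(data);
        std::free(planes);
        return nullptr;
    }
    for (std::size_t i = 0; i < n1; ++i) {
        planes[i] = static_cast<SpectrumStub**>(
            std::malloc(n2 * sizeof(SpectrumStub*)));
        if (planes[i] == nullptr) {  // undo everything allocated so far
            for (std::size_t p = 0; p < i; ++p) std::free(planes[p]);
            std::free(data);
            std::free(planes);
            return nullptr;
        }
        for (std::size_t j = 0; j < n2; ++j) {
            planes[i][j] = data + (i * n2 + j) * n3;  // row (i, j) in the block
        }
    }
    return planes;
}

// Matching release: the data block first, then each row table, then the top table.
void free3DContiguousSketch(SpectrumStub*** ptr, int dim1) {
    if (ptr == nullptr) return;
    std::free(ptr[0][0]);
    for (int i = 0; i < dim1; ++i) std::free(ptr[i]);
    std::free(ptr);
}
```

Because all elements live in one block, the whole array can be copied or reduced with a single pass over `&a[0][0][0]`. In this sketch the release routine needs only the first dimension, since that is the number of row tables to free.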
-
void Free3D_Contiguous(struct of_spectrum ***ptr, int dim1)
Frees a previously allocated spectrum 3D array.
- Parameters:
ptr – Pointer to the 3D array to be freed.
dim1 – Size of the first dimension.
- Returns:
void