bnmf-algs
bnmf_algs::cuda::kernel Namespace Reference

Namespace containing CUDA kernel and device functions.

Functions

template<typename Scalar >
__global__ void sum_tensor3D (cudaPitchedPtr tensor, Scalar *out, size_t out_pitch, size_t axis, size_t n_rows, size_t n_cols, size_t n_layers)
 Sum the given 3D tensor along the given axis and write the results to the corresponding index of the given 2D pitched matrix memory.

Detailed Description

Namespace containing CUDA kernel and device functions.

Function Documentation

template<typename Scalar>
__global__ void bnmf_algs::cuda::kernel::sum_tensor3D(
    cudaPitchedPtr tensor,
    Scalar *out,
    size_t out_pitch,
    size_t axis,
    size_t n_rows,
    size_t n_cols,
    size_t n_layers
)

Sum the given 3D tensor along the given axis and write the results to the corresponding index of the given 2D pitched matrix memory.

This CUDA kernel sums a 3D tensor along one axis and writes the results to the given 2D memory. Both the tensor and the output matrix must be row-major. The sum is performed by assigning one thread to each element of the tensor face along the sum axis. For example, when summing along the 2nd dimension, one thread is assigned to each element of the matrix formed by the 0th and 1st dimensions. Each thread then computes its sum in time linear in the depth of the sum and writes the result to its corresponding entry in the output matrix.
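The per-thread summation described above can be illustrated on the host. The following is a minimal CPU sketch, not the library's kernel: it assumes a dense (unpitched) row-major tensor, and both the function name sum_tensor3d_reference and the choice of which remaining axis supplies rows versus columns are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for illustration; not part of bnmf-algs.
// Assumes a dense row-major tensor of shape {d0, d1, d2}, so that element
// (i, j, k) lives at index (i * d1 + j) * d2 + k.
template <typename Scalar>
std::vector<Scalar> sum_tensor3d_reference(
    const std::vector<Scalar>& tensor,
    const std::array<std::size_t, 3>& shape, std::size_t axis) {
    assert(axis < 3);
    assert(tensor.size() == shape[0] * shape[1] * shape[2]);

    // Row-major strides, measured in elements.
    const std::array<std::size_t, 3> stride = {shape[1] * shape[2], shape[2], 1};

    // Assumption: the two remaining axes, taken in increasing order,
    // give the rows and columns of the tensor face.
    const std::size_t row_axis = (axis == 0) ? 1 : 0;
    const std::size_t col_axis = (axis == 2) ? 1 : 2;
    const std::size_t n_rows = shape[row_axis];
    const std::size_t n_cols = shape[col_axis];
    const std::size_t n_layers = shape[axis];

    std::vector<Scalar> out(n_rows * n_cols, Scalar(0));
    // On the GPU, each (r, c) pair would be handled by its own thread.
    for (std::size_t r = 0; r < n_rows; ++r) {
        for (std::size_t c = 0; c < n_cols; ++c) {
            Scalar sum = Scalar(0);
            // Linear-time walk along the sum axis.
            for (std::size_t l = 0; l < n_layers; ++l) {
                sum += tensor[r * stride[row_axis] + c * stride[col_axis] +
                              l * stride[axis]];
            }
            out[r * n_cols + c] = sum;
        }
    }
    return out;
}
```

A reference like this can be handy for checking device results on small inputs, since the two nested outer loops play the role of the thread grid and the inner loop is the work done by a single thread.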

CUDA grid and block dimensions must be chosen so that each thread is assigned the corresponding element of the 0th face of the tensor along the sum axis. Additionally, the n_rows, n_cols and n_layers parameters must be set according to the sum axis: n_rows is the number of rows of the tensor face, n_cols is the number of columns of the tensor face, and n_layers is the depth of the sum. In these definitions, the tensor face can be visualized in 3D space as the 2D plane that is orthogonal to the sum axis vector.
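One way to derive these values on the host is sketched below. This is an illustration under stated assumptions rather than the library's own launch code: the names FaceDims, LaunchShape, face_dims and launch_shape are hypothetical, the 16x16 block size is an arbitrary default, and mapping the x dimension to columns and the y dimension to rows is an assumption the source does not confirm.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper types for illustration; not part of bnmf-algs.
struct FaceDims {
    std::size_t n_rows, n_cols, n_layers;
};
struct LaunchShape {
    std::size_t grid_x, grid_y, block_x, block_y;
};

// Derive n_rows, n_cols and n_layers from the tensor shape and sum axis.
// Assumption: the two remaining axes, in increasing order, are rows and columns.
inline FaceDims face_dims(const std::array<std::size_t, 3>& shape,
                          std::size_t axis) {
    assert(axis < 3);
    const std::size_t row_axis = (axis == 0) ? 1 : 0;
    const std::size_t col_axis = (axis == 2) ? 1 : 2;
    return {shape[row_axis], shape[col_axis], shape[axis]};
}

// Integer division rounded up.
inline std::size_t ceil_div(std::size_t n, std::size_t d) {
    return (n + d - 1) / d;
}

// Choose enough blocks to cover every element of the face.
// Assumption: x indexes columns and y indexes rows.
inline LaunchShape launch_shape(const FaceDims& face, std::size_t block_x = 16,
                                std::size_t block_y = 16) {
    assert(block_x > 0 && block_y > 0);
    return {ceil_div(face.n_cols, block_x), ceil_div(face.n_rows, block_y),
            block_x, block_y};
}
```

For a 100x50x7 tensor summed along axis 2, this gives a 100x50 face with depth 7 and a 4x7 grid of 16x16 blocks. Rounding up launches more threads than there are face elements, and the source does not state whether the kernel guards against such out-of-range threads, so that should be checked before relying on a rounded-up grid.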

Template Parameters
Scalar     Type of the tensor and matrix entries.
Parameters
tensor     CUDA pitched pointer pointing to a 3D pitched memory.
out        Pointer to a 2D CUDA pitched matrix memory. Dimensions of the output matrix must be chosen according to the sum axis.
out_pitch  Pitch of the output 2D CUDA pitched matrix memory.
axis       Sum axis. Must be 0, 1 or 2.
n_rows     Number of rows of the tensor face.
n_cols     Number of columns of the tensor face.
n_layers   Depth of the sum, i.e. number of elements that must be summed up to compute a single entry of the output matrix.