bnmf-algs
bnmf_algs::cuda Namespace Reference

The cuda namespace contains functions that operate on NVIDIA GPUs using CUDA routines. More...

Namespaces

 kernel
 Namespace containing CUDA kernel and device functions.
 

Classes

class  DeviceMemory1D
 A wrapper template class around a contiguous array of T types laid out in device memory (GPU memory). More...
 
class  DeviceMemory2D
 A wrapper template class around 2D row-major pitched memory stored in device memory (GPU memory). More...
 
class  DeviceMemory3D
 A wrapper template class around 3D row-major pitched memory stored in device memory (GPU memory). More...
 
class  HostMemory1D
 A wrapper template class around a contiguous array of T types laid out in main memory (host memory). More...
 
class  HostMemory2D
 A wrapper template class around a row-major matrix type stored in main memory (host memory). More...
 
class  HostMemory3D
 A wrapper template class around a row-major 3D tensor stored in main memory (host memory). More...
 

Functions

template<typename DstMemory , typename SrcMemory , template< typename > class HostMemoryBase, template< typename > class DeviceMemoryBase>
constexpr cudaMemcpyKind infer_copy_kind ()
 Infer the value of cudaMemcpyKind enum to be used with CUDA copying functions from the types of the memory objects passed to copy1D, copy2D, copy3D. More...
 
template<typename DstMemory1D , typename SrcMemory1D >
void copy1D (DstMemory1D &destination, const SrcMemory1D &source)
 Copy contiguous 1D memory from host/device memory to host/device memory using the CUDA function cudaMemcpy. More...
 
template<typename DstPitchedMemory2D , typename SrcPitchedMemory2D >
void copy2D (DstPitchedMemory2D &destination, const SrcPitchedMemory2D &source)
 Copy contiguous 2D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy2D. More...
 
template<typename DstPitchedMemory3D , typename SrcPitchedMemory3D >
void copy3D (DstPitchedMemory3D &destination, const SrcPitchedMemory3D &source)
 Copy contiguous 3D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy3D. More...
 
template<typename T >
void tensor_sums (const DeviceMemory3D< T > &tensor, std::array< DeviceMemory2D< T >, 3 > &result_arr)
 Sum the given 3D input tensor along each of its axes and return all 2D sum tensors. More...
 
template<typename Integer >
void init (Integer device)
 Initialize CUDA runtime. More...
 
template<typename Integer >
Integer idiv_ceil (Integer a, Integer b)
 Return ceiling of integer division between given parameters. More...
 

Detailed Description

The cuda namespace contains functions that operate on NVIDIA GPUs using CUDA routines.

Function Documentation

template<typename DstMemory1D , typename SrcMemory1D >
void bnmf_algs::cuda::copy1D (DstMemory1D &destination, const SrcMemory1D &source)

Copy contiguous 1D memory from host/device memory to host/device memory using the CUDA function cudaMemcpy.

This function copies the memory wrapped around a HostMemory1D or DeviceMemory1D object to the memory wrapped around a HostMemory1D or DeviceMemory1D. The type of the memory copying to be performed is inferred from the types of memory objects at compile-time.

See the cudaMemcpy function documentation to learn more about the details of the memory copying procedure.

Template Parameters
    DstMemory1D    Type of the destination memory. See HostMemory1D and DeviceMemory1D.
    SrcMemory1D    Type of the source memory. See HostMemory1D and DeviceMemory1D.
Parameters
    destination    Destination memory object.
    source    Source memory object.
Exceptions
    Static assertion error if one of the host/device to host/device enum values could not be inferred.
    Assertion error if the copying procedure is not successful.
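A minimal usage sketch (not taken from the library's documentation): the example below copies a host buffer to the GPU and back with copy1D. The HostMemory1D(ptr, n) and DeviceMemory1D(n) constructor signatures are assumptions; consult the class documentation for the exact API.

    #include <cassert>
    #include <vector>
    // plus the bnmf-algs cuda memory/copy headers

    using namespace bnmf_algs;

    // Hedged sketch: round-trip a host buffer through device memory.
    void copy1D_roundtrip_sketch() {
        std::vector<double> src(1024, 1.0);
        std::vector<double> dst(1024, 0.0);

        cuda::HostMemory1D<double> host_src(src.data(), src.size());
        cuda::HostMemory1D<double> host_dst(dst.data(), dst.size());
        cuda::DeviceMemory1D<double> device_buf(src.size());

        cuda::copy1D(device_buf, host_src); // host -> device (cudaMemcpyHostToDevice)
        cuda::copy1D(host_dst, device_buf); // device -> host (cudaMemcpyDeviceToHost)

        assert(dst == src);
    }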
template<typename DstPitchedMemory2D , typename SrcPitchedMemory2D >
void bnmf_algs::cuda::copy2D (DstPitchedMemory2D &destination, const SrcPitchedMemory2D &source)

Copy contiguous 2D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy2D.

This function copies the memory wrapped around a HostMemory2D or DeviceMemory2D object to the memory wrapped around a HostMemory2D or DeviceMemory2D. The type of the memory copying to be performed is inferred from the types of memory objects at compile-time.

See the cudaMemcpy2D function documentation to learn more about the details of the memory copying procedure.

Template Parameters
    DstPitchedMemory2D    Type of the destination memory. See HostMemory2D and DeviceMemory2D.
    SrcPitchedMemory2D    Type of the source memory. See HostMemory2D and DeviceMemory2D.
Parameters
    destination    Destination memory object.
    source    Source memory object.
Exceptions
    Static assertion error if one of the host/device to host/device enum values could not be inferred.
    Assertion error if the copying procedure is not successful.
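A short sketch of a host-to-device 2D copy. The HostMemory2D(ptr, rows, cols) and DeviceMemory2D(rows, cols) constructor signatures are assumptions; the wrapper objects are expected to track the device pitch themselves, so copy2D only needs the two memory objects.

    #include <vector>
    // plus the bnmf-algs cuda memory/copy headers

    using namespace bnmf_algs;

    // Hedged sketch: upload a row-major host matrix into pitched device memory.
    void copy2D_upload_sketch() {
        constexpr size_t rows = 64, cols = 128;
        std::vector<float> host_data(rows * cols, 2.0f); // row-major storage

        cuda::HostMemory2D<float> host_mat(host_data.data(), rows, cols);
        cuda::DeviceMemory2D<float> device_mat(rows, cols);

        cuda::copy2D(device_mat, host_mat); // uses cudaMemcpy2D under the hood
    }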
template<typename DstPitchedMemory3D , typename SrcPitchedMemory3D >
void bnmf_algs::cuda::copy3D (DstPitchedMemory3D &destination, const SrcPitchedMemory3D &source)

Copy contiguous 3D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy3D.

This function copies the memory wrapped around a HostMemory3D or DeviceMemory3D object to the memory wrapped around a HostMemory3D or DeviceMemory3D. The type of the memory copying to be performed is inferred from the types of memory objects at compile-time.

See the cudaMemcpy3D function documentation to learn more about the details of the memory copying procedure.

Template Parameters
    DstPitchedMemory3D    Type of the destination memory. See HostMemory3D and DeviceMemory3D.
    SrcPitchedMemory3D    Type of the source memory. See HostMemory3D and DeviceMemory3D.
Parameters
    destination    Destination memory object.
    source    Source memory object.
Exceptions
    Static assertion error if one of the host/device to host/device enum values could not be inferred.
    Assertion error if the copying procedure is not successful.
template<typename Integer >
Integer bnmf_algs::cuda::idiv_ceil (Integer a, Integer b)

Return ceiling of integer division between given parameters.

This function returns \(\left\lceil \frac{a}{b} \right\rceil\) for parameters a and b.

Template Parameters
    Integer    An integer type such as int, long, size_t, ...
Parameters
    a    Numerator.
    b    Denominator.
Returns
    Ceiling of \(\frac{a}{b}\) as an integer.
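A sketch of the usual way such a ceiling division is computed, together with a typical CUDA use. The body shown is illustrative and not necessarily the library's exact implementation; my_kernel, n and block_size are hypothetical names.

    // Hedged sketch of ceiling integer division; assumes a >= 0 and b > 0.
    template <typename Integer>
    Integer idiv_ceil_sketch(Integer a, Integer b) {
        return (a + b - 1) / b;
    }

    // Typical use: choosing a CUDA grid size that covers n elements when each
    // block processes block_size elements.
    //   const size_t num_blocks = cuda::idiv_ceil(n, block_size);
    //   my_kernel<<<num_blocks, block_size>>>(...);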
template<typename DstMemory , typename SrcMemory , template< typename > class HostMemoryBase, template< typename > class DeviceMemoryBase>
constexpr cudaMemcpyKind bnmf_algs::cuda::infer_copy_kind ( )

Infer the value of cudaMemcpyKind enum to be used with CUDA copying functions from the types of the memory objects passed to copy1D, copy2D, copy3D.

Template Parameters
    DstMemory    Destination memory type passed to copyXD.
    SrcMemory    Source memory type passed to copyXD.
    HostMemoryBase    A template template parameter representing the base of the host memories passed to a copyXD function. For copy1D this should be HostMemory1D, for copy2D it should be HostMemory2D, and so on.
    DeviceMemoryBase    A template template parameter representing the base of the device memories passed to a copyXD function. For copy1D this should be DeviceMemory1D, for copy2D it should be DeviceMemory2D, and so on.
Returns
    The inferred value of the cudaMemcpyKind enum.
Remarks
    If one of the host/device to host/device values could not be inferred, the function returns cudaMemcpyDefault.
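Since the function is constexpr, the inferred kind can be checked at compile time. A hedged sketch, with the template argument order taken from the signature above:

    #include <cuda_runtime.h>
    // plus the bnmf-algs cuda memory headers

    using namespace bnmf_algs;

    // Hedged sketch: a host -> device copy should resolve to cudaMemcpyHostToDevice.
    constexpr cudaMemcpyKind kind =
        cuda::infer_copy_kind<cuda::DeviceMemory1D<double>, cuda::HostMemory1D<double>,
                              cuda::HostMemory1D, cuda::DeviceMemory1D>();
    static_assert(kind == cudaMemcpyHostToDevice,
                  "expected a host to device copy kind");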
template<typename Integer >
void bnmf_algs::cuda::init (Integer device)

Initialize CUDA runtime.

This function initializes the CUDA runtime so that subsequent CUDA library calls do not incur the cost of initializing the library.

Parameters
    device    ID of the GPU device to set.
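A hedged sketch of what such a warm-up typically amounts to; the actual body of init may differ.

    #include <cassert>
    #include <cuda_runtime.h>

    // Hedged sketch: select the device and force context creation up front so
    // the first real kernel launch or memcpy does not pay the startup cost.
    template <typename Integer>
    void init_sketch(Integer device) {
        cudaError_t err = cudaSetDevice(static_cast<int>(device));
        assert(err == cudaSuccess);

        err = cudaFree(nullptr); // harmless call that initializes the runtime
        assert(err == cudaSuccess);
    }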
template<typename T >
void bnmf_algs::cuda::tensor_sums (const DeviceMemory3D< T > &tensor, std::array< DeviceMemory2D< T >, 3 > &result_arr)

Sum the given 3D input tensor along each of its axes and return all 2D sum tensors.

This function computes the sum of the given 3D input tensor along each dimension by performing the computation on GPU using CUDA. Summing a 3D \(x \times y \times z\) tensor \(S\) along its first axis computes a new 2D tensor \(M\) of shape \(y \times z\) where

\[ M_{jk} = \sum_i S_{ijk} \]

Summing along the other axes is defined similarly. The \(i^{th}\) entry of the output array contains the result of summing the input tensor \(S\) along its \((i + 1)^{th}\) axis.

The given 3D tensor must be previously copied to the GPU. Additionally, GPU memory for all sum tensors must be already allocated. cuda::DeviceMemory3D and cuda::DeviceMemory2D objects provide simple APIs for these tasks.

Template Parameters
    T    Type of the entries of the input tensor.
Parameters
    tensor    Input tensor to sum along each of its axes.
    result_arr    An array of sum tensors \((M_{y \times z}, M_{x \times z}, M_{x \times y})\).
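A hedged end-to-end sketch of the required allocation and upload steps. The HostMemory3D(ptr, x, y, z), DeviceMemory3D(x, y, z) and DeviceMemory2D(rows, cols) constructor signatures are assumptions; check the class documentation for the exact API.

    #include <array>
    #include <vector>
    // plus the bnmf-algs cuda memory/copy headers

    using namespace bnmf_algs;

    // Hedged sketch: upload a 3D tensor, then sum it along each axis on the GPU.
    void tensor_sums_sketch() {
        constexpr size_t x = 4, y = 5, z = 6;
        std::vector<double> host_tensor(x * y * z, 1.0); // row-major x-by-y-by-z tensor

        cuda::HostMemory3D<double> host_S(host_tensor.data(), x, y, z);
        cuda::DeviceMemory3D<double> device_S(x, y, z);
        cuda::copy3D(device_S, host_S); // upload the tensor to the GPU

        // One 2D sum matrix per axis: (y x z), (x x z), (x x y).
        std::array<cuda::DeviceMemory2D<double>, 3> sums = {
            cuda::DeviceMemory2D<double>(y, z),
            cuda::DeviceMemory2D<double>(x, z),
            cuda::DeviceMemory2D<double>(x, y)};

        cuda::tensor_sums(device_S, sums);
        // sums[0], sums[1], sums[2] now hold the axis sums in device memory;
        // bring them back with copy2D into HostMemory2D objects as needed.
    }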