The cuda namespace contains functions that operate on NVIDIA GPUs using CUDA routines.
template <typename DstMemory, typename SrcMemory, template <typename> class HostMemoryBase, template <typename> class DeviceMemoryBase>
constexpr cudaMemcpyKind infer_copy_kind ()
    Infer the value of the cudaMemcpyKind enum to be used with CUDA copying functions from the types of the memory objects passed to copy1D, copy2D, copy3D.

template <typename DstMemory1D, typename SrcMemory1D>
void copy1D (DstMemory1D &destination, const SrcMemory1D &source)
    Copy contiguous 1D memory from host/device memory to host/device memory using the CUDA function cudaMemcpy.

template <typename DstPitchedMemory2D, typename SrcPitchedMemory2D>
void copy2D (DstPitchedMemory2D &destination, const SrcPitchedMemory2D &source)
    Copy contiguous 2D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy2D.

template <typename DstPitchedMemory3D, typename SrcPitchedMemory3D>
void copy3D (DstPitchedMemory3D &destination, const SrcPitchedMemory3D &source)
    Copy contiguous 3D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy3D.

template <typename T>
void tensor_sums (const DeviceMemory3D<T> &tensor, std::array<DeviceMemory2D<T>, 3> &result_arr)
    Sum the given 3D input tensor along each of its axes and return all 2D sum tensors.

template <typename Integer>
void init (Integer device)
    Initialize the CUDA runtime.

template <typename Integer>
Integer idiv_ceil (Integer a, Integer b)
    Return the ceiling of the integer division between the given parameters.
template <typename DstMemory1D, typename SrcMemory1D>
void bnmf_algs::cuda::copy1D (DstMemory1D &destination, const SrcMemory1D &source)
Copy contiguous 1D memory from host/device memory to host/device memory using the CUDA function cudaMemcpy.
This function copies the memory wrapped by a HostMemory1D or DeviceMemory1D object to the memory wrapped by another HostMemory1D or DeviceMemory1D object. The kind of memory copy to perform is inferred from the types of the memory objects at compile time.
See the cudaMemcpy function documentation to learn more about the details of the memory copying procedure.
Template Parameters
    DstMemory1D    Type of the destination memory object (HostMemory1D or DeviceMemory1D).
    SrcMemory1D    Type of the source memory object (HostMemory1D or DeviceMemory1D).
Parameters
    destination    Destination memory object.
    source         Source memory object.
Exceptions
    Static assertion error if one of the host/device to host/device enum values could not be inferred.
    Assertion error if the copying procedure is not successful.
template <typename DstPitchedMemory2D, typename SrcPitchedMemory2D>
void bnmf_algs::cuda::copy2D (DstPitchedMemory2D &destination, const SrcPitchedMemory2D &source)
Copy contiguous 2D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy2D.
This function copies the memory wrapped by a HostMemory2D or DeviceMemory2D object to the memory wrapped by another HostMemory2D or DeviceMemory2D object. The kind of memory copy to perform is inferred from the types of the memory objects at compile time.
See the cudaMemcpy2D function documentation to learn more about the details of the memory copying procedure.
Template Parameters
    DstPitchedMemory2D    Type of the destination pitched memory object (HostMemory2D or DeviceMemory2D).
    SrcPitchedMemory2D    Type of the source pitched memory object (HostMemory2D or DeviceMemory2D).
Parameters
    destination    Destination memory object.
    source         Source memory object.
Exceptions
    Static assertion error if one of the host/device to host/device enum values could not be inferred.
    Assertion error if the copying procedure is not successful.
template <typename DstPitchedMemory3D, typename SrcPitchedMemory3D>
void bnmf_algs::cuda::copy3D (DstPitchedMemory3D &destination, const SrcPitchedMemory3D &source)
Copy contiguous 3D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy3D.
This function copies the memory wrapped by a HostMemory3D or DeviceMemory3D object to the memory wrapped by another HostMemory3D or DeviceMemory3D object. The kind of memory copy to perform is inferred from the types of the memory objects at compile time.
See the cudaMemcpy3D function documentation to learn more about the details of the memory copying procedure.
Template Parameters
    DstPitchedMemory3D    Type of the destination pitched memory object (HostMemory3D or DeviceMemory3D).
    SrcPitchedMemory3D    Type of the source pitched memory object (HostMemory3D or DeviceMemory3D).
Parameters
    destination    Destination memory object.
    source         Source memory object.
Exceptions
    Static assertion error if one of the host/device to host/device enum values could not be inferred.
    Assertion error if the copying procedure is not successful.
template <typename DstMemory, typename SrcMemory, template <typename> class HostMemoryBase, template <typename> class DeviceMemoryBase>
constexpr cudaMemcpyKind bnmf_algs::cuda::infer_copy_kind ()
Infer the value of cudaMemcpyKind enum to be used with CUDA copying functions from the types of the memory objects passed to copy1D, copy2D, copy3D.
Template Parameters
    DstMemory           Destination memory type passed to copyXD.
    SrcMemory           Source memory type passed to copyXD.
    HostMemoryBase      A template template type representing the base of the host memories that are passed to a copyXD function. For copy1D, this should be HostMemory1D; for copy2D it should be HostMemory2D, and so on.
    DeviceMemoryBase    A template template type representing the base of the device memories that are passed to a copyXD function. For copy1D, this should be DeviceMemory1D; for copy2D it should be DeviceMemory2D, and so on.
Returns
    The inferred value of the cudaMemcpyKind enum.
template <typename T>
void bnmf_algs::cuda::tensor_sums (const DeviceMemory3D<T> &tensor, std::array<DeviceMemory2D<T>, 3> &result_arr)

Sum the given 3D input tensor along each of its axes and return all 2D sum tensors.
This function computes the sum of the given 3D input tensor along each dimension by performing the computation on the GPU using CUDA. Summing a 3D \(x \times y \times z\) tensor \(S\) along its first axis computes a new 2D tensor \(M\) of shape \(y \times z\) where
\[ M_{jk} = \sum_i S_{ijk} \]
Summing along the other axes is defined similarly. The \(i^{th}\) entry of the output array contains the result of summing the input tensor \(S\) along its \((i + 1)^{th}\) axis.
The given 3D tensor must be previously copied to the GPU. Additionally, GPU memory for all sum tensors must be already allocated. cuda::DeviceMemory3D and cuda::DeviceMemory2D objects provide simple APIs for these tasks.
Template Parameters
    T    Type of the entries of the input tensor.
Parameters
    tensor        Input tensor to sum along each of its axes.
    result_arr    An array of sum tensors \((M_{y \times z}, M_{x \times z}, M_{x \times y})\).