The cuda namespace contains functions that operate on NVIDIA GPUs using CUDA routines.
template <typename DstMemory, typename SrcMemory, template <typename> class HostMemoryBase, template <typename> class DeviceMemoryBase>
constexpr cudaMemcpyKind infer_copy_kind ()
    Infer the value of the cudaMemcpyKind enum to be used with CUDA copying functions from the types of the memory objects passed to copy1D, copy2D, copy3D.

template <typename DstMemory1D, typename SrcMemory1D>
void copy1D (DstMemory1D &destination, const SrcMemory1D &source)
    Copy contiguous 1D memory from host/device memory to host/device memory using the CUDA function cudaMemcpy.

template <typename DstPitchedMemory2D, typename SrcPitchedMemory2D>
void copy2D (DstPitchedMemory2D &destination, const SrcPitchedMemory2D &source)
    Copy contiguous 2D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy2D.

template <typename DstPitchedMemory3D, typename SrcPitchedMemory3D>
void copy3D (DstPitchedMemory3D &destination, const SrcPitchedMemory3D &source)
    Copy contiguous 3D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy3D.

template <typename T>
void tensor_sums (const DeviceMemory3D<T> &tensor, std::array<DeviceMemory2D<T>, 3> &result_arr)
    Sum the given 3D input tensor along each of its axes and return all 2D sum tensors.

template <typename Integer>
void init (Integer device)
    Initialize the CUDA runtime.

template <typename Integer>
Integer idiv_ceil (Integer a, Integer b)
    Return the ceiling of the integer division between the given parameters.
template <typename DstMemory1D, typename SrcMemory1D>
void bnmf_algs::cuda::copy1D (DstMemory1D &destination, const SrcMemory1D &source)
Copy contiguous 1D memory from host/device memory to host/device memory using the CUDA function cudaMemcpy.
This function copies the memory wrapped by a HostMemory1D or DeviceMemory1D object to the memory wrapped by another HostMemory1D or DeviceMemory1D object. The kind of memory copy to perform is inferred from the types of the memory objects at compile time.
See the cudaMemcpy function documentation to learn more about the details of the memory copying procedure.
Template Parameters
    DstMemory1D    Type of the destination memory object (HostMemory1D or DeviceMemory1D).
    SrcMemory1D    Type of the source memory object (HostMemory1D or DeviceMemory1D).
Parameters
    destination    Destination memory object.
    source         Source memory object.
Exceptions
    Static assertion error if one of the host/device to host/device enum values could not be inferred.
    Assertion error if the copying procedure is not successful.
template <typename DstPitchedMemory2D, typename SrcPitchedMemory2D>
void bnmf_algs::cuda::copy2D (DstPitchedMemory2D &destination, const SrcPitchedMemory2D &source)
Copy contiguous 2D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy2D.
This function copies the memory wrapped by a HostMemory2D or DeviceMemory2D object to the memory wrapped by another HostMemory2D or DeviceMemory2D object. The kind of memory copy to perform is inferred from the types of the memory objects at compile time.
See the cudaMemcpy2D function documentation to learn more about the details of the memory copying procedure.
Template Parameters
    DstPitchedMemory2D    Type of the destination pitched memory object (HostMemory2D or DeviceMemory2D).
    SrcPitchedMemory2D    Type of the source pitched memory object (HostMemory2D or DeviceMemory2D).
Parameters
    destination    Destination memory object.
    source         Source memory object.
Exceptions
    Static assertion error if one of the host/device to host/device enum values could not be inferred.
    Assertion error if the copying procedure is not successful.
template <typename DstPitchedMemory3D, typename SrcPitchedMemory3D>
void bnmf_algs::cuda::copy3D (DstPitchedMemory3D &destination, const SrcPitchedMemory3D &source)
Copy contiguous 3D pitched memory from host/device memory to host/device memory using the CUDA function cudaMemcpy3D.
This function copies the memory wrapped by a HostMemory3D or DeviceMemory3D object to the memory wrapped by another HostMemory3D or DeviceMemory3D object. The kind of memory copy to perform is inferred from the types of the memory objects at compile time.
See the cudaMemcpy3D function documentation to learn more about the details of the memory copying procedure.
Template Parameters
    DstPitchedMemory3D    Type of the destination pitched memory object (HostMemory3D or DeviceMemory3D).
    SrcPitchedMemory3D    Type of the source pitched memory object (HostMemory3D or DeviceMemory3D).
Parameters
    destination    Destination memory object.
    source         Source memory object.
Exceptions
    Static assertion error if one of the host/device to host/device enum values could not be inferred.
    Assertion error if the copying procedure is not successful.
template <typename DstMemory, typename SrcMemory, template <typename> class HostMemoryBase, template <typename> class DeviceMemoryBase>
constexpr cudaMemcpyKind bnmf_algs::cuda::infer_copy_kind ()
Infer the value of cudaMemcpyKind enum to be used with CUDA copying functions from the types of the memory objects passed to copy1D, copy2D, copy3D.
Template Parameters
    DstMemory           Destination memory type passed to copyXD.
    SrcMemory           Source memory type passed to copyXD.
    HostMemoryBase      A template template type representing the base of the host memories that are passed to a copyXD function. For copy1D, this should be HostMemory1D; for copy2D it should be HostMemory2D, and so on.
    DeviceMemoryBase    A template template type representing the base of the device memories that are passed to a copyXD function. For copy1D, this should be DeviceMemory1D; for copy2D it should be DeviceMemory2D, and so on.
Returns
    The inferred value of the cudaMemcpyKind enum.
template <typename T>
void bnmf_algs::cuda::tensor_sums (const DeviceMemory3D<T> &tensor, std::array<DeviceMemory2D<T>, 3> &result_arr)

Sum the given 3D input tensor along each of its axes and return all 2D sum tensors.
This function computes the sum of the given 3D input tensor along each dimension by performing the computation on the GPU using CUDA. Summing a 3D \(x \times y \times z\) tensor \(S\) along its first axis computes a new 2D tensor \(M\) of shape \(y \times z\) where
\[ M_{jk} = \sum_i S_{ijk} \]
Summing along the other axes is defined similarly. The \(i^{th}\) entry of the output array contains the result of summing the input tensor \(S\) along its \((i + 1)^{th}\) axis.
The given 3D tensor must be previously copied to the GPU. Additionally, GPU memory for all sum tensors must be already allocated. cuda::DeviceMemory3D and cuda::DeviceMemory2D objects provide simple APIs for these tasks.
Template Parameters
    T    Type of the entries of the input tensor.
Parameters
    tensor        Input tensor to sum along each of its axes.
    result_arr    An array of sum tensors \((M_{y \times z}, M_{x \times z}, M_{x \times y})\).