How to multiply two sparse matrices using cuSPARSE? - cuda

cuSPARSE only has a function API for multiplying a sparse matrix with a dense matrix. How can I multiply two sparse matrices using cuSPARSE or any other CUDA library?

The current version of cuSPARSE (CUDA Toolkit v5.0) supports sparse matrix - sparse matrix multiplication via the cusparse<t>csrgemm functions.
This routine requires compute capability 2.0 or higher.
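A minimal sketch of the two-step call sequence (error checking omitted; it assumes the CSR arrays of A and B, the dimensions m, n, k, and the counts nnzA, nnzB are already set up on the device). Note that this legacy csrgemm API was later removed; current toolkits expose the same functionality through the generic cusparseSpGEMM API instead.

```cuda
// Sketch: C = A * B in CSR, single precision, legacy cuSPARSE API.
// A is m x k, B is k x n, so C is m x n.
cusparseHandle_t handle;
cusparseCreate(&handle);

cusparseMatDescr_t descrA, descrB, descrC;
cusparseCreateMatDescr(&descrA);
cusparseCreateMatDescr(&descrB);
cusparseCreateMatDescr(&descrC);

// Step 1: compute C's row pointer and its number of nonzeros.
int nnzC;
int *csrRowPtrC;
cudaMalloc(&csrRowPtrC, sizeof(int) * (m + 1));
cusparseXcsrgemmNnz(handle,
                    CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_NON_TRANSPOSE,
                    m, n, k,
                    descrA, nnzA, csrRowPtrA, csrColIndA,
                    descrB, nnzB, csrRowPtrB, csrColIndB,
                    descrC, csrRowPtrC, &nnzC);

// Step 2: allocate C's column indices and values, then compute them.
int *csrColIndC;
float *csrValC;
cudaMalloc(&csrColIndC, sizeof(int) * nnzC);
cudaMalloc(&csrValC, sizeof(float) * nnzC);
cusparseScsrgemm(handle,
                 CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_NON_TRANSPOSE,
                 m, n, k,
                 descrA, nnzA, csrValA, csrRowPtrA, csrColIndA,
                 descrB, nnzB, csrValB, csrRowPtrB, csrColIndB,
                 descrC, csrValC, csrRowPtrC, csrColIndC);
```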

As I commented, the CUSP library is available for matrix multiplication. From the site:
Cusp is a library for sparse linear algebra and graph computations on CUDA. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems.
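With Cusp the whole product is one call to cusp::multiply. A small sketch modeled on Cusp's examples (cusp::gallery::poisson5pt just builds a sparse test matrix so the program is self-contained):

```cuda
#include <cusp/csr_matrix.h>
#include <cusp/multiply.h>
#include <cusp/gallery/poisson.h>
#include <cusp/print.h>

int main(void)
{
    // Build a sparse test matrix on the device (5-point Poisson stencil on a 4x4 grid).
    cusp::csr_matrix<int, float, cusp::device_memory> A;
    cusp::gallery::poisson5pt(A, 4, 4);

    // Second operand: just a copy of A for this example.
    cusp::csr_matrix<int, float, cusp::device_memory> B(A);

    // C = A * B (sparse-sparse product)
    cusp::csr_matrix<int, float, cusp::device_memory> C;
    cusp::multiply(A, B, C);

    cusp::print(C);
    return 0;
}
```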

Related

How to access sparse tensor core functionality in CUDA?

Tensor cores can be programmatically accessed through the WMMA interface in CUDA (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma and https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/). More recently, with the Ampere generation of cards, NVIDIA announced the ability to perform tensor core operations on sparse matrices, as described here: https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/
The format presented appears to take in pairs of elements together with their positions within four-element segments (2-bit indices). However, looking at the WMMA documentation, I can't find any mention of this, or of how to access those special tensor core operations. This isn't illuminated by the announcement page for this functionality either, AFAICT.
How do I access sparse tensor core functionality in CUDA?
The blog post in your question links to the following paper:
Accelerating Sparse Deep Neural Networks https://arxiv.org/pdf/2104.08378.pdf
In Section 3.2 it says:
It is the application’s responsibility to ensure that the first operand is a matrix
stored in the compressed 2:4 format. cuSPARSELt and other libraries provide APIs for
compression and sparse math operations, while, starting in version 8.0, the TensorRT
SDK performs these functions for 2:4 sparse weights automatically. NVIDIA libraries
require that input dimensions of a sparse matrix multiplication be multiples of 16 and
32 for 16-bit (FP16/BF16) and 8b-integer formats, respectively.
Sparse tensor core operations can be performed manually using the PTX mma.sp instruction, which is explained in Section 9.7.13.5 of the PTX documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-for-sparse-mma

Efficient diagonalization of sparse matrices using GNU Octave

Does GNU Octave have any routine (e.g. the Lanczos algorithm) to find eigenvalues and eigenvectors of sparse matrices that is more efficient than the default eig?
If this is not yet available in Octave, is something similar available in MATLAB or Mathematica?
What about reading the manual:
https://octave.org/doc/v5.2.0/Sparse-Functions.html#Sparse-Functions
https://octave.org/doc/v5.2.0/Sparse-Linear-Algebra.html#index-eigs
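In short: yes, Octave's eigs (which wraps ARPACK's Arnoldi/Lanczos iterations) computes a few eigenpairs of a sparse matrix without densifying it. A small sketch:

```octave
% Sparse 1-D Laplacian as a test matrix, n = 1000.
n = 1000;
A = spdiags([-ones(n,1), 2*ones(n,1), -ones(n,1)], [-1 0 1], n, n);

% 6 smallest-magnitude eigenvalues and eigenvectors via ARPACK.
[V, D] = eigs(A, 6, 'sm');
```

MATLAB has the same eigs function; in Mathematica, Eigenvalues/Eigensystem on a SparseArray with a count argument uses an Arnoldi method similarly.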

cuSOLVER automatically parallelizes computation on many matrices?

I have to perform the same computation (for example, getting the eigenvalues of A1, A2, ...) on many (>10^15) matrices, so I want to use as many threads as possible.
But I couldn't find any cuBLAS or cuSOLVER code that specifies the number of threads. Does cuSOLVER automatically distribute resources and parallelize the computations if I write a for loop over cuSOLVER functions?
Or is there any cuSOLVER or cuBLAS API with which I can control the number of threads and parallelize the functions?
......
Does cuSOLVER automatically distribute resources and parallelize the computations if I write a for loop over cuSOLVER functions?
No.
Or is there any cuSOLVER or cuBLAS API with which I can control the number of threads and parallelize the functions?
No.
But, if you care to read the cuSOLVER documentation, you will see that there is a batched sparse QR factorization routine. This can be used to solve eigenvalue problems.
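A sketch of that batched path (cusolverSp's csrqr routines, which solve many sparse systems that share one sparsity pattern in a single call; it assumes m, nnzA, descrA, the batched device arrays csrValA, b, x, and batchSize are already set up, and omits error checking):

```cuda
// Sketch: solve A_i * x_i = b_i for batchSize sparse m x m systems that
// share one CSR sparsity pattern, using cuSOLVER's batched sparse QR.
cusolverSpHandle_t handle;
csrqrInfo_t info;
cusolverSpCreate(&handle);
cusolverSpCreateCsrqrInfo(&info);

// Symbolic analysis on the shared sparsity pattern (done once for the batch).
cusolverSpXcsrqrAnalysisBatched(handle, m, m, nnzA,
                                descrA, csrRowPtrA, csrColIndA, info);

// Query and allocate the working buffer for the whole batch.
size_t internalBytes, workspaceBytes;
cusolverSpDcsrqrBufferInfoBatched(handle, m, m, nnzA,
                                  descrA, csrValA, csrRowPtrA, csrColIndA,
                                  batchSize, info,
                                  &internalBytes, &workspaceBytes);
void *buffer;
cudaMalloc(&buffer, workspaceBytes);

// Numerical factorization and solve for all batchSize systems at once.
// csrValA holds batchSize copies of the values; b and x are batched likewise.
cusolverSpDcsrqrsvBatched(handle, m, m, nnzA,
                          descrA, csrValA, csrRowPtrA, csrColIndA,
                          b, x, batchSize, info, buffer);
```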

Eigenvalues and eigenvectors of a complex, non-symmetric matrix using CUDA

How do I use the CUDA cuSOLVER to find the eigenvalues and eigenvectors of a dense, (double precision) complex, non-symmetric matrix?
Looking at the documentation, there are CUDA routines and example code for computing the eigendecomposition of a dense symmetric matrix using 'syevd'. I've also come across another GPU-enabled package, MAGMA, which has the relevant function (magma_zgeev).
Is it possible to find these eigenvalues/vectors using plain CUDA (SDK v8), or do I need an alternate library like MAGMA?
As of the CUDA 11 release, cuSolver continues to offer only routines for obtaining the eigenvalues of symmetric matrices. There are no non-symmetric eigensolvers in cuSolver.
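So MAGMA is the way to go. A sketch of the magma_zgeev call, which mirrors LAPACK's zgeev (A is an n x n host array in column-major order; error checking omitted):

```cuda
// Sketch: eigenvalues and right eigenvectors of a dense complex
// non-symmetric matrix with MAGMA's zgeev.
#include <magma_v2.h>

void zgeev_example(magma_int_t n, magmaDoubleComplex *A /* n x n, host, column-major */)
{
    magma_init();

    magmaDoubleComplex *w, *VR, *work;
    double *rwork;
    magma_int_t lda = n, ldvr = n, info;

    magma_zmalloc_cpu(&w,  n);        // eigenvalues
    magma_zmalloc_cpu(&VR, n * n);    // right eigenvectors
    magma_dmalloc_cpu(&rwork, 2 * n);

    // Workspace query: lwork = -1 returns the optimal size in work[0].
    magmaDoubleComplex lwork_query;
    magma_zgeev(MagmaNoVec, MagmaVec, n, A, lda, w,
                NULL, 1, VR, ldvr, &lwork_query, -1, rwork, &info);
    magma_int_t lwork = (magma_int_t) MAGMA_Z_REAL(lwork_query);
    magma_zmalloc_cpu(&work, lwork);

    // Actual computation: eigenvalues in w, right eigenvectors in VR.
    magma_zgeev(MagmaNoVec, MagmaVec, n, A, lda, w,
                NULL, 1, VR, ldvr, work, lwork, rwork, &info);

    magma_free_cpu(w); magma_free_cpu(VR);
    magma_free_cpu(work); magma_free_cpu(rwork);
    magma_finalize();
}
```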

Cholesky factorization in cublas

I am new to CUDA programming. I want to perform Cholesky factorization for small matrices (8x8). Is there an algorithm for this using cuBLAS batched functions?
CUDA version: 6.5
Thank you
Your matrices are too small to see any benefit from the GPU.
However, CUDA 7 provides the cuSOLVER module ( http://docs.nvidia.com/cuda/cusolver/index.html ), which can do what you ask. I suggest looking into it. Also, consider staying on the CPU with the Eigen library.
Cholesky factorization in CUDA: http://www.ast.cam.ac.uk/~stg20/cuda/cholesky/
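For reference, the cuSOLVER path looks roughly like this (a sketch for a single 8x8 double-precision matrix already on the device; error checking omitted; later toolkits also added a batched variant, cusolverDn<t>potrfBatched, for many small matrices):

```cuda
// Sketch: Cholesky factorization of one 8x8 double matrix with cuSOLVER.
#include <cusolverDn.h>

void cholesky8x8(double *d_A /* 8x8, device, column-major */)
{
    const int n = 8, lda = 8;
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    // Query the workspace size required by potrf.
    int lwork = 0;
    cusolverDnDpotrf_bufferSize(handle, CUBLAS_FILL_MODE_LOWER, n, d_A, lda, &lwork);

    double *d_work;
    int *d_info;
    cudaMalloc(&d_work, sizeof(double) * lwork);
    cudaMalloc(&d_info, sizeof(int));

    // Overwrites the lower triangle of d_A with L, where A = L * L^T.
    // *d_info == 0 on success; > 0 means A is not positive definite.
    cusolverDnDpotrf(handle, CUBLAS_FILL_MODE_LOWER, n, d_A, lda, d_work, lwork, d_info);

    cudaFree(d_work);
    cudaFree(d_info);
    cusolverDnDestroy(handle);
}
```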