Eigenvalues and eigenvectors of a complex, non-symmetric matrix using CUDA

How do I use the CUDA cuSOLVER to find the eigenvalues and eigenvectors of a dense, (double precision) complex, non-symmetric matrix?
Looking at the documentation, there are cuSOLVER routines and example code for the dense symmetric case, using 'syevd'. I've come across another GPU-enabled package, MAGMA, which has the relevant function (magma_zgeev).
Is it possible to find these eigenvalues/vectors using plain CUDA (SDK v8), or do I need an alternate library like MAGMA?

As of the CUDA 11 release, cuSolver continues to offer only routines for obtaining the eigenvalues of symmetric matrices. There are no non-symmetric eigensolvers in cuSolver.
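So for the full spectrum of a non-symmetric matrix you would need MAGMA's magma_zgeev (or LAPACK's zgeev on the CPU). If you only need the dominant eigenvalue, even a plain power iteration suffices; the sketch below is a CPU-side illustration only (the helper name dominant_eigenvalue is mine, not from any library), not a substitute for a full non-symmetric eigensolver:

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Power iteration: approximates the eigenvalue of largest magnitude of an
// n x n complex matrix A (row-major). CPU-side illustration only; for the
// full spectrum of a non-symmetric matrix use zgeev / magma_zgeev.
cd dominant_eigenvalue(const std::vector<cd>& A, int n, int iters = 200) {
    std::vector<cd> v(n, cd(1.0, 0.0)), w(n);
    cd lambda = 0.0;
    for (int it = 0; it < iters; ++it) {
        // w = A * v
        for (int i = 0; i < n; ++i) {
            w[i] = 0.0;
            for (int j = 0; j < n; ++j) w[i] += A[i * n + j] * v[j];
        }
        // Rayleigh-quotient estimate: lambda = (v^H w) / (v^H v)
        cd num = 0.0, den = 0.0;
        for (int i = 0; i < n; ++i) {
            num += std::conj(v[i]) * w[i];
            den += std::conj(v[i]) * v[i];
        }
        lambda = num / den;
        // Normalize w and use it as the next iterate
        double norm = 0.0;
        for (int i = 0; i < n; ++i) norm += std::norm(w[i]);
        norm = std::sqrt(norm);
        for (int i = 0; i < n; ++i) v[i] = w[i] / norm;
    }
    return lambda;
}
```

This only converges when a single eigenvalue strictly dominates in magnitude, which is why a proper zgeev-style solver is the right tool in general.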

Related

How to access sparse tensor core functionality in CUDA?

Tensor cores can be programmatically accessed through the WMMA interface in CUDA (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma and https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/). Recently, in the Ampere generation of cards, Nvidia announced the ability to perform tensor operations on sparse matrices, as described here: https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/
The format presented appears to take pairs of elements together with their positions within four-element segments (2-bit indices). However, looking at the WMMA documentation I can't find any mention of this, or of how to access those special tensor core operations. The announcement page doesn't clarify this either, AFAICT.
How do I access sparse tensor core functionality in CUDA?
The blog post in your question links to the following paper:
Accelerating Sparse Deep Neural Networks https://arxiv.org/pdf/2104.08378.pdf
In Section 3.2 it says:
"It is the application's responsibility to ensure that the first operand is a matrix stored in the compressed 2:4 format. cuSPARSELt and other libraries provide APIs for compression and sparse math operations, while, starting in version 8.0, the TensorRT SDK performs these functions for 2:4 sparse weights automatically. NVIDIA libraries require that input dimensions of a sparse matrix multiplication be multiples of 16 and 32 for 16-bit (FP16/BF16) and 8-bit integer formats, respectively."
Sparse tensor operations can also be performed manually using the PTX mma.sp instruction, which is explained in Section 9.7.13.5 of the PTX documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-for-sparse-mma
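As a rough host-side illustration of the 2:4 idea (each group of four values keeps its two largest-magnitude entries plus 2-bit position indices), here is a sketch. Note this is a simplified packing of my own; the actual metadata layout consumed by mma.sp is defined in the PTX documentation and differs from this:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Host-side illustration (NOT the hardware layout) of 2:4 structured
// sparsity: each group of 4 values keeps its 2 largest-magnitude entries,
// plus two 2-bit indices recording their positions within the group.
struct Compressed24 {
    std::vector<float>   values;   // 2 kept values per group of 4
    std::vector<uint8_t> metadata; // low nibble: two 2-bit indices per group
};

Compressed24 compress_2to4(const std::vector<float>& dense) {
    Compressed24 out;
    for (size_t g = 0; g + 4 <= dense.size(); g += 4) {
        int idx[4] = {0, 1, 2, 3};
        // Sort positions by descending magnitude, then keep the top two
        // in ascending position order.
        std::sort(idx, idx + 4, [&](int a, int b) {
            return std::fabs(dense[g + a]) > std::fabs(dense[g + b]);
        });
        int keep0 = std::min(idx[0], idx[1]);
        int keep1 = std::max(idx[0], idx[1]);
        out.values.push_back(dense[g + keep0]);
        out.values.push_back(dense[g + keep1]);
        out.metadata.push_back(uint8_t(keep0 | (keep1 << 2)));
    }
    return out;
}
```

In practice you would let cuSPARSELt (cusparseLtSpMMACompress) or TensorRT produce the compressed operand rather than hand-rolling this.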

Efficient diagonalization of sparse matrices using GNU Octave

Does GNU Octave have any routine (e.g. Lanczos Algorithm) to find eigenvalues and eigenvectors of sparse matrices, which will be more efficient than the default eig?
If this is not yet available in Octave, is something similar available in MATLAB or Mathematica?
What about reading the manual:
https://octave.org/doc/v5.2.0/Sparse-Functions.html#Sparse-Functions
https://octave.org/doc/v5.2.0/Sparse-Linear-Algebra.html#index-eigs

Does cuSolverDN or another CUDA library have a batched-version of QR decomposition for dense matrices to solve A*x = b?

I'm trying to solve A*x = b where A has complex values and is dense.
I used the cusolverDnCgeqrf() function from the cuSolverDN library to do the QR decomposition for one linear system. However, I want to do this many times to speed up the processing.
Is there a "batched" version of this method? Or is there another CUDA library I can use?
You can use Magma batched QR:
http://icl.cs.utk.edu/projectsfiles/magma/doxygen/group__group__qr__batched.html#details
Or Nvidia's batched QR in cuSolver:
https://devblogs.nvidia.com/parallelforall/parallel-direct-solvers-with-cusolver-batched-qr/
I am not sure if there are Python wrappers for them yet.
I want to add that batched versions of many solvers are currently available, either through MAGMA or from Nvidia.
There is no single standard yet, but one is underway; it has been discussed at the Batched BLAS workshops:
http://www.netlib.org/utk/people/JackDongarra/WEB-PAGES/Batched-BLAS-2017/
http://www.netlib.org/utk/people/JackDongarra/WEB-PAGES/Batched-BLAS-2016/
The draft is ready, and I hope there will be a standard Batched BLAS soon.
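For reference, a batched QR applies the same per-matrix factorization and solve to every system in the batch. Below is a CPU sketch of that per-matrix operation for a small complex system, using modified Gram-Schmidt (the geqrf routines use Householder reflections internally, so this is an illustration of the math, not of the library's algorithm):

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Solve A x = b for a small dense complex n x n system via modified
// Gram-Schmidt QR. A is column-major: A[i + j*n] is element (i, j).
// This is the per-matrix work a batched QR routine performs in parallel.
std::vector<cd> qr_solve(std::vector<cd> A, std::vector<cd> b, int n) {
    std::vector<cd> Q = A;                 // orthonormal columns, built in place
    std::vector<cd> R(n * n, cd(0, 0));    // upper-triangular factor
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < j; ++i) {
            // R(i,j) = q_i^H a_j ;  a_j -= R(i,j) * q_i
            cd r = 0;
            for (int k = 0; k < n; ++k) r += std::conj(Q[k + i*n]) * Q[k + j*n];
            R[i + j*n] = r;
            for (int k = 0; k < n; ++k) Q[k + j*n] -= r * Q[k + i*n];
        }
        double nrm = 0;
        for (int k = 0; k < n; ++k) nrm += std::norm(Q[k + j*n]);
        nrm = std::sqrt(nrm);
        R[j + j*n] = nrm;
        for (int k = 0; k < n; ++k) Q[k + j*n] /= nrm;
    }
    // y = Q^H b
    std::vector<cd> y(n, cd(0, 0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) y[i] += std::conj(Q[k + i*n]) * b[k];
    // Back-substitute R x = y
    std::vector<cd> x(n);
    for (int i = n - 1; i >= 0; --i) {
        cd s = y[i];
        for (int j = i + 1; j < n; ++j) s -= R[i + j*n] * x[j];
        x[i] = s / R[i + i*n];
    }
    return x;
}
```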

Cholesky factorization in cublas

I am new to CUDA programming. I want to perform Cholesky factorization for small matrices (8x8). Is there any way to do it using the cuBLAS batched functions?
CUDA version: 6.5
Thank you
Your matrix sizes are too small to see any benefit from the GPU.
However, the new CUDA 7 release provides the cuSolver module ( http://docs.nvidia.com/cuda/cusolver/index.html ), which can do what you ask. I suggest looking into it. Also, consider sticking to the CPU and the Eigen library.
Cholesky factorization in CUDA: http://www.ast.cam.ac.uk/~stg20/cuda/cholesky/
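For 8x8 matrices a straightforward CPU loop is likely the fastest option once transfer overhead is counted. The sketch below is a plain unblocked reference implementation of the factorization (the potrf-style routines compute the same thing):

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Lower-triangular Cholesky factorization A = L * L^T for a small
// symmetric positive-definite matrix (row-major, n x n).
// Returns L with zeros above the diagonal.
std::vector<double> cholesky_lower(const std::vector<double>& A, int n) {
    std::vector<double> L(n * n, 0.0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j <= i; ++j) {
            double s = A[i * n + j];
            for (int k = 0; k < j; ++k) s -= L[i * n + k] * L[j * n + k];
            if (i == j) {
                if (s <= 0.0) throw std::runtime_error("matrix not SPD");
                L[i * n + i] = std::sqrt(s);
            } else {
                L[i * n + j] = s / L[j * n + j];
            }
        }
    }
    return L;
}
```

A batched GPU routine would run this same recurrence for many matrices at once, which is where the 8x8 size starts to pay off.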

How to multiply two sparse matrix using cuSparse?

cuSPARSE only has a function API for multiplying a sparse matrix with a dense matrix. How can I multiply two sparse matrices using cuSPARSE or any other CUDA library?
The current version of cuSPARSE (CUDA Toolkit v5.0) supports sparse matrix-sparse matrix multiplications using the cusparse<t>csrgemm functions.
For this routine, compute capability 2.0 or better is needed.
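For reference, csrgemm computes C = A*B with all three matrices in CSR format. A minimal CPU sketch of that operation (a Gustavson-style row-by-row multiply, here using a std::map as the sparse accumulator) looks like this:

```cpp
#include <map>
#include <vector>

// CSR sparse matrix (0-based indexing).
struct Csr {
    int rows = 0, cols = 0;
    std::vector<int>    row_ptr;  // size rows + 1
    std::vector<int>    col_idx;  // column of each stored value
    std::vector<double> vals;
};

// Reference C = A * B for two CSR matrices -- the operation that
// cusparse<t>csrgemm performs on the GPU. For each row of A, scale the
// matching rows of B and accumulate them into a sparse row of C.
Csr csr_multiply(const Csr& A, const Csr& B) {
    Csr C;
    C.rows = A.rows;
    C.cols = B.cols;
    C.row_ptr.push_back(0);
    for (int i = 0; i < A.rows; ++i) {
        std::map<int, double> acc;  // column -> accumulated value
        for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
            int k = A.col_idx[p];
            double a = A.vals[p];
            for (int q = B.row_ptr[k]; q < B.row_ptr[k + 1]; ++q)
                acc[B.col_idx[q]] += a * B.vals[q];
        }
        for (const auto& kv : acc) {
            C.col_idx.push_back(kv.first);
            C.vals.push_back(kv.second);
        }
        C.row_ptr.push_back((int)C.col_idx.size());
    }
    return C;
}
```

The GPU routine parallelizes this over rows and uses hash-based accumulators instead of an ordered map, but the result is the same.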
As I commented, the CUSP library is available for matrix multiplication. From the site:
Cusp is a library for sparse linear algebra and graph computations on CUDA. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems.