Efficient diagonalization of sparse matrices using GNU Octave

Does GNU Octave have any routine (e.g. the Lanczos algorithm) for finding eigenvalues and eigenvectors of sparse matrices that would be more efficient than the default eig?
If this is not yet available in Octave, is something similar available in MATLAB or Mathematica?

This is covered in the manual: Octave's sparse-matrix support includes eigs, which uses ARPACK (Arnoldi/Lanczos iteration) to compute a few selected eigenvalues and eigenvectors of a sparse matrix, which is far cheaper than a full eig when only a few eigenpairs are needed:
https://octave.org/doc/v5.2.0/Sparse-Functions.html#Sparse-Functions
https://octave.org/doc/v5.2.0/Sparse-Linear-Algebra.html#index-eigs

Related

How to access sparse tensor core functionality in CUDA?

Tensor cores can be accessed programmatically through the WMMA interface in CUDA (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma and https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/). More recently, with the Ampere generation of cards, NVIDIA announced the ability to perform tensor operations on sparse matrices, as described here: https://developer.nvidia.com/blog/accelerating-inference-with-sparsity-using-ampere-and-tensorrt/
The format presented appears to store pairs of elements along with their positions within four-element segments (2-bit indices). However, looking at the WMMA documentation I can't find any mention of this or of how to access those special tensor core operations, and the announcement page doesn't illuminate it either, as far as I can tell.
How do I access sparse tensor core functionality in CUDA?
The blog post in your question links to the following paper:
Accelerating Sparse Deep Neural Networks https://arxiv.org/pdf/2104.08378.pdf
In Section 3.2 it says:

It is the application’s responsibility to ensure that the first operand is a matrix stored in the compressed 2:4 format. cuSPARSELt and other libraries provide APIs for compression and sparse math operations, while, starting in version 8.0, the TensorRT SDK performs these functions for 2:4 sparse weights automatically. NVIDIA libraries require that input dimensions of a sparse matrix multiplication be multiples of 16 and 32 for 16-bit (FP16/BF16) and 8b-integer formats, respectively.
Sparse tensor core operations can also be performed manually using the PTX instruction mma.sp, which is explained in Section 9.7.13.5 of the PTX documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-for-sparse-mma
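For the library route, the call sequence in cuSPARSELt looks roughly like the sketch below. It is condensed from the structure of NVIDIA's cuSPARSELt matmul sample; the helper name sparse_tensor_core_matmul and the fixed FP16/row-major shapes are my own choices for illustration, error checking is omitted, and the exact signatures (in particular cusparseLtMatmulPlanInit and the workspace handling) have changed between cuSPARSELt releases, so treat it as an outline rather than copy-paste code.

#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cusparseLt.h>

// Sketch: D = alpha*A*B + beta*C, where A (m x k, FP16, row-major) is pruned
// and compressed into the 2:4 structured-sparse format so cusparseLtMatmul
// runs on the sparse tensor cores. Signatures follow early cuSPARSELt
// releases and may differ in yours; error checking is omitted.
void sparse_tensor_core_matmul(int m, int n, int k,
                               __half* dA, __half* dB, __half* dC, __half* dD,
                               cudaStream_t stream)
{
    float    alpha = 1.0f, beta = 0.0f;
    unsigned alignment = 16;

    cusparseLtHandle_t             handle;
    cusparseLtMatDescriptor_t      matA, matB, matC;
    cusparseLtMatmulDescriptor_t   matmul;
    cusparseLtMatmulAlgSelection_t alg_sel;
    cusparseLtMatmulPlan_t         plan;

    cusparseLtInit(&handle);

    // A is declared "structured" (the 50%-sparse 2:4 pattern); B, C are dense.
    cusparseLtStructuredDescriptorInit(&handle, &matA, m, k, k, alignment,
                                       CUDA_R_16F, CUSPARSE_ORDER_ROW,
                                       CUSPARSELT_SPARSITY_50_PERCENT);
    cusparseLtDenseDescriptorInit(&handle, &matB, k, n, n, alignment,
                                  CUDA_R_16F, CUSPARSE_ORDER_ROW);
    cusparseLtDenseDescriptorInit(&handle, &matC, m, n, n, alignment,
                                  CUDA_R_16F, CUSPARSE_ORDER_ROW);

    cusparseLtMatmulDescriptorInit(&handle, &matmul,
                                   CUSPARSE_OPERATION_NON_TRANSPOSE,
                                   CUSPARSE_OPERATION_NON_TRANSPOSE,
                                   &matA, &matB, &matC, &matC,
                                   CUSPARSE_COMPUTE_16F);
    cusparseLtMatmulAlgSelectionInit(&handle, &alg_sel, &matmul,
                                     CUSPARSELT_MATMUL_ALG_DEFAULT);
    // Early releases took a workspace size here; later ones dropped it.
    cusparseLtMatmulPlanInit(&handle, &plan, &matmul, &alg_sel, 0);

    // Prune A in place to the 2:4 pattern, then compress it into values plus
    // the 2-bit position metadata described in the paper.
    cusparseLtSpMMAPrune(&handle, &matmul, dA, dA,
                         CUSPARSELT_PRUNE_SPMMA_TILE, stream);
    size_t compressed_size;
    cusparseLtSpMMACompressedSize(&handle, &plan, &compressed_size);
    __half* dA_compressed;
    cudaMalloc((void**)&dA_compressed, compressed_size);
    cusparseLtSpMMACompress(&handle, &plan, dA, dA_compressed, stream);

    // The actual sparse-tensor-core multiplication.
    cusparseLtMatmul(&handle, &plan, &alpha, dA_compressed, dB, &beta, dC, dD,
                     NULL, &stream, 1);

    cudaFree(dA_compressed);
    cusparseLtMatmulPlanDestroy(&plan);
    cusparseLtDestroy(&handle);
}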

Is it possible to use HYB or ELL sparse matrix-vector multiplication (SpMV) in cuSPARSE 11?

I am updating my CUDA code that performs sparse matrix-vector multiplication (SpMV). I found that the HYB and ELL format sparse matrix functions have been removed from cuSPARSE 11, but for my practical problem, HYB-format SpMV runs better than CSR. Is there any way to keep using the HYB format with cuSPARSE 11, for example by pulling in some other library? Or must I write these kernels myself?
I know this is not a specific code issue, but I really need some advice.
Is it possible to use HYB or ELL sparse matrix-vector multiplication (SpMV) in cuSPARSE 11?
No, it is not possible. Those formats were deprecated in CUDA 10.x and are no longer supported.
Reformat your matrix storage to use a supported format. If you believe there is a performance issue, file a bug with NVIDIA, including a demonstrator.

Eigenvalues and eigenvectors of a complex, non-symmetric matrix using CUDA

How do I use the CUDA cuSOLVER library to find the eigenvalues and eigenvectors of a dense, double-precision, complex, non-symmetric matrix?
Looking at the documentation, there are cuSOLVER routines and example code for the dense symmetric eigenproblem, using 'syevd'. I've also come across another GPU-enabled package, MAGMA, which has the relevant function (magma_zgeev).
Is it possible to find these eigenvalues/eigenvectors using plain CUDA (SDK v8), or do I need an alternative library like MAGMA?
As of the CUDA 11 release, cuSolver continues to offer only routines for obtaining the eigenvalues of symmetric matrices. There are no non-symmetric eigensolvers in cuSolver.
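So for this case an external library is needed. For illustration, here is a minimal sketch of calling MAGMA's magma_zgeev, which mirrors the LAPACK zgeev interface (host pointers, column-major; the GPU is used internally). The helper name eig_nonsymmetric is my own, error checking is omitted, and MAGMA 2.x is assumed; check the prototype shipped with your release.

#include <magma_v2.h>

// Sketch: eigenvalues w and right eigenvectors VR of a dense non-symmetric
// complex matrix A (n x n, column-major, host memory). A is overwritten.
void eig_nonsymmetric(magma_int_t n, magmaDoubleComplex* A,
                      magmaDoubleComplex* w, magmaDoubleComplex* VR)
{
    magma_init();

    magma_int_t info;
    double* rwork;                         // 2*n real workspace, as in LAPACK
    magma_dmalloc_cpu(&rwork, 2 * n);

    // Workspace query: lwork = -1 returns the optimal size in work[0].
    magmaDoubleComplex work_query;
    magma_zgeev(MagmaNoVec, MagmaVec, n, A, n, w,
                NULL, 1, VR, n, &work_query, -1, rwork, &info);

    magma_int_t lwork = (magma_int_t) MAGMA_Z_REAL(work_query);
    magmaDoubleComplex* work;
    magma_zmalloc_cpu(&work, lwork);

    // Actual decomposition: left eigenvectors skipped, right ones in VR.
    magma_zgeev(MagmaNoVec, MagmaVec, n, A, n, w,
                NULL, 1, VR, n, work, lwork, rwork, &info);
    // info == 0 indicates success.

    magma_free_cpu(work);
    magma_free_cpu(rwork);
    magma_finalize();
}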

Special CUDA Double Precision trig functions for SFU

I was wondering how I would go about using __cos(x) (and, respectively, __sin(x)) in kernel code with CUDA. The CUDA manual led me to believe such a device function exists, but when I use it the compiler just says that I cannot call a host function from the device.
However, I found that there are two sister functions, cosf(x) and __cosf(x), the latter of which runs on the SFU and is overall much faster than the original cosf(x) function. The compiler does not complain about __cosf(x), of course.
Is there a library I'm missing? Am I mistaken about this trig function?
As the SFU only supports certain single-precision operations, there are no double-precision __cos() and __sin() device functions. There are single-precision __cosf() and __sinf() device functions, as well as other functions detailed in table C-4 of the CUDA 4.2 Programming Manual.
I assume you are looking for faster alternatives to the double-precision versions of the standard math functions sin() and cos()? If sine and cosine of the same argument are needed, sincos() should be used for a significant performance boost. If the argument of sine or cosine is multiplied by π, you would want to use sinpi(), cospi(), or sincospi() instead, for even more performance. For example, sincospi() is very useful when implementing the Box-Muller algorithm for generating normally distributed random numbers. Also, check out the CUDA 5.0 preview for best possible performance (note that the preview provides alpha-release quality).
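To illustrate the sincospi() point, here is a minimal Box-Muller kernel sketch (my own illustration, not from the CUDA SDK). It assumes the pairs of uniform random numbers in (0,1] have already been generated, e.g. with cuRAND, and converts each pair into two normally distributed samples with a single sincospi() call per pair.

#include <cuda_runtime.h>

// Box-Muller transform: each pair (u1, u2) of uniforms in (0,1] becomes two
// independent standard normal samples. n is the number of pairs; u and out
// each hold 2*n doubles. sincospi(x, &s, &c) computes sin(pi*x) and
// cos(pi*x) in one call, avoiding two separate argument reductions.
__global__ void box_muller(const double* u, double* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double r = sqrt(-2.0 * log(u[2 * i]));  // radial part; needs u1 > 0
        double s, c;
        sincospi(2.0 * u[2 * i + 1], &s, &c);   // sin(2*pi*u2), cos(2*pi*u2)
        out[2 * i]     = r * c;
        out[2 * i + 1] = r * s;
    }
}

Launched as, for example, box_muller<<<(n + 255) / 256, 256>>>(d_u, d_out, n);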

How to multiply two sparse matrices using cuSPARSE?

cuSPARSE only has a function API for multiplying a sparse matrix by a dense matrix. How can I multiply two sparse matrices using cuSPARSE or any other CUDA library?
The current version of cuSPARSE (CUDA Toolkit v5.0) supports sparse matrix-sparse matrix multiplication through the cusparse<t>csrgemm functions.
These routines require compute capability 2.0 or better.
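They are used in two phases: cusparseXcsrgemmNnz first computes the row pointer and nonzero count of C = A*B, then cusparse<t>csrgemm fills in the values and column indices. Below is a minimal double-precision sketch with a helper name of my own choosing and error checking omitted; note that this legacy API was itself removed in CUDA 11, which provides cusparseSpGEMM instead.

#include <cuda_runtime.h>
#include <cusparse_v2.h>

// Sketch: C = A*B for CSR matrices A (m x k) and B (k x n), double precision.
// Input arrays are device pointers; the C arrays are allocated here and
// returned to the caller. nnzC is a host pointer (default pointer mode).
void csr_spgemm(cusparseHandle_t handle, int m, int n, int k,
                cusparseMatDescr_t descrA, int nnzA,
                const double* valA, const int* rowPtrA, const int* colIndA,
                cusparseMatDescr_t descrB, int nnzB,
                const double* valB, const int* rowPtrB, const int* colIndB,
                cusparseMatDescr_t descrC,
                double** valC, int** rowPtrC, int** colIndC, int* nnzC)
{
    cusparseOperation_t op = CUSPARSE_OPERATION_NON_TRANSPOSE;

    // Phase 1: compute C's row pointer and total number of nonzeros.
    cudaMalloc((void**)rowPtrC, (m + 1) * sizeof(int));
    cusparseXcsrgemmNnz(handle, op, op, m, n, k,
                        descrA, nnzA, rowPtrA, colIndA,
                        descrB, nnzB, rowPtrB, colIndB,
                        descrC, *rowPtrC, nnzC);

    // Phase 2: allocate and fill C's values and column indices.
    cudaMalloc((void**)colIndC, *nnzC * sizeof(int));
    cudaMalloc((void**)valC,    *nnzC * sizeof(double));
    cusparseDcsrgemm(handle, op, op, m, n, k,
                     descrA, nnzA, valA, rowPtrA, colIndA,
                     descrB, nnzB, valB, rowPtrB, colIndB,
                     descrC, *valC, *rowPtrC, *colIndC);
}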
As I commented, the CUSP library is available for matrix multiplication. From the site:
Cusp is a library for sparse linear algebra and graph computations on CUDA. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems.