Using ARPACK with PARDISO

This is a question similar to one asked here.
I wonder: is there already an open-source implementation or example of an ARPACK eigensolver that works well with the PARDISO solver and the Intel MKL library?

I've already implemented this myself, in C#.
The idea is that one must convert the matrix to CSR format. Then MKL can be used both for the linear solves (via the PARDISO solver) and for the matrix-vector products that ARPACK requests.
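For illustration, the same structure can be sketched in Python with SciPy, whose eigsh routine wraps ARPACK. Here SciPy's SuperLU factorization stands in for PARDISO (a real MKL/PARDISO build would do its factorize-and-solve at the same two points); this is only a sketch of the pattern, not the C# implementation described above.

import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Sample symmetric matrix in CSR format.
n = 1000
A = sp.random(n, n, density=1e-3, format="csr", random_state=0)
A = (A + A.T) * 0.5 + 10.0 * sp.identity(n, format="csr")

# Shift-invert: factor (A - sigma*I) once; every ARPACK iteration
# then only needs a solve with that factorization. This is exactly
# the role PARDISO would play; SuperLU stands in for it here.
sigma = 0.0
lu = spla.splu((A - sigma * sp.identity(n)).tocsc())
OPinv = spla.LinearOperator(A.shape, matvec=lu.solve)

# eigsh drives ARPACK and returns the eigenvalues closest to sigma.
vals = spla.eigsh(A, k=6, sigma=sigma, OPinv=OPinv, return_eigenvectors=False)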

Related

Is it possible to use HYB or ELL sparse matrix multiplication (SpMV) in cuSPARSE 11?

I am updating my CUDA code that performs sparse matrix-vector multiplication (SpMV). I found that the HYB and ELL sparse-matrix functions were removed in cuSPARSE 11. But for my practical problem, HYB-format SpMV runs better than CSR. Is there any way to keep using the HYB format with cuSPARSE 11, perhaps by including some other library? Or must I write these kernels myself?
I know this is not a specific code issue, but I really need some advice.
Is it possible to use HYB or ELL sparse matrix multiplication (SpMV) in cuSPARSE 11?
No, it is not possible. Those formats were deprecated in CUDA 10.x and are no longer supported.
Reformat your matrix storage to use a supported format. If you believe there is a performance issue, file a bug report with a demonstrator.
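To sketch what the suggested migration could look like from Python (assuming CuPy is an acceptable dependency; this example is mine, not from the answer): cupyx.scipy.sparse stores matrices in CSR and dispatches the product to cuSPARSE's generic SpMV path.

import cupy as cp
import cupyx.scipy.sparse as cpsp

# A small 3x3 CSR matrix on the GPU (indptr, indices, data),
# e.g. the result of converting a formerly HYB-formatted matrix.
indptr = cp.array([0, 2, 3, 4], dtype=cp.int32)
indices = cp.array([0, 2, 1, 2], dtype=cp.int32)
data = cp.array([1.0, 2.0, 3.0, 4.0], dtype=cp.float64)
A = cpsp.csr_matrix((data, indices, indptr), shape=(3, 3))

x = cp.ones(3, dtype=cp.float64)
y = A @ x  # SpMV, executed by cuSPARSE under the hood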

Check if GPU is shared

When the GPU is shared with other processes (e.g. Xorg or other CUDA processes), a CUDA process should preferably not consume all remaining memory, but grow its usage dynamically instead.
(There are various errors you might get indirectly from this, like Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR. But this question is not about that.)
(In TensorFlow, you would use allow_growth=True in the GPU options to accomplish this. But this question is not about that.)
Is there a simple way to check if the GPU is currently used by other processes? (I'm not asking whether it is configured to be used for exclusive access.)
I could parse the output of nvidia-smi and look for other processes, but that seems somewhat hacky, maybe not so reliable, and not simple enough.
(My software is using TensorFlow, so if TensorFlow provides such a function, nice. But if not, I don't care whether this would be a C API or a Python function. I would prefer to avoid external dependencies beyond those I'm using anyway, like CUDA itself or TensorFlow. I'm not afraid to use ctypes. So consider this question language-agnostic.)
There are nvmlDeviceGetComputeRunningProcesses and nvmlDeviceGetGraphicsRunningProcesses (see the NVML documentation).
This is a C API, but I could use pynvml if I don't care about the extra dependency.
Example usage:
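Here is a minimal sketch with pynvml (the details, such as excluding your own PID to decide "shared", are my assumption, not from the original answer):

import os
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Processes with a compute (CUDA) context and processes with a
# graphics context (e.g. Xorg) on this device.
compute = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
graphics = pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)

# Exclude our own process; anything left means the GPU is shared.
others = [p for p in compute + graphics if p.pid != os.getpid()]
for p in others:
    print("pid %d uses %s bytes" % (p.pid, p.usedGpuMemory))

gpu_is_shared = len(others) > 0
pynvml.nvmlShutdown()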

Does cuSolverDN or another CUDA library have a batched version of QR decomposition for dense matrices to solve A*x = b?

I'm trying to solve A*x = b, where A is dense and has complex values.
I used the cusolverDnCgeqrf() routine from the cuSolverDN library to do the QR decomposition for one linear system. However, I want to do this many times to speed up the processing.
Is there a "batched" version of this routine? Or is there another CUDA library I can use?
You can use Magma's batched QR:
http://icl.cs.utk.edu/projectsfiles/magma/doxygen/group__group__qr__batched.html#details
Or Nvidia's batched routines:
https://devblogs.nvidia.com/parallelforall/parallel-direct-solvers-with-cusolver-batched-qr/
I am not sure whether there are Python wrappers for them yet.
I want to add that batched versions of many solvers are currently available, either through Magma or from Nvidia.
There is no single standard yet, but one is underway; it is being discussed in the Batched BLAS workshops:
here
http://www.netlib.org/utk/people/JackDongarra/WEB-PAGES/Batched-BLAS-2017/
and here:
http://www.netlib.org/utk/people/JackDongarra/WEB-PAGES/Batched-BLAS-2016/
The draft is ready, and I hope there will be a standard Batched BLAS soon.
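As an aside (my illustration, not from the answers above): the batched pattern itself is easy to try out in NumPy, whose linalg routines broadcast over leading batch dimensions. Note this uses an LU-based solve rather than QR, and runs on the CPU; CuPy mirrors much of the same interface on the GPU.

import numpy as np

batch, m = 64, 32
rng = np.random.default_rng(0)

# A stack of complex dense systems: A is (batch, m, m), b is (batch, m, 1).
A = rng.standard_normal((batch, m, m)) + 1j * rng.standard_normal((batch, m, m))
b = rng.standard_normal((batch, m, 1)) + 1j * rng.standard_normal((batch, m, 1))

# solve() broadcasts over the leading dimension: one call, `batch` systems.
x = np.linalg.solve(A, b)

assert x.shape == (batch, m, 1)
assert np.allclose(A @ x, b)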

Converting cudaFuncSetCacheConfig to OpenCL

I am trying to convert a CUDA benchmark to an OpenCL benchmark.
But I am currently stuck converting cudaFuncSetCacheConfig to OpenCL.
As far as I can find, there is no function in OpenCL directly equivalent to cudaFuncSetCacheConfig.
So I want to ask: is there an alternative way to get the effect of that function in OpenCL?
It is not possible to use CUDA API calls to affect an OpenCL context, or vice versa. Therefore, unless a specific NVIDIA hardware feature is exposed via a vendor-specific extension (and in this case it apparently is not), there is no way to do this.

What is the best way to implement STFT (Short-time Fourier transform) in Julia

So, I'm wondering how to implement an STFT in Julia, possibly using a Hamming window. I can't find anything on the internet.
What is the best way to do it? I'd prefer not to use Python libraries, but pure native Julia if possible. Maybe it's a feature still being developed in Julia...?
Thanks!
I'm not aware of any native STFT implementation in Julia. As David stated in his comment, you will have to implement this yourself. This is fairly straightforward:
1) Break up your signal into short (possibly overlapping) time sections.
2) Apply your window of choice (in your case, Hamming) to each section.
3) Take the FFT of each windowed section using Julia's fft function.
All of the above are fairly standard mathematical operations, and you will find plenty of references online. The code below generates a Hamming window if you don't have one already (watch out for Julia's 1-based indexing when using online references, as a lot of signal processing material uses 0-based indexing when describing window generation).
Wb = Vector{Float64}(undef, N)  # `Array(Float64, N)` in older Julia versions
## Hamming window generation (periodic form)
for n in 1:N
    Wb[n] = 0.54 - 0.46 * cos(2pi * (n - 1) / N)
end
## One STFT frame is then a windowed section followed by an FFT, e.g.
##   X = fft(signal[start:start+N-1] .* Wb)
## (in recent Julia, `using FFTW` provides fft)
You could also use the Hamming window function from DSP.jl.
P.S. If you are running Julia on OS X, check out the Julia interface to Apple's Accelerate framework. It provides a very fast Hamming window implementation, as well as convolution and elementwise multiplication functions that might be helpful.