I am trying to compute A^T A using cuSPARSE. A is a large but sparse matrix. Based on the documentation, the proper function to use is cusparseDcsrgemm2. However, this is one of the few cuSPARSE operations that does not support an optional built-in transpose of the input matrix. There's a line in the documentation that says:
Only the NN version is supported. For other modes, the user has to
transpose A or B explicitly.
The problem is that I couldn't find a function in cuSPARSE that performs a transpose. I know I can transpose the matrix on the CPU and copy the result to the GPU, but that would slow the application down. Am I missing something? What is the right way to use cuSPARSE to compute A^T A?
For matrices that are in CSR (or CSC) format:
The CSR representation of a sparse matrix has exactly the same format and memory layout as the CSC representation of its transpose.
Therefore, if we use the cuSPARSE-provided function to convert a CSR-format matrix into CSC format, the resulting CSC-format matrix is actually the CSR representation of the transpose of the original matrix. This CSR-to-CSC conversion routine can therefore be used to find the transpose of a CSR-format sparse matrix (and it can similarly be used to find the transpose of a CSC-format sparse matrix).
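As a rough sketch of the idea (assuming the legacy cusparseDcsr2csc routine, available in the same cuSPARSE versions that provide cusparseDcsrgemm2; all the d_* names are placeholders for device arrays you have already allocated and filled):

#include <cusparse.h>

// A is m x n with nnz nonzeros, stored on the device in CSR arrays
// d_csrVal / d_csrRowPtr / d_csrColInd.  After the conversion below,
// d_cscVal / d_cscColPtr / d_cscRowInd can be read as the CSR arrays of
// the n x m matrix A^T (d_cscColPtr acts as the row pointers and
// d_cscRowInd as the column indices of A^T).
void transpose_csr(cusparseHandle_t handle, int m, int n, int nnz,
                   const double *d_csrVal, const int *d_csrRowPtr, const int *d_csrColInd,
                   double *d_cscVal, int *d_cscRowInd, int *d_cscColPtr)
{
    cusparseDcsr2csc(handle, m, n, nnz,
                     d_csrVal, d_csrRowPtr, d_csrColInd,
                     d_cscVal, d_cscRowInd, d_cscColPtr,
                     CUSPARSE_ACTION_NUMERIC, CUSPARSE_INDEX_BASE_ZERO);
}

The converted arrays can then be handed to cusparseDcsrgemm2 as the CSR description of A^T, so the supported NN mode computes A^T * A entirely on the GPU.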
How can two column vectors be written as an analytic convolution so that the discrete FFT may be used? MATLAB syntax is used below.
Consider:
a set of vectors which, when sorted into a step function, appear as any of the following:
[1,1,1,1,0,0,0,0], or [1,1,1,1,1,0,0,0], or [1,1,1,1,1,1,0,0]
(the location at which the function "steps up" varies over the members of this set)
The other vector is a random one, vec=[1,0,1,0,1,1,1,0]; obviously both contain only 0s and 1s.
Is it possible to write these vectors as an analytic convolution? I would like the 1st, 2nd, 3rd, 4th... entries of the convolution to have values of:
sum(vec.*[1,0,0,0,0,0,0,0])
sum(vec.*[1,1,0,0,0,0,0,0])
sum(vec.*[1,1,1,0,0,0,0,0])
sum(vec.*[1,1,1,1,0,0,0,0])
...
sum(vec.*[1,1,1,1,1,1,1,1])
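(In other words, the k-th value I want is the partial sum vec(1)+...+vec(k), i.e. the k-th entry of cumsum(vec), which is also the k-th entry of the convolution of vec with a length-N vector of ones.)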
For speed, I am trying to avoid the use of a for-loop. I cannot simply vectorize this, because that would require terabytes of RAM (I work with vectors that are not of length 8, but of length nearly a million).
The convolution theorem gives the function R resulting from the convolution of the functions L and 1/w, in terms of the Fourier transform F and its inverse F^-1, as

R(w) = (L * 1/w)(w) = integral dw' L(w') / (w - w') = F^-1[ F(L) . F(1/w) ]
Clearly, the function 1/(w-w') appearing in the convolution is from 1/w under F; it's as if you just set w'=0. But if I use analogous reasoning with my [1,1,1,1,0,0,0,0], I get either [1,1,1,1,1,1,1,1] (the identity under .* in MATLAB) or [0,0,0,0,0,0,0,0] (a very boring result).
What is the mistake in reasoning I've made?
Suppose I have a real 2D matrix A of size MxN. Using the FFTW3 r2c transform I take the matrix into Fourier space, where B is the complex array B = fft(A) of size Mx(N/2+1).
I know that B has Hermitian redundancy, so I perform some operations (left-right and up-down flips and complex conjugates) to reconstruct the redundant half and obtain the full complex matrix B'.
Now I perform some operations on the full complex matrix B' (such that it is no longer Hermitian-symmetric) and want to take the inverse transform using c2r. How do I do this, since the c2r transform expects a symmetric half matrix?
Since B' is no longer Hermitian-symmetric, its inverse transform is not real, so you cannot use c2r meaningfully on this matrix. Use the regular complex-to-complex inverse transform instead.
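As a minimal sketch of what that looks like with FFTW (the sizes and names here are placeholders; B' is assumed to be stored as a full M x N row-major complex array):

#include <fftw3.h>

int main(void)
{
    int M = 256, N = 256;                                             // example sizes
    fftw_complex *Bp  = fftw_malloc(sizeof(fftw_complex) * M * N);    // modified full spectrum
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * M * N);    // complex result

    /* ... fill Bp with the (no longer Hermitian-symmetric) spectrum ... */

    fftw_plan p = fftw_plan_dft_2d(M, N, Bp, out, FFTW_BACKWARD, FFTW_ESTIMATE);
    fftw_execute(p);

    // FFTW transforms are unnormalized: divide every element of out by M*N.
    // In general out now has a nonzero imaginary part, which is exactly why
    // the c2r transform cannot be used here.

    fftw_destroy_plan(p);
    fftw_free(Bp);
    fftw_free(out);
    return 0;
}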
Recently I have been learning DM-Script for TEM image processing.
I needed a Gaussian blur and found a script called 'Gaussian Blur' at http://www.dmscripting.com/recent_updates.html
This code implements the Gaussian blur by multiplying the fast Fourier transform (FFT) of the source image by the FFT of a Gaussian-kernel image and finally taking the inverse Fourier transform of the product.
Here is the relevant part of the code:
// Carry out the convolution in Fourier space
compleximage fftkernelimg:=realFFT(kernelimg)    // FFT of the Gaussian-kernel image
compleximage FFTSource:=realfft(warpimg)         // FFT of the source image
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
realimage invFFT:=realIFFT(FFTProduct)
The point I want to ask about is this:
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
Why does the FFT of the Gaussian kernel need '.modulus().sqrt()' for the convolution?
Is it related to the fact that the Fourier transform of a Gaussian function is another Gaussian function?
Or is it related to some limitation of the discrete Fourier transform?
Please answer me
Thanks
This is related to the general precision limitations of any floating-point numerical computation.
A rotationally symmetric (real-valued) Gaussian of standard deviation sigma should transform into a purely real-valued, rotationally symmetric Gaussian of standard deviation 1/sigma. However, doing this numerically will show you deviations. Just try the following:
number sigma = 30
number A0 = 1
// Build a rotationally symmetric, real-valued Gaussian of standard deviation sigma
realimage first := RealImage( "First", 8, 256, 256 )
first = A0 * exp( - (iradius**2/(2*sigma*sigma) ))
first.showimage()
// Its Fourier transform should again be a purely real Gaussian (of standard deviation 1/sigma)...
complexImage second := FFT(first)
second.Showimage()
// ...but numerically the imaginary part is not exactly zero everywhere
image nonZeroImaginaryMask = ( 0 != second.Imaginary() )
nonZeroImaginaryMask.Showimage()
nonZeroImaginaryMask.SetLimits(0,1)
When you then multiply these complex images (before transforming back) you introduce even more errors. By using the modulus, one ensures that the forward-transformed kernel is purely real and hence a better "damping" curve.
A better implementation of an FFT filtering code would actually create the FFT(Gaussian) directly as a Gaussian with a standard deviation of 1/sigma, as this is the analytically correct result. Doing an FFT of the kernel only makes sense if the kernel (or its FFT) is not analytically known.
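As a language-agnostic illustration of that idea (plain C rather than DM-script; the sizes, names and the cycles-per-pixel frequency convention are my assumptions), the frequency-space Gaussian of a normalized blur of real-space standard deviation sigma can simply be evaluated analytically and then multiplied onto the image FFT:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int    N     = 256;   // image size (square, for simplicity)
    const double sigma = 30.0;  // real-space standard deviation in pixels
    const double PI    = 3.14159265358979323846;

    double *H = malloc(sizeof(double) * N * N);   // frequency-domain Gaussian ("damping" curve)

    for (int y = 0; y < N; ++y) {
        for (int x = 0; x < N; ++x) {
            // Map array index to signed frequency in cycles per pixel
            double fx = (x < N / 2 ? x : x - N) / (double)N;
            double fy = (y < N / 2 ? y : y - N) / (double)N;
            // Analytic transform of a unit-sum Gaussian: exp(-2*pi^2*sigma^2*f^2)
            H[y * N + x] = exp(-2.0 * PI * PI * sigma * sigma * (fx * fx + fy * fy));
        }
    }

    printf("DC gain = %g (exactly 1 by construction)\n", H[0]);

    // Multiplying the image FFT by H and inverse-transforming gives the blur
    // without ever taking a numerical FFT of the kernel itself.
    free(H);
    return 0;
}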
In general: when implementing any "maths" in program code, it can pay hugely to think it through with numerical computation limits in the back of your head. Reduce actual computation whenever possible (i.e. compute analytically and use the result instead of relying on brute-force numerical computation), and try to "reshape" equations where possible: for example, avoid large sums over many small numbers, be careful about checks against exact numeric values, and try to avoid expressions that are very sensitive to small numerical errors.
I want to take the inverse Fourier transform of a signal. I use this:
fftw_plan_dft_c2r_1d
However, the output vector is required to be double. The question is: doesn't the IFFT of a signal give a complex-valued result?
The definition of the Discrete Fourier Transform (DFT) allows the time-domain input to be a complex-valued signal and produces a frequency-domain result which is, in general, also complex-valued. Correspondingly, the inverse transform of that complex-valued frequency-domain result reproduces the original time-domain signal (at least to within the implementation's numerical precision). The result of the inverse transform can thus be complex.
Just the same, if we restrict the time-domain inputs of the forward transform to real-valued signals, then we would similarly expect the inverse transform of the corresponding frequency-domain spectrum to give us back the same original real-valued signal.
Note that the forward transform of such a real-valued signal produces a frequency-domain result that exhibits Hermitian symmetry (and conversely, if a frequency-domain signal has Hermitian symmetry, it must be the transform of a real-valued signal). fftw_plan_dft_c2r_1d then computes the inverse transform (which is real-valued and stored as double) under the assumption that the frequency-domain spectrum indeed has Hermitian symmetry.
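As a minimal round-trip sketch (array sizes and contents are placeholders): the forward r2c transform stores only the non-redundant half of the Hermitian spectrum, and fftw_plan_dft_c2r_1d takes such a half-spectrum back to a real-valued double array.

#include <fftw3.h>

int main(void)
{
    const int n = 8;
    double       *x = fftw_malloc(sizeof(double) * n);               // real input signal
    fftw_complex *X = fftw_malloc(sizeof(fftw_complex) * (n/2 + 1)); // Hermitian half-spectrum
    double       *y = fftw_malloc(sizeof(double) * n);               // real output of c2r

    fftw_plan fwd = fftw_plan_dft_r2c_1d(n, x, X, FFTW_ESTIMATE);
    fftw_plan inv = fftw_plan_dft_c2r_1d(n, X, y, FFTW_ESTIMATE);

    for (int i = 0; i < n; ++i) x[i] = i;   // some real-valued data

    fftw_execute(fwd);   // X holds the n/2+1 non-redundant complex bins
    fftw_execute(inv);   // y is real; it equals n*x[i] because FFTW's transforms
                         // are unnormalized (the c2r step also overwrites X by default)

    fftw_destroy_plan(fwd);
    fftw_destroy_plan(inv);
    fftw_free(x); fftw_free(X); fftw_free(y);
    return 0;
}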