What is the best function to obtain a minimum-norm least-squares solution of a linear problem like Ax = b in Octave, when A is very large but sparse?
x = A\b gives the error:
"SparseQR: sparse matrix QR factorization filled", which I don't understand.
In Octave, one can do an element-wise multiplication between a full matrix and a compatible (broadcastable) vector (i.e. MxN .* 1xN or MxN .* Mx1). But this does not seem to work for sparse matrices.
Consider the following example:
v = (1:5)';
s = spdiags(v,0,5,5); % simple sparse matrix
s .* v; % <--- error 'nonconformant arguments (op1 is 5x5, op2 is 5x1)'
full(s) .* v; % <--- works but defeats sparse matrix
In the above simple case, with a diagonal sparse matrix, converting to a full matrix can be avoided by converting v to a diagonal matrix, i.e.
s * diag(v); % <--- returns desired result
diag(v) * s; % <--- also returns the desired result
but for other cases, i.e. a non-diagonal sparse matrix, it gets unnecessarily complicated by the operand order.
Is there a trick for doing broadcastable operations with sparse matrices? Or is this a bug or a feature (i.e. necessary)?
Recently I have been learning DM_Script for TEM image processing.
I needed a Gaussian blur and found a script called 'Gaussian Blur' at http://www.dmscripting.com/recent_updates.html
This code implements a Gaussian blur by multiplying the fast Fourier transform (FFT) of the source image by the FFT of a Gaussian-kernel image and finally taking the inverse Fourier transform of the product.
Here is the relevant part of the code:
// Carry out the convolution in Fourier space
compleximage fftkernelimg:=realFFT(kernelimg)   // FFT of the Gaussian-kernel image
compleximage FFTSource:=realfft(warpimg)        // FFT of the source image
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
realimage invFFT:=realIFFT(FFTProduct)
The point I want to ask about is this line:
compleximage FFTProduct:=FFTSource*fftkernelimg.modulus().sqrt()
Why does the FFT of the Gaussian kernel need '.modulus().sqrt()' for the convolution?
Is it related to the fact that the Fourier transform of a Gaussian function is another Gaussian function?
Or is it related to some limitation of the discrete Fourier transform?
Please answer me. Thanks.
This is related to the general precision limitations of any floating-point numerical computation (see e.g. here, or in more depth here).
A rotationally symmetric (real-valued) Gaussian of standard deviation sigma should transform into a purely real, rotationally symmetric Gaussian of standard deviation 1/sigma. However, doing this numerically will show you deviations; just try the following:
number sigma = 30
number A0 = 1
realimage first := RealImage( "First", 8, 256, 256 )
first = A0 * exp( - (iradius**2/(2*sigma*sigma) ))
first.showimage()
complexImage second := FFT(first)
second.Showimage()
image nonZeroImaginaryMask = ( 0 != second.Imaginary() )
nonZeroImaginaryMask.Showimage()
nonZeroImaginaryMask.SetLimits(0,1)
When you then multiply these complex images (before transforming back), you introduce even more errors. By taking the modulus, one ensures that the forward-transformed kernel is purely real and hence acts as a better "damping" curve.
A better implementation of an FFT filtering code would actually create the FFT(Gaussian) directly, with a standard deviation of 1/sigma, as this is the analytically correct result. Taking an FFT of the kernel only makes sense if the kernel (or its FFT) is not analytically known.
In general: when implementing any "maths" in program code, it can pay hugely to think it through with numerical-precision limits in the back of your head. Reduce actual computation whenever possible (i.e. compute analytically and use the result instead of relying on brute-force numerical computation) and try to "reshape" equations where possible: avoid large sums over many small numbers, be careful about comparisons against exact numeric values, and avoid expressions that are very sensitive to small numerical errors, etc.
I have run a 3D Fourier Transform using FFTW (fftw_plan_dft_r2c_3d) and I would like to sum up the (log of the) values of the transform at every frequency, including the repeated frequencies that aren't actually stored in the output array (I understand the size is Nx x Ny x (Nz/2 + 1)). How do I do this without double counting?
Great question. Sorry if my answer is a little long-winded, I want to make sure I don't make any mistakes. Here goes:
The sum-of-log-magnitudes of a complex-to-complex 3D FFT will be equal to the sum-of-log-magnitudes of a real-to-complex 3D FFT if you double-count every 'slice' (along the last dimension) of the latter whose conjugate counterpart is not stored in its output.
If Nz is even, that means double-count all slices other than the first and last slices.
If Nz is odd, double-count all slices except the first.
(This is because an even-length real-to-complex DFT includes the -π radians angular frequency (corresponding to a phasor of -1), whereas an odd-length one stops short of it. I never remember this pattern, so I always draw the N=4 vs N=3 phasors around the unit circle to remind myself whether odd or even includes -π rad.)
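(For example, with Nz = 4 the real-to-complex output stores the angular frequencies 0, π/2 and π, and only the π/2 slice has an unstored conjugate partner at -π/2; with Nz = 3 it stores 0 and 2π/3, and only the 2π/3 slice has a partner at -2π/3.)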
Here’s an experimental verification of the idea using Numpy/Python, whose notation for real-to-complex FFT I believe matches FFTW’s: generate an Nx = 10 by Ny = 20 by Nz = 8 real array. Compute its complex-to-complex 3D FFT (yielding an Nx by Ny by Nz complex array) and its real-to-complex 3D FFT (yielding Nx by Ny by (Nz/2+1) complex array). Verify that the sum-of-log-magnitudes of the former is the same as the sum-of-log-magnitudes of the latter if you double-count all but the first & last slices, since Nz is even.
The code:
import numpy as np
import numpy.fft as fft
Nx = 10
Ny = 20
Nz = 8
x = np.random.randn(Nx, Ny, Nz)
Xf = fft.fftn(x)
Xfr = fft.rfftn(x)
energyProduct1 = np.log10(np.abs(Xf)).sum()
lastSlice = -1 if Nz % 2 == 0 else None
energyProduct2 = np.log10(np.abs(np.dstack((Xfr, Xfr[:, :, 1:lastSlice])))).sum()
print('Difference: %g' % (energyProduct1 - energyProduct2))
# Difference: -4.54747e-13
If you re-run this with odd Nz, you will see that the difference between the complex-to-complex and the real-to-complex remains within machine precision of 0.
That np.dstack((Xfr, Xfr[:, :, 1:lastSlice])) (docs for dstack, fft.rfftn) stacks the rfftn output with its 2nd to penultimate slices in the 3rd dimension (penultimate because Nz is even, and you don't want to double-count the 0 or -π DFT bins).
Of course, another way to do this is to compute the sum-of-log-magnitudes over the real-to-complex array, double it, then subtract the sum-of-log-magnitudes over the first slice and (if Nz is even) the last slice.
tl;dr Sum the log-magnitudes over the real-to-complex output. Double it. Subtract from this result the sum-log-magnitudes of the very first slice (in the 3rd dimension). If Nz is odd, you’re done. If Nz is even, also subtract the sum-log-magnitudes of the very last slice.
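If you would rather do this directly on the FFTW output in C, here is a minimal sketch of that same recipe (the 10x20x8 sizes are just the example above, the row-major indexing of the r2c output follows the FFTW manual; treat it as an illustration, not drop-in code):
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <fftw3.h>

int main(void)
{
    const int Nx = 10, Ny = 20, Nz = 8;
    const int Nzh = Nz / 2 + 1;                      /* slices stored along z */

    double *in = fftw_malloc(sizeof(double) * Nx * Ny * Nz);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * Nx * Ny * Nzh);
    fftw_plan p = fftw_plan_dft_r2c_3d(Nx, Ny, Nz, in, out, FFTW_ESTIMATE);

    for (int i = 0; i < Nx * Ny * Nz; ++i)           /* arbitrary real test data */
        in[i] = (double)rand() / RAND_MAX - 0.5;
    fftw_execute(p);

    double total = 0.0, first = 0.0, last = 0.0;
    for (int i = 0; i < Nx; ++i)
        for (int j = 0; j < Ny; ++j)
            for (int k = 0; k < Nzh; ++k) {
                double re = out[(i * Ny + j) * Nzh + k][0];
                double im = out[(i * Ny + j) * Nzh + k][1];
                double lm = log10(hypot(re, im));
                total += lm;
                if (k == 0)       first += lm;       /* zero-frequency slice     */
                if (k == Nzh - 1) last += lm;        /* Nyquist slice (Nz even)  */
            }

    double sum = 2.0 * total - first;                /* double, then un-count DC */
    if (Nz % 2 == 0) sum -= last;                    /* ...and Nyquist if Nz even */
    printf("sum of log10 magnitudes over all Nx*Ny*Nz bins: %g\n", sum);

    fftw_destroy_plan(p);
    fftw_free(in); fftw_free(out);
    return 0;
}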
I'm a beginner with CUDA. I want to compute the SVD of a row-major matrix using the cuSOLVER API, but I'm confused about the leading dimension of the matrix A.
I have a 100x10 row-major matrix (i.e. 100 data points in a 10-dimensional space).
According to the CUDA documentation, the cusolverDnDgesvd function needs an lda parameter (the leading dimension of matrix A). My matrix is row-major, so I passed 10 to the gesvd function, but it did not work: the function indicated that my lda parameter was wrong.
OK, so I passed 100 instead. The function ran, but its results (U, S, Vt) seem to be wrong; I can't reconstruct the matrix A from U*S*Vt.
As far as I know, the cuSOLVER API assumes all matrices are column-major.
If I treat my matrix as column-major, it becomes 10x100, so m is smaller than n; but the gesvd function only works for m >= n.
Yes, I'm in trouble. How can I solve this problem?
Row-major, column-major and leading dimension are concepts related to storage: a matrix can be stored in either scheme while representing the same mathematical matrix.
To get the correct result, you could use cublasDgeam() to change your row-major 100x10 matrix into a column-major 100x10 matrix before calling cuSOLVER. This is equivalent to a matrix transpose while keeping the storage order.
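As a rough sketch (plain CUDA host C, error checking omitted; the 100x10 shape is just the example from the question), the conversion step could look like this:
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int rows = 100, cols = 10;                 /* the 100 x 10 data matrix */
    double *h_A = malloc(sizeof(double) * rows * cols);
    for (int i = 0; i < rows * cols; ++i) h_A[i] = (double)i;   /* row-major fill */

    double *d_rowmajor, *d_colmajor;
    cudaMalloc((void **)&d_rowmajor, sizeof(double) * rows * cols);
    cudaMalloc((void **)&d_colmajor, sizeof(double) * rows * cols);
    cudaMemcpy(d_rowmajor, h_A, sizeof(double) * rows * cols, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    /* Seen column-major, the row-major buffer is a cols x rows matrix (lda = cols).
       One out-of-place transpose turns it into the rows x cols column-major copy
       (ldc = rows) that cuSOLVER expects. */
    const double one = 1.0, zero = 0.0;
    cublasDgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                rows, cols,
                &one,  d_rowmajor, cols,
                &zero, d_rowmajor, rows,
                d_colmajor, rows);

    /* d_colmajor can now be passed to cusolverDnDgesvd with m = 100, n = 10, lda = 100. */

    cublasDestroy(handle);
    cudaFree(d_rowmajor); cudaFree(d_colmajor);
    free(h_A);
    return 0;
}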
There are many sources talking about storage ordering,
https://en.wikipedia.org/wiki/Row-major_order
https://fgiesen.wordpress.com/2012/02/12/row-major-vs-column-major-row-vectors-vs-column-vectors/
https://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html
Confusion between C++ and OpenGL matrix order (row-major vs column-major)
as well as leading dimension
http://www.ibm.com/support/knowledgecenter/SSFHY8_5.3.0/com.ibm.cluster.essl.v5r3.essl100.doc/am5gr_leaddi.htm
You should google them.
To interpolate a signal in the frequency domain, one can pad it with zeros in the time domain and do an FFT.
Suppose the number of elements in a given vector X is N, and Y is the same as X but padded on one side with N zeros. Then the following two expressions give the same result:
$$\hat{x}(k)=\sum_{n=0}^{2N-1} Y(n)e^{i2\pi k n/2N},\quad k=0,...,2N-1,$$
$$\hat{x}(k)=\sum_{n=0}^{ N-1} X(n)e^{i2\pi k n/2N},\quad k=0,...,2N-1.$$
Now if we use the FFTW package, the first equation needs 2N memory locations for the input vector while the second one needs only N (I do not know if the latter is even possible with the existing FFTW package)! Also, the computational cost drops from roughly 2N*log(2N) to 2N*log(N). The problem gets worse for 2D or 3D FFTs. Is it possible to do the second approach using the FFTW package? This is fairly easy to do in MATLAB, though.
If x is a 2N-point signal that is zero for n >= N, its DFT writes:
$$\hat{x}(k)=\sum_{n=0}^{N-1} x(n)e^{i2\pi k n/2N},\quad k=0,...,2N-1.$$
If k is even, k = 2k':
$$\hat{x}(2k')=\sum_{n=0}^{N-1} x(n)e^{i2\pi k' n/N},\quad k'=0,...,N-1.$$
Hence, the coefficients of even frequencies arise from the N-point discrete Fourier transform of x(n).
If k is odd, k = 2k'+1:
$$\hat{x}(2k'+1)=\sum_{n=0}^{N-1} \left[x(n)e^{i\pi n/N}\right]e^{i2\pi k' n/N},\quad k'=0,...,N-1.$$
Hence, the coefficients of odd frequencies arise from the N-point discrete Fourier transform of x(n)exp(i*M_PI*n/N).
Thus, the discrete Fourier transform of a zero-padded 2N signal reduces to two DFTs of signals of length N, and FFTW can be used to compute them.
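As a minimal C sketch of this idea with FFTW (note that FFTW, like most FFT libraries, uses the e^{-i...} sign convention, so the modulation for the odd bins is exp(-i*pi*n/N); with the e^{+i...} convention of the formulas above, the sign flips):
#include <complex.h>     /* include before fftw3.h so fftw_complex == double complex */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <fftw3.h>

int main(void)
{
    const int N = 16;
    fftw_complex *x    = fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *xmod = fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *Xe   = fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *Xo   = fftw_malloc(sizeof(fftw_complex) * N);

    fftw_plan pe = fftw_plan_dft_1d(N, x,    Xe, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_plan po = fftw_plan_dft_1d(N, xmod, Xo, FFTW_FORWARD, FFTW_ESTIMATE);

    for (int n = 0; n < N; ++n) {
        x[n]    = (double)rand() / RAND_MAX;          /* arbitrary length-N input    */
        xmod[n] = x[n] * cexp(-I * M_PI * n / N);     /* modulated copy for odd bins */
    }
    fftw_execute(pe);                                 /* Xe[k'] = X_hat(2k')         */
    fftw_execute(po);                                 /* Xo[k'] = X_hat(2k'+1)       */

    /* The 2N-point DFT of the zero-padded signal is the interleaving of Xe and Xo. */
    for (int k = 0; k < N; ++k)
        printf("X[%2d] = %+.4f%+.4fi    X[%2d] = %+.4f%+.4fi\n",
               2 * k,     creal(Xe[k]), cimag(Xe[k]),
               2 * k + 1, creal(Xo[k]), cimag(Xo[k]));

    fftw_destroy_plan(pe); fftw_destroy_plan(po);
    fftw_free(x); fftw_free(xmod); fftw_free(Xe); fftw_free(Xo);
    return 0;
}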
The overall computation time will be 2*c*N*ln(N), where c is a constant; it is expected to be faster than the direct computation of the DFT, which costs c*2*N*ln(2*N). But remember that ln(2*N) = ln(2) + ln(N): as N gets large, the extra work of the direct computation is negligible compared to ln(N), so the trick becomes nearly useless, even if the dimension is larger than one. It does not change the asymptotic complexity.
Moreover, FFTW is really efficient, using many features of your PC if it is correctly installed, and it will be hard to do better in any case, even with the presented trick. Finally, if the input signal is real, you may use fftw_plan_dft_r2c_2d: only about half of the coefficients in Fourier space are computed and stored.
Regarding memory requirements, if you are really short of memory, you can use the FFTW_IN_PLACE flag, i.e. use the same array for input and output; it is slightly slower, though.
The procedure presented above can be extended to compute the DFT of an LN-point signal, i.e. an N-point signal padded with (L-1)N zeros: it reduces to the computation of L DFTs of length N.
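Explicitly, with the same convention as above and writing k = Lk' + r with r = 0,...,L-1:
$$\hat{x}(Lk'+r)=\sum_{n=0}^{N-1} \left[x(n)\,e^{i2\pi r n/LN}\right]e^{i2\pi k' n/N},\quad k'=0,...,N-1,$$
i.e. the r-th set of output bins is the N-point DFT of x(n)exp(i*2*M_PI*r*n/(L*N)).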
Do you have any reference showing how MATLAB handles and optimizes the DFT of padded signals compared to FFTW?
EDIT: Further notes on the 3D case.
The 3D DFT of the padded 3D signal x(n,m,p) is:
$$\hat{x}(k_n,k_m,k_p)=\sum_{n=0}^{N-1}\sum_{m=0}^{N-1}\sum_{p=0}^{N-1} x(n,m,p)\,e^{i2\pi (k_n n+k_m m+k_p p)/2N}.$$
If k_n, k_m and k_p are even (k_n = 2k_n', etc.):
$$\hat{x}(2k_n',2k_m',2k_p')=\sum_{n,m,p=0}^{N-1} x(n,m,p)\,e^{i2\pi (k_n' n+k_m' m+k_p' p)/N},$$
i.e. the NxNxN DFT of x(n,m,p).
If k_n and k_m are even and k_p is odd (k_p = 2k_p'+1):
$$\hat{x}(2k_n',2k_m',2k_p'+1)=\sum_{n,m,p=0}^{N-1} \left[x(n,m,p)\,e^{i\pi p/N}\right]e^{i2\pi (k_n' n+k_m' m+k_p' p)/N},$$
i.e. the NxNxN DFT of x(n,m,p)exp(i*M_PI*p/N).
...There are 8 such cases in total.
So, the computation of the 3D DFT of an NxNxN signal x padded to 2Nx2Nx2N reduces to the computation of 8 3D DFTs of size NxNxN. Since a 3D DFT is a combination of three passes of 1D DFTs, the total number of length-N DFTs is 3x8xNxN, while the direct computation requires 3x(2N)x(2N) DFTs of length 2N. The computational time is 24cN^3 ln(N) against 24cN^3 ln(2N): a small gain is possible... again, FFTW is fast...
Yet, instead of using a black-box 3D FFT, let's compute the 8 size-N DFTs at once, by performing the 1D DFTs in each direction:
1D DFT along n: 2 cases, N x N DFTs each => 2cN^3 ln(N)
1D DFT along m: 2 cases, 2N x N DFTs each => 4cN^3 ln(N)
1D DFT along p: 2 cases, 2N x 2N DFTs each => 8cN^3 ln(N)
Hence, the total computation time is expected to be 14cN^3 ln(N) against 24cN^3 ln(2N): a small gain is possible... again, FFTW is fast...
Moreover, the computation of the modulation factors exp(i*M_PI*n/N) requires only a single call to exp: first compute w = exp(I*M_PI/N), then update wn = wn*w and x(n) = x(n)*wn inside the loop, or use pow if precision becomes an issue.
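As a small C illustration of that recurrence (a sketch; the helper name is made up):
#include <complex.h>
#include <math.h>

/* Sketch: apply x(n) *= exp(i*pi*n/N) with a single call to exp,
   as described above; switch to cpow(w, n) if round-off accumulates. */
static void modulate(double complex *x, int N)
{
    double complex w  = cexp(I * M_PI / N);   /* exp(i*pi/N)                */
    double complex wn = 1.0;                  /* exp(i*pi*n/N), n = 0 here  */
    for (int n = 0; n < N; ++n) {
        x[n] *= wn;
        wn   *= w;
    }
}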