CUSP function to generate a matrix with random values - cuda

I was wondering whether the CUSP library provides a function that creates a matrix with a specified number of rows and columns, filled with random values.
I found the poisson5pt function, but it doesn't return a matrix with the dimensions I specify!
Thanks in advance

In the CUSP matrix gallery you will find random.h, which almost does what you want:
template <class MatrixType>
void random(size_t num_rows, size_t num_cols, size_t num_samples, MatrixType& output)
This will produce a matrix of the dimensions you specify, with the number of random locations you request filled with 1.
It would be trivial to modify that to use random values rather than unity, although I don't understand why you would ever want such a matrix: it is not guaranteed to have any of the properties you would probably need in a test matrix, if you plan to use it in any linear algebra operations.
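For instance, a minimal sketch of that modification, assuming the signature quoted above (some CUSP versions take the matrix as the first argument instead) and a host COO matrix, whose values array is directly indexable:

#include <cusp/coo_matrix.h>
#include <cusp/gallery/random.h>
#include <cstdlib>

int main(void)
{
    // build a 100 x 50 host matrix with 500 nonzeros at random locations,
    // all initially set to 1 by the gallery routine
    cusp::coo_matrix<int, float, cusp::host_memory> A;
    cusp::gallery::random(100, 50, 500, A);

    // overwrite the unit entries with random values in [0, 1)
    for (size_t i = 0; i < A.num_entries; i++)
        A.values[i] = rand() / float(RAND_MAX);

    return 0;
}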

Related

How to transpose a sparse matrix in cuSparse?

I am trying to compute A^T A using cuSparse, where A is a large but sparse matrix. Based on the documentation, the proper function to use is cusparseDcsrgemm2. However, this is one of the few cuSparse operations that doesn't support an optional built-in transpose of the input matrix. There's a line in the documentation that says:
Only the NN version is supported. For other modes, the user has to transpose A or B explicitly.
The problem is that I couldn't find a function in cuSparse that performs a transpose. I know I can transpose on the CPU and copy the result to the GPU, but that would slow down the application. Am I missing something? What is the right way to use cuSparse to compute A^T A?
For matrices that are in CSR (or CSC) format:
The CSR sparse representation of a matrix has a memory layout identical to that of the CSC sparse representation of its transpose.
Therefore, if we use the cuSparse-provided function to convert a CSR-format matrix into CSC format, the resulting CSC-format matrix is identical to the CSR representation of the transpose of the original matrix. This CSR-to-CSC conversion routine can therefore be used to find the transpose of a CSR-format sparse matrix. (It can similarly be used to find the transpose of a CSC-format sparse matrix.)
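As an illustration, a sketch using the legacy cusparseDcsr2csc conversion routine, with hypothetical buffer names (recent CUDA toolkits replace this routine with cusparseCsr2cscEx2):

#include <cusparse.h>

// A is m x n with nnz nonzeros, stored in CSR on the device. The csc*
// output buffers (preallocated: nnz values, nnz row indices, n+1 column
// pointers) end up holding the CSR representation of A^T (an n x m matrix).
void transposeCsr(cusparseHandle_t handle, int m, int n, int nnz,
                  const double *csrVal, const int *csrRowPtr, const int *csrColInd,
                  double *cscVal, int *cscRowInd, int *cscColPtr)
{
    cusparseDcsr2csc(handle, m, n, nnz,
                     csrVal, csrRowPtr, csrColInd,
                     cscVal, cscRowInd, cscColPtr,
                     CUSPARSE_ACTION_NUMERIC,      // copy the values, not just the pattern
                     CUSPARSE_INDEX_BASE_ZERO);
}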

Is it possible to get pointers to the real and imaginary parts of a cuDoubleComplex?

Consider a cuDoubleComplex array a in device memory. Is it possible to get pointers to the real and imaginary parts of a without allocating and doing a deep copy into two new double arrays?
Something like this:
real_a = //points to real part of a
imag_a = //points to imaginary part of a
instead of something like:
/*allocate real_a and imag_a here */
for (int j = 0; j < numElements; j++){
    real_a[j] = a[j].x;
    imag_a[j] = a[j].y;
}
CUDA does have something like this for individual numbers (cuCreal and cuCimag), but not for arrays/pointers.
The reason is that I would like to be able to call the cuBLAS D (real double) rather than Z (double complex) functions on the real and imaginary parts separately. For example,
cublasDgemm(...,real_a,...,somearray,...,anotherarray,...)
Is it possible to get pointers to the real and imaginary parts of a without allocating and doing a deep copy into two new double arrays?
That can be done:
double* real_a = reinterpret_cast<double*>(&a[0].x); //points to real part of a
double* imag_a = reinterpret_cast<double*>(&a[0].y); //points to imaginary part of a
but note that you need to use a stride of 2 when accessing the pointers to get the correct real or imaginary elements.
The reason is that I would like to be able to call the cuBLAS D (real double) rather than Z (double complex) functions on the real and imaginary parts separately.
This will work with BLAS functions which operate on your real or imaginary pointers as vectors, because those BLAS routines allow a stride to be passed (which must be two in this case).
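For instance, a minimal sketch summing the absolute values of the real parts with cublasDasum (a and numElements are assumed to come from the surrounding code):

#include <cublas_v2.h>
#include <cuComplex.h>

double sumAbsRealParts(cublasHandle_t handle, cuDoubleComplex *a, int numElements)
{
    double *real_a = reinterpret_cast<double*>(&a[0].x); // stride-2 view of the real parts
    double result = 0.0;
    // incx = 2 walks only the .x (real) members of the complex array
    cublasDasum(handle, numElements, real_a, 2, &result);
    return result;
}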
For example,
cublasDgemm(...,real_a,...,somearray,...,anotherarray,...)
That won't work with the pointers you can directly obtain as shown here. BLAS functions which treat the array as a matrix do support strided source and destination data, but that stride (the leading dimension) is applied between the starts of consecutive columns of the flattened matrix, not to the elements within a column, which is what you would need to make this work correctly.
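If you do end up needing a gemm-compatible layout, one option (which does allocate, but keeps the copy on the GPU) is to gather the strided view into a contiguous array with cublasDcopy; a sketch with hypothetical names:

#include <cublas_v2.h>
#include <cuComplex.h>

// Gather the real parts of a into a dense double array usable by cublasDgemm.
// d_dense must be a preallocated device array of numElements doubles.
void gatherRealParts(cublasHandle_t handle, cuDoubleComplex *a,
                     double *d_dense, int numElements)
{
    double *real_a = reinterpret_cast<double*>(&a[0].x);
    cublasDcopy(handle, numElements, real_a, 2, d_dense, 1); // stride 2 -> stride 1
}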

How can I solve an SVD for a row-major matrix using the cusolver gesvd function?

I'm a beginner with CUDA. I want to try to solve the SVD of a row-major matrix using the cuSolver API, but I'm confused about the leading dimension of matrix A.
I have a row-major 100x10 matrix (i.e., 100 data points in a 10-dimensional space).
According to the CUDA documentation, the cusolverDnDgesvd function needs an lda parameter (the leading dimension of matrix A). My matrix is row-major, so I passed 10 to the gesvd function. But the function did not work; it indicated that my lda parameter was wrong.
OK, so I passed 100 instead. The function ran, but its results (U, S, Vt) seem to be wrong: I can't recover the matrix A from U S Vt.
To my knowledge, the cuSolver API assumes all matrices are column-major.
If I reinterpret my matrix as column-major, m is smaller than n (10x100), but gesvd only works for m >= n.
So yes, I'm in trouble. How can I solve this problem?
Row-major, column-major, and leading dimension are concepts related to storage; a matrix can be stored in either scheme while representing the same mathematical matrix.
To get the correct result, you can use cublasDgeam() to change your row-major 100x10 matrix into a column-major 100x10 matrix before calling cuSolver. This is equivalent to a matrix transpose while keeping the storage order.
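A minimal sketch of that conversion, with hypothetical pointer names (the in-place form with B == C and beta = 0 used here is one of geam's documented modes):

#include <cublas_v2.h>

// Convert a row-major m x n matrix on the device into column-major storage.
// Row-major m x n data is bit-identical to column-major n x m data, so a
// cublasDgeam transpose re-lays it out as column-major m x n.
void rowMajorToColMajor(cublasHandle_t handle, int m, int n,
                        const double *d_rowMajor, double *d_colMajor)
{
    const double one = 1.0, zero = 0.0;
    cublasDgeam(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                m, n,                   // dimensions of the result
                &one,  d_rowMajor, n,   // input viewed as n x m column-major
                &zero, d_colMajor, m,   // beta = 0, so B's contents are ignored
                d_colMajor, m);         // result: m x n column-major, ldc = m
    // for the 100x10 case: rowMajorToColMajor(handle, 100, 10, ...),
    // then call cusolverDnDgesvd with lda = 100
}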
There are many sources talking about storage ordering:
https://en.wikipedia.org/wiki/Row-major_order
https://fgiesen.wordpress.com/2012/02/12/row-major-vs-column-major-row-vectors-vs-column-vectors/
https://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html
Confusion between C++ and OpenGL matrix order (row-major vs column-major)
as well as the leading dimension:
http://www.ibm.com/support/knowledgecenter/SSFHY8_5.3.0/com.ibm.cluster.essl.v5r3.essl100.doc/am5gr_leaddi.htm
You should google them.

How do I calculate variance of gpu_array?

I am trying to compute the variance of a 2D gpu_array. A reduction kernel sounds like a good idea:
http://documen.tician.de/pycuda/array.html
However, this documentation implies that reduction kernels just reduce two arrays into one array. How do I reduce a single 2D array into a single value?
I guess the first step is to define variance for this case. In MATLAB, the variance function applied to a 2D array returns a vector (1D array) of values. But it sounds like you want a single-valued variance, so, as others have already suggested, the first thing to do is probably to treat the 2D array as 1D. In C we don't need any special steps to accomplish this: if you have a pointer to the array, you can index into it as if it were a 1D array. I'm assuming you don't need help handling a 2D array with a 1D index.
Now, if it's the 1D variance you're after, I'm assuming a function like variance(x) = sum((x[i] - mean(x))^2) / N, where the sum is over all i and N is the number of elements, is what you want (based on my read of the Wikipedia article). We can break this down into 3 steps:
compute the mean (this is a classical reduction - one value is produced for the data set - sum all elements then divide by the number of elements)
compute the value (x[i]-mean)^2 for all i - this is an element by element operation producing an output data set equal in size (number of elements) to the input data set
compute the sum of the elements produced in step 2 and divide by N - this is another classical reduction, as one value is produced for the entire data set.
Both steps 1 and 3 are classical reductions which sum all elements of an array. Rather than cover that ground here, I'll point you to Mark Harris' excellent treatment of the topic as well as the corresponding CUDA sample code. For step 2, I'll bet you could figure out the kernel code on your own, but it would look something like this:
#include <math.h>

__global__ void var(float *input, float *output, unsigned N, float mean)
{
    unsigned idx = threadIdx.x + (blockDim.x * blockIdx.x);
    if (idx < N)
        output[idx] = __powf(input[idx] - mean, 2);
}
Note that you will probably want to combine the reductions and the above code into a single kernel.
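For completeness, a minimal (unoptimized) sketch of the classical sum reduction used in steps 1 and 3 might look like this. It assumes a power-of-two block size, a shared-memory allocation of blockDim.x * sizeof(float) per block, and a second launch (or a host loop) to sum the per-block partial sums; see Mark Harris' slides for the optimized variants:

__global__ void sumReduce(const float *input, float *partialSums, unsigned N)
{
    extern __shared__ float sdata[];
    unsigned tid = threadIdx.x;
    unsigned idx = threadIdx.x + (blockDim.x * blockIdx.x);
    sdata[tid] = (idx < N) ? input[idx] : 0.0f;   // pad the tail with zeros
    __syncthreads();
    // tree reduction in shared memory (blockDim.x must be a power of 2)
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        partialSums[blockIdx.x] = sdata[0];       // one partial sum per block
}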

Apply PCA on very large sparse matrix

I am doing a text classification task with R, and I obtain a document-term matrix of size 22490 by 120,000 (only 4 million non-zero entries, i.e., less than 1% of entries). Now I want to reduce the dimensionality by utilizing PCA (Principal Component Analysis). Unfortunately, R cannot handle this huge matrix, so I store the sparse matrix in a file in the Matrix Market format, hoping to use some other technique to do PCA.
So could anyone give me some hints for useful libraries (in whatever programming language) that can do PCA with this large-scale matrix with ease, or tell me how to do a longhand PCA myself; in other words, calculate the covariance matrix first, and then calculate its eigenvalues and eigenvectors.
What I want is to calculate all PCs (120,000) and choose only the top N PCs that account for 90% of the variance. Obviously, in this case, I have to set a threshold a priori to zero out some very tiny variance values (in the covariance matrix); otherwise, the covariance matrix will not be sparse, and its size will be 120,000 by 120,000, which is impossible to handle on a single machine. Also, the loadings (eigenvectors) will be extremely large and should be stored in sparse format.
Thanks very much for any help!
Note: I am using a machine with 24GB RAM and 8 CPU cores.
The Python toolkit scikit-learn has a few PCA variants, of which RandomizedPCA can handle sparse matrices in any of the formats supported by scipy.sparse. scipy.io.mmread should be able to parse the Matrix Market format (I've never tried it, though).
Disclaimer: I'm on the scikit-learn development team.
EDIT: sparse matrix support in RandomizedPCA was deprecated in scikit-learn 0.14; TruncatedSVD should be used in its stead. See the documentation for details.
Instead of running PCA, you could try Latent Dirichlet Allocation (LDA), which decomposes the document-word matrix into a document-topic matrix and a topic-word matrix. Here is a link to an R implementation: http://cran.r-project.org/web/packages/lda/ - there are quite a few implementations out there if you google around, though.
With LDA you need to specify a fixed number of topics (similar to principal components) in advance. A potentially better alternative is HDP-LDA (http://www.gatsby.ucl.ac.uk/~ywteh/research/npbayes/npbayes-r21.tgz), which learns the number of topics that form a good representation of your corpus.
If you can fit your dataset in memory (which it seems you can), then you shouldn't have a problem running the LDA code either.
As a number of people on the scicomp forum pointed out, there should be no need to compute all 120k principal components. Algorithms like power iteration (http://en.wikipedia.org/wiki/Power_iteration) calculate the largest eigenvalues of a matrix, and LDA algorithms will converge to a minimum-description-length representation of the data given the number of topics specified.
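As a rough illustration of the idea (not how you would handle sparse data in practice), a power-iteration sketch for the dominant eigenpair of a dense symmetric positive semi-definite matrix might look like this; a real implementation would use a sparse matrix-vector product and typically a Lanczos-type method:

#include <cmath>
#include <vector>

// Estimate the dominant eigenvalue/eigenvector of a symmetric PSD n x n
// matrix A (dense, row-major, for simplicity) by repeated multiplication.
double powerIteration(const std::vector<double> &A, int n,
                      std::vector<double> &v, int iters)
{
    std::vector<double> w(n);
    for (int i = 0; i < n; i++) v[i] = 1.0 / std::sqrt((double)n); // initial guess
    double lambda = 0.0;
    for (int it = 0; it < iters; it++) {
        for (int i = 0; i < n; i++) {                  // w = A * v
            double s = 0.0;
            for (int j = 0; j < n; j++) s += A[i * n + j] * v[j];
            w[i] = s;
        }
        double norm = 0.0;                             // since A is PSD, ||w|| -> lambda_max
        for (int i = 0; i < n; i++) norm += w[i] * w[i];
        lambda = std::sqrt(norm);
        for (int i = 0; i < n; i++) v[i] = w[i] / lambda; // normalize for the next pass
    }
    return lambda;
}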
In R, big.PCA from the bigpca package (http://cran.r-project.org/web/packages/bigpca/bigpca.pdf) does the job.
I resolved almost the same problem (a text classification task) using a technique for PCA of sparse matrices. This technique can handle very large sparse matrices. The results show that such a simple PCA outperforms word2vec, which suggests that the simple PCA outperforms LDA.
I suppose you won't be able to compute all the principal components, but you can still obtain a reduced-dimension version of your dataset matrix. I've implemented a simple routine in MATLAB which can easily be replicated in Python.
Compute the covariance matrix of your input dataset and convert it to a dense matrix. Assuming S is your 120,000 x 22490 input sparse matrix, this would look like:
Smul = full(S.'*S);     % dense Gram matrix S'S (22490 x 22490)
Sm   = full(mean(S));   % row vector of column means (1 x 22490)
Sm2  = 120000*Sm.'*Sm;  % outer product of the means, scaled by the row count
Scov = Smul - Sm2;      % (unnormalized) covariance matrix
Apply the eigs function to the covariance matrix to obtain its first N dominant eigenvectors:
[V,D] = eigs(Scov,N);
and obtain the PCs by projecting the zero-centered matrix onto the eigenvectors:
Sr=(S-Sm)*V;
Sr is the reduced-dimension version of S.