cusparse sparse-dense multiplication: 4x more work, over 100x more time - cuda

Using cusparse, I first tried a sparse-dense multiplication with the following sizes:
C1 [8692 x 8692] = A1 [8692 x 7000 sparse] x B1 [7000 x 8692]
It takes only 0.3 seconds. Then I did another one with the following sizes:
C2 [8820 x 8820] = A2 [8820 x 32000 sparse] x B2 [32000 x 8820]
The time it takes varies depending on what's in the sparse matrix, but it ranges from 30 seconds to 90 seconds. Is there anything I can do to accelerate it? I can chop the matrices in different ways if that can reduce the running time, but I'm not sure what the performance issues are here.
The sparse matrices A1 and A2 are stored in CSR format. They both have a bad sparsity pattern, but they are about equally bad. The two figures below show where the non-zero elements are in A1 and A2 respectively. The number of non-zero elements per column is fixed at 127 in both cases.

From the sparsity pattern of the matrix, you should split the matrix A1 into two parts: a matrix A11 containing roughly the first 8000 rows and A12 the remaining rows, and call csrmm twice. This way, cusparse will choose a better heuristic for the number of threads per row.
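A sketch of the row split (untested; it assumes single precision, zero-based CSR indexing, column-major dense storage, and that the handle, matrix descriptor and device arrays already exist; the names are placeholders). The only subtle point is that the second block's row pointers must be rebased so they start at zero:
#include <cuda_runtime.h>
#include <cusparse_v2.h>

__global__ void rebase_row_ptr(const int *src, int *dst, int n, int base)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i] - base;   /* shift row pointers so the block starts at 0 */
}

/* C = A * B computed as two row blocks of A: rows [0, split) and [split, m). */
void split_csrmm(cusparseHandle_t h, cusparseMatDescr_t descr,
                 int m, int n, int k, int split,
                 const float *csrVal, const int *csrRowPtr, const int *csrColInd,
                 const float *B, float *C)
{
    const float one = 1.0f, zero = 0.0f;
    int nnzTop, nnzAll;
    cudaMemcpy(&nnzTop, csrRowPtr + split, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&nnzAll, csrRowPtr + m,     sizeof(int), cudaMemcpyDeviceToHost);

    /* Top block: rows [0, split) of A, written into the top rows of C (ldc = m). */
    cusparseScsrmm(h, CUSPARSE_OPERATION_NON_TRANSPOSE, split, n, k, nnzTop,
                   &one, descr, csrVal, csrRowPtr, csrColInd, B, k, &zero, C, m);

    /* Bottom block: rebase its row pointers to zero, then multiply into C + split. */
    int rows2 = m - split;
    int *rowPtr2;
    cudaMalloc((void **)&rowPtr2, (rows2 + 1) * sizeof(int));
    rebase_row_ptr<<<(rows2 + 256) / 256, 256>>>(csrRowPtr + split, rowPtr2, rows2 + 1, nnzTop);
    cusparseScsrmm(h, CUSPARSE_OPERATION_NON_TRANSPOSE, rows2, n, k, nnzAll - nnzTop,
                   &one, descr, csrVal + nnzTop, rowPtr2, csrColInd + nnzTop,
                   B, k, &zero, C + split, m);
    cudaFree(rowPtr2);
}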
You should also consider using the new csrmm2 routine in CUSPARSE 6.0 with the transpose case for B. You would need to transpose B first (using cublas<t>geam) and compute:
C = A1 * (B')'
The transpose case is much faster because the accesses to B are then fully coalesced.
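Roughly, as a sketch (untested; single precision, zero-based CSR, column-major dense storage and CUSPARSE 6.0+ assumed; Bt is an n x k scratch buffer and the handle/descriptor/array names are placeholders):
const float one = 1.0f, zero = 0.0f;

/* Bt = B', done once with cuBLAS geam (B is k x n, Bt is n x k; beta = 0, so the
   second operand is not actually read). */
cublasSgeam(blasHandle, CUBLAS_OP_T, CUBLAS_OP_N, n, k,
            &one, B, k, &zero, Bt, n, Bt, n);

/* C = A * (Bt)' : csrmm2 with op(B) = transpose reads Bt with coalesced accesses. */
cusparseScsrmm2(sparseHandle,
                CUSPARSE_OPERATION_NON_TRANSPOSE,   /* op(A)          */
                CUSPARSE_OPERATION_TRANSPOSE,       /* op(B)          */
                m, n, k, nnz, &one, descrA,
                csrVal, csrRowPtr, csrColInd,
                Bt, n,                              /* ldb = n for Bt */
                &zero, C, m);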
Another option would be to densify A1 (using cusparse<t>csr2dense) and use cublas<t>gemm.
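A sketch of that path (untested, same assumptions and placeholder names as above; note the dense copy needs m x k floats of extra device memory):
float *Adense;                                  /* m x k dense copy of the CSR matrix */
cudaMalloc((void **)&Adense, (size_t)m * k * sizeof(float));
cusparseScsr2dense(sparseHandle, m, k, descrA,
                   csrVal, csrRowPtr, csrColInd, Adense, m);

const float one = 1.0f, zero = 0.0f;            /* C = Adense * B with cuBLAS */
cublasSgemm(blasHandle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
            &one, Adense, m, B, k, &zero, C, m);
cudaFree(Adense);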

Related

How to calculate Moran's I and GWR when given duplicated factor data

I'm trying to do the statistical analysis by using the Moran's I, but it leads me into a serious problem.
Suppose the data look like:
y    indep1     indep2     coord_x     coord_y     District
y1   indep1 1   indep2 1   coord_x 1   coord_y 1   A
y2   indep1 2   indep2 2   coord_x 1   coord_y 1   A
y3   indep1 3   indep2 3   coord_x 1   coord_y 1   A
y4   indep1 4   indep2 4   coord_x 2   coord_y 2   B
y5   indep1 5   indep2 5   coord_x 2   coord_y 2   B
Note that the data is given in the form of a *.shp file.
This data frame has 2 unique districts but 5 rows of data in total. If we want to calculate Moran's I, the spatial weights matrix W is 2 by 2.
But if I run the code
ols <- lm(y ~ indep1 + indep2, data = dataset)
we cannot calculate the Moran test:
lm.morantest(ols, w)  # w: spatial weights matrix
# returns a "different length" error
How can we solve that problem? If the total number of observations and the number of unique districts differ, how can we check the spatial autocorrelation between the districts, and how can we apply GWR (geographically weighted regression)? Any reference papers or advice would be helpful.
Thank you for your help in advance.
I tried to calculate Moran's I by making the number of dependent-variable values equal to the number of unique districts. This is possible by aggregating the sum of the dependent variable per district.
library(reshape2)
# aggregate the dependent variable per district from the shapefile's data slot
y_agg <- dcast(shp@data, district ~ ., fun.aggregate = sum, value.var = "dependent variable")
y_agg <- y_agg$.
moran.test(y_agg, W)
But I don't think this is the right way to analyze the spatial regression, since all the independent variables are ignored. How can I solve that problem? Is there any way to do this without also aggregating the independent variables of my data?
Thank you.

Efficiently joining over interval ranges in SQL

Suppose I have two tables as follows (data taken from this SO post):
Table d1:
x start end
a 1 3
b 5 11
c 19 22
d 30 39
e 7 25
Table d2:
x pos
a 2
a 3
b 3
b 12
c 20
d 52
e 10
The first row in both tables is the column header. I'd like to extract all the rows in d2 where column x matches d1's x and pos falls within (including boundary values) d1's start and end columns. That is, I'd like the result:
x pos start end
a 2 1 3
a 3 1 3
c 20 19 22
e 10 7 25
The way I've seen this done so far is:
SELECT * FROM d1 JOIN d2 USING (x) WHERE pos BETWEEN start AND end
But what is not clear to me is whether this operation is done as efficiently as it can be (i.e., optimised internally). For example, computing the entire join first is not really a scalable approach IMHO (in terms of both speed and memory).
Are there any other efficient query optimisations (e.g. using interval trees) or other algorithms that can handle ranges efficiently (again, in terms of both speed and memory) in SQL that I can make use of? It doesn't matter whether it's SQLite, PostgreSQL, MySQL, etc.
What is the most efficient way to perform this operation in SQL?
Thank you very much.
Not sure how it all works out internally, but depending on the situation I would advise playing around with a table that 'rolls out' all the values from d1 and then joining on that one. This way the query engine can pinpoint the right record 'exactly' instead of having to find a combination of boundaries that match the value being looked for.
e.g.
x value
a 1
a 2
a 3
b 5
b 6
b 7
b 8
b 9
b 10
b 11
c 19 etc..
Given an index on the value column (**), this should be quite a bit faster than joining with BETWEEN start AND end on the original d1 table, IMHO.
Of course, each time you make changes to d1 you'll need to adjust the rolled-out table too (a trigger?). If this happens frequently, you'll spend more time updating the rolled-out table than you gained in the first place! Additionally, this might take quite a bit of disk space quickly if some of the intervals are really big; and it also assumes we don't need to look for non-whole numbers (e.g. what if we look for the value 3.14?).
(** You might consider experimenting with a unique index on (value, x) here.)

Number of samples for doing FFT

I have a set of 10006 samples which corresponds to 10 periods of a 50 Hz signal sampled at 50 kHz.
As you know, the frequency spacing of the bins is SF/N, where SF is the sampling frequency and N is the number of samples.
I want the magnitudes at integer multiples of 5 Hz up to 9 kHz (for example: 5, 10, ..., 1025, 1030, ..., 8000, 8005, ..., 9000).
So if I do the FFT with 10006 samples, my frequency bins are no longer integer multiples of 5 Hz; instead they are integer multiples of 50000/10006 Hz.
And if I truncate my samples, I get bins at integer multiples of 5 Hz, but the samples no longer cover exactly 10 periods, which means I get a leakage effect!
So I am wondering how to get exactly 5 Hz bins without having the spectrum distorted by the leakage effect.
You can truncate and then window to reduce "leakage" effects. Or zero-pad to 20k points without windowing and resample the FFT result.
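A rough sketch of the zero-padding route (my own illustration, assuming FFTW and double-precision input; with 20000 points at 50 kHz the bin spacing is 2.5 Hz, so every second bin lands exactly on a multiple of 5 Hz):
#include <complex.h>
#include <fftw3.h>

#define NSAMP 10006
#define NFFT  20000

/* Fills mag5hz[k] with the magnitude at (k+1)*5 Hz; use nbins5hz = 1800 for 5 Hz..9 kHz. */
void padded_spectrum(const double *x, double *mag5hz, int nbins5hz)
{
    double       *in  = (double *)       fftw_malloc(sizeof(double) * NFFT);
    fftw_complex *out = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * (NFFT / 2 + 1));
    fftw_plan p = fftw_plan_dft_r2c_1d(NFFT, in, out, FFTW_ESTIMATE);

    for (int i = 0; i < NFFT; ++i)
        in[i] = (i < NSAMP) ? x[i] : 0.0;       /* zero padding to 20000 points */

    fftw_execute(p);

    for (int k = 0; k < nbins5hz; ++k)          /* pick out the 5 Hz multiples  */
        mag5hz[k] = cabs(out[2 * (k + 1)]);     /* bin 2*(k+1) is (k+1)*5 Hz    */

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
}
Link with -lfftw3 -lm. Keep in mind zero padding only interpolates the spectrum; it does not by itself remove leakage.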

Binary and Bits

Here's a question I've come across:
Assume each X represents one bit, either 0 or 1. Consider the 8-bit unsigned binary numbers A = 1XXX XXXX and B = 0XXX XXXX. Which of the following are true (you may tick more than one answer):
A. B > A
B. A > 127
C. Can't tell which of A or B is larger
D. B < 127
E. A > B
Explanations needed (I have zero understanding of this). Thanks!
The key to the answer is in the word unsigned. This means that the MSB (left-most bit) is not being used to indicate the number's sign. Processors perform mathematical operations such as addition, subtraction and comparison using two's complement, so to know the numeric value of a binary word we must know whether it is signed (can contain negative values) or unsigned (positive numbers only).
So in the above case the values are unsigned: A has the MSB of an 8-bit value set to 1, so it must be at least 128, while B has it clear, so it can be at most 127; therefore A is always greater than B.
In the same way that decimal place values are powers of ten, binary place values are powers of two:
Binary place values:  128 64 32 16 8 4 2 1
Decimal place values: 1000 100 10 1
However, if the binary value were signed, the left-most bit would express positive (0) or negative (1), and when it is negative we would need to invert the bits and add one to get back to the magnitude of the (negative) result.
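A tiny C illustration (hypothetical, just to make the ranges concrete):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 0x80;   /* 1000 0000: the smallest value A can take = 128 */
    uint8_t b = 0x7F;   /* 0111 1111: the largest value B can take = 127  */
    printf("min A = %u, max B = %u, A > B: %d\n", (unsigned)a, (unsigned)b, a > b);

    int8_t s = -128;    /* the same bit pattern 1000 0000 read as signed two's complement */
    printf("signed reading of 1000 0000 = %d\n", s);
    return 0;
}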

ORDER BY two variable parts of a string

I'm stuck and don't know how to proceed further. How do I order my results accordingly?
10 x 2 ml
10 x 10 ml
4 x 20 ml
10 x 2 ml should come first because 2 ml is smaller than 10 ml.
Then order by the number that comes before the multiplication sign.
This is how I solved my own question (sort by the ml amount after the 'x' first, then by the leading count):
ORDER BY SUBSTR(size, INSTR(size, 'x') + 2) + 0, size + 0
You could try this, but it's really ugly, especially if the tables are big and you need performance:
ORDER BY TRIM(REPLACE(REPLACE(field_name,CONCAT(SUBSTRING_INDEX(field_name,'x',1),'x'),''),'ml',''))
It replaces the part before the 'x' (SUBSTRING_INDEX('ABCx123ml','x',1) returns 'ABC') and the 'ml' suffix with blanks, then trims the result, leaving only the value needed for ordering.