Index in threadIdx in CUDA

Here is a short piece of code I got from "Introduction to parallel computation" on Udacity. The index in this code confuses me.
__global__ void use_shared_memory_GPU(float *array)
{
    int i, index = threadIdx.x;
    float average, sum = 0.0f;

    __shared__ float sh_arr[128];
    sh_arr[index] = array[index];
    __syncthreads();

    // Now it begins to confuse me
    for (i = 0; i < index; i++) { sum += sh_arr[i]; }   // what is the index here?
    average = sum / (index + 1.0f);                     // what is the index here?
                                                        // why add 1.0f?
    if (array[index] > average) { array[index] = average; }
}
The index is created as the ID for each thread, which I can understand. But when calculating the average, index is used as the number of threads. The first use of index is as a parallel index into the array, while the second is used just as in ordinary C. I repeated this procedure in my own program, but the result doesn't match.
What's the trick behind the index? When I print it in cuda-gdb, it just shows 0. Is there a detailed explanation for this?
One more point: when calculating the average, why does it add 1.0f?

This code is computing prefix sums. A prefix sum for an array of values looks like this:
array:        1    2     4     3    5     7
prefix-sums:  1    3     7    10   15    22
averages:     1  1.5  2.33   2.5    3  3.67
index:        0    1     2     3    4     5
Each prefix sum is the sum of elements in the value array up to that position.
The code is also computing the "average" which is the prefix sum divided by the number of elements used to compute the sum.
In the code you have shown, each thread is computing a different element of the prefix-sum array (and a separate average).
Therefore, to compute the average in each thread, we take that thread's prefix sum and divide it by index + 1: adding 1 to the index gives the number of elements used to compute the prefix sum (and average) for that thread.
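To make the two uses of index concrete, here is a small serial C++ sketch (just an illustration, not the course code) of what every thread effectively computes; the inner loop is written inclusively so it matches the table above:

    #include <cstdio>

    int main()
    {
        const int n = 6;
        float array[n] = {1, 2, 4, 3, 5, 7};

        // Each CUDA thread owns one value of "index"; here we loop over
        // the indices serially to show what every thread would compute.
        for (int index = 0; index < n; ++index) {
            float sum = 0.0f;
            for (int i = 0; i <= index; ++i)        // prefix sum up to this position
                sum += array[i];
            float average = sum / (index + 1.0f);   // index + 1 elements were summed
            printf("index=%d prefix-sum=%g average=%g\n", index, sum, average);
        }
        return 0;
    }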

Related

In SQL - how can I count the number of times Bit(0), Bit(1), ... Bit(N) are high for a decimal number?

I am dealing with a table of decimal values that represent binary numbers. My goal is to count the number of times Bit(0), Bit(1),... Bit(n) are high.
For example, if a table entry is 5, this converts to '101', which can be done using the BIN() function.
What I would like to do is increment variables 'bit0Count' and 'bit2Count'.
I have looked into the BIT_COUNT() function; however, that would only return 2 for the above example.
Any insight would be greatly appreciated.
SELECT SUM(n & (1<<2) > 0) AS bit2Count FROM ...
The & operator is a bitwise AND.
1<<2 is the number 1 left-shifted by two places, so it has only one bit set: binary 100. Using bitwise AND against your column n yields either binary 100 or binary 000.
Testing that with > 0 returns either 1 or 0, since in MySQL, boolean results are literally the integers 1 for true and 0 for false (note this is not standard in other implementations of SQL).
Then you can SUM() these 1's and 0's to get a count of the occurrences where the bit was set.
To tell if bit N is set, use 1 << N to create a mask for that bit and then use bitwise AND to test it. So (column & (1 << N)) != 0 will be 1 if bit N is set, 0 if it's not set.
To total these across rows, use the SUM() aggregation function.
If you need to do this frequently, you could define a stored function:
CREATE FUNCTION bit_set(val INT UNSIGNED, which TINYINT) RETURNS TINYINT DETERMINISTIC
RETURN (val & (1 << which)) != 0;
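If it helps to see the same shift-and-mask test outside SQL, here is a minimal C++ sketch (the values are made up) that counts how often each bit position is set across a list of numbers:

    #include <cstdio>
    #include <vector>

    int main()
    {
        // Hypothetical column values; 5 is binary 101, so bit 0 and bit 2 are set.
        std::vector<unsigned> values = {5, 3, 12, 7};
        const int kBits = 8;                  // how many bit positions to count
        int bitCount[kBits] = {0};

        for (unsigned n : values)
            for (int b = 0; b < kBits; ++b)
                if (n & (1u << b))            // same test as (n & (1 << b)) != 0 in SQL
                    ++bitCount[b];

        for (int b = 0; b < kBits; ++b)
            printf("bit%dCount = %d\n", b, bitCount[b]);
        return 0;
    }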

How can I make my select statement deterministically match only 1/n of my dataset?

I'm processing data from a MySQL table where each row has a UUID associated with it. EDIT: the "UUID" is in fact an MD5 hash (VARCHAR) of the job text.
My select query looks something like:
SELECT * FROM jobs ORDER BY priority DESC LIMIT 1
I am only running one worker node right now, but would like to scale it out to several nodes without altering my schema.
The issue is that the jobs take some time, and scaling out beyond one right now would introduce a race condition where several nodes are working on the same job before it completes and the row is updated.
Is there an elegant way to effectively "shard" the data on the client-side, by specifying some modifier config value per worker node? My first thought was to use the MOD function like this:
SELECT * FROM jobs WHERE UUID MOD 2 = 0 ORDER BY priority DESC LIMIT 1
and SELECT * FROM jobs WHERE UUID MOD 2 = 1 ORDER BY priority DESC LIMIT 1
In this case I would have two workers configured as "0" and "1". But this isn't giving me an even distribution (not sure why) and feels clunky. Is there a better way?
The problem is you're storing the ID as a hex string like acbd18db4cc2f85cedef654fccc4a4d8. MySQL will not convert the hex for you. Instead, if the string starts with a letter you get 0, and if it starts with a digit you get the value of the leading digits.
select '123abc' + 0;   -- returns 123
select 'abc123' + 0;   -- returns 0
6 out of 16 possible leading characters are letters, so those rows all become 0, and 0 mod anything is 0. The remaining 10 of 16 start with a digit and are distributed roughly evenly: about 5 of 16 end up 0 and 5 of 16 end up 1. That gives 6/16 + 5/16 ≈ 69% landing on 0, which is very close to your observed 72%.
To do this right we need to convert the 128-bit hex string into a 64-bit unsigned integer.
Slice off 64 bits with either left(uuid, 16) or right(uuid, 16).
Convert the hex (base 16) into decimal (base 10) using conv.
Cast the result to an unsigned bigint. If we skip this step, MySQL appears to use a float, which loses accuracy.
select cast(conv(right(uuid, 16), 16, 10) as unsigned) mod 2
Beautiful.
That will only use 64 bits of the 128 bit checksum, but for this purpose that should be fine.
Note this technique works with an MD5 checksum because it is pseudorandom. It will not work with the default MySQL uuid() function, which produces a version 1 UUID: UUIDv1 is a timestamp plus a fixed node ID, so the low bits are the same for every row and will always mod to the same value.
UUIDv4, which is a random number, will work.
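For comparison, here is a small C++ sketch of the same idea outside MySQL (the hash value and worker count are just examples): take the rightmost 16 hex characters of the MD5 string, parse them as a 64-bit unsigned integer, and mod by the number of workers.

    #include <cstdio>
    #include <cstdint>
    #include <string>

    int main()
    {
        // Example MD5 hex string (32 characters = 128 bits).
        std::string uuid = "acbd18db4cc2f85cedef654fccc4a4d8";
        const uint64_t workers = 2;

        // Rightmost 16 hex characters = low 64 bits, parsed base 16,
        // mirroring cast(conv(right(uuid, 16), 16, 10) as unsigned) in MySQL.
        uint64_t low64 = std::stoull(uuid.substr(uuid.size() - 16), nullptr, 16);

        printf("worker id = %llu\n", (unsigned long long)(low64 % workers));
        return 0;
    }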
Convert the hex string to decimal before modding:
where CONV(substring(uuid, 1, 8), 16, 10) mod 2 = 1
A reasonable hashing function should distribute evenly enough for this purpose.
Use substring to convert only a small part so that conv doesn't overflow the integer range and behave badly. Any subset of the bits should also be well distributed.

Better way to get number of bits different between two 128-bit MySQL binary values?

I'm using a MySQL binary column (tinyblob) to store a 128-bit perceptual image hash for about 200,000 images, and then doing a SELECT query to find images whose hash value is within a certain number of bits different (the hamming distance is less than a given delta).
To count the number of bits different, you can XOR the two values and then count the number of 1 bits in the result. MySQL has a handy function called BIT_COUNT that counts the number of 1 bits in an unsigned 64-bit integer.
So I'm currently using the following query to split the 128-bit hash into two 64-bit parts, doing the two XOR and BIT_COUNT operations, and adding the results to get the total bit delta:
SELECT asset_id, dhash8
FROM assets
WHERE
    BIT_COUNT(CAST(CONV(HEX(SUBSTRING(dhash8, 1, 8)), 16, 10)
              AS UNSIGNED) ^ :dhash8_0) +   -- high part
    BIT_COUNT(CAST(CONV(HEX(SUBSTRING(dhash8, 9, 8)), 16, 10)
              AS UNSIGNED) ^ :dhash8_1)     -- plus low part
    <= :delta                               -- less than threshold?
But doing a substring, and especially converting it to a hex string and back is kind of annoying (and inefficient). Is there a better way to do this using MySQL?
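Outside MySQL, the same XOR-and-popcount computation is straightforward; here is a C++ sketch (assuming each 128-bit hash is held as two 64-bit halves, with made-up values):

    #include <bitset>
    #include <cstdio>
    #include <cstdint>

    // Hamming distance between two 128-bit hashes stored as 64-bit halves.
    // XOR leaves a 1 wherever the bits differ, then we count the 1 bits,
    // just like BIT_COUNT(x ^ y) in MySQL.
    int hamming128(uint64_t a_hi, uint64_t a_lo, uint64_t b_hi, uint64_t b_lo)
    {
        return (int)(std::bitset<64>(a_hi ^ b_hi).count()
                   + std::bitset<64>(a_lo ^ b_lo).count());
    }

    int main()
    {
        uint64_t hi1 = 0x0123456789abcdefULL, lo1 = 0xfedcba9876543210ULL;
        uint64_t hi2 = 0x0123456789abcdffULL, lo2 = 0xfedcba9876543210ULL;
        printf("bits different: %d\n", hamming128(hi1, lo1, hi2, lo2));  // prints 1
        return 0;
    }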

Excel array formula to MySQL equivalent

I've been using a pretty simple array formula in Excel to crunch some datasets, but they're getting too large and absolutely destroying my computer's performance whenever I update the calculations.
The excel sheet and MySQL database are laid out like so:
+------------+-------+
| Timestamp  | value |
+------------+-------+
| 1340816430 |  .02  |
+------------+-------+
x 600,000 rows
Here's the excel formula:
{=AVERAGEIFS(B:B,A:A,"<"&A1+1000,A:A,">"&A1-1000)}
That returns the average of the values and is the third column in the Excel sheet. Is there any plausible way for me to create a MySQL query that performs a similar operation and returns a column with the values that would have been in the third column had I run Excel's formula?
If you are happy using Excel formulas, you can speed up this calculation a lot (by a factor of over 3000 on my system). This assumes that Column A contains the timestamps in ASCENDING ORDER and Column B the values (if not already sorted, use Excel Sort first).
In Column C put =IFERROR(MATCH(A1-1000,$A:$A,1),1) and copy down. This finds the row number of the row whose timestamp is 1000 less.
In Column D put =IFERROR(MATCH(A1+1000,$A:$A,1),1048576) and copy down. This finds the row number of the row whose timestamp is 1000 more.
In Column E put =AVERAGE(OFFSET(B1,C1-ROW(),0,D1-C1+1,1)) and copy down. This averages the subset range from the first row (Column C) to the last row (Column D). On my system this fully calculates 1000K rows in 20 seconds. The disadvantage of this method is that it's volatile, so it will recalculate whenever you make a change, but I assume you are in Manual calculation mode anyway.
MySQL code:
select
    a.timestamp t1,
    avg(x.value) average_value
from
    mydata a
    inner join (
        select
            timestamp,
            value
        from mydata
    ) x
        on x.timestamp between a.timestamp - 1000 and a.timestamp + 1000
group by
    a.timestamp
order by
    t1
;
I would like to think that without the Excel overhead this will perform far better, but I can't promise it will be lightning fast on 600k rows. You will definitely want to index Timestamp. See also the SQL Fiddle I created.
#Peter You can stick with Excel if you want to. Just use http://xllarray.codeplex.com. The formula you want is =AVERAGE(ARRAY.MASK((A:A<A1+1000)*(A:A>A1-1000),B:B)). 1MM rows on my junky laptop calculate in under 1 second. Be sure to Ctrl-Shift-Enter it as an array formula.
If you don't want to build the code, you can grab the add-in and help file off my SkyDrive: http://sdrv.ms/JtaMIV
#Charles. Ah, no. It is only for one formula. Misread the spec.
If you wanted to push the calculation into C++ and expose it as an xll, here is how you might do that:
#include <algorithm>
#include <numeric>
#include "xll/xll.h"

using namespace xll;

typedef traits<XLOPER12>::xword xword;

static AddIn12 xai_windowed_average(
    L"?xll_windowed_average", XLL_FP12 XLL_FP12 XLL_FP12 XLL_DOUBLE12,
    L"WINDOWED.AVERAGE", L"Time, Value, Window"
);
_FP12* WINAPI
xll_windowed_average(_FP12* pt, _FP12* pv, double dt)
{
#pragma XLLEXPORT
    // static so the returned pointer stays valid after the function returns
    static xll::FP12 a(size(*pt), 1);

    double* bt0 = &pt->array[0];
    double* bv0 = &pv->array[0];
    double* bt = begin(*pt);
    double* et = begin(*pt);

    for (xword i = 0; i < size(*pt); ++i) {
        // slide the window [t_i - dt, t_i + dt) forward to the current timestamp
        bt = std::lower_bound(bt, end(*pt), pt->array[i] - dt);
        et = std::lower_bound(bt, end(*pt), pt->array[i] + dt);
        // average the values whose timestamps fall inside the window
        a[i] = (bt == et) ? 0 : std::accumulate(bv0 + (bt - bt0), bv0 + (et - bt0), 0.0)/(et - bt);
    }

    return a.get();
}

A homework question about the growth rate of functions

Please order the functions below by growth rate:
n ^ 1.5
n ^ 0.5 + log n
n log ^ 2 n
n log ( n ^ 2 )
n log log n
n ^ 2 + log n
n log n
n
ps:
Ordering by growth rate means determining, as n gets larger and larger, which function will eventually be higher in value than the others.
ps2. I have ordered most of the functions:
n , n log log n, n log n, n log^2 n, n log ( n ^ 2 ), n ^ 1.5
I just do not know how to order these two:
n ^ 2 + log n
n ^ 0.5 + log n
Can anyone help me?
Thank you
You can figure this out fairly easily by graphing the functions and seeing which ones get larger (find a graphing calculator, check out Maxima, or try graphing the functions on Wolfram Alpha). Or, of course, you can just pick some large value of n and compare the various functions, but graphs can give a bit of a better picture.
The key to the answer you seek is that when you sum two functions, their combined growth rate is exactly that of whichever of the two has the higher growth rate. Since you already know the correct ordering of all the other functions, you know the ordering of the growth rates that are in play here.
Plugging in a large number is not the correct way to approach this!
Since you have the order of growth, you can use the following rules: http://faculty.ksu.edu.sa/Alsalih/CSC311_10_11_01/3.3_GrowthofFunctionsAndAsymptoticNotations.pdf
In all of those cases, you're dealing with pairs of functions that themselves have different growth rates.
With that in mind, only the larger one really matters, since it will be the dominant term of the sum. So in each of those sums, which is the bigger function, and how does it compare to the other functions on your larger list?
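Writing that dominant-term rule out for the two sums in question (just the asymptotic statements, in LaTeX):

    \log n = o\!\left(n^{0.5}\right) \;\Rightarrow\; n^{0.5} + \log n = \Theta\!\left(n^{0.5}\right)
    \log n = o\!\left(n^{2}\right)  \;\Rightarrow\; n^{2} + \log n = \Theta\!\left(n^{2}\right)

So n ^ 0.5 + log n sits below n in the ordering, and n ^ 2 + log n sits above n ^ 1.5.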
If you need to prove it mathematically, you should try something like this.
If you have two functions, e.g.:
f1(n) = n log n
f2(n) = n
You can simply find the limit of f3(n) = f1(n)/f2(n) when n tends to infinity.
If the result is zero, then f2(n) has a greater growth rate than f1(n).
On the other hand, if the result is infinity then f1(n) has a greater growth rate than f2(n).
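As a worked instance of that limit test, take f1(n) = n ^ 0.5 + log n and f2(n) = n:

    \lim_{n\to\infty} \frac{n^{0.5} + \log n}{n}
      = \lim_{n\to\infty} \left( \frac{1}{n^{0.5}} + \frac{\log n}{n} \right)
      = 0 + 0 = 0

The limit is zero, so f2(n) = n has the greater growth rate, and n ^ 0.5 + log n goes before n in the ordering.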
n ^ 0.5 (or n ^ (1/2)) is the square root of n. So, it grows more slowly than n ^ 2.
Let's say n = 4 (using base-10 logs); then we get
n ^ 2 + log n = 16.6020599913
n ^ 1.5 = 8
n log ( n ^ 2 ) = 4.81
n = 4
n ^ 0.5 + log n = 2.60205999133
n log n = 2.4
n log ^ 2 n = 1.45
n log log n = -0.8