How to concurrently write and read CUDA array with unique incrementing values? - cuda

I have a shared memory array initialized as follows
#define UNDEFINED 0xffffffff
#define DEFINED 0xfffffffe
__shared__ unsigned int array[100];
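__shared__ unsigned int count;
// We have enough threads: blockDim.x > 100
if (threadIdx.x < 100) // guard: array has only 100 elements
    array[threadIdx.x] = UNDEFINED;
// Initialize count
if (threadIdx.x == 0)
    count = 0;
__syncthreads(); // make the initialization visible to every thread in the block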
The threads have random access to array. When a thread accesses an element of array, if it is UNDEFINED, the thread must write a unique value, count, to that element, and then read that value. If the array element is DEFINED or already has a unique value, the thread must just read the unique value out. The tricky part is that array and count must both be updated by only 1 thread. Atomic functions update only 1 variable, not 2. Here's the method that I finally came up with for 1 thread to update both variables while blocking the other threads until it is done.
value = atomicCAS(&array[randomIndex], UNDEFINED, DEFINED);
if (value == UNDEFINED) {
value = atomicAdd(&count, 1);
array[randomIndex] = value;
}
// For the case that value == DEFINED, wait for memory
// writes, then store value
__threadfence_block();
value = array[randomIndex];
There is some tricky concurrency going on here. I'm not sure that this will work for all cases. Are there better suggestions or comments?

According to your description, the only time an array element will be written to is if it contains the value UNDEFINED. We can leverage this.
A thread will first do an atomicCAS operation on the desired array element. The atomicCAS will be configured to check for the UNDEFINED value. If it is present, it will replace it with DEFINED. If it is not present, it will not replace it.
Based on the return result from atomicCAS, the thread will know if the array element contained UNDEFINED or not. If it did, then the return result from the atomicCAS will be UNDEFINED, and the thread will then go and retrieve the desired unique value from count, and use that to modify the DEFINED value to the desired unique value.
We can do this in one line of code:
// assume idx contains the desired offset into array
if (atomicCAS(array+idx, UNDEFINED, DEFINED) == UNDEFINED) array[idx]=atomicAdd(&count, 1);
A more complete code could be like this:
value = DEFINED;
while (value == DEFINED) {
    value = atomicCAS(&array[randomIndex], UNDEFINED, DEFINED);
    if (value == UNDEFINED) {
        value = atomicAdd(&count, 1);
        array[randomIndex] = value;
    }
}
// value now contains the unique value,
// either that was already present in array[randomIndex]
// or the value that was just written there

To get an array of incrementing values, use a prefix sum (also called a scan), which is organized as a binary tree over the threads: first scan within each block (in shared memory), then scan globally over the per-block totals, then add each block's sum back to that block.
It may also be efficient for each block to read not one but several values at a time, sized to match the physical "warp size", for example 16 int values (I apologize: I did this a long time ago and don't remember the proper sizes and names for these things in CUDA).
By the way, if the increments are all equal, the final values can be computed directly as a function of the local or global thread index, so you do not need a scan at all.


Cuda/Thrust: remove_if doesn't change device_vector.size()?

I have a fairly simple CUDA question that seems like it should be a straightforward operation: removing elements from one array based on the values of a second bool array. The steps I take are:
1. Create a device_vector of bools with the same size as the processed input array.
2. Call a kernel which will set some of the elements from (1) to true.
3. Call remove_if on the input array with a predicate using the processed array from (2).
4. For each value in the bool array that is set to true, remove the corresponding element from the input array.
What I am seeing is that the input array isn't changed, and I am not sure why.
struct EntryWasDeleted
{
__device__ __host__
bool operator()(const bool ifDeleted)
{ return true; }
};
//This array has about 200-300 elements
//thrust::device_vector<SomeStruct> & arrayToDelete
thrust::device_vector<bool>* deletedEntries =
new thrust::device_vector<bool>(arrayToDelete.size(), false);
cuDeleteTestEntries<<<grid, block>>>( thrust::raw_pointer_cast(arrayToDelete.data()), countToDelete, heapAccess, thrust::raw_pointer_cast(deletedEntries->data()));
cudaDeviceSynchronize();
thrust::remove_if(arrayToDelete.begin(), arrayToDelete.end(), deletedEntries->begin(), EntryWasDeleted());
//I am expecting testEntries to have 0 elements
thrust::host_vector<SomeStruct> testEntries = arrayToDelete;
for( int i = 0; i<testEntries.size(); i++)
{ printf( "%d", testEntries[i].someValue); }
In this sample, I am always returning true in the predicate for testing. However, when I do testEntries = deletedEntries and output the members, I can validate that deletedEntries is properly filled in with trues and falses.
My expectation would be that testEntries would have 0 elements. But it doesn't, and I get output as if remove_if didn't do anything, i.e., the output shows ALL elements from the input array. I am not sure why. Is there a specific way to remove elements from a device_vector?
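remove_if does not resize the vector; it returns an iterator to the new end of the elements that were kept. So you need to capture the iterator that is returned from remove_if: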
thrust::device_vector<SomeStruct>::iterator endIterator =
thrust::remove_if(arrayToDelete.begin(), arrayToDelete.end(),
deletedEntries->begin(), EntryWasDeleted());
Then, when you copy the data back to the host, instead of using Thrust's default assignment operator between host and device, do this:
thrust::host_vector<SomeStruct> testEntries(arrayToDelete.begin(),endIterator);
As a side note, working with arrays of primitives can often be much more efficient. For example, can you store the indexes of your structs in an array instead and operate on those indexes?

golang return multiple values issue

I was wondering why this is valid Go code:
func FindUserInfo(id string) (Info, bool) {
it, present := all[id]
return it, present
}
but this isn't:
func FindUserInfo(id string) (Info, bool) {
return all[id]
}
Is there a way to avoid the temporary variables?
To elaborate on my comment, Effective Go mentions that the multi-value assignment from accessing a map key is called the "comma ok" pattern:
Sometimes you need to distinguish a missing entry from a zero value. Is there an entry for "UTC" or is that the empty string because it's not in the map at all? You can discriminate with a form of multiple assignment.
var seconds int
var ok bool
seconds, ok = timeZone[tz]
For obvious reasons this is called the “comma ok” idiom. In this example, if tz is present, seconds will be set appropriately and ok will be true; if not, seconds will be set to zero and ok will be false.
Playground demonstrating this
We can see that this differs from calling a regular function where the compiler would tell you that something is wrong:
package main
import "fmt"
func multiValueReturn() (int, int) {
return 0, 0
}
func main() {
fmt.Println(multiValueReturn)
asgn1, _ := multiValueReturn()
asgn2 := multiValueReturn()
}
On the playground this will output
# command-line-arguments
/tmp/sandbox592492597/main.go:14: multiple-value multiValueReturn() in single-value context
This gives us a hint that it may be something the compiler is doing. Searching the source code for "commaOk" gives us a few places to look, including types.unpack
At the time of writing, the method's godoc reads:
// unpack takes a getter get and a number of operands n. If n == 1, unpack
// calls the incoming getter for the first operand. If that operand is
// invalid, unpack returns (nil, 0, false). Otherwise, if that operand is a
// function call, or a comma-ok expression and allowCommaOk is set, the result
// is a new getter and operand count providing access to the function results,
// or comma-ok values, respectively. The third result value reports if it
// is indeed the comma-ok case. In all other cases, the incoming getter and
// operand count are returned unchanged, and the third result value is false.
//
// In other words, if there's exactly one operand that - after type-checking
// by calling get - stands for multiple operands, the resulting getter provides
// access to those operands instead.
//
// If the returned getter is called at most once for a given operand index i
// (including i == 0), that operand is guaranteed to cause only one call of
// the incoming getter with that i.
//
The key point is that this method appears to determine whether or not something is actually a "comma ok" case.
Digging into that method tells us that it will check to see if the mode of the operands is indexing a map or if the mode is set to commaok (where this is defined does not give us many hints on when it's used, but searching the source for assignments to commaok we can see it's used when getting a value from a channel and for type assertions). Remember these three cases for later!
if x0.mode == mapindex || x0.mode == commaok {
// comma-ok value
if allowCommaOk {
a := [2]Type{x0.typ, Typ[UntypedBool]}
return func(x *operand, i int) {
x.mode = value
x.expr = x0.expr
x.typ = a[i]
}, 2, true
}
x0.mode = value
}
allowCommaOk is a parameter to the function. Checking out where unpack is called in that file we can see that all callers pass false as an argument. Searching the rest of the repository leads us to assignments.go in the Checker.initVars() method.
l := len(lhs)
get, r, commaOk := unpack(func(x *operand, i int) { check.expr(x, rhs[i]) }, len(rhs), l == 2 && !returnPos.IsValid())
Since it seems that we can only use the "comma ok" pattern to get two return values when doing a multi-value assignment this seems like the right place to look! In the above code the length of the left hand side is checked, and when unpack is called the allowCommaOk parameter is the result of l == 2 && !returnPos.IsValid(). The !returnPos.IsValid() is somewhat confusing here as that would mean that the position has no file or line information associated with it, but we'll just ignore that.
Further down in that method we've got:
var x operand
if commaOk {
var a [2]Type
for i := range a {
get(&x, i)
a[i] = check.initVar(lhs[i], &x, returnPos.IsValid())
}
check.recordCommaOkTypes(rhs[0], a)
return
}
So what does all of this tell us?
Since the unpack method takes an allowCommaOk parameter that's hardcoded to false everywhere except in assignments.go's Checker.initVars() method, we can probably assume that you will only ever get two values when doing an assignment with two variables on the left-hand side.
The unpack method will determine whether or not you actually do get an ok value in return by checking if you are indexing a map, grabbing a value from a channel, or doing a type assertion.
Since you can only get the ok value when doing an assignment, it looks like in your specific case you will always need to use variables.
You may save a couple of keystrokes by using named returns:
func FindUserInfo(id string) (i Info, ok bool) {
i, ok = all[id]
return
}
But apart from that, I don't think what you want is possible.
Simply put: the reason why your second example isn't valid Go code is because the language specification says so. ;)
Indexing a map only yields a secondary value in an assignment to two variables. A return statement is not an assignment.
An index expression on a map a of type map[K]V used in an assignment or initialization of the special form
v, ok = a[x]
v, ok := a[x]
var v, ok = a[x]
yields an additional untyped boolean value. The value of ok is true if the key x is present in the map, and false otherwise.
Furthermore, indexing a map is not a "single call to a multi-valued function", which is one of the three ways to return values from a function (the second one, the other two not being relevant here):
There are three ways to return values from a function with a result type:
The return value or values may be explicitly listed in the "return" statement. Each expression must be single-valued and assignable to the corresponding element of the function's result type.
The expression list in the "return" statement may be a single call to a multi-valued function. The effect is as if each value returned from that function were assigned to a temporary variable with the type of the respective value, followed by a "return" statement listing these variables, at which point the rules of the previous case apply.
The expression list may be empty if the function's result type specifies names for its result parameters. The result parameters act as ordinary local variables and the function may assign values to them as necessary. The "return" statement returns the values of these variables.
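The three forms quoted above can be put side by side in a small program. This is only an illustrative sketch: the Info type and the contents of the all map are assumptions made up for the example, since the question does not show them, and the three function names are hypothetical.

```go
package main

import "fmt"

// Info is a hypothetical stand-in for the question's Info type.
type Info struct{ Name string }

// all is a hypothetical lookup table standing in for the question's map.
var all = map[string]Info{"42": {Name: "Ada"}}

// Form 1: the values are listed explicitly; each expression is single-valued.
func findExplicit(id string) (Info, bool) {
	it, present := all[id]
	return it, present
}

// Form 2: the expression list is a single call to a multi-valued function.
func findForwarded(id string) (Info, bool) {
	return findExplicit(id)
}

// Form 3: named result parameters with an empty return statement.
func findNamed(id string) (i Info, ok bool) {
	i, ok = all[id]
	return
}

func main() {
	fmt.Println(findExplicit("42"))
	fmt.Println(findForwarded("42"))
	fmt.Println(findNamed("missing"))
}
```

Writing return all[id] fits none of the three forms: it is a single-valued expression in a function with two results, and a map index is not a function call.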
As for your actual question: the only way to avoid temporary variables would be using non-temporary variables, but usually that would be quite unwise - and probably not much of an optimization even when safe.
So, why doesn't the language specification allow this kind of special use of map indexing (or type assertion or channel receive, both of which can also utilize the "comma ok" idiom) in return statements? That's a good question. My guess: to keep the language specification simple.
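For reference, here is a minimal sketch of the three expressions that support the "comma ok" form, each wrapped in a small helper. The helper names (lookup, asString, receive) are hypothetical and exist only for this example; in every case the two values have to go through an assignment before they can be returned.

```go
package main

import "fmt"

// lookup uses the comma-ok form of a map index expression.
func lookup(m map[string]int, k string) (int, bool) {
	v, ok := m[k]
	return v, ok
}

// asString uses the comma-ok form of a type assertion.
func asString(x interface{}) (string, bool) {
	s, ok := x.(string)
	return s, ok
}

// receive uses the comma-ok form of a channel receive; ok is false once
// the channel has been closed and drained.
func receive(ch <-chan int) (int, bool) {
	v, ok := <-ch
	return v, ok
}

func main() {
	fmt.Println(lookup(map[string]int{"a": 1}, "b"))
	fmt.Println(asString(42))

	ch := make(chan int, 1)
	ch <- 7
	close(ch)
	fmt.Println(receive(ch)) // value sent before the close
	fmt.Println(receive(ch)) // channel is now closed and empty
}
```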
I'm no Go expert, but I believe you are getting a compile-time error when you try to return the map lookup directly, i.e. return all[id]. The reason could be that the function's return type is explicitly declared as (Info, bool), and when you write return all[id], the compiler can't map the single value of all[id] to (Info, bool).
However, in the solution mentioned above, the variables being returned, i and ok, are the same ones named in the function's result list (i Info, ok bool), and hence the compiler knows what it's returning, as opposed to when the results are declared only as (Info, bool).
By default, maps in golang return a single value when accessing a key
https://blog.golang.org/go-maps-in-action
Hence, return all[id] won't compile for a function that expects 2 return values.

Couchbase rereduce questions

Here is some code from the Couchbase documentation that I don't understand:
function(key, values, rereduce) {
var result = {total: 0, count: 0};
for(i=0; i < values.length; i++) {
if(rereduce) {
result.total = result.total + values[i].total;
result.count = result.count + values[i].count;
} else {
result.total = sum(values);
result.count = values.length;
}
}
return(result);
}
1. Does rereduce indicate whether the current function call is operating on values that have already been reduced?
2. When will the first argument of the reduce function, key, be used? I have seen a number of examples, and key seems to be unused.
3. When is rereduce true with an array size of more than 1?
4. Again, when is rereduce false with an array size of more than 1?
Rereduce means that the reduce function has been called before and is now being called again with parameters that were returned as the result of the first reduce call. So if we divide it into two functions, it will look like:
function reduce(k,v){
// ... doing something with map results
// instead of returning the result we must call the rereduce function
rereduce(null, result)
}
function rereduce(k,v){
// do something with first reduce result
}
In most cases rereduce will happen when you have 2 or more servers in cluster or you have a lot of items in your database and the calculation is done on multiple "nodes" of the B*Tree. Example with 2 servers will be easier to understand:
Let's imagine that your map function returned pairs: [key1-1, key2-2, key6-6] from 1st server and [key5-5,key7-7] from 2nd. You'll get 2 reduce function calls with:
reduce([key1,key2,key6],[1,2,6],false) and reduce([key5,key7],[5,7],false). Then if we just return values (do nothing in reduce, just return values), the reduce function will be called with such params: reduce(null, [[1,2,6],[5,7]], true). Here values will be an array of results that came from first reduce calls.
On rereduce key will be null. Values will be an array of values as returned by a previous reduce() function.
Array size depends only on your data. It does not depend on the rereduce variable. The same answer applies to the 4th question.
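To make the two-server example concrete, here is a rough simulation of the call sequence. Couchbase reduce functions are written in JavaScript; this Go sketch only mimics the order of calls and is not Couchbase code. The result type and the split into separate reduce and rereduce functions are assumptions for illustration, mirroring the {total, count} object from the question.

```go
package main

import "fmt"

// result mirrors the {total, count} object built by the question's reduce function.
type result struct {
	Total float64
	Count int
}

// reduce models a first-pass call (rereduce == false): values are the raw
// values emitted by the map function.
func reduce(values []float64) result {
	r := result{Count: len(values)}
	for _, v := range values {
		r.Total += v
	}
	return r
}

// rereduce models a second-pass call (rereduce == true): values are results
// returned by earlier reduce calls, so totals and counts are added up.
func rereduce(partials []result) result {
	var r result
	for _, p := range partials {
		r.Total += p.Total
		r.Count += p.Count
	}
	return r
}

func main() {
	node1 := reduce([]float64{1, 2, 6}) // values emitted on the 1st server
	node2 := reduce([]float64{5, 7})    // values emitted on the 2nd server
	fmt.Println(rereduce([]result{node1, node2}))
}
```

The first-pass calls produce a total of 9 over 3 values and a total of 12 over 2 values; the rereduce pass combines them into a total of 21 over 5 values, the same as a single reduce over all five values would give.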
You can just try running the examples from Views basics and Views with reduce. For example, you can modify the reduce function to see what it returns at each step:
function reduce(k,v,r){
if (!r){
// let reduce function return only one value:
return 1;
} else {
// and let's see what values came in on "rereduce"
return v;
}
}
I was also confused by the example from the official Couchbase website, and below is what I think.
Confusion: the reduce method signature
1) It is written as
function (keys, values, rereduce)
2) It is written as function(key, values, rereduce)
What exactly is the first param, key or keys?
As far as I understand from my previous experience with map/reduce, the key is the key emitted from the map function, and there is a hidden shuffle step that aggregates the values into a value list for the same key.
So the key param can be an array in the case where you emit an array as the key (in which case you can use the group level to control the level of aggregation).
So I do not agree with the example given by @m03geek: it should not be a list of different keys. Correct me if I am wrong.
My assumption:
Both reduce and rereduce work on the SAME key only.
E.g.:
reduce is like:
1) reduce(keyA, [1,2,3]): this is precalculated and stored in the B-tree structure.
2) rereduce(keyA, [6, reduce(keyA, [4,5,6])]): 6 is the sum of [1,2,3] from the first reduce call. Then we add a new doc to Couchbase, which triggers the reduce method again. Instead of calculating the whole thing again as the original map/reduce would do, Couchbase gets the precalculated data, which is 6, out of the B-tree, runs reduce on the key-value pairs from the map method (which is triggered by adding the new doc), and then runs rereduce on the precalculated value plus the new value.

Best way to cache results of method with multiple parameters - Object as key in Dictionary?

At the beginning of a method I want to check whether the method has been called with these exact parameters before, and if so, return the result that was returned back then.
At first, with one parameter, I used a Dictionary, but now I need to check 3 parameters (a String, an Object and a boolean).
I tried making a custom Object like so:
var cacheKey:Object = { identifier:identifier, type:type, someBoolean:someBoolean };
//if key already exists, return it (not working)
if (resultCache[cacheKey]) return resultCache[cacheKey];
//else: create result ...
//and save it in the cache
resultCache[cacheKey] = result;
But this doesn't work, because the second time the function is called, the new cacheKey is not the same object as the first, even though its properties are the same.
So my question is: is there a datatype that will check the properties of the object used as key for a matching key?
And what else is my best option? Create a cache for the keys as well? :/
Note there are two aspects to the technical solution: equality comparison and indexing.
The Cliff Notes version:
It's easy to do custom equality comparison
In order to perform indexing, you need to know more than whether one object is equal to another -- you need to know which object is "bigger" than the other.
If all of your properties are primitives you should squash them into a single string and use an Object to keep track of them (NOT a Dictionary).
If you need to compare some of the individual properties for reference equality, you're going to have to write a function to determine which set of properties is bigger than the other, and then make your own collection class that uses the output of the comparison function to implement its own binary-search-tree-based indexing.
If the number of unique sets of arguments is in the several hundreds or less AND you do need reference comparison for your Object argument, just use an Array and the some method to do a naive comparison to all cached keys. Only you know how expensive your actual method is, so it's up to you to decide what lookup cost (which depends on the number of unique arguments provided to the function) is acceptable.
Equality comparison
To address equality comparison it is easy enough to write some code to compare objects for the values of their properties, rather than for reference equality. The following function enforces strict set comparison, so that both objects must contain exactly the same properties (no additional properties on either object allowed) with the same values:
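public static function propsEqual(obj1:Object, obj2:Object):Boolean {
    // every property of obj1 must exist on obj2 with the same value
    for(var key1:* in obj1) {
        if(obj2[key1] === undefined)
            return false;
        if(obj1[key1] != obj2[key1])
            return false;
    }
    // obj2 must not have any property that obj1 lacks
    for(var key2:* in obj2)
        if(obj1[key2] === undefined)
            return false;
    return true;
}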
You could speed it up by eliminating the second for loop with the tradeoff that {A:1, B:2} will be deemed equal to {A:1, B:2, C:'An extra property'}.
Indexing
The problem with this in your case is that you lose the indexing that a Dictionary provides for reference equality or that an Object provides for string keys. You would have to compare each new set of function arguments to the entire list of previously seen arguments, for example by using Array.some. I use the field currentArgs and the method compareToCurrent to avoid generating a new closure every time.
private var cachedArgs:Array = [];
private var currentArgs:Object;
function yourMethod(stringArg:String, objArg:Object, boolArg:Boolean):* {
currentArgs = { stringArg:stringArg, objArg:objArg, boolArg:boolArg };
var iveSeenThisBefore:Boolean = cachedArgs.some(compareToCurrent);
if(!iveSeenThisBefore)
cachedArgs.push(currentArgs);
}
function compareToCurrent(obj:Object):Boolean {
return someUtil.propsEqual(obj, currentArgs);
}
This means comparison will be O(n) time, where n is the ever-increasing number of unique sets of function arguments.
If all the arguments to your function are primitive, see the very similar question In AS3, where do you draw the line between Dictionary and ArrayCollection?. The title doesn't sound very similar, but the solution in the accepted answer (yes, I wrote it) addresses the exact same technical issue -- using multiple primitive values as a single compound key. The basic gist in your case would be:
private var cachedArgs:Object = {};
function yourMethod(stringArg:String, objArg:Object, boolArg:Boolean):* {
var argKey:String = stringArg + objArg.toString() + (boolArg ? 'T' : 'F');
if(cachedArgs[argKey] === undefined)
cachedArgs[argKey] = _yourMethod(stringArg, objArg, boolArg);
return cachedArgs[argKey];
}
private function _yourMethod(stringArg:String, objArg:Object, boolArg:Boolean):* {
// Do stuff
return something;
}
If you really need to determine which reference is "bigger" than another (as the Dictionary does internally) you're going to have to wade into some ugly stuff, since Adobe has not yet provided any API to retrieve the "value" / "address" of a reference. The best thing I've found so far is this interesting hack: How can I get an instance's "memory location" in ActionScript?. Without doing a bunch of performance tests I don't know if using this hack to compare references will kill the advantages gained by binary search tree indexing. Naturally it would depend on the number of keys.

What is the value of a dereferenced pointer

I realized that I had some confusion regarding the value of a dereferenced pointer, as I was reading a C text with the following code snippet:
int main()
{
int matrix[3][10]; // line 3: matrix is tentatively defined
int (* arrPtr)[10] = matrix; // line 4: arrPtr is defined and initialized
(*arrPtr)[0] = 5; // line 5: what is the value of (*arrPtr) ?
return 0;
}
My confusion is in regards to the value of *arrPtr in the last line. This is my understanding up to that point.
Line 3, matrix is declared (tentatively defined) to be an array of 3 elements of type array of 10 elements of type int.
Line 4, arrPtr is defined as a pointer to an array of 10 elements of type int. It is also initialized as a pointer to an array of 10 elements (i.e. the first row of matrix).
Now Line 5, arrPtr is dereferenced, yielding the actual array, so its type is array of 10 ints.
My question: Why is the value of the array just the address of the array, and not in some way related to its elements?
The value of the array variable matrix is the array; however, it (easily) "decays" into a pointer to its first item, which you then assign to arrPtr.
To see this, use &matrix (has type int (*)[3][10]) or sizeof matrix (equals sizeof(int) * 3 * 10).
Additionally, there's nothing tentative about that definition.
Edit: I missed the question hiding in the code comments: *arrPtr is an object of type int[10], so when you use [0] on it, you get the first item, to which you then assign 5.
Pointers and arrays are purposefully defined to behave similarly, and this is sometimes confusing (before you learn the various quirks), but also extremely versatile and useful.
I think you need to clarify your question. If you mean what is the value of printf("%i", arrPtr); then it will be the address of the array. If you mean printf("%i",(*arrPtr)[0] ); then we've got a more meaty question.
In C, an array is not itself a pointer, but in most expressions an array variable is converted to a pointer to the start of its block of data: an int [] becomes an int*, i.e. the location in memory of an int. Note that a two-dimensional array such as int [3][10] is not a double pointer: it converts to int (*)[10], a pointer to its first row, rather than to an int**, which would point to the location in memory of another pointer, which in turn points to an actual particular int.