CUDA: Scatter communication pattern

I am learning CUDA from Udacity's course on parallel programming. In a quiz, they have given a problem of sorting a pre-ranked variable (player's height). Since it is a one-one correspondence between input and output array, should it not be a Map communication pattern instead of a Scatter?

CUDA makes no canonical definition of these terms, that I know of. Therefore my answer is merely a suggestion of how it might be or have been interpreted.
"Since, it is a one-one correspondence between input and output array"
This statement doesn't appear to be supported by the diagram, which shows gaps in the output array, which have no corresponding input point associated with them.
If a smaller set of values is distributed into a larger array (leaving gaps in the output array, i.e. locations to which no input value corresponds), then "scatter" might be used to describe that operation. Both scatters and maps involve a mapping that describes where the input values go, but the instructor may have defined scatter and map so as to differentiate between these two cases, for example with the following plausible definitions:
Scatter: one-to-one relationship from input to output (i.e. unidirectional relationship). Every input location has a corresponding output location, but not every output location has a corresponding input location.
Map: one-to-one relationship between input and output (i.e. bidirectional relationship). Every input location has a corresponding output location, and every output location has a corresponding input location.
Gather: one-to-one relationship from output to input (i.e. unidirectional relationship). Every output location has a corresponding input location, but not every input location has a corresponding output location.

The definition of each communication pattern (map, scatter, gather, etc.) varies slightly from one language/environment/context to another, but since I have followed that same Udacity course I'll try to explain that term as I understand it in the context of the course:
The Map operation calculates each output element as a function of its corresponding input element, i.e.:
output[tid] = foo(input[tid]);
The Gather pattern calculates each output element as a function of one or more (usually more) input elements, not necessarily the corresponding one (typically these are elements from a neighborhood). For example:
output[tid] = (input[tid-1] + input[tid+1]) / 2;
Lastly, the Scatter operation has each input element contribute to one or more (again, usually more) output elements. For instance,
atomicAdd( &(output[tid-1]), input[tid]);
atomicAdd( &(output[tid]), input[tid]);
atomicAdd( &(output[tid+1]), input[tid]);
The example given in the question is clearly not a Map, because each output is calculated from an input at a different location.
It may also be hard to see how the same example can be a scatter, since each input element causes only one write to the output. It is nevertheless a scatter, because each input causes a write to an output location that is determined by the input.
In other words, each CUDA thread processes the input element at the location associated with its tid (thread ID number), and calculates where to write the result. More usually a scatter would write to several places instead of only one, so this is a particular case that might as well be named differently.
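The three patterns described above can be sketched sequentially, with each loop iteration standing in for one CUDA thread. This is only an illustration in Python, not code from the course; the function names, the boundary handling in the gather, and the sample heights and ranks are assumptions made for the example.

```python
def map_pattern(inp, foo):
    # Map: output[tid] depends only on input[tid].
    return [foo(x) for x in inp]

def gather_average(inp):
    # Gather: each output reads several inputs (its two neighbours).
    # Assumption: the two boundary elements are simply copied.
    out = list(inp)
    for tid in range(1, len(inp) - 1):
        out[tid] = (inp[tid - 1] + inp[tid + 1]) / 2
    return out

def scatter_by_rank(heights, ranks):
    # Scatter: each "thread" reads its own input and computes
    # WHERE to write it; here the destination is the player's rank.
    out = [None] * len(heights)
    for tid in range(len(heights)):
        out[ranks[tid]] = heights[tid]
    return out

print(map_pattern([1, 2, 3], lambda x: x * x))        # [1, 4, 9]
print(gather_average([1, 3, 5, 7]))                   # [1, 3.0, 5.0, 7]
print(scatter_by_rank([180, 165, 172], [2, 0, 1]))    # [165, 172, 180]
```

In the quiz example every thread writes exactly once and no two threads share a destination, so no atomics are needed; the atomicAdd version above is the more general case where several inputs contribute to the same output.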

Each player has 3 properties (name, height, rank).
So I think Scatter is correct, because all three have to be considered to produce the output.
If a player had only one property, such as rank, then I think Map would be correct.
reference: Parallel Communication Patterns Recap in this lecture
reference: map/reduce/gather/scatter with image

Related

How to convert a directed graph to its most minimal form?

I'm dealing with rooted, directed, potentially cyclic graphs. Each vertex in the graph has a label, which might or might not be unique. Edges do not have labels. The graph has a designated root vertex from which every vertex is reachable. The order of the edges outgoing from a vertex is relevant.
For my purposes, a vertex is equal to another vertex if they share the same label, and if their outgoing edges are also considered equal (and are in the same order). Two edges are equal if they have the same direction and if the vertices at their corresponding ends are equal.
Because of the equality rules above, a graph can contain multiple "sections" that are effectively equal. For example, in the graph below, there are two isomorphic sections containing vertices with labels {1, 2, 3, 4}. The root of the graph is vertex 0.
(source: graphonline.ru)
I need to be able to identify sections that are identical, and then remove all duplication, without changing the "meaning" of the graph (with regard to the equality rules above). Using the above example as input, I need to produce this:
(source: graphonline.ru)
Is there a known way of doing this within polynomial time?
The solution that ended up working was to essentially run the recursive equality check against every pair of vertices with the same label.
Let S = all pairs of vertices with the same label
For each s in S:
    Compare the two vertices a and b in s by recursively comparing their children
    If they compare as equal, take all edges in the graph pointing to b, and point them to a instead
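A minimal Python sketch of the steps above, under some assumptions that are not in the original post: the graph is stored as two dictionaries (vertex to label, vertex to ordered list of successors), and the recursive comparison carries a set of pairs already assumed equal so that it terminates on cyclic graphs. The names `vertices_equal` and `minimize` are made up for this example.

```python
from itertools import combinations

def vertices_equal(a, b, labels, edges, assumed):
    # Pairs already under comparison are assumed equal; this is what
    # stops the recursion from looping forever on a cycle.
    if a == b or (a, b) in assumed:
        return True
    if labels[a] != labels[b] or len(edges[a]) != len(edges[b]):
        return False
    assumed.add((a, b))
    # Outgoing edges are ordered, so children are compared pairwise.
    return all(vertices_equal(x, y, labels, edges, assumed)
               for x, y in zip(edges[a], edges[b]))

def minimize(labels, edges, root):
    # Work on copies so the caller's graph is left untouched.
    labels = dict(labels)
    edges = {v: list(out) for v, out in edges.items()}
    for a, b in combinations(sorted(labels), 2):
        if a not in labels or b not in labels:
            continue  # one of the two was merged away earlier
        if labels[a] != labels[b]:
            continue
        if not vertices_equal(a, b, labels, edges, set()):
            continue
        if b == root:
            a, b = b, a  # never delete the designated root
        del labels[b]
        del edges[b]
        # Point every edge that targeted b at a instead.
        for v in edges:
            edges[v] = [a if t == b else t for t in edges[v]]
    return labels, edges

# Two identical cyclic sections (1 -> 2 -> 1 and 3 -> 4 -> 3) under root 0.
labels = {0: "0", 1: "1", 2: "2", 3: "1", 4: "2"}
edges = {0: [1, 3], 1: [2], 2: [1], 3: [4], 4: [3]}
print(minimize(labels, edges, 0))
# ({0: '0', 1: '1', 2: '2'}, {0: [1, 1], 1: [2], 2: [1]})
```

Each pair is compared at most once and each comparison visits a bounded number of vertex pairs, so the sketch runs in polynomial time, although it makes no attempt to be fast and the recursion depth may be a problem for very large graphs.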

Why did the designer make vector, map, and set functions in clojure?

Rich made vectors, maps, and sets functions, while lists and sequences are not functions.
Why can't all these collections be functions, to make it consistent?
Further, why don't we make all these composite data types functions that map a position to the internal data?
If we made all composite data functions, there would be only functions and atomic data in Clojure. This would minimize the fundamental elements of the language, right?
I believe a minimal set of fundamental elements, ideally only 2, would make the language simpler, more expressive and more flexible. Is this correct?
Vectors, maps, and sets are all associative data structures. Maps are the most obvious; they simply associate arbitrary keys with arbitrary values. A vector can be thought of as a map whose key set must be the set of all nonnegative integers less than the vector's size. Finally, sets can be thought of as maps that map keys to themselves.
It's important to understand that the sequential nature of a vector and the associative nature of a vector are two orthogonal things. It's a data structure that's designed to be good at supporting both abstractions (to some extent; for instance, you can't efficiently insert at the beginning of a vector).
Lists are simpler than vectors; they are finite sequential data structures, nothing more. A list can't efficiently return the element at a particular index, so it doesn't expose that functionality as part of its core interface. Of course, you can get an element of a list by index using nth, but in that case, you're explicitly treating it as a sequence, not as an associative structure.
So to answer your question, the IFn implementations for vectors, maps, and sets are there because of the extremely close relationship between the idea of an associative data structure and the idea of a pure function. Lists and other sequences are not inherently associative, so for consistency, they do not implement IFn.
Elogent's answer is excellent. There is one more reason that it wouldn't make sense for lists to be functions:
Literal lists already have a different, very important role, so they can't also be treated as functions in the way that vectors are.
Let's start with a vector containing two functions, partial and +, and a number, 5. We can treat the vector as a function, as you know, to return the value indexed by its argument:
user=> ([partial + 5] 2)
5
So far, so good. Suppose we want to use a list (partial + 5) in place of the vector, as you suggested, to return the value 5. Will we get an error message? No! But we won't get 5 as the result, either:
user=> ((partial + 5) 2)
7
What happened? (partial + 5) returned a function--the function that adds 5 to its single argument--and then this function was applied to the argument 2.
When a list is evaluated, its first element is evaluated, and should return a function. If the first element is a symbol, it's evaluated, and then the function that's its value is applied to the arguments, which are the other elements of the list. If the first element of a list is itself a list, then it is evaluated in the same way that it would be evaluated if it were at the top level. The entire expression in that inner list should return a function, which will then be applied to the other elements of the outer list.
Since an inner list that's the first element of a list that's being evaluated already has this role, it can't also play the kind of role that vectors that are first elements play.

Algorithm to find a node with particular properties in a tree given a starting node and the approximate path

I am looking for an algorithm that predicts where a particular element lies in the DOM of a specific page, given that we know some general properties of the element, and the approximate path from a few fixed nodes in the template to the element (obtained by analyzing a few pages of a similar type).
Specific Example:
There are a few Wikipedia pages to be analyzed:
http://en.wikipedia.org/wiki/Econometrics
http://en.wikipedia.org/wiki/History_of_economic_thought
etc.
The algorithm must get the right navigation box (class="vertical-navbox nowraplinks plainlist") in these pages, given the following conditions:
The class name of the element might not be the same in all the pages
The path to the navbox from the header (id="firstHeading"), and some other fixed nodes, in a few pages (test cases) is available
The header (and the other fixed nodes) always has the same id in each page
Some pages might have a few extra nodes in the path (class="hatnote" in the second link)
A few properties of the box (it is blue, it is a table, etc.) are known
Is there an algorithm for this purpose?
So, let's make some assumptions and see if they are compatible with your situation.
Let's say you have a test page, and in that test page you can do complete DOM tree visits.
In this case we could do a series of reverse path walks, from every leaf to the root, assuming a node has a score of 0 at the beginning and adding +1 if the branch we came up from contained the wanted node.
After we have done this for all possible paths from leaves to root, we do another full visit and divide the previously calculated score by the number of children (sub-trees or leaves) each node has.
This means that every node now has a percentage telling you the probability that a random sub-tree of that node contains the desired node.
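One possible reading of the scoring step above, sketched in Python: for each node, count the child branches that contain a wanted node and divide by the number of children. The tree representation (a dictionary from node name to list of children), the function name `subtree_scores` and the sample page are assumptions made for illustration; a single recursive visit is used here in place of the leaf-to-root walks, which is a simplification rather than the exact procedure described.

```python
def subtree_scores(tree, root, wanted):
    """Return {node: fraction of its child branches containing a wanted node}."""
    scores = {}

    def visit(node):
        children = tree.get(node, [])
        # +1 for every child branch in which a wanted node was found.
        hits = sum(1 for child in children if visit(child))
        scores[node] = hits / len(children) if children else 0.0
        # Report to the parent whether this branch contains a wanted node.
        return node in wanted or hits > 0

    visit(root)
    return scores

# Hypothetical, heavily simplified page structure.
page = {
    "html": ["head", "body"],
    "body": ["firstHeading", "content", "footer"],
    "content": ["hatnote", "navbox", "p1", "p2"],
}
scores = subtree_scores(page, "html", {"navbox"})
print(scores["html"], scores["content"], scores["head"])  # 0.5 0.25 0.0
```

Recursion keeps the sketch short; a real DOM can be deep enough that an explicit stack would be safer.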
Now, for the prediction part, you need some way to match a node in another page to one of the nodes for which you have probabilities (and for this I'm afraid I don't have any idea how it could be done).
Once you have such a match, and assuming the test page is really predictive, you have automatically a probability factor for each node of the new page that should be meaningful, notwithstanding any possible intermediate additional node.
Note that with the matching algorithm you could do the same calculation for multiple test pages and at the end of each process calculate an overall probability for each node that, hopefully, is more precise than your original one.
Hope this is what you needed.

Possible to call subfunction in S-function level-2

I have been trying to convert my level-1 S-function to level-2, but I got stuck at calling another subfunction from function Output(block). I looked for other threads, but to no avail. Could you provide related links?
My output depends on a lot of processing of the inputs, which is why I need to call the subfunction to calculate and then return the output values. All the examples I can find calculate their outputs directly in "function Output(block)", which I don't think is possible in my case.
I then tried to use the Interpreted MATLAB Function block, but that failed because the output dimension is NOT the same as the input dimension; it also does not support returning more than ONE output.
Dear Sir/Madam,
I read in S-function documentation that "S-function level-1 supports vector inputs and outputs. DOES NOT support multiple input and output ports".
Does the second sentence mean the input and output dimension MUST BE SAME?
I have been using S-function level-1 to do the following:
[a1, b1] = choose_cells(c, d);
where a1 and b1 are outputs, and c and d are inputs. All the variables hold a single value, except d, which is an array with 6 values.
Referring to the attached image: in the S-function block, the input dimension must be the SAME as the output dimension, or we get an error. In this case the input dimension is 7 while the output dimension is 2, so I have to include the "Terminator" blocks in the diagram for it to work; otherwise I get an error.
My problem is that when the system gets bigger, the array d could contain hundreds of variables. With this method I would have to add hundreds of "Terminator" blocks to get this to work, which does not sound practical.
Could you please suggest a better way to implement this?
Thanks in advance.
http://imgur.com/ib6BTTp
http://imageshack.us/content_round.php?page=done&id=4tHclZ2klaGtl66S36zY2KfO5co

Stream filter in cuda

I have an array of values and a linked list of indexes. I only want to keep those values from the array that correspond to the indexes in the linked list. Is there a standard algorithm to do this? Please give an example if possible.
So, suppose I have an array 1,2,5,6,7,9
and I have a linked list 2->3
So I want to keep the values at indexes 2 and 3, that is, keep 5 and 6.
Thus my function should return 5 and 6.
In general, a linked list is inherently serial. Having a parallel machine will not speed up the traversal of your list, hence the number of steps for your problem cannot go below O(n), where n is the size of the list.
However, if you have some additional way to access the list, you can do something with it.
For example, all elements of the list could be stored in a fixed-size array (although not necessarily consecutively). A list member could be represented in the array using the following struct.
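template <typename T>
struct ListNode {
    bool isValid;  // true if this array cell holds a live list member
    T data;        // payload stored in the node
    int next;      // presumably the array index of the next node
};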
The value isValid tells whether a given cell in the array is occupied by a valid list member, or is just an empty cell.
Now, a parallel algorithm would read all cells at once, check whether each one holds valid data, and if so, do something with it.
Second part: each thread that holds a valid index idx into your input array A would have to mark A[idx] as not to be deleted. Once we know which elements of A should be removed and which should not, a parallel compaction algorithm can be applied.
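The mark-then-compact idea can be sketched sequentially in Python, with each loop standing in for a step that a GPU would run in parallel (marking, an exclusive prefix sum over the flags, then a scatter of the kept values). The function name `compact_by_indices` is made up for this example, and the indexes are passed as a plain list rather than a linked list.

```python
def compact_by_indices(values, indices):
    # Step 1 (mark): one "thread" per index flags the element to keep.
    flags = [0] * len(values)
    for idx in indices:
        flags[idx] = 1

    # Step 2 (exclusive scan): positions[i] is the output slot of values[i]
    # if it is kept, i.e. the number of kept elements before it.
    positions = []
    running = 0
    for f in flags:
        positions.append(running)
        running += f

    # Step 3 (scatter): every kept element writes itself to its slot.
    out = [None] * running
    for i, f in enumerate(flags):
        if f:
            out[positions[i]] = values[i]
    return out

print(compact_by_indices([1, 2, 5, 6, 7, 9], [2, 3]))  # [5, 6]
```

Note that the result follows the order of the array, not the order in which the indexes appear in the list.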