Stream filter in CUDA

I have an array of values and a linked list of indexes. Now, I only want to keep those values from the array that correspond to the indexes in the linked list. Is there a standard algorithm to do this? Please give an example if possible.
So, suppose I have the array 1,2,5,6,7,9
and the linked list 2->3.
I want to keep the values at indexes 2 and 3, that is, keep 5 and 6.
Thus my function should return 5 and 6.

In general, a linked list is inherently serial. Having a parallel machine will not speed up the traversal of your list, hence the number of steps of your problem cannot go below O(n), where n is the size of the list.
However, if you have some additional way to access the list, you can do something with it.
For example, all elements of the list could be stored in a fixed-size array (although not necessarily in a consecutive way). A list member could then be represented in the array using the following struct:
struct ListNode {
    bool isValid; // true if this cell holds a live list member, false if the cell is empty
    T data;       // the payload of type T
    int next;     // array index of the next list node
};
The flag isValid indicates whether a given cell in the array is occupied by a valid list member or is just an empty cell.
Now, a parallel algorithm can read all cells at once, check whether each one holds valid data, and if so, do something with it.
Second part: each thread holding a valid index idx into your input array A marks A[idx] as not to be deleted. Once we know which elements of A should be kept and which removed, a parallel compaction algorithm can be applied.
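For illustration, here is a minimal sketch of the mark-and-compact step using Thrust, assuming the valid indexes have already been extracted from the list representation into a device array (data taken from the question's example):

#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/copy.h>
#include <thrust/functional.h>
#include <thrust/iterator/permutation_iterator.h>
#include <iostream>

int main() {
    // The array of values from the question: 1 2 5 6 7 9
    int h_vals[] = {1, 2, 5, 6, 7, 9};
    thrust::device_vector<int> values(h_vals, h_vals + 6);

    // Valid indexes recovered from the list (here: 2 -> 3)
    int h_idx[] = {2, 3};
    thrust::device_vector<int> indexes(h_idx, h_idx + 2);

    // Mark phase: flags[i] = 1 iff i appears among the valid indexes
    thrust::device_vector<int> flags(values.size(), 0);
    thrust::fill(thrust::make_permutation_iterator(flags.begin(), indexes.begin()),
                 thrust::make_permutation_iterator(flags.begin(), indexes.end()),
                 1);

    // Compact phase: copy only the values whose flag is set
    thrust::device_vector<int> out(values.size());
    auto out_end = thrust::copy_if(values.begin(), values.end(),
                                   flags.begin(), out.begin(),
                                   thrust::identity<int>());
    out.resize(out_end - out.begin());

    for (int v : out) std::cout << v << ' ';  // prints: 5 6
    std::cout << '\n';
    return 0;
}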

MySQL: Can we create an array (associative or not) as a MySQL server script variable? [duplicate]

I've seen a lot of SQL server script variables of this kind: @variable. But can we, in fact, store an array (associative or not) behind the @variable?
UPDATE
This question turns out to be a duplicate of "How to store arrays in MySQL?", which suggests considering the possible use of:
The SET and JSON types, which seem to be column types only, not @variable types.
A TEMPORARY TABLE, which seems to be stored on disk (right?).
Functions working with JSON strings (e.g., JSON_VALUE and JSON_LENGTH), which are usable entirely within a MySQL server script. However, these functions do not help to derive an array and store it in an @variable; they are merely JSON workarounds. I would accept this variant, but it seems that @json_string is parsed each time we call JSON_VALUE(@json_string).
So, up to now it seems that there IS a way to CREATE an array (associative or not!) but there is NO way to reliably KEEP the array for further processing!
Regarding the question mentioned at the beginning of this one: so far I've only reached the 5th and 6th answers, which are related to JSON strings. They are interesting! Be sure to check them out if you're interested in the subject!
Thanks to everyone for your time!
UPDATE
As @Panagiotis Kanavos has mentioned, fetching data by value is slower in the case of arrays.
But what if:
We indeed want to simply iterate over M input arrays simultaneously and produce N output arrays? (Maybe we are simply interested in collating parameters along a timeline and keeping the results.) Arrays are perfectly suitable for this task, though of course we could still use tables. The question is which will be faster, particularly if our iterative process involves many reads of the arrays' elements (can we rely on the server caching the M input arrays so that they are always at hand?) and the creation of multiple result arrays (how long would that take with tables, and how do we know the tables are created in RAM for fast access?).
We want to create an array manually over the course of a server script, use it only in a C-like style (without fetching its data by value), and have no need for the array after the script finishes? This would be a classic C-like, script-only array. To me, in this case putting the array directly into RAM is what we need, and it should be more efficient than creating a table (which will probably go to disk), won't it?
And so, the 2nd (and more general) question arises: How far can we rely on the server's optimizations?
I understand that a huge amount of work has been put into optimization in many respects. But has anybody met a situation where the server didn't optimize in the best way, and a programmer had to explicitly rearrange the code in order to bring it to the optimal state?
MySQL will implement a data type, ARRAY, to store
variable-sized arrays, in compliance with Standard
SQL (SQL:2003) array functionality.
Syntax
Add a new column data type:
<data type> ARRAY [[<unsigned integer>]]
-- The <data type> may be any data type supported (except
for ARRAY itself, and REF, which MySQL does not support).
It defines the type of data that the array will contain.
-- The <unsigned integer> must be an unsigned integer
greater than zero. It defines the maximum cardinality of
the array, rather than its exact size. Note that the inner
set of brackets is mandatory when defining an array with
a specific size, e.g. INT ARRAY is correct for defining an
array with the default size, INT ARRAY[5] is the correct syntax
for defining an array that will contain 5 elements.
-- As shown in the syntax diagram, [<unsigned integer>]
is optional. If omitted, the maximum cardinality of the
array defaults to an implementation-defined default value.
Oracle's VARRAY size is limited to the maximum number of
columns allowed in a table, so I suggest we make our default
match that maximum. Thus, if [<unsigned integer>] is omitted,
the maximum cardinality of the array defaults to 1000, which
should also be the absolute maximum cardinality. Thus:
-- <unsigned integer> defaults to 1000.
-- <unsigned integer> may range from 1 to 1000.
Function
An array is an ordered collection of elements, possibly
containing data values. The ARRAY data type will be used
to store data arrays in database tables.
Rules
-- An array is an ordered set of elements.
-- The maximum number of elements in the array is known as
the array's maximum cardinality. The maximum cardinality is
defined at the time the array is defined, as is the element
data type.
-- The actual number of elements that contain data values is
known as the array's cardinality. The cardinality of an array
may vary and is not defined at the time the array is defined.
That is, an instance of an array may always contain fewer
elements than the maximum cardinality allows.
-- Each element may contain a data value that corresponds
to the array's defined data type.
-- Each element has three states: blank (no value assigned
to the element), NULL (NULL assigned to the element), and
containing a valid value (a data value assigned to the element).
-- Each element is associated with exactly one ordinal position
in the array. The first array element is found at position 1 (one),
the next at position 2 (two), and so on. Thus, assuming n is the
cardinality of an array, the ordinal position of an array element
is an integer in the range 1 (one) <= element <= n.
-- An array has a maximum cardinality and an actual cardinality.
-- It is an error if one attempts to assign a value to an
array element whose position is greater than the maximum
cardinality of the array.
-- It is not an error if one attempts to assign values to only
some of an array's elements.
-- Privileges:
-- No special privileges are required to create a table
with the ARRAY data type, or to utilize ARRAY data.
-- Comparison:
-- See WL#2084 Add the ability to compare ARRAY data.
-- Assignment:
-- See WL#2082 Add ARRAY element reference function and
WL#2083 Add ARRAY value constructor function.
Other statements
-- Two other syntax elements must be implemented for
the ARRAY data type to be useful. See WL#2082 for
Array Element Reference syntax and WL#2083 for Array Constructor
syntax.
-- Also related:
-- CARDINALITY(). See WL#2085.
-- Array concatenation. See WL#.
An example
Create a table with the new data type:
CREATE TABLE ArrayTable (array_column INT ARRAY[3]);
Insert data:
INSERT INTO ArrayTable (array_column)
VALUES (ARRAY[10,20,30]);
Retrieve data:
SELECT array_column FROM ArrayTable
WHERE array_column <> ARRAY[];
-- Returns all cases where array_column is not an empty array
Reserved words
ARRAY, eventually CARDINALITY

Why did the designer make vectors, maps, and sets functions in Clojure?

Rich made vectors, maps, and sets functions, while lists and sequences are not functions.
Why can't all these collections be functions, to make the language consistent?
Further, why don't we make all these composite data structures functions that map a position to its internal data?
If we made all composite data function-like, there would be only functions and atomic data in Clojure. Wouldn't this minimize the number of fundamental elements of the language?
I believe a minimal set of fundamental elements (ideally only 2) would make the language simpler, more expressive, and more flexible. Is this correct?
Vectors, maps, and sets are all associative data structures. Maps are the most obvious; they simply associate arbitrary keys with arbitrary values. A vector can be thought of as a map whose key set must be the set of all nonnegative integers less than the vector's size. Finally, sets can be thought of as maps that map keys to themselves.
It's important to understand that the sequential nature of a vector and the associative nature of a vector are two orthogonal things. It's a data structure that's designed to be good at supporting both abstractions (to some extent; for instance, you can't efficiently insert at the beginning of a vector).
Lists are simpler than vectors; they are finite sequential data structures, nothing more. A list can't efficiently return the element at a particular index, so it doesn't expose that functionality as part of its core interface. Of course, you can get an element of a list by index using nth, but in that case, you're explicitly treating it as a sequence, not as an associative structure.
So to answer your question, the IFn implementations for vectors, maps, and sets are there because of the extremely close relationship between the idea of an associative data structure and the idea of a pure function. Lists and other sequences are not inherently associative, so for consistency, they do not implement IFn.
Elogent's answer is excellent. There is one more reason that it wouldn't make sense for lists to be functions:
Literal lists already have a different, very important role, so they can't also be treated as functions in the way that vectors are.
Let's start with a vector containing two functions, partial and +, and a number, 5. We can treat the vector as a function, as you know, to return the value indexed by its argument:
user=> ([partial + 5] 2)
5
So far, so good. Suppose we want to use a list (partial + 5) in place of the vector, as you suggested, to return the value 5. Will we get an error message? No! But we won't get 5 as the result, either:
user=> ((partial + 5) 2)
7
What happened? (partial + 5) returned a function--the function that adds 5 to its single argument--and then this function was applied to the argument 2.
When a list is evaluated, its first element is evaluated first, and should return a function. If the first element is a symbol, the symbol is evaluated, and then the function that is its value is applied to the arguments, which are the other elements of the list. If the first element of a list is itself a list, then it is evaluated in the same way that it would be at the top level. The entire expression in that inner list should return a function, which will then be applied to the other elements of the outer list.
Since an inner list that's the first element of a list that's being evaluated already has this role, it can't also play the kind of role that vectors in first position play.

Sorting more than 10 integer sequences by key with Thrust

I want to perform a sort_by_key where I have a single key-sequence
and multiple value sequences.
One usually performs this with
sort_by_key(key, key + N,
            make_zip_iterator(make_tuple(x1, x2, ...)));
However, I want to perform a sort with more than 10 sequences, each of length N. Thrust does not support tuples with more than 10 elements. Is there a way around this?
Of course, one can keep a separate copy of the key vector and perform sorts on batches of 10 sequences at a time, but I would like to do everything in a single call.
thrust::tuple is hardcoded to always have 10 elements, so there isn't a direct way to form a zip_iterator from more than ten individual iterators, and therefore no way of sorting more than 10 distinct iterators by key in a single fused operation (and, implicitly, no way of passing more than 10 iterators into a user functor).
If you really can't think of a useful way to combine some of the individual vectors into a single iterator (for example, forming a vector of tuple values), then one alternative is to use permutation iterators. If you fill an array from a counting iterator and sort that by key, something like:
device_vector<int> indices(N);
copy(make_counting_iterator(0), make_counting_iterator(N), indices.begin());
sort_by_key(key, key+N, indices);
indices now holds ordered indices into the vectors you would otherwise have sorted. You can then create a permutation iterator, which can be used to "gather" the input data by your key as part of subsequent algorithm calls. You can make as many permutation iterators as needed, and they can be permutations of zip iterators, providing different "views" of the 12 input iterators as you need them in subsequent code.
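A minimal sketch of that approach, with a single stand-in value vector (the real code would create one permutation iterator per input vector):

#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/iterator/permutation_iterator.h>
#include <iostream>

int main() {
    const int N = 4;
    thrust::device_vector<int>   keys(N);
    thrust::device_vector<float> x1(N);
    int   k[] = {3, 1, 2, 0};
    float a[] = {30.f, 10.f, 20.f, 0.f};
    thrust::copy(k, k + N, keys.begin());
    thrust::copy(a, a + N, x1.begin());

    // Sort an index sequence by key instead of sorting the payload vectors
    thrust::device_vector<int> indices(N);
    thrust::sequence(indices.begin(), indices.end());
    thrust::sort_by_key(keys.begin(), keys.end(), indices.begin());

    // A permutation iterator presents x1 in key order without moving its data;
    // the same one-liner works for each of the remaining input vectors.
    auto x1_sorted = thrust::make_permutation_iterator(x1.begin(), indices.begin());
    for (int i = 0; i < N; ++i)
        std::cout << x1_sorted[i] << ' ';   // prints: 0 10 20 30
    std::cout << '\n';
    return 0;
}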
Alternatively, you may perform only one thrust::sort_by_key operation on an index sequence and then rearrange each data vector with the result. Note that the index map this sort produces is a gather map (indices[i] is the original location of the i-th smallest key), so thrust::gather, rather than thrust::scatter, pulls the values into their sorted locations:
thrust::sequence(indices.begin(), indices.end());
thrust::sort_by_key(keyvals.begin(), keyvals.end(), indices.begin());
// now indices keeps the original locations of the sorted key values
for (/* each data vector and its sorteddata output */) {
    thrust::gather(indices.begin(), indices.end(), data.begin(), sorteddata.begin());
}
Gather and scatter operations are quite powerful and open up many possibilities.

Associative array with int as key

In my application I want to have a dictionary where the key is an integer.
Since the key is an integer, I use a normal Array:
var arr : Array = [];
arr[5] = anObject;
arr[82] = anOtherObject;
When I iterate with for each, no problem: it iterates through those 2 objects. The problem is that arr.length returns 83... So I have to keep a separate variable that counts the elements as I modify the array.
Question 1: Is there a best practice for this (i.e., an associative array with an int as the key)? I hesitated to use a Dictionary.
Question 2: Does Flash allocate memory for the unused buckets of the array?
Arrays in Flash are sparse (unlike Vector), so the empty entries will not be allocated. If you need to know the length, you will probably need to keep track of it manually (make a wrapper class, perhaps).
Adobe says:
Arrays are sparse arrays, meaning there might be an element at index 0 and another at index 5, but nothing in the index positions between those two elements. In such a case, the elements in positions 1 through 4 are undefined, which indicates the absence of an element, not necessarily the presence of an element with the value undefined.

Generating unique codes that differ in at least two digits

I want to generate unique code numbers (composed of exactly 7 digits). The code numbers are generated randomly and saved in a MySQL table.
I have another requirement: all generated codes should differ in at least two digits. This is useful for preventing typing errors in user codes; it makes it unlikely that mistyping a single digit of one code produces another existing user code.
The generation algorithm works simply, like this (a small sketch of the digit-difference check follows the list):
1. Retrieve all previous codes, if any, from the MySQL table.
2. Generate one code.
3. Subtract the generated code from each previous code.
4. Check the number of non-zero digits in each subtraction result.
5. If it is > 1 for every previous code, accept the generated code and add it to the previous codes.
6. Otherwise, jump back to step 2.
7. Repeat steps 2 to 6 for the number of requested codes.
8. Save the generated codes in the DB table.
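Here is a small sketch of the digit check from steps 3 and 4, comparing digit strings position by position, which directly implements the "differ in at least two digits" requirement (digit_distance is an illustrative helper, not part of the original code):

#include <string>
#include <cstdio>

// Count the positions at which two equal-length digit strings differ.
int digit_distance(const std::string& a, const std::string& b) {
    int diff = 0;
    for (size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i]) ++diff;
    return diff;
}

int main() {
    // Codes must differ in at least two digit positions to be accepted.
    std::printf("%d\n", digit_distance("1234567", "1234557")); // 1 -> reject
    std::printf("%d\n", digit_distance("1234567", "1239967")); // 2 -> accept
    return 0;
}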
The algorithm works fine, but the problem is performance: it takes a very long time to finish when a large number of codes is requested, e.g. 10,000.
The question: Is there any way to improve the performance of this algorithm?
I am using perl + MySQL on Ubuntu server if that matters.
Have you considered a variant of the Luhn algorithm? Luhn is used to generate a check digit for strings of numbers in lots of applications, including credit card account numbers. It's part of the ISO-7812-1 standard for generating identifiers. It will catch any number that is entered with one incorrect digit, which implies that any two valid numbers differ in at least two digits.
Check out Algorithm::LUHN in CPAN for a perl implementation.
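For reference, a short sketch of the standard Luhn check-digit computation in C++ (a generic illustration of the algorithm, not the CPAN module's API):

#include <string>
#include <iostream>

// Double every second digit from the right, subtract 9 when the result
// exceeds 9, sum everything, then pick the digit that makes the total a
// multiple of 10.
int luhn_check_digit(const std::string& digits) {
    int sum = 0;
    bool dbl = true; // the digit immediately left of the check digit is doubled
    for (int i = (int)digits.size() - 1; i >= 0; --i, dbl = !dbl) {
        int d = digits[i] - '0';
        if (dbl) { d *= 2; if (d > 9) d -= 9; }
        sum += d;
    }
    return (10 - sum % 10) % 10;
}

int main() {
    std::string body = "752961";                           // 6 random digits
    std::string code = body + char('0' + luhn_check_digit(body));
    std::cout << code << '\n';                             // prints: 7529613
    return 0;
}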
Don't retrieve the existing codes; just generate a potential new code and see if there are any conflicting ones in the database:
SELECT code FROM table WHERE abs(code-?) REGEXP '^[1-9]?0*$';
(where the placeholder is the newly generated code). This works because two codes that differ in exactly one digit position have an absolute difference of the form d*10^k, i.e. a single non-zero digit followed by zeros, and identical codes give 0; that is exactly what the pattern matches.
Ah, I missed the generating lots of codes at once part. Do it like this (completely untested):
my @codes = existing_codes();
my $frontwards_index = {};
my $backwards_index = {};
for my $code (@codes) {
    index_code($code, $frontwards_index);
    index_code(reverse($code), $backwards_index);
}
my @new_codes = map generate_code($frontwards_index, $backwards_index), 1..10000;

sub index_code {
    my ($code, $index) = @_;
    # bucket codes by their front half; two codes that differ in at most
    # one digit must agree on either the front half or the back half
    push @{ $index->{ substr($code, 0, length($code)/2) } }, $code;
    return;
}

sub check_index {
    my ($code, $index) = @_;
    # xor leaves a NUL byte at each matching position; count the non-NULs
    my $found = grep { ($_ ^ $code) =~ y/\0//c <= 1 }
                @{ $index->{ substr($code, 0, length($code)/2) } || [] };
    return $found;
}

sub generate_code {
    my ($frontwards_index, $backwards_index) = @_;
    my $new_code;
    do {
        $new_code = sprintf("%07d", rand(10000000));
    } while check_index($new_code, $frontwards_index)
        || check_index(reverse($new_code), $backwards_index);
    index_code($new_code, $frontwards_index);
    index_code(reverse($new_code), $backwards_index);
    return $new_code;
}
Put the numbers 0 through 9,999,999 in an augmented binary search tree. The augmentation keeps track of the number of sub-nodes to the left and right of each node. So, for example, when your algorithm begins, the top node should have the value 5,000,000, and it should know that it has 5,000,000 nodes to its left and 4,999,999 nodes to its right. Now create a hashtable. For each value you've used already, remove its node from the augmented binary search tree and add the value to the hashtable. Make sure to maintain the augmentation.
To get a single value, follow these steps.
Use the top node to determine how many nodes are left in the tree. Let's say you have n nodes left. Pick a random number k between 1 and n. Using the augmentation, you can find the k-th node in your tree in O(log n) time.
Once you've found that node, compute all the values that would make the value at that node invalid. Let's say your node has the value 1,111,111. If you have already used 2,111,111 or 3,111,111 or..., then you can't use 1,111,111. Since there are 9 other options per digit and 7 digits, you only need to check 63 possible values. Check whether any of those values are in your hashtable. If you haven't used any of them yet, you can use your random node; if you have used any of them, you can't.
Remove your node from the augmented tree. Make sure that you maintain the augmented information.
If you can't use that value, return to step 1.
If you can use that value, you have a new random code. Add it to the hashtable.
Now, checking to see if a value is available takes O(1) time instead of O(n) time. Also, finding another available random value to check takes O(log n) time instead of... ah... I'm not sure how to analyze your algorithm.
Long story short, if you start from scratch and use this algorithm, you will generate a complete list of valid codes in O(n log n). Since n is 10,000,000, it will take a few seconds or something.
Did I do the math right there everybody? Let me know if that doesn't check out or if I need to clarify anything.
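For concreteness, here is a hedged sketch of the rank-selection part, substituting a Fenwick (binary indexed) tree for the augmented BST; both answer "find the k-th remaining value" in O(log n):

#include <cstdio>
#include <random>
#include <vector>

struct Fenwick {
    std::vector<int> t;
    int n;
    explicit Fenwick(int n_) : t(n_ + 1), n(n_) {
        for (int i = 1; i <= n; ++i) t[i] = i & -i; // every value starts present
    }
    void add(int i, int d) { for (++i; i <= n; i += i & -i) t[i] += d; }
    // 0-based index of the k-th remaining value (k is 1-based)
    int kth(int k) const {
        int pos = 0;
        for (int pw = 1 << 23; pw > 0; pw >>= 1)  // 2^23: largest power of two <= n
            if (pos + pw <= n && t[pos + pw] < k) { pos += pw; k -= t[pos]; }
        return pos;
    }
};

int main() {
    const int N = 10000000;  // all 7-digit codes: 0000000 .. 9999999
    Fenwick avail(N);
    std::mt19937 rng(42);

    int remaining = N;
    // Draw one uniformly random unused code and remove it from the pool.
    int k = std::uniform_int_distribution<int>(1, remaining)(rng);
    int code = avail.kth(k);
    avail.add(code, -1);
    --remaining;
    std::printf("candidate code: %07d\n", code);
    // Before accepting `code`, check its 63 one-digit variants against the
    // hashtable of used codes, as described above.
    return 0;
}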
Use a hash.
After generating a successful code (one not conflicting with any existing code), put that code in the hash table, and also put the 63 other codes that differ from it by exactly one digit into the hash.
To see if a randomly generated code conflicts with an existing code, just check whether that code exists in the hash.
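A small sketch of that blocking step (block_code is an illustrative name):

#include <string>
#include <unordered_set>
#include <iostream>

// After accepting a code, insert it and its 63 one-digit variants;
// a later candidate conflicts iff it is already in the set.
void block_code(const std::string& code, std::unordered_set<std::string>& used) {
    used.insert(code);
    for (size_t i = 0; i < code.size(); ++i)   // 7 positions
        for (char d = '0'; d <= '9'; ++d)       // 9 other digits each
            if (d != code[i]) {
                std::string v = code;
                v[i] = d;
                used.insert(v);                 // 7 * 9 = 63 variants
            }
}

int main() {
    std::unordered_set<std::string> used;
    block_code("1234567", used);
    std::cout << used.count("1234568") << '\n'; // 1: conflicts (one digit off)
    std::cout << used.count("1234588") << '\n'; // 0: ok (two digits off)
    return 0;
}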
How about:
Generate a 6 digit code by autoincrementing the previous one.
Generate a 1 digit code by incrementing the previous one mod 10.
Concatenate the two.
Presto, guaranteed to differ in two digits. :D
(Yes, I'm being slightly facetious. I'm assuming that 'random', or at least quasi-random, is necessary. In that case, generate a 6-digit random key, repeat until it's not a duplicate (i.e., make the column unique and repeat until the insert doesn't fail the constraint), then generate a check digit, as someone already said.)