What is the value of a dereferenced pointer - dereference

I realized that I had some confusion regarding the value of a dereferenced pointer, as I was reading a C text with the following code snippet:
int main()
{
int matrix[3][10]; // line 3: matrix is tentatively defined
int (* arrPtr)[10] = matrix; // line 4: arrPtr is defined and initialize
(*arrPtr)[0] = 5; // line 5: what is the value of (*arrPtr) ?
My confusion is in regards to the value of *arrPtr in the last line. This is my understanding upto that point.
Line 3, matrix is declard (tentatively defined) to be an array of 3 elements of type array of 10 elements of type int.
Line 4, arrPtr is defined as a pointer to an array of 10 elements of type int. It is also initialized as a ptr to an array of 10 elements (i.e. the first row of matrix)
Now Line 5, arrPtr is dereferenced, yielding the actual array, so it's type is array of 10 ints.
MY question: Why is the value of the array, just the address of the array and not in someway related to it's elements?

The value of the array variable matrix is the array, however it (easily) "degrades" into a pointer to its first item, which you then assign to arrPtr.
To see this, use &matrix (has type int (*)[3][10]) or sizeof matrix (equals sizeof(int) * 3 * 10).
Additionally, there's nothing tentative about that definition.
Edit: I missed the question hiding in the code comments: *arrPtr is an object of type int[10], so when you use [0] on it, you get the first item, to which you then assign 5.
Pointers and arrays are purposefully defined to behave similiarly, and this is sometimes confusing (before you learn the various quirks), but also extremely versatile and useful.

I think you need to clarify your question. If you mean what is the value of printf("%i", arrPtr); then it will be the address of the array. If you mean printf("$i",(*arrPtr)[0] ); then we've got a more meaty question.

In C, arrays are pretty much just a convenience thing. All an “array” variable is is a pointer to the start of a block of data; just as an int [] equates to an int*, i.e. the location in memory of an int, an int [][] is a double pointer, an int**, which points to the location in memory of... another pointer, which in turn points to an actual particular int.

Related

How to use Critcl ByteArray?

I want to try out Critcl to enhance memory performance using a Z-order curve for a 2d-grid. What I need from Critcl is allocation, setter, getter and some size info. Reading about the Critcl ByteArray and examples does not make me confident on how to do it.
How do I create and return a ByteArray (i.e. Z-order curve)?
Any caveats I should know about when using ByteArray?
According to the documentation, you should be using the bytes type instead (when you get a pointer to a structure that has a len field with the number of bytes in it, and an s field that is the pointer to the actual read only block of bytes. (As a char * and not an unsigned char * for reasons I don't know. And why it isn't const is another mystery to me; there are cases where that's indeed true, but you need to look at the o field to figure that out.)
To return a byte array, you use the object (or object0) result type, and make the object with, for example, Tcl_NewByteArrayObj(), or Tcl_NewObj() and Tcl_SetByteArrayLength().
Here's an example (just the command definition) that does trivial byte reversing (since I don't understand Z-order curves at all):
critcl::cproc example {bytes dataPtr} object0 {
Tcl_Obj *result = Tcl_NewObj();
unsigned char *targetBytes = Tcl_SetByteArrayLength(result, dataPtr->len);
for (int i = 0, j = dataPtr->len - 1; j >= 0; i++, j--) {
targetBytes[i] = (unsigned byte) dataPtr->s[j];
}
return result;
}
Naturally, you'll want to read the Critcl usage guide when getting this to work, and if you're going to produce errors (by returning NULL), remember to set an error message in the interpreter. You can get access to that by using Tcl_Interp* interp as your first pseudo-argument to the command you create with critcl::cproc (it's documented, but easily missed).

Cuda/Thrust: remove_if doesn't change device_vector.size()?

I have a somewhat rather simple cuda question that seems like it should be a straight forward operation: removing elements from 1 array based on the value of a 2nd bool array. The steps I take are:
Create a device_vector of bools with the same size as the processed input array.
Call kernel which will set some of the elements from (1) to true
Call remove_if on input array with predicate using processed array from (2).
For each value in the bool array that is set to true, remove the corresponding element from the input array.
What I am seeing is that the input array isn't changed and I am not sure why ?
struct EntryWasDeleted
{
__device__ __host__
bool operator()(const bool ifDeleted)
{ return true; }
};
//This array has about 200-300 elements
//thrust::device_vector<SomeStruct> & arrayToDelete
thrust::device_vector<bool>* deletedEntries =
new thrust::device_vector<bool>(arrayToDelete.size(), false);
cuDeleteTestEntries<<<grid, block>>>( thrust::raw_pointer_cast(arrayToDelete.data()), countToDelete, heapAccess, thrust::raw_pointer_cast(deletedEntries->data()));
cudaDeviceSynchronize();
thrust::remove_if(arrayToDelete.begin(), arrayToDelete.end(), deletedEntries->begin(), EntryWasDeleted());
//I am expecting testEntries to have 0 elements
thrust::host_vector<SomeStruct> testEntries = arrayToDelete;
for( int i = 0; i<testEntries.size(); i++)
{ printf( "%d", testEntries[i].someValue); }
In this sample, I am always returning true in the predicate for testing. However, when I do: testEntries = deletedEntries and output the members. I can validate that deletedEntries is properly filled in with trues and falses.
My expectation would be that testEntries would have 0 elements. But it doesn't and I get an output as if remove_if didn't do anything. ie: the output is showing ALL elements from the input array. I am not sure why? Is there a specific way to remove elements from a device_vector?
So you need to capture the iterator that is being returned from remove_if
thrust::device_vector<SomeStruct>::iterator endIterator =
thrust::remove_if(arrayToDelete.begin(), arrayToDelete.end(),
deletedEntries->begin(), EntryWasDeleted());
Then when you copy data back to the host instead of using thrusts default assignment operator between host and device do this:
thrust::host_vector<SomeStruct> testEntries(arrayToDelete.begin(),endIterator);
As a side note working with arrays of primitives can often be much more efficient. Like can you store the index of your structs in an array instead and operate on those indexes?

partial deconstruction in pattern-matching (F#)

Following a minimal example of an observation (that kind of astonished me):
type Vector = V of float*float
// complete unfolding of type is OK
let projX (V (a,_)) = a
// also works
let projX' x =
match x with
| V (a, _) -> a
// BUT:
// partial unfolding is not Ok
let projX'' (V x) = fst x
// consequently also doesn't work
let projX''' x =
match x with
| V y -> fst y
What is the reason that makes it impossible to match against a partially deconstructed type?
Some partial deconstructions seem to be ok:
// Works
let f (x,y) = fst y
EDIT:
Ok, I now understand the "technical" reason of the behavior described (Thanks for your answers & comments). However, I think that language wise, this behavior feels a bit "unnatural" compared to rest of the language:
"Algebraically", to me, it seems strange to distinguish a type "t" from the type "(t)". Brackets (in this context) are used for giving precedence like e.g. in "(t * s) * r" vs "t * (s * r)". Also fsi answers accordingly, whether I send
type Vector = (int * int)
or
type Vector = int * int
to fsi, the answer is always
type Vector = int * int
Given those observations, one concludes that "int * int" and "(int * int)" denote exactly the same types and thus that all occurrences of one could in any piece of code be replaced with the other (ref. transparency)... which as we have seen is not true.
Further it seems significant that in order to explain the behavior at hand, we had to resort to talk about "how some code looks like after compilation" rather than about semantic properties of the language which imo indicates that there are some "tensions" between language semantics an what the compiler actually does.
In F#
type Vector = V of float*float
is just a degenerated union (you can see that by hovering it in Visual Studio), so it's equivalent to:
type Vector =
| V of float*float
The part after of creates two anonymous fields (as described in F# reference) and a constructor accepting two parameters of type float.
If you define
type Vector2 =
| V2 of (float*float)
there's only one anonymous field which is a tuple of floats and a constructor with a single parameter. As it was pointed out in the comment, you can use Vector2 to do desired pattern matching.
After all of that, it may seem illogical that following code works:
let argsTuple = (1., 1.)
let v1 = V argsTuple
However, if you take into account that there's a hidden pattern matching, everything should be clear.
EDIT:
F# language spec (p 122) states clearly that parenthesis matter in union definitions:
Parentheses are significant in union definitions. Thus, the following two definitions differ:
type CType = C of int * int
type CType = C of (int * int)
The lack of parentheses in the first example indicates that the union case takes two arguments. The parentheses
in the second example indicate that the union case takes one argument that is a first-class tuple value.
I think that such behavior is consistent with the fact that you can define more complex patterns at the definition of a union, e.g.:
type Move =
| M of (int * int) * (int * int)
Being able to use union with multiple arguments also makes much sense, especially in interop situation, when using tuples is cumbersome.
The other thing that you used:
type Vector = int * int
is a type abbreviation which simply gives a name to a certain type. Placing parenthesis around int * int does not make a difference because those parenthesis will be treated as grouping parenthesis.

How to concurrently write and read CUDA array with unique incrementing values?

I have a shared memory array initialized as follows
#define UNDEFINED 0xffffffff
#define DEFINED 0xfffffffe
__shared__ unsigned int array[100];
__shared__ count;
// We have enough threads: blockDim.x > 100
array[threadIdx.x] = UNDEFINED;
// Initialize count
if (threadIdx.x == 0)
count = 0;
The threads have random access to array. When a thread access array, if it is UNDEFINED, it must write a unique value, count, to that element, and then read that value. If the array element is DEFINED or already has a unique value, it must just read the unique value out. The tricky part is that array and count must both be updated by only 1 thread. Atomic functions only update 1 variable not 2. Here's the method that I finally came up with for 1 thread to update both variables while blocking the other threads until it is done.
value = atomicCAS(&array[randomIndex], UNDEFINED, DEFINED);
if (value == UNDEFINED) {
value = atomicAdd(&count, 1);
array[randomIndex] = value;
}
// For case that value == DEFINED_SOURCe, wait for memory
// writes, then store value
__threadfence_block();
value = array[randomSource];
There is some tricky concurrency going on here. I'm not sure that this will work for all cases. Are there better suggestions or comments?
According to your description, the only time an array element will be written to is if it contains the value UNDEFINED. We can leverage this.
A thread will first do an atomicCAS operation on the desired array element. The atomicCAS will be configured to check for the UNDEFINED value. If it is present, it will replace it with DEFINED. If it is not present, it will not replace it.
Based on the return result from atomicCAS, the thread will know if the array element contained UNDEFINED or not. If it did, then the return result from the atomicCAS will be UNDEFINED, and the thread will then go and retrieve the desired unique value from count, and use that to modify the DEFINED value to the desired unique value.
we can do this in one line of code:
// assume idx contains the desired offset into array
if (atomicCAS(array+idx, UNDEFINED, DEFINED) == UNDEFINED) array[idx]=atomicAdd(&count, 1);
A more complete code could be like this:
value = DEFINED;
while (value == DEFINED){
value = atomicCAS(&array[randomIndex], UNDEFINED, DEFINED);
if (value == UNDEFINED) {
value = atomicAdd(&count, 1);
array[randomIndex] = value;}
}
// value now contains the unique value,
// either that was already present in array[randomIndex]
// or the value that was just written there
For have an array of incrementing values, use prefx-sum also called scan algorithms, based on binary tree ower threads. First over local block(shared memory in the name) ? then global over blocks, then add each summ back to each block.
Also it may be efficient for each block to read not one but some values, what are equal of physically "warp size" like 16 int values for example ( i apologize, because i have done this things long time ago and don't know proper sizes and proper names for this things in CUDA).
Ahh, btw,the final values, in case of equal incrementing, could be retrieved as the function from local or global thread.id, so you do not need scan at all

How to store data of a file using thrust::host_vector or device_vector?

The format of data is something like this:
TGCCACAGGTTCCACACAACGGGACTTGGTTGAAATATTGAGATCCTTGGGGGTCTGTTAATCGGAGACAGTATCTCAACCGCAATAAACCC
GTTCACGGGCCTCACGCAACGGGGCCTGGCCTAGATATTGAGGCACCCAACAGCTCTTGGCCTGAGAGTGTTGTCTCGATCACGACGCCAGT
TGCCACAGGTTCCACACAACGGGACTTGGTTGAAATATTGAGATCCTTGGGGGTCTGTTAATCGAAGACAGTATCTCAACCGCAATAAACCT
TGCCACAGGTTCCACACAACGGGACTTGGTTGAAATATTGAGATCCTTGGGGGTCTGTTAATCGAAGACAGTATCTCAACCGCAATAAACCT
Each line contains one sequence, I want to make a pair of (key ,value), key is one sequence and value is 1. Then use reduce_by_key to count the number of each sequence.
But I found that thrust::host_vector can only store one sequence, if I push_back the 2nd sequence the program crashed.
Here is my code:
int main()
{
ifstream input_subset("subset.txt");
thrust::host_vector < string > h_output_subset;
string s;
while (getline(input_subset, s)) {
h_output_subset.push_back(s);
}
cout << h_output_subset.size() << endl;
return 0;
}
Is that possible to store all of data in a host_vector or a device_vector? Or is there any way to solve this problem?
The host_vector segfault was confirmed as a bug in thrust::uninitialised_copy and a patch has been applied to fix it.
The problem doing this with a device_vector is a genuine limitation of CUDA (no std::string support) and can't be avoided. An alternative would be to use a fixed length char[] array as a data member in a device_vector, or use a single large device_vector to hold all the string data, with a second device_vector holding the starting index of each sub-string within the character array.