C - pass array as parameter and change size and content - mysql

UPDATE: I solved my problem (scroll down).
I'm writing a small C program and I want to do the following:
The program is connected to a mysql database (that works perfectly) and I want to do something with the data from the database. I get about 20-25 rows per query and I created my own struct, which should contain the information from each row of the query.
So my struct looks like this:
typedef struct {
int timestamp;
double rate;
char* market;
char* currency;
} Rate;
I want to pass an empty array to a function, the function should calculate the size for the array based on the returned number of rows of the query. E.g. there are 20 rows which are returned from a single SQL query, so the array should contain 20 objectes of my Rate struct.
I want something like this:
int main(int argc, char **argv)
{
Rate *rates = ?; // don't know how to initialize it
(void) do_something_with_rates(&rates);
// the size here should be ~20
printf("size of rates: %d", sizeof(rates)/sizeof(Rate));
}
How does the function do_something_with_rates(Rate **rates) have to look like?
EDIT: I did it as Alex said, I made my function return the size of the array as size_t and passed my array to the function as Rate **rates.
In the function you can access and change the values like (*rates)[i].timestamp = 123 for example.

In C, memory is either dynamically or statically allocated.
Something like int fifty_numbers[50] is statically allocated. The size is 50 integers no matter what, so the compiler knows how big the array is in bytes. sizeof(fifty_numbers) will give you 200 bytes here.
Dynamic allocation: int *bunch_of_numbers = malloc(sizeof(int) * varying_size). As you can see, varying_size is not constant, so the compiler can't figure out how big the array is without executing the program. sizeof(bunch_of_numbers) gives you 4 bytes on a 32 bit system, or 8 bytes on a 64 bit system. The only one that know how big the array is would be the programmer. In your case, it's whoever wrote do_something_with_rates(), but you're discarding that information by either not returning it, or taking a size parameter.
It's not clear how do_something_with_rates() was declared exactly, but something like: void do_something_with_rates(Rate **rates) won't work as the function has no idea how big rates is. I recommend something like: void do_something_with_rates(size_t array_size, Rate **rates). At any rate, going by your requirements, it's still a ways away from working. Possible solutions are below:
You need to either return the new array's size:
size_t do_something_with_rates(size_t old_array_size, Rate **rates) {
Rate **new_rates;
*new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
// carry out your operation on new_rates
// modifying rates
free(*rates); // releasing the memory taken up by the old array
*rates = *new_rates // make it point to the new array
return n; // returning the new size so that the caller knows
}
int main() {
Rate *rates = malloc(sizeof(Rate) * 20);
size_t new_size = do_something_with_rates(20, &rates);
// now new_size holds the size of the new array, which may or may not be 20
return 0;
}
Or pass in a size parameter for the function to set:
void do_something_with_rates(size_t old_array_size, size_t *new_array_size, Rate **rates) {
Rate **new_rates;
*new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
*new_array_size = n; // setting the new size so that the caller knows
// carry out your operation on new_rates
// modifying rates
free(*rates); // releasing the memory taken up by the old array
*rates = *new_rates // make it point to the new array
}
int main() {
Rate *rates = malloc(sizeof(Rate) * 20);
size_t new_size;
do_something_with_rates(20, &new_size, &rates);
// now new_size holds the size of the new array, which may or may not be 20
return 0;
}
Why do I need to pass the old size as a parameter?
void do_something_with_rates(Rate **rates) {
// You don't know what n is. How would you
// know how many rate objects the caller wants
// you to process for any given call to this?
for (size_t i = 0; i < n; ++i)
// carry out your operation on new_rates
}
Everything changes when you have a size parameter:
void do_something_with_rates(size_t size, Rate **rates) {
for (size_t i = 0; i < size; ++i) // Now you know when to stop
// carry out your operation on new_rates
}
This is a very fundamental flaw with your program.
I want to also want the function to change the contents of the array:
size_t do_something_with_rates(size_t old_array_size, Rate **rates) {
Rate **new_rates;
*new_rates = malloc(sizeof(Rate) * n); // allocate n Rate objects
// carry out some operation on new_rates
Rate *array = *new_rates;
for (size_t i = 0; i < n; ++i) {
array[i]->timestamp = time();
// you can see the pattern
}
return n; // returning the new size so that the caller knows
}

sizeof produces a value (or code to produce a value) of the size of a type or the type of an expression at compile time. The size of an expression can therefore not change during the execution of the program. If you want that feature, use a variable, terminal value or a different programming language. Your choice. Whatever. C's better than Java.
char foo[42];
foo has either static storage duration (which is only partially related to the static keyword) or automatic storage duration.
Objects with static storage duration exist from the start of the program to the termination. Those global variables are technically called variables declared at file scope that have static storage duration and internal linkage.
Objects with automatic storage duration exist from the beginning of their initialisation to the return of the function. These are usually on the stack, though they could just as easily be on the graph. They're variables declared at block scope that have automatic storage duration and internal linkage.
In either case, todays compilers will encode 42 into the machine code. I suppose it'd be possible to modify the machine code, though that several thousands of lines you put into that task would be much better invested into storing the size externally (see other answer/s), and this isn't really a C question. If you really want to look into this, the only examples I can think of that change their own machine code are viruses... How are you going to avoid that antivirus heuristic?
Another option is to encode size information into a struct, use a flexible array member and then you can carry both the array and the size around as one allocation. Sorry, this is as close as you'll get to what you want. e.g.
struct T_vector {
size_t size;
T value[];
};
struct T_vector *T_make(struct T_vector **v) {
size_t index = *v ? (*v)->size++ : 0, size = index + 1;
if ((index & size) == 0) {
void *temp = realloc(*v, size * sizeof *(*v)->value);
if (!temp) {
return NULL;
}
*v = temp;
// (*v)->size = size;
*v = 42; // keep reading for a free cookie
}
return (*v)->value + index;
}
#define T_size(v) ((v) == NULL ? 0 : (v)->size)
int main(void) {
struct T_vector *v = NULL; T_size(v) == 0;
{ T *x = T_make(&v); x->value[0]; T_size(v) == 1;
x->y = y->x; }
{ T *y = T_make(&v); x->value[1]; T_size(v) == 2;
y->x = x->y; }
free(v);
}
Disclaimer: I only wrote this as an example; I don't intend to test or maintain it unless the intent of the example suffers drastically. If you want something I've thoroughly tested, use my push_back.
This may seem innocent, yet even with that disclaimer and this upcoming warning I'll likely see a comment along the lines of: Each successive call to make_T may render previously returned pointers invalid... True, and I can't think of much more I could do about that. I would advise calling make_T, modifying the value pointed at by the return value and discarding that pointer, as I've done above (rather explicitly).
Some compilers might even allow you to #define sizeof(x) T_size(x)... I'm joking; don't do this. Do it, mate; it's awesome!
Technically we aren't changing the size of an array here; we're allocating ahead of time and where necessary, reallocating and copying to a larger array. It might seem appealing to abstract allocation away this way in C at times... enjoy :)

Related

Using I2C_master library AVR

I am using I2C_master library for AVR, Communication works fine, but I have little problem, how can I get data.
I am using this function
uint16_t i2c_2byte_readReg(uint8_t devaddr, uint16_t regaddr, uint8_t* data, uint16_t length){
devaddr += 1;
if (i2c_start(devaddr<<1|0)) return 1;
i2c_write(regaddr >> 8);
i2c_write(regaddr & 0xFF);
if (i2c_start(devaddr<<1| 1)) return 1;
for (uint16_t i = 0; i < (length-1); i++)
{
data[i] = i2c_read_ack();
}
data[(length-1)] = i2c_read_nack();
i2c_stop();
return 0;}
And now I need to use received data, and send it by UART to PC
uint8_t* DevId;
i2c_2byte_readReg(address,REVISION_CODE_DEVID,DevId,2);
deviceH=*DevId++;
deviceL=*DevId;
UART_send(deviceH);
UART_send(deviceL);
I think that I am lost with pointers. Could you help me, how can I get received data for future use? (UART works fine for me in this case, but it sends only 0x00 with this code)
The function i2c_2byte_readReg takes as a third argument a pointer to the buffer where the data will be written. Note that it must have size bigger than the forth argument called length. Your DevId pointer doesn't point to any buffer so when calling the function you've got an access violation.
To get the data you should define an array before calling the function:
const size_t size = 8;
uint8_t data[size];
Then you can call the function passing the address of the buffer as an argument (the name of the array is converted into its address):
const uin16_t length = 2;
i2c_2byte_readReg(address, REVISION_CODE_DEVID, data, length);
Assuming that the function works well those two bytes will be saved into data buffer. Remember that size must be bigger or equal to length argument.
Then you can send the data over UART:
UART_send(data[0]);
UART_send(data[1]);

Aggregate results from CUDA threads

I am solving minimal dominant set problem on CUDA. Every thread finds some local candiate result and I need to find the best. I am using __device__ variables for the global result (dev_bestConfig and dev_bestValue).
I need to do something like this:
__device__ configType dev_bestConfig = 0;
__device__ int dev_bestValue = INT_MAX;
__device__ void findMinimalDominantSet(int count, const int *matrix, Lock &lock)
{
// here is some algorithm that finds local bestValue and bestConfig
// set device variables
if (bestValue < dev_bestValue)
{
dev_bestValue = bestValue;
dev_bestConfig = bestConfig;
}
}
I know that this does not work because more threads accesses the memory at the same time so I use this critical section:
// set device variables
bool isSet = false;
do
{
if (isSet = atomicCAS(lock.mutex, 0, 1) == 0)
{
// critical section goes here
if (bestValue < dev_bestValue)
{
dev_bestValue = bestValue;
dev_bestConfig = bestConfig;
}
}
if (isSet)
{
*lock.mutex = 0;
}
} while (!isSet);
This actually works as expected but it is really slow. For example without this critical section it takes 0.1 secodns and with this critical section it takes 1.8 seconds.
What can i do differetly to make it faster?
I actually avoided any critical sections and locking at the end. I saved local results to an array and then searched for the best one. The searching can be done sequentially or by parallel reduction.

Cost of OpenCL get_local_id()

I have a simple scan kernel, which calculates scans of several blocks in a loop. I noticed that performance somewhat rises when get_local_id() is stored inside a local variable instead of calling it inside the loop. So to summarize with code, this:
__kernel void LocalScan_v0(__global const int *p_array, int n_array_size, __global int *p_scan)
{
const int n_group_offset = get_group_id(0) * SCAN_BLOCK_SIZE;
p_array += n_group_offset;
p_scan += n_group_offset;
// calculate group offset
const int li = get_local_id(0); // *** local id cached ***
const int gn = get_num_groups(0);
__local int p_workspace[SCAN_BLOCK_SIZE];
for(int i = n_group_offset; i < n_array_size; i += SCAN_BLOCK_SIZE * gn) {
LocalScan_SingleBlock(p_array, p_scan, p_workspace, li);
p_array += SCAN_BLOCK_SIZE * gn;
p_scan += SCAN_BLOCK_SIZE * gn;
}
// process all the blocks in the array (each block size SCAN_BLOCK_SIZE)
}
Has throughput of 74 GB/s on GTX-780, while this:
__kernel void LocalScan_v0(__global const int *p_array, int n_array_size, __global int *p_scan)
{
const int n_group_offset = get_group_id(0) * SCAN_BLOCK_SIZE;
p_array += n_group_offset;
p_scan += n_group_offset;
// calculate group offset
const int gn = get_num_groups(0);
__local int p_workspace[SCAN_BLOCK_SIZE];
for(int i = n_group_offset; i < n_array_size; i += SCAN_BLOCK_SIZE * gn) {
LocalScan_SingleBlock(p_array, p_scan, p_workspace, get_local_id(0));
// *** local id polled inside the loop ***
p_array += SCAN_BLOCK_SIZE * gn;
p_scan += SCAN_BLOCK_SIZE * gn;
}
// process all the blocks in the array (each block size SCAN_BLOCK_SIZE)
}
Has only 70 GB/s on the same hardware. The only difference is whether the call to get_local_id() is inside or outside of the loop. The code in LocalScan_SingleBlock() is pretty much described in this GPU Gems article.
Now this brings some questions. I always imagined that thread id is stored inside some register, and access to it is as fast as to any thread-local variable. This doesn't seem to be the case. I always used to have habit of caching the local id in a variable with reluctance of an old "C" programmer who wouldn't call a function in a loop, had he expect it to return the same value every time, but I didn't seriously think it would make any difference.
Any ideas as to why this might be? I didn't do any checking on the compiled binary code. Does anyone have the same experience? Is it the same with threadIdx.x in CUDA? How about ATI platforms? Is this behavior described somewhere? I quickly scanned through CUDA Best Practices, but didn't find anything.
This is just a guess, but as per the Khronos page
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/get_local_id.html
get_local_id() isn't defined to return a constant value (merely size_t). That may mean that, as far as the compiler is aware, it may not be allowed to perform certain optimisations compared with a constant local_id because the return of the function value may change in the eyes of the compiler (even though it wont per-thread)

What's the difference between call by reference and copy/restore

What's the difference in the outcome between call by reference and copy/restore?
Background: I'm currently studying distributed systems. Concerning the passing of reference parameters for remote procedure calls, the book states that: "the call by reference has been replaced by copy/restore. Although this is not always identical, it is good enough". I understand how call by reference and copy/restore work in principle, but I fail to see where a difference in the result may be?
Examples taken from here.
Main code:
#include <stdio.h>
int a;
int main() {
a = 3;
f( 4, &a );
printf("%d\n", a);
return 0;
}
Call by Value:
f(int x, int &y){
// x will be 3 as passed argument
x += a;
// now a is added to x so x will be 6
// but now nothing is done with x anymore
a += 2*y;
// a is still 3 so the result is 11
}
Value is passed in and has no effect on the value of the variable passed in.
Call by Reference:
f(int x, int &y){
// x will be 3 as passed argument
x += a;
// now a is added to x so x will be 6
// but because & is used x is the same as a
// meaning if you change x it will change a
a += 2*y;
// a is now 6 so the result is 14
}
Reference is passed in. Effectively the variable in the function is the same as the one outside.
Call with Copy/Restore:
int a;
void unsafe(int x) {
x= 2; //a is still 1
a= 0; //a is now 0
}//function ends so the value of x is now stored in a -> value of a is now 2
int main() {
a= 1;
unsafe(a); //when this ends the value of a will be 2
printf("%d\n", a); //prints 2
}
Value is passed in and has no effect on the value of the variable passed in UNTIL the end of the function, at which point the FINAL value of the function variable is stored in the passed in variable.
The basic difference between call by reference and copy/restore then is that changes made to the function variable will not show up in the passed in variable until after the end of the function while call by reference changes will be seen immediately.
Call by Copy/Restore is a special case of call-by-reference where the provided reference is unique to the caller. The final result on the referenced values will not be saved until the end of the function.
This type of calling is useful when a method in RPC called by reference. The actual data is sent to the server side and the final result will send to the client. This will reduce the traffic, since the server will not update the reference each time.
Call By Reference:
In call-by-reference, we pass a pointer to the called function. Any changes that happens to the data pointed by that pointer will be reflected immediately.
Suppose if there are numerous changes to be made to that data, while it wouldn’t incur much cost locally, it’ll be expensive in terms of network cost as for each change data will have to be copied back to the client.
C Code:
void addTwo(int *arr, int n){
for(int i=0;i<n;i++){
arr[i]+=2; //change is happening in the original data as well
}
}
int main(){
int arr[100]={1,2,3,...}; // assuming it to be initialised
addTwo(arr,100);
}
Call By Copy/Restore:
In call-by-copy/restore, the idea is that when the function is called with the reference to the data, only the final result of the changes made to the data is copied back to the original data(when the function is about to return) without making any changes to the original data during the function call, requiring only one transfer back to the client.
In the C code below, the data pointed by arr is copied in the function and stored back to arr after all the changes to the local data are finalised.
C Code:
void addTwo(int *arr, int n){
// copy data locally
larr = (int*)malloc(n*sizeof(int));
for(int i=0;i<n;i++){
larr[i]=arr[i];
}
for(int i=0;i<n;i++){
// change is happening to the local variable larr
larr[i]+=2;
}
//copy all the changes made to the local variable back to the original data
for(int i=0;i<n;i++){
arr[i]=larr[i];
}
}
int main(){
int arr[100]={1,2,3,...}; // assuming it to be initialised
addTwo(arr,100);
}
Note: Code shown above doesn’t represent actual RPC implementation, just an illustration of the concepts. In real RPC, complete data is passed in the message instead of pointers(addresses).

CUDA binary search implementation

I am trying to speed up the CPU binary search. Unfortunately, GPU version is always much slower than CPU version. Perhaps the problem is not suitable for GPU or am I doing something wrong ?
CPU version (approx. 0.6ms):
using sorted array of length 2000 and do binary search for specific value
...
Lookup ( search[j], search_array, array_length, m );
...
int Lookup ( int search, int* arr, int length, int& m )
{
int l(0), r(length-1);
while ( l <= r )
{
m = (l+r)/2;
if ( search < arr[m] )
r = m-1;
else if ( search > arr[m] )
l = m+1;
else
{
return index[m];
}
}
if ( arr[m] >= search )
return m;
return (m+1);
}
GPU version (approx. 20ms):
using sorted array of length 2000 and do binary search for specific value
....
p_ary_search<<<16, 64>>>(search[j], array_length, dev_arr, dev_ret_val);
....
__global__ void p_ary_search(int search, int array_length, int *arr, int *ret_val )
{
const int num_threads = blockDim.x * gridDim.x;
const int thread = blockIdx.x * blockDim.x + threadIdx.x;
int set_size = array_length;
ret_val[0] = -1; // return value
ret_val[1] = 0; // offset
while(set_size != 0)
{
// Get the offset of the array, initially set to 0
int offset = ret_val[1];
// I think this is necessary in case a thread gets ahead, and resets offset before it's read
// This isn't necessary for the unit tests to pass, but I still like it here
__syncthreads();
// Get the next index to check
int index_to_check = get_index_to_check(thread, num_threads, set_size, offset);
// If the index is outside the bounds of the array then lets not check it
if (index_to_check < array_length)
{
// If the next index is outside the bounds of the array, then set it to maximum array size
int next_index_to_check = get_index_to_check(thread + 1, num_threads, set_size, offset);
if (next_index_to_check >= array_length)
{
next_index_to_check = array_length - 1;
}
// If we're at the mid section of the array reset the offset to this index
if (search > arr[index_to_check] && (search < arr[next_index_to_check]))
{
ret_val[1] = index_to_check;
}
else if (search == arr[index_to_check])
{
// Set the return var if we hit it
ret_val[0] = index_to_check;
}
}
// Since this is a p-ary search divide by our total threads to get the next set size
set_size = set_size / num_threads;
// Sync up so no threads jump ahead and get a bad offset
__syncthreads();
}
}
Even if I try bigger arrays, the time ratio is not any better.
You have way too many divergent branches in your code so you're essentially serializing the entire process on the GPU. You want to break up the work so that all the threads in the same warp take the same path in the branch. See page 47 of the CUDA Best Practices Guide.
I'm must admit I'm not entirely sure what what your kernel does, but am I right in assuming that you are looking for just one index that satisfies your search criteria? If so then have a look at the reduction sample that comes with CUDA for some pointers on how to structure and optimize such a query. (What your are doing is essentially trying to reduce the closest index to your query)
Some quick pointers though:
You are performing an awful lot of reads and writes to global memory, which is incredibly slow. Try using shared memory instead.
Secondly remember that __syncthreads() only syncs threads in the same block, so your reads/writes to global memory won't necessarily get synced across all threads (though the latency from you global memory writes may actually make it appear as if they do)