How do a #defined function work in C? - function

I have this piece of code which output is 4. I assumed the answer be 3 because of the pre-increment. Can anyone explain this??
#include<iostream>
#include<cstdio>
#define MAX(A,B) ((A>B)? A : B)
using namespace std;
int main()
{
int i=1,j=2,k;
k= MAX(++i,++j);
cout<<k;
return 0;
}

#define does not work like a function, think of it more like a find and replace so doing the macro expansion manually you get
int main()
{
int i=1,j=2,k;
k= ((++i > ++j) ? ++i : ++j);
cout<<k;
return 0;
}
This means you increment i and j once when you compare then, and increment the larger of the two another time before assigning it to k. I generally avoid including pre and post increment instructions inside of over logic since it is harder to reason about. You are better just incrementing i and j on their own lines before using them in MAX

Macros are not functions, they just perform text substitution pre-compile time. Your line of code becomes
k=((++i>++j)? ++i : ++j);
which clearly increments j twice.

It's called the conditional operator (or ternary operator) which is used in macro substitution
#define MAX(a,b) ((a) > (b) ? (a) : (b))
Means:
if ((a) > (b)){
return a;
} else {
return b;
}
So if you would do:
int test = MAX(5,10);
test would be 10

Related

Converting uint32_t to binary in C

The main problem I'm having is to read out values in binary in C. Python and C# had some really quick/easy functions to do this, I found topic about how to do it in C++, I found topic about how to convert int to binary in C, but not how to convert uint32_t to binary in C.
What I am trying to do is to read bit by bit the 32 bits of the DR_REG_RNG_BASE address of an ESP32 (this is the address where the random values of the Random Hardware Generator of the ESP are stored).
So for the moment I was doing that:
#define DR_REG_RNG_BASE 0x3ff75144
void printBitByBit( ){
// READ_PERI_REG is the ESP32 function to read DR_REG_RNG_BASE
uint32_t rndval = READ_PERI_REG(DR_REG_RNG_BASE);
int i;
for (i = 1; i <= 32; i++){
int mask = 1 << i;
int masked_n = rndval & mask;
int thebit = masked_n >> i;
Serial.printf("%i", thebit);
}
Serial.println("\n");
}
At first I thought it was working well. But in fact it takes me out of binary representations that are totally false. Any ideas?
Your shown code has a number of errors/issues.
First, bit positions for a uint32_t (32-bit unsigned integer) are zero-based – so, they run from 0 thru 31, not from 1 thru 32, as your code assumes. Thus, in your code, you are (effectively) ignoring the lowest bit (bit #0); further, when you do the 1 << i on the last loop (when i == 32), your mask will (most likely) have a value of zero (although that shift is, technically, undefined behaviour for a signed integer, as your code uses), so you'll also drop the highest bit.
Second, your code prints (from left-to-right) the lowest bit first, but you want (presumably) to print the highest bit first, as is normal. So, you should run the loop with the i index starting at 31 and decrement it to zero.
Also, your code mixes and mingles unsigned and signed integer types. This sort of thing is best avoided – so it's better to use uint32_t for the intermediate values used in the loop.
Lastly (as mentioned by Eric in the comments), there is a far simpler way to extract "bit n" from an unsigned integer: just use value >> n & 1.
I don't have access to an Arduino platform but, to demonstrate the points made in the above discussion, here is a standard, console-mode C++ program that compares the output of your code to versions with the aforementioned corrections applied:
#include <iostream>
#include <cstdint>
#include <inttypes.h>
int main()
{
uint32_t test = 0x84FF0048uL;
int i;
// Your code ...
for (i = 1; i <= 32; i++) {
int mask = 1 << i;
int masked_n = test & mask;
int thebit = masked_n >> i;
printf("%i", thebit);
}
printf("\n");
// Corrected limits/order/types ...
for (i = 31; i >= 0; --i) {
uint32_t mask = (uint32_t)(1) << i;
uint32_t masked_n = test & mask;
uint32_t thebit = masked_n >> i;
printf("%"PRIu32, thebit);
}
printf("\n");
// Better ...
for (i = 31; i >= 0; --i) {
printf("%"PRIu32, test >> i & 1);
}
printf("\n");
return 0;
}
The three lines of output (first one wrong, as you know; last two correct) are:
001001000000000111111110010000-10
10000100111111110000000001001000
10000100111111110000000001001000
Notes:
(1) On the use of the funny-looking "%"PRu32 format specifier for printing the uint32_t types, see: printf format specifiers for uint32_t and size_t.
(2) The cast on the (uint32_t)(1) constant will ensure that the bit-shift is safe, even when int and unsigned are 16-bit types; without that, you would get undefined behaviour in such a case.
When you printing out a binary string representation of a number, you print the Most Signification Bit (MSB) first, whether the number is a uint32_t or uint16_t, so you will need to have a mask for detecting whether the MSB is a 1 or 0, so you need a mask of 0x80000000, and shift-down on each iteration.
#define DR_REG_RNG_BASE 0x3ff75144
void printBitByBit( ){
// READ_PERI_REG is the ESP32 function to read DR_REG_RNG_BASE
uint32_t rndval = READ_PERI_REG(DR_REG_RNG_BASE);
Serial.println(rndval, HEX); //print out the value in hex for verification purpose
uint32_t mask = 0x80000000;
for (int i=1; i<32; i++) {
Serial.println((rndval & mask) ? "1" : "0");
mask = (uint32_t) mask >> 1;
}
Serial.println("\n");
}
For Arduino, there are actually a couple of built-in functions that can print out the binary string representation of a number. Serial.print(x, BIN) allows you to specify the number base on the 2nd function argument.
Another function that can achieve the same result is itoa(x, str, base) which is not part of standard ANSI C or C++, but available in Arduino to allow you to convert the number x to a str with number base specified.
char str[33];
itoa(rndval, str, 2);
Serial.println(str);
However, both functions does not pad with leading zero, see the result here:
36E68B6D // rndval in HEX
00110110111001101000101101101101 // print by our function
110110111001101000101101101101 // print by Serial.print(rndval, BIN)
110110111001101000101101101101 // print by itoa(rndval, str, 2)
BTW, Arduino is c++, so don't use c tag for your post. I changed it for you.

How to bring equal elements together using thrust without sort

I have an array of elements such that each element defines the "equal to" operator only.
In other words no ordering is defined for such type of element.
Since I can't use thrust::sort as in the thrust histogram example how can I bring equal elements together using thrust?
For example:
my array is initially
a e t b c a c e t a
where identical characters represent equal elements.
After the elaboration, the array should be
a a a t t b c c e e
but it can be also
a a a c c t t e e b
or any other permutation.
I would recommend that you follow an approach such as that laid out by #m.s. in the posted answer there. As I stated in the comments, ordering of elements is an extremely useful mechanism that aids in the reduction of complexity for problems like this.
However the question as posed asks if it is possible to group like elements without sorting. With an inherently parallel processor like a GPU, I spent some time thinking about how it might be accomplished without sorting.
If we have both a large number of objects, as well as a large number of unique object types, then I think it's possible to bring some level of parallelism to the problem, however my approach outlined here will still have atrocious, scattered memory access patterns. For the case where there are only a small number of distinct or unique object types, the algorithm I am discussing here has little to commend it. This is just one possible approach. There may well be other, far better approaches:
The starting point is to develop a set of "linked lists" that indicate the matching neighbor to the left and the matching neighbor to the right, for each element. This is accomplished via my search_functor and thrust::for_each, on the entire data set. This step is reasonably parallel and also has reasonable memory access efficiency for large data sets, but it does require a worst-case traversal of the entire data set from start to finish (a side-effect, I would call it, of not being able to use ordering; we must compare every element to other elements until we find a match). The generation of two linked lists allows us to avoid all-to-all comparisons.
Once we have the lists (right-neighbor and left-neighbor) built from step 1, it's an easy matter to count the number of unique objects, using thrust::count.
We then get the starting indexes of each unique element (i.e. the leftmost index of each type of unique element, in the dataset), using thrust::copy_if stream compaction.
The next step is to count the number of instances of each of the unique elements. This step is doing list traversal, one thread per element list. If I have a small number of unique elements, this will not effectively utilize the GPU. In addition, the list traversal will result in lousy access patterns.
After we have counted the number of each type of object, we can then build a sequence of starting indices for each object type in the output list, via thrust::exclusive_scan on the numbers of each type of object.
Finally, we can copy each input element to it's appropriate place in the output list. Since we have no way to group or order the elements yet, we must again resort to list traversal. Once again, this will be inefficient use of the GPU if the number of unique object types is small, and will also have lousy memory access patterns.
Here's a fully worked example, using your sample data set of characters. To help clarify the idea that we intend to group objects that have no inherent ordering, I have created a somewhat arbitrary object definition (my_obj), that has the == comparison operator defined, but no definition for < or >.
$ cat t707.cu
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/for_each.h>
#include <thrust/transform.h>
#include <thrust/transform_scan.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/copy.h>
#include <thrust/count.h>
#include <iostream>
template <typename T>
class my_obj
{
T element;
int index;
public:
__host__ __device__ my_obj() : element(0), index(0) {};
__host__ __device__ my_obj(T a) : element(a), index(0) {};
__host__ __device__ my_obj(T a, int idx) : element(a), index(idx) {};
__host__ __device__
T get() {
return element;}
__host__ __device__
void set(T a) {
element = a;}
__host__ __device__
int get_idx() {
return index;}
__host__ __device__
void set_idx(int idx) {
index = idx;}
__host__ __device__
bool operator ==(my_obj &e2)
{
return (e2.get() == this->get());
}
};
template <typename T>
struct search_functor
{
my_obj<T> *data;
int end;
int *rn;
int *ln;
search_functor(my_obj<T> *_a, int *_rn, int *_ln, int len) : data(_a), rn(_rn), ln(_ln), end(len) {};
__host__ __device__
void operator()(int idx){
for (int i = idx+1; i < end; i++)
if (data[idx] == data[i]) {
ln[i] = idx;
rn[idx] = i;
return;}
return;
}
};
template <typename T>
struct copy_functor
{
my_obj<T> *data;
my_obj<T> *result;
int *rn;
copy_functor(my_obj<T> *_in, my_obj<T> *_out, int *_rn) : data(_in), result(_out), rn(_rn) {};
__host__ __device__
void operator()(const thrust::tuple<int, int> &t1) const {
int idx1 = thrust::get<0>(t1);
int idx2 = thrust::get<1>(t1);
result[idx1] = data[idx2];
int i = rn[idx2];
int j = 1;
while (i != -1){
result[idx1+(j++)] = data[i];
i = rn[i];}
return;
}
};
struct count_functor
{
int *rn;
int *ot;
count_functor(int *_rn, int *_ot) : rn(_rn), ot(_ot) {};
__host__ __device__
int operator()(int idx1, int idx2){
ot[idx1] = idx2;
int i = rn[idx1];
int count = 1;
while (i != -1) {
ot[i] = idx2;
count++;
i = rn[i];}
return count;
}
};
using namespace thrust::placeholders;
int main(){
// data setup
char data[] = { 'a' , 'e' , 't' , 'b' , 'c' , 'a' , 'c' , 'e' , 't' , 'a' };
int sz = sizeof(data)/sizeof(char);
for (int i = 0; i < sz; i++) std::cout << data[i] << ",";
std::cout << std::endl;
thrust::host_vector<my_obj<char> > h_data(sz);
for (int i = 0; i < sz; i++) { h_data[i].set(data[i]); h_data[i].set_idx(i); }
thrust::device_vector<my_obj<char> > d_data = h_data;
// create left and right neighbor indices
thrust::device_vector<int> ln(d_data.size(), -1);
thrust::device_vector<int> rn(d_data.size(), -1);
thrust::for_each(thrust::counting_iterator<int>(0), thrust::counting_iterator<int>(0) + sz, search_functor<char>(thrust::raw_pointer_cast(d_data.data()), thrust::raw_pointer_cast(rn.data()), thrust::raw_pointer_cast(ln.data()), d_data.size()));
// determine number of unique objects
int uni_objs = thrust::count(ln.begin(), ln.end(), -1);
// determine the number of instances of each unique object
// get object starting indices
thrust::device_vector<int> uni_obj_idxs(uni_objs);
thrust::copy_if(thrust::counting_iterator<int>(0), thrust::counting_iterator<int>(0)+d_data.size(), ln.begin(), uni_obj_idxs.begin(), (_1 == -1));
// count each object list
thrust::device_vector<int> num_objs(uni_objs);
thrust::device_vector<int> obj_type(d_data.size());
thrust::transform(uni_obj_idxs.begin(), uni_obj_idxs.end(), thrust::counting_iterator<int>(0), num_objs.begin(), count_functor(thrust::raw_pointer_cast(rn.data()), thrust::raw_pointer_cast(obj_type.data())));
// at this point, we have built object lists that have allowed us to identify a unique, orderable "type" for each object
// the sensible thing to do would be to employ a sort_by_key on obj_type and an index sequence at this point
// and use the reordered index sequence to reorder the original objects, thus grouping them
// however... without sorting...
// build output vector indices
thrust::device_vector<int> copy_start(num_objs.size());
thrust::exclusive_scan(num_objs.begin(), num_objs.end(), copy_start.begin());
// copy (by object type) input to output
thrust::device_vector<my_obj<char> > d_result(d_data.size());
thrust::for_each(thrust::make_zip_iterator(thrust::make_tuple(copy_start.begin(), uni_obj_idxs.begin())), thrust::make_zip_iterator(thrust::make_tuple(copy_start.end(), uni_obj_idxs.end())), copy_functor<char>(thrust::raw_pointer_cast(d_data.data()), thrust::raw_pointer_cast(d_result.data()), thrust::raw_pointer_cast(rn.data())));
// display results
std::cout << "Grouped: " << std::endl;
for (int i = 0; i < d_data.size(); i++){
my_obj<char> temp = d_result[i];
std::cout << temp.get() << ",";}
std::cout << std::endl;
for (int i = 0; i < d_data.size(); i++){
my_obj<char> temp = d_result[i];
std::cout << temp.get_idx() << ",";}
std::cout << std::endl;
return 0;
}
$ nvcc -o t707 t707.cu
$ ./t707
a,e,t,b,c,a,c,e,t,a,
Grouped:
a,a,a,e,e,t,t,b,c,c,
0,5,9,1,7,2,8,3,4,6,
$
In the discussion we found out that your real goal is to eliminate duplicates in a vector of float4 elements.
In order to apply thrust::unique the elements need to be sorted.
So you need a sort method for 4 dimensional data. This can be done using space-filling curves. I have previously used the z-order curve (aka morton code) to sort 3D data. There are efficient CUDA implementations for the 3D case available, however quick googling did not return a ready-to-use implementation for the 4D case.
I found a paper which lists a generic algorithm for sorting n-dimensional data points using the z-order curve:
Fast construction of k-Nearest Neighbor Graphs for Point Clouds
(see Algorithm 1 : Floating Point Morton Order Algorithm).
There is also a C++ implementation available for this algorithm.
For 4D data, the loop could be unrolled, but there might be simpler and more efficient algorithms available.
So the (not fully implemented) sequence of operations would then look like this:
#include <thrust/device_vector.h>
#include <thrust/unique.h>
#include <thrust/sort.h>
inline __host__ __device__ float dot(const float4& a, const float4& b)
{
return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
struct identity_4d
{
__host__ __device__
bool operator()(const float4& a, const float4& b) const
{
// based on the norm function you provided in the discussion
return dot(a,b) < (0.1f*0.1f);
}
};
struct z_order_4d
{
__host__ __device__
bool operator()(const float4& p, const float4& q) const
{
// you need to implement the z-order algorithm here
// ...
}
};
int main()
{
const int N = 100;
thrust::device_vector<float4> data(N);
// fill the data
// ...
thrust::sort(data.begin(),data.end(), z_order_4d());
thrust::unique(data.begin(),data.end(), identity_4d());
}

Allocating using malloc inside a function passing the address of the pointer and using vector syntax?

What's wrong on the code below? I need to send the address of the pointer *A to the function, read some numbers with scanf inside it, return to main and print the numbers read at that function.
void create_number_vector(int **number)
{
(*number) = (int*)malloc(5*sizeof(int));
int i;
for(i=0; i<5; i++){
scanf("%d",number[i]);
}
}
int main(void){
int i, *A;
create_number_vector(&A);
for(i=0; i<5; i++){
printf("%d",A[i]);
}
return 0;
}
Except one line(concept), everything is pretty much OK.
Problamatic line is:
scanf("%d",number[i]);
And should be replace with:
scanf("%d", *number+i);
Because our allocated variable is a pointer, we should use him like that, we should go to the 'i' address inside of the allocated variable and scan into him.
Ofcourse you can keep on using the "array" style usage, with this syntax:
scanf("%d", &(*number)[i]);
P.S
Don't forget to free the allocated resources at the end of the usage, altough this kind of small program that exits at the end of the echoing, it's still a good practice to always free your resources at the end of its usage.

Thrust - accessing neighbors

I would like to use Thrust's stream compaction functionality (copy_if) for distilling indices of elements from a vector if the elements adhere to a number of constraints. One of these constraints depends on the values of neighboring elements (8 in 2D and 26 in 3D). My question is: how can I obtain the neighbors of an element in Thrust?
The function call operator of the functor for the 'copy_if' basically looks like:
__host__ __device__ bool operator()(float x) {
bool mark = x < 0.0f;
if (mark) {
if (left neighbor of x > 1.0f) return false;
if (right neighbor of x > 1.0f) return false;
if (top neighbor of x > 1.0f) return false;
//etc.
}
return mark;
}
Currently I use a work-around by first launching a CUDA kernel (in which it is easy to access neighbors) to appropriately mark the elements. After that, I pass the marked elements to Thrust's copy_if to distill the indices of the marked elements.
I came across counting_iterator as a sort of substitute for directly using threadIdx and blockIdx to acquire the index of the processed element. I tried the solution below, but when compiling it, it gives me a "/usr/include/cuda/thrust/detail/device/cuda/copy_if.inl(151): Error: Unaligned memory accesses not supported". As far as I know I'm not trying to access memory in an unaligned fashion. Anybody knows what's going on and/or how to fix this?
struct IsEmpty2 {
float* xi;
IsEmpty2(float* pXi) { xi = pXi; }
__host__ __device__ bool operator()(thrust::tuple<float, int> t) {
bool mark = thrust::get<0>(t) < -0.01f;
if (mark) {
int countindex = thrust::get<1>(t);
if (xi[countindex] > 1.01f) return false;
//etc.
}
return mark;
}
};
thrust::copy_if(indices.begin(),
indices.end(),
thrust::make_zip_iterator(thrust::make_tuple(xi, thrust::counting_iterator<int>())),
indicesEmptied.begin(),
IsEmpty2(rawXi));
#phoad: you're right about the shared mem, it struck me after I already posted my reply, subsequently thinking that the cache probably will help me. But you beat me with your quick response. The if-statement however is executed in less than 5% of all cases, so either using shared mem or relying on the cache will probably have negligible impact on performance.
Tuples only support 10 values, so that would mean I would require tuples of tuples for the 26 values in the 3D case. Working with tuples and zip_iterator was already quite cumbersome, so I'll pass for this option (also from a code readability stand point). I tried your suggestion by directly using threadIdx.x etc. in the device function, but Thrust doesn't like that. I seem to be getting some unexplainable results and sometimes I end up with an Thrust error. The following program for example generates a 'thrust::system::system_error' with an 'unspecified launch failure', although it first correctly prints "Processing 10" to "Processing 41":
struct printf_functor {
__host__ __device__ void operator()(int e) {
printf("Processing %d\n", threadIdx.x);
}
};
int main() {
thrust::device_vector<int> dVec(32);
for (int i = 0; i < 32; ++i)
dVec[i] = i + 10;
thrust::for_each(dVec.begin(), dVec.end(), printf_functor());
return 0;
}
Same applies to printing blockIdx.x Printing blockDim.x however generates no error. I was hoping for a clean solution, but I guess I am stuck with my current work-around solution.

Standard predicates for STL count_if

I'm using the STL function count_if to count all the positive values
in a vector of doubles. For example my code is something like:
vector<double> Array(1,1.0)
Array.push_back(-1.0);
Array.push_back(1.0);
cout << count_if(Array.begin(), Array.end(), isPositive);
where the function isPositive is defined as
bool isPositive(double x)
{
return (x>0);
}
The following code would return 2. Is there a way of doing the above
without writting my own function isPositive? Is there a built-in
function I could use?
Thanks!
std::count_if(v.begin(), v.end(), std::bind1st(std::less<double>(), 0)) is what you want.
If you're already using namespace std, the clearer version reads
count_if(v.begin(), v.end(), bind1st(less<double>(), 0));
All this stuff belongs to the <functional> header, alongside other standard predicates.
If you are compiling with MSVC++ 2010 or GCC 4.5+ you can use real lambda functions:
std::count_if(Array.begin(), Array.end(), [](double d) { return d > 0; });
I don't think there is a build-in function.
However, you could use boost lambda http://www.boost.org/doc/libs/1_43_0/doc/html/lambda.html
to write it :
cout << count_if(Array.begin(), Array.end(), _1 > 0);
cout<<std::count_if (Array.begin(),Array.end(),std::bind2nd (std::greater<double>(),0)) ;
greater_equal<type>() -> if >= 0