I am trying to check the set bits of an unsigned long long in C++ using the algorithm below, which only checks whether each bit is set or not. My problem is that the answer I get is wrong. Please help me understand how an unsigned long long is stored in binary.
Code:
#include<stdio.h>
#include<iostream>
#define CHECK_BIT(var,pos) ((var) & (1<<(pos)))
using namespace std;
int main()
{
int pos=sizeof(unsigned long long)*8;
unsigned long long a;
cin>>a;
pos=pos-1;
while(pos>=0)
{
if(CHECK_BIT(a,pos))
cout<<"1";
else
cout<<"0";
--pos;
}
}
Input :
1000000000000000000
Output:
1010011101100100000000000000000010100111011001000000000000000000
Expected Output:
110111100000101101101011001110100111011001000000000000000000
Similarly for another input:
14141
Output :
0000000000000000001101110011110100000000000000000011011100111101
Expected Output:
11011100111101
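In the second example (in fact, for any small number) the binary pattern just repeats itself after 32 bits.
The issue is in the bit-check macro. The literal 1 is an int, so 1<<(pos) is evaluated in (typically) 32 bits, and shifting by 32 or more is undefined behavior. On x86 the shift count is usually taken modulo 32, which would explain why the pattern repeats after 32 bits. Please replace the macro with one that shifts an unsigned 64-bit constant:
#define CHECK_BIT(var,pos) ((var) & (1ULL<<(pos)))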
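For reference, here is a minimal sketch of the same loop written as a function. The name to_binary is made up for illustration. It shifts 1ULL, an unsigned 64-bit constant, rather than a plain 1, and it skips leading zeros so that the result matches the "Expected Output" lines in the question.

```cpp
#include <climits>
#include <string>

// Hypothetical helper (not from the question): returns the bits of `value`
// from most significant to least significant, without leading zeros.
std::string to_binary(unsigned long long value) {
    const int width = static_cast<int>(sizeof(unsigned long long) * CHAR_BIT);
    std::string bits;
    for (int pos = width - 1; pos >= 0; --pos) {
        // 1ULL makes the shifted constant 64 bits wide, so pos may exceed 31.
        const bool set = (value & (1ULL << pos)) != 0;
        // Start emitting digits only once the first set bit has been seen.
        if (set || !bits.empty()) bits.push_back(set ? '1' : '0');
    }
    return bits.empty() ? "0" : bits;
}
```

Calling to_binary(14141) should give the 14-digit string from the second example instead of a pattern that repeats after 32 bits.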
#include <bits/stdc++.h>
using namespace std;
#define fast ios_base::sync_with_stdio(0);cin.tie(0);cout.tie(0);
#define LL long long int
#define pb push_back
#define mp make_pair
#define PII pair<int,int>
#define PLL pair<LL,LL>
#define PIS pair< int,string>
#define test int t;cin>>t;while(t--)
#define ff first // error: 'std::set<std::pair<long long int, long long int> >::iterator' has no member named 'first'
#define ss second // error: 'std::set<std::pair<long long int, long long int> >::iterator' has no member named 'second'
#define INF 1000000000
#define input(a,n) for(i=1;i<=n;i++)cin>>a[i];
#define output(a,n) for(i=1;i<=n;i++)cout<<a[i]<<" ";
vector< vector<LL> > v(3002, vector<LL>(3002,-1));
set< pair<LL, LL> > se;
set< pair<LL, LL> >::iterator it;
int vis[3002]={0};
void exmin(LL a)
{
LL x,des,val,min=INF;
for(x=0;x<v[a].size();x++)
{
if(v[a][x]<min)
{
val=v[a][x];
des=x;
min=val;
}
}
se.insert(mp(val,des));
}
int main() {
fast
LL n,m,x,i,j,k,wt=0,s;
cin>>n>>m;
vector<int> ve;
for(x=1;x<=n;x++)
ve.pb(x);
for(x=0;x<m;x++)
{
cin>>i>>j>>k;
if(v[i][j]!=-1)
{
if(v[i][j]>k)
{
v[i][j]=k;
v[j][i]=k;
}
}
else
{
v[i][j]=k;
v[j][i]=k;
}
}
cin>>s;
ve.erase(ve.begin()+s-1);
while(ve.size()!=0)
{
for(x=0;x<v[s].size();x++)
{
if(v[s][x]!=-1 && vis[x]!=1)
{
exmin(x);
}
}
/* for(x=0;x<p.size();x++)
{
}*/
it=se.begin();
wt=wt+*(it).ff;
s=*(it).ss;
vis[*(it).ss]=1;
ve.erase(ve.begin()+*(it).ss-1);
se.erase(it);
}
return 0;
}
I am still facing errors.
I am trying to implement Prim's algorithm.
I was not able to include line numbers, so I attached the errors as comments on the lines themselves.
Sorry, but I could not avoid including the "abhorrent part", because the error is in that part.
Edit:
Found my mistake; it was a syntax error.
Although you have a global set s
set< pair<LL, LL> > s;
You have also defined a local variable s of type long long:
LL n,m,x,i,j,k,wt=0,s;
which hides the global s. Obviously a long long has neither begin nor erase member functions since it is a primitive type. Hence the errors:
it=s.begin(); //error: request for member 'begin' in 's', which is of non-class type 'long long int'
s.erase(it); //error: request for member 'erase' in 's', which is of non-class type 'long long int'
To refer to the global s, use ::s, i.e.
::s.erase(it);
And lastly I want to point out that this "contestese" coding style you're using is abhorrent. Feel free to use it during contests as much as you like, but please edit it away when you post questions on SO.
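The shadowing described above can be reproduced in a few lines. This is only a sketch: the function name pop_smallest_weight is hypothetical, and it assumes the global set is non-empty when it is called.

```cpp
#include <set>
#include <utility>

typedef long long LL;

// Global set, playing the role of the `s` declared at file scope.
std::set<std::pair<LL, LL> > s;

// Hypothetical function illustrating the shadowing; assumes ::s is non-empty.
LL pop_smallest_weight() {
    LL s = 0;  // local variable that hides the global set inside this scope
    std::set<std::pair<LL, LL> >::iterator it = ::s.begin();  // ::s names the global set
    s = it->first;  // plain `s` is the local long long
    ::s.erase(it);
    return s;
}
```

Renaming one of the two variables avoids the need for :: altogether and is probably the less surprising fix.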
Found the error in my code.
wt=wt+*(it).ff;
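Instead of *(it).ff it should have been (*it).ff. Member access binds tighter than the dereference, so *(it).ff tries to read ff (that is, first) from the iterator itself, which is exactly what the compiler complained about.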
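A small sketch of the two spellings that do compile, without the ff/ss macros. The helper name first_weight is made up, and it assumes the set is non-empty.

```cpp
#include <set>
#include <utility>

typedef long long LL;

// Hypothetical helper: returns the first component of the smallest pair
// in a non-empty set.
LL first_weight(const std::set<std::pair<LL, LL> >& edges) {
    std::set<std::pair<LL, LL> >::const_iterator it = edges.begin();
    // *(it).first would parse as *((it).first) and fail to compile,
    // because the iterator has no member named `first`.
    const LL viaParens = (*it).first;  // dereference first, then access the member
    const LL viaArrow = it->first;     // equivalent, and the more common spelling
    (void)viaParens;                   // kept only to show both forms side by side
    return viaArrow;
}
```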
__device__ __inline__ double ld_gbl_cg(const double *addr) {
double return_value;
asm("ld.global.cg.f64 %0, [%1];" : "=d"(return_value) : "l"(addr));
return return_value;
}
The above code is from here:
CUDA disable L1 cache only for one variable
According to the author, "d" means float, "r" means int.
I want to write a small piece of inline asm code, and I would like to know the constraint letters for the rest of the primitive types (unsigned short, unsigned long long, 32-bit float, etc.). I cannot find them in the PTX ISA documentation.
I use letter "l" to represent unsigned long long, is that correct?
You can find them here, but for the sake of completeness, the letters correspond to the underlying PTX register types:
"h" = .u16 reg
"r" = .u32 reg
"l" = .u64 reg
"f" = .f32 reg
"d" = .f64 reg
So an unsigned long long maps to "l" (for a 64 bit integer PTX register).
Basically what I want is a function that works like hiloint2uint64(): it joins two 32-bit integers and reinterprets the result as a uint64.
I cannot find any function in CUDA that does this. Is there any PTX code that can do that kind of type casting?
You can define your own function like this:
__host__ __device__ unsigned long long int hiloint2uint64(int h, int l)
{
int combined[] = { h, l };
return *reinterpret_cast<unsigned long long int*>(combined);
}
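Maybe a bit late by now, but probably the safest way to do this is "manually", with bit shifts and a bitwise or:
uint32_t ui_h = h;
uint32_t ui_l = l;
return (uint64_t(ui_h)<<32)|(uint64_t(ui_l));
The conversion through uint32_t matters: widening a negative int l directly to uint64_t would sign-extend it and overwrite the high half.
Note that the solution presented in the other answer isn't safe, because the array of ints might not be 8-byte aligned (and shifting some bits is faster than a memory read/write anyway). It also stores h at the lower address, so on a little-endian target h ends up in the low 32 bits.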
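As a complete function, the shift-and-or approach could look like the sketch below. It is written as plain C++ so it can be checked on the host; adding __host__ __device__ in front of it for CUDA is an assumption on my part that I have not compiled here.

```cpp
#include <cstdint>

// Sketch of the shift-and-or approach. Converting through uint32_t first
// keeps a negative low half from sign-extending into the high 32 bits.
inline std::uint64_t hiloint2uint64(int h, int l) {
    const std::uint32_t ui_h = static_cast<std::uint32_t>(h);
    const std::uint32_t ui_l = static_cast<std::uint32_t>(l);
    return (static_cast<std::uint64_t>(ui_h) << 32) | static_cast<std::uint64_t>(ui_l);
}
```

Because no memory is reinterpreted, the result does not depend on alignment or byte order.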
Use uint2 (but define the temporary variable as 64-bit value: unsigned long long int) instead of arrays to be sure of alignment.
Be careful about the order of l and h.
__host__ __device__ __forceinline__ unsigned long long int hiloint2uint64(unsigned int h, unsigned int l)
{
unsigned long long int result;
uint2& src = *reinterpret_cast<uint2*>(&result);
src.x = l;
src.y = h;
return result;
}
The CUDA registers have a size of 32 bits anyway. In the best case the compiler won't need any extra code. In the worst case it has to reorder the registers by moving a 32-bit value.
Godbolt example https://godbolt.org/z/3r9WYK9e7 of how optimized it gets.
Let us assume that we have the following strings that we need to store in a CUDA array.
"hi there"
"this is"
"who is"
How do we declare an array on the GPU to do this? I tried using C++ strings, but that does not work.
Probably the best way to do this is to use a structure that is similar to common compressed sparse matrix formats. Store the character data packed into a single piece of linear memory, then use a separate integer array to store the starting indices, and perhaps a third array to store the string lengths. The storage overhead of the latter might be more efficient than storing a string termination byte for every entry in the data and trying to parse for the terminator inside the GPU code.
So you might have something like this:
struct gpuStringArray {
unsigned int * pos;
unsigned int * length; // could be a smaller type if strings are short
char4 * data; // 32 bit data type will improve memory throughput, could be 8 bit
};
Note I used a char4 type for the string data; the vector type will give better memory throughput, but it will mean strings need to be aligned/suitably padded to 4 byte boundaries. That may or may not be a problem depending on what a typical real string looks like in your application. Also, the type of the (optional) length parameter should probably be chosen to reflect the maximum admissible string length. If you have a lot of very short strings, it might be worth using an 8 or 16 bit unsigned type for the lengths to save memory.
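To make the layout concrete, here is a host-side sketch that packs a list of strings into such a flat buffer before it would be copied to the GPU. Everything in it is an assumption for illustration: the PackedStrings struct and pack_strings function are hypothetical names, start positions are counted in 4-byte words (matching a char4 pointer), lengths are counted in characters, and plain char storage stands in for char4 so that it compiles without CUDA headers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side container mirroring the layout described above.
struct PackedStrings {
    std::vector<unsigned int> pos;     // start of each string, in 4-byte words
    std::vector<unsigned int> length;  // length of each string, in characters
    std::vector<char> data;            // all characters, each string zero-padded to 4 bytes
};

PackedStrings pack_strings(const std::vector<std::string>& strings) {
    PackedStrings packed;
    for (std::size_t i = 0; i < strings.size(); ++i) {
        const std::string& str = strings[i];
        // data.size() is always a multiple of 4 here, so this is a whole word index.
        packed.pos.push_back(static_cast<unsigned int>(packed.data.size() / 4));
        packed.length.push_back(static_cast<unsigned int>(str.size()));
        packed.data.insert(packed.data.end(), str.begin(), str.end());
        // Pad with zeros so the next string starts on a 4-byte boundary.
        while (packed.data.size() % 4 != 0) packed.data.push_back('\0');
    }
    return packed;
}
```

For the three strings in the question this yields start positions 0, 2 and 4 and a 24-byte data buffer, which could then be copied to device memory as three separate arrays.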
A really simplistic code to compare strings stored this way in the style of strcmp might look something like this:
__device__ __host__
int cmp4(const char4 & c1, const char4 & c2)
{
int result;
result = c1.x - c2.x; if (result !=0) return result;
result = c1.y - c2.y; if (result !=0) return result;
result = c1.z - c2.z; if (result !=0) return result;
result = c1.w - c2.w; if (result !=0) return result;
return 0;
}
__device__ __host__
int strncmp4(const char4 * s1, const char4 * s2, const unsigned int nwords)
{
for(unsigned int i=0; i<nwords; i++) {
int result = cmp4(s1[i], s2[i]);
if (result != 0) return result;
}
return 0;
}
__global__
void tkernel(const struct gpuStringArray a, const gpuStringArray b, int * result)
{
int idx = threadIdx.x + blockIdx.x * blockDim.x;
char4 * s1 = a.data + a.pos[idx];
char4 * s2 = b.data + b.pos[idx];
unsigned int slen = min(a.length[idx], b.length[idx]);
result[idx] = strncmp4(s1, s2, slen);
}
[disclaimer: never compiled, never tested, no warranty real or implied, use at your own risk]
There are some corner cases and assumptions in this which might catch you out depending on exactly what the real strings in your code look like, but I will leave those as an exercise to the reader to resolve. You should be able to adapt and expand this into whatever it is you are trying to do.
You have to use C-style character strings char *str. Searching for "CUDA string" on google would have given you this CUDA "Hello World" example as first hit: http://computer-graphics.se/hello-world-for-cuda.html
There you can see how to use char*-strings in CUDA. Be aware that standard C-functions like strcpy or strcmp are not available in CUDA!
If you want an array of strings, you just have to use char** (as in C/C++). As for strcmp and similar functions, it highly depends on what you want to do. CUDA is not really well suited for string operations, maybe it would help if you would provide a little more detail about what you want to do.
I am using the C MySQL API:
int numr=mysql_num_rows(res);
It always returns zero, but there are 4 rows in my table. However, I am getting the correct field count.
What is the problem? Am I doing anything wrong?
Just a guess:
If you use mysql_use_result(), mysql_num_rows() does not return the correct value until all the rows in the result set have been retrieved.
(from the mysql manual)
Assuming the result set was fetched with mysql_store_result(), the only reason to receive a zero from mysql_num_rows(<variable_name>) is that the query did not return anything.
You haven't posted the query here that you run and then assign the result to your res variable so we can't check it.
But try running that exact query in your DB locally through whatever DB management software you use and see if you are able to achieve any results.
If the query is working fine, then it must be the way you're running the query in C, otherwise your query is broken.
Maybe post up a bit more of your code from C where you make the query and then run it.
Thanks
If you just want to count the number of rows in a table, say
SELECT COUNT(*) FROM table_name
You will get back a single column in a single row containing the answer.
I too have this problem. But I noticed that mysql.h defines mysql_num_rows() to return a "my_ulonglong". Also, in the header file you will see that there is a typedef for my_ulonglong. On my system the size of a my_ulonglong is 8 bytes. When we try to print this out with the wrong format or cast it to an int, we probably get the first four bytes, which are zero. However, I printed out the eight bytes at the address of the my_ulonglong variable and it prints all zeros. So I think this function just doesn't work.
my_ulonglong numChannels;
MYSQL_RES *resource;
MYSQL *connection; // assumed to be connected already
int i;
mysql_query(connection,"SELECT * FROM channels");
resource = mysql_use_result(connection);
numChannels = mysql_num_rows(resource);
printf("Writing numChannels: %llu\n", (unsigned long long) numChannels); // prints 0
printf("Size of numChannels is %zu.\n", sizeof(numChannels)); // prints 8
// however
unsigned char * tempChar;
tempChar = (unsigned char *) &numChannels;
for (i=0; i< (int) sizeof(numChannels); ++i) {
printf("%02x", (unsigned int) *tempChar++);
}
printf("\n");
// printed 0000000000000000 so I think it's a bug.
// mysql.h typedef for my_ulonglong and declaration of mysql_num_rows()
#ifndef _global_h
#if defined(NO_CLIENT_LONG_LONG)
typedef unsigned long my_ulonglong;
#elif defined (__WIN__)
typedef unsigned __int64 my_ulonglong;
#else
typedef unsigned long long my_ulonglong;
#endif
#endif
my_ulonglong STDCALL mysql_num_rows(MYSQL_RES *res);
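Independently of where the zero comes from, a 64-bit count is easy to misprint. The sketch below formats such a value with %llu. The helper name format_row_count is hypothetical, and it takes a plain unsigned long long (the type the typedef above resolves to on most platforms) so that it compiles without mysql.h.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit row count, such as a my_ulonglong
// converted to unsigned long long, using the matching printf specifier.
std::string format_row_count(unsigned long long count) {
    char buffer[32];  // 20 digits plus the terminator fit comfortably
    std::snprintf(buffer, sizeof buffer, "%llu", count);
    return std::string(buffer);
}
```

Using %lu or casting to int can truncate the value on platforms where long is 32 bits, so ruling that out first makes it easier to tell a formatting problem from a real zero.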