I'm trying to convert hex into binary. If I call bits(0x101) it prints 00011, which is obviously wrong. I'm pretty sure the problem is in the for loop. Any ideas?
int hextobin (int n){
    char buffer[33];
    if(n==0) {
        putchar('0');
        return 0;
    }
    char *cp = buffer + 32;
    *cp = 0;
    for(int i =0;i <=sizeof(n); i++){
        --cp;
        if(n & 1) *cp = '1';
        else *cp = '0';
        n >>= i;
    }
    printf(cp);
    return 0;
}
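Two things go wrong in the loop. It runs sizeof(n) + 1 times, which is 5 for a 4-byte int, so only five digits are ever written even if there are bits left to deal with. And n >>= i shifts by the loop counter instead of by one, so those five digits are not consecutive bits. Shift by exactly one bit per iteration (n >>= 1) and keep looping until n becomes zero, or loop over a fixed bit count if you want leading zeros.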
And do yourself a favor... indent your code properly. It's oh-so-much easier to read/debug when it's formatted properly.
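For reference, here is a minimal sketch of a corrected conversion. It assumes an unsigned argument and returns the digits as a string instead of printing them; the name to_binary is hypothetical, not from the original post.

```cpp
#include <string>

// Hypothetical helper: returns the binary digits of n,
// most significant bit first, without leading zeros.
std::string to_binary(unsigned int n) {
    if (n == 0) return "0";
    std::string bits;
    while (n != 0) {
        // Prepend the current lowest bit.
        bits.insert(bits.begin(), (n & 1u) ? '1' : '0');
        n >>= 1;  // shift by exactly one bit per iteration
    }
    return bits;
}
```

With this version, to_binary(0x101) gives 100000001. Looping until n is zero, rather than a fixed number of times, is what keeps every set bit in the result.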
I have a requirement where I want to parallelize the following using CUDA thrust.
std::vector<float> a, b, c; // size of each is (size.x * size.y * size.z), kind of a 3D array.
What I am trying to do is this
a[i] = 0 if b[i] < 0
a[i] = c[i] if b[i] > 0
This is the host code.
for (int i = 0; i < size.x; i++)
    for (int j = 0; j < size.y; j++)
        for (int z = 0; z < size.z; z++) {
            a.data[get_idx(i, j, z)] = (b.data[get_idx(i, j, z)] < 0) ?
                (0) : (1 * c.data[get_idx(i, j, z)]);
        }
get_idx() just converts the loop indices to array indices.
What I want is an equivalent Thrust API call that does this.
I have the thrust::device_vector objects ready, with the values of the corresponding a, b, c copied to the device.
thrust::device_vector<float> dev_a, dev_b, dev_c;
What I have tried is thrust::for_each, but I am unable to find a way to assign dev_c[i] to dev_a[i].
I would love a nudge in the right direction, for example which Thrust API is the most suitable. Thanks in advance.
After some more digging around, I found a suitable Thrust API:
thrust::replace_copy_if
There is an overload of replace_copy_if that takes a 'stencil' sequence. The predicate is evaluated on the stencil, and its result decides whether the input value is copied or replaced.
In my case, 'b' is the stencil.
The following code works now.
struct is_less_than_zero
{
    __host__ __device__ bool operator()(float x) const
    {
        return x < 0;
    }
};
is_less_than_zero pred{};
// Pass the functor object itself (pred, not pred()); b is the stencil.
thrust::replace_copy_if(thrust::device, c.begin(), c.end(),
                        b.begin(), a.begin(), pred, 0.0f);
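To check the device result, a host-side reference of what the stencil overload computes here can help. This is only a sketch in plain C++ (no Thrust); the function name masked_copy is hypothetical, and it assumes b has at least as many elements as c.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host reference: a[i] = 0 where the stencil b[i] is negative,
// otherwise a[i] = c[i]. Assumes b.size() >= c.size().
std::vector<float> masked_copy(const std::vector<float>& b,
                               const std::vector<float>& c) {
    std::vector<float> a(c.size());
    for (std::size_t i = 0; i < c.size(); ++i) {
        a[i] = (b[i] < 0.0f) ? 0.0f : c[i];
    }
    return a;
}
```

Comparing this output element by element with the device vector after copying it back is a cheap way to confirm the predicate and stencil are wired up the right way round.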
I have p.ntp test particles, and the i-th particle has Cartesian coordinates tp.rh[i].x, tp.rh[i].y, tp.rh[i].z. Within this set I need to find clusters: for each i-th particle, I am looking for the particles whose squared distance from it is less than hill2 (tp.D_rel < hill2). The number of such members is stored in N_conv.
I use the loop for (int i = 0; i < p.ntp; i++) to go through the data set. For each i-th particle I calculate the squared distances tp.D_rel[idx] relative to the other members of the set. Then I use the first thread (idx == 0) to count the cases that satisfy my condition. At the end, if there is more than one positive case (N_conv > 1), I need to write out all particles that together form a possible cluster (triplets and larger).
My code works well only in cases where i < blockDim.x. Why? Is there a general way to find clusters in a data set, but write out only triplets and larger groups?
Note: I know that some cases will be found twice.
__global__ void check_conv_system(double t, struct s_tp tp, struct s_mp mp, struct s_param p, double *time_step)
{
    const uint bid = blockIdx.y * gridDim.x + blockIdx.x;
    const uint tid = threadIdx.x;
    const uint idx = bid * blockDim.x + tid;
    double hill2 = 1.0e+6;
    __shared__ double D[200];
    __shared__ int ID1[200];
    __shared__ int ID2[200];
    if (idx >= p.ntp) return;
    int N_conv;
    for (int i = 0; i < p.ntp; i++)
    {
        tp.D_rel[idx] = (double)((tp.rh[i].x - tp.rh[idx].x)*(tp.rh[i].x - tp.rh[idx].x) +
                                 (tp.rh[i].y - tp.rh[idx].y)*(tp.rh[i].y - tp.rh[idx].y) +
                                 (tp.rh[i].z - tp.rh[idx].z)*(tp.rh[i].z - tp.rh[idx].z));
        __syncthreads();
        N_conv = 0;
        if (idx == 0)
        {
            for (int n = 0; n < p.ntp; n++) {
                if ((tp.D_rel[n] < hill2) && (i != n)) {
                    N_conv = N_conv + 1;
                    D[N_conv] = tp.D_rel[n];
                    ID1[N_conv] = i;
                    ID2[N_conv] = n;
                }
            }
            if (N_conv > 0) {
                for(int k = 1; k < N_conv; k++) {
                    printf("%lf %lf %d %d \n",t/365.2422, D[k], ID1[k], ID2[k]);
                }
            }
        } //end idx == 0
    } //end for cycle for i
}
As RobertCrovella mentioned, it is hard to tell without an MCV example.
However, the tp.D_rel array seems to be written with index idx and then read back, after a __syncthreads(), over the full index range n. Note that __syncthreads() only synchronizes the threads within a block, not across the whole device. As a result, some threads or blocks will read data that has not been calculated yet, hence the failure.
You should restructure your code so that the values computed by one block do not depend on those computed by another.
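One way to remove the dependency is to let each particle collect its own neighbours in local storage, so nothing has to be read from a shared distance array. The sketch below shows that idea as a plain host loop in C++, which can also serve as a reference when checking a kernel. The names Vec3 and find_clusters are hypothetical, and "at least two neighbours" is an assumed reading of "triplets and more".

```cpp
#include <vector>

// Hypothetical particle position type.
struct Vec3 { double x, y, z; };

// For each particle i, gather the indices of all other particles whose
// squared distance to i is below hill2. A group is reported only when it
// has at least three members (particle i plus two or more neighbours).
std::vector<std::vector<int>> find_clusters(const std::vector<Vec3>& rh,
                                            double hill2) {
    std::vector<std::vector<int>> clusters;
    const int n = static_cast<int>(rh.size());
    for (int i = 0; i < n; ++i) {
        std::vector<int> members{i};  // local to particle i
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            const double dx = rh[j].x - rh[i].x;
            const double dy = rh[j].y - rh[i].y;
            const double dz = rh[j].z - rh[i].z;
            if (dx * dx + dy * dy + dz * dz < hill2) members.push_back(j);
        }
        if (members.size() >= 3) clusters.push_back(members);
    }
    return clusters;
}
```

On the GPU, the outer loop over i would map to one thread per particle, and the body needs no __syncthreads() because each thread only reads the positions and writes its own result. As noted in the question, the same group is still found once per member.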
I'm not too good with C++. My code compiles, but the function crashes my program. Below is a short summary of the code; it's not complete, but the function and the call are there.
void rot13(char *ret, const char *in);
int main()
{
    char* str;
    MessageBox(NULL, _T("Test 1; Does get here!"), _T("Test 1"), MB_OK);
    rot13(str, "uryyb jbeyq!"); // hello world!
    /* Do stuff with char* str; */
    MessageBox(NULL, _T("Test 2; Doesn't get here!"), _T("Test 2"), MB_OK);
    return 0;
}
void rot13(char *ret, const char *in){
    for( int i=0; i = sizeof(in); i++ ){
        if(in[i] >= 'a' && in[i] <= 'm'){
            // Crashes Here;
            ret[i] += 13;
        }
        else if(in[i] > 'n' && in[i] <= 'z'){
            // Possibly crashing Here too?
            ret[i] -= 13;
        }
        else if(in[i] > 'A' && in[i] <= 'M'){
            // Possibly crashing Here too?
            ret[i] += 13;
        }
        else if(in[i] > 'N' && in[i] <= 'Z'){
            // Possibly crashing Here too?
            ret[i] -= 13;
        }
    }
}
Execution reaches "Test 1; Does get here!" but never reaches "Test 2; Doesn't get here!".
Thank you in advance.
-Nick Daniels.
str is uninitialised and it is being dereferenced in rot13, causing the crash. Allocate memory for str before passing to rot13() (either on the stack or dynamically):
char str[1024] = ""; /* Large enough to hold string and initialised. */
The for loop inside rot13() is also incorrect (infinite loop):
for( int i=0; i = sizeof(in); i++ ){
change to:
for(size_t i = 0, len = strlen(in); i < len; i++ ){
You've got several problems:
You never allocate memory for your output - you never initialise the variable str. This is what's causing your crash.
Your loop condition always evaluates to true (= assigns and returns the assigned value, == tests for equality).
Your loop condition uses sizeof(in) with the intention of getting the size of the input string, but that will actually give you the size of the pointer. Use strlen instead.
Your algorithm increases or decreases the values in the return string by 13. The values you place in the output string are +/- 13 from the initial values in the output string, when they should be based on the input string.
Your algorithm doesn't handle 'A', 'n' or 'N'.
Your algorithm doesn't handle any non-alphabetic characters, yet the test string you use contains two.
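Putting the points above together, a corrected version could look like the following sketch. It uses std::string instead of a caller-supplied buffer, which is a design change from the original signature, and it assumes an ASCII-style character set in which the letters are contiguous.

```cpp
#include <string>

// Sketch of ROT13 that derives each output character from the input,
// covers the whole alphabet in both cases, and copies everything else
// (spaces, punctuation, digits) unchanged.
std::string rot13(const std::string& in) {
    std::string out = in;
    for (char& ch : out) {
        if (ch >= 'a' && ch <= 'z') {
            ch = static_cast<char>('a' + (ch - 'a' + 13) % 26);
        } else if (ch >= 'A' && ch <= 'Z') {
            ch = static_cast<char>('A' + (ch - 'A' + 13) % 26);
        }
    }
    return out;
}
```

Calling rot13("uryyb jbeyq!") returns "hello world!". Returning a string sidesteps the allocation problem entirely, and the modulo arithmetic avoids the separate first-half and second-half range checks where 'A', 'n' and 'N' were lost.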
I am trying to call a device function from a global function. The device function only fills in an array that is to be used by all threads. My problem is that when I print the array, its elements are not in the order I set them. Is that because every thread is creating the array again? I am confused about threads. If so, can I find out which thread runs first in the global function, and allow only that thread to fill in the array for the others? Thanks.
Here is my function to create the array:
__device__ float myArray[20][20];
__device__ void calculation(int no){
    filterWidth = 3+(2*no);
    filterHeight = 3+(2*no);
    int arraySize = filterWidth;
    int middle = (arraySize - 1) / 2;
    int startIndex = middle;
    int stopIndex = middle;
    // at first , all values of array are 0
    for(int i=0; i<arraySize; i++)
        for (int j = 0; j < arraySize; j++)
        {
            myArray[i][j] = 0;
        }
    // until middle line of the array, required indexes are 1
    for (int i = 0; i < middle; i++)
    {
        for (int j = startIndex; j <= stopIndex; j++)
        { myArray[i][j] = 1; sum+=1; }
        startIndex -= 1;
        stopIndex += 1;
    }
    // for middle line
    for (int i = 0; i < arraySize; i++)
    {myArray[middle][i] = 1; sum+=1;}
    // after middle line of the array, required indexes are 1
    startIndex += 1;
    stopIndex -= 1;
    for (int i = (middle + 1); i < arraySize; i++)
    {
        for (int j = startIndex; j <= stopIndex; j++)
        { myArray[i][j] = 1; sum+=1; }
        startIndex +=1 ;
        stopIndex -= 1;
    }
    filterFactor = 1.0f / sum;
}
And the global function:
__global__ void FilterKernel(Format24bppRgb* imageData)
{
    int tidX = threadIdx.x + blockIdx.x * blockDim.x;
    int tidY = threadIdx.y + blockIdx.y * blockDim.y;
    Colour Cpixel = Colour (imageData[tidX + tidY*imageWidth] );
    float depthPixel = Colour(depthData[tidX + tidY*imageWidth]).Red;
    float absoluteDistanceFromFocus = fabs (depthPixel - focusDepth);
    if(depthPixel == 0)
        return;
    Colour Cresult = Cpixel;
    for (int i=0;i<8;i++)
    {
        calculation(i);
        ...
        ...
    }
}
If you really want to select and force one thread to call the function and the rest to wait for it to do so, use __shared__ memory for the array created by the device function so that all threads in a block see the same one, and you can call it with:
for (int i=0;i<8;i++)
{
if (threadIdx.x == 0 && threadIdx.y == 0)
calculation(i);
__syncthreads();
...
}
Of course, this won't work between blocks - in a globally defined function, you have no control over the order in which blocks are computed.
Instead, if you can, you should do the initialization calculation (that only 1 thread needs to do) on the CPU and memcpy it to the GPU before launching your kernel. It looks like you'll use 8x the memory for your myArray's, but it'll dramatically speed up your computation.
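As a rough sketch of the CPU-side initialization, the mask that calculation(no) builds is a diamond: an entry is 1 when its row offset plus column offset from the centre is at most middle. The struct FilterMask and the function build_mask below are hypothetical names, and storing the weights row-major in a flat vector is an assumption made for easy copying.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical container for one precomputed filter.
struct FilterMask {
    int size;                    // filter width and height, 3 + 2 * no
    std::vector<float> weights;  // row-major, size * size entries of 0 or 1
    float factor;                // 1 / (number of entries set to 1)
};

// Builds the same diamond-shaped mask as the device function, on the host.
FilterMask build_mask(int no) {
    FilterMask m;
    m.size = 3 + 2 * no;
    const int middle = (m.size - 1) / 2;
    m.weights.assign(static_cast<std::size_t>(m.size) * m.size, 0.0f);
    int sum = 0;
    for (int i = 0; i < m.size; ++i) {
        for (int j = 0; j < m.size; ++j) {
            if (std::abs(i - middle) + std::abs(j - middle) <= middle) {
                m.weights[static_cast<std::size_t>(i) * m.size + j] = 1.0f;
                ++sum;
            }
        }
    }
    m.factor = 1.0f / sum;
    return m;
}
```

Calling build_mask(i) for i = 0..7 before the launch gives the eight masks. How they are transferred (for example into constant or global memory) depends on the rest of the program and is not shown here.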
I am looking for an optimal algorithm to find all the remaining possible permutations
of a given binary number.
For example:
The binary number is ........1. The algorithm should return the remaining 2^7 binary numbers, like 00000001, 00000011, etc.
Thanks,
sathish
The example given is not a permutation!
A permutation is a reordering of the input.
So if the input is 00000001, 00100000 and 00000010 are permutations, but 00000011 is not.
If this is only for small numbers (probably up to 16 bits), then just iterate over all of them and ignore the mismatches:
int fixed = 0x01; // this is the fixed part
int mask = 0x01; // these are the bits of the fixed part which matter
for (int i=0; i<256; i++) {
    if ((i & mask) == fixed) { // parentheses needed: == binds tighter than &
        printf("%d\n", i);
    }
}
To find them all, you aren't going to do better than looping over all the numbers. For example, if you want to loop over all 8-bit numbers:
for (int i =0; i < (1<<8) ; ++i)
{
    //do stuff with i
}
If you need to output in binary, then look at the string formatting options you have in whatever language you are using.
For example:
printf("%b",i); //not standard in C/C++
For the calculation itself, the base should be irrelevant in most languages.
I read your question as: "given some binary number with some bits always set, create the remaining possible binary numbers".
For example, given 1xx1: you want: 1001, 1011, 1101, 1111.
An O(N) algorithm is as follows.
Suppose the bits are defined in mask m. You also have a hash h.
To generate the numbers < n-1, in pseudocode:
counter = 0
for x in 0..n-1:
    x' = x | ~m
    if h[x'] is not set:
        h[x'] = counter
        counter += 1
The idea in the code is to walk through all numbers from 0 to n-1, and set the pre-defined bits to 1. Then memoize the resulting number (iff not already memoized) by mapping the resulting number to the value of a running counter.
The keys of h will be the permutations. As a bonus the h[p] will contain a unique index number for the permutation p, although you did not need it in your original question, it can be useful.
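If the hash is not needed, the same set of numbers can be produced directly by stepping through the subsets of the free bits. This is a different technique from the pseudocode above (a standard subset-enumeration trick), shown as a sketch; the name with_fixed_bits is hypothetical, and it assumes the fixed and free bit sets do not overlap.

```cpp
#include <vector>

// Returns every value that has all bits of `fixed` set and any combination
// of the bits in `free_mask`, in ascending order.
// Assumes (fixed & free_mask) == 0.
std::vector<unsigned> with_fixed_bits(unsigned fixed, unsigned free_mask) {
    std::vector<unsigned> out;
    unsigned sub = 0;
    do {
        out.push_back(fixed | sub);
        // Advance to the next larger subset of free_mask; wraps to 0 at the end.
        sub = (sub - free_mask) & free_mask;
    } while (sub != 0);
    return out;
}
```

For the 1xx1 example, with_fixed_bits(0x9, 0x6) yields 9, 11, 13, 15, that is 1001, 1011, 1101, 1111. The loop body runs once per result, so no candidates are generated and then discarded.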
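Why are you making it so complicated?
It is as simple as the following:
// cyclic permutation (rotation) of i within a length of k bits
// Example: decimal i=10 is rotated over length k=7
// [10] 0001010 -> [5] 0000101 -> [66] 1000010 and 33, 80, 40, 20 etc.
#include <stdio.h>
int main(void){
    int i=10,j,k=7; j=i;
    // parenthesised because + binds tighter than << and >>; the low bit moves to bit k-1
    do printf("%d \n", i = (((i&1) << (k-1)) + (i >> 1))); while (i!=j);
    return 0;
}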
There are many permutation generating algorithms you can use, such as this one:
#include <stdio.h>
void print(const int *v, const int size)
{
    if (v != 0) {
        for (int i = 0; i < size; i++) {
            printf("%4d", v[i] );
        }
        printf("\n");
    }
} // print
void visit(int *Value, int N, int k)
{
    static int level = -1; // explicit type: implicit int is not valid in C99 or C++
    level = level+1; Value[k] = level;
    if (level == N)
        print(Value, N);
    else
        for (int i = 0; i < N; i++)
            if (Value[i] == 0)
                visit(Value, N, i);
    level = level-1; Value[k] = 0;
}
int main(void)
{
    const int N = 4;
    int Value[N];
    for (int i = 0; i < N; i++) {
        Value[i] = 0;
    }
    visit(Value, N, 0);
    return 0;
}
source: http://www.bearcave.com/random_hacks/permute.html
Make sure you adapt the relevant constants to your needs (binary number, 7 bits, etc...)
If you are really looking for permutations, then the following code should do.
It finds all possible permutations of a given binary string (pattern).
For example, the permutations of 1000 are 1000, 0100, 0010, 0001:
#include <iostream>
#include <string>
using namespace std;
void permutation(int no_ones, int no_zeroes, string accum){
    if(no_ones == 0){
        for(int i=0;i<no_zeroes;i++){
            accum += "0";
        }
        cout << accum << endl;
        return;
    }
    else if(no_zeroes == 0){
        for(int j=0;j<no_ones;j++){
            accum += "1";
        }
        cout << accum << endl;
        return;
    }
    permutation (no_ones - 1, no_zeroes, accum + "1");
    permutation (no_ones , no_zeroes - 1, accum + "0");
}
int main(){
    string append = "";
    //finding permutation of 11000 (two ones, three zeroes)
    permutation(2, 3, append); //the permutations are
    //11000
    //10100
    //10010
    //10001
    //01100
    //01010
    //01001
    //00110
    //00101
    //00011
    cin.get();
}
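An alternative sketch uses std::next_permutation from the standard library, which skips duplicate arrangements on its own when the sequence contains repeated characters. The function name bit_permutations is hypothetical, and both counts are assumed to be non-negative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns all distinct strings with the given number of '1' and '0'
// characters, in ascending lexicographic order.
std::vector<std::string> bit_permutations(int ones, int zeroes) {
    // Start from the smallest arrangement: all zeroes, then all ones.
    std::string s = std::string(zeroes, '0') + std::string(ones, '1');
    std::vector<std::string> out;
    do {
        out.push_back(s);
    } while (std::next_permutation(s.begin(), s.end()));
    return out;
}
```

For one 1 and three 0s this returns 0001, 0010, 0100, 1000. The order is the reverse of the recursive version, which emits the arrangements starting with 1 first.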
If you intend to generate all the bit strings of length n, then the problem can be solved using backtracking.
Here you go:
//Generating all string of n bits assuming A[0..n-1] is array of size n
public class Backtracking {
    int[] A;
    void Binary(int n){
        if(n<1){
            for(int i : A)
                System.out.print(i);
            System.out.println();
        }else{
            A[n-1] = 0;
            Binary(n-1);
            A[n-1] = 1;
            Binary(n-1);
        }
    }
    public static void main(String[] args) {
        // n is number of bits
        int n = 8;
        Backtracking backtracking = new Backtracking();
        backtracking.A = new int[n];
        backtracking.Binary(n);
    }
}