I have p.ntp test particles and every i-th particle has Cartesian coordinates tp.rh[i].x, tp.rh[i].y, tp.rh[i].z. Within this set I need to find CLUSTERS. It means, that I am looking for particles closer to the i-th particle less than hill2 (tp.D_rel < hill2). The number of such a members is stored in N_conv.
I use this cycle for (int i = 0; i < p.ntp; i++), which goes through the data set. For each i-th particle I calculate squared distances tp.D_rel[idx] relative to the others members in the set. Then I use first thread (idx == 0) to find the number of cases, which satisfy my condition. At the end, If are there more than 1 (N_conv > 1) positive cases I need to write out all particles forming possible cluster together (triplets, ...).
My code works well only in cases, where i < blockDim.x. Why? Is there a general way, how to find clusters in a set of data, but write out only triplets and more?
Note: I know, that some cases will be found twice.
__global__ void check_conv_system(double t, struct s_tp tp, struct s_mp mp, struct s_param p, double *time_step)
{
const uint bid = blockIdx.y * gridDim.x + blockIdx.x;
const uint tid = threadIdx.x;
const uint idx = bid * blockDim.x + tid;
double hill2 = 1.0e+6;
__shared__ double D[200];
__shared__ int ID1[200];
__shared__ int ID2[200];
if (idx >= p.ntp) return;
int N_conv;
for (int i = 0; i < p.ntp; i++)
{
tp.D_rel[idx] = (double)((tp.rh[i].x - tp.rh[idx].x)*(tp.rh[i].x - tp.rh[idx].x) +
(tp.rh[i].y - tp.rh[idx].y)*(tp.rh[i].y - tp.rh[idx].y) +
(tp.rh[i].z - tp.rh[idx].z)*(tp.rh[i].z - tp.rh[idx].z));
__syncthreads();
N_conv = 0;
if (idx == 0)
{
for (int n = 0; n < p.ntp; n++) {
if ((tp.D_rel[n] < hill2) && (i != n)) {
N_conv = N_conv + 1;
D[N_conv] = tp.D_rel[n];
ID1[N_conv] = i;
ID2[N_conv] = n;
}
}
if (N_conv > 0) {
for(int k = 1; k < N_conv; k++) {
printf("%lf %lf %d %d \n",t/365.2422, D[k], ID1[k], ID2[k]);
}
}
} //end idx == 0
} //end for cycle for i
}
As RobertCrovella mentionned, without an MCV example, it is hard to tell.
However, the tp.D_del array seems to be written to with idx index, and read-back after a __syncthreads() with full range indexing n. Note that the call to __syncthreads() will only perform synchronization within a block, not accross the whole device. As a result, some thread/block will access data that has not been calculated yet, hence the failure.
You want to review your code so that values computed by blocks do not depend one-another.
I am trying to find the minimum distance between n points in cuda. I wrote the below code. This is working fine for number of points from 1 to 1024 i.e., 1 block. But if num_points is greater than 1024 i am getting wrong value for minimum distance. I am checking the gpu min value with the value I found in CPU using brute force algorithm.
The min value is stored in the temp1[0] at the end of kernel function.
I don't know what is wrong in this. Please help me out..
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>
#define MAX_POINTS 50000
__global__ void minimum_distance(float * X, float * Y, float * D, int n) {
__shared__ float temp[1024];
float temp1[1024];
int tid = threadIdx.x;
int bid = blockIdx.x;
int ref = tid+bid*blockDim.x;
temp[ref] = 1E+37F;
temp1[bid] = 1E+37F;
float dx,dy;
float Dij;
int i;
//each thread will take a point and find min dist to all
// points greater than its unique id(ref)
for (i = ref + 1; i < n; i++)
{
dx = X[ref]-X[i];
dy = Y[ref]-Y[i];
Dij = sqrtf(dx*dx+dy*dy);
if (temp[tid] > Dij)
{
temp[tid] = Dij;
}
}
__syncthreads();
//In each block the min value is stored in temp[0]
if(tid == 0)
{
if( bid == (n-1)/1024 ) {
int end = n - (bid) * 1024;
for (i = 1; i < end; i++ )
{
if (temp[i] < temp[tid])
temp[tid] = temp[i];
}
temp1[bid] = temp[tid];
}
else {
for (i = 1; i < 1024; i++ )
{
if (temp[i] < temp[tid])
temp[tid] = temp[i];
}
temp1[bid] = temp[tid];
}
}
__syncthreads();
//Here the min value is stored in temp1[0]
if (ref == 0)
{
for (i = 1; i <= (n-1)/1024; i++)
if( temp1[bid] > temp1[i])
temp1[bid] = temp1[i];
*D=temp1[bid];
}
}
//part of Main function
//kernel function invocation
// Invoking kernel of 1D grid and block sizes
// Vx and Vy are arrays of x-coordinates and y-coordinates respectively
int main(int argc, char* argv[]) {
.
.
blocks = (num_points-1)/1024 + 1;
minimum_distance<<<blocks,1024>>>(Vx,Vy,dmin_dist,num_points);
.
.
I'd say what's wrong is your choice of algorithm. You can certainly do better than O(n^2) - even if yours is pretty straightforward. Sure, on 5,000 points it might not seem terrible, but try 50,000 points and you'll feel the pain...
I'd think about parallelizing the construction of a Voronoi Diagram, or maybe some kind of BSP-like structure which might be easier to query with less code divergence.
I'm trying to convert hex into bin. If i call bits(0x101) it prints 00011, which is obviously wrong. Im pretty sure its in the for loop. Any ideas??
int hextobin (int n){
char buffer[33];
if(n==0) {
putchar('0');
return 0;
}
char *cp = buffer + 32;
*cp = 0;
for(int i =0;i <=sizeof(n); i++){
--cp;
if(n & 1) *cp = '1';
else *cp = '0';
n >>= i;
}
printf(cp);
return 0;
}
Once you shift the last 1 bit out of n, it becomes a zero, and your loop aborts, even if there's bits left to deal with.
And do yourself a favor... indent your code properly. It's oh-so-much easier to read/debug when it's formatted properly.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Not so long ago I was in an interview, that required solving two very interesting problems. I'm curious how would you approach the solutions.
Problem 1 :
Product of everything except current
Write a function that takes as input two integer arrays of length len, input and index, and generates a third array, result, such that:
result[i] = product of everything in input except input[index[i]]
For instance, if the function is called with len=4, input={2,3,4,5}, and index={1,3,2,0}, then result will be set to {40,24,30,60}.
IMPORTANT: Your algorithm must run in linear time.
Problem 2 : ( the topic was in one of Jeff posts )
Shuffle card deck evenly
Design (either in C++ or in C#) a class Deck to represent an ordered deck of cards, where a deck contains 52 cards, divided in 13 ranks (A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K) of the four suits: spades (?), hearts (?), diamonds (?) and clubs (?).
Based on this class, devise and implement an efficient algorithm to shuffle a deck of cards. The cards must be evenly shuffled, that is, every card in the original deck must have the same probability to end up in any possible position in the shuffled deck.
The algorithm should be implemented in a method shuffle() of the class Deck:
void shuffle()
What is the complexity of your algorithm (as a function of the number n of cards in the deck)?
Explain how you would test that the cards are evenly shuffled by your method (black box testing).
P.S. I had two hours to code the solutions
First question:
int countZeroes (int[] vec) {
int ret = 0;
foreach(int i in vec) if (i == 0) ret++;
return ret;
}
int[] mysticCalc(int[] values, int[] indexes) {
int zeroes = countZeroes(values);
int[] retval = new int[values.length];
int product = 1;
if (zeroes >= 2) { // 2 or more zeroes, all results will be 0
for (int i = 0; i > values.length; i++) {
retval[i] = 0;
}
return retval;
}
foreach (int i in values) {
if (i != 0) product *= i; // we have at most 1 zero, dont include in product;
}
int indexcounter = 0;
foreach(int idx in indexes) {
if (zeroes == 1 && values[idx] != 0) { // One zero on other index. Our value will be 0
retval[indexcounter] = 0;
}
else if (zeroes == 1) { // One zero on this index. result is product
retval[indexcounter] = product;
}
else { // No zeros. Return product/value at index
retval[indexcounter] = product / values[idx];
}
indexcouter++;
}
return retval;
}
Worst case this program will step through 3 vectors once.
For the first one, first calculate the product of entire contents of input, and then for every element of index, divide the calculated product by input[index[i]], to fill in your result array.
Of course I have to assume that the input has no zeros.
Tnilsson, great solution ( because I've done it the exact same way :P ).
I don't see any other way to do it in linear time. Does anybody ? Because the recruiting manager told me, that this solution was not strong enough.
Are we missing some super complex, do everything in one return line, solution ?
A linear-time solution in C#3 for the first problem is:-
IEnumerable<int> ProductExcept(List<int> l, List<int> indexes) {
if (l.Count(i => i == 0) == 1) {
int singleZeroProd = l.Aggregate(1, (x, y) => y != 0 ? x * y : x);
return from i in indexes select l[i] == 0 ? singleZeroProd : 0;
} else {
int prod = l.Aggregate(1, (x, y) => x * y);
return from i in indexes select prod == 0 ? 0 : prod / l[i];
}
}
Edit: Took into account a single zero!! My last solution took me 2 minutes while I was at work so I don't feel so bad :-)
Product of everything except current in C
void product_except_current(int input[], int index[], int out[],
int len) {
int prod = 1, nzeros = 0, izero = -1;
for (int i = 0; i < len; ++i)
if ((out[i] = input[index[i]]) != 0)
// compute product of non-zero elements
prod *= out[i]; // ignore possible overflow problem
else {
if (++nzeros == 2)
// if number of zeros greater than 1 then out[i] = 0 for all i
break;
izero = i; // save index of zero-valued element
}
//
for (int i = 0; i < len; ++i)
out[i] = nzeros ? 0 : prod / out[i];
if (nzeros == 1)
out[izero] = prod; // the only non-zero-valued element
}
Here's the answer to the second one in C# with a test method. Shuffle looks O(n) to me.
Edit: Having looked at the Fisher-Yates shuffle, I discovered that I'd re-invented that algorithm without knowing about it :-) it is obvious, however. I implemented the Durstenfeld approach which takes us from O(n^2) -> O(n), really clever!
public enum CardValue { A, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, J, Q, K }
public enum Suit { Spades, Hearts, Diamonds, Clubs }
public class Card {
public Card(CardValue value, Suit suit) {
Value = value;
Suit = suit;
}
public CardValue Value { get; private set; }
public Suit Suit { get; private set; }
}
public class Deck : IEnumerable<Card> {
public Deck() {
initialiseDeck();
Shuffle();
}
private Card[] cards = new Card[52];
private void initialiseDeck() {
for (int i = 0; i < 4; ++i) {
for (int j = 0; j < 13; ++j) {
cards[i * 13 + j] = new Card((CardValue)j, (Suit)i);
}
}
}
public void Shuffle() {
Random random = new Random();
for (int i = 0; i < 52; ++i) {
int j = random.Next(51 - i);
// Swap the cards.
Card temp = cards[51 - i];
cards[51 - i] = cards[j];
cards[j] = temp;
}
}
public IEnumerator<Card> GetEnumerator() {
foreach (Card c in cards) yield return c;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
foreach (Card c in cards) yield return c;
}
}
class Program {
static void Main(string[] args) {
foreach (Card c in new Deck()) {
Console.WriteLine("{0} of {1}", c.Value, c.Suit);
}
Console.ReadKey(true);
}
}
In Haskell:
import Array
problem1 input index = [(left!i) * (right!(i+1)) | i <- index]
where left = scanWith scanl
right = scanWith scanr
scanWith scan = listArray (0, length input) (scan (*) 1 input)
Vaibhav, unfortunately we have to assume, that there could be a 0 in the input table.
Second problem.
public static void shuffle (int[] array)
{
Random rng = new Random(); // i.e., java.util.Random.
int n = array.length; // The number of items left to shuffle (loop invariant).
while (n > 1)
{
int k = rng.nextInt(n); // 0 <= k < n.
n--; // n is now the last pertinent index;
int temp = array[n]; // swap array[n] with array[k] (does nothing if k == n).
array[n] = array[k];
array[k] = temp;
}
}
This is a copy/paste from the wikipedia article about the Fisher-Yates shuffle. O(n) complexity
Tnilsson, I agree that YXJuLnphcnQ solution is arguably faster, but the idee is the same. I forgot to add, that the language is optional in the first problem, as well as int the second.
You're right, that calculationg zeroes, and the product int the same loop is better. Maybe that was the thing.
Tnilsson, I've also uset the Fisher-Yates shuffle :). I'm very interested dough, about the testing part :)
Trilsson made a separate topic about the testing part of the question
How to test randomness (case in point - Shuffling)
very good idea Trilsson:)
YXJuLnphcnQ, that's the way I did it too. It's the most obvious.
But the fact is, that if you write an algorithm, that just shuffles all the cards in the collection one position to the right every time you call sort() it would pass the test, even though the output is not random.
Shuffle card deck evenly in C++
#include <algorithm>
class Deck {
// each card is 8-bit: 4-bit for suit, 4-bit for value
// suits and values are extracted using bit-magic
char cards[52];
public:
// ...
void shuffle() {
std::random_shuffle(cards, cards + 52);
}
// ...
};
Complexity: Linear in N. Exactly 51 swaps are performed. See http://www.sgi.com/tech/stl/random_shuffle.html
Testing:
// ...
int main() {
typedef std::map<std::pair<size_t, Deck::value_type>, size_t> Map;
Map freqs;
Deck d;
const size_t ntests = 100000;
// compute frequencies of events: card at position
for (size_t i = 0; i < ntests; ++i) {
d.shuffle();
size_t pos = 0;
for(Deck::const_iterator j = d.begin(); j != d.end(); ++j, ++pos)
++freqs[std::make_pair(pos, *j)];
}
// if Deck.shuffle() is correct then all frequencies must be similar
for (Map::const_iterator j = freqs.begin(); j != freqs.end(); ++j)
std::cout << "pos=" << j->first.first << " card=" << j->first.second
<< " freq=" << j->second << std::endl;
}
As usual, one test is not sufficient.