int main ()
{
int num, i=num, isPrime;
printf("Enter an integer: ");
scanf("%d", &num);
while (i>=2)
{
if (num%i!=0)
i--;
if (num%i==0) //check if it is a factor
{
isPrime = 1;
for (int j=2; j<=i; j++)
{
if (i%j==0)
{
isPrime = 0;
break;
}
}
if (isPrime==1)
{
printf("%d ", i);
num = num/i;
}
}
}
return 0;
}
May I know why is my code not working?
I was trying to write a C code which print all prime factors of a given number from the biggest factor to the smallest and when I run it just show nothing after I input a number.
#include of required header stdio.h to use printf() and scanf() is missing.
num is assigned to i before a value is read to num. This is assigning an indeterminate value to i and using the value invokes undefined behavior.
Looping j until it becomes i is wrong because i%i will always become zero unless i is zero.
You should decrement i also when num%i==0 and i is not prime. Otherwise, the update of i will stop there and the loop may go infinitely.
The two if statements if (num%i!=0) and if (num%i==0) may see different values of i. This happens when num%i!=0 at the beginning of the iteration because i is updated when the condition is true. You should use else instead of the second if statement.
Fixed code:
#include <stdio.h>
int main (void)
{
int num, i, isPrime;
printf("Please enter an integer: ");
if (scanf("%d", &num) != 1)
{
fputs("read error\n", stderr);
return 1;
}
i=num;
while (i>=2)
{
if (num%i!=0) //check if it is a factor
{
i--;
}
else
{
isPrime = 1;
for (int j=2; j<i; j++)
{
if (i%j==0)
{
isPrime = 0;
break;
}
}
if (isPrime==1)
{
printf("%d ", i);
num = num/i;
}
else
{
i--;
}
}
}
return 0;
}
At least one bug here:
int num, i=num, isPrime;
It seems you are trying to make i equal to num. But here `num isn't even initialized yet. You have to do the assignment after reading user input:
int num, i, isPrime;
printf("Please enter an integer: ");
scanf("%d", &num);
i=num;
Also the check for primes is wrong:
for (int j=2; j<=i; j++)
{
if (i%j==0)
{
isPrime = 0;
break;
}
}
Since you use j<=i then j will eventually be equal to i and i%j==0 will be true so you don't find any primes. Try j<i instead.
I'm trying to learn pointers in C++, but seems that it get more complicated...
In the main loop
int i;
for (i = 0; i < 5; ++i){
if (fun == arrfun[i]) break;
}
How is that fun==arrfun[i] at fun2 if both fun and arrfun start looping form 0? Hence they should equal at log(x) instead. How could I loop to sin or cos, etc?
#include <iostream>
#include <cmath>
using namespace std;
typedef double(*FUNDtoD)(double);
typedef FUNDtoD ARRFUN[];
FUNDtoD funmax(ARRFUN, double);
double fun0(double x) { return log(x); }
double fun1(double x) { return x*x; }
double fun2(double x) { return exp(x); }
double fun3(double x) { return sin(x); }
double fun4(double x) { return cos(x); }
int main() {
ARRFUN arrfun = { fun0, fun1, fun2, fun3, fun4 };
FUNDtoD fun = funmax(arrfun, 1);
int i;
for (i = 0; i < 5; ++i){
if (fun == arrfun[i]) break;
}
cout.precision(14);
cout << "Largest value at x=1 assumed by function # "
<< i << ".\nThe value is " << fun(2) << endl;
return 0;
}
FUNDtoD funmax(ARRFUN f, double x){
double m = f[0](x), z;
int k = 0;
for (int i = 1; i < 5; i++){
if ((z = f[i](x)) > m) {
m = z;
k = i;
}
}
return f[k];
}
I don't understand how function FUNDtoD funmax is working at the bottom, could somebody clarify it please, many thanks.
How is that fun==arrfun[i] at fun2 if both fun and arrfun start looping form 0? > Hence they should equal at log(x) instead. How could I loop to sin or cos, etc?
fun is not being looped, the same pointer is checked against each function pointer stored in the array (arrfun). It is simply trying to find the index of the returned function pointer. If sin(x) gave the highest value then that loop would finish with a 4 in i.
don't understand how function FUNDtoD funmax is working at the bottom, could
somebody clarify it please, many thanks.
It breaks down as follows:
Firstly it performs m = f0;
f[0] -> fun0 so the result is m = log( x );
Next it steps through the other 4 functions and tests whether the operation on x results in a higher valuer than the previous. It then stores the index of the function that returned the highest value.
Finally it returns that function pointer.
I am trying to find the minimum distance between n points in cuda. I wrote the below code. This is working fine for number of points from 1 to 1024 i.e., 1 block. But if num_points is greater than 1024 i am getting wrong value for minimum distance. I am checking the gpu min value with the value I found in CPU using brute force algorithm.
The min value is stored in the temp1[0] at the end of kernel function.
I don't know what is wrong in this. Please help me out..
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>
#define MAX_POINTS 50000
__global__ void minimum_distance(float * X, float * Y, float * D, int n) {
__shared__ float temp[1024];
float temp1[1024];
int tid = threadIdx.x;
int bid = blockIdx.x;
int ref = tid+bid*blockDim.x;
temp[ref] = 1E+37F;
temp1[bid] = 1E+37F;
float dx,dy;
float Dij;
int i;
//each thread will take a point and find min dist to all
// points greater than its unique id(ref)
for (i = ref + 1; i < n; i++)
{
dx = X[ref]-X[i];
dy = Y[ref]-Y[i];
Dij = sqrtf(dx*dx+dy*dy);
if (temp[tid] > Dij)
{
temp[tid] = Dij;
}
}
__syncthreads();
//In each block the min value is stored in temp[0]
if(tid == 0)
{
if( bid == (n-1)/1024 ) {
int end = n - (bid) * 1024;
for (i = 1; i < end; i++ )
{
if (temp[i] < temp[tid])
temp[tid] = temp[i];
}
temp1[bid] = temp[tid];
}
else {
for (i = 1; i < 1024; i++ )
{
if (temp[i] < temp[tid])
temp[tid] = temp[i];
}
temp1[bid] = temp[tid];
}
}
__syncthreads();
//Here the min value is stored in temp1[0]
if (ref == 0)
{
for (i = 1; i <= (n-1)/1024; i++)
if( temp1[bid] > temp1[i])
temp1[bid] = temp1[i];
*D=temp1[bid];
}
}
//part of Main function
//kernel function invocation
// Invoking kernel of 1D grid and block sizes
// Vx and Vy are arrays of x-coordinates and y-coordinates respectively
int main(int argc, char* argv[]) {
.
.
blocks = (num_points-1)/1024 + 1;
minimum_distance<<<blocks,1024>>>(Vx,Vy,dmin_dist,num_points);
.
.
I'd say what's wrong is your choice of algorithm. You can certainly do better than O(n^2) - even if yours is pretty straightforward. Sure, on 5,000 points it might not seem terrible, but try 50,000 points and you'll feel the pain...
I'd think about parallelizing the construction of a Voronoi Diagram, or maybe some kind of BSP-like structure which might be easier to query with less code divergence.
I use a GTX 280, which has compute capability 1.3 and supports atomic operations on shared memory. I am using cuda SDK 2.2 and VS 2005. In my program I have to extensively use atomic operations because there is simply no other way.
One example is that I have to calculate the running sum of an array and find out the index where the sum exceeds a given cut off value. For this I am using a variant of scan algorithm and using atomicMin to store index while the value is less than the threshold. So this way at the end the shared memory would have the index where the value is just less than the threshold.
This is just one component of the kernel, and there are many similar code blocks in the kernel call.
I am having 3 problems
Firstly I have not been able to compile the code as it say atomic operations are not defined, I have searched but not found which file I have to add.
Second, I somehow managed to compile the code by copying it in the code provided by CUDA SDK, but then it is saying the atomic operations are not supported on shared memory, where as it is running in the following program
Even when I worked around a hack by giving -arch sm_12 in the command line compilation, the code snippet using these atomic operations are taking an awful lot of time.
I believe that in the worst case I should get some sort of speed up, because there are not very many atomic operations and I using 1 block of 16x16. Unfortunately the serial code in running 10x faster.
Below I am posting the kernel cod*, this kernel call seems to be the bottleneck if anyone could help me optimize then it would be nice. The serial code is just performing these actions in a serial manner. I am using a block configuration of 16 X 16.
The code seems to be lengthy but actually it contains an if code block and while code block that perform almost the same task, but they could not be merged.
#define limit (int)(log((float)256)/log((float)2))
// This receives a pointer to an image, some variables and 4 more arrays cont(of size 256) vars(some constants), lim and buf(of image size)
// block configuration 1 block of 16x16
__global__ void kernel_Main(unsigned char* in, int height,int width, int bs,int th, double cutoff, uint* cont,int* vars, unsigned int* lim,unsigned int* buf)
{
int j = threadIdx.x;
int i = threadIdx.y;
int k = i*blockDim.x+j;
__shared__ int prefix_sum[256];
__shared__ int sum_s[256];
__shared__ int ary_shared[256];
__shared__ int he_shared[256];
// this is the threshold
int cutval = (2*width*height)*cutoff;
prefix_sum[k] = cont[k];
int l;
// a variant of scan algorithm
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > cutval)
{
atomicMin(&vars[cut],k);
}
}
__syncthreads();
}
// The first thread will store the value in global array
if(k==0)
{
vars[cuts]=prefix_sum[vars[cut]];
}
__syncthreads();
if(vars[n])
{
// bs = 7 in this case
if(i<bs && j<bs)
{
// using atomic add because the index could be same for 2 different threads
atomicAdd(&ary_shared[in[i*(width) + j]],1);
}
__syncthreads();
int minth = 1>((bs*bs)/20)? 1: ((bs*bs)/20);
prefix_sum[k] = ary_shared[k];
sum_s[k] = 0;
// Again prefix sum
int l;
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > minth)
{
atomicMin(&vars[hmin],k);
}
}
__syncthreads();
}
// set the maximum value here
if(k==0)
{
vars[hminc]=prefix_sum[255];
// because we will always overshoot by 1
vars[hmin]--;
}
__syncthreads();
int maxth = 1>((bs*bs)/20)? 1: ((bs*bs)/20);
prefix_sum[k] = ary_shared[255-k];
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > maxth)
{
atomicMin(&vars[hmax], k);
}
}
__syncthreads();
}
// set the maximum value here
if(k==0)
{
vars[hmaxc]=prefix_sum[255];
vars[hmax]--;
vars[hmax]=255-vars[hmax];
}
__syncthreads();
int rng = vars[hmax] - vars[hmin];
if(rng >= vars[cut])
{
if( k <= vars[hmin] )
he_shared[k] = 0;
else if( k >= vars[hmax])
he_shared[k] = 255;
else
he_shared[k] = (255 * (k - vars[hmin])) / rng;
}
__syncthreads();
// only 7x7 = 49 threads will do this
if(i>0 && i<=bs && j>0 && j<=bs)
{
int base = (vars[oy]*width+vars[ox])+ (i-1)*width + (j-1);
if(rng >= vars[cut])
{
int value = he_shared[in[base]];
buf[base]+=value;
lim[base]++;
}
else
{
buf[base]+=255;
lim[base]++;
}
}
if(k==0)
vars[n]--;
__syncthreads();
}// if(n) block closes here
while(vars[n])
{
if(k==0)
{
if( vars[ox]==0 && vars[d1] ==3 )
vars[d1] = 0; // l2r
else if( vars[ox]==0 && vars[d1]==2 )
vars[d1] = 3; // l u2d
else if( vars[ox]==width-bs && vars[d1]==0)
vars[d1] = 1; // r u2d
else if( vars[ox]==width-bs && vars[d1]==1)
vars[d1] = 2; // r2l
}
// Because this value will be changed so
// all the threads should set their registers before
// they move forward
int ox_d = vars[ox];
int oy_d = vars[oy];
// Just putting it here so that all the threads should have set their
// values before moving on, as this value will be changed
__syncthreads();
if(vars[d1]==0)
{
if(i == 0 && j < bs)
{
int index = j*width + ox_d + oy_d*width;
int index2 = j*width + ox_d + oy_d*width +bs;
atomicSub(&ary_shared[in[index]],1);
atomicAdd(&ary_shared[in[index2]],1);
}
// The first thread of the first block should set this value
if(k==0)
vars[ox]++;
}
else if(vars[d1]==1||vars[d1]==3)
{
if(i == 0 && j < bs)
{
/*if(j==0)
printf("Entered 1||3\n");*/
int index = j*width + ox_d + oy_d*width;
int index2 = j*width + ox_d + (oy_d+bs)*width;
atomicSub(&ary_shared[in[index]],1);
atomicAdd(&ary_shared[in[index2]],1);
}
// The first thread of the first block should set this value
if(k==0)
vars[oy]++;
}
else if(vars[d1]==2)
{
if(i == 0 && j < bs)
{
int index = j*width + ox_d-1 + oy_d*width;
int index2 = j*width + ox_d-1 + oy_d*width +bs;
atomicAdd(&ary_shared[in[index]],1);
atomicSub(&ary_shared[in[index2]],1);
}
// The first thread of the first block should set this value
if(k==0 )
vars[ox]--;
}
__syncthreads();
//ary_shared has been calculated
// Reset the hmin and hminc values
// again the same task as done in the if(n) loop
if(k==0)
{
vars[hmin]=0;
vars[hminc]=0;
vars[hmax]=0;
vars[hmaxc]=0;
}
__syncthreads();
int minth = 1>((bs*bs)/20)? 1: ((bs*bs)/20);
prefix_sum[k] = ary_shared[k];
int l;
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > minth)
{
atomicMin(&vars[hmin],k);
}
}
__syncthreads();
}
// set the maximum value here
if(k==0)
{
vars[hminc]=prefix_sum[255];
vars[hmin]--;
}
__syncthreads();
// Calculate maxth
int maxth = 1>((bs*bs)/20)? 1: ((bs*bs)/20);
prefix_sum[k] = ary_shared[255-k];
for(l=0;l<=limit;l++)
{
sum_s[k]=prefix_sum[k];
if(k >= (int)pow((float)2,(float)l))
{
prefix_sum[k]+=sum_s[k-(int)pow((float)2,(float)l)];
// Find out the minimum index for which the cummulative sum crosses threshold
if(prefix_sum[k] > maxth)
{
atomicMin(&vars[hmax], k);
}
}
__syncthreads();
}
// set the maximum value here
if(k==0)
{
vars[hmaxc]=prefix_sum[255];
vars[hmax]--;
vars[hmax]=255-vars[hmax];
}
__syncthreads();
int rng = vars[hmax] - vars[hmin];
if(rng >= vars[cut])
{
if( k <= vars[hmin] )
he_shared[k] = 0;
else if( k >= vars[hmax])
he_shared[k] = 255;
else
he_shared[k] = (255 * (k - vars[hmin])) / rng;
}
__syncthreads();
if(i>0 && i<=bs && j>0 && j<=bs)
{
int base = (vars[oy]*width+vars[ox])+ (i-1)*width + (j-1);
if(rng >= vars[cut])
{
int value = he_shared[in[base]];
buf[base]+=value;
lim[base]++;
}
else
{
buf[base]+=255;
lim[base]++;
}
}
// This just might cause a little bit of problem
if(k==0)
vars[n]--;
// All threads will wait here before continuing the while loop
__syncthreads();
}// end of while(n)
}
Firstly you need -arch sm_12 (or in your case it should really be -arch sm_13) to enable atomic operations.
As for performance, there is no guarantee that your kernel will be any faster than normal code on the CPU - there are many problems which really do not fit well into the CUDA model and these may indeed run much slower than on the CPU. You need to do some analysis/design/modelling before coding any CUDA kernels to prevent yourself wasting a lot of time on something that is never going to fly.
Having said that, there may be a way to implement your algo in a more efficient way - maybe you could post the CPU code and then invite ideas as to how to efficiently implement it in CUDA ?
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
Not so long ago I was in an interview, that required solving two very interesting problems. I'm curious how would you approach the solutions.
Problem 1 :
Product of everything except current
Write a function that takes as input two integer arrays of length len, input and index, and generates a third array, result, such that:
result[i] = product of everything in input except input[index[i]]
For instance, if the function is called with len=4, input={2,3,4,5}, and index={1,3,2,0}, then result will be set to {40,24,30,60}.
IMPORTANT: Your algorithm must run in linear time.
Problem 2 : ( the topic was in one of Jeff posts )
Shuffle card deck evenly
Design (either in C++ or in C#) a class Deck to represent an ordered deck of cards, where a deck contains 52 cards, divided in 13 ranks (A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K) of the four suits: spades (?), hearts (?), diamonds (?) and clubs (?).
Based on this class, devise and implement an efficient algorithm to shuffle a deck of cards. The cards must be evenly shuffled, that is, every card in the original deck must have the same probability to end up in any possible position in the shuffled deck.
The algorithm should be implemented in a method shuffle() of the class Deck:
void shuffle()
What is the complexity of your algorithm (as a function of the number n of cards in the deck)?
Explain how you would test that the cards are evenly shuffled by your method (black box testing).
P.S. I had two hours to code the solutions
First question:
int countZeroes (int[] vec) {
int ret = 0;
foreach(int i in vec) if (i == 0) ret++;
return ret;
}
int[] mysticCalc(int[] values, int[] indexes) {
int zeroes = countZeroes(values);
int[] retval = new int[values.length];
int product = 1;
if (zeroes >= 2) { // 2 or more zeroes, all results will be 0
for (int i = 0; i > values.length; i++) {
retval[i] = 0;
}
return retval;
}
foreach (int i in values) {
if (i != 0) product *= i; // we have at most 1 zero, dont include in product;
}
int indexcounter = 0;
foreach(int idx in indexes) {
if (zeroes == 1 && values[idx] != 0) { // One zero on other index. Our value will be 0
retval[indexcounter] = 0;
}
else if (zeroes == 1) { // One zero on this index. result is product
retval[indexcounter] = product;
}
else { // No zeros. Return product/value at index
retval[indexcounter] = product / values[idx];
}
indexcouter++;
}
return retval;
}
Worst case this program will step through 3 vectors once.
For the first one, first calculate the product of entire contents of input, and then for every element of index, divide the calculated product by input[index[i]], to fill in your result array.
Of course I have to assume that the input has no zeros.
Tnilsson, great solution ( because I've done it the exact same way :P ).
I don't see any other way to do it in linear time. Does anybody ? Because the recruiting manager told me, that this solution was not strong enough.
Are we missing some super complex, do everything in one return line, solution ?
A linear-time solution in C#3 for the first problem is:-
IEnumerable<int> ProductExcept(List<int> l, List<int> indexes) {
if (l.Count(i => i == 0) == 1) {
int singleZeroProd = l.Aggregate(1, (x, y) => y != 0 ? x * y : x);
return from i in indexes select l[i] == 0 ? singleZeroProd : 0;
} else {
int prod = l.Aggregate(1, (x, y) => x * y);
return from i in indexes select prod == 0 ? 0 : prod / l[i];
}
}
Edit: Took into account a single zero!! My last solution took me 2 minutes while I was at work so I don't feel so bad :-)
Product of everything except current in C
void product_except_current(int input[], int index[], int out[],
int len) {
int prod = 1, nzeros = 0, izero = -1;
for (int i = 0; i < len; ++i)
if ((out[i] = input[index[i]]) != 0)
// compute product of non-zero elements
prod *= out[i]; // ignore possible overflow problem
else {
if (++nzeros == 2)
// if number of zeros greater than 1 then out[i] = 0 for all i
break;
izero = i; // save index of zero-valued element
}
//
for (int i = 0; i < len; ++i)
out[i] = nzeros ? 0 : prod / out[i];
if (nzeros == 1)
out[izero] = prod; // the only non-zero-valued element
}
Here's the answer to the second one in C# with a test method. Shuffle looks O(n) to me.
Edit: Having looked at the Fisher-Yates shuffle, I discovered that I'd re-invented that algorithm without knowing about it :-) it is obvious, however. I implemented the Durstenfeld approach which takes us from O(n^2) -> O(n), really clever!
public enum CardValue { A, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, J, Q, K }
public enum Suit { Spades, Hearts, Diamonds, Clubs }
public class Card {
public Card(CardValue value, Suit suit) {
Value = value;
Suit = suit;
}
public CardValue Value { get; private set; }
public Suit Suit { get; private set; }
}
public class Deck : IEnumerable<Card> {
public Deck() {
initialiseDeck();
Shuffle();
}
private Card[] cards = new Card[52];
private void initialiseDeck() {
for (int i = 0; i < 4; ++i) {
for (int j = 0; j < 13; ++j) {
cards[i * 13 + j] = new Card((CardValue)j, (Suit)i);
}
}
}
public void Shuffle() {
Random random = new Random();
for (int i = 0; i < 52; ++i) {
int j = random.Next(51 - i);
// Swap the cards.
Card temp = cards[51 - i];
cards[51 - i] = cards[j];
cards[j] = temp;
}
}
public IEnumerator<Card> GetEnumerator() {
foreach (Card c in cards) yield return c;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
foreach (Card c in cards) yield return c;
}
}
class Program {
static void Main(string[] args) {
foreach (Card c in new Deck()) {
Console.WriteLine("{0} of {1}", c.Value, c.Suit);
}
Console.ReadKey(true);
}
}
In Haskell:
import Array
problem1 input index = [(left!i) * (right!(i+1)) | i <- index]
where left = scanWith scanl
right = scanWith scanr
scanWith scan = listArray (0, length input) (scan (*) 1 input)
Vaibhav, unfortunately we have to assume, that there could be a 0 in the input table.
Second problem.
public static void shuffle (int[] array)
{
Random rng = new Random(); // i.e., java.util.Random.
int n = array.length; // The number of items left to shuffle (loop invariant).
while (n > 1)
{
int k = rng.nextInt(n); // 0 <= k < n.
n--; // n is now the last pertinent index;
int temp = array[n]; // swap array[n] with array[k] (does nothing if k == n).
array[n] = array[k];
array[k] = temp;
}
}
This is a copy/paste from the wikipedia article about the Fisher-Yates shuffle. O(n) complexity
Tnilsson, I agree that YXJuLnphcnQ solution is arguably faster, but the idee is the same. I forgot to add, that the language is optional in the first problem, as well as int the second.
You're right, that calculationg zeroes, and the product int the same loop is better. Maybe that was the thing.
Tnilsson, I've also uset the Fisher-Yates shuffle :). I'm very interested dough, about the testing part :)
Trilsson made a separate topic about the testing part of the question
How to test randomness (case in point - Shuffling)
very good idea Trilsson:)
YXJuLnphcnQ, that's the way I did it too. It's the most obvious.
But the fact is, that if you write an algorithm, that just shuffles all the cards in the collection one position to the right every time you call sort() it would pass the test, even though the output is not random.
Shuffle card deck evenly in C++
#include <algorithm>
class Deck {
// each card is 8-bit: 4-bit for suit, 4-bit for value
// suits and values are extracted using bit-magic
char cards[52];
public:
// ...
void shuffle() {
std::random_shuffle(cards, cards + 52);
}
// ...
};
Complexity: Linear in N. Exactly 51 swaps are performed. See http://www.sgi.com/tech/stl/random_shuffle.html
Testing:
// ...
int main() {
typedef std::map<std::pair<size_t, Deck::value_type>, size_t> Map;
Map freqs;
Deck d;
const size_t ntests = 100000;
// compute frequencies of events: card at position
for (size_t i = 0; i < ntests; ++i) {
d.shuffle();
size_t pos = 0;
for(Deck::const_iterator j = d.begin(); j != d.end(); ++j, ++pos)
++freqs[std::make_pair(pos, *j)];
}
// if Deck.shuffle() is correct then all frequencies must be similar
for (Map::const_iterator j = freqs.begin(); j != freqs.end(); ++j)
std::cout << "pos=" << j->first.first << " card=" << j->first.second
<< " freq=" << j->second << std::endl;
}
As usual, one test is not sufficient.