Testing the validity of a sudoku puzzle: trouble checking for duplicates in each 3x3 square

Here's my method:
public static boolean hasDuplicatesInSections(int[][] puzzle){
    int z = 2;
    int w = 0;
    while(z <= 8){
        for(int i = w; i <= z; i++){
            for(int j = w; j <= z; j++){
                for(int l = w; l <= z; l++){
                    for(int k = w; k <= z; k++){
                        if ((puzzle[l][k] == puzzle[i][j]) &&
                            (puzzle[l][k] != 0) &&
                            (k != j)){
                            System.out.println("There was a match in a square. Shame on you.");
                            return true;
                        }
                    }
                }
            }
        }
        z += 3;
        w += 3;
    }
    System.out.println("There wasn't a match in a square. Good job!");
    return false;
}
I made a 9x9 2-D int array for the puzzle; 0 means the cell is blank. The method looks through each 3x3 square and reports whether it found a duplicate in one. But sometimes it still says there wasn't a match when there was. Can anybody see what might be wrong?

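Found what was wrong. The faulty comparison was the check:
(k != j))
It only tested whether the two columns differed, which is wrong: two cells in the same column but in different rows were never compared. My first idea was to add the row and column indexes and compare the sums:
(l+k) != (i+j))
That fixes the same-column case, but it also skips distinct cells whose index sums happen to be equal, such as (0,2) and (1,1). The reliable test is that the two cells differ in row or in column:
(l != i || k != j))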
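For comparison, here is a minimal C sketch of the same check that avoids pairwise index comparisons altogether. Note also that the loop in the question reuses the range w..z for both rows and columns, so it only ever visits the three squares on the main diagonal; the sketch below advances the box row and box column independently. It assumes, as in the question, a 9x9 grid where 0 is a blank cell and every other value is in 1..9. The function name has_duplicates_in_boxes is illustrative.

```c
#include <stdbool.h>

/* Returns true if any 3x3 box of a 9x9 sudoku grid contains the same non-zero
   value twice. 0 marks a blank cell; other values are assumed to be 1..9. */
bool has_duplicates_in_boxes(int puzzle[9][9])
{
    for (int box_row = 0; box_row < 9; box_row += 3) {
        for (int box_col = 0; box_col < 9; box_col += 3) {  /* rows and columns advance independently */
            bool seen[10] = { false };                     /* seen[v]: value v already found in this box */
            for (int r = box_row; r < box_row + 3; r++) {
                for (int c = box_col; c < box_col + 3; c++) {
                    int v = puzzle[r][c];
                    if (v == 0)
                        continue;                          /* blank cell */
                    if (seen[v])
                        return true;
                    seen[v] = true;
                }
            }
        }
    }
    return false;
}
```

Because each box gets its own seen table, two equal values in a box are detected whether they share a row, a column, or neither.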


C++/CUDA equation printing negative for 2 positives [duplicate]

This question already has answers here: Detecting signed overflow in C/C++
I am trying to evaluate the expression (x+y)^2 / (x*y) for a large number of different inputs.
I noticed an issue when x = 46340 and y = 1: it outputs -46341, as seen here.
__global__ void proof() {
int x = 1;
int y = 1;
int multi_number = 100000;
bool logged = false;
while (true) {
long eq = ((x + y) * (x + y)) / (x * y);
if (x >= multi_number) {
x = 1;
y = y + 1;
}
if (eq < 4) {
if (logged == true) {
continue;
}
printf("\nGPU: Equation being used: (%d", x);
printf("+%d", y);
printf(")^2 / %d", x);
printf("*%d", y);
printf(" >= 4");
printf("\nGPU: Proof Failed: %d", x);
printf(", %d", y);
logged = true;
continue;
}
if (y >= multi_number) {
if (x >= multi_number) {
if (logged == true) {
continue;
}
printf("\nGPU: Proof is true for all cases.");
logged = true;
continue;
}
}
printf("\nGPU: Equation being used: (%d", x);
printf("+%d", y);
printf(")^2 / %d", x);
printf("*%d", y);
printf(" >= 4");
printf("\nGPU: %d", eq); // printing the equation
x = x + 1;
}
}
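I have tried rewriting the equation and even entering it into a calculator. The calculator always gives a different result from what the code outputs, and I have double-checked what I entered into the calculator.
I was expecting an output of 46342.
The cause was signed int overflow: (x + y) * (x + y) is computed in int before the result is stored in the long, and 46341 * 46341 = 2147488281 exceeds INT_MAX (2147483647). Using longs for the operands instead fixed the issue (long is 64 bits on 64-bit Linux, but only 32 bits on Windows, where long long is needed).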
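A minimal sketch of the fix, written as plain C so it can be checked on the host (the same body should work inside a __device__ function). The point is that the operands must be widened before the multiplication; assigning an int product to a long does not help, because the overflow has already happened by then. It uses long long because long is only 32 bits on Windows. The function name ratio64 is hypothetical.

```c
/* (x + y)^2 / (x * y) evaluated in 64-bit arithmetic. Each operand is widened
   before multiplying, so (x + y)^2 cannot overflow for inputs up to the
   100000 bound used in the question. */
long long ratio64(int x, int y)
{
    long long sum = (long long)x + y;
    long long product = (long long)x * y;
    return (sum * sum) / product;
}
```

When printing the result, use %lld; the %d in the question's printf does not match a long argument.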

Codility CyclicRotation in-place implementation

I can't wrap my head around my solution for the problem:
A zero-indexed array A consisting of N integers is given. Rotation of the array means that each element is shifted right by one index, and the last element of the array is also moved to the first place.
For example, the rotation of array A = [3, 8, 9, 7, 6] is [6, 3, 8, 9, 7]. The goal is to rotate array A K times; that is, each element of A will be shifted to the right by K indexes.
I wanted to create a solution without creating a new array, just modifying the existing one in place. It works... most of the time. The example tests pass, and others also pass, but some, for which Codility doesn't show the input, fail.
public int[] solution(int[] A, int K) {
for (var i = 0; i < A.Length - 1; i++) {
var destIndex = (i * K + K) % A.Length;
var destValue = A[destIndex];
A[destIndex] = A[0];
A[0] = destValue;
}
return A;
}
I've skipped the code related to the fact that you don't need to rotate the whole array several times (i.e. rotating by K % A.Length is enough).
What's wrong with my implementation? Am I missing some corner case?
The algorithm for this should be very simple, like:
aux <- array[array.Length - 1]
for index = array.Length - 2 downto 0 do //walk backwards so nothing is overwritten before it moves
array[index + 1] <- array[index]
endfor
array[0] <- aux
Note that in the special case array.Length <= 1, nothing needs to be done to achieve the rotation. If you want K rotations without an auxiliary array, you can call this K times.
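A C sketch of that approach, assuming a plain int array: the one-step rotation copies from the end toward the front so that no element is overwritten before it has been moved, and it is repeated K % n times, since rotating by n leaves the array unchanged. The helper names are illustrative. Each step costs n moves, so this is O(n * (K % n)).

```c
#include <stddef.h>

/* Rotates a[0..n-1] right by one position in place. */
static void rotate_right_once(int a[], size_t n)
{
    if (n <= 1)
        return;                       /* nothing to do for 0 or 1 elements */
    int last = a[n - 1];
    for (size_t i = n - 1; i > 0; i--)
        a[i] = a[i - 1];              /* walk backwards so each value is read before it is overwritten */
    a[0] = last;
}

/* Rotates a right by k positions by applying the one-step rotation k % n times. */
void rotate_right(int a[], size_t n, size_t k)
{
    if (n == 0)
        return;
    for (size_t step = 0; step < k % n; step++)
        rotate_right_once(a, n);
}
```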
Achieving an optimal algorithm takes more care. Here I take the following approach. Each element of the array can be in one of three states, which I explain through an example. If I put the K-th element into a variable called aux and place the 0-th element in its place, we have the following three states:
At element 0, the original element has already been moved elsewhere, but the final element has not arrived yet. This is the moved state.
At element K, the original element has already been moved and the final element has arrived. This is the arrived state.
At element 2 * K, nothing has been done so far, so it is still in the original state.
So, if we can mark somehow the elements, then the algorithm would look like this:
arrivedCount <- 0 //the number of arrived elements is counted in order to make sure we know when we need to search for an element with an original state
index <- 0
N <- array.Length
aux <- array[index]
mark(array[index], moved)
index <- (index + K) mod N
while arrivedCount < N do
state <- array[index]
if (state = moved) then
array[index] <- aux
arrivedCount <- arrivedCount + 1
mark(array[index], arrived)
if arrivedCount < N then
while (state(array[index]) <> original) do
index <- (index + 1) mod N
endwhile
aux <- array[index]
mark(array[index], moved)
index <- (index + K) mod N
endif
else //it is original
aux2 <- array[index]
array[index] <- aux
aux <- aux2
arrivedCount <- arrivedCount + 1
mark(array[index], arrived)
index <- (index + K) mod N
endif
endwhile
Now, how could we use this in practice? Consider the case where the array contains only positive numbers. You mark all elements at the start by negating them (-5 instead of 5, for example). When an element enters the moved state it holds 0, and once it has arrived it holds the positive number again. How you mark the elements is up to you, and it must fit your task. If you cannot mark the elements for some reason, you will need an auxiliary array to solve this.
EDIT
Do not be afraid of the inner while loop; because of the modulo classes, it should not need many steps. An implementation in JavaScript:
var array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var arrivedCount = 0;
var index = 0;
var N = array.length;
var K = 3;
for (var i = 0; i < array.length; i++) array[i] = -array[i];
var aux, aux2, state;
aux = array[index];
array[index] = 0;
index = (index + K) % N;
var step = 0
while ((arrivedCount < N) && (++step < 1000)) {
if (array[index] === 0) {
array[index] = -aux;
arrivedCount++;
if (arrivedCount < N) {
while (array[index] >= 0) index = (index + 1) % N;
aux = array[index];
array[index] = 0;
index = (index + K) % N;
}
} else {
aux2 = array[index];
array[index] = -aux;
aux = aux2;
arrivedCount++;
index = (index + K) % N
}
}
Change the definition of array and K according to your will.
I've finally figured out what's wrong with my solution, thanks to @dvaergiller, who posted a question with an approach similar to mine: Fastest algorithm for circle shift N sized array for M position
That answer made me realize my solution fails every time the greatest common divisor of A.Length and K is not 1. @IsaacTurner's solution is much easier to understand and also shows there's no need to keep swapping elements, but now I see how I can correct my solution.
I should not go through all the elements in the array to find the correct place for each of them, because when the greatest common divisor is not 1, the walk returns to its starting index before visiting every element, and from then on I start swapping already-placed elements again. Instead, each walk must stop as soon as a full cycle is made, and a new walk must start from the next position.
Here's the corrected version of my solution:
int gcd(int a, int b) => b == 0 ? a : gcd(b, a % b);
public int[] solution(int[] A, int K)
{
for (var i = 0; i < gcd(A.Length, K); i++)
{
for (var j = i; j < A.Length - 1; j++)
{
var destIndex = ((j-i) * K + K + i) % A.Length;
if (destIndex == i) break;
var destValue = A[destIndex];
A[destIndex] = A[i];
A[i] = destValue;
}
}
return A;
}
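The cycle-following idea translates to C roughly as below. This sketch uses the common "juggling" formulation rather than the swap-based loop above: each of the gcd(n, K) cycles is walked once, carrying one displaced value along, so every element is written exactly once. Function names are illustrative.

```c
#include <stddef.h>

static size_t gcd(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Rotates a[0..n-1] right by k positions in place using gcd(n, k) cycles. */
void rotate_in_place(int a[], size_t n, size_t k)
{
    if (n == 0)
        return;
    k %= n;                              /* rotating by n is a no-op */
    if (k == 0)
        return;
    size_t cycles = gcd(n, k);
    for (size_t start = 0; start < cycles; start++) {
        int carried = a[start];          /* value that still has to be placed */
        size_t pos = start;
        do {
            size_t dest = (pos + k) % n; /* a right rotation moves index pos to pos + k */
            int displaced = a[dest];
            a[dest] = carried;
            carried = displaced;
            pos = dest;
        } while (pos != start);          /* the cycle closes after n / cycles steps */
    }
}
```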
You can try this, I got a 100%:
public int[] solution(int[] A, int K) {
if (A.length == 0) {
return A;
}
for (int i=0;i<K;i++) {
int[] aux = new int[A.length];
aux[0] = A[A.length-1];
System.arraycopy(A, 0, aux, 1, A.length - 1);
A = aux;
}
return A;
}
The time complexity of this is O(N * K), since each of the K rotations copies the whole array, but it is simple and it passed all the tests.
I tried a different approach, without explicit loops:
https://app.codility.com/demo/results/trainingE33ZRF-KGU/
public static int[] rotate(int[] A, int K){
if ( K > A.length && A.length > 0)
K = K % A.length;
if (K == A.length || K == 0 || A.length == 0){
return A;
}
int[] second = Arrays.copyOfRange(A, 0, A.length - (K));
int[] first = Arrays.copyOfRange(A, A.length - (K), A.length );
int[] both = Arrays.copyOf(first, first.length + second.length);
System.arraycopy(second, 0, both, first.length, second.length);
return both;
}
Each element is shifted right by one index, and the last element of the array is also moved to the first place.
public int[] solution(int[] A, int K) {
if (K > 0 && A.length > 0) {
K = K % A.length;
int[] B = new int[A.length];
for (int i = 0; i < A.length; i++) {
if ((i + K) > (A.length - 1)) {
B[i + K - A.length] = A[i];
} else {
B[i + K] = A[i];
}
}
return B;
} else {
return A;
}
}
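The if/else in the last answer is a hand-written modulo: the element at index i lands at (i + K) % n. A C sketch of the same out-of-place rotation, with the caller supplying the output buffer (an assumption made here to keep allocation out of the example); rotate_copy is an illustrative name.

```c
#include <stddef.h>

/* Writes the right rotation of src by k positions into dst (both of length n). */
void rotate_copy(const int src[], int dst[], size_t n, size_t k)
{
    if (n == 0)
        return;
    k %= n;                              /* rotating by n is a no-op */
    for (size_t i = 0; i < n; i++)
        dst[(i + k) % n] = src[i];       /* same as the two-branch index computation */
}
```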

hex to bin in C

I'm trying to convert hex into binary. If I call bits(0x101), it prints 00011, which is obviously wrong. I'm pretty sure the problem is in the for loop. Any ideas?
int hextobin (int n){
char buffer[33];
if(n==0) {
putchar('0');
return 0;
}
char *cp = buffer + 32;
*cp = 0;
for(int i =0;i <=sizeof(n); i++){
--cp;
if(n & 1) *cp = '1';
else *cp = '0';
n >>= i;
}
printf(cp);
return 0;
}
Once you shift the last 1 bit out of n, it becomes zero, and your loop aborts, even if there are bits left to deal with.
And do yourself a favor: indent your code properly. It's oh-so-much easier to read and debug when it's formatted properly.
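For the code exactly as posted, the output can be traced by hand (assuming a 4-byte int): the loop runs sizeof(n) + 1 = 5 times, which is a byte count rather than a bit count, and n >>= i shifts by 0, 1, 2, 3 bits instead of one bit per step. For 0x101 it therefore produces 1, 1, 0, 0, 0 (least significant first) and prints 00011. A corrected sketch is below; it shifts by exactly one bit per iteration, stops when no set bits remain, and takes an unsigned int so a right shift never sign-extends. The name to_binary and the caller-provided buffer are assumptions for the example. Printing the result with printf("%s", buf) also avoids using data as a format string, which printf(cp) does.

```c
#include <limits.h>
#include <string.h>

/* Writes the binary representation of n (no leading zeros, "0" for zero)
   into buf, which must hold at least sizeof(unsigned int) * CHAR_BIT + 1 chars.
   Returns buf. */
char *to_binary(unsigned int n, char *buf)
{
    char tmp[sizeof(unsigned int) * CHAR_BIT + 1];
    char *cp = tmp + sizeof tmp - 1;     /* fill from the end, least significant bit first */
    *cp = '\0';
    do {
        *--cp = (n & 1u) ? '1' : '0';
        n >>= 1;                         /* shift by exactly one bit per step */
    } while (n != 0);
    strcpy(buf, cp);
    return buf;
}
```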

CUDA demoting double to float error despite no doubles in code

I'm writing a kernel using PyCUDA. My GPU device only supports compute capability 1.1 (arch sm_11) and so I can only use floats in my code. I've taken great effort to ensure I'm doing everything with floats, but despite that, there is a particular line in my code that keeps causing a compiler error.
The chunk of code is:
// Gradient magnitude, so 1 <= x <= width, 1 <= y <= height.
if( j > 0 && j < im_width && i > 0 && i < im_height){
gradient_mag[idx(i,j)] = float(sqrt(x_gradient[idx(i,j)]*x_gradient[idx(i,j)] + y_gradient[idx(i,j)]*y_gradient[idx(i,j)]));
}
Here, idx() is a __device__ helper function that returns a linear index based on pixel indices i and j, and it only works with integers. I use it throughout and it doesn't give errors anywhere else, so I strongly suspect it's not idx(). The sqrt() call is just from the standard C math functions, which support floats. All of the arrays involved (x_gradient, y_gradient, and gradient_mag) are float* and are part of the input to my function (i.e. declared in Python, then converted to device variables, etc.).
I've tried removing the extra cast to float in my code above, with no luck. I've also tried doing something completely stupid like this:
// Gradient magnitude, so 1 <= x <= width, 1 <= y <= height.
if( j > 0 && j < im_width && i > 0 && i < im_height){
gradient_mag[idx(i,j)] = 3.0f; // also tried float(3.0) here
}
All of these variations give the same error:
pycuda.driver.CompileError: nvcc said it demoted types in source code it compiled--this is likely not what you want.
[command: nvcc --cubin -arch sm_11 -I/usr/local/lib/python2.7/dist-packages/pycuda-2011.1.2-py2.7-linux-x86_64.egg/pycuda/../include/pycuda kernel.cu]
[stderr:
ptxas /tmp/tmpxft_00004329_00000000-2_kernel.ptx, line 128; warning : Double is not supported. Demoting to float
]
Any ideas? I've debugged many errors in my code and was hoping to get it working tonight, but this has proved to be a bug that I cannot understand.
Added -- Here is a truncated version of the kernel that produces the same error above on my machine.
every_pixel_hog_kernel_source = \
"""
#include <math.h>
#include <stdio.h>
__device__ int idx(int ii, int jj){
return gridDim.x*blockDim.x*ii+jj;
}
__device__ int bin_number(float angle_val, int total_angles, int num_bins){
float angle1;
float min_dist;
float this_dist;
int bin_indx;
angle1 = 0.0;
min_dist = abs(angle_val - angle1);
bin_indx = 0;
for(int kk=1; kk < num_bins; kk++){
angle1 = angle1 + float(total_angles)/float(num_bins);
this_dist = abs(angle_val - angle1);
if(this_dist < min_dist){
min_dist = this_dist;
bin_indx = kk;
}
}
return bin_indx;
}
__device__ int hist_number(int ii, int jj){
int hist_num = 0;
if(jj >= 0 && jj < 11){
if(ii >= 0 && ii < 11){
hist_num = 0;
}
else if(ii >= 11 && ii < 22){
hist_num = 3;
}
else if(ii >= 22 && ii < 33){
hist_num = 6;
}
}
else if(jj >= 11 && jj < 22){
if(ii >= 0 && ii < 11){
hist_num = 1;
}
else if(ii >= 11 && ii < 22){
hist_num = 4;
}
else if(ii >= 22 && ii < 33){
hist_num = 7;
}
}
else if(jj >= 22 && jj < 33){
if(ii >= 0 && ii < 11){
hist_num = 2;
}
else if(ii >= 11 && ii < 22){
hist_num = 5;
}
else if(ii >= 22 && ii < 33){
hist_num = 8;
}
}
return hist_num;
}
__global__ void every_pixel_hog_kernel(float* input_image, int im_width, int im_height, float* gaussian_array, float* x_gradient, float* y_gradient, float* gradient_mag, float* angles, float* output_array)
{
/////
// Setup the thread indices and linear offset.
/////
int i = blockDim.y * blockIdx.y + threadIdx.y;
int j = blockDim.x * blockIdx.x + threadIdx.x;
int ang_limit = 180;
int ang_bins = 9;
float pi_val = 3.141592653589f; //91
/////
// Compute a Gaussian smoothing of the current pixel and save it into a new image array
// Use sync threads to make sure everyone does the Gaussian smoothing before moving on.
/////
if( j > 1 && i > 1 && j < im_width-2 && i < im_height-2 ){
// Hard-coded unit standard deviation 5-by-5 Gaussian smoothing filter.
gaussian_array[idx(i,j)] = float(1.0/273.0) *(
input_image[idx(i-2,j-2)] + float(4.0)*input_image[idx(i-2,j-1)] + float(7.0)*input_image[idx(i-2,j)] + float(4.0)*input_image[idx(i-2,j+1)] + input_image[idx(i-2,j+2)] +
float(4.0)*input_image[idx(i-1,j-2)] + float(16.0)*input_image[idx(i-1,j-1)] + float(26.0)*input_image[idx(i-1,j)] + float(16.0)*input_image[idx(i-1,j+1)] + float(4.0)*input_image[idx(i-1,j+2)] +
float(7.0)*input_image[idx(i,j-2)] + float(26.0)*input_image[idx(i,j-1)] + float(41.0)*input_image[idx(i,j)] + float(26.0)*input_image[idx(i,j+1)] + float(7.0)*input_image[idx(i,j+2)] +
float(4.0)*input_image[idx(i+1,j-2)] + float(16.0)*input_image[idx(i+1,j-1)] + float(26.0)*input_image[idx(i+1,j)] + float(16.0)*input_image[idx(i+1,j+1)] + float(4.0)*input_image[idx(i+1,j+2)] +
input_image[idx(i+2,j-2)] + float(4.0)*input_image[idx(i+2,j-1)] + float(7.0)*input_image[idx(i+2,j)] + float(4.0)*input_image[idx(i+2,j+1)] + input_image[idx(i+2,j+2)]);
}
__syncthreads();
/////
// Compute the simple x and y gradients of the image and store these into new images
// again using syncthreads before moving on.
/////
// X-gradient, ensure x is between 1 and width-1
if( j > 0 && j < im_width){
x_gradient[idx(i,j)] = float(input_image[idx(i,j)] - input_image[idx(i,j-1)]);
}
else if(j == 0){
x_gradient[idx(i,j)] = float(0.0);
}
// Y-gradient, ensure y is between 1 and height-1
if( i > 0 && i < im_height){
y_gradient[idx(i,j)] = float(input_image[idx(i,j)] - input_image[idx(i-1,j)]);
}
else if(i == 0){
y_gradient[idx(i,j)] = float(0.0);
}
__syncthreads();
// Gradient magnitude, so 1 <= x <= width, 1 <= y <= height.
if( j < im_width && i < im_height){
gradient_mag[idx(i,j)] = float(sqrt(x_gradient[idx(i,j)]*x_gradient[idx(i,j)] + y_gradient[idx(i,j)]*y_gradient[idx(i,j)]));
}
__syncthreads();
/////
// Compute the orientation angles
/////
if( j < im_width && i < im_height){
if(ang_limit == 360){
angles[idx(i,j)] = float((atan2(y_gradient[idx(i,j)],x_gradient[idx(i,j)])+pi_val)*float(180.0)/pi_val);
}
else{
angles[idx(i,j)] = float((atan( y_gradient[idx(i,j)]/x_gradient[idx(i,j)] )+(pi_val/float(2.0)))*float(180.0)/pi_val);
}
}
__syncthreads();
// Compute the HoG using the above arrays. Do so in a 3x3 grid, with 9 angle bins for each grid.
// forming an 81-vector and then write this 81 vector as a row in the large output array.
int top_bound, bot_bound, left_bound, right_bound, offset;
int window = 32;
if(i-window/2 > 0){
top_bound = i-window/2;
bot_bound = top_bound + window;
}
else{
top_bound = 0;
bot_bound = top_bound + window;
}
if(j-window/2 > 0){
left_bound = j-window/2;
right_bound = left_bound + window;
}
else{
left_bound = 0;
right_bound = left_bound + window;
}
if(bot_bound - im_height > 0){
offset = bot_bound - im_height;
top_bound = top_bound - offset;
bot_bound = bot_bound - offset;
}
if(right_bound - im_width > 0){
offset = right_bound - im_width;
right_bound = right_bound - offset;
left_bound = left_bound - offset;
}
int counter_i = 0;
int counter_j = 0;
int bin_indx, hist_indx, glob_col_indx, glob_row_indx;
int row_width = 81;
for(int pix_i = top_bound; pix_i < bot_bound; pix_i++){
for(int pix_j = left_bound; pix_j < right_bound; pix_j++){
bin_indx = bin_number(angles[idx(pix_i,pix_j)], ang_limit, ang_bins);
hist_indx = hist_number(counter_i,counter_j);
glob_col_indx = ang_bins*hist_indx + bin_indx;
glob_row_indx = idx(i,j);
output_array[glob_row_indx*row_width + glob_col_indx] = float(output_array[glob_row_indx*row_width + glob_col_indx] + float(gradient_mag[idx(pix_i,pix_j)]));
counter_j = counter_j + 1;
}
counter_i = counter_i + 1;
counter_j = 0;
}
}
"""
Here's an unmistakable case of using doubles:
gaussian_array[idx(i,j)] = float(1.0/273.0) *
See the double literals being divided?
But really, use float literals (such as 1.0f/273.0f) instead of double literals cast to floats: the casts are ugly, and I suspect they will hide bugs like this.
-------Edit 1/Dec---------
First, thanks @CygnusX1: constant folding would eliminate that calculation; I didn't even think of it.
I've tried to reproduce the environment of the error. I installed the CUDA SDK 3.2 (which @EMS mentioned they seem to use in the lab) and compiled the truncated kernel above, and indeed nvopencc optimized the above calculation away (thanks @CygnusX1) and didn't use doubles anywhere in the generated PTX code. Further, ptxas didn't give the error @EMS received. From that, I thought the problem was outside the every_pixel_hog_kernel_source code itself, perhaps in PyCUDA. However, compiling with PyCUDA 2011.1.2 still does not produce a warning like the one in @EMS's question. I can get the error in the question only by introducing a double calculation, for example by removing the cast from gaussian_array[idx(i,j)] = float(1.0/273.0) *
To reproduce the Python case, does the following produce your error?
import pycuda.driver as cuda
from pycuda.compiler import compile
x=compile("""put your truncated kernel code here""",options=[],arch="sm_11",keep=True)
It doesn't produce an error in my circumstance, so there is a possibility I simply can't replicate your result.
However, I can give some advice. When using compile (or SourceModule) with keep=True, Python will print the folder where the PTX file is being generated just before showing the error message.
You can then examine the PTX file in that folder and look for where .f64 appears; that should give some idea of what is being treated as a double. Working out which code in your original kernel that corresponds to is difficult, though, so having the simplest example that produces your error will help.
Your problem is here:
angle1 = 0.0;
0.0 is a double precision constant. 0.0f is a single precision constant.
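To illustrate, here is the question's bin_number helper rewritten so that every literal and math call is single precision: 0.0f instead of 0.0, and fabsf (the single-precision absolute value) instead of abs. It is shown as plain C so it can be checked on the host; in the kernel it would keep its __device__ qualifier. Treat it as a sketch of the pattern rather than a verified fix for the PyCUDA build.

```c
#include <math.h>

/* Float-only version of the kernel's bin_number helper: returns the index of the
   angle bin (bins spaced total_angles / num_bins apart, starting at 0) that is
   closest to angle_val. Only float literals and float math are used. */
int bin_number(float angle_val, int total_angles, int num_bins)
{
    float bin_width = (float)total_angles / (float)num_bins;
    float angle1 = 0.0f;                          /* 0.0f, not 0.0: a float literal */
    float min_dist = fabsf(angle_val - angle1);
    int bin_indx = 0;
    for (int kk = 1; kk < num_bins; kk++) {
        angle1 += bin_width;
        float this_dist = fabsf(angle_val - angle1);
        if (this_dist < min_dist) {
            min_dist = this_dist;
            bin_indx = kk;
        }
    }
    return bin_indx;
}
```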
(This is a comment, not an answer, but it is too big to post as a comment.)
Could you provide the PTX code around the line where the error occurs?
I tried compiling a simple kernel using the code you provided:
__constant__ int im_width;
__constant__ int im_height;
__device__ int idx(int i,int j) {
return i+j*im_width;
}
__global__ void kernel(float* gradient_mag, float* x_gradient, float* y_gradient) {
int i = threadIdx.x;
int j = threadIdx.y;
// Gradient magnitude, so 1 <= x <= width, 1 <= y <= height.
if( j > 0 && j < im_width && i > 0 && i < im_height){
gradient_mag[idx(i,j)] = float(sqrt(x_gradient[idx(i,j)]*x_gradient[idx(i,j)] + y_gradient[idx(i,j)]*y_gradient[idx(i,j)]));
}
}
using:
nvcc.exe -m32 -maxrregcount=32 -gencode=arch=compute_11,code=\"sm_11,compute_11\" --compile -o "Debug\main.cu.obj" main.cu
got no errors.
Using the CUDA 4.1 beta compiler
Update
I tried compiling your new code (I am working within CUDA/C++, not PyCUDA, but this shouldn't matter). Didn't catch the error either! Used CUDA 4.1 and CUDA 4.0.
What is your version of CUDA installation?
C:\>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Wed_Oct_19_23:13:02_PDT_2011
Cuda compilation tools, release 4.1, V0.2.1221

How would you calculate all possible permutations of 0 through N iteratively?

I need to calculate permutations iteratively. The method signature looks like:
int[][] permute(int n)
For n = 3 for example, the return value would be:
[[0,1,2],
[0,2,1],
[1,0,2],
[1,2,0],
[2,0,1],
[2,1,0]]
How would you go about doing this iteratively in the most efficient way possible? I can do this recursively, but I'm interested in seeing lots of alternative ways of doing it iteratively.
See the QuickPerm algorithm; it's iterative: http://www.quickperm.org/
Edit:
Rewritten in Ruby for clarity:
def permute_map(n)
results = []
a, p = (0...n).to_a, [0] * n
i, j = 0, 0
i = 1
results << yield(a)
while i < n
if p[i] < i
j = i % 2 * p[i] # If i is odd, then j = p[i], else j = 0
a[j], a[i] = a[i], a[j] # Swap
results << yield(a)
p[i] += 1
i = 1
else
p[i] = 0
i += 1
end
end
return results
end
The algorithm for stepping from one permutation to the next is very similar to elementary school addition - when an overflow occurs, "carry the one".
Here's an implementation I wrote in C:
#include <stdio.h>
//Convenience macro. Its function should be obvious.
#define swap(a,b) do { \
typeof(a) __tmp = (a); \
(a) = (b); \
(b) = __tmp; \
} while(0)
void perm_start(unsigned int n[], unsigned int count) {
unsigned int i;
for (i=0; i<count; i++)
n[i] = i;
}
//Returns 0 on wraparound
int perm_next(unsigned int n[], unsigned int count) {
unsigned int tail, i, j;
if (count <= 1)
return 0;
/* Find all terms at the end that are in reverse order.
Example: 0 3 (5 4 2 1) (i becomes 2) */
for (i=count-1; i>0 && n[i-1] >= n[i]; i--);
tail = i;
if (tail > 0) {
/* Find the last item from the tail set greater than
the last item from the head set, and swap them.
Example: 0 3* (5 4* 2 1)
Becomes: 0 4* (5 3* 2 1) */
for (j=count-1; j>tail && n[j] <= n[tail-1]; j--);
swap(n[tail-1], n[j]);
}
/* Reverse the tail set's order */
for (i=tail, j=count-1; i<j; i++, j--)
swap(n[i], n[j]);
/* If the entire list was in reverse order, tail will be zero. */
return (tail != 0);
}
int main(void)
{
#define N 3
unsigned int perm[N];
perm_start(perm, N);
do {
int i;
for (i = 0; i < N; i++)
printf("%u ", perm[i]);
printf("\n");
} while (perm_next(perm, N));
return 0;
}
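If you need the permutations collected in the shape the question asks for rather than printed, the same next-permutation step can fill a table of n! rows. The sketch below is self-contained (it repeats the step in compact form) and stores the rows in one caller-allocated flat array, with row r occupying out[r*n .. r*n + n - 1]; that layout and the function names are assumptions, and n is assumed small enough for n! * n ints to fit in memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Rearranges a[0..n-1] into the next lexicographic permutation.
   Returns false (leaving a sorted ascending) once the last one has been passed. */
static bool next_perm(int a[], size_t n)
{
    if (n < 2)
        return false;
    size_t tail = n - 1;
    while (tail > 0 && a[tail - 1] >= a[tail])
        tail--;                                   /* a[tail..n-1] is the descending suffix */
    if (tail > 0) {
        size_t j = n - 1;
        while (a[j] <= a[tail - 1])
            j--;                                  /* rightmost element larger than the pivot */
        int t = a[tail - 1]; a[tail - 1] = a[j]; a[j] = t;
    }
    for (size_t i = tail, j = n - 1; i < j; i++, j--) {
        int t = a[i]; a[i] = a[j]; a[j] = t;      /* reverse the suffix */
    }
    return tail > 0;
}

/* Writes all n! permutations of 0..n-1, in lexicographic order, into out,
   which must hold n! * n ints. Returns the number of rows written (n!). */
size_t permute_all(size_t n, int out[])
{
    size_t total = 1;
    for (size_t k = 2; k <= n; k++)
        total *= k;                               /* n! rows */
    for (size_t i = 0; i < n; i++)
        out[i] = (int)i;                          /* row 0 is 0, 1, ..., n-1 */
    for (size_t r = 1; r < total; r++) {
        int *row = out + r * n;
        const int *prev = row - n;
        for (size_t i = 0; i < n; i++)
            row[i] = prev[i];                     /* copy the previous row ... */
        next_perm(row, n);                        /* ... and advance it in place */
    }
    return total;
}
```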
Is using 1.9's Array#permutation an option?
>> a = [0,1,2].permutation(3).to_a
=> [[0, 1, 2], [0, 2, 1], [1, 0, 2], [1, 2, 0], [2, 0, 1], [2, 1, 0]]
Below is my generic version of the next-permutation algorithm in C#, closely resembling the STL's next_permutation function (except that, unlike the C++ version, it doesn't reverse the collection when it is already the last possible permutation).
In theory it should work with any IList<> of IComparables.
static bool NextPermutation<T>(IList<T> a) where T: IComparable
{
if (a.Count < 2) return false;
var k = a.Count-2;
while (k >= 0 && a[k].CompareTo( a[k+1]) >=0) k--;
if(k<0)return false;
var l = a.Count - 1;
while (l > k && a[l].CompareTo(a[k]) <= 0) l--;
var tmp = a[k];
a[k] = a[l];
a[l] = tmp;
var i = k + 1;
var j = a.Count - 1;
while(i<j)
{
tmp = a[i];
a[i] = a[j];
a[j] = tmp;
i++;
j--;
}
return true;
}
And the demo/test code:
var src = "1234".ToCharArray();
do
{
Console.WriteLine(src);
}
while (NextPermutation(src));
I also came across the QuickPerm algorithm referenced in another answer. I wanted to share this answer in addition, because I saw some immediate changes that make it shorter. For example, if the index array "p" is initialized slightly differently, it saves having to return the first permutation before the loop. Also, all those while loops and ifs took up a lot more room.
#include <cstdio>
#include <utility>
void permute(char* s, std::size_t l) {
    int* p = new int[l + 1];                       // one extra slot: p[l] = l is a non-zero sentinel
    for (std::size_t i = 0; i <= l; i++) p[i] = (int)i;
    for (std::size_t i = 0; i < l; std::printf("%s\n", s)) {
        std::swap(s[i], s[i % 2 * --p[i]]);
        for (i = 1; p[i] == 0; i++) p[i] = (int)i;    // the sentinel stops this loop at i == l
    }
    delete[] p;
}
I found Joey Adams' version to be the most readable, but I couldn't port it directly to C# because of how C# handles the scoping of for-loop variables. Hence, this is a slightly tweaked version of his code:
/// <summary>
/// Performs an in-place permutation of <paramref name="values"/>, and returns if there
/// are any more permutations remaining.
/// </summary>
private static bool NextPermutation(int[] values)
{
if (values.Length == 0)
throw new ArgumentException("Cannot permutate an empty collection.");
//Find all terms at the end that are in reverse order.
// Example: 0 3 (5 4 2 1) (i becomes 2)
int tail = values.Length - 1;
while(tail > 0 && values[tail - 1] >= values[tail])
tail--;
if (tail > 0)
{
//Find the last item from the tail set greater than the last item from the head
//set, and swap them.
// Example: 0 3* (5 4* 2 1)
// Becomes: 0 4* (5 3* 2 1)
int index = values.Length - 1;
while (index > tail && values[index] <= values[tail - 1])
index--;
Swap(ref values[tail - 1], ref values[index]);
}
//Reverse the tail set's order.
int limit = (values.Length - tail) / 2;
for (int index = 0; index < limit; index++)
Swap(ref values[tail + index], ref values[values.Length - 1 - index]);
//If the entire list was in reverse order, tail will be zero.
return (tail != 0);
}
private static void Swap<T>(ref T left, ref T right)
{
T temp = left;
left = right;
right = temp;
}
Here's an implementation in C#, as an extension method:
public static IEnumerable<List<T>> Permute<T>(this IList<T> items)
{
var indexes = Enumerable.Range(0, items.Count).ToArray();
yield return indexes.Select(idx => items[idx]).ToList();
var weights = new int[items.Count];
var idxUpper = 1;
while (idxUpper < items.Count)
{
if (weights[idxUpper] < idxUpper)
{
var idxLower = idxUpper % 2 * weights[idxUpper];
var tmp = indexes[idxLower];
indexes[idxLower] = indexes[idxUpper];
indexes[idxUpper] = tmp;
yield return indexes.Select(idx => items[idx]).ToList();
weights[idxUpper]++;
idxUpper = 1;
}
else
{
weights[idxUpper] = 0;
idxUpper++;
}
}
}
And a unit test:
[TestMethod]
public void Permute()
{
var ints = new[] { 1, 2, 3 };
var orderings = ints.Permute().ToList();
Assert.AreEqual(6, orderings.Count);
AssertUtil.SequencesAreEqual(new[] { 1, 2, 3 }, orderings[0]);
AssertUtil.SequencesAreEqual(new[] { 2, 1, 3 }, orderings[1]);
AssertUtil.SequencesAreEqual(new[] { 3, 1, 2 }, orderings[2]);
AssertUtil.SequencesAreEqual(new[] { 1, 3, 2 }, orderings[3]);
AssertUtil.SequencesAreEqual(new[] { 2, 3, 1 }, orderings[4]);
AssertUtil.SequencesAreEqual(new[] { 3, 2, 1 }, orderings[5]);
}
The method AssertUtil.SequencesAreEqual is a custom test helper which can be recreated easily enough.
How about a recursive algorithm you can call iteratively? Unless you actually need the permutations stored in a list like that (and you should generally compute them inline rather than allocate a lot of pointless memory), you can simply calculate each permutation on the fly from its index.
Just as stepping to the next permutation is carry-the-one addition that reverses the tail (rather than resetting it to 0), finding the permutation with a given index amounts to extracting the digits of that index in a mixed radix: base n, then n-1, then n-2, and so on, one digit per iteration.
public static <T> boolean permutation(List<T> values, int index) {
return permutation(values, values.size() - 1, index);
}
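private static <T> boolean permutation(List<T> values, int n, int index) {
if ((index == 0) || (n == 0)) return (index == 0);
// Position n can receive any of the n + 1 items at positions 0..n, so the digit is taken in base n + 1.
Collections.swap(values, n, n - (index % (n + 1)));
return permutation(values, n - 1, index / (n + 1));
}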
The boolean return value tells you whether your index was in range: false means it ran out of positions while part of the index was still left over.
And it can't get all the permutations for more than 12 objects.
12! < Integer.MAX_VALUE < 13!
-- But, it's so very very pretty. And if you do a lot of things wrong might be useful.
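The same mixed-radix idea can also be written as a plain loop, which is what the question asks for. In this C sketch (the name permutation_by_index is illustrative), positions are filled from the end: position last can receive any of the last + 1 items not yet fixed, so the next digit of the index, in base last + 1, selects which one is swapped there. It returns false when the index is n! or larger. With an unsigned long long index it can reach every permutation of up to 20 items, since 20! fits in 64 bits and 21! does not.

```c
#include <stdbool.h>

/* Fills perm with the permutation of 0..n-1 selected by index, using the
   mixed-radix digits of index (base n, then n-1, ...) as swap choices.
   Returns false if index >= n!. */
bool permutation_by_index(int perm[], int n, unsigned long long index)
{
    for (int i = 0; i < n; i++)
        perm[i] = i;
    for (int last = n - 1; last > 0 && index > 0; last--) {
        int radix = last + 1;                    /* items still free: positions 0..last */
        int pick = (int)(index % radix);
        int tmp = perm[last];                    /* swap the chosen item into position last */
        perm[last] = perm[last - pick];
        perm[last - pick] = tmp;
        index /= radix;
    }
    return index == 0;                           /* leftover index means it was out of range */
}
```

Calling it for index = 0, 1, ..., n! - 1 visits every permutation exactly once, although not in lexicographic order.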
I have implemented the algorithm in JavaScript.
var all = ["a", "b", "c"];
console.log(permute(all));
function permute(a){
var i=1,j, temp = "";
var p = [];
var n = a.length;
var output = [];
output.push(a.slice());
for(var b=0; b <= n; b++){
p[b] = b;
}
while (i < n){
p[i]--;
if(i%2 == 1){
j = p[i];
}
else{
j = 0;
}
temp = a[j];
a[j] = a[i];
a[i] = temp;
i=1;
while (p[i] === 0){
p[i] = i;
i++;
}
output.push(a.slice());
}
return output;
}
I've used the algorithms from here. The page contains a lot of useful information.
Edit: Sorry, those were recursive. uray posted the link to the iterative algorithm in his answer.
I've created a PHP example. Unless you really need to return all of the results at once, I would create an iterator class like the following:
<?php
class Permutator implements Iterator
{
private $a, $n, $p, $i, $j, $k;
private $stop;
public function __construct(array $a)
{
$this->a = array_values($a);
$this->n = count($this->a);
}
public function current()
{
return $this->a;
}
public function next()
{
++$this->k;
while ($this->i < $this->n)
{
if ($this->p[$this->i] < $this->i)
{
$this->j = ($this->i % 2) * $this->p[$this->i];
$tmp = $this->a[$this->j];
$this->a[$this->j] = $this->a[$this->i];
$this->a[$this->i] = $tmp;
$this->p[$this->i]++;
$this->i = 1;
return;
}
$this->p[$this->i++] = 0;
}
$this->stop = true;
}
public function key()
{
return $this->k;
}
public function valid()
{
return !$this->stop;
}
public function rewind()
{
if ($this->n) $this->p = array_fill(0, $this->n, 0);
$this->stop = $this->n == 0;
$this->i = 1;
$this->j = 0;
$this->k = 0;
}
}
foreach (new Permutator(array(1,2,3,4,5)) as $permutation)
{
var_dump($permutation);
}
?>
Note that it treats every PHP array as an indexed array.