How to implement three stacks using a single array - language-agnostic

I came across this problem on an interview website. It asks you to efficiently implement three stacks in a single array, such that no stack overflows as long as there is free space anywhere in the array.
For implementing 2 stacks in one array the idea is pretty obvious: the 1st stack grows from LEFT to RIGHT and the 2nd stack grows from RIGHT to LEFT; when the two stack-top indices cross, that signals an overflow.
Thanks in advance for your insightful answers.

You can implement three stacks with a linked list:
You need a pointer to the first free element. Three more pointers give the top element of each stack (or null, if that stack is empty).
When an element is pushed onto a stack, it takes the first free element and the free pointer is advanced to the next free element (or an overflow error is raised if there is none). The stack's own pointer is set to the new element, which in turn points back to the previous top of that stack.
When an element is popped from a stack, it is handed back to the list of free elements, and the stack's own pointer is redirected to the next element down in that stack.
A linked list can be implemented within an array.
How (space) efficient is this?
It is no problem to build a linked list using two cells of the array per list element (value + pointer). Depending on the specification you could even pack pointer and value into one array element (e.g. the array holds longs, while value and pointer are only ints).
Compare this to the solution of kgiannakakis, where you lose up to 50% (but only in the worst case). I think my solution is a bit cleaner (and maybe more academic, which should be no disadvantage for an interview question ^^).
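To make the free-list idea concrete, here is a minimal Python sketch (the class, names and fixed capacity are my own, since the answer above is language-agnostic): every array slot stores a (value, next) pair, the free pointer threads through the unused slots, and each stack remembers the index of its top node.
class ThreeStacks:
    def __init__(self, capacity=12):
        # Each slot holds [value, next_index]; initially all slots form the free list.
        self.cells = [[None, i + 1] for i in range(capacity)]
        self.cells[-1][1] = -1            # end of the free list
        self.free = 0                     # index of the first free slot
        self.top = [-1, -1, -1]           # top slot of each stack (-1 = empty)

    def push(self, stack, value):
        if self.free == -1:
            raise OverflowError("no space left in the array")
        slot = self.free
        self.free = self.cells[slot][1]               # unlink from the free list
        self.cells[slot] = [value, self.top[stack]]   # link to the previous top
        self.top[stack] = slot

    def pop(self, stack):
        slot = self.top[stack]
        if slot == -1:
            raise IndexError("stack is empty")
        value, below = self.cells[slot]
        self.top[stack] = below
        self.cells[slot] = [None, self.free]          # return the slot to the free list
        self.free = slot
        return value

s = ThreeStacks()
s.push(0, 'a'); s.push(1, 'b'); s.push(0, 'c')
print(s.pop(0), s.pop(1), s.pop(0))                   # c b a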

See Knuth, The Art of Computer Programming, Volume 1, Section 2.2.2, titled "Sequential allocation". It discusses allocating multiple queues/stacks in a single array, including algorithms for dealing with overflows, etc.

We can use a long bit array recording which stack the i-th array cell belongs to.
Two bits per cell are enough (00 - empty, 01 - A, 10 - B, 11 - C). That costs 2N bits, i.e. N/4 bytes, of additional memory for an N-element array.
For example, for 1024 long int elements (4096 bytes) it would take only 256 bytes, or about 6%.
This bit map can be placed in the same array, at the beginning or at the end, just shrinking the usable size of the given array by a constant ~6%!
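A minimal sketch of such a 2-bit ownership map in Python (the helper names are mine; how a push actually finds its next free cell is left out here):
EMPTY, A, B, C = 0b00, 0b01, 0b10, 0b11

def make_map(n):
    # 2 bits per cell, packed 4 cells per byte
    return bytearray((2 * n + 7) // 8)

def set_owner(tag_map, i, owner):
    byte, shift = divmod(2 * i, 8)
    tag_map[byte] = (tag_map[byte] & ~(0b11 << shift)) | (owner << shift)

def get_owner(tag_map, i):
    byte, shift = divmod(2 * i, 8)
    return (tag_map[byte] >> shift) & 0b11

tags = make_map(1024)       # 256 bytes of tags for 1024 cells, as in the answer
set_owner(tags, 5, B)
print(get_owner(tags, 5))   # 2, i.e. cell 5 belongs to stack B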

First stack grows from left to right.
Second stack grows from right to left.
The third stack starts from the middle. Suppose the array has odd size, for simplicity. Then the third stack grows like this:
* * * * * * * * * * *
      5 3 1 2 4
The first and second stacks are allowed to grow to at most half of the array each. The third stack can grow to fill the whole array at maximum.
The worst-case scenario is when one of the first two stacks grows to 50% of the array; then 50% of the array is wasted. To optimise the efficiency, the third stack should be chosen as the one that grows faster than the other two.
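A small sketch of just the index arithmetic for this layout (the helper is my own; checks against the other two stacks' tops are omitted): the k-th element of the third stack sits at the middle for k = 1 and then alternates right and left, one step further out every two pushes.
def third_stack_index(k, size):
    # size is the (odd) array length; k is the 1-based position within the third stack
    mid = size // 2
    offset = k // 2                  # 0, 1, 1, 2, 2, ...
    return mid + offset if k % 2 == 0 else mid - offset

print([third_stack_index(k, 11) for k in range(1, 6)])   # [5, 6, 4, 7, 3], matching the diagram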

This is an interesting conundrum, and I don't have a real answer but thinking slightly outside the box...
it could depend on what each element in the stack consists of. If it's three stacks of true/false flags, then you could use the first three bits of the integer elements: bit 0 holds the value for the first stack, bit 1 the value for the second stack, and bit 2 the value for the third stack. Each stack can then grow independently until the whole array is full for that stack. Even better, the other stacks can continue to grow after the first stack is full.
I know this is cheating a bit and doesn't really answer the question, but it does work for a very specific case and no entries in the stack are wasted. I am watching with interest to see whether anyone can come up with a proper answer that works for more generic elements.
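For concreteness, a tiny sketch of this trick (the class and names are mine): each of the three boolean stacks lives in its own bit of every array word and keeps its own top index.
class ThreeBoolStacks:
    def __init__(self, capacity):
        self.words = [0] * capacity
        self.top = [0, 0, 0]                  # next free position for each stack

    def push(self, stack, flag):
        i = self.top[stack]
        if i == len(self.words):
            raise OverflowError("stack %d is full" % stack)
        if flag:
            self.words[i] |= 1 << stack       # set bit `stack` of word i
        else:
            self.words[i] &= ~(1 << stack)    # clear bit `stack` of word i
        self.top[stack] = i + 1

    def pop(self, stack):
        if self.top[stack] == 0:
            raise IndexError("stack %d is empty" % stack)
        self.top[stack] -= 1
        return bool((self.words[self.top[stack]] >> stack) & 1)

s = ThreeBoolStacks(4)
s.push(0, True); s.push(0, False); s.push(2, True)
print(s.pop(0), s.pop(0), s.pop(2))           # False True True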

Split the array into any 3 parts (no matter whether you split it sequentially or interleaved). If one stack grows beyond 1/3 of the array, you start filling the unused ends of the other two parts from the back:
aaa bbb ccc
1   2   3
145 2   3
145 2 6 3
145 2 6 3 7
145 286 3 7
145 286 397
The worst case is when two of the stacks grow up to their 1/3 boundary; then about 30% of the space is wasted.

Assuming that all array positions should be used to store values - I guess it depends on your definition of efficiency.
If you do the two-stack solution, place the third stack somewhere in the middle, and track both its bottom and top, then most operations will continue to be efficient, at the penalty of an expensive Move operation (shifting the third stack towards wherever free space remains, to the halfway point of that free space) whenever a collision occurs.
It's certainly going to be quick to code and understand. What are our efficiency targets?

A rather silly but effective solution could be:
Store the first stack's elements at positions i*3: 0, 3, 6, ...
Store the second stack's elements at positions i*3+1: 1, 4, 7, ...
And the third stack's elements at positions i*3+2.
The problem with this solution is that the memory used will always be three times the size of the deepest stack, and you can overflow even when there are still free positions in the array.
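A minimal sketch of this stride-3 layout (the class is my own): the i-th element of stack s simply lives at index 3*i + s.
class StridedThreeStacks:
    def __init__(self, capacity):
        self.data = [None] * capacity
        self.count = [0, 0, 0]            # number of elements in each stack

    def push(self, s, value):
        i = 3 * self.count[s] + s
        if i >= len(self.data):
            raise OverflowError("stack %d overflows even though other slots may be free" % s)
        self.data[i] = value
        self.count[s] += 1

    def pop(self, s):
        if self.count[s] == 0:
            raise IndexError("stack %d is empty" % s)
        self.count[s] -= 1
        return self.data[3 * self.count[s] + s]

q = StridedThreeStacks(9)
q.push(0, 'a'); q.push(1, 'b'); q.push(0, 'c')
print(q.pop(0), q.pop(0), q.pop(1))       # c a b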

Make a HashMap with keys for the begin and end positions, e.g. <"B1", 0>, <"E1", n/3>.
For each Push(value), add a condition that checks whether the position of Bx comes before Ex, or whether some other "By" lies in between -- let's call it condition (2).
With the above condition in mind:
if (2) is true            // B1 and E1 are in order
{
    if (S1.Push()) then E1++;
    else                  // overflow
    {
        start pushing at the end of E2 or E3 (whichever has space)
        and update E1 to be E2-- or E3--;
    }
}
if (2) is false
{
    if (S1.Push()) then E1--;
    else                  // overflow
    {
        start pushing at the end of E2 or E3 (whichever has space)
        and update E1 to be E2-- or E3--;
    }
}

Assume you only have integer indexes, the stacks are used as FILO (First In, Last Out) without referencing individual elements, and only an array is used as the data store. Using 6 of its slots as stack bookkeeping should help:
[head-1, last-1, head-2, last-2, head-3, last-3, data, data, ... ,data]
You can get by with just 4 slots, because head-1 = 0 and last-3 = the array length. If the stacks are used as FIFO (First In, First Out) you need to re-index.
nb: I’m working on improving my English.

first stack grows at 3n,
second stack grows at 3n+1,
third grows at 3n+2
for n={0...N}

Yet another approach (in addition to the linked list) is to use a map of stacks. In that case you have to spend an extra log(3^n)/log(2) bits to build a map of how the data is distributed across your n-length array. Each 3-valued digit of the map says which stack owns the next element.
Ex. a.push(1); b.push(2); c.push(3); a.push(4); a.push(5); gives you the picture
aacba
54321
The appropriate value of the map is calculated as elements are pushed onto a stack (shifting the contents of the array as needed):
map0 = any
map1 = map0*3 + 0
map2 = map1*3 + 1
map3 = map2*3 + 2
map4 = map3*3 + 0
map5 = map4*3 + 0 = any*3^5 + 45
and the lengths of the stacks are 3, 1, 1.
Once you want to do c.pop() you have to reorganize your elements by finding the actual position of c.top() in the original array, walking through the cell map (i.e. dividing by 3 while the value mod 3 isn't 2), and then shifting the array contents back to cover that hole. While walking through the cell map you have to store all the positions you have passed (mapX); after passing the digit that points to stack "c" you divide by 3 one more time, multiply by 3^(number of positions passed - 1) and add mapX to get the new value of the cell map.
The overhead of this is fixed and depends on the size of a stack element (bits_per_value):
(log(3^n)/log(2)) / (n*log(bits_per_value)/log(2)) = log(3^n) / (n*log(bits_per_value)) = log(3) / log(bits_per_value)
So for bits_per_value = 32 it is a 31.7% space overhead, and with growing bits_per_value it decays (e.g. for 64 bits it is 26.4%).
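For what it's worth, a quick check of those overhead figures using the formula above:
import math

def map_overhead(bits_per_value):
    # log(3) / log(bits_per_value), as derived above
    return math.log(3) / math.log(bits_per_value)

print(round(map_overhead(32) * 100, 1))   # 31.7
print(round(map_overhead(64) * 100, 1))   # 26.4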

In this approach, any stack can grow as long as there is any free space in the array.
We sequentially allocate space to the stacks and we link new blocks to the previous block. This means any new element in a stack keeps a pointer to the previous top element of that particular stack.
int stackSize = 300;
int indexUsed = 0;                    // next never-used slot in buffer
int freeTop = -1;                     // head of the free list of recycled slots
int[] stackPointer = {-1, -1, -1};    // index of the top node of each stack
StackNode[] buffer = new StackNode[stackSize * 3];

void push(int stackNum, int value) {
    int index;
    if (freeTop != -1) {              // reuse a slot released by an earlier pop
        index = freeTop;
        freeTop = buffer[index].previous;
    } else if (indexUsed < buffer.length) {
        index = indexUsed++;
    } else {
        throw new RuntimeException("no space left in the array");
    }
    buffer[index] = new StackNode(stackPointer[stackNum], value);
    stackPointer[stackNum] = index;
}

int pop(int stackNum) {
    int index = stackPointer[stackNum];
    int value = buffer[index].value;
    stackPointer[stackNum] = buffer[index].previous;
    buffer[index].previous = freeTop; // recycle this slot onto the free list
    freeTop = index;
    return value;
}

int peek(int stack) { return buffer[stackPointer[stack]].value; }

boolean isEmpty(int stackNum) { return stackPointer[stackNum] == -1; }

class StackNode {
    public int previous;              // index of the node below this one in the same stack
    public int value;

    public StackNode(int p, int v) {
        value = v;
        previous = p;
    }
}

This code implements 3 stacks in a single array. It keeps track of freed slots and reuses the empty spaces in between the data.
#include <stdio.h>

#define MAX_NODES 50

struct stacknode {
    int value;
    int prev;
};

struct stacknode stacklist[MAX_NODES];
int top[3] = {-1, -1, -1};
int freelist[MAX_NODES];
int stackindex = 0;    /* next never-used slot in stacklist */
int freeindex = -1;    /* top of the stack of recycled slots */

void push(int stackno, int value) {
    int index;
    if (freeindex >= 0) {
        /* reuse a slot freed by an earlier pop */
        index = freelist[freeindex];
        freeindex--;
    } else if (stackindex < MAX_NODES) {
        index = stackindex;
        stackindex++;
    } else {
        printf("No space left in the array\n");
        return;
    }
    stacklist[index].value = value;
    if (top[stackno-1] != -1) {
        stacklist[index].prev = top[stackno-1];
    } else {
        stacklist[index].prev = -1;
    }
    top[stackno-1] = index;
    printf("%d is pushed in stack %d at %d\n", value, stackno, index);
}

int pop(int stackno) {
    int index, value;
    if (top[stackno-1] == -1) {
        printf("No elements in stack %d\n", stackno);
        return -1;
    }
    index = top[stackno-1];
    freeindex++;
    freelist[freeindex] = index;
    value = stacklist[index].value;
    top[stackno-1] = stacklist[index].prev;
    printf("%d is popped out from stack %d at %d\n", value, stackno, index);
    return value;
}

int main() {
    push(1, 1);
    push(1, 2);
    push(3, 3);
    push(2, 4);
    pop(3);
    pop(3);
    push(3, 3);
    push(2, 3);
    return 0;
}

Another solution, in Python. Please let me know whether it works as you expect.
class Stack(object):
    def __init__(self):
        self.stack = list()
        self.first_length = 0
        self.second_length = 0
        self.third_length = 0
        # first_pointer == first_length: stack 1 occupies the front of the list,
        # stack 2 follows it, and stack 3 sits behind both (top at second_pointer - 1).
        self.first_pointer = 0
        self.second_pointer = 1

    def push(self, stack_num, item):
        if stack_num == 1:
            self.first_pointer += 1
            self.second_pointer += 1
            self.first_length += 1
            self.stack.insert(0, item)
        elif stack_num == 2:
            self.second_length += 1
            self.second_pointer += 1
            self.stack.insert(self.first_pointer, item)
        elif stack_num == 3:
            self.third_length += 1
            self.stack.insert(self.second_pointer - 1, item)
        else:
            raise Exception('Push failed, stack number %d is not allowed' % stack_num)

    def pop(self, stack_num):
        if stack_num == 1:
            if self.first_length == 0:
                raise Exception('No more elements in first stack')
            self.first_pointer -= 1
            self.first_length -= 1
            self.second_pointer -= 1
            return self.stack.pop(0)
        elif stack_num == 2:
            if self.second_length == 0:
                raise Exception('No more elements in second stack')
            self.second_length -= 1
            self.second_pointer -= 1
            return self.stack.pop(self.first_pointer)
        elif stack_num == 3:
            if self.third_length == 0:
                raise Exception('No more elements in third stack')
            self.third_length -= 1
            return self.stack.pop(self.second_pointer - 1)
        else:
            raise Exception('Pop failed, stack number %d is not allowed' % stack_num)

    def peek(self, stack_num):
        if stack_num == 1:
            return self.stack[0]
        elif stack_num == 2:
            return self.stack[self.first_pointer]
        elif stack_num == 3:
            return self.stack[self.second_pointer - 1]
        else:
            raise Exception('Peek failed, stack number %d is not allowed' % stack_num)

    def size(self):
        return len(self.stack)


s = Stack()

# push items onto stack 1
s.push(1, '1st_stack_1')
s.push(1, '2nd_stack_1')
s.push(1, '3rd_stack_1')

# push items onto stack 2
s.push(2, 'first_stack_2')
s.push(2, 'second_stack_2')
s.push(2, 'third_stack_2')

# push items onto stack 3
s.push(3, 'FIRST_stack_3')
s.push(3, 'SECOND_stack_3')
s.push(3, 'THIRD_stack_3')

print('Before pop out: ')
for i, elm in enumerate(s.stack):
    print('\t\t%d)' % i, elm)

s.pop(1)
s.pop(1)
# s.pop(1)
s.pop(2)
s.pop(2)
# s.pop(2)
# s.pop(3)
s.pop(3)
s.pop(3)
# s.pop(3)

print('After pop out: ')
for i, elm in enumerate(s.stack):
    print('\t\t%d)' % i, elm)

Maybe this can help you a bit... I wrote it myself :)
// by ashakiran bhatter
// compile: g++ -std=c++11 test.cpp
// run : ./a.out
// sample output as below
// adding: 1 2 3 4 5 6 7 8 9
// array contents: 9 8 7 6 5 4 3 2 1
// popping now...
// array contents: 8 7 6 5 4 3 2 1
#include <iostream>
#include <cstdint>
#define MAX_LEN 9
#define LOWER 0
#define UPPER 1
#define FULL -1
#define NOT_SET -1
class CStack
{
private:
    int8_t array[MAX_LEN];
    int8_t stack1_range[2];
    int8_t stack2_range[2];
    int8_t stack3_range[2];
    int8_t stack1_size;
    int8_t stack2_size;
    int8_t stack3_size;
    int8_t stack1_cursize;
    int8_t stack2_cursize;
    int8_t stack3_cursize;
    int8_t stack1_curpos;
    int8_t stack2_curpos;
    int8_t stack3_curpos;

public:
    CStack();
    ~CStack();
    void push(int8_t data);
    void pop();
    void print();
};

CStack::CStack()
{
    stack1_range[LOWER] = 0;
    stack1_range[UPPER] = MAX_LEN/3 - 1;
    stack2_range[LOWER] = MAX_LEN/3;
    stack2_range[UPPER] = (2 * (MAX_LEN/3)) - 1;
    stack3_range[LOWER] = 2 * (MAX_LEN/3);
    stack3_range[UPPER] = MAX_LEN - 1;

    stack1_size = stack1_range[UPPER] - stack1_range[LOWER];
    stack2_size = stack2_range[UPPER] - stack2_range[LOWER];
    stack3_size = stack3_range[UPPER] - stack3_range[LOWER];

    stack1_cursize = stack1_size;
    stack2_cursize = stack2_size;
    stack3_cursize = stack3_size;

    stack1_curpos = stack1_cursize;
    stack2_curpos = stack2_cursize;
    stack3_curpos = stack3_cursize;
}

CStack::~CStack()
{
}

void CStack::push(int8_t data)
{
    if(stack3_cursize != FULL)
    {
        array[stack3_range[LOWER] + stack3_curpos--] = data;
        stack3_cursize--;
    } else if(stack2_cursize != FULL) {
        array[stack2_range[LOWER] + stack2_curpos--] = data;
        stack2_cursize--;
    } else if(stack1_cursize != FULL) {
        array[stack1_range[LOWER] + stack1_curpos--] = data;
        stack1_cursize--;
    } else {
        std::cout<<"\tstack is full...!"<<std::endl;
    }
}

void CStack::pop()
{
    std::cout<<"popping now..."<<std::endl;
    if(stack1_cursize < stack1_size)
    {
        array[stack1_range[LOWER] + ++stack1_curpos] = 0;
        stack1_cursize++;
    } else if(stack2_cursize < stack2_size) {
        array[stack2_range[LOWER] + ++stack2_curpos] = 0;
        stack2_cursize++;
    } else if(stack3_cursize < stack3_size) {
        array[stack3_range[LOWER] + ++stack3_curpos] = 0;
        stack3_cursize++;
    } else {
        std::cout<<"\tstack is empty...!"<<std::endl;
    }
}

void CStack::print()
{
    std::cout<<"array contents: ";
    for(int8_t i = stack1_range[LOWER] + stack1_curpos + 1; i <= stack1_range[UPPER]; i++)
        std::cout<<" "<<static_cast<int>(array[i]);
    for(int8_t i = stack2_range[LOWER] + stack2_curpos + 1; i <= stack2_range[UPPER]; i++)
        std::cout<<" "<<static_cast<int>(array[i]);
    for(int8_t i = stack3_range[LOWER] + stack3_curpos + 1; i <= stack3_range[UPPER]; i++)
        std::cout<<" "<<static_cast<int>(array[i]);
    std::cout<<"\n";
}

int main()
{
    CStack stack;
    std::cout<<"adding: ";
    for(uint8_t i = 1; i < 10; i++)
    {
        std::cout<<" "<<static_cast<int>(i);
        stack.push(i);
    }
    std::cout<<"\n";
    stack.print();
    stack.pop();
    stack.print();
    return 0;
}

Perhaps you can implement N stacks or queues in a single array. My definition of "using a single array" is that we use one array to store all the data of all the stacks and queues; however, we can use N other arrays to keep track of the indices of the elements of each particular stack or queue.
Solution:
Store data sequentially in the array at the time of insertion into any of the stacks or queues, and store its index in the index-keeping array of that particular stack or queue.
For example: you have 3 stacks (s1, s2, s3) and you want to implement them using a single array (dataArray[]). Hence we make 3 other arrays (a1[], a2[], a3[]) for s1, s2 and s3 respectively, which keep track of all of their elements in dataArray[] by saving their respective indices.
insert(s1, 10) at dataArray[0]   a1[0] = 0;
insert(s2, 20) at dataArray[1]   a2[0] = 1;
insert(s3, 30) at dataArray[2]   a3[0] = 2;
insert(s1, 40) at dataArray[3]   a1[1] = 3;
insert(s3, 50) at dataArray[4]   a3[1] = 4;
insert(s3, 60) at dataArray[5]   a3[2] = 5;
insert(s2, 30) at dataArray[6]   a2[1] = 6;
and so on ...
Now we perform operations in dataArray[] by using a1, a2 and a3 for the respective stacks and queues.
To pop an element from s1:
return a1[0]
shift all elements to the left
Take a similar approach for the other operations, and you can implement any number of stacks and queues in a single array.

Related

read CSV file in Cplex

My question is related to my previous question; I need to make some changes to my code. I have nodes numbered 1 to 100 in a CSV file. I create another CSV file and generate 20 random numbers among the 100 nodes, calling them demand points. Each demand point has a specific demand, which is a randomly generated number between 1 and 10. I want to read these demand points (their indexes) and their weights. That is the first part of my question: how can I read this?
After that, I need the distance between each of these demand points and all nodes. I don't know how I can read just the indexes of the demand points and calculate the distance between them and all the nodes.
Based on the code that I provided, I need the indexes of the demand points in a lot of places. My main problem is that I don't know how I should get these indexes into CPLEX from the CSV file.
The demand points data looks like this: the first column is the demand point index and the second column is its demand; the file has 200 rows.
I tried this code for reading the demand points:
tuple demands
{
int demandpoint;
int weight;
}
{demands} demand={};
execute
{
var f=new IloOplInputFile("weight.csv");
while (!f.eof)
{
var data = f.readline().split(",");
if (ar.length==2)
demand.add(Opl.intValue(ar[0]),Opl.intValue(ar[1]));
}
f.close();
}
execute
{
writeln(demand);
}
but it doesn't work. Here is my full model so far:
int n=100;
int p=5;
tuple demands
{
int demandpointindex;
int weight;
}
{demands} demand={};
execute
{
var f=new IloOplInputFile("weight.csv");
while (!f.eof)
{
var data = f.readline().split(",");
if (ar.length==2)
demand.add(Opl.intValue(ar[0]),Opl.intValue(ar[1]));
}
f.close();
}
execute
{
writeln(demand);
}
float d[demandpointindexes][facilities];
execute {
var f = new IloOplInputFile("test1.csv");
while (!f.eof) {
var data = f.readline().split(",");
if (data.length == 3)
d[Opl.intValue(data[0])][Opl.intValue(data[1])] = Opl.floatValue(data[2]);
}
writeln(d);
}
dvar boolean x[demandpointindexe][facilities];
...
I hope I got your explanation right. Assume you have a file weight.csv like this:
1,2,
3,7,
4,9,
Here the first item in each row is the index of a demand point, the second item is its weight. Then you can parse this as before using this scripting block:
tuple demandpoint {
int index;
int weight;
}
{demandpoint} demand={};
execute {
var f = new IloOplInputFile("weight.csv");
while (!f.eof) {
var data = f.readline().split(",");
if (data.length == 3)
demand.add(Opl.intValue(data[0]), Opl.intValue(data[1]));
}
writeln(demand);
}
Next you can create a set that contains the indices of all the demand points:
{int} demandpoints = { d.index | d in demand };
Assume file test1.csv looks like this
1,1,0,
1,2,5,
1,3,6,
1,4,7,
3,1,1,
3,2,1.5,
3,3,0,
3,4,3.5,
4,1,1,
4,2,1.5,
4,3,1.7,
4,4,0,
Here the first item is a demand point index, the second item is a facility index and the third item is the distance between the first and second items. Note that there are no lines that start with 2, since there is no demand point with index 2 in weight.csv. Also note that I assume only 4 facilities here (to keep the file short). You can read the distance between demand points and facilities as follows:
range facilities = 1..4;
float d[demandpoints][facilities];
execute {
var f = new IloOplInputFile("test1.csv");
while (!f.eof) {
var data = f.readline().split(",");
if (data.length == 4)
d[Opl.intValue(data[0])][Opl.intValue(data[1])] = Opl.floatValue(data[2]);
}
writeln(d);
}
The full script (including a dummy objective and constraints so that it can be run) looks like this:
tuple demandpoint {
int index;
int weight;
}
{demandpoint} demand={};
execute {
var f = new IloOplInputFile("weight.csv");
while (!f.eof) {
var data = f.readline().split(",");
if (data.length == 3)
demand.add(Opl.intValue(data[0]), Opl.intValue(data[1]));
}
writeln(demand);
}
// Create a set that contains all the indices of demand points
// as read from weight.csv
{int} demandpoints = { d.index | d in demand };
range facilities = 1..4;
float d[demandpoints][facilities];
execute {
var f = new IloOplInputFile("test1.csv");
while (!f.eof) {
var data = f.readline().split(",");
if (data.length == 4)
d[Opl.intValue(data[0])][Opl.intValue(data[1])] = Opl.floatValue(data[2]);
}
writeln(d);
}
minimize 0;
subject to {}
It prints
{<1 2> <3 7> <4 9>}
[[0 5 6 7]
[1 1.5 0 3.5]
[1 1.5 1.7 0]]
Be careful about how many commas you have in your csv! The code posted above assumes that each line ends with a comma. That is, each line has as many commas as it has fields. If the last field is not terminated by a comma then you have to adapt the parser.
If you have in test1.csv the distance between all the nodes then it makes sense to first read the data into an array float distance[facilities][facilities]; and then define the array d based on that as
float d[i in demandpoints][j in facilities] = distance[i][j];
Update for the more detailed specification you gave in the comments:
In order to handle the test1.csv you explained in the comments you could define a new tuple:
tuple Distance {
int demandpoint;
int facility;
float distance;
}
{Distance} distances = {};
and read/parse this exactly as you did parse the weight.csv file (with one additional field, of course).
Then you can create the distance matrix like so:
float d[i in I][j in J] = sum (dist in distances : dist.demandpoint == i && dist.facility == j) dist.distance;
Here I and J are the sets or ranges of demand points and facilities, respectively. See above for how you can get a set of all demand points defined in the tuple set. The created matrix will have an entry for each demand point/facility pair. The trick in the definition of d is that there are two cases:
If a pair (i,j) is defined in test1.csv then the sum will match exactly one element in distances: the one that defines the distance between two points.
If a pair (i,j) is not defined in test1.csv then the sum will not match anything and the corresponding entry in the distance matrix will thus be 0.

Count the number of ones by position (place) in an array of 32-bit binary numbers

An array (of size 10^5) of 32-bit binary numbers is given; we're required to count the number of ones at every bit position across those numbers.
For example:
Array : {10101,1011,1010,1}
Counts : {1's place: 3, 2's place: 2, 3's place: 1, 4's place: 2, 5's place: 1}
No bit manipulation technique seems to satisfy the constraints to me.
Well, this should be solvable with two loops: one going over the array, the other one masking the right bits. The running time should not be too bad for your constraints.
Here is a Rust implementation (written off the top of my head, not thoroughly tested):
fn main() {
let mut v = vec!();
for i in 1..50*1000 {
v.push(i);
}
let r = bitcount_arr(v);
r.iter().enumerate().for_each( |(i,x)| print!("index {}:{} ",i+1,x));
}
fn bitcount_arr(input:Vec<u32>) -> [u32;32] {
let mut res = [0;32];
for num in input {
for i in 0..32 {
let mask = 1 << i;
if num & mask != 0 {
res[i] += 1;
}
}
}
res
}
This can be done with transposed addition, though the array is a bit long for it.
To transpose addition, use an array of counters, but instead of using one counter for every position we'll use one counter for every bit of the count. So a counter that tracks for each position whether the count is even/odd, a counter that tracks for each position whether the count has a 2 in it, etc.
To add an element of the array into this, only half-add operations (& to find the new carry, ^ to update) are needed, since it's only a conditional increment: (not tested)
uint32_t counters[17] = {0};   // one bit-sliced counter per bit of the count, zero-initialized
for (uint32_t elem : array) {
    uint32_t c = elem;
    for (int i = 0; i < 17; i++) {
        uint32_t nextcarry = counters[i] & c;   // half-add: carry where both bits are set
        counters[i] ^= c;                       // half-add: update this counter slice
        c = nextcarry;
    }
}
I chose 17 counters because log2(10^5) is just less than 17. So even if all bits are 1, the counters won't wrap.
To read off the result for bit k, take the k'th bit of every counter.
There are slightly more efficient ways that can add several elements of the array into the counters at once using some full-adds and duplicated counters.

Removal of every 'kth' person from a circle. Find the last remaining person

As part of a recent job application I was asked to code a solution to this problem.
Given,
n = number of people standing in a circle.
k = number of people to count over each time
Each person is given a unique (incrementing) id. Starting with the first person (the lowest id), they begin counting from 1 to k.
The person at k is then removed and the circle closes up. The next remaining person (following the eliminated person) resumes counting at 1. This process repeats until only one person is left, the winner.
The solution must provide:
the id of each person in the order they are removed from the circle
the id of the winner.
Performance constraints:
Use as little memory as possible.
Make the solution run as fast as possible.
I remembered doing something similar in my CS course from years ago but could not recall the details at the time of this test. I now realize it is a well known, classic problem with multiple solutions. (I will not mention it by name yet as some may just 'wikipedia' an answer).
I've already submitted my solution so I'm absolutely not looking for people to answer it for me. I will provide it a bit later once/if others have provided some answers.
My main goal for asking this question is to see how my solution compares to others given the requirements and constraints.
(Note the requirements carefully as I think they may invalidate some of the 'classic' solutions.)
Manuel Gonzalez noticed correctly that this is the general form of the famous Josephus problem.
If we are only interested in the survivor f(N,K) of a circle of size N and jumps of size K, then we can solve this with a very simple dynamic programming loop (In linear time and constant memory). Note that the ids start from 0:
int remaining(int n, int k) {
int r = 0;
for (int i = 2; i <= n; i++)
r = (r + k) % i;
return r;
}
It is based on the following recurrence relation:
f(N,K) = (f(N-1,K) + K) mod N
This relation can be explained by simulating the process of elimination, and after each elimination re-assigning new ids starting from 0. The old indices are the new ones with a circular shift of k positions. For a more detailed explanation of this formula, see http://blue.butler.edu/~phenders/InRoads/MathCounts8.pdf.
I know that the OP asks for all the indices of the eliminated items in their correct order. However, I believe that the above insight can be used for solving this as well.
You can do it using a boolean array.
Here is some pseudocode:
Let alive be a boolean array of size N. If alive[i] is true then the ith person is alive, otherwise dead. Initially it is true for every 1 <= i <= N.
Let numAlive be the number of persons alive. So numAlive = N at start.
i = 1 # Counting starts from 1st person.
count = 0;
# keep looping till we've more than 1 persons.
while numAlive > 1 do
if alive[i]
count++
end-if
# time to kill ?
if count == K
print Person i killed
numAlive --
alive[i] = false
count = 0
end-if
i = (i%N)+1 # Counting starts from next person.
end-while
# Find the only alive person who is the winner.
while alive[i] != true do
i = (i%N)+1
end-while
print Person i is the winner
The above solution is space efficient but not time efficient, as dead persons are still being visited.
To make it more time efficient you can use a circular linked list. Every time you kill a person you delete a node from the list. You continue till a single node is left in the list.
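As a runnable illustration of the elimination order (my own sketch, using Python's deque to play the role of the circular list rather than hand-rolled nodes):
from collections import deque

def josephus_order(n, k):
    """Yield the 1-based ids in the order people are removed; the last id yielded is the winner."""
    circle = deque(range(1, n + 1))
    while circle:
        circle.rotate(-(k - 1))    # skip over k-1 people
        yield circle.popleft()     # the k-th person is removed

order = list(josephus_order(10, 4))
print("eliminated:", order[:-1])
print("winner:", order[-1])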
The problem of determining the 'kth' person is called the Josephus Problem.
Armin Shams-Baragh from Ferdowsi University of Mashhad published some formulas for the Josephus Problem and the extended version of it.
The paper is available at: http://www.cs.man.ac.uk/~shamsbaa/Josephus.pdf
This is my solution, coded in C#. What could be improved?
public class Person
{
    public Person(int n)
    {
        Number = n;
    }

    public int Number { get; private set; }
}

static void Main(string[] args)
{
    int n = 10; int k = 4;
    var circle = new List<Person>();
    for (int i = 1; i <= n; i++)
    {
        circle.Add(new Person(i));
    }

    var index = 0;
    while (circle.Count > 1)
    {
        index = (index + k - 1) % circle.Count;
        var person = circle[index];
        circle.RemoveAt(index);
        Console.WriteLine("Removed {0}", person.Number);
    }

    Console.WriteLine("Winner is {0}", circle[0].Number);
    Console.ReadLine();
}
Essentially the same as Ash's answer, but with a custom linked list:
using System;
using System.Linq;
namespace Circle
{
class Program
{
static void Main(string[] args)
{
Circle(20, 3);
}
static void Circle(int k, int n)
{
// circle is a linked list representing the circle.
// Each element contains the index of the next member
// of the circle.
int[] circle = Enumerable.Range(1, k).ToArray();
circle[k - 1] = 0; // Member 0 follows member k-1
int prev = -1; // Used for tracking the previous member so we can delete a member from the list
int curr = 0; // The member we're currently inspecting
for (int i = 0; i < k; i++) // There are k members to remove from the circle
{
// Skip over n members
for (int j = 0; j < n; j++)
{
prev = curr;
curr = circle[curr];
}
Console.WriteLine(curr);
circle[prev] = circle[curr]; // Delete the nth member
curr = prev; // Start counting again from the previous member
}
}
}
}
Here is a solution in Clojure:
(ns kthperson.core
(:use clojure.set))
(defn get-winner-and-losers [number-of-people hops]
(loop [people (range 1 (inc number-of-people))
losers []
last-scan-start-index (dec hops)]
(if (= 1 (count people))
{:winner (first people) :losers losers}
(let [people-to-filter (subvec (vec people) last-scan-start-index)
additional-losers (take-nth hops people-to-filter)
remaining-people (difference (set people)
(set additional-losers))
new-losers (concat losers additional-losers)
index-of-last-removed-person (* hops (count additional-losers))]
(recur remaining-people
new-losers
(mod last-scan-start-index (count people-to-filter)))))))
Explanation:
start a loop, with a collection of people 1..n
if there is only one person left, they are the winner and we return their ID, as well as the IDs of the losers (in order of them losing)
we calculate additional losers in each loop/recur by grabbing every N people in the remaining list of potential winners
a new, shorter list of potential winners is determined by removing the additional losers from the previously-calculated potential winners.
rinse & repeat (using modulus to determine where in the list of remaining people to start counting the next time round)
This is a variant of the Josephus problem.
General solutions are described here.
Solutions in Perl, Ruby, and Python are provided here. A simple solution in C using a circular doubly-linked list to represent the ring of people is provided below. None of these solutions identify each person's position as they are removed, however.
#include <stdio.h>
#include <stdlib.h>
/* remove every k-th soldier from a circle of n */
#define n 40
#define k 3
struct man {
int pos;
struct man *next;
struct man *prev;
};
int main(int argc, char *argv[])
{
/* initialize the circle of n soldiers */
struct man *head = (struct man *) malloc(sizeof(struct man));
struct man *curr;
int i;
curr = head;
for (i = 1; i < n; ++i) {
curr->pos = i;
curr->next = (struct man *) malloc(sizeof(struct man));
curr->next->prev = curr;
curr = curr->next;
}
curr->pos = n;
curr->next = head;
curr->next->prev = curr;
/* remove every k-th */
while (curr->next != curr) {
for (i = 0; i < k; ++i) {
curr = curr->next;
}
curr->prev->next = curr->next;
curr->next->prev = curr->prev;
}
/* announce last person standing */
printf("Last person standing: #%d.\n", curr->pos);
return 0;
}
Here's my answer in C#, as submitted. Feel free to criticize, laugh at, ridicule etc ;)
public static IEnumerable<int> Move(int n, int k)
{
// Use an Iterator block to 'yield return' one item at a time.
int children = n;
int childrenToSkip = k - 1;
LinkedList<int> linkedList = new LinkedList<int>();
// Set up the linked list with children IDs
for (int i = 0; i < children; i++)
{
linkedList.AddLast(i);
}
LinkedListNode<int> currentNode = linkedList.First;
while (true)
{
// Skip over children by traversing forward
for (int skipped = 0; skipped < childrenToSkip; skipped++)
{
currentNode = currentNode.Next;
if (currentNode == null) currentNode = linkedList.First;
}
// Store the next node of the node to be removed.
LinkedListNode<int> nextNode = currentNode.Next;
// Return ID of the removed child to caller
yield return currentNode.Value;
linkedList.Remove(currentNode);
// Start again from the next node
currentNode = nextNode;
if (currentNode== null) currentNode = linkedList.First;
// Only one node left, the winner
if (linkedList.Count == 1) break;
}
// Finally return the ID of the winner
yield return currentNode.Value;
}

Cumulative array summation using OpenCL

I'm calculating the Euclidean distance between n-dimensional points using OpenCL. I get two lists of n-dimensional points and I should return an array that contains just the distances from every point in the first table to every point in the second table.
My approach is to do the regular double loop (for every point in Table1 { for every point in Table2 { ... } }) and then do the calculation for every pair of points in parallel.
The Euclidean distance is then split into parts:
1. take the difference between each dimension of the two points
2. square that difference (still for every dimension)
3. sum all the values obtained in 2.
4. take the square root of the value obtained in 3. (this step has been omitted in this example.)
Everything works like a charm until I try to accumulate the sum of all differences (namely, executing step 3. of the procedure described above, line 49 of the code below).
As test data I'm using DescriptorLists with 2 points each:
DescriptorList1: 001,002,003,...,127,128; (p1)
129,130,131,...,255,256; (p2)
DescriptorList2: 000,001,002,...,126,127; (p1)
128,129,130,...,254,255; (p2)
So the resulting vector should have the values: 128, 2064512, 2130048, 128
Right now I'm getting random numbers that vary with every run.
I appreciate any help or leads on what I'm doing wrong. Hopefully everything is clear about the scenario I'm working in.
#define BLOCK_SIZE 128
typedef struct
{
//How large each point is
int length;
//How many points in every list
int num_elements;
//Pointer to the elements of the descriptor (stored as a raw array)
__global float *elements;
} DescriptorList;
__kernel void CompareDescriptors_deb(__global float *C, DescriptorList A, DescriptorList B, int elements, __local float As[BLOCK_SIZE])
{
int gpidA = get_global_id(0);
int featA = get_local_id(0);
//temporary array to store the difference between each dimension of 2 points
float dif_acum[BLOCK_SIZE];
//counter to track the iterations of the inner loop
int loop = 0;
//loop over all descriptors in A
for (int i = 0; i < A.num_elements/BLOCK_SIZE; i++){
//take the i-th descriptor. Returns a DescriptorList with just the i-th
//descriptor in DescriptorList A
DescriptorList tmpA = GetDescriptor(A, i);
//copy the current descriptor to local memory.
//returns one element of the only descriptor in DescriptorList tmpA
//and index featA
As[featA] = GetElement(tmpA, 0, featA);
//wait for all the threads to finish copying before continuing
barrier(CLK_LOCAL_MEM_FENCE);
//loop over all the descriptors in B
for (int k = 0; k < B.num_elements/BLOCK_SIZE; k++){
//take the difference of both current points
dif_acum[featA] = As[featA]-B.elements[k*BLOCK_SIZE + featA];
//wait again
barrier(CLK_LOCAL_MEM_FENCE);
//square value of the difference in dif_acum and store in C
//which is where the results should be stored at the end.
C[loop] = 0;
C[loop] += dif_acum[featA]*dif_acum[featA];
loop += 1;
barrier(CLK_LOCAL_MEM_FENCE);
}
}
}
Your problem lies in these lines of code:
C[loop] = 0;
C[loop] += dif_acum[featA]*dif_acum[featA];
All threads in your workgroup (well, actually all your threads, but let's come to that later) are trying to modify this array position concurrently, without any synchronization whatsoever. Several factors make this really problematic:
The workgroup is not guaranteed to work completely in parallel, meaning that for some threads C[loop] = 0 can be called after other threads have already executed the next line.
Those that execute in parallel all read the same value from C[loop], modify it with their increment and try to write back to the same address. I'm not completely sure what the result of that write-back is (I think one of the threads succeeds in writing back, while the others fail, but I'm not completely sure), but it's wrong either way.
Now let's fix this:
While we might be able to get this to work on global memory using atomics, it won't be fast, so let's accumulate in local memory:
local float* accum;
...
accum[featA] = dif_acum[featA]*dif_acum[featA];
barrier(CLK_LOCAL_MEM_FENCE);
for(unsigned int i = 1; i < BLOCK_SIZE; i *= 2)
{
if ((featA % (2*i)) == 0)
accum[featA] += accum[featA + i];
barrier(CLK_LOCAL_MEM_FENCE);
}
if(featA == 0)
C[loop] = accum[0];
Of course you can reuse other local buffers for this, but I think the point is clear (btw: are you sure that dif_acum will be created in local memory? I think I read somewhere that it wouldn't be put in local memory, which would make preloading A into local memory kind of pointless).
Some other points about this code:
Your code seems to be geared to using only one workgroup (you aren't using either the group id or the global id to see which items to work on); for optimal performance you might want to use more than that.
Might be personal preference, but to me it seems better to use get_local_size(0) for the workgroup size than to use a #define (since you might change it in the host code without realizing you should have changed your OpenCL code too).
The barriers in your code are all unnecessary, since no thread accesses an element in local memory which is written by another thread. Therefore you don't need to use local memory for this.
Considering the last bullet you could simply do:
float As = GetElement(tmpA, 0, featA);
...
float dif_acum = As-B.elements[k*BLOCK_SIZE + featA];
This would make the code (not considering the first two bullets):
__kernel void CompareDescriptors_deb(__global float *C, DescriptorList A, DescriptorList B, int elements, __local float accum[BLOCK_SIZE])
{
int gpidA = get_global_id(0);
int featA = get_local_id(0);
int loop = 0;
for (int i = 0; i < A.num_elements/BLOCK_SIZE; i++){
DescriptorList tmpA = GetDescriptor(A, i);
float As = GetElement(tmpA, 0, featA);
for (int k = 0; k < B.num_elements/BLOCK_SIZE; k++){
float dif_acum = As-B.elements[k*BLOCK_SIZE + featA];
accum[featA] = dif_acum*dif_acum;
barrier(CLK_LOCAL_MEM_FENCE);
for(unsigned int i = 1; i < BLOCK_SIZE; i *= 2)
{
if ((featA % (2*i)) == 0)
accum[featA] += accum[featA + i];
barrier(CLK_LOCAL_MEM_FENCE);
}
if(featA == 0)
C[loop] = accum[0];
barrier(CLK_LOCAL_MEM_FENCE);
loop += 1;
}
}
}
Thanks to Grizzly, I now have a working kernel. Some things I needed to modify based on Grizzly's answer:
I added an IF statement at the beginning of the routine to discard all threads that won't reference any valid position in the arrays I'm using.
if(featA > BLOCK_SIZE){return;}
When copying the first descriptor to local (shared) memory (i.e. to Bs), the index has to be specified, since the function GetElement returns just one element per call (I skipped that in my question).
Bs[featA] = GetElement(tmpA, 0, featA);
Then, the SCAN loop needed a little tweaking, because the buffer is overwritten after each iteration and one cannot control which thread accesses the data first. That is why I'm 'recycling' the dif_acum buffer to store partial results and, that way, prevent inconsistencies throughout that loop.
dif_acum[featA] = accum[featA];
There is also some boundary control in the SCAN loop to reliably determine the terms to be added together.
if (featA >= j && next_addend >= 0 && next_addend < BLOCK_SIZE){
Lastly, I thought it made sense to include the loop variable increment within the last IF statement so that only one thread modifies it.
if(featA == 0){
    C[loop] = accum[BLOCK_SIZE-1];
    loop += 1;
}
That's it. I still wonder how I can make use of the group size to eliminate that BLOCK_SIZE definition, and whether there are better policies I can adopt regarding thread usage.
So the code finally looks like this:
__kernel void CompareDescriptors(__global float *C, DescriptorList A, DescriptorList B, int elements, __local float accum[BLOCK_SIZE], __local float Bs[BLOCK_SIZE])
{
int gpidA = get_global_id(0);
int featA = get_local_id(0);
//global counter to store final differences
int loop = 0;
//auxiliary buffer to store temporary data
local float dif_acum[BLOCK_SIZE];
//discard the threads that are not going to be used.
if(featA > BLOCK_SIZE){
return;
}
//loop over all descriptors in A
for (int i = 0; i < A.num_elements/BLOCK_SIZE; i++){
//take the gpidA-th descriptor
DescriptorList tmpA = GetDescriptor(A, i);
//copy the current descriptor to local memory
Bs[featA] = GetElement(tmpA, 0, featA);
//loop over all the descriptors in B
for (int k = 0; k < B.num_elements/BLOCK_SIZE; k++){
//take the difference of both current descriptors
dif_acum[featA] = Bs[featA]-B.elements[k*BLOCK_SIZE + featA];
//square the values in dif_acum
accum[featA] = dif_acum[featA]*dif_acum[featA];
barrier(CLK_LOCAL_MEM_FENCE);
//copy the values of accum to keep consistency once the scan procedure starts. Mostly important for the first element. Two buffers are necesarry because the scan procedure would override values that are then further read if one buffer is being used instead.
dif_acum[featA] = accum[featA];
//Compute the accumulated sum (a.k.a. scan)
for(int j = 1; j < BLOCK_SIZE; j *= 2){
int next_addend = featA-(j/2);
if (featA >= j && next_addend >= 0 && next_addend < BLOCK_SIZE){
dif_acum[featA] = accum[featA] + accum[next_addend];
}
barrier(CLK_LOCAL_MEM_FENCE);
//copy As to accum
accum[featA] = GetElementArray(dif_acum, BLOCK_SIZE, featA);
barrier(CLK_LOCAL_MEM_FENCE);
}
//tell one of the threads to write the result of the scan in the array containing the results.
if(featA == 0){
C[loop] = accum[BLOCK_SIZE-1];
loop += 1;
}
barrier(CLK_LOCAL_MEM_FENCE);
}
}
}

Function to determine the number of unordered combinations with non-unique choices

I'm trying to work out the function that gives the number of unordered combinations with non-unique choices.
Given:
n = number of unique symbols to select from
r = number of choices
Example... for n=3, r=3, the result would be: (edit: added missing values pointed out by Dav)
000
001
002
011
012
022
111
112
122
222
I know the formula for combinations (unordered, unique selections), but I can't figure out how allowing repetition increases the set.
In C++ given the following routine:
template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
/* Credits: Mark Nelson http://marknelson.us */
if ((first == last) || (first == k) || (last == k))
return false;
Iterator i1 = first;
Iterator i2 = last;
++i1;
if (last == i1)
return false;
i1 = last;
--i1;
i1 = k;
--i2;
while (first != i1)
{
if (*--i1 < *i2)
{
Iterator j = k;
while (!(*i1 < *j)) ++j;
std::iter_swap(i1,j);
++i1;
++j;
i2 = k;
std::rotate(i1,j,last);
while (last != j)
{
++j;
++i2;
}
std::rotate(k,i2,last);
return true;
}
}
std::rotate(first,k,last);
return false;
}
You can then proceed to do the following:
std::string s = "12345";
std::size_t r = 3;
do
{
std::cout << std::string(s.begin(),s.begin() + r) << std::endl;
}
while(next_combination(s.begin(), s.begin() + r, s.end()));
If you have N unique symbols and want to select a combination of length R, then you are essentially putting N-1 dividers into R+1 "slots" between the cumulative totals of symbols selected.
0 [C] 1 [C] 2 [C] 3
The C's are choices; the numbers are the cumulative count of choices made so far. You're essentially placing a divider for each possible thing you could choose at the point where you "start" choosing that thing (it's assumed that you start by choosing a particular thing before any dividers are placed, hence the -1 in the N-1 dividers).
If you place all of the dividers at spot 0, then you chose the final thing for all of the choices. If you place all of the dividers at spot 3, then you chose the initial thing for all of the choices. In general, if you place the ith divider at spot k, then you chose thing i+1 for all of the choices that come between that spot and the spot of the next divider.
Since we're trying to put N-1 non-unique items (the dividers are non-unique, they're just dividers) around R slots, we really just want to permute N-1 1's and R 0's, which is effectively
(N+R-1) choose (N-1) = (N+R-1)! / ((N-1)! R!).
Thus the final formula is (N+R-1)! / ((N-1)! R!) for the number of unordered combinations with non-unique selection of items.
Note that this evaluates to 10 for N=3, R=3, which matches your result, after you add the missing options that I pointed out in the comments above.
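As a quick sanity check of the formula (my own snippet), both the closed form and a brute-force enumeration give 10 for N = 3, R = 3:
import math
from itertools import combinations_with_replacement

def multiset_count(n, r):
    # (N+R-1)! / ((N-1)! R!) = C(N+R-1, N-1)
    return math.comb(n + r - 1, n - 1)

print(multiset_count(3, 3))                                # 10
print(len(list(combinations_with_replacement('012', 3))))  # 10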