Space complexity confusion - language-agnostic

I'm a bit confused about analyzing space complexity in general. I'm not sure what "extra space taken up by the algorithm" means. What counts as space of 1?
In the example here:
int findMin(int[] x) {
    int k = 0; int n = x.length;
    for (int i = 1; i < n; i++) {
        if (x[i] < x[k]) {
            k = i;
        }
    }
    return k;
}
The space complexity is O(n), and I'm guessing it's due to an array size of n.
But something like heapsort takes O(1). Wouldn't an in-place heapsort also need an array of size n (where n is the size of the input)? Or are we assuming the input is already in an array? Why is heapsort's space complexity O(1)?
Thanks!

Heapsort requires only a constant amount of auxiliary storage, hence O(1). The space used by the input to be sorted is of course O(n).
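To make the "constant auxiliary storage" point concrete, here is a minimal C++ sketch of an in-place heapsort (the function names `siftDown` and `heapSort` are just illustrative). Apart from the input array itself, it only uses a handful of index variables, and the sift-down step is written as a loop so the call stack does not grow either.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restore the max-heap property for the subtree rooted at `root`,
// looking only at the first `size` elements. Written as a loop, so it
// needs no extra stack frames.
void siftDown(std::vector<int>& a, std::size_t root, std::size_t size) {
    while (true) {
        std::size_t largest = root;
        std::size_t left = 2 * root + 1;
        std::size_t right = left + 1;
        if (left < size && a[left] > a[largest]) largest = left;
        if (right < size && a[right] > a[largest]) largest = right;
        if (largest == root) return;
        std::swap(a[root], a[largest]);
        root = largest;
    }
}

// Sorts `a` in ascending order inside the input array itself.
// Auxiliary storage: a few indices, i.e. O(1).
void heapSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    // Build a max-heap bottom-up.
    for (std::size_t i = n / 2; i-- > 0; ) siftDown(a, i, n);
    // Repeatedly move the current maximum to the end of the unsorted part.
    for (std::size_t end = n; end > 1; --end) {
        std::swap(a[0], a[end - 1]);
        siftDown(a, 0, end - 1);
    }
}
```

The array of size n is the input, so it is not counted as extra space; only the local variables are.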

Extra space means the memory an algorithm uses other than its input. Besides any helper variables or arrays, this includes stack space: if recursion is present in the algorithm, it will use the call stack to store the state of each pending call until the termination condition is reached.
The size of the stack will be O(height of the recursion tree).
Hope this is helpful!!
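As a rough illustration of the stack-space point, the C++ sketch below sums an array once recursively and once iteratively. The `depth` and `maxDepth` parameters are hypothetical bookkeeping added only to make the recursion depth observable; they are not part of any standard technique.

```cpp
#include <cstddef>
#include <vector>

// Recursive sum. Every pending call keeps a stack frame alive, so the
// auxiliary space is proportional to the recursion depth, O(n) here.
// `maxDepth` records the deepest call reached (illustration only).
long long sumRecursive(const std::vector<int>& x, std::size_t i,
                       std::size_t depth, std::size_t& maxDepth) {
    if (depth > maxDepth) maxDepth = depth;
    if (i == x.size()) return 0;
    return x[i] + sumRecursive(x, i + 1, depth + 1, maxDepth);
}

// Iterative sum. Only one accumulator is used, so the auxiliary
// space is O(1) regardless of the input length.
long long sumIterative(const std::vector<int>& x) {
    long long total = 0;
    for (int v : x) total += v;
    return total;
}
```

Both functions read the same O(n) input, but only the recursive one needs extra memory that grows with n.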

How to look for factorization of one integer in linear sieve algorithm without divisions?

I learned an algorithm called "linear sieve" (https://cp-algorithms.com/algebra/prime-sieve-linear.html) that can find all prime numbers smaller than N in linear time.
A by-product of this algorithm is the array lp[x], which stores the minimal prime factor of the number x.
So we can follow lp[x] to find the first prime factor, and continue dividing to get all factors.
The article also mentions that with the help of one extra array, we can get all factors without division. How can that be achieved?
The article said: "... Moreover, using just one extra array will allow us to avoid divisions when looking for factorization."
The algorithm is due to Pritchard. It is a variant of Algorithm 3.3 in Paul Pritchard: "Linear Prime-Number Sieves: a Family Tree", Science of Computer Programming, vol. 9 (1987), pp. 17-35.
Here's the code with an unnecessary test removed, and an extra vector used to store the factor:
for (int i=2; i <= N; ++i) {
    if (lp[i] == 0) {
        lp[i] = i;
        pr.push_back(i);
    }
    for (int j=0; i*pr[j] <= N; ++j) {
        lp[i*pr[j]] = pr[j];
        factor[i*pr[j]] = i;
        if (pr[j] == lp[i]) break;
    }
}
Afterwards, to get all the prime factors of a number x, take the first prime factor as lp[x], then recursively get the prime factors of factor[x], stopping once lp[x] == x. E.g. with x=20, lp[x]=2 and factor[x]=10;
then lp[10]=2 and factor[10]=5; then lp[5]=5 and we stop. So the prime factorization is 20 = 2*2*5.
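Putting the sieve and the lookup together, a self-contained C++ sketch might look like the following. The `LinearSieve` struct and its `factorize` method are names chosen for this illustration; the arrays `lp`, `pr` and `factor` are the ones from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LinearSieve {
    int N;
    std::vector<int> lp;      // lp[x]: smallest prime factor of x
    std::vector<int> factor;  // factor[x]: x / lp[x], stored while sieving
    std::vector<int> pr;      // primes in increasing order

    explicit LinearSieve(int n) : N(n), lp(n + 1, 0), factor(n + 1, 0) {
        for (int i = 2; i <= N; ++i) {
            if (lp[i] == 0) {          // i is prime
                lp[i] = i;
                pr.push_back(i);
            }
            // The break below guarantees j never runs past the end of pr.
            for (std::size_t j = 0; static_cast<long long>(i) * pr[j] <= N; ++j) {
                lp[i * pr[j]] = pr[j];
                factor[i * pr[j]] = i;
                if (pr[j] == lp[i]) break;
            }
        }
    }

    // Prime factors of x in non-decreasing order, using lookups only
    // (no division or modulo at query time).
    std::vector<int> factorize(int x) const {
        assert(x >= 2 && x <= N);
        std::vector<int> out;
        while (lp[x] != x) {           // x is still composite
            out.push_back(lp[x]);
            x = factor[x];
        }
        out.push_back(x);              // the last remaining prime
        return out;
    }
};
```

For example, `LinearSieve(100).factorize(20)` walks 20 → 10 → 5 and collects 2, 2, 5, matching the trace above.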

Strange behavior in a simple for using uint

This works as expected:
for (var i:uint = 5; i >= 1; i-- )
{
trace(i); // output is from 5~1, as expected
}
This is the strange behavior:
for (var i:uint = 5; i >= 0; i-- )
{
trace(i)
}
// output:
5
4
3
2
1
0
4294967295
4294967294
4294967293
...
Below 0, something like a MAX_INT appears and it goes on decrementing forever. Why is this happening?
EDIT
I tested similar code in C++, with an unsigned int, and I get the same result. Probably the condition is being evaluated after the decrement.
The behavior you are describing has little to do with any particular programming language. It is the same in C, C++, ActionScript, etc. What you see is quite normal behavior, and it has to do with the way a number is represented (see the wiki article and read about unsigned integers).
You are using a uint (unsigned integer), which cannot represent negative numbers. So if you take a uint like this:
uint i = 0;
and you subtract 1 from it:
i = i - 1;
then i cannot become -1, as it is unsigned. Instead the value wraps around, and i holds the maximum value of the uint data type (the 4294967295 in your output). It also means that the condition i >= 0 is always true for a uint, which is why your loop never stops.
The edit that you posted above,
"...in C++, .. same result..."
should give you a clue as to why this is happening: it has nothing to do with what language you are using, or when the comparison is done. It has to do with what data type you are using.
As an exercise, fire up that C++ compiler again and write a program that displays the maximum value of a uint. The program should not use any predefined constants :) and it should take you only one line of code!
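For reference, here is a small C++ sketch of the wraparound, plus one possible way to count down to 0 inclusive with an unsigned counter. The function names are made up for this example, and `countDown` assumes `start` is below the maximum unsigned value so that `start + 1` does not itself wrap.

```cpp
#include <limits>
#include <vector>

// Unsigned arithmetic is modular: subtracting 1 from 0 wraps around
// to the largest value the type can hold.
unsigned int wrapBelowZero() {
    unsigned int i = 0;
    i = i - 1;
    return i;
}

// Counts down from `start` to 0 inclusive with an unsigned counter.
// `i-- > 0` compares the old value first and then decrements, so the
// loop exits when the old value is 0 instead of testing `i >= 0`,
// which is always true for an unsigned type.
// Assumption: start < std::numeric_limits<unsigned int>::max().
std::vector<unsigned int> countDown(unsigned int start) {
    std::vector<unsigned int> seen;
    for (unsigned int i = start + 1; i-- > 0; ) {
        seen.push_back(i);
    }
    return seen;
}
```

Switching the counter to a signed type is the other obvious fix when the range allows it.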

AS3 math: nearest neighbour in array

So let's say I have T, T = 1200. I also have A, an array that contains 1000s of numerical entries ranging from 1000 to 2000, but with no entry for 1200.
What's the fastest way of finding the nearest neighbour (closest value)? Let's say we ceil it, so it'll match 1201, not 1199, in A.
Note: this will be run on ENTER_FRAME.
Also note: A is static.
It is also very fast to use Vector.<int> instead of Array and do a simple for-loop:
var vector:Vector.<int> = new <int>[ 0,1,2, /*....*/ 2000]; // must be sorted ascending
function seekNextLower( searchNumber:int ) : int {
    for (var i:int = vector.length-1; i >= 0; i--) {
        if (vector[i] <= searchNumber) return vector[i];
    }
    return int.MIN_VALUE; // nothing found (this sentinel is an assumption)
}
function seekNextHigher( searchNumber:int ) : int {
    for (var i:int = 0; i < vector.length; i++) {
        if (vector[i] >= searchNumber) return vector[i];
    }
    return int.MAX_VALUE; // nothing found (this sentinel is an assumption)
}
Using any array methods will be more costly than iterating over Vector.<int> - it was optimized for exactly this kind of operation.
If you're looking to run this on every ENTER_FRAME event, you'll probably benefit from some extra optimization.
If you keep track of the entries when they are written to the array, you don't have to sort them.
For example, you'd have an array where T is the index, and each element would be an object holding an array of all the indexes in A that have that value. You could also store the closest value's index as part of that object, so when you're retrieving this every frame, you only need to access that value rather than search.
Of course this would only help if you read a lot more than you write, because recreating the object is quite expensive, so it really depends on use.
You might also want to look into linked lists; for certain operations they are quite a bit faster (slower on sort, though).
You have to read each value, so the complexity will be linear. It's pretty much like finding the smallest int in an array.
var closestIndex:uint;
var closestDistance:uint = uint.MAX_VALUE;
var currentDistance:uint;
var arrayLength:uint = A.length;
for (var index:int = 0; index<arrayLength; index++)
{
    currentDistance = Math.abs(T - A[index]);
    if (currentDistance < closestDistance ||
        (currentDistance == closestDistance && A[index] > T)) //between two values with the same distance, prefers the one larger than T
    {
        closestDistance = currentDistance;
        closestIndex = index;
    }
}
return A[closestIndex]; // the closest entry of A (T is the search value, not an array)
Since your array is sorted you could adapt a straightforward binary search (such as explained in this answer) to find the 'pivot' where the left-subdivision and the right-subdivision at a recursive step bracket the value you are 'searching' for.
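A possible shape for that binary search, sketched in C++ rather than AS3: `std::lower_bound` finds the first entry that is not smaller than the target, and the answer is either that entry or the one just before it. The function name `nearestSorted` and the tie rule (prefer the larger value, as the question asks) are choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the entry of `sorted` closest to `target`.
// `sorted` must be non-empty and in ascending order.
// On a tie the larger neighbour wins (1201 rather than 1199 for 1200).
int nearestSorted(const std::vector<int>& sorted, int target) {
    assert(!sorted.empty());
    // First element >= target, found in O(log n) steps.
    auto hi = std::lower_bound(sorted.begin(), sorted.end(), target);
    if (hi == sorted.begin()) return *hi;      // everything is >= target
    if (hi == sorted.end()) return *(hi - 1);  // everything is < target
    int below = *(hi - 1);
    int above = *hi;
    return (above - target <= target - below) ? above : below;
}
```

Since A is static, it only has to be sorted once up front; each per-frame lookup is then logarithmic instead of linear.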
Just a thought I had... Sort A (since it's static you can just sort it once before you start), and then estimate which index to start at (say A has length 100 and you want 1200: 100*(200/1000) = 20). Start at that guess, and if A[guess] is higher than 1200, check the value at A[guess-1]. If it is still higher, keep going down until you find one value that is higher and one that is lower. Once you have found those, determine which is closer. If your initial guess was too low, keep going up instead.
This won't be great and might not be the best performance-wise, but it would be a lot better than checking every single value, and it will work quite well if A is evenly spaced between 1000 and 2000.
Good luck!
public function nearestNumber(value:Number,list:Array):Number{
    var currentNumber:Number = list[0];
    for (var i:int = 0; i < list.length; i++) {
        if (Math.abs(value - list[i]) < Math.abs(value - currentNumber)){
            currentNumber = list[i];
        }
    }
    return currentNumber;
}

How to represent a binomial tree in memory

I've got a structure that is described as a "binomial tree". Let's see a drawing:
What is the best way to represent this in memory? Just to clarify, it is not a simple binary tree, since the node N4 is both the left child of N1 and the right child of N2; the same sharing happens for N7 and N8 and so on... I need a construction algorithm that easily avoids duplicating such nodes and just references them instead.
UPDATE
Many of us do not agree with the "binomial tree" definition, but this comes from finance (especially derivative pricing); have a look here: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter45.html for example. So I used the "domain accepted definition".
You could generate the structure level by level. In each iteration, create one level of nodes, put them in an array, and connect the previous level to them. Something like this (C#):
Node GenerateStructure(int levels)
{
    Node root = null;
    Node[] previous = null;
    for (int level = 1; level <= levels; level++)
    {
        int count = level;
        var current = new Node[count];
        for (int i = 0; i < count; i++)
            current[i] = new Node();
        if (level == 1)
            root = current[0];
        for (int i = 0; i < count - 1; i++)
        {
            previous[i].Left = current[i];
            previous[i].Right = current[i + 1];
        }
        previous = current;
    }
    return root;
}
The whole structure requires O(N^2) memory, where N is the number of levels. This approach requires O(N) additional memory for the two arrays. Another approach would be to generate the graph from left to right, but that would require O(N) additional memory too.
The time complexity is obviously O(N^2).
More than a tree, which I would define as a 'connected graph of N vertices and N-1 edges', that structure looks like a Pascal (or Tartaglia, as taught in Italy) triangle. As such, an array with suitable indexing suffices.
Details on construction depend on your data input: please give some more hints.
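One way the "array with suitable indexing" could work, sketched in C++ under the assumption that levels and positions are numbered from 0 and that level k holds k+1 nodes: the levels are stored one after another in a single flat array, and children are found by index arithmetic, so shared nodes are never duplicated. The helper names are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Flat array index of node `i` on `level` (both 0-based).
// Levels 0..level-1 hold 1 + 2 + ... + level = level*(level+1)/2 nodes.
std::size_t nodeIndex(std::size_t level, std::size_t i) {
    assert(i <= level);
    return level * (level + 1) / 2 + i;
}

// The left child sits at the same position on the next level.
std::size_t leftChild(std::size_t level, std::size_t i) {
    return nodeIndex(level + 1, i);
}

// The right child sits one position further along the next level.
std::size_t rightChild(std::size_t level, std::size_t i) {
    return nodeIndex(level + 1, i + 1);
}

// Total number of nodes in a lattice with `levels` levels.
std::size_t nodeCount(std::size_t levels) {
    return levels * (levels + 1) / 2;
}
```

With this layout the right child of node (1, 0) and the left child of node (1, 1) resolve to the same array slot, which is exactly the sharing described in the question.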

CUDA - specifying <<<x,y>>> for a for loop

Hey,
I have two arrays of size 2000. I want to write a kernel to copy one array to the other. Each array represents 1000 particles: indices 0-999 contain the x values and indices 1000-1999 the y values of their positions.
I need a for loop to copy up to N particles from one array to the other, e.g.
int halfway = 1000;
for(int i = 0; i < N; i++){
    array1[i] = array2[i];
    array1[halfway + i] = array2[halfway + i];
}
Since N is always less than 2000, can I just create 2000 threads? Or do I have to create several blocks?
I was thinking about doing this inside a kernel:
int tid = threadIdx.x;
if (tid >= N) return;
array1[tid] = array2[tid];
array1[halfway + tid] = array2[halfway + tid];
and calling it as follows:
kernel<<<1,2000>>>(...);
Would this work? Will it be fast? Or will I be better off splitting the problem into blocks? I'm not sure how to do this, perhaps: (is this correct?)
int tid = blockDim.x*blockIdx.x + threadIdx.x;
if (tid >= N) return;
array1[tid] = array2[tid];
array1[halfway + tid] = array2[halfway + tid];
kernel<<<4,256>>>(...);
Would this work?
Have you actually tried it?
It will fail to launch, because the number of threads per block is limited: 512 maximum on my card (a GTX 200-series; the value varies between architectures, and newer ones allow 1024). Either way, 2000 threads in a single block is too many. You will need either more blocks, or fewer threads with a for-loop inside the kernel that increments by blockDim.x.
Your multi-block solution should work as well.
Other approach
If this is the only purpose of the kernel, you might as well try using cudaMemcpy with cudaMemcpyDeviceToDevice as the last parameter.
The only way to answer questions about configurations is to test them. To do this, write your kernels so that they work regardless of the configuration. Often, I will assume that I will launch enough threads, which makes the kernel easier to write. Then, I will do something like this:
int threads_per_block = 512;
int num_blocks = SIZE_ARRAY/threads_per_block;
if(num_blocks*threads_per_block<SIZE_ARRAY)
    num_blocks++; // round up so that every element gets a thread
my_kernel <<< num_blocks, threads_per_block >>> ( ... );
(except, of course, threads_per_block might be a define, or a command line argument, or iterated to test many configurations)
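The block-count computation above is a ceiling division, and it can be written as one expression. A minimal host-side sketch in plain C++ (the helper name `blocksFor` is hypothetical, and no CUDA API is involved):

```cpp
#include <cassert>

// Number of blocks needed so that blocks * threadsPerBlock >= totalThreads.
// Equivalent to dividing and then adding one block when there is a remainder.
unsigned int blocksFor(unsigned int totalThreads, unsigned int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (totalThreads + threadsPerBlock - 1) / threadsPerBlock;
}
```

With 512 threads per block, 2000 elements need 4 blocks (2048 threads), which is why the kernel still needs its `if (tid >= N) return;` guard for the surplus threads.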
It is better to use more than one block for any kernel, since a single block runs on only one multiprocessor and leaves the rest of the GPU idle.
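It seems to me that you are simply copying from one array to another as a sequence of values with an offset.
If this is the case, you can simply use the cudaMemcpy API call and specify cudaMemcpyDeviceToDevice. Note that the size argument is in bytes, and that each of the two ranges needs its own call:
cudaMemcpy(array1, array2, N * sizeof(*array1), cudaMemcpyDeviceToDevice);
cudaMemcpy(array1 + halfway, array2 + halfway, N * sizeof(*array1), cudaMemcpyDeviceToDevice);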
The API will figure out the best partition of block / threads.