I wanted to test a recursive function in gnuplot.
The following is the function:
factorial(n)= ((n==1?sprintf("1"):sprintf("%f",n*factorial(n-1))))
When I tested factorial(100), the result looked fine:
93326215443944102188325606108575267240944254854960571509166910400407995064242937148632694030450512898042989296944474898258737204311236641477561877016501813248.000000
To make the number an integer, I changed the sprintf format like this:
factorial(n)= ((n==1?sprintf("1"):sprintf("%d",n*factorial(n-1))))
But the result is strange; it seems to be out of range for the integer size?
-2147483648
So I changed the function to print a real number with no digits after the decimal point:
factorial(n)= ((n==1?sprintf("1"):sprintf("%.0f",n*factorial(n-1))))
But the result is even stranger:
-75703234367175440481992733883343393025021587121454605713387911182978138051561570016048371488788270318477285688499861254195149884149214115360733197762560
Could you explain what is happening?
Thanks,
The largest integer that can be represented in 32 bits is 2147483647.
So a 32-bit integer (gnuplot through version 5.2) runs out of bits between 12! and 13!
A 64-bit integer (gnuplot 5.4) runs out of bits between 20! and 21!
If you use double precision floating point arithmetic it can hold much larger numbers, but the number of bits still limits the precision. A 64-bit float has only 53 bits of mantissa, so exact integer arithmetic is only guaranteed up to 2^53; the largest factorial it can represent exactly is 22! (its trailing zero bits move into the exponent), and past that point the results are approximations.
Note that all of your example gnuplot code uses integer arithmetic if n is an integer. Changing to a floating point format for printing doesn't change that.
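To see these limits concretely, here is a quick check in Python (its integers are arbitrary precision, so they can simulate the fixed-width behavior exactly):

import math

# 13! is the first factorial that no longer fits in a signed 32-bit integer
print(math.factorial(13))              # 6227020800, > 2**31 - 1
print(math.factorial(13) % 2**32)      # 1932053504, the low 32 bits that survive

# 21! is the first factorial that no longer fits in a signed 64-bit integer
print(math.factorial(21) > 2**63 - 1)  # True

# A double has a 53-bit mantissa; 23! is the first factorial it cannot
# represent exactly (trailing zero bits let 21! and 22! squeak by)
print(float(math.factorial(22)) == math.factorial(22))  # True
print(float(math.factorial(23)) == math.factorial(23))  # False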
Not sure what your final goal is... But just for fun and feasibility, you can calculate Factorial(100) with gnuplot.
However, you will get the result as a string. I haven't tested what the limits for the arguments a and d would be. For a it would be the maximum string length, and for d probably the integer limit (32-bit or 64-bit).
Code:
### factorial for large numbers (output as string!)
reset session
# Multiply(a,d): schoolbook multiplication of the decimal digit string a
# by the integer d, digit by digit with carry _u; returns a string
# a: integer number string
# d: integer number
Multiply(a,d) = (_u=0, _tmp='', _n=strlen(a), sum [_i=1:_n] (_j=_n-_i+1, \
_m1=a[_j:_j]*d, _d1=(_m1+_u)%10, _u=(_m1+_u)/10, \
_tmp=sprintf("%d%s",_d1,_tmp),0), _u ? sprintf("%d%s",_u,_tmp) : _tmp)
Factorial(n) = (_tmp="1", sum [_j=1:n] ( _tmp=Multiply(_tmp,_j), 0), _tmp)
print Factorial(100)
### end of code
Result: 100! (actually, 24 zeros at the end)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
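As a cross-check outside gnuplot, any language with native arbitrary-precision integers reproduces the same digits; for example, in Python:

import math
print(math.factorial(100))   # matches the 158-digit result above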
I'm trying to find the mass of the black hole at the center of this galaxy. I have the mass in solar masses, but need it in kg. However, when I try to convert (1 Msolar = 1.989*10^30 kg), IDL just gives me 0.0000. I have no idea what I'm doing wrong, and I tried just telling IDL to print both 1.989*10^30 and 1989000000000000000000000000000; the outputs are 0.00000 and -1 respectively. Can someone please explain why this is happening?
This is a type-conversion/overflow issue. When you use large numbers, you need to explicitly define them as long or long64 (i.e., 64-bit long integer) for integer values. For real numbers, you can use float or double, and the easiest way to do this is the following:
msun = 1.989d30
which is equivalent to 1.989 × 10^30 as a double-precision floating point number. If you want single precision, then just do the following:
msun = 1.989e30
To make a 32- or 64-bit long integer, the syntax would be:
msun = 1989L * 10L^(27)
or for 64-bit
msun = 1989LL * 10LL^(27)
Beware, though, that 1.989 × 10^30 lies far outside even the 64-bit integer range (roughly ±9.2 × 10^18), so these integer expressions overflow for this particular value; for the solar mass you should stick with the floating-point forms above.
I agree with @honeste_vivere's answer about overflow and data types, but I would add that I often change units to avoid this. I frequently have densities that are of order 1e19/m^3, so I cast density in units of 1e19/m^3 and then deal with numbers that are of order 1. This prevents math errors during least-squares fits and other operations that might do things like squaring my data.
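As a sketch of that rescaling idea (shown in Python with made-up numbers, since it applies to any language, IDL included):

import numpy as np

# Hypothetical densities of order 1e19 per m^3
density = np.array([2.3e19, 4.1e19, 3.7e19])

SCALE = 1e19                      # work in units of 1e19 / m^3
density_scaled = density / SCALE  # now ~[2.3, 4.1, 3.7]
print(density_scaled ** 2)        # squaring stays order 1, no overflow risk

# After fitting in scaled units, convert the results back:
# fitted_density = fitted_scaled * SCALE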
My table has a column which I know is a 128-bit unsigned number, stored in base-10 as a varchar e.g.
"25495123833603613494099723681886"
I know the bit-fields in this number, and I want to use the top 64 bits in GROUP BY.
In my example, the top 64-bits would be 1382093432409
I have not found a way, but I have eliminated some leads:
cannot convert to NUMERIC/DECIMAL because these are 64-bit too
cannot use LEFT() because 1<<64 is not base-10 aligned
CONV(N,10,16) would allow LEFT() but CONV() works at 64-bit precision only too :(
How can I get the BIGINT that is the top 64-bits of this number, so I can use that in the GROUP BY?
You can get at least part of the way there using floating-point math.
MySQL uses double-precision floating-point for non-integer math. This gives you about 50 bits of reliable precision for integral values - while this isn't quite 64, it's pretty close. You can use floating-point math to extract the top bits here using the expression:
FLOOR(n / POW(2,64))
where n is the name of the column.
This approach runs out of steam if more than 50 bits are needed, though, as even double-precision floats don't have enough significant bits to represent the whole thing, and trying to get any more using subtraction fails due to cancellation. (The extra bits are lost as soon as the string is converted to a number; they can't be brought back without doing something entirely different.)
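To see why the float approach recovers these particular top bits, here is a quick check in Python, using its arbitrary-precision integers as ground truth:

import math

n = 25495123833603613494099723681886   # the 128-bit value from the question

print(n >> 64)                # 1382093432409, the exact top 64 bits
print(math.floor(n / 2**64))  # same result here via double math: the
                              # quotient needs only ~41 bits, well
                              # within a double's 53-bit precision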
I know that Knapsack is NP-complete while it can be solved by DP. They say that the DP solution is pseudo-polynomial, since it is exponential in the "length of input" (i.e. the number of bits required to encode the input). Unfortunately, I did not get it. Can anybody explain that pseudo-polynomial thing to me slowly?
The running time is O(NW) for an unbounded knapsack problem with N items and knapsack of size W. W is not polynomial in the length of the input though, which is what makes it pseudo-polynomial.
Consider W = 1,000,000,000,000. It only takes 40 bits to represent this number, so the input size is about 40 bits, but the computational runtime carries the factor 1,000,000,000,000, which is O(2^40).
So the runtime is more accurately said to be O(N·2^(bits in W)), which is exponential.
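To make the W-sized loop visible, here is a minimal unbounded-knapsack DP sketch in Python (the item values and weights are made up); the range(1, W + 1) loop is exactly the factor that blows up when W gains bits:

def unbounded_knapsack(values, weights, W):
    # Max total value with unlimited copies of each item: O(N*W) time,
    # because we fill a table with one entry per capacity 0..W
    best = [0] * (W + 1)
    for cap in range(1, W + 1):      # W iterations: scales with W's *value*
        for v, w in zip(values, weights):
            if w <= cap:
                best[cap] = max(best[cap], best[cap - w] + v)
    return best[W]

print(unbounded_knapsack([60, 100, 120], [10, 20, 30], 50))  # 300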
Also see:
How to understand the knapsack problem is NP-complete?
The NP-Completeness of Knapsack
Complexity of dynamic programming algorithm for the 0-1 knapsack problem
Pseudo-polynomial time
In most of our problems, we're dealing with large lists of numbers which fit comfortably inside standard int/float data types. Because of the way most processors are built to handle 4-8 byte numbers at a time at no additional cost (relative to numbers that fit in, say, 1 byte), we rarely encounter a change in running time from scaling our numbers up or down within ranges we encounter in real problems - so the dominant factor remains just the sheer quantity of data points, the n or m factors that we're used to.
(You can imagine that the Big-O notation is hiding a constant factor that divides-out 32 or 64 bits-per-datum, leaving only the number-of-data-points whenever each of our numbers fit in that many bits or less)
But try reworking these algorithms to act on data sets involving big ints - numbers that require more than 8 bytes to represent - and see what that does to the runtime. The magnitude of the numbers involved always makes a difference, even in other algorithms like binary search, once you expand beyond the buffer of safety conventional processors give us "for free" by handling 4-8 byte batches.
The trick with the Knapsack algorithm that we discussed is that it's unusually sensitive (relative to other algorithms) to the magnitude of a particular parameter, W. Add one bit to W and you double the running time of the algorithm. We haven't seen that kind of dramatic response to changes in value in other algorithms before this one, which is why it might seem like we're treating Knapsack differently - but that's a genuine analysis of how it responds in a non-polynomial fashion to changes in input size.
The way I understand this is that the capacity would've been O(W) if the capacity input were an array [1,2,...,W], which has a size of W. But the capacity input is not an array of numbers; it is a single integer. The time complexity is about the relationship to the size of the input. The size of an integer is NOT the value of the integer, but the number of bits representing it. We do later convert this integer W into an array [1,2,...,W] in the algorithm, leading people to mistakenly think W is the size, but this array is not the input; the integer itself is.
Think of the input as "an array of stuff", and the size as "how many items are in the array". The item input actually is an array of n items, so size = n. The capacity input is NOT an array of W numbers, but a single integer represented by an array of about log(W) bits. Increase its size by 1 (adding one meaningful bit) and W doubles, so the run time doubles, hence the exponential time complexity.
The Knapsack algorithm's run-time is bounded not only by the size of the input (n, the number of items) but also by the magnitude of the input (W, the knapsack capacity): O(nW), which is exponential in how W is represented in binary (a number with b bits can encode values up to 2^b). Computational complexity is concerned with the size of the inputs, not their magnitudes/values.
Disregard the value/weight list for a moment. Let's say we have an instance with knapsack capacity 2. W would take two bits in the input data. Now we shall increase the knapsack capacity to 4, keeping the rest of the input. Our input has only grown by one bit, but the computational complexity has increased twofold. If we increase the capacity to 1024, we would have 11 bits of input for W instead of 2, but the complexity has increased by a factor of 512. Time complexity grows exponentially in the size of W in binary (or decimal) representation.
Another simple example that helped me understand the pseudo-polynomial concept is the naive primality testing algorithm. For a given number n, we check whether it is divided evenly by each integer in the range 2..√n, so the algorithm takes about √n − 1 steps. But here, n is the magnitude of the input, not its size.
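That naive test is easy to write down; a sketch in Python:

import math

def is_prime_naive(n):
    # Trial division: about sqrt(n) steps, which is 2^(bits/2)
    # in terms of the input *size* in bits
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime_naive(1000003))  # True, after about a thousand divisions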
Now, the regular O(n) case:
By contrast, searching an array for a given element runs in polynomial time: O(n). It takes at most n steps and here n is the size of the input (the length of the array).
Also see: Calculating bits required to store decimal number
Complexity is based on the input. In the knapsack problem, the inputs are the size n, the maximum capacity W, and the profit and weight arrays. We construct the DP table as size × W, so it feels like a polynomial time complexity. But the input W is an integer, not an array: it occupies only the number of bits required to store W, so the running time is exponential in that bit count. If the number of bits increases by 1, the running time doubles. Thus it is exponential, and thereby pseudo-polynomial.
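The bit count itself is easy to compute, which ties this back to the earlier 2^40 example:

W = 1000000000000        # the capacity from the earlier example
print(W.bit_length())    # 40 bits of input...
print(2 ** 40 >= W)      # ...but the DP table does work proportional to W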
What is an integer overflow error?
Why do I care about such an error?
What are some methods of avoiding or preventing it?
Integer overflow occurs when you try to express a number that is larger than the largest number the integer type can handle.
If you try to express the number 300 in one byte, you have an integer overflow (maximum is 255). 100,000 in two bytes is also an integer overflow (65,535 is the maximum).
You need to care about it because mathematical operations won't behave as you expect. A + B doesn't actually equal the sum of A and B if you have an integer overflow.
You avoid it by not creating the condition in the first place (usually either by choosing your integer type to be large enough that you won't overflow, or by limiting user input so that an overflow doesn't occur).
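A quick illustration of those two examples, sketched in Python by masking to the storage width:

# Simulate fixed-width unsigned storage by keeping only the low bits
print(300 & 0xFF)      # 44, because 300 does not fit in one byte (max 255)
print(100000 & 0xFFFF) # 34464, because 100,000 does not fit in two bytes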
The easiest way to explain it is with a trivial example. Imagine we have a 4-bit unsigned integer: 0 would be 0000 and 1111 would be 15. So if you increment 15, instead of getting 16 you'll circle back around to 0000, because 16 is actually 10000 and we cannot represent that in fewer than 5 bits. Ergo overflow...
In practice the numbers are much bigger and it circles to a large negative number on overflow if the int is signed but the above is basically what happens.
Another way of looking at it is to consider it as largely the same thing that happens when the odometer in your car rolls over to zero again after hitting 999999 km/mi.
When you store an integer in memory, the computer stores it as a series of bytes. These can be represented as a series of ones and zeros.
For example, zero will be represented as 00000000 (8-bit integers), and 127 will be represented as 01111111. If you add one to 127, this "flips" the bits to 10000000, but in the standard two's complement representation that pattern is actually used to represent -128. This "overflows" the value.
With unsigned numbers, the same thing happens: 255 (11111111) plus 1 would become 100000000, but since there are only 8 "bits", this ends up as 00000000, which is 0.
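Both cases can be simulated in Python, whose plain integers never wrap, by reducing results back to 8 bits (a sketch, not how any particular CPU exposes it):

def to_int8(x):
    # Interpret the low 8 bits of x as a signed two's-complement byte
    x &= 0xFF
    return x - 256 if x >= 128 else x

print(to_int8(127 + 1))  # -128  (01111111 + 1 -> 10000000)
print(to_int8(255 + 1))  #  0    (11111111 + 1 wraps to 00000000)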
You can avoid this by doing proper range checking for your correct integer size, or using a language that does proper exception handling for you.
An integer overflow error occurs when an operation makes an integer value greater than its maximum.
For example, if the maximum value you can have is 100000, and your current value is 99999, then adding 2 will make it 'overflow'.
You should care about integer overflows because data can be changed or lost inadvertently, and you can avoid them with either a larger integer type (see long int in most languages) or with a scheme that converts long strings of digits to very large integers.
Overflow is when the result of an arithmetic operation doesn't fit in the data type of the operation. You can have overflow with a byte-sized unsigned integer if you add 255 + 1, because the result (256) does not fit in the 8 bits of a byte.
You can have overflow with a floating point number if the result of a floating point operation is too large to represent in the floating point data type's exponent or mantissa.
You can also have underflow with floating point types when the result of a floating point operation is too small to represent in the given floating point data type. For example, if the floating point data type can handle exponents in the range of -100 to +100, and you square a value with an exponent of -80, the result will have an exponent around -160, which won't fit in the given floating point data type.
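Both effects are easy to trigger with ordinary doubles, e.g. in Python:

# Overflow: the true result 1e400 exceeds the largest double (~1.8e308)
print(1e200 * 1e200)    # inf

# Underflow: the true result 1e-400 is below the smallest subnormal
# double (~4.9e-324), so it rounds to zero
print(1e-200 * 1e-200)  # 0.0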
You need to be concerned about overflows and underflows in your code because it can be a silent killer: your code produces incorrect results but might not signal an error.
Whether you can safely ignore overflows depends a great deal on the nature of your program - rendering screen pixels from 3D data has a much greater tolerance for numerical errors than say, financial calculations.
Overflow checking is often turned off in default compiler settings. Why? Because the additional code to check for overflow after every operation takes time and space, which can degrade the runtime performance of your code.
Do yourself a favor and at least develop and test your code with overflow checking turned on.
From Wikipedia:
In computer programming, an integer overflow occurs when an arithmetic operation attempts to create a numeric value that is larger than can be represented within the available storage space. For instance, adding 1 to the largest value that can be represented constitutes an integer overflow. The most common result in these cases is for the least significant representable bits of the result to be stored (the result is said to wrap).
You should care about it especially when choosing the appropriate data types for your program or you might get very subtle bugs.
From http://www.first.org/conference/2006/papers/seacord-robert-slides.pdf :
An integer overflow occurs when an integer is increased beyond its maximum value or decreased beyond its minimum value. Overflows can be signed or unsigned.
P.S.: The PDF has a detailed explanation of overflows and other integer error conditions, and also how to tackle/avoid them.
I'd like to be a bit contrarian to all the other answers so far, which somehow accept crappy broken math as a given. The question is tagged language-agnostic and in a vast number of languages, integers simply never overflow, so here's my kind-of sarcastic answer:
What is an integer overflow error?
An obsolete artifact from the dark ages of computing.
Why do I care about it?
You don't.
How can it be avoided?
Use a modern programming language in which integers don't overflow. (Lisp, Scheme, Smalltalk, Self, Ruby, Newspeak, Ioke, Haskell, take your pick ...)
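Python (3.x) belongs on that list too; its plain integers simply grow as needed:

x = 2 ** 64
print(x * x * x)   # the exact 58-digit value of 2**192, no overflow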
I find showing the Two’s Complement representation on a disc very helpful.
Here is a representation for 4-bit integers (see the disc diagram in the slides linked below); the maximum value is 2^3-1 = 7.
For 32-bit integers, the maximum value is 2^31-1.
When we add 1 to 2^31-1, we move one position clockwise and land on -2^31; this wraparound is what is called integer overflow.
Ref : https://courses.cs.washington.edu/courses/cse351/17wi/sections/03/CSE351-S03-2cfp_17wi.pdf
This happens when you attempt to store in an integer a value that is higher than its internal structure can support, due to the number of bytes used. For example, if the maximum integer size is 2,147,483,647 and you attempt to store 3,000,000,000, you will get an integer overflow error.
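You can watch exactly that happen from Python by forcing the value into a 32-bit slot with ctypes:

import ctypes

# 3,000,000,000 is beyond the signed 32-bit maximum of 2,147,483,647,
# so the stored value wraps around
print(ctypes.c_int32(3000000000).value)   # -1294967296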