random number generation how? - function

I was wondering how the random number functions work. I mean is the server time used or which other methods are used to generate random numbers? Are they really random numbers or do they lean to a certain pattern? Let's say in python:
import random
number = random.randint(1,10)

Random number generators vary (of course) by platform, but in general they produce only "pseudo-random" numbers. That is, the "random" numbers are generated by an algorithm chosen to give a reasonably even distribution of numbers with statistical properties similar to what one would expect of true randomness. These random number generators typically take a "seed" value, which is used to initiate the "sequence"; usually, the same "seed" value will return the same sequence of "random" numbers (indicating that it's clearly not actually "random").
One can obtain reasonable pseudorandom results, however, by seeding the "random" number function with a rapidly changing number, such as the time (in ticks) from the machine, or other varying seed values. That doesn't change the fact, however, that these "random" numbers aren't really random; however, for most purposes, they can be considered "good enough".
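A minimal sketch of the seeding behaviour described above, using Python's standard `random` module. The seed value 12345 and the helper name `first_values` are arbitrary choices made for this illustration.

```python
import random
import time

def first_values(seed, count=5):
    """Return the first `count` values from a generator started with `seed`."""
    rng = random.Random(seed)  # independent generator with an explicit seed
    return [rng.randint(1, 10) for _ in range(count)]

# The same seed always reproduces the same "random" sequence.
same_a = first_values(12345)
same_b = first_values(12345)
print(same_a == same_b)  # True

# Seeding from a fast-changing clock gives a different sequence on each run.
clock_seed = time.time_ns()
print(first_values(clock_seed))
```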
One note as an addendum: there are hardware-based random number generators that can be purchased and that actually are random. These typically depend on the measurement of a varying physical quantity, such as the number of photons received by a detector, with the readings corrected for bias so that they return truly random values. These are relatively rare, however.

Yes, time is typically used to seed a random number generator when it's not important that the numbers be unpredictable. For example, if you are displaying random images in a slideshow then the time is a good value to use so that the sequence of images isn't the same the next time you run the slideshow.
However, since the time is known by everyone to a high degree of accuracy, this would be a terrible seed for crypto purposes. Netscape used to use this method and it was shown to be vulnerable to attack. Nowadays secure random numbers are generated using entropy gathered from sources like mouse movement and microphone input. "Headless" network devices use characteristics of their network traffic as a more-or-less unpredictable entropy source. For really special applications, hardware randomness sources such as cameras and Geiger counters are sometimes used. On Unix systems you can get secure random numbers from /dev/random, which has traditionally blocked if there's not "enough entropy" (estimated through a counter) to guarantee secure randomness.
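As a rough illustration in Python: the standard library's `secrets` module and `os.urandom` draw on the operating system's secure random source instead of a time-seeded PRNG. The variable names below are made up for the example.

```python
import os
import secrets

token = secrets.token_bytes(16)      # 16 bytes from the OS's secure source
die_roll = secrets.randbelow(6) + 1  # unpredictable integer in 1..6
raw = os.urandom(8)                  # lower-level access to the same kind of source

print(token.hex(), die_roll, raw.hex())
```

For a slideshow this would be overkill; for keys, tokens and anything an attacker might want to guess, it is the safer default.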

It's a pseudorandom number generator. The exact workings depend on the implementation, but I assume it's some kind of C implementation of the Mersenne Twister: http://docs.python.org/library/random.html (third paragraph)
Oh, and randint itself is built on top of the base random function. random() returns a real number in the range [0, 1), and randint(a, b) returns an integer in the range [a, b]; it can be implemented as lambda a, b: a + int(random.random() * (b + 1 - a))
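Here is that idea written out as a small function. This is only a sketch of the principle: `randint_sketch` is a hypothetical name, CPython's own `randint` is implemented differently, and scaling a float like this is only reasonable for small ranges.

```python
import random

def randint_sketch(a, b, rng=random):
    """Return an integer in [a, b], built on a float in [0.0, 1.0)."""
    span = b - a + 1                     # number of possible outcomes
    # int() truncates the non-negative product to one of 0 .. span-1
    return a + int(rng.random() * span)

print([randint_sketch(1, 10) for _ in range(10)])
```

Adding `a` after truncating (instead of truncating `a + ...`) keeps the result correct when `a` is negative, because `int()` rounds toward zero.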

Depending on your background you might like Numerical Recipes. I am a physicist
and I really like this book (even though mathematicians occasionally
write bad things about it, it gives nice overviews on a lot of topics).
See chapter 7 for a nice introduction into random numbers.

Related

Does Actionscript have a math specification?

This Flash game has a lot of players including me and some friends. We noticed the same thing can run differently for different people. The math in the simulation is definitely to blame. Whether the cause is in hardware, OS, browser, 32-bit/64-bit, etc. is not really known. But with the combinations we have to test with, we've gotten 5 distinct end results from the same simulation starting conditions, and can likely get more.
This makes me wonder, does Actionscript have a floating point math specification? If so, what does it say about the accuracy and determinism of the computations?
I compare to Java, which differentiates between regular floating point math with the Math class and deterministic floating point with the StrictMath class and strictfp keyword. Both are always within 1 ulp of the exact result, this also implies the regular math and strict math always give results within 1 ulp of each other for a single operation or function call. The docs are very clear about this. I'd expect other respectable languages to have something similar, saying how accurate their floating point computations are and if they give the same results everywhere.
Update since some people have been saying the game is dishonest:
Some others have taken apart the SWF and even made mods for it; they've seen the game engine and can confirm there is no randomness. Box2D is used for its physics. If a design ever does run differently on subsequent runs, it has actually changed due to some bug. Usually this is a visible difference, but if not, you can check the raw data with this tool and see it is different. Different starting conditions, as expected, get different end results.
As for what we know so far, this is results on a test level:
For example, if I am running 32-bit Chrome on my desktop (AMD A10-5700 as CPU), I will always get that result of "946 ticks". But if I run on Firefox or Internet Explorer instead I always get the result of "794 ticks".
Actionscript doesn't really have a math specification in that sense. This is the closest you'll get:
https://help.adobe.com/en_US/FlashPlatform/reference/actionscript/3/Math.html
It says at the bottom of the top section:
The Math functions acos, asin, atan, atan2, cos, exp, log, pow, sin, and sqrt may result in slightly different values depending on the algorithms used by the CPU or operating system. Flash runtimes call on the CPU (or operating system if the CPU doesn't support floating point calculations) when performing the calculations for the listed functions, and results have shown slight variations depending upon the CPU or operating system in use.
So to answer our two questions:
What does it say about accuracy? Nothing, actually. At no point does it mention a limit to how inaccurate a result can be.
What does it say about determinism? Hardware and operating system are definitely factors, so it is platform-dependent. No confirmation for other factors.
If you want to look any deeper, you're on your own.
According to the docs, Actionscript has a catch-all Number data type in addition to int and uint types:
The Number data type uses the 64-bit double-precision format as specified by the IEEE Standard for Binary Floating-Point Arithmetic (IEEE-754). This standard dictates how floating-point numbers are stored using the 64 available bits. One bit is used to designate whether the number is positive or negative. Eleven bits are used for the exponent, which is stored as base 2. The remaining 52 bits are used to store the significand (also called mantissa), the number that is raised to the power indicated by the exponent.
By using some of its bits to store an exponent, the Number data type can store floating-point numbers significantly larger than if it used all of its bits for the significand. For example, if the Number data type used all 64 bits to store the significand, it could store a number as large as 2^65 - 1. By using 11 bits to store an exponent, the Number data type can raise its significand to a power of 2^1023.
Although this range of numbers is enormous, it comes at the cost of precision. Because the Number data type uses 52 bits to store the significand, numbers that require more than 52 bits for accurate representation, such as the fraction 1/3, are only approximations. If your application requires absolute precision with decimal numbers, use software that implements decimal floating-point arithmetic as opposed to binary floating-point arithmetic.
This could account for the varying results you're seeing.
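To illustrate how small the differences are at the start, here is a sketch in Python (not ActionScript), whose floats are also IEEE-754 doubles. It only shows that the order of operations changes the rounded result; whether this is what happens inside the game is not established.

```python
# Same three numbers, added in a different order.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

In a physics simulation that feeds each step's output into the next, a one-bit difference like this can grow into visibly different end results.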

Does using binary numbers in code improve performance?

I've seen quite a few examples where binary numbers are being used in code, like 32,64,128 and so on (for instance, very well known example - minecraft)
I want to ask, does using binary numbers in such high level languages as Java / C++ help anything?
I know assembly and that you would always rather use these because in low level language it overcomplicates things if you go above register limit.
Will programs run any faster/save up more memory if you use binary numbers?
As with most things, "it depends".
In compiled languages, the better compilers will deduce that slow machine instructions can sometimes be done with different faster machine instructions (but only for special values, such as powers of two). Sometimes coders know this and program accordingly. (e.g. multiplying by a power of two is cheap)
Other times, algorithms are suited towards representations involving powers of two (e.g. many divide and conquer algorithms like the Fast Fourier Transform or a merge sort).
Yet other times, it's the most compact way to represent boolean values (like a bitmask).
And on top of that, other times it's more efficient for memory purposes (typically because it's so fast to do multiply and divide logic with powers of two, the OS/hardware/etc. will use cache line sizes, page sizes, etc. that are powers of two, so you'd do well to have nice power-of-two sizes for your important data structures).
And then, on top of that, other times.. programmers are just so used to using powers of two that they simply do it because it seems like a nice number.
There are some benefits of using powers of two numbers in your programs. Bitmasks are one application of this, mainly because bitwise operators (&, |, <<, >>, etc) are incredibly fast.
In C++ and Java, this is done a fair bit- especially with GUI applications. You could have a field of 32 different menu options (such as resizable, removable, editable, etc), and apply each one without having to go through convoluted addition of values.
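A small sketch of such an options field, written in Python for brevity; the same operators exist in C++ and Java. The flag names are taken from the examples above and the values are hypothetical.

```python
# Each option gets its own bit, so each constant is a power of two.
RESIZABLE = 1 << 0  # 0b001
REMOVABLE = 1 << 1  # 0b010
EDITABLE = 1 << 2   # 0b100

options = RESIZABLE | EDITABLE          # set two flags at once
has_edit = bool(options & EDITABLE)     # True
has_remove = bool(options & REMOVABLE)  # False
options &= ~RESIZABLE                   # clear one flag, leave the rest

print(has_edit, has_remove, bin(options))  # True False 0b100
```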
In terms of raw speedup or any performance improvement, that really depends on the application itself. GUI packages can be huge, so getting any speedup out of those when applying menu/interface options is a big win.
From the title of your question, it sounds like you mean, "Does it make your program more efficient if you write constants in binary?" If that's what you meant, the answer is emphatically, No. The compiler translates all your constants to binary at compile time, so by the time the program runs, it makes no difference. I don't know if the compiler can interpret binary constants faster than decimal, but the difference would surely be trivial.
But the body of your question seems to indicate that you mean, "use constants that are round number in binary" rather than necessarily expressing them in binary digits.
For most purposes, the answer would be no. If, say, the computer has to add two numbers together, adding a number that happens to be a round number in binary is not going to be any faster than adding a not-round number.
It might be slightly faster for multiplication. Some compilers are smart enough to turn multiplication by powers of 2 into a bit shift operation rather than a hardware multiply, and bit shifts are usually faster than multiplies.
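The identities a compiler relies on for this can be checked in a few lines. This Python sketch only demonstrates that the results are equal; it says nothing about speed, and the function names are made up for the example.

```python
def times_eight(x):
    return x << 3  # shifting left by 3 bits multiplies by 2**3

def div_by_four(x):
    return x >> 2  # shifting right by 2 bits floor-divides by 2**2

def mod_sixteen(x):
    return x & 15  # for non-negative x, masking the low 4 bits equals x % 16

print(times_eight(5), div_by_four(22), mod_sixteen(37))  # 40 5 5
```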
Back in my assembly-language days I often made elements in arrays have sizes that were powers of 2 so I could index into the array with a bit-shift rather than a multiply. But in a high-level language that would be hard to do, as you'd have to do some research to find out just how much space your primitives take in memory, whether the compiler adds padding bytes between them, etc. And if you did add some bytes to an array element to pad it out to a power of 2, the entire array is now bigger, and so you might generate an extra page fault, i.e. the operating system runs out of memory and has to write a chunk of your data to the hard drive and then read it back when it needs it. One extra hard drive write takes more time than 1000 multiplications.
In practice, (a) the difference is so trivial that it would almost never be worth worrying about; and (b) you don't normally know everything happening at the low level, so it would often be hard to predict whether a change with its attendant ramifications would help or hurt.
In short: Don't bother. Use the constant values that are natural to the problem.
The reason they're used is probably different - e.g. bitmasks.
If you see them in array sizes, it doesn't really increase performance, but some memory allocators hand out blocks in power-of-2 sizes. E.g. if you asked for 100 bytes, you might well get a 128-byte block.
No, your code will run the same way, no matter what number you use.
If by binary numbers you mean numbers that are powers of 2, like 2, 4, 8, 16, 1024..., they are common mostly because they make good use of space. For example, an 8-bit pointer can address 256 locations (a power of 2), so if you use fewer than 256 you are wasting part of its range, and you normally allocate a 256-entry buffer. The same goes for all other powers of 2.
In most cases the answer is no: there is no noticeable performance difference.
However, there are certain cases (very few) when NOT using binary numbers for array/structure sizes/lengths will give noticeable performance benefits. These are cases where you're looping over a structure that fills the cache in such a way that you get cache collisions every time you loop through your array/structure. This case is very rare, and shouldn't be preoptimized unless you're having problems with your code performing much more slowly than theoretical limits say it should. Also, this case is very hardware dependent and will change from system to system.

Extracting initial seed value of a PRNG?

I recently read that you can predict the outcomes of a PRNG if you:
Know what algorithm is being used.
Have consecutive data points.
Is it possible to figure out the seed used for a PRNG from only data points?
I managed to find a paper by Kelsey et al which details the different types of attack and also summarises some real-world examples. It seems most attacks rely on similar techniques to those against cryptosystems, and in most cases actually taking advantage of the fact that the PRNG is used in a cryptosystem.
With "enough" data points that are the absolute first data points generated by the PRNG with no gaps, sure. Most PRNG functions are invertible, so just work backwards and you should get the seed.
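For example, the typical return seed=(seed*A+B)%N has an inverse of return seed=((seed-B)*A_inv)%N, where A_inv is the modular inverse of A modulo N (it exists whenever A and N share no common factor). Ordinary division by A does not work under the modulus.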
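A sketch of walking such a generator backwards, assuming the full internal state is what gets output. The constants A, B and N are illustrative values picked for this example, and the three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
# Illustrative LCG parameters; A is odd, so it is invertible modulo a power of two.
A = 1103515245
B = 12345
N = 2**31

A_INV = pow(A, -1, N)  # modular inverse of A modulo N

def step(seed):
    """Advance the generator by one value."""
    return (seed * A + B) % N

def unstep(seed):
    """Undo one step: recover the previous state from the current one."""
    return ((seed - B) * A_INV) % N

original = 20240101
state = step(step(step(original)))        # three outputs later
recovered = unstep(unstep(unstep(state))) # three steps back
print(recovered == original)              # True
```

Generators that only reveal part of their state per output (the high bits, say) need more work than this, but the principle is the same.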
It's always theoretically possible, if you're "allowed" to brute force all possible values for the seed, and if you have enough data points that there's only one seed that could have produced that output. If the PRNG was seeded with the time, and you know roughly when that happened, then this might be very fast since there aren't many plausible values to try. If the PRNG was seeded with data from a truly random source having 64 bits of entropy, then this approach is computationally infeasible.
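The time-seeded case can be sketched like this with Python's `random` module. The seed, the ten-minute window and the helper names are all hypothetical; the point is only how small the search space is when the seed is a timestamp.

```python
import random

def outputs_for(seed, count=5):
    """First `count` outputs of a generator started with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(1, 1000) for _ in range(count)]

def recover_seed(observed, earliest, latest):
    """Try every integer seed in [earliest, latest]; return those that match."""
    return [s for s in range(earliest, latest + 1)
            if outputs_for(s, len(observed)) == observed]

secret_seed = 1_700_000_123           # hypothetical timestamp-like seed
observed = outputs_for(secret_seed)   # what the attacker gets to see
candidates = recover_seed(observed, 1_700_000_000, 1_700_000_600)
print(secret_seed in candidates)      # True
```

With only 601 plausible seconds to try, the search finishes almost instantly; with a seed drawn from a large, truly random space the same loop would never finish.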
Whether there are other techniques depends on the algorithm. For example doing this for Blum Blum Shub is equivalent to integer factorization, which is generally believed to be a hard computational problem. Other, faster PRNGs might be less "secure" in this sense. Any PRNG used for crypto purposes, for example in a stream cipher, pretty much needs there to be no known feasible way of doing it.

How could random functions be really random?

Introduction
I know I'm going to lose a lot of reputation for this question and I also know it will be flagged as inappropriate but I'm really curious about that so I'm not giving up if there's any chance I'm getting at least an answer.
Question
Today I woke up thinking:
Hey, how could random functions be really random if they are created by an algorithm?
Think about it. How could you create a function that simulates randomness without the concept of random already built in? I began to think:
Hey, I'd take an array of int, then I'd do [thing], then [thing], then [thing] again, then I'd choose only odd numbers... etc.
But that seems more like a function that makes it harder to predict what the choice will be than real randomness.
Is it possible to create randomness? How are functions that return random ints (such as rand() in PHP) created? How can they simulate randomness?
Functions that algorithmically produce so-called random numbers are pseudorandom number generators. If you know the seed used to generate the sequence, then the numbers are predictable. The sequence itself is a statistically random distribution but not truly random.
There are true random number generators that typically involve some hardware that samples randomness from the physical world, e.g., radioactivity or acoustic noise. A naive implementation would be to sample hard disk access and mouse movements. See random.org for a real RNG.
There's a reason they're called pseudorandom numbers; they're not truly random. From Wikipedia:
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's state.
Read volume 2, chapter 3 of this seminal work if you want the maths behind it. You can buy it to look impressive on your bookshelf. (Just keep in mind that most people who buy it wind up never actually reading it -- for a good reason. It's VERY dense and VERY difficult reading.) The short answer that doesn't involve massive tomes of difficult text is that "random" numbers generated purely algorithmically are pseudorandom, which is to say that they are "random enough".
You might want to look into Wikipedia's article on PRNGs, which is what pretty much all the random number generators we have on PCs are.
About the closest you can get to random, which I think is done somewhere, is to use temperatures in the CPU or some other sensor reading as a seed for one of these. If the seed is random (the temperature is unlikely to ever be exactly the same), the sequence is about as close to random as possible.
I usually "get Milliseconds" and divide it to a pseudorandom number. This makes it even more random and unpredictable.

What does 'seeding' mean?

Very simple question. What does the term 'seeding' mean in general? I'll put the context, i.e., you must seed for random functions.
It means: pick a place to start.
Think of a pseudo random number generator as just a really long list of numbers. This list is circular, it eventually repeats.
To use it, you need to pick a starting place. This is called a "seed".
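The "circular list" picture can be made concrete with a deliberately tiny generator. This is a toy with only 16 possible values, written for illustration; `tiny_lcg` and its constants are made up, and real generators have vastly longer lists.

```python
from itertools import islice

def tiny_lcg(seed):
    """Yield values from a deliberately tiny generator with only 16 states."""
    state = seed % 16
    while True:
        state = (5 * state + 3) % 16
        yield state

from_zero = list(islice(tiny_lcg(0), 32))
print(from_zero[:16])                    # one full lap around the list
print(from_zero[:16] == from_zero[16:])  # True: the list repeats
print(list(islice(tiny_lcg(7), 3)))      # [6, 1, 8]: same list, different starting place
```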
Most random functions that are common on personal computers aren't random, but deterministic to a degree. The 'seed' for these pseudo-random functions is the starting point upon which future values are based. This is useful for debugging purposes: if you keep the seed the same from execution to execution you'll get the same numbers.
To get numbers that are more random a different seed is often used from execution to execution. This is often based on the time of the machine.
This method is completely different from generating a 'true' random number based on some sort of physical property in the world around us. Lava lamps and sun spots are two of the more 'fun' properties that can be observed to generate 'more random' numbers. Anyone can hit http://www.random.org/ to get a real random number if it's truly necessary, like for a poker website. If you don't have a good generator, folks can attempt to figure out how the generator works and predict future numbers.
Imagine a card game and development of the game program vs. running the game to actually play it.
Pseudo-random number generators use a seed or seeds to determine the starting point of the sequence. Some of them always make the same sequence; others can produce different sequences depending on the seed. Some use a cascade: a simple RNG is given a simple seed, and this is run for a while to produce a more complex seed for the masterpiece RNG.
It is quite useful to be able to deliberately repeat the sequence when developing the program or when one wishes to reproduce previous results.
However, imagine a card game. It's obviously not a good idea to always deal the same sequence of cards.
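A sketch of both situations with Python's `random` module. The `deal` helper and the seed 42 are made up for the example; passing `seed=None` lets Python choose its own seed from the operating system's randomness or the clock.

```python
import random

def deal(seed=None, hand_size=5):
    """Shuffle a 52-card deck and return the first `hand_size` cards."""
    deck = [rank + suit for suit in "SHDC" for rank in "A23456789TJQK"]
    rng = random.Random(seed)  # fixed seed: repeatable; None: different each run
    rng.shuffle(deck)
    return deck[:hand_size]

# Development: a fixed seed reproduces the same deal, handy for chasing a bug.
print(deal(42) == deal(42))  # True

# Playing: no fixed seed, so each game deals differently.
print(deal())
```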
"Seeding" random function prevents it from giving out the same sequence of random numbers.
Think of it as a super-random start of your random generator.