How could random functions be really random?

Introduction
I know I'm going to lose a lot of reputation for this question, and I also know it will be flagged as inappropriate, but I'm really curious about this, so I'm not giving up if there's any chance of getting at least an answer.
Question
Today I woke up thinking:
Hey, how could random functions be really random if they are created by an algorithm?
Think about it. How could you create a function that simulates randomness without the concept of randomness already built in? I began to think:
Hey, I'd take an array of ints, then I'd do [thing], then [thing], then [thing] again, then I'd choose only odd numbers... etc.
But that seems more like a function that makes the outcome harder to predict than one that produces real randomness.
Is it possible to create randomness? How are functions that return random ints (such as rand() in PHP) created? How can they simulate randomness?

Functions that algorithmically produce so-called random numbers are pseudorandom number generators. If you know the seed used to generate the sequence, then the numbers are predictable. The sequence itself is a statistically random distribution but not truly random.
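For illustration, here's a minimal sketch in Python of a linear congruential generator, one classic PRNG construction (the constants are just example values, not what PHP's rand() necessarily uses). Note how the same seed reproduces the same "random" sequence:

# A minimal linear congruential generator (LCG). The constants are
# illustrative; real implementations choose them carefully.
class TinyLCG:
    def __init__(self, seed):
        self.state = seed  # the PRNG's entire "state" is one number

    def next(self):
        # Each output is completely determined by the previous state.
        self.state = (self.state * 1103515245 + 12345) % (2 ** 31)
        return self.state

a = TinyLCG(42)
b = TinyLCG(42)
# Same seed, same sequence: statistically random-looking, but predictable.
print([a.next() for _ in range(3)] == [b.next() for _ in range(3)])  # True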
There are true random number generators that typically involve some hardware that samples randomness from the physical world, e.g., radioactivity or acoustic noise. A naive implementation would be to sample hard disk access and mouse movements. See random.org for a real RNG.
Obligatory xkcd strip:

There's a reason they're called pseudorandom numbers; they're not truly random. From Wikipedia:
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG),[1] is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's state.

Read volume 2, chapter 3 of this seminal work if you want the maths behind it. You can buy it to look impressive on your bookshelf. (Just keep in mind that most people who buy it wind up never actually reading it -- for a good reason. It's VERY dense and VERY difficult reading.) The short answer that doesn't involve massive tomes of difficult text is that "random" numbers generated purely algorithmically are pseudorandom, which is to say that they are "random enough".

You might want to look into Wikipedia's article on PRNGs - which is what pretty much all the random number generators we have on PCs are.
About the closest you can get to random, which I think is done in some systems, is to use the CPU temperature or some other sensor reading as a seed for one of these. If the seed is random (the temperature is unlikely to ever be exactly the same), the sequence is about as close to random as possible.

I usually get the current time in milliseconds and divide it by a pseudorandom number. This makes the result even harder to predict.

Related

Does using binary numbers in code improve performance?

I've seen quite a few examples where binary numbers are being used in code, like 32, 64, 128 and so on (for instance, a very well-known example is Minecraft).
I want to ask: does using binary numbers in such high-level languages as Java / C++ help anything?
I know assembly, and there you would always prefer such numbers, because in a low-level language things get complicated if you go above the register limit.
Will programs run any faster or save more memory if you use binary numbers?
As with most things, "it depends".
In compiled languages, the better compilers will deduce that slow machine instructions can sometimes be replaced with different, faster machine instructions (but only for special values, such as powers of two). Sometimes coders know this and program accordingly (e.g. multiplying by a power of two is cheap; see the sketch after this list).
Other times, algorithms are suited towards representations involving powers of two (e.g. many divide and conquer algorithms like the Fast Fourier Transform or a merge sort).
Yet other times, it's the most compact way to represent boolean values (like a bitmask).
And on top of that, other times it's more efficient for memory purposes (typically because it's so fast to do multiply and divide logic with powers of two, the OS/hardware/etc. will use cache line / page sizes that are powers of two, so you'd do well to give your important data structures nice power-of-two sizes).
And then, on top of that, other times.. programmers are just so used to using powers of two that they simply do it because it seems like a nice number.
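To make the first point concrete, here's a small sketch (illustrative values only) of why power-of-two arithmetic is cheap - it reduces to bit operations, which is exactly the substitution good compilers make:

# Multiplying/dividing by powers of two is equivalent to shifting bits,
# the cheap instruction a compiler substitutes when it can (shown here
# for a non-negative x).
x = 37
print(x * 8 == x << 3)    # True: multiplying by 2**3 is a left shift by 3
print(x // 4 == x >> 2)   # True: floor-dividing by 2**2 is a right shift by 2
print(x % 16 == x & 15)   # True: modulo a power of two is a bitwise AND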
There are some benefits of using powers of two numbers in your programs. Bitmasks are one application of this, mainly because bitwise operators (&, |, <<, >>, etc) are incredibly fast.
In C++ and Java, this is done a fair bit- especially with GUI applications. You could have a field of 32 different menu options (such as resizable, removable, editable, etc), and apply each one without having to go through convoluted addition of values.
In terms of raw speedup or any performance improvement, that really depends on the application itself. GUI packages can be huge, so getting any speedup out of those when applying menu/interface options is a big win.
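A sketch of that bitmask idea in Python (the flag names here are made up for illustration):

# Hypothetical window-style flags, each a distinct power of two so that
# they occupy different bits and can be combined with bitwise OR.
RESIZABLE = 1 << 0   # 0b001
REMOVABLE = 1 << 1   # 0b010
EDITABLE  = 1 << 2   # 0b100

style = RESIZABLE | EDITABLE      # combine options in a single int
print(bool(style & RESIZABLE))    # True: test a flag with bitwise AND
print(bool(style & REMOVABLE))    # False
style &= ~EDITABLE                # clear a flag
print(bool(style & EDITABLE))     # False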
From the title of your question, it sounds like you mean, "Does it make your program more efficient if you write constants in binary?" If that's what you meant, the answer is emphatically, No. The compiler translates all your constants to binary at compile time, so by the time the program runs, it makes no difference. I don't know if the compiler can interpret binary constants faster than decimal, but the difference would surely be trivial.
But the body of your question seems to indicate that you mean "use constants that are round numbers in binary" rather than necessarily expressing them in binary digits.
For most purposes, the answer would be no. If, say, the computer has to add two numbers together, adding a number that happens to be a round number in binary is not going to be any faster than adding a not-round number.
It might be slightly faster for multiplication. Some compilers are smart enough to turn multiplication by powers of 2 into a bit shift operation rather than a hardware multiply, and bit shifts are usually faster than multiplies.
Back in my assembly-language days I often made elements in arrays have sizes that were powers of 2 so I could index into the array with a bit shift rather than a multiply. But in a high-level language that would be hard to do, as you'd have to do some research to find out just how much space your primitives take in memory, whether the compiler adds padding bytes between them, etc. And if you did add some bytes to an array element to pad it out to a power of 2, the entire array is now bigger, so you might generate an extra page fault, i.e. the operating system runs out of memory and has to write a chunk of your data to the hard drive and then read it back when it needs it. One extra hard drive write takes more time than 1000 multiplications.
In practice, (a) the difference is so trivial that it would almost never be worth worrying about; and (b) you don't normally know everything happening at the low level, so it would often be hard to predict whether a change with its attendant ramifications would help or hurt.
In short: Don't bother. Use the constant values that are natural to the problem.
The reason they're used is probably different - e.g. bitmasks.
If you see them in array sizes, it doesn't really increase performance, but memory is often allocated in power-of-2 chunks. E.g. if you wrote char x[100], an allocator might well give you 128 bytes.
No, your code will run the same way no matter what number you use.
If by binary numbers you mean numbers that are powers of 2, like 2, 4, 8, 16, 1024..., they are common mostly as a space optimization. For example, an 8-bit pointer can address 256 locations (a power of 2), so if you use fewer than 256 of them you are wasting part of the pointer's range; that's why you'd normally allocate a 256-entry buffer. The same reasoning applies to all the other powers of 2.
In most cases the answer is almost always no, there is no noticeable performance difference.
However, there are certain (very few) cases when NOT using power-of-two sizes/lengths for arrays/structures will give noticeable performance benefits. These are cases where you're looping over a structure that maps into the cache in such a way that you get cache collisions every time you loop through your array/structure. This case is very rare and shouldn't be preoptimized unless your code is performing much more slowly than theoretical limits say it should. Also, it's very hardware dependent and will change from system to system.

Extracting initial seed value of a PRNG?

I recently read that you can predict the outcomes of a PRNG if you:
Know what algorithm is being used.
Have consecutive data points.
Is it possible to figure out the seed used for a PRNG from only data points?
I managed to find a paper by Kelsey et al. which details the different types of attack and also summarises some real-world examples. It seems most attacks rely on techniques similar to those used against cryptosystems, and in most cases actually take advantage of the fact that the PRNG is used in a cryptosystem.
With "enough" data points that are the absolute first data points generated by the PRNG with no gaps, sure. Most PRNG functions are invertible, so just work backwards and you should get the seed.
For example, the typical return seed = (seed*A + B) % N can be stepped backwards as seed = ((seed - B) * A_inv) % N, where A_inv is the modular multiplicative inverse of A modulo N (plain division doesn't work in modular arithmetic).
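Here's a runnable sketch of that inversion in Python, with small illustrative constants (A must be coprime to N so the modular inverse exists):

# Step a toy LCG forward and backward. A and N must be coprime so that
# A has a modular multiplicative inverse.
A, B, N = 17, 43, 2 ** 16
A_INV = pow(A, -1, N)   # modular inverse of A mod N (Python 3.8+)

def forward(seed):
    return (seed * A + B) % N

def backward(seed):
    return ((seed - B) * A_INV) % N

s = 12345
out = forward(forward(s))
print(backward(backward(out)) == s)   # True: the seed is recovered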
It's always theoretically possible, if you're "allowed" to brute force all possible values for the seed, and if you have enough data points that there's only one seed that could have produced that output. If the PRNG was seeded with the time, and you know roughly when that happened, then this might be very fast since there aren't many plausible values to try. If the PRNG was seeded with data from a truly random source having 64 bits of entropy, then this approach is computationally infeasible.
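As a sketch of that brute-force approach, here's Python's random module being attacked under the assumption that it was seeded with the current time in whole seconds and that we know the rough time window:

import random
import time

# Victim: seeds the PRNG with the current time in whole seconds.
true_seed = int(time.time())
rng = random.Random(true_seed)
observed = [rng.randint(0, 9) for _ in range(20)]   # outputs we've seen

# Attacker: knows the seeding scheme and the rough window, so only a
# few thousand candidate seeds need to be tried.
now = int(time.time())
for candidate in range(now - 3600, now + 1):
    guess = random.Random(candidate)
    if [guess.randint(0, 9) for _ in range(20)] == observed:
        print("recovered the seed:", candidate == true_seed)   # True
        break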
Whether there are other techniques depends on the algorithm. For example doing this for Blum Blum Shub is equivalent to integer factorization, which is generally believed to be a hard computational problem. Other, faster PRNGs might be less "secure" in this sense. Any PRNG used for crypto purposes, for example in a stream cipher, pretty much needs there to be no known feasible way of doing it.

random number generation how?

I was wondering how random number functions work. I mean, is the server time used, or what other methods are used to generate random numbers? Are they really random numbers, or do they lean towards a certain pattern? Let's say in Python:
import random
number = random.randint(1,10)
Random number generators vary (of course) across platforms, but in general they produce only "pseudo-random" numbers. That is, the "random" numbers are generated by an algorithm chosen to provide a distribution of numbers that's reasonably even and statistically similar to what one would expect of true randomness. These random number generators typically take a "seed" value, which is used to initiate the "sequence"; the same "seed" value will usually return the same "random" sequence (indicating that it's clearly not actually "random").
One can obtain reasonable pseudorandom results, however, by seeding the "random" number function with a rapidly changing number, such as the time (in ticks) from the machine, or other varying seed values. That doesn't change the fact, however, that these "random" numbers aren't really random; however, for most purposes, they can be considered "good enough".
One note as an addendum: there are actual hardware-based random number generators that can be purchased and used, and that really are random. These typically depend on the measurement of a varying quantity, such as the number of photons received by a detector, debiased so that they return truly random values. They are relatively rare, however.
Yes, time is typically used to seed a random number generator when it's not important that the numbers be unpredictable. For example, if you are displaying random images in a slideshow then the time is a good value to use so that the sequence of images isn't the same the next time you run the slideshow.
However, since the time is known by everyone to a high degree of accuracy, this would be a terrible seed for crypto purposes. Netscape used to use this method and it was shown to be vulnerable to attack. Nowadays secure random numbers are generated using entropy gathered from devices like mouse movement and microphone input. "Headless" network devices use characteristics of their network traffic as a more-or-less unpredictable entropy source. For really special applications, hardware randomness sources are sometimes used, like cameras and Geiger counters. On Unix systems you can get secure random numbers from /dev/random, and it'll block if there's not "enough entropy" (estimated through a counter) to guarantee secure randomness.
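In Python the split looks like this (a minimal sketch): the random module is a seeded PRNG, while the secrets module draws from the OS entropy pool (/dev/urandom on Unix):

import random
import secrets

# random: fast and reproducible, but predictable if the seed leaks.
# Fine for slideshows and simulations, unsuitable for crypto.
random.seed(1234)
print(random.randint(0, 255))

# secrets: backed by the OS entropy pool; meant for tokens, keys, and
# other security-sensitive values.
print(secrets.randbelow(256))
print(secrets.token_hex(16))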
It's a pseudorandom number generator; the exact working depends on the implementation, but I assume it's some kind of C implementation of the Mersenne Twister: http://docs.python.org/library/random.html (third paragraph)
Oh, and the exact function randint is built upon the base random function. random() returns a real number from the range [0, 1), and randint(a, b) returns an integer from the range [a, b]; it can be implemented as lambda a, b: int(a + random.random()*(b + 1 - a))
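A quick sanity check of that formula (a sketch; CPython's real randint actually goes through randrange rather than this exact expression):

import random

# random.random() is in [0, 1), so a + r*(b + 1 - a) is in [a, b + 1),
# and truncating with int() yields an integer in [a, b].
randint_sketch = lambda a, b: int(a + random.random() * (b + 1 - a))
samples = [randint_sketch(1, 10) for _ in range(10000)]
print(min(samples), max(samples))   # never leaves the range [1, 10]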
Depending on your background you might like Numerical Recipes. I am a physicist and I really like this book (even though mathematicians occasionally write bad things about it, it gives nice overviews of a lot of topics). See chapter 7 for a nice introduction to random numbers.

Counting FLOPS/GFLOPS in program - CUDA

I've already finished my application, which multiplies a CRS matrix by a vector (SpMV), and the only thing left to do is count the FLOPS it performs. In my opinion it's really hard to estimate the number of floating point operations in the case of sparse matrix-vector multiplication, because the number of multiplications per row is really "jumpy", i.e. it fluctuates.
I only tried to measure time using "cudaprof" (available in the ./CUDA/bin directory) - that works fine.
Any suggestions and code examples appreciated!
That's not just your opinion; it's simple fact that the number of operations in the case of a sparse matrix is data-dependent, and so you can't get a reasonable answer without knowing something about the data. That makes it impossible to have a one-number-fits-all-data estimate.
This is probably one of the sorts of situations where you could think hard about it for many hours (and do lots of research) to make a maybe-accurate estimate, or you could spend a few minutes writing a variant of your existing implementation that increments a counter each time it does an operation. Sure, that's going to take quite a while to run (especially if you don't do it in a CUDA-enabled form), but probably a lot less time than it would take to do the thinking, and when the answer comes out, you don't have to do a lot of work to convince yourself that it's right.
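For SpMV specifically, each stored nonzero costs one multiply and one add, so the counter approach boils down to roughly 2 x nnz. A CPU-side sketch in Python, assuming the usual CRS arrays (values, col_idx, row_ptr):

# Count floating point operations in a CRS (CSR) matrix-vector product.
def spmv_count_flops(values, col_idx, row_ptr, x):
    y = [0.0] * (len(row_ptr) - 1)
    flops = 0
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]   # one multiply + one add
            flops += 2
    return y, flops

# 2x2 example: [[1, 2], [0, 3]] times [1, 1] -> 3 nonzeros, 6 FLOPs.
y, flops = spmv_count_flops([1.0, 2.0, 3.0], [0, 1, 1], [0, 2, 3], [1.0, 1.0])
print(y, flops)   # [3.0, 3.0] 6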

What does 'seeding' mean?

Very simple question: what does the term 'seeding' mean in general? To give the context: you must seed random functions.
It means: pick a place to start.
Think of a pseudo random number generator as just a really long list of numbers. This list is circular, it eventually repeats.
To use it, you need to pick a starting place. This is called a "seed".
Most random functions that are common on personal computers aren't random, but deterministic to a degree. The 'seed' for these pseudo-random functions is the starting point upon which future values are based. This is useful for debugging purposes: if you keep the seed the same from execution to execution, you'll get the same numbers.
To get numbers that are more random a different seed is often used from execution to execution. This is often based on the time of the machine.
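In Python, for example (a minimal sketch):

import random

random.seed(42)                                   # fixed seed: reproducible
first = [random.randint(1, 100) for _ in range(5)]
random.seed(42)                                   # same seed again
second = [random.randint(1, 100) for _ in range(5)]
print(first == second)   # True - handy for debugging

random.seed()   # no argument: seeded from OS entropy or the time, so it varies per run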
This method is completely different from generating a 'true' random number based on some sort of physical property in the world around us. Lava lamps and sunspots are two of the more 'fun' properties that can be observed to generate 'more random' numbers. Anyone can hit http://www.random.org/ to get a real random number if it's truly necessary, as for a poker website. If you don't have a good generator, folks can attempt to figure out how the generator works and predict future numbers.
Imagine a card game, and contrast developing the game program with running the game to actually play it.
Pseudo-random number generators use a seed or seeds to determine the starting point of the sequence. Some of them always produce the same sequence; others can produce different sequences depending on the seed. Some use a cascade: a simple RNG is given a simple seed and run for a while to produce a more complex seed for the main RNG.
It is quite useful to be able to deliberately repeat the sequence when developing the program or when one wishes to reproduce previous results.
However, imagine a card game. It's obviously not a good idea to always deal the same sequence of cards.
"Seeding" random function prevents it from giving out the same sequence of random numbers.
Think of it as a super-random start of your random generator.