Simulating a roll with a biased die - language-agnostic

I did a search but didn't really get any proper hits. Maybe I used the wrong terms?
What I want to ask about is an algorithm for simulating a biased roll rather than a standard, supposedly random roll.
It wouldn't be a problem if you can't give me exact answers (maybe the explanation is lengthy?), but I would appreciate pointers to material I can read about it.
What I have in mind is, for example, to shift the bias towards the 5-6 area so that a roll has a higher chance of coming up 5 or 6; that's the sort of problem I'm trying to solve.
[Update]
Upon further thought and by inspecting some of the answers, I've realized that what I want to achieve is really the Roulette Wheel Selection operator that's used in genetic algorithms since having a larger sector means increasing the odds the ball will land there. Am I correct with this line of thought?

In general, if your probabilities are {p1,p2, ...,p6}, construct the following helper list:
{a1, a2, ... a5} = { p1, p1+p2, p1+p2+p3, p1+p2+p3+p4, p1+p2+p3+p4+p5}
Now get a random number X in [0,1]
If
X <= a1 choose 1 as outcome
a1 < X <= a2 choose 2 as outcome
a2 < X <= a3 choose 3 as outcome
a3 < X <= a4 choose 4 as outcome
a4 < X <= a5 choose 5 as outcome
a5 < X choose 6 as outcome
Or, in more efficient pseudocode:
if X > a5 then N=6
elseif X > a4 then N=5
elseif X > a3 then N=4
elseif X > a2 then N=3
elseif X > a1 then N=2
else N=1
Edit
This is equivalent to the roulette wheel selection you mention in your question update: each interval (a(i-1), a(i)] plays the role of a wheel sector whose size is proportional to the probability of outcome i.
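As a minimal Python sketch of this method (the probabilities here are illustrative, biased toward 5 and 6 as in the question):

import random

def biased_roll(probs):
    # probs = [p1, ..., p6]; must sum to 1
    x = random.random()                       # X in [0, 1)
    cumulative = 0.0
    for face, p in enumerate(probs, start=1):
        cumulative += p                       # builds a1, a2, ... on the fly
        if x <= cumulative:
            return face
    return len(probs)                         # guard against floating-point rounding

print(biased_roll([0.1, 0.1, 0.1, 0.1, 0.3, 0.3]))   # favors 5 and 6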

Let's say the die is biased towards a 3.
Instead of picking a random entry from an array with the 6 entries 1..6, pick a random entry from the 8-entry array 1..6, 3, 3.
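In Python, for example (the pool is the 8-entry array from above):

import random

pool = [1, 2, 3, 4, 5, 6, 3, 3]   # 3 appears three times: probability 3/8
print(random.choice(pool))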

Make a two-dimensional array of possible values and their weights. Sum all the weights. Randomly choose a value in the range from 0 to the sum of the weights.
Now iterate through the array, keeping an accumulator of the weights seen so far. Once this accumulator exceeds your random number, pick the die value at that position.
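A small Python sketch of this accumulator idea; the (value, weight) pairs are made up for illustration:

import random

def weighted_roll(faces):
    # faces is a list of (value, weight) pairs
    total = sum(weight for _, weight in faces)
    r = random.uniform(0, total)      # random point in [0, total]
    seen = 0
    for value, weight in faces:
        seen += weight
        if r <= seen:                 # accumulator has passed the random point
            return value

# 5 and 6 are three times as likely as the other faces
print(weighted_roll([(1, 1), (2, 1), (3, 1), (4, 1), (5, 3), (6, 3)]))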
Hope this helps

Hmm. Say you want to have a 1/2 chance of getting a six, and a 1/10 chance of getting any other face. To simulate this, you could generate a random integer n in [1, 2, ... , 10] , and the outcome would map to six if n is in [6, 7, 8, 9, 10] and map to n otherwise.
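A quick Python sketch of that mapping:

import random

n = random.randint(1, 10)        # uniform over 1..10
outcome = 6 if n >= 6 else n     # 6..10 all map to six: probability 1/2
print(outcome)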

One way that's usually fairly easy is to start with a random number in an expanded range, and break that range up into unequal pieces.
For example, with a perfectly even (six-sided) die, each number should come up 1/6th of the time. Let's say you decide on round percentages that favor 2: each other face comes up 16 percent of the time, and 2 comes up 20 percent of the time, so the six blocks sum to exactly 100.
You could do that by generating numbers from 1 to 100. If the number is from 1 to 16, it comes out as a 1. If it's from 17 to 36, it comes out as a 2. If it's from 37 to 52, it comes out as a 3 (and the rest are blocks of 16 apiece).
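The block lookup can also be done with bisect; the thresholds below encode the cumulative 16/20/16/16/16/16 split described above:

import bisect
import random

thresholds = [16, 36, 52, 68, 84, 100]       # cumulative upper bound per face
roll = random.randint(1, 100)
face = bisect.bisect_left(thresholds, roll) + 1
print(face)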

Related

Theory behind multiplying two numbers without operands

I have been reading Elements of Programming Interviews and am struggling to understand the passage below:
"The algorithm taught in grade-school for decimal multiplication does
not use repeated addition- it uses shift and add to achieve a much
better time complexity. We can do the same with binary numbers- to
multiply x and y we initialize the result to 0 and iterate through the
bits of x, adding (2^k)y to the result if the kth bit of x is 1.
The value (2^k)y can be computed by left-shifting y by k. Since we
cannot use add directly, we must implement it. We can apply the
grade-school algorithm for addition to the binary case, i.e, compute
the sum bit-by-bit and "rippling" the carry along.
As an example, we show how to multiply 13 = (1101) and 9 = (1001) using the algorithm described above. In the first iteration, since the LSB of 13 is 1, we set the result to (1001). The second bit of (1101) is 0, so we move on to the third bit. This bit is 1, so we shift (1001) to the left by 2 to obtain (100100), which we add to (1001) to get (101101). The fourth and final bit of (1101) is 1, so we shift (1001) to the left by 3 to obtain (1001000), which we add to (101101) to get (1110101) = 117."
My Questions are:
What is the overall idea behind this, how is it a "bit-by-bit" addition
where does (2^k)y come from
what does it mean by "left-shifting y by k"
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
The algorithm relies on the way numbers are coded in binary.
Let A be an unsigned number. A is coded by the bits a_(n-1) a_(n-2) ... a_0 in such a way that A = Σ_{i=0..n-1} a_i × 2^i.
Now, assume you have two numbers A and B coded in binary and you want to compute A×B:
B×A = B × Σ_{i=0..n-1} a_i × 2^i = Σ_{i=0..n-1} B × a_i × 2^i
a_i is equal to 0 or 1. If a_i = 0, the sum is not modified. If a_i = 1, we need to add B × 2^i.
So, we can simply deduce the multiplication algorithm:
result = 0
for i in range(n):                   # process the n bits of A
    if (A >> i) & 1:                 # a_i, the ith bit of A, is 1
        result = result + (B << i)   # add B * 2^i
What is the overall idea behind this, how is it a "bit-by-bit" addition
It is just an application of the previous method, where you successively process every bit of the multiplier.
where does (2^k)y come from
As mentioned above, it comes from the way binary numbers are coded. If the ith bit is set, then there is a 2^i term in the decomposition of the number.
what does it mean by "left-shifting y by k"
Left shift means "pushing" the bits leftwards and filling the "holes" with zeroes. Hence if the number is 1101 and it is left-shifted by three, it becomes 1101000.
This is the way to multiply the number by 2^i (just as left-shifting a decimal number by 2 and putting zeroes in the freed places is the way to multiply it by 100 = 10^2).
In the example, why do we set result to (1001) just because the LSB of 13 is 1?
Because there is a 1 at the rightmost position, which corresponds to 2^0. So we left-shift (1001) by 0 and add it to the result, which is initialized to 0.
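Putting the passage together, here is a short, self-contained Python sketch (function names are my own) that multiplies via shift-and-add while implementing addition itself with the bitwise ripple-carry scheme the book describes; it assumes non-negative integers:

def bitwise_add(a, b):
    # grade-school binary addition: XOR produces the sum bits, AND produces
    # the carries, which are shifted left and "rippled" back in until none remain
    while b:
        carry = a & b
        a = a ^ b
        b = carry << 1
    return a

def multiply(x, y):
    result = 0
    while x:
        if x & 1:                             # current bit of x is set
            result = bitwise_add(result, y)   # add y * 2^k
        x >>= 1                               # move to the next bit of x
        y <<= 1                               # y now carries the 2^k factor
    return result

print(multiply(13, 9))   # 117, matching the book's worked example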

Why is alpha set to 15 in NLTK - VADER?

I am trying to understand what VADER does for the analysis of sentences.
Why is the hyper-parameter alpha set to 15 here? I understand that the score is unbounded if left unnormalized, but why 15?
import math

def normalize(score, alpha=15):
    """
    Normalize the score to be between -1 and 1 using an alpha that
    approximates the max expected value
    """
    norm_score = score / math.sqrt((score * score) + alpha)
    return norm_score
Vader's normalization equation is which is the equation for
I have read the paper of the research for Vader from here:http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
Unfortunately, I could not find any reason why such a formula and 15 as the value for alpha was chosen but the experiments and the graph show that as x grows which is the sum of sentiments' scores grow the value becomes closer to -1 or 1 which indicates that as number of words grow the score tends more towards -1 or 1. Which means that Vader works better with short documents or tweets compared to long documents.
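As a quick check of how alpha = 15 behaves (the sample scores below are arbitrary):

import math

def normalize(score, alpha=15):
    return score / math.sqrt((score * score) + alpha)

# a score of about 4 (one strongly positive word) already maps to ~0.72,
# and larger sums saturate toward 1
for s in (1, 4, 10, 50):
    print(s, round(normalize(s), 3))   # 0.25, 0.718, 0.933, 0.997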

Three-way xor-like function

I'm trying to solve the following puzzle:
Given a stream of numbers (only 1 iteration over them is allowed) in which all numbers appear 3 times, except one number that appears only 2 times, find this number using O(1) memory.
I started with the idea that, if all numbers appeared 2 times and 1 number only once, I could use the xor operation between all numbers, and the result would be the incognito number.
So I want to extend this idea to solve the puzzle. All I need is an xor-like function (or operator) which would yield 0 on the third application:
SEED xor3 X xor3 X xor3 X = SEED
X xor3 Y xor3 SEED xor3 X xor3 Y xor3 Y xor3 X = SEED
Any ideas for such a function?
Regard XOR as summation on each bit of a number expressed in binary (i.e. a radix of 2), modulo 2.
Now consider a numerical system consisting of the trits 0, 1, and 2. That is, it has a radix of 3.
The operator T now becomes an operation on any number decomposed into this radix. As in XOR, you sum the digits, but the difference is that operator T works modulo 3.
You can easily show that a T a T a is zero for any a. You can also show that T is both commutative and associative. That is necessary since, in general, your sequence will have the numbers jumbled up.
Now apply this to your list of numbers. At the end of the operation, the output will be b where b = o T o and o is the number that occurs exactly twice.
Your solution for the simpler case (all numbers appear twice, one number appears once) works since xor operates on each bit x as
x xor x = 0 and 0 xor x = x
xor is basically a bit-wise summation modulo 2. You need the base-3 equivalent: transform each number into a base-3 representation, and then use summation modulo 3 for each digit:
xor3 | 0 1 2
-----+------
  0  | 0 1 2
  1  | 1 2 0
  2  | 2 0 1
Call this operation xor3. Now you have, for each digit x:
x xor3 x xor3 x = 0 and 0 xor3 x = x
If you apply that to all your numbers, then all values that appear 3 times vanish. The result is x xor3 x for the number x that appears twice, so you finally need a digit-wise division by 2 modulo 3 (equivalently, multiply each digit by 2 modulo 3, since 2 × 2 = 4 ≡ 1 mod 3).
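For reference, a minimal Python sketch of this approach; the fixed size of 40 trits is an assumption (it covers 64-bit inputs):

def find_number_appearing_twice(nums):
    # sum the base-3 digits of all numbers, modulo 3; triples cancel out
    trits = [0] * 40
    for num in nums:
        for i in range(len(trits)):
            trits[i] = (trits[i] + num % 3) % 3
            num //= 3
    # what remains is (2 * digit) mod 3 for each digit of the answer;
    # multiplying by 2 again undoes it, since 2 * 2 = 4 = 1 (mod 3)
    x = 0
    for t in reversed(trits):
        x = x * 3 + (t * 2) % 3
    return x

print(find_number_appearing_twice([5, 9, 5, 7, 9, 5, 9, 7]))   # 7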
I believe there are more efficient ways to implement that. The advantage of the xor function in the first case relies on the fact that xor is a natural base-2 operation. Is there any practical application for that?
This approach is a bit fragile: if the precondition (all numbers appear 3 times except one that appears twice) is violated, the algorithm will not help you.
Take a map with int keys and int values. Then walk through your numbers and, for each number x, increase the corresponding value, starting from 0 if x is a new key.
Then you can analyze it easily: walk through all keys and check the cardinality. It should be three for all keys except one, for which it should be two. This is more robust, and my gut feeling says it is also faster.

Please help me. MATLAB 'while' loop. A set of data is given

Now I calculate the value of F from an equation. From the F that I determined, I need to find the diameter from a set of data.
The method is like this: if my F value is smaller than the value of F from the data, then I choose the corresponding diameter.
For example, the value of F that I calculated is 11. The values of F from the data are 8, 10, 12, 14, 16, and the corresponding diameters are 1, 2, 3, 4, 5.
11 is bigger than 8, the first value of the data, hence we move to the next F in the data. Again, 11 is bigger than 10, so we move to the next F.
But 11 is less than 12, so the iteration stops. We need not look further. We take the diameter corresponding to 12, which is 3.
You get the idea.
As for the set of data, here's the code. The while Fa==0 loop is the condition that I tried to apply in order to do this.
while Fa==0
load data.dat;
diameter=data(:,1);
F=data(:,2);
I'm stuck at that.
Please help me
Here is how I understand your problem: you have a dataset from which you get a list of values F. Now you also calculate a single value Fc, and you want to find the element Fe in the list that satisfies two conditions:
closest to Fc
Fc < Fe
One way to achieve this is the following:
F = [1 2 3 4.5 5 6 7 8];
Fc = 4;
F = sort(F);               % sort returns a sorted copy; it must be assigned back
for i = 1:length(F)
    if Fc < F(i)
        Fe = F(i);
        break
    end
end
This gives 4.5. From this value of Fe you can find the desired diameter.

Evenly distributed random numbers relatively prime to 2

A specific example
I need to generate a random number between 0 and 2, inclusive. (or choose randomly between -1, 0, and 1).
The naive approach would be to do something like rand() mod 3, where rand() returns an integer. This approach will not generate uniformly distributed numbers unless the number of distinct values rand() can return is a multiple of 3 (assuming its lower bound is 0).
For instance, assuming rand() returned 2 bits (from 0 to 3, inclusive), the modulus would map:
0 -> 0
1 -> 1
2 -> 2
3 -> 0
This skew toward 0 would obviously be much smaller if more bits were returned, but regardless, the skew would remain.
The generic question
Is there a way of generating an evenly distributed random number between 0 and n-1, inclusive, where n is relatively prime to 2?
A common approach is to discard random values above the last full cycle, and just ask for a new random number.
It might help to choose your rand() upper bound to be k*n, where k is an integer. This way the outcome will be evenly distributed, provided that rand() is a good random generator.
If it's not possible to change the upper bound, you can pick k so that k*n is as close to rand()'s upper bound as possible, discard results above this number, and try again.
See my answer to a similar question.
Basically, use your RNG, discard everything at or above n, and try again. For optimization, you can use mod and only discard values at or above n * floor(MAX / n).
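A minimal Python sketch of this discard-and-retry idea; here randint stands in for the raw RNG, and rand_max is assumed to be its inclusive upper bound:

import random

def unbiased_mod(n, rand_max):
    limit = (rand_max + 1) // n * n        # end of the last full cycle of n values
    while True:
        r = random.randint(0, rand_max)    # raw draw, uniform over 0..rand_max
        if r < limit:                      # keep only draws inside full cycles
            return r % n

print(unbiased_mod(3, 3))   # uniform over {0, 1, 2} even though 4 values don't divide into 3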
Generic Answer: You need to use more than just 2 bits of the number.
My rule of thumb is to generate floating-point values, x, 0.0 <= x < 1.0, multiply by 3 and truncate. That should get you values in the range 0, 1 and 2 that depend on a larger number of bits.
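In Python, that rule of thumb is essentially a one-liner:

import random

# random.random() is uniform over [0.0, 1.0), so value is 0, 1, or 2
value = int(random.random() * 3)
print(value)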