What's the correct way to expand a [0,1] interval to [a,b]? - language-agnostic

Many random-number generators return floating-point numbers between 0 and 1.
What's the best and correct way to get integers between a and b?

Divide the interval [0,1] into B-A+1 bins
Example A=2, B=5
[----+----+----+----]
0 1/4 1/2 3/4 1
Maps to 2 3 4 5
The problem with the formula
Int (Rnd() * (B-A+1)) + A
is that the Rnd() generation interval is closed on both sides, so both 0 and 1 are possible outputs, and the formula gives B+1 (6 in the example) when Rnd() is exactly 1.
In a true continuous uniform distribution (not a pseudo-random one), drawing exactly 1 has probability zero, so I think it is safe enough to program something like:
r=Rnd()
if r equal 1
MyInt = B
else
MyInt = Int(r * (B-A+1)) + A
endif
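The same idea as a minimal Python sketch, assuming r comes from a generator whose range may include 1.0. The names scale_to_int, a and b are illustrative only, and the clamp with min() plays the role of the "r equal 1" branch:

```python
import math

def scale_to_int(r, a, b):
    """Map r in [0, 1] to an integer in [a, b], both ends inclusive."""
    bins = b - a + 1
    # floor(r * bins) lies in 0..bins; it can reach bins when r is 1.0
    # (or through rounding), so clamp it into the last bin.
    return a + min(int(math.floor(r * bins)), bins - 1)

print(scale_to_int(0.0, 2, 5), scale_to_int(0.25, 2, 5), scale_to_int(1.0, 2, 5))
```

Clamping avoids a floating-point equality test and also covers any rounding right below 1.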
Edit
Just a quick test in Mathematica:
Define our function:
f[a_, b_] := If[(r = RandomReal[]) == 1, b, IntegerPart[r (b - a + 1)] + a]
Build a table with 3 10^5 numbers in [1,100]:
table = SortBy[Tally[Table[f[1, 100], {300000}]], First]
Check minimum and maximum:
In[137]:= {Max[First /@ table], Min[First /@ table]}
Out[137]= {100, 1}
Let's see the distribution:
BarChart[Last /@ SortBy[Tally[Table[f[1, 100], {300000}]], First],
 ChartStyle -> "DarkRainbow"]

X = (Rand() * (B - A)) + A

Another way to look at it, where r is your random number in the range 0 to 1:
(1-r)a + rb
As for your additional requirement that the result be an integer, maybe (apart from using built-in casting) the modulus operator can help you out. Check out this question and its answer:
Expand a random range from 1–5 to 1–7
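A small Python sketch of the interpolation form (1-r)a + rb. The function name lerp is an assumption made for illustration, not something from the answer:

```python
def lerp(r, a, b):
    """Linearly map r in [0, 1] onto the real interval [a, b]."""
    # At r == 0 this returns a and at r == 1 it returns b,
    # because one of the two terms vanishes at each endpoint.
    return (1 - r) * a + r * b

print(lerp(0.0, 2, 5), lerp(0.5, 2, 5), lerp(1.0, 2, 5))
```

This yields a real number in [a, b]; turning it into an integer still needs a separate binning step.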

Well, why not just look at how Python does it itself? Read random.py in your installation's lib directory.
After gutting it to only support the behavior of random.randint() (which is what you want) and removing all error checks for non-integer or out-of-bounds arguments, you get:
import random
def randint(start, stop):
    width = stop+1 - start
    return start + int(random.random()*width)
Testing:
>>> l = []
>>> for i in range(2000000):
...     l.append(randint(3,6))
...
>>> l.count(3)
499593
>>> l.count(4)
499359
>>> l.count(5)
501432
>>> l.count(6)
499616
>>>

Assuming r_a_b is the desired random number between a and b and r_0_1 is a random number between 0 and 1 the following should work just fine:
r_a_b = (r_0_1 * (b-a)) + a


Elixir: How to get bit_size of an Integer variable?

I need to get the size of bits used in one Integer variable.
like this:
bit_number = 1
bit_number = bit_number <<< 2
bit_size(bit_number) # must return 3 here
The bit_size/1 function is for 'strings', not for integers, but in the exercise we need to get the size in bits of the integer.
I'm doing a compression exercise from a book (Classic Computer Science Problems in Python, by David Kopec) and I'm trying to do it in Elixir for study.
This works:
(iex) import Bitwise
(iex) Integer.digits(1 <<< 1, 2) |> length
2
but I'm sure there are better solutions.
(as @Hauleth mentions, the answer here should be 2, not 3)
You can count how many times you can divide it by two:
defmodule Example do
  def bits_required(0), do: 1
  def bits_required(int), do: bits_required(int, 1)
  defp bits_required(1, acc), do: acc
  defp bits_required(int, acc), do: bits_required(div(int, 2), acc + 1)
end
Output:
iex> Example.bits_required(4)
3
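For comparison, here is a rough Python counterpart (Python being the book's language), assuming non-negative integers only. It leans on the built-in int.bit_length() and, like the Elixir version, reports 1 bit for 0. The function name mirrors the Elixir one purely for illustration:

```python
def bits_required(n):
    """Number of bits needed to write the non-negative integer n in binary."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    # int.bit_length() returns 0 for 0, so treat 0 as needing one bit.
    return max(1, n.bit_length())

print(bits_required(1 << 1), bits_required(1 << 2))
```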

Summing binary numbers representing fractions in Sagemath

I'm just starting to learn how to code in Sagemath, I know it's similar to python but I don't have much experience with that either.
I'm trying to add two binary numbers representing fractions. That is, something like
a = '110'
b = '011'
bin(int(a,2) + int(b,2))
But using values representing fractions, such as '1.1'.
Thanks in advance!
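If you want to do this in vanilla Python, parsing the binary fractions by hand isn't too bad (the first part being from this answer):
def binstr_to_float(s):
    # Split into integer and fractional bit strings, e.g. '11.1' -> ['11', '1']
    t = s.split('.')
    return int(t[0], 2) + int(t[1], 2) / 2.**len(t[1])
def float_to_binstr(f):
    # Double f until it is an integer; i counts the fractional bits
    i = 0
    while int(f) != f:
        f *= 2
        i += 1
    # Pad with leading zeros so values below 1 keep all their fractional bits
    as_str = bin(int(f))[2:].zfill(i + 1)
    if i == 0:
        return as_str
    return as_str[:-i] + '.' + as_str[-i:]
float_to_binstr(binstr_to_float('11.1') + binstr_to_float('0.111')) # is '100.011'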
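If float rounding is a concern, one possible variation keeps everything exact with the standard-library fractions module. This is a sketch under stated assumptions: inputs are non-negative, and the helper names binstr_to_fraction and fraction_to_binstr are made up for this example:

```python
from fractions import Fraction

def binstr_to_fraction(s):
    """Parse a non-negative binary string such as '11.1' into an exact Fraction."""
    whole, _, frac = s.partition('.')
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    if frac:
        value += Fraction(int(frac, 2), 2 ** len(frac))
    return value

def fraction_to_binstr(q):
    """Format a non-negative Fraction whose denominator is a power of two."""
    if q < 0:
        raise ValueError("negative values are not handled in this sketch")
    den = q.denominator
    if den & (den - 1):
        raise ValueError("value has no finite binary expansion")
    places = den.bit_length() - 1  # number of fractional bits
    digits = format(q.numerator, 'b').zfill(places + 1)
    if places == 0:
        return digits
    return digits[:-places] + '.' + digits[-places:]

total = binstr_to_fraction('11.1') + binstr_to_fraction('0.111')
print(fraction_to_binstr(total))
```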
In Python you can use the binary-fractions package. With this package you can convert binary-fraction strings into floats and vice versa. Then you can perform operations on them.
Example:
>>> from binary_fractions import Binary
>>> sum = Binary("1.1") + Binary("10.01")
>>> str(sum)
'0b11.11'
>>> float(sum)
3.75
>>>
It has many more helper functions to manipulate binary strings such as: shift, add, fill, to_exponential, invert...
PS: Shameless plug, I'm the author of this package.

Why am I receiving this error message "UnboundLocalError: local variable 'sigma_opt' referenced before assignment"

I have a dataset that follows a lognormal distribution. If I plot the y-values against the x-values on a semilog-x axis, the distribution will appear Gaussian. Similarly, if I sort the logarithm of every value in my dataset and plot them against a domain of log(x), the distribution will appear Gaussian (but nicer due to wider linear spacing of log(x) values on the domain). My code attempts to minimize chi square of the dataset in the three representations above by optimizing the parameters mu and sigma (since the average of the lognormal distribution does not equal the average of the normal distribution). My issue is not the chi square minimization (works for 2/3 of these representations), but rather the syntax in one specific part of my code.
To simplify the code, I use a function argument pickdist to denote which distribution is being dealt with. In the code below, 2 denotes the y vs semilog(x) representation, 3 denotes the y vs log(x) representation, optpar2 and optpar3 are parameters calculated previously from the code (not shown) and represent the optimized values of mu and sigma for the distributions.
def distribGS(pickdist, x):
    if pickdist == 2:
        mu_opt, sigma_opt = optpar2
    elif pickdist == 3:
        mu_opt, sigma_opt = optpar3
    cnorm = 1/ ( sigma_opt * (2 * pi)**(1/2) )
    return [(( cnorm * exp( (-1) * (x[index] - mu_opt)**2 / ( 2 * (sigma_opt **2) ) ) )) for index in range(len(x))]
The reason for this attempt at code is to plot this fit of data against the (normalized) histogram of the actual data. However, I am getting an error when I run the code that reads:
UnboundLocalError: local variable 'sigma_opt' referenced before assignment
I find this weird because sigma_opt is only defined inside of a few functions but is not defined globally. I've read other posts on SO that deal with this error message, but none apply to my case. Why am I receiving this error message? (I would post the whole code but it's 350+ lines)
You are getting this error because if you call distribGS with a pickdist value other than 2 or 3, neither branch runs, so sigma_opt is used without ever being assigned.
What you can do is assign mu_opt and sigma_opt default values at the beginning of your function, or use an else branch to assign the defaults.
For example
def distribGS(pickdist, x):
    # Defaults used when pickdist is neither 2 nor 3 (sigma must be non-zero)
    mu_opt, sigma_opt = 0, 1
    if pickdist == 2:
        mu_opt, sigma_opt = optpar2
    elif pickdist == 3:
        mu_opt, sigma_opt = optpar3
    cnorm = 1/ ( sigma_opt * (2 * pi)**(1/2) )
    return [(( cnorm * exp( (-1) * (x[index] - mu_opt)**2 / ( 2 * (sigma_opt **2) ) ) )) for index in range(len(x))]
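An alternative to silent defaults is to fail loudly on an unexpected pickdist. The sketch below assumes the optimized parameters are passed in as a dictionary rather than read from globals; the name distrib_gs and the params argument are hypothetical, not part of the original code:

```python
from math import exp, pi, sqrt

def distrib_gs(pickdist, x, params):
    """Gaussian density at each point of x.

    params is a hypothetical dict mapping a pickdist code to (mu, sigma).
    """
    if pickdist not in params:
        # Raise instead of continuing with unassigned or made-up values.
        raise ValueError(f"unsupported pickdist: {pickdist!r}")
    mu_opt, sigma_opt = params[pickdist]
    cnorm = 1 / (sigma_opt * sqrt(2 * pi))
    return [cnorm * exp(-(xi - mu_opt) ** 2 / (2 * sigma_opt ** 2)) for xi in x]

print(distrib_gs(2, [0.0, 1.0], {2: (0.0, 1.0), 3: (0.5, 2.0)}))
```

With this shape, a wrong pickdist surfaces at the call site instead of as an UnboundLocalError further down.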

Egg dropping in worst case

I have been trying to write an algorithm to compute the maximum number of trials required in the worst case of the egg dropping problem. Here is my Python code:
def eggDrop(n,k):
    eggFloor=[ [0 for i in range(k+1) ] ]* (n+1)
    for i in range(1, n+1):
        eggFloor[i][1] = 1
        eggFloor[i][0] = 0
    for j in range(1, k+1):
        eggFloor[1][j] = j
    for i in range (2, n+1):
        for j in range (2, k+1):
            eggFloor[i][j] = 'infinity'
            for x in range (1, j + 1):
                res = 1 + max(eggFloor[i-1][x-1], eggFloor[i][j-x])
                if res < eggFloor[i][j]:
                    eggFloor[i][j] = res
    return eggFloor[n][k]
print eggDrop(2, 100)
The code outputs a value of 7 for 2 eggs and 100 floors, but the answer should be 14. I don't know what mistake I have made in the code. What is the problem?
The problem is in this line:
eggFloor=[ [0 for i in range(k+1) ] ]* (n+1)
You want this to create a list containing (n+1) lists of (k+1) zeroes. What the * (n+1) does is slightly different - it creates a list containing (n+1) copies of the same list.
This is an important distinction - because when you start modifying entries in the list - say,
eggFloor[i][1] = 1
this actually changes element [1] of all of the lists, not just the ith one.
To instead create separate lists that can be modified independently, you want something like:
eggFloor=[ [0 for i in range(k+1) ] for j in range(n+1) ]
With this modification, the program returns 14 as expected.
(To debug this, it might have been a good idea to write a function to print out the eggFloor array, and display it at various points in your program, so you can compare it with what you were expecting. It would soon become pretty clear what was going on!)
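To make the aliasing visible, and to show the corrected table construction in context, here is a short Python 3 sketch. The function name egg_drop and the use of min() over a generator in place of the 'infinity' placeholder are choices made for this example, not the asker's code:

```python
# Aliased rows: the outer * repeats a reference to one inner list.
aliased = [[0] * 3] * 2
aliased[0][1] = 9          # also changes aliased[1][1]

# Independent rows: the comprehension builds a new inner list each time.
independent = [[0] * 3 for _ in range(2)]
independent[0][1] = 9      # leaves independent[1][1] at 0

def egg_drop(n, k):
    """Worst-case number of trials with n eggs and k floors."""
    if k == 0:
        return 0
    floor = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        floor[i][1] = 1    # one floor needs one trial
    for j in range(1, k + 1):
        floor[1][j] = j    # one egg forces a floor-by-floor search
    for i in range(2, n + 1):
        for j in range(2, k + 1):
            # Drop from floor x: the egg breaks (i-1 eggs, x-1 floors)
            # or survives (i eggs, j-x floors); take the best worst case.
            floor[i][j] = min(
                1 + max(floor[i - 1][x - 1], floor[i][j - x])
                for x in range(1, j + 1)
            )
    return floor[n][k]

print(aliased, independent, egg_drop(2, 100))
```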

Custom function to create an index of results

I’m trying to create a function which creates an index (starting at 100) and then adjusts this index according to the results of investments. So, in a nutshell, if the first investment gives a profit of 5%, then the index will stand at 105; if the second result is -7%, then the index stands at 97.65. In this question when I use the word "index", I'm not referring to the index function of the zoo package.
Besides creating this index, my goal is also to create a function which can be applied to various subsets of my complete data set (i.e. with the use of sapply and its friends).
Here’s the function which I have so far (data at end of this question):
CalculateIndex <- function(x){
totalAccount <- accountValueStart
if(x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)){
indexedValues <- 100 + ( 100 *((((x$Size.Units. * x$EntryPrice) / totalAccount) * x$TradeResult.Percent.) / 100))
# Update the accountvalue
totalAccount <- totalAccount + x$TradeResult.Currency.
}
else{ # the value is not the first
indexedValues <- c(indexedValues,
indexedValues[-1] + (indexedValues[-1] *(((x$Size.Units. * x$EntryPrice) / totalAccount) * x$TradeResult.Percent.) / 100)
)
# Update the accountvalue
totalAccount <- totalAccount + x$TradeResult.Currency.
}
return(indexedValues)
}
In words the function does (read: is intended to do) the following:
If the value is the first, use 100 as a starting point for the index. If the value is not the first, use the previously calculated index value as the starting point for calculating the new index value. Besides this, the function also takes the weight of the individual result (compared with the totalAccount value) into account.
The problem:
Using this CalculateIndex function on the theData data frame gives the following incorrect output:
> CalculateIndex(theData)
[1] 99.97901 99.94180 99.65632 101.88689 100.89309 98.92878 102.02911 100.49159 98.52955 102.02243 98.43655 100.76502 99.34869 100.76401 101.18014 99.75136 97.90130
[18] 100.39935 99.81311 101.34961
Warning message:
In if (x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)) { :
the condition has length > 1 and only the first element will be used
Edit:
Wow, I already got a down vote, though I thought my question was already too long. Sorry, I thought/think the problem lay inside my loop, so I didn't want to bore you with the details, which I thought would only lead to fewer answers. Sorry, misjudgement on my part.
The problem with the above output from CalculateIndex is that the results are wildly different from Excel. Even though this could result from rounding errors (as Joris mentions below), I doubt it. Compared with the Excel results, the R results differ quite a lot:
R output        Excel calculated values
99,9790085700 99,97900857
99,9418035700 99,92081189
99,6563228600 99,57713687
101,8868850000 101,4639947
100,8930864300 102,3570786
98,9287771400 101,2858564
102,0291071400 103,3149664
100,4915864300 103,806556
98,5295542900 102,3361186
102,0224285700 104,3585552
98,4365550000 102,795089
100,7650171400 103,5601228
99,3486857100 102,9087897
100,7640057100 103,6728077
101,1801400000 104,8529634
99,7513600000 104,6043164
97,9013000000 102,5055298
100,3993485700 102,9048999
99,8131085700 102,7179995
101,3496071400 104,0676555
I think it would be fair to say that the difference in output isn't the result of R versus Excel problems, but more an error in my function. So, let's focus on the function.
The manual calculation of the function
The function uses different variables:
Size.Units.; this is the number of units which are bought at the EntryPrice.
EntryPrice: the price at which the stocks are bought,
TradeResult.Percent.: the percentage gain or loss resulting from the investment,
TradeResult.Currency.: the currency value ($) of the gain or loss resulting from the investment,
These variables are used in the following section of the function:
100 + ( 100 *((((x$Size.Units. * x$EntryPrice) / totalAccount) * x$TradeResult.Percent.) / 100))
and
indexedValues[-1] + (indexedValues[-1] *(((x$Size.Units. * x$EntryPrice) / totalAccount) * x$TradeResult.Percent.) / 100)
Both formulas are essentially the same, with the difference that the first starts at 100, and the second uses the previous value to calculate the new indexed value.
The formula can be broken down in different steps:
First, x$Size.Units. * x$EntryPrice determines the total position that was taken, in the sense that buying 100 shares at a price of 48.98 gives a position of $4898.
The resulting total position is then divided by the total account size (i.e. totalAccount). This is needed to correct the impact of one position relative to the complete portfolio. For example, if our 100 shares bought at 48.98 drop 10 percent, the calculated index (i.e. the CalculateIndex function) doesn't have to drop 10%, because of course not all the money in totalAccount is invested in one stock. So, by dividing the total position by the totalAccount we get a ratio which tells us how much money is invested. For example, the position with a size of 4898 dollars (on a total account of 14000) results in a total account loss of 3.49% if the stock drops 10% (i.e. 4898 / 14000 = 0.349857, and 0.349857 * 10% = 3.49857%).
This ratio (of invested amount versus total amount) is then multiplied by x$TradeResult.Percent. to get the percentage impact on the total account (see the calculation example in the previous paragraph).
As a last step, the percentage loss on the total account is applied to the index value (which starts at 100). In this case, the first investment in 100 shares bought at 48.98 dollars lets the index drop from its starting point of 100 to 99.97901, reflecting the losing trade's impact on the total account.
End of Edit
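The four steps can be checked by hand for the first trade. The following is only a worked illustration in Python (the question itself is in R), with variable names invented for the example and the numbers taken from the first row of the data:

```python
account = 14000.0            # starting account value
position = 100 * 48.98       # step 1: units * entry price
weight = position / account  # step 2: share of the account invested
impact_pct = weight * -0.06  # step 3: trade result (%) scaled by the weight
index_after = 100 + 100 * impact_pct / 100  # step 4: apply to the index

print(round(weight, 6), round(index_after, 5))
```

The rounded result agrees with the first value in both the R and the Excel columns.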
Stripping the function clean and then adding a part of the formula at a time, so to uncover the error, I came to the following step where the error seems to reside:
CalculateIndex <- function(x){
totalAccount <- accountValueStart
if(x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)){
indexedValues <- totalAccount
# Update the accountvalue
totalAccount <- totalAccount + x$TradeResult.Currency.
}
else{ # the value is not the first
indexedValues <- c(indexedValues, totalAccount)
# Update the accountvalue
totalAccount <- totalAccount + x$TradeResult.Currency.
}
return(indexedValues)
}
> CalculateIndex(theData)
[1] 14000
Warning message:
In if (x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)) { :
the condition has length > 1 and only the first element will be used
So, it seems that if I just use the totalAccount variable, the function doesn’t get updated correctly. This seems to suggest there is some error with the basics of the if else statement, because it only outputs the first value.
If I remove the else statement from the function, I do get values for each of the rows in theData. However, these are then wrongly calculated. So, it seems to me that there is some error in how this function updates the totalAccount variable. I don’t see where I made an error, so any suggestion would be highly appreciated. What am I doing wrong?
The Data
Here’s what my data looks like:
> theData
Size.Units. EntryPrice TradeResult.Percent. TradeResult.Currency.
1 100 48.98 -0.06 -3
11 100 32.59 -0.25 -8
12 100 32.51 -1.48 -48
2 100 49.01 5.39 264
13 100 32.99 3.79 125
14 100 34.24 -4.38 -150
3 100 51.65 5.50 284
4 100 48.81 1.41 69
15 100 35.74 -5.76 -206
5 100 49.50 5.72 283
6 100 46.67 -4.69 -219
16 100 33.68 3.18 107
7 100 44.48 -2.05 -91
17 100 32.61 3.28 107
8 100 45.39 3.64 165
9 100 47.04 -0.74 -35
10 100 47.39 -6.20 -294
18 100 33.68 1.66 56
19 100 33.12 -0.79 -26
20 100 32.86 5.75 189
theData <- structure(list(X = c(1L, 11L, 12L, 2L, 13L, 14L, 3L, 4L, 15L,
5L, 6L, 16L, 7L, 17L, 8L, 9L, 10L, 18L, 19L, 20L), Size.Units. = c(100L,
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L,
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L), EntryPrice = c(48.98,
32.59, 32.51, 49.01, 32.99, 34.24, 51.65, 48.81, 35.74, 49.5,
46.67, 33.68, 44.48, 32.61, 45.39, 47.04, 47.39, 33.68, 33.12,
32.86), TradeResult.Percent. = c(-0.06, -0.25, -1.48, 5.39, 3.79,
-4.38, 5.5, 1.41, -5.76, 5.72, -4.69, 3.18, -2.05, 3.28, 3.64,
-0.74, -6.2, 1.66, -0.79, 5.75), TradeResult.Currency. = c(-3L,
-8L, -48L, 264L, 125L, -150L, 284L, 69L, -206L, 283L, -219L,
107L, -91L, 107L, 165L, -35L, -294L, 56L, -26L, 189L)), .Names = c("X",
"Size.Units.", "EntryPrice", "TradeResult.Percent.", "TradeResult.Currency."
), class = "data.frame", row.names = c(NA, -20L))
# Set the account start # 14000
> accountValueStart <- 14000
Your code looks very strange, and it seems you have a lot of misconceptions about R that come from another programming language. Gavin and Gillespie already pointed out why you get the warning. Let me add some tips for far more optimal coding:
[-1] does NOT mean: drop the last one. It means "keep everything but the first value", which also explains why you get erroneous results.
calculate common things in the beginning, to unclutter your code.
head(x$TradeResult.Currency., n = 1) is the same as x$TradeResult.Currency.[1].
Keep an eye on your vectors. Most of the mistakes in your code come from forgetting you're working with vectors.
If you need a value to be the first in a vector, put that OUTSIDE of any loop you'd use, never add an if-clause in the function.
predefine your vectors/matrices as much as possible, that goes a lot faster and gives less memory headaches when working with big data.
vectorization, vectorization, vectorization. Did I mention vectorization?
Learn the use of debug(), debugonce() and browser() to check what your function is doing. Many of your problems could have been solved by checking the objects when manipulated within the function.
This said and taken into account, your function becomes :
CalculateIndex <- function(x,accountValueStart){
# predefine your vector
indexedValues <- vector("numeric",nrow(x))
# get your totalAccount calculated FAST. This is a VECTOR!!!
totalAccount <- cumsum(c(accountValueStart,x$TradeResult.Currency.))
#adjust length:
totalAccount <- totalAccount[-(nrow(x)+1)]
# only once this calculation. This is a VECTOR!!!!
totRatio <- 1+(((x$Size.Units. * x$EntryPrice)/totalAccount) *
x$TradeResult.Percent.)/100
# and now the calculations
indexedValues[1] <- 100 * totRatio[1]
for(i in 2:nrow(x)){
indexedValues[i] <- indexedValues[i-1]*totRatio[i]
}
return(indexedValues)
}
and returns
> CalculateIndex(theData,14000)
[1] 99.97901 99.92081 99.57714 101.46399 102.35708 101.28586 103.31497 103.80656 102.33612 104.35856 102.79509 103.56012
[13] 102.90879 103.67281 104.85296 104.60432 102.50553 102.90490 102.71800 104.06766
So now you do:
invisible(replicate(10,print("I will never forget about vectorization any more!")))
The warning message is coming from this line:
if(x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)){
It is easy to see why; x$TradeResult.Currency is a vector and thus the comparison with head(x$TradeResult.Currency., n = 1) yields a vector of logicals. (By the way, why not x$TradeResult.Currency[1] instead of the head() call?). if() requires a single logical not a vector of logicals, and that is what the warning is about. ifelse() is useful if you want to do one of two things depending upon a condition that gives a vector of logicals.
In effect, what you are doing is only entering the if() part of the statement and it gets executed once only, because the first element of x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1) is TRUE and R ignores the others.
> if(c(TRUE, FALSE)) {
+ print("Hi")
+ } else {
+ print("Bye")
+ }
[1] "Hi"
Warning message:
In if (c(TRUE, FALSE)) { :
the condition has length > 1 and only the first element will be used
> ifelse(c(TRUE, FALSE), print("Hi"), print("Bye"))
[1] "Hi"
[1] "Bye"
[1] "Hi" "Bye"
As to solving your real problem:
CalculateIndex2 <- function(x, value, start = 100) {
rowSeq <- seq_len(NROW(x))
totalAc <- cumsum(c(value, x$TradeResult.Currency.))[rowSeq]
idx <- numeric(length = nrow(x))
interm <- (((x$Size.Units. * x$EntryPrice) / totalAc) *
x$TradeResult.Percent.) / 100
for(i in rowSeq) {
idx[i] <- start + (start * interm[i])
start <- idx[i]
}
idx
}
which when used on theData gives:
> CalculateIndex2(theData, 14000)
[1] 99.97901 99.92081 99.57714 101.46399 102.35708 101.28586 103.31497
[8] 103.80656 102.33612 104.35856 102.79509 103.56012 102.90879 103.67281
[15] 104.85296 104.60432 102.50553 102.90490 102.71800 104.06766
What you want is a recursive function (IIRC); the current index is some function of the previous index. These are hard to solve in a vectorised way in R, hence the loop.
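For this particular recurrence each index value is the previous one times a per-trade factor, so a running product is one way to express it without an explicit loop. The sketch below is in Python rather than R and needs Python 3.8+ for the initial argument; calculate_index and its parameter names are assumptions made for illustration, and the data are the 20 rows shown in the question:

```python
from itertools import accumulate
from operator import mul

def calculate_index(sizes, prices, percents, results, account_start, start=100.0):
    """Index after each trade, as a running product of per-trade factors."""
    # Account balance *before* each trade: start value plus earlier results.
    balances = list(accumulate([account_start] + list(results[:-1])))
    factors = [
        1 + (s * p / bal) * pct / 100
        for s, p, pct, bal in zip(sizes, prices, percents, balances)
    ]
    # accumulate(..., initial=start) yields start first; drop it.
    return list(accumulate(factors, mul, initial=start))[1:]

sizes = [100] * 20
prices = [48.98, 32.59, 32.51, 49.01, 32.99, 34.24, 51.65, 48.81, 35.74, 49.5,
          46.67, 33.68, 44.48, 32.61, 45.39, 47.04, 47.39, 33.68, 33.12, 32.86]
percents = [-0.06, -0.25, -1.48, 5.39, 3.79, -4.38, 5.5, 1.41, -5.76, 5.72,
            -4.69, 3.18, -2.05, 3.28, 3.64, -0.74, -6.2, 1.66, -0.79, 5.75]
results = [-3, -8, -48, 264, 125, -150, 284, 69, -206, 283,
           -219, 107, -91, 107, 165, -35, -294, 56, -26, 189]

index = calculate_index(sizes, prices, percents, results, 14000)
print([round(v, 5) for v in index[:3]])
```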
I'm still slightly confused as to what exactly you want to do, but hopefully the following will be helpful.
Your R script gives the same answers as your Excel function for the first value. You see a difference because R doesn't print out all digits.
> tmp = CalculateIndex(thedata)
Warning message:
In if (x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)) { :
the condition has length > 1 and only the first element will be used
> print(tmp, digits=10)
[1] 99.97900857 99.94180357 99.65632286 101.88688500 100.89308643
<snip>
The reason for the warning message is because x$TradeResult.Currency is a vector that is being compared to a single number.
That warning message is also where your bug lives. As the warning states, only the first element of the comparison is used, and since the first element of x$TradeResult.Currency always equals itself, the condition is always TRUE and the else part is never executed.