I’m trying to create a function which creates an index (starting at 100) and then adjusts this index according to the results of investments. So, in a nutshell, if the first investment gives a profit of 5%, then the index will stand at 105; if the second result is -7%, then the index stands at 97.65. In this question, when I use the word "index", I'm not referring to the index function of the zoo package.
Besides creating this index, my goal is also to create a function which can be applied to various subsets of my complete data set (i.e. with the use of sapply and its friends).
Here’s the function which I have so far (data at end of this question):
CalculateIndex <- function(x){
  totalAccount <- accountValueStart
  if(x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)){
    indexedValues <- 100 + ( 100 *((((x$Size.Units. * x$EntryPrice) / totalAccount) * x$TradeResult.Percent.) / 100))
    # Update the accountvalue
    totalAccount <- totalAccount + x$TradeResult.Currency.
  }
  else{ # the value is not the first
    indexedValues <- c(indexedValues,
      indexedValues[-1] + (indexedValues[-1] *(((x$Size.Units. * x$EntryPrice) / totalAccount) * x$TradeResult.Percent.) / 100)
    )
    # Update the accountvalue
    totalAccount <- totalAccount + x$TradeResult.Currency.
  }
  return(indexedValues)
}
In words, the function does (read: is intended to do) the following:
If the value is the first, use 100 as a starting point for the index. If the value is not the first, use the previously calculated index value as the starting point for calculating the new index value. Besides this, the function also takes the weight of the individual result (compared with the totalAccount value) into account.
The problem:
Using this CalculateIndex function on the theData data frame gives the following incorrect output:
> CalculateIndex(theData)
[1] 99.97901 99.94180 99.65632 101.88689 100.89309 98.92878 102.02911 100.49159 98.52955 102.02243 98.43655 100.76502 99.34869 100.76401 101.18014 99.75136 97.90130
[18] 100.39935 99.81311 101.34961
Warning message:
In if (x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)) { :
the condition has length > 1 and only the first element will be used
Edit:
Wow, I already got a down vote, though I thought my question was already too long. Sorry, I thought (and still think) the problem lies inside my loop, so I didn't want to bore you with the details, which I thought would only result in fewer answers. Sorry, misjudgement on my part.
The problem with the above output from CalculateIndex is that the results are wildly different from Excel. Even though this could result from rounding errors (as Joris mentions below), I doubt it. Compared with the Excel results, the R results differ quite a lot:
R output          Excel calculated values
99,9790085700     99,97900857
99,9418035700     99,92081189
99,6563228600     99,57713687
101,8868850000    101,4639947
100,8930864300    102,3570786
98,9287771400     101,2858564
102,0291071400    103,3149664
100,4915864300    103,806556
98,5295542900     102,3361186
102,0224285700    104,3585552
98,4365550000     102,795089
100,7650171400    103,5601228
99,3486857100     102,9087897
100,7640057100    103,6728077
101,1801400000    104,8529634
99,7513600000     104,6043164
97,9013000000     102,5055298
100,3993485700    102,9048999
99,8131085700     102,7179995
101,3496071400    104,0676555
I think it would be fair to say that the difference in output isn't the result of R versus Excel problems, but more an error in my function. So, let's focus on the function.
The manual calculation of the function
The function uses different variables:
Size.Units.: the number of units which are bought at the EntryPrice,
EntryPrice: the price at which the stocks are bought,
TradeResult.Percent.: the percentage gain or loss resulting from the investment,
TradeResult.Currency.: the currency value ($) of the gain or loss resulting from the investment.
These variables are used in the following section of the function:
100 + ( 100 *((((x$Size.Units. * x$EntryPrice) / totalAccount) * x$TradeResult.Percent.) / 100))
and
indexedValues[-1] + (indexedValues[-1] *(((x$Size.Units. * x$EntryPrice) / totalAccount) * x$TradeResult.Percent.) / 100)
Both formulas are essentially the same, with the difference that the first starts at 100 and the second uses the previous value to calculate the new indexed value.
The formula can be broken down in different steps:
First, x$Size.Units. * x$EntryPrice determines the total position that was taken, in the sense that buying 100 shares at a price of 48.98 gives a position of $4898.
The resulting total position is then divided by the total account size (i.e. totalAccount). This is needed to correct the impact of one position relative to the complete portfolio. For example, if our 100 shares bought at 48.98 drop 10 percent, the calculated index (i.e. the CalculateIndex function) doesn't have to drop 10%, because of course not all the money in totalAccount is invested in one stock. So, by dividing the total position by the totalAccount we get a ratio which tells us how much money is invested. For example, a position of 4898 dollars (on a total account of 14000) results in a total account loss of 3.49% if the stock drops 10% (i.e. 4898 / 14000 = 0.349857, and 0.349857 * 10% = 3.49857%).
This ratio (of invested amount versus total amount) is then multiplied by x$TradeResult.Percent. in the formula, to get the percentage impact on the total account (see the calculation example in the previous paragraph).
As a last step, the percentage loss on the total account is applied to the index value (which starts at 100). In this case, the first investment in 100 shares bought at 48.98 dollars lets the index drop from its starting point at 100 to 99.97901, reflecting the losing trade's impact on the total account.
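The steps above can be sketched for the first row of theData only. This is a minimal illustration, not part of the function itself; the variable names (sizeUnits, entryPrice, resultPct, and so on) are made up for this example.

```r
# Worked example for the first trade only; all names here are illustrative.
accountValueStart <- 14000
sizeUnits  <- 100      # Size.Units. of the first row
entryPrice <- 48.98    # EntryPrice of the first row
resultPct  <- -0.06    # TradeResult.Percent. of the first row

position   <- sizeUnits * entryPrice        # total position taken (about 4898)
weight     <- position / accountValueStart  # share of the account that is invested
impact     <- weight * resultPct / 100      # fractional impact on the whole account
firstIndex <- 100 + 100 * impact            # index value after the first trade

print(round(firstIndex, 5))
# [1] 99.97901
```

The result agrees with the first value in both the R and the Excel column, which suggests the formula itself is fine for a single row and that the discrepancy only starts from the second row onwards.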
End of Edit
Stripping the function clean and then adding a part of the formula at a time, so to uncover the error, I came to the following step where the error seems to reside:
CalculateIndex <- function(x){
  totalAccount <- accountValueStart
  if(x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)){
    indexedValues <- totalAccount
    # Update the accountvalue
    totalAccount <- totalAccount + x$TradeResult.Currency.
  }
  else{ # the value is not the first
    indexedValues <- c(indexedValues, totalAccount)
    # Update the accountvalue
    totalAccount <- totalAccount + x$TradeResult.Currency.
  }
  return(indexedValues)
}
> CalculateIndex(theData)
[1] 14000
Warning message:
In if (x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)) { :
the condition has length > 1 and only the first element will be used
So, it seems that if I just use the totalAccount variable, the function doesn’t get updated correctly. This seems to suggest there is some error with the basics of the if else statement, because it only outputs the first value.
If I remove the else statement from the function, I do get values for each of the rows in theData. However, these are then wrongly calculated. So, it seems to me that there is some error in how this function updates the totalAccount variable. I don’t see where I made an error, so any suggestion would be highly appreciated. What am I doing wrong?
The Data
Here’s what my data looks like:
> theData
Size.Units. EntryPrice TradeResult.Percent. TradeResult.Currency.
1 100 48.98 -0.06 -3
11 100 32.59 -0.25 -8
12 100 32.51 -1.48 -48
2 100 49.01 5.39 264
13 100 32.99 3.79 125
14 100 34.24 -4.38 -150
3 100 51.65 5.50 284
4 100 48.81 1.41 69
15 100 35.74 -5.76 -206
5 100 49.50 5.72 283
6 100 46.67 -4.69 -219
16 100 33.68 3.18 107
7 100 44.48 -2.05 -91
17 100 32.61 3.28 107
8 100 45.39 3.64 165
9 100 47.04 -0.74 -35
10 100 47.39 -6.20 -294
18 100 33.68 1.66 56
19 100 33.12 -0.79 -26
20 100 32.86 5.75 189
theData <- structure(list(X = c(1L, 11L, 12L, 2L, 13L, 14L, 3L, 4L, 15L,
5L, 6L, 16L, 7L, 17L, 8L, 9L, 10L, 18L, 19L, 20L), Size.Units. = c(100L,
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L,
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L), EntryPrice = c(48.98,
32.59, 32.51, 49.01, 32.99, 34.24, 51.65, 48.81, 35.74, 49.5,
46.67, 33.68, 44.48, 32.61, 45.39, 47.04, 47.39, 33.68, 33.12,
32.86), TradeResult.Percent. = c(-0.06, -0.25, -1.48, 5.39, 3.79,
-4.38, 5.5, 1.41, -5.76, 5.72, -4.69, 3.18, -2.05, 3.28, 3.64,
-0.74, -6.2, 1.66, -0.79, 5.75), TradeResult.Currency. = c(-3L,
-8L, -48L, 264L, 125L, -150L, 284L, 69L, -206L, 283L, -219L,
107L, -91L, 107L, 165L, -35L, -294L, 56L, -26L, 189L)), .Names = c("X",
"Size.Units.", "EntryPrice", "TradeResult.Percent.", "TradeResult.Currency."
), class = "data.frame", row.names = c(NA, -20L))
# Set the account start # 14000
> accountValueStart <- 14000
Your code looks very strange, and it seems you have a lot of misconceptions about R that come from another programming language. Gavin and Gillespie already pointed out why you get the warning. Let me add some tips for far better coding:
[-1] does NOT mean: drop the last one. It means "keep everything but the first value", which also explains why you get erroneous results.
calculate common things in the beginning, to unclutter your code.
head(x$TradeResult.Currency., n = 1) is the same as x$TradeResult.Currency.[1].
Keep an eye on your vectors. Most of the mistakes in your code come from forgetting you're working with vectors.
If you need a value to be the first in a vector, put that OUTSIDE of any loop you'd use; never add an if-clause in the function.
Predefine your vectors/matrices as much as possible; that goes a lot faster and gives fewer memory headaches when working with big data.
Vectorization, vectorization, vectorization. Did I mention vectorization?
Learn the use of debug(), debugonce() and browser() to check what your function is doing. Many of your problems could have been solved by checking the objects when manipulated within the function.
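To make the point about [-1] concrete, here is a small sketch on a toy vector (the vector v and the variable names are just for illustration):

```r
v <- c(10, 20, 30, 40)

dropFirst <- v[-1]            # negative index: everything EXCEPT the first element
lastValue <- v[length(v)]     # the last element
lastAlt   <- tail(v, n = 1)   # the last element again, via tail()
dropLast  <- head(v, n = -1)  # everything except the last element

print(dropFirst)
# [1] 20 30 40
print(lastValue)
# [1] 40
```

So if the intention is "the previously calculated value", something like v[length(v)] or tail(v, n = 1) is probably what was meant, not v[-1].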
With all that taken into account, your function becomes:
CalculateIndex <- function(x,accountValueStart){
  # predefine your vector
  indexedValues <- vector("numeric",nrow(x))
  # get your totalAccount calculated FAST. This is a VECTOR!!!
  totalAccount <- cumsum(c(accountValueStart,x$TradeResult.Currency.))
  # adjust length:
  totalAccount <- totalAccount[-(nrow(x)+1)]
  # only once this calculation. This is a VECTOR!!!!
  totRatio <- 1+(((x$Size.Units. * x$EntryPrice)/totalAccount) *
                   x$TradeResult.Percent.)/100
  # and now the calculations
  indexedValues[1] <- 100 * totRatio[1]
  # seq_len(nrow(x))[-1] is 2, 3, ..., nrow(x), and is empty for a one-row subset
  for(i in seq_len(nrow(x))[-1]){
    indexedValues[i] <- indexedValues[i-1]*totRatio[i]
  }
  return(indexedValues)
}
and returns
> CalculateIndex(theData,14000)
 [1]  99.97901  99.92081  99.57714 101.46399 102.35708 101.28586 103.31497 103.80656 102.33612 104.35856 102.79509 103.56012
[13] 102.90879 103.67281 104.85296 104.60432 102.50553 102.90490 102.71800 104.06766
So now you do:
invisible(replicate(10,print("I will never forget about vectorization any more!")))
The warning message is coming from this line:
if(x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)){
It is easy to see why: x$TradeResult.Currency is a vector, and thus the comparison with head(x$TradeResult.Currency., n = 1) yields a vector of logicals. (By the way, why not x$TradeResult.Currency[1] instead of the head() call?) if() requires a single logical, not a vector of logicals, and that is what the warning is about. ifelse() is useful if you want to do one of two things depending upon a condition that gives a vector of logicals.
In effect, what you are doing is only entering the if() part of the statement and it gets executed once only, because the first element of x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1) is TRUE and R ignores the others.
> if(c(TRUE, FALSE)) {
+ print("Hi")
+ } else {
+ print("Bye")
+ }
[1] "Hi"
Warning message:
In if (c(TRUE, FALSE)) { :
the condition has length > 1 and only the first element will be used
> ifelse(c(TRUE, FALSE), print("Hi"), print("Bye"))
[1] "Hi"
[1] "Bye"
[1] "Hi" "Bye"
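If a single decision really is what you need, one option is to collapse the logical vector with all() or any() before handing it to if(). The sketch below uses a made-up vector tradeResult for illustration. Note also that in R 4.2.0 and later a condition of length greater than one in if() is an error rather than a warning.

```r
tradeResult <- c(-3, -8, -48)
isFirst <- tradeResult == tradeResult[1]   # logical vector: TRUE FALSE FALSE

# Collapse the vector to a single TRUE/FALSE before using it in if()
if (all(isFirst)) {
  msg <- "every value equals the first"
} else {
  msg <- "some values differ from the first"
}
print(msg)
# [1] "some values differ from the first"

# ifelse() instead works element by element and returns a vector
labels <- ifelse(isFirst, "first", "later")
print(labels)
# [1] "first" "later" "later"
```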
As to solving your real problem:
CalculateIndex2 <- function(x, value, start = 100) {
rowSeq <- seq_len(NROW(x))
totalAc <- cumsum(c(value, x$TradeResult.Currency.))[rowSeq]
idx <- numeric(length = nrow(x))
interm <- (((x$Size.Units. * x$EntryPrice) / totalAc) *
x$TradeResult.Percent.) / 100
for(i in rowSeq) {
idx[i] <- start + (start * interm[i])
start <- idx[i]
}
idx
}
which when used on theData gives:
> CalculateIndex2(theData, 14000)
[1] 99.97901 99.92081 99.57714 101.46399 102.35708 101.28586 103.31497
[8] 103.80656 102.33612 104.35856 102.79509 103.56012 102.90879 103.67281
[15] 104.85296 104.60432 102.50553 102.90490 102.71800 104.06766
What you want is a recursive function (IIRC); the current index is some function of the previous index. These are hard to solve in a vectorised way in R, hence the loop.
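That said, in this particular case each step only multiplies the previous index by a factor, so the loop could probably be replaced by a cumulative product. The following is a sketch under that assumption; CalculateIndex3 is a hypothetical name, and the small data frame just retypes the first three rows of theData so the example runs on its own.

```r
# Loop-free sketch; CalculateIndex3 is a hypothetical name, not from the question.
CalculateIndex3 <- function(x, value, start = 100) {
  n <- NROW(x)
  # account value *before* each trade
  totalAc <- cumsum(c(value, x$TradeResult.Currency.))[seq_len(n)]
  # growth factor contributed by each trade
  growth <- 1 + (((x$Size.Units. * x$EntryPrice) / totalAc) *
                   x$TradeResult.Percent.) / 100
  # the index is the starting value times the running product of the factors
  start * cumprod(growth)
}

# First three rows of theData, retyped so the example is self-contained
firstRows <- data.frame(
  Size.Units. = c(100, 100, 100),
  EntryPrice = c(48.98, 32.59, 32.51),
  TradeResult.Percent. = c(-0.06, -0.25, -1.48),
  TradeResult.Currency. = c(-3, -8, -48)
)
res <- CalculateIndex3(firstRows, 14000)
print(res)
```

On these three rows the result should agree with the first three values printed above (99.97901, 99.92081, 99.57714). The explicit loop remains the more general tool when the update is not a plain multiplication.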
I'm still slightly confused as to what exactly you want to do, but hopefully the following will be helpful.
Your R script gives the same answers as your Excel function for the first value. You see a difference because R doesn't print out all digits.
> tmp = CalculateIndex(theData)
Warning message:
In if (x$TradeResult.Currency == head(x$TradeResult.Currency., n = 1)) { :
the condition has length > 1 and only the first element will be used
> print(tmp, digits=10)
[1] 99.97900857 99.94180357 99.65632286 101.88688500 100.89308643
<snip>
The warning message appears because x$TradeResult.Currency is a vector that is being compared to a single number, so the condition in if() is a vector of logicals rather than a single value.
That warning message is also where your bug lives. In your if statement, you never execute the else part: as the warning states, only the first element of the comparison is used, and that first element is always TRUE because it compares the first value with itself.