xtable() changes hms output to number when printing in HTML [duplicate]

I have the following data:
transaction <- c(1,2,3);
date <- c("2010-01-31","2010-02-28","2010-03-31");
type <- c("debit", "debit", "credit");
amount <- c(-500, -1000.97, 12500.81);
oldbalance <- c(5000, 4500, 17000.81)
evolution <- data.frame(transaction, date, type, amount, oldbalance, row.names=transaction, stringsAsFactors=FALSE);
evolution <- transform(evolution, newbalance = oldbalance + amount);
evolution
Running
> library(xtable)
> xtable(evolution)
works fine. But if I add the line
evolution$date <- as.Date(evolution$date, "%Y-%m-%d");
to give
transaction <- c(1,2,3);
date <- c("2010-01-31","2010-02-28","2010-03-31");
type <- c("debit", "debit", "credit");
amount <- c(-500, -1000.97, 12500.81);
oldbalance <- c(5000, 4500, 17000.81)
evolution <- data.frame(transaction, date, type, amount, oldbalance, row.names=transaction, stringsAsFactors=FALSE);
evolution$date <- as.Date(evolution$date, "%Y-%m-%d");
evolution <- transform(evolution, newbalance = oldbalance + amount);
evolution
then running xtable gives
xtable(evolution)
Error in Math.Date(x + ifelse(x == 0, 1, 0)) :
abs not defined for Date objects
But it can be useful to have the dates as Date objects, e.g. to filter by date before calling xtable:
evolution$date <- as.Date(evolution$date, "%Y-%m-%d")
startdate <-as.Date("2010-02-01");
enddate <-as.Date("2010-03-30");
newdate <-evolution[which (evolution$date >= startdate & evolution$date <= enddate),]
newdate
> newdate
transaction date type amount oldbalance newbalance
2 2 2010-02-28 debit -1000.97 4500 3499.03
> xtable(newdate)
Error in Math.Date(x + ifelse(x == 0, 1, 0)) :
abs not defined for Date objects

This is arguably a bug in xtable - you may want to report it to the maintainer.
A temporary work-around is to call as.character() on the classes that xtable misinterprets (apart from "Date" I can think of "POSIXt" but there may be others), e.g.:
xtable <- function(x, ...) {
  # coerce any Date/POSIXt columns to character, then call the real xtable
  for (i in which(sapply(x, function(y) !all(is.na(match(c("POSIXt", "Date"), class(y)))))))
    x[[i]] <- as.character(x[[i]])
  xtable::xtable(x, ...)
}
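For example (a quick check on the question's data, with the wrapper above masking xtable):
evolution$date <- as.Date(evolution$date, "%Y-%m-%d")
xtable(evolution)  # the wrapper coerces the Date column, so this now prints without error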

It does appear that xtable does not always play nicely with columns of class Date. (It does have zoo and ts methods, but those may not help if you have a single column of dates/times in a data frame, as coercion to zoo appears to alter the column names in the resulting table.) A few notes:
The error is actually being thrown by print.xtable (not xtable.data.frame), which is called by default in order to display the results of xtable in the console. So if you stored the result of xtable in a variable, you'd get no error, but the same error would pop up when you tried to print it.
Since you've wisely stored your dates in YYYY-MM-DD format, converting them to Date objects isn't actually necessary for ordered selections, since they sort properly as characters. So you could get away with simply keeping them as characters.
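For instance, with date left as character, the same range selection works lexicographically:
evolution$date >= "2010-02-01" & evolution$date <= "2010-03-30"
# FALSE TRUE FALSE -- the same rows as with Date objects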
In cases with more complex date/time objects you could do the subsetting first and then convert those columns to characters. Or create a wrapper for xtable.data.frame and add lines like these at the beginning,
dates <- sapply(x, function(y) inherits(y, "Date"))
x[dates] <- lapply(x[dates], as.character)
checking for class Date, or whatever class you're dealing with.
IMHO, xtable.data.frame should probably be checking for Dates, and possibly for other POSIX classes, and converting them to strings. This may be a simple change, and may be worth contacting the package author about.
Lastly, the semicolons as line terminators are not necessary. :) Habit from another language?

As the maintainer of xtable I would like to state what I see as the true position regarding dates in xtable.
This is not really a bug, but the absence of a feature you might think is desirable.
The problem is that xtable can only deal with three different classes of column: logical, character, and numeric. If you submit a table where the class of a column is Date, it cannot deal with it. The relevant code is the set of xtable methods, the most important of which are xtable.data.frame and xtable.matrix.
The first part of the code for those methods deals with checking the class of the columns being submitted so they can be treated appropriately.
It would be possible to add code to allow columns of class Date as well, but I am not willing to do that.
Firstly, there is an easy workaround (at least for straight R code; I can't say for Shiny applications), which is to change any Date column into a character column.
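For example (a minimal sketch using the data from the question):
evolution$date <- as.character(evolution$date)
xtable(evolution)  # no error once the column is character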
Secondly, allowing columns of class Date would require adding an argument to xtable and the xtable methods (of which there are currently 31), as well as to xtableFtable and xtableList. That is fraught with problems because of the large number of reverse dependencies of xtable. (I haven't counted, but if you look at xtable on CRAN you will see a stack of depends, imports and suggests.) I am going to break some packages, maybe a lot of packages, if I make that sort of change. Backward compatibility is a serious problem with xtable.
Why is an extra argument necessary? Because the end result of using xtable, or more to the point print.xtable, is a string of characters. How the columns of the data frame, matrix or other structure submitted to xtable are treated is determined firstly by how they are classified (logical, character, or numeric), and then by the arguments align, digits and display, which can all be vectors to allow different treatment of different columns. So if dates were to be allowed, an extra argument would be needed to specify how they should be formatted, because at some point they need to be converted to character to produce the final table output.
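To illustrate those per-column arguments, here is a sketch on the question's data, assuming date has already been converted to character (the first entry of each vector applies to the row names):
evolution$date <- as.character(evolution$date)
tab <- xtable(evolution,
              digits  = c(0, 0, 0, 0, 2, 2, 2),
              display = c("s", "d", "s", "s", "f", "f", "f"))
print(tab)  # amount, oldbalance and newbalance render with two decimals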

Same answer as above, but with sapply replaced by vapply, which is slightly safer. It creates a new function xtable2 so you can compare the output. I don't quite understand @David Scott's reluctance to put this idea in xtable.
library(xtable)
xtable2 <- function(x, ...) {
  # get the names of variables that are dates by inheritance
  datevars <- colnames(x)[vapply(x, function(y) {
    inherits(y, c("Date", "POSIXt", "POSIXct"))
  }, logical(1))]
  for (i in datevars) {
    x[, i] <- as.character(x[, i])
  }
  xtable::xtable(x, ...)
}
Example:
> str(dat)
'data.frame': 200 obs. of 9 variables:
$ x5 : num 0.686 0.227 -1.762 0.963 -0.863 ...
$ x4 : num 1 3 3 4 4 4 4 5 6 1 ...
$ x3 : Ord.factor w/ 3 levels "med"<"lo"<"hi": 3 2 2 2 3 3 2 1 3 3 ...
$ x2 : chr "d" "c" "b" "d" ...
$ x1 : Factor w/ 5 levels "bobby","cindy",..: 3 2 4 2 3 5 2 2 5 5 ...
$ x7 : Ord.factor w/ 5 levels "a"<"b"<"c"<"d"<..: 4 2 2 2 4 5 4 5 5 4 ...
$ x6 : int 5 4 2 3 4 1 4 3 4 2 ...
$ date1: Date, format: "2020-03-04" "1999-01-01" ...
$ date2: POSIXct, format: "2020-03-04" "2005-04-04" ...
> xtable2(dat)
% latex table generated in R 4.0.3 by xtable 1.8-4 package
% Wed Dec 9 08:59:07 2020
\begin{table}[ht]
\centering
\begin{tabular}{rrrllllrll}
\hline
& x5 & x4 & x3 & x2 & x1 & x7 & x6 & date1 & date2 \\
\hline
1 & 0.69 & 1.00 & hi & d & greg & d & 5 & 2020-03-04 & 2020-03-04 \\
2 & 0.23 & 3.00 & lo & c & cindy & b & 4 & 1999-01-01 & 2005-04-04 \\
3 & -1.76 & 3.00 & lo & b & marcia & b & 2 & 2020-03-04 & 2020-03-04 \\
4 & 0.96 & 4.00 & lo & d & cindy & b & 3 & 2020-03-04 & 2020-03-04 \\
5 & -0.86 & 4.00 & hi & d & greg & d & 4 & 2005-04-04 & 2005-04-04 \\
6 & -0.30 & 4.00 & hi & b & peter & f & 1 & 2005-04-04 & 2020-03-04 \\
7 & -1.39 & 4.00 & lo & c & cindy & d & 4 & 1999-01-01 & 2005-04-04 \\
8 & -1.71 & 5.00 & med & f & cindy & f & 3 & 2005-04-04 & 2020-03-04 \\
[snip]
\hline
\end{tabular}
\end{table}

Assign one of three values at random to a raster cell given constraints

I'm working on some post-processing of a very large raster (437,760,000 cells), using other raster layers of the same extent/CRS as constraints. The code is working for the most part, but I'm running into an issue.
r1[r2 == 6 & r3 >= 40 & r3 <= 60] <- sample(2:4, length(r1[r2 == 6 & r3 >= 40 & r3 <= 60]), replace = TRUE)
Where r1, r2, and r3 are unique raster layers. r1 is being updated based on the constraints with the aim to improve the map.
This code executes with no issues but throws the following warning upon completion:
Warning message:
In .local(x, i, j = j, ..., value) :
the first replacement value is used for all cells
I want to ensure that all three values are being picked at random (and eventually I want to use the prob argument in sample to weight one of the values). I've tried numerous fixes and they all throw the same warning message, which I take to mean that only one of the three values is being applied across the raster. I am working with terra.
Any thoughts? Thanks!
Here is a reproducible example for your problem:
library(terra)
set.seed(123)
r1 <- rast(matrix(round(runif(400, 0, 100)), 20, 20))
plot(r1)
r2 <- rast(matrix(round(runif(400, 0, 10)), 20, 20))
r3 <- rast(matrix(round(runif(400, 30, 70)), 20, 20))
Even though I wasn't able to reproduce your warning, I think your problem is in your interpretation of this call: r2 == 6 & r3 >= 40 & r3 <= 60. This line produces a raster:
r2 == 6 & r3 >= 40 & r3 <= 60
class : SpatRaster
dimensions : 20, 20, 1 (nrow, ncol, nlyr)
resolution : 0.05, 0.05 (x, y)
extent : 0, 1, 0, 1 (xmin, xmax, ymin, ymax)
coord. ref. :
source : memory
name : lyr.1
min value : 0
max value : 1
And therefore the call r1[r2 == 6 & r3 >= 40 & r3 <= 60] produces a data.frame:
str(r1[r2 == 6 & r3 >= 40 & r3 <= 60])
'data.frame': 17 obs. of 1 variable:
$ lyr.1: num 2 2 2 2 2 2 2 2 2 2 ...
You don't want that, because the length of a one-column data.frame is 1 and because you can't do value substitution with a data.frame.
Try this instead:
pixel_to_change <- values(r2 == 6 & r3 >= 40 & r3 <= 60) == 1
r1[pixel_to_change] <- sample(2:4, sum(pixel_to_change), replace = TRUE)
It may be what you're looking for.
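To check that all three values were actually drawn (the original concern), one can tabulate the replaced cells, e.g.:
table(r1[pixel_to_change])  # counts of 2, 3 and 4 among the modified cells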
Here are two alternative (but similar) solutions to Bastien's. They avoid using values, which can be problematic with very large datasets.
library(terra)
set.seed(123)
r1 <- rast(matrix(round(runif(400, 0, 100)), 20, 20))
r2 <- rast(matrix(round(runif(400, 0, 10)), 20, 20))
r3 <- rast(matrix(round(runif(400, 30, 70)), 20, 20))
#1, use lapp
x <- c(r1, r2, r3)
z <- lapp(x, function(r1, r2, r3) {
  i <- r2 == 6 & r3 >= 40 & r3 <= 60
  r1[i] <- sample(2:4, sum(i), replace = TRUE)
  r1
})
#2, use global (instead of length, take the global sum of the raster with TRUE/FALSE values)
i <- r2 == 6 & r3 >= 40 & r3 <= 60
n <- unlist(global(i, "sum"))
r1[i] <- sample(2:4, n, replace = TRUE)

dbGetQuery select all existing columns

I have a vector of column names that I would like to select from the databases. If a column is missing, I want to select all of the columns that do exist. But I am not sure how to specify this in my query.
For example, to select column drat I specify "SELECT drat FROM mtcars". Let's say my column names are drat and colMissing.
My query "SELECT drat, colMissing FROM mtcars" does not work, failing with Error: no such column: colMissing.
However, I still want drat exported. How can I make sure that all existing columns will be exported and non-existing ones skipped? In my real data I have a long vector of column names and many databases, so I want to do it automatically.
Dummy example:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)
dbGetQuery(con, "SELECT * FROM mtcars") # select all columns
dbGetQuery(con, "SELECT drat, wt, disp, colMissing FROM mtcars", n = 6) # does not work, as it contains a non-existing column name. How to export only the existing ones?
I don't think SQL gives you an easy way to dynamically set the columns to select in this fashion. I think the easiest way to do this type of filtering is to determine the available columns dynamically and create the query programmatically.
cols <- c("drat", "wt", "disp", "colMissing")
cols_to_select <- intersect(dbListFields(con, "mtcars"), cols)
cols_to_select
# [1] "disp" "drat" "wt"
qry <- paste("select", paste(dbQuoteIdentifier(con, cols_to_select), collapse = ","), "from mtcars")
qry
# [1] "select `disp`,`drat`,`wt` from mtcars"
head(dbGetQuery(con, qry))
# disp drat wt
# 1 160 3.90 2.620
# 2 160 3.90 2.875
# 3 108 3.85 2.320
# 4 258 3.08 3.215
# 5 360 3.15 3.440
# 6 225 2.76 3.460
I'm taking deliberate steps here to mitigate the risk of inadvertent SQL injection that comes with paste-ing a query together. It is feasible that the column names of an existing frame could be rather stupidly malicious. (And no, I don't think the risk of such names is real; this type of mistake is much more likely to create a syntax error.)
someframe <- data.frame(a=1,b=2)
names(someframe)[1] <- "Robert');DROP TABLE Students;--"
qry <- paste("select", paste(names(someframe), collapse = ","), "from mtcars")
qry
# [1] "select Robert');DROP TABLE Students;--,b from mtcars"
Okay, so that won't work here (despite https://xkcd.com/327/), but ... be careful when forming a query dynamically. dbQuoteIdentifier is one function with the intent of mitigating this risk. With comparison data (e.g., WHERE cyl > 5), it is much better to use parameter-binding (i.e., WHERE cyl > ?); this doesn't work in the SELECT portion, however, so caveat emptor.
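For example, a minimal parameter-binding sketch with DBI/RSQLite (? is the placeholder syntax RSQLite understands):
dbGetQuery(con, "SELECT drat, wt FROM mtcars WHERE cyl > ?", params = list(5))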
As an aside ... I believe SQL-injection discussions normally focus on the parameters (within the WHERE clause) of the query, not on the fields to be selected. However, it is feasible to make this happen with field names, though it requires knowing the target table name in the injection. (I'm using SQL Server below.)
DBI::dbWriteTable(con, "#r2mt", mtcars[1:2,])
DBI::dbGetQuery(con, "select * from #r2mt")
# row_names mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# 2 Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
names(someframe)[1] <- 'cyl" from #r2mt;DROP TABLE #r2mt;--'
qry <- paste("select", paste(dQuote(names(someframe)), collapse = ", "), "from #r2mt")
qry
# [1] "select \"cyl\" from #r2mt;DROP TABLE #r2mt;--\", \"b\" from #r2mt"
DBI::dbGetQuery(con, qry)
# cyl
# 1 6
# 2 6
DBI::dbGetQuery(con, "select * from #r2mt")
# Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid object name '#r2mt'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'select * from #r2mt'
I should note that while dQuote did not protect against this, dbQuoteIdentifier did:
DBI::dbWriteTable(con, "#r2mt", mtcars[1:2,])
qry <- paste("select", paste(DBI::dbQuoteIdentifier(con, names(someframe)), collapse = ", "), "from #r2mt")
qry
# [1] "select \"cyl\"\" from #r2mt;DROP TABLE #r2mt;--\", \"b\" from #r2mt"
DBI::dbGetQuery(con, "select * from #r2mt")
# row_names mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# 2 Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
DBI::dbGetQuery(con, qry)
# Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'cyl" from #r2mt;DROP TABLE #r2mt;--'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'b'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'select "cyl"" from #r2mt;DROP TABLE #r2mt;--", "b" from #r2mt'
Where the clear difference in qry is shown here:
# [1] "select \"cyl\" from #r2mt;DROP TABLE #r2mt;--\", \"b\" from #r2mt"
# [1] "select \"cyl\"\" from #r2mt;DROP TABLE #r2mt;--\", \"b\" from #r2mt"
I was unable to defeat dbQuoteIdentifier in order to stop the escaping of " in this use.

Boolean function simplifier?

x = (a & b & d) | ~(a | ~b | c) | (~c & ~d & a) | (c & d)
~ = not
& = and
| = or
How do I simplify a function like this? Where should I start?
I've tried some simplifying programs, but I don't understand them.
You should write out a truth table for the variables involved and the eventual output.
Then, for each of the rows in the truth table that turn out to be true, you write a logic equation based upon the variables' states to reproduce that logic "one", usually an AND function of the appropriate inputs and inverse inputs.
Say only 3 of the rows have a true or logic one output.
That would mean you'd have three logic equations.
You would complete the job by connecting those three equations together with OR operators.
By looking at the truth table, you might be able to notice that the output of the logical true lines do not depend on all of the variables. This is one way of simplifying the expression.
Solving an equation similar to the one you put above
(a & b & d) | (~a | b | ~c) | (~c & ~d & a) | (c & d)
I get the following result
x = 1 except for one case, i.e., (a b c d) = (1 0 1 0), in which case it is zero.
Thus x = ~( a & ~b & c & ~d) or x = ~a | b | ~c | d
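One way to check this claim is brute force; a quick R sketch enumerating the full truth table:
tt <- expand.grid(a = c(FALSE, TRUE), b = c(FALSE, TRUE),
                  c = c(FALSE, TRUE), d = c(FALSE, TRUE))
x <- with(tt, (a & b & d) | (!a | b | !c) | (!c & !d & a) | (c & d))
tt[!x, ]                             # the single FALSE row: a, c TRUE; b, d FALSE
all(x == with(tt, !a | b | !c | d))  # TRUE: the simplified form agrees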
How to do this?
To make it easier to do this, you can rewrite your equation as
x = A | B | C | D, where
A = (a & b & d)
B = (~a | b | ~c)
C = (~c & ~d & a)
D = (c & d)
Variable B = 1 for all but two sets of inputs (abcd), namely (1010) and (1011).
Variable A = 1 for only two input sets, which B already covers.
Similarly with variable C.
Variable D = 1 for one of the two input sets that B left uncovered, namely (1011).
Thus x = 0 only when the inputs are exactly a=1, b=0, c=1, d=0. The conjunction (a & ~b & c & ~d) is true (=1) precisely for those inputs, so we write x as its negation:
x = ~(a & ~b & c & ~d) or x = ~a | b | ~c | d
So that is one way of simplifying. I'll add a second technique in a separate answer.
Sorry it took so long to spell it out, but perhaps others will find it useful.
The original equation of the OP is fairly well simplified as it stands. Its truth table has nearly equal numbers of T and F entries, and thus doesn't lend itself well to a demonstration of the technique. One could rewrite it as
x = (a & b & d) | (~a & b & ~c) | (a & ~c & ~d) | (c & d)
which is fairly compact but could be written slightly differently combining the 1st and last terms and the middle two terms:
x = ((a & b | c) & d) | ((~a & b | a & ~d) & ~c)
see 2nd proposed answer below for a further explanation
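As a sanity check, the two rewritten forms can be compared over the full truth table (reusing tt from the sketch above; note that & binds tighter than | in R, matching the grouping here):
f1 <- with(tt, (a & b & d) | (!a & b & !c) | (a & !c & !d) | (c & d))
f2 <- with(tt, ((a & b | c) & d) | ((!a & b | a & !d) & !c))
all(f1 == f2)  # TRUE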

MySql Query Table of Masks

I have a table that is filled with a variety of "masks", such as this:
Type Mask1 Mask2 Mask3
0 fff fff ff
1 aff fff ff
2 aff fff 92
3 001 fff 00
And basically I want to query the database and see if a particular value matches, say a00-111-12. Anywhere there is an f (this is all in hex) I want to treat it as a match. So the value a00-111-12 should match rows 0 and 1 but not 2 and 3: in row 0 all digits are f, so a value AND'd with them yields that same value. But plain AND-ing does not work, because when testing row 2, the Mask3 value 92 AND'd with 12 gives 12, yet I don't want that row to be a match.
I find this a difficult question to ask; it may not be possible with a few MySQL queries, but I want to avoid importing the entire table into PHP and then finding the correct rows from there.
An idea of a query would be:
SELECT * FROM TABLE WHERE Mask1 = a00 AND Mask2 = 111 AND ...
However some operation would need to be done on either Mask1, 2, 3 or the value being sent to the query.
The end goal is to get the Type from the matching rows. If you need more information please ask.
Create a submasks table to make your job easier, and add one row:
z1 : z2 : z3
0xf : 0xf0 : 0xf00
Then use the following query
Select
  t.*
from Table t
inner join submasks s
on (
  ((t.Mask1 & s.z1) = s.z1 || (t.Mask1 & s.z1) = (0xa00 & s.z1)) &&
  ((t.Mask1 & s.z2) = s.z2 || (t.Mask1 & s.z2) = (0xa00 & s.z2)) &&
  ((t.Mask1 & s.z3) = s.z3 || (t.Mask1 & s.z3) = (0xa00 & s.z3)) &&
  ((t.Mask2 & s.z1) = s.z1 || (t.Mask2 & s.z1) = (0x111 & s.z1)) &&
  ((t.Mask2 & s.z2) = s.z2 || (t.Mask2 & s.z2) = (0x111 & s.z2)) &&
  ((t.Mask2 & s.z3) = s.z3 || (t.Mask2 & s.z3) = (0x111 & s.z3)) &&
  ((t.Mask3 & s.z1) = s.z1 || (t.Mask3 & s.z1) = (0x12 & s.z1)) &&
  ((t.Mask3 & s.z2) = s.z2 || (t.Mask3 & s.z2) = (0x12 & s.z2))
)
The way this works is by comparing individual hex digits, performing a bitwise AND with z1, z2 and z3 to extract each of the 3 digits respectively.
so
<any value> & z1 sets all hex digits except the last to 0, i.e. 0x123 becomes 0x003
<any value> & z2 sets all hex digits except the second from last to 0, i.e. 0x123 becomes 0x020
<any value> & z3 sets all hex digits except the third from last to 0, i.e. 0x123 becomes 0x100
Using this filter the test for each digit can be built as
((mask & filter) = filter) // is the digit f
|| // OR
((mask & filter) = (test & filter)) // is the digit the same.
Repeat the test for each of z1, z2 and z3 (i.e. 0x00f, 0x0f0 and 0xf00), combine the results with an AND condition, and you can check that all 3 hex digits of the mask are either f or exactly match the test value.
This is then repeated for Mask2 and Mask3 (but with only z1 and z2 for Mask3, as it is 2 digits).
By using inner join with the submasks table the result will only include the values from Table where the mask conditions are true.
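The per-digit test is easy to sanity-check outside SQL. Here is a rough R sketch of the same logic using base R's bitwAnd (digit_matches is a made-up helper name):
digit_matches <- function(mask, value, filter) {
  # TRUE when the masked digit is 0xf (wildcard) or equals the value's digit
  bitwAnd(mask, filter) == filter |
    bitwAnd(mask, filter) == bitwAnd(value, filter)
}
all(digit_matches(0xaff, 0xa00, c(0x00f, 0x0f0, 0xf00)))  # TRUE: row 1 matches a00
digit_matches(0x001, 0xa00, 0xf00)                        # FALSE: row 3 fails on its first digit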
UPDATE: you may want to perform SELECT DISTINCT instead of just SELECT, as if two masks match a single row in Table then 2 results will be returned.
I don't know if I explained my question well enough, but I ended up coming to the conclusion that this works best:
val1 = 0xa00
val2 = 0x111
val3 = 0x12
SELECT * FROM TABLE WHERE
((Mask1 | val1) = val1 OR (Mask1 | val1) = 0xfff) AND
((Mask2 | val2) = val2 OR (Mask2 | val2) = 0xfff) AND
((Mask3 | val3) = val3 OR (Mask3 | val3) = 0xff);
The only problem is that val1=a00 will not match Mask1=aff, although I would like it to. Still working on it...

Use R or mysql to calculate time period returns?

I'm trying to calculate various time period returns (monthly, quarterly, yearly etc.) for each unique member (identified by Code in the example below) of a data set. The data set will contain monthly pricing information for a 20 year period for approximately 500 stocks. An example of the data is below:
Date Code Price Dividend
1 2005-01-31 xyz 1000.00 20.0
2 2005-01-31 abc 1.00 0.1
3 2005-02-28 xyz 1030.00 20.0
4 2005-02-28 abc 1.01 0.1
5 2005-03-31 xyz 1071.20 20.0
6 2005-03-31 abc 1.03 0.1
7 2005-04-30 xyz 1124.76 20.0
I am fairly new to R, but thought there would be a more efficient solution than looping through each Code and then each Date, as shown here:
uniqueDates <- unique(data$Date)
uniqueCodes <- unique(data$Code)
for (date in as.list(uniqueDates)) {  # as.list() keeps the Date class in the loop
  for (code in uniqueCodes) {
    nextDate <- seq.Date(from = date, by = "3 months", length.out = 2)[2]
    curPrice <- data$Price[(data$Date == date) & (data$Code == code)]
    futPrice <- data$Price[(data$Date == nextDate) & (data$Code == code)]
    data$ret[(data$Date == date) & (data$Code == code)] <- (futPrice / curPrice) - 1
  }
}
This method in itself has an issue in that seq.Date does not always return the final day in the month.
Unfortunately the data is not uniform (the number of companies/codes varies over time) so using a simple row offset won't work. The calculation must match the Code and Date with the desired date offset.
I had initially tried selecting the future dates by using the seq.Date function
data$ret = (data[(data$Date == (seq.Date(from = data$Date, by="3 month", length.out=2)[2])), "Price"] / data$Price) - 1
But this generated an error as seq.Date requires a single entry.
> Error in seq.Date(from = stock_data$Date, by = "3 month", length.out =
> 2) : 'from' must be of length 1
I thought that R would be well suited to this type of calculation, but perhaps not. Since all the data is in a MySQL database, I am now thinking that it might be faster/easier to do this calculation directly in the database.
Any suggestions would be greatly appreciated.
Load data:
tc='
Date Code Price Dividend
2005-01-31 xyz 1000.00 20.0
2005-01-31 abc 1.00 0.1
2005-02-28 xyz 1030.00 20.0
2005-02-28 abc 1.01 0.1
2005-03-31 xyz 1071.20 20.0
2005-03-31 abc 1.03 0.1
2005-04-30 xyz 1124.76 20.0'
df = read.table(text=tc,header=T)
df$Date=as.Date(df$Date,"%Y-%m-%d")
First I would organize the data by date:
pp1 <- reshape(df, timevar = 'Code', idvar = 'Date', direction = 'wide')
Then you would like to obtain monthly, quarterly, yearly, etc. returns.
For that there are several options; one could be:
Make the data zoo or xts class, i.e.
library(xts)
pp1[2:ncol(pp1)] <- as.xts(pp1[2:ncol(pp1)], order.by = pp1$Date)
# let's create a function for calculating (log) returns
rets <- function(x, lag = 1) {
  diff(log(x), lag)
}
Since this data is monthly, the lags for the returns will be: monthly = 1, quarterly = 3, yearly = 12. For instance, let's calculate the monthly return for xyz.
lagged <- 1  # for monthly
This calculates monthly returns for xyz:
pp1$returns_xyz <- c(NA, rets(pp1$Price.xyz, lagged))
To get all the returns:
# create matrix of returns
pricelist <- ls(pp1)[grep('Price', ls(pp1))]
returnsmatrix <- data.frame(matrix(rep(0, (nrow(pp1) - 1) * length(pricelist)),
                                   ncol = length(pricelist)))
j <- 1
for (i in pricelist) {
  n <- which(names(pp1) == i)
  returnsmatrix[, j] <- rets(pp1[, n], 1)
  j <- j + 1
}
# column names
codename <- gsub("Price.", "", pricelist, fixed = TRUE)
names(returnsmatrix) <- paste('ret', codename, sep = '.')
returnsmatrix
You can do this very easily with the quantmod and xts packages. Using the data in AndresT's answer:
library(quantmod) # loads xts too
pp1 <- reshape(df,timevar='Code',idvar='Date',direction='wide')
# create an xts object
x <- xts(pp1[,-1], pp1[,1])
# only get the "Price.*" columns
p <- getPrice(x)
# run the periodReturn function on each column
r <- apply(p, 2, periodReturn, period="monthly", type="log")
# merge prior result into a multi-column object
r <- do.call(merge, r)
# rename columns
names(r) <- paste("monthly.return",
                  sapply(strsplit(names(p), "\\."), "[", 2), sep = ".")
Which leaves you with an xts object r containing:
monthly.return.xyz monthly.return.abc
2005-01-31 0.00000000 0.000000000
2005-02-28 0.02955880 0.009950331
2005-03-31 0.03922071 0.019608471
2005-04-30 0.04879016 NA
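The other horizons the question asks about are the same call with a different period (a sketch reusing p from above; periodReturn also accepts "quarterly" and "yearly"):
rq <- do.call(merge, apply(p, 2, periodReturn, period = "quarterly", type = "log"))
ry <- do.call(merge, apply(p, 2, periodReturn, period = "yearly", type = "log"))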