I have a vector of column names that I would like to select from several databases. If a column is missing, I want to select all of the columns that do exist. But I am not sure how to specify this in my query.
For example, to select column drat I specify "SELECT drat FROM mtcars". Let's say my column names are drat and colMissing.
My query "SELECT drat, colMissing FROM mtcars" does not work, failing with Error: no such column: colMissing.
However, I still want drat exported. How can I make sure that all existing columns are exported and the non-existing ones skipped? In my real data I have a long vector of column names and many databases, so I want to do this automatically.
Dummy example:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)
dbGetQuery(con, "SELECT * FROM mtcars") # select all columns
dbGetQuery(con, "SELECT drat, wt, disp, colMissing FROM mtcars", n = 6) # does not work as contains non existing columns name. How to export only existing ones?
I don't think SQL gives you an easy way to dynamically set the columns to select in this fashion. I think the easiest way to do this type of filtering is to determine the columns to select dynamically and create the query programmatically.
cols <- c("drat", "wt", "disp", "colMissing")
cols_to_select <- intersect(dbListFields(con, "mtcars"), cols)
cols_to_select
# [1] "disp" "drat" "wt"
qry <- paste("select", paste(dbQuoteIdentifier(con, cols_to_select), collapse = ","), "from mtcars")
qry
# [1] "select `disp`,`drat`,`wt` from mtcars"
head(dbGetQuery(con, qry))
# disp drat wt
# 1 160 3.90 2.620
# 2 160 3.90 2.875
# 3 108 3.85 2.320
# 4 258 3.08 3.215
# 5 360 3.15 3.440
# 6 225 2.76 3.460
I'm taking deliberate steps here to mitigate the risk of inadvertent SQL injection that comes with paste-ing a query together. It is feasible that the column names of an existing frame could be rather stupidly malicious. (And no, I don't think the risk from these names is real; this type of mistake is much more likely to create a syntax error.)
someframe <- data.frame(a=1,b=2)
names(someframe)[1] <- "Robert');DROP TABLE Students;--"
qry <- paste("select", paste(names(someframe), collapse = ","), "from mtcars")
qry
# [1] "select Robert');DROP TABLE Students;--,b from mtcars"
Okay, so that won't work here (despite https://xkcd.com/327/), but ... be careful when forming a query dynamically. dbQuoteIdentifier is one function intended to mitigate this risk. With comparison data (e.g., WHERE cyl > 5), it is much better to use parameter binding (i.e., WHERE cyl > ?); that doesn't work in the SELECT portion, however, so caveat emptor.
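For illustration, a minimal sketch of parameter binding for the comparison case, reusing the in-memory SQLite connection from above:
dbGetQuery(con, "SELECT drat, wt FROM mtcars WHERE cyl > ?", params = list(5))
# returns only the rows with cyl of 6 or 8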
As an aside ... I believe SQL-injection discussions normally focus on the parameters (within the WHERE clause) of the query, not on the fields to be selected. However, it is feasible to make this happen with field names, though it requires knowing the target table name in the injection. (I'm using SQL Server below.)
DBI::dbWriteTable(con, "#r2mt", mtcars[1:2,])
DBI::dbGetQuery(con, "select * from #r2mt")
# row_names mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# 2 Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
names(someframe)[1] <- 'cyl" from #r2mt;DROP TABLE #r2mt;--'
qry <- paste("select", paste(dQuote(names(someframe)), collapse = ", "), "from #r2mt")
qry
# [1] "select \"cyl\" from #r2mt;DROP TABLE #r2mt;--\", \"b\" from #r2mt"
DBI::dbGetQuery(con, qry)
# cyl
# 1 6
# 2 6
DBI::dbGetQuery(con, "select * from #r2mt")
# Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid object name '#r2mt'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'select * from #r2mt'
I should note that while dQuote did not protect against this, dbQuoteIdentifier did:
DBI::dbWriteTable(con, "#r2mt", mtcars[1:2,])
qry <- paste("select", paste(DBI::dbQuoteIdentifier(con, names(someframe)), collapse = ", "), "from #r2mt")
qry
# [1] "select \"cyl\"\" from #r2mt;DROP TABLE #r2mt;--\", \"b\" from #r2mt"
DBI::dbGetQuery(con, "select * from #r2mt")
# row_names mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# 2 Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
DBI::dbGetQuery(con, qry)
# Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'cyl" from #r2mt;DROP TABLE #r2mt;--'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'b'. [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Statement(s) could not be prepared.
# <SQL> 'select "cyl"" from #r2mt;DROP TABLE #r2mt;--", "b" from #r2mt'
Where the clear difference in qry is shown here:
# [1] "select \"cyl\" from #r2mt;DROP TABLE #r2mt;--\", \"b\" from #r2mt"
# [1] "select \"cyl\"\" from #r2mt;DROP TABLE #r2mt;--\", \"b\" from #r2mt"
I was unable to defeat dbQuoteIdentifier in order to stop the escaping of " in this use.
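As a quick sanity check (same connection assumed), the escaping is easy to see in isolation: the embedded double-quote is doubled, so it cannot terminate the identifier early:
DBI::dbQuoteIdentifier(con, 'cyl" from #r2mt;DROP TABLE #r2mt;--')
# <SQL> "cyl"" from #r2mt;DROP TABLE #r2mt;--"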
I have the following data:
transaction <- c(1,2,3);
date <- c("2010-01-31","2010-02-28","2010-03-31");
type <- c("debit", "debit", "credit");
amount <- c(-500, -1000.97, 12500.81);
oldbalance <- c(5000, 4500, 17000.81)
evolution <- data.frame(transaction, date, type, amount, oldbalance, row.names=transaction, stringsAsFactors=FALSE);
evolution <- transform(evolution, newbalance = oldbalance + amount);
evolution
Running
> library(xtable)
> xtable(evolution)
works fine. But if I add the line
evolution$date <- as.Date(evolution$date, "%Y-%m-%d");
to give
transaction <- c(1,2,3);
date <- c("2010-01-31","2010-02-28","2010-03-31");
type <- c("debit", "debit", "credit");
amount <- c(-500, -1000.97, 12500.81);
oldbalance <- c(5000, 4500, 17000.81)
evolution <- data.frame(transaction, date, type, amount, oldbalance, row.names=transaction, stringsAsFactors=FALSE);
evolution$date <- as.Date(evolution$date, "%Y-%m-%d");
evolution <- transform(evolution, newbalance = oldbalance + amount);
evolution
then running xtable gives
xtable(evolution)
Error in Math.Date(x + ifelse(x == 0, 1, 0)) :
abs not defined for Date objects
But it can be useful to use xtable in such a case, e.g. after doing some filtering on dates:
evolution$date <- as.Date(evolution$date, "%Y-%m-%d")
startdate <-as.Date("2010-02-01");
enddate <-as.Date("2010-03-30");
newdate <-evolution[which (evolution$date >= startdate & evolution$date <= enddate),]
newdate
> newdate
transaction date type amount oldbalance newbalance
2 2 2010-02-28 debit -1000.97 4500 3499.03
> xtable(newdate)
Error in Math.Date(x + ifelse(x == 0, 1, 0)) :
abs not defined for Date objects
This is arguably a bug in xtable - you may want to report it to the maintainer.
A temporary work-around is to call as.character() on the classes that xtable misinterprets (apart from "Date" I can think of "POSIXt" but there may be others), e.g.:
xtable <- function(x, ...) {
  # coerce any Date/POSIXt columns to character before calling the real xtable
  for (i in which(sapply(x, function(y) inherits(y, c("POSIXt", "Date")))))
    x[[i]] <- as.character(x[[i]])
  xtable::xtable(x, ...)
}
It does appear that xtable does not always play nicely with columns of class Date. (It does have zoo and ts methods, but those may not help if you have a single column of dates/times in a data frame, as coercion to zoo appears to alter the column names in the resulting table.) A few notes:
The error is actually being thrown by print.xtable (not xtable.data.frame), which is called by default in order to display the results of xtable in the console. So if you stored the result of xtable in a variable, you'd get no error, but the same error would pop up when you tried to print it.
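A minimal illustration of that point with the question's data:
tab <- xtable::xtable(newdate)  # no error at this point
tab                             # auto-printing calls print.xtable, which throws the Math.Date error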
Since you've wisely stored your dates in YYYY-MM-DD format, converting them to Date objects actually isn't necessary to use ordered selections, since they will sort properly as characters. So you could actually get away with simply keeping them as characters.
In cases with more complex date/time objects you could do the subsetting first and then convert those columns to characters. Or create a wrapper for xtable.data.frame and add the lines at the beginning,
dates <- sapply(x, FUN = function(col) inherits(col, "Date"))
x[, dates] <- as.character(x[, dates])
checking for class Date, or whatever class you're dealing with.
IMHO, xtable.data.frame should probably be checking for Dates, and possibly for other POSIX classes, and converting them to strings. This may be a simple change, and may be worth contacting the package author about.
Lastly, the semicolons as line terminators are not necessary. :) Habit from another language?
As the maintainer of xtable I would like to state what I see as the true position regarding dates in xtable.
This is not really a bug, but the absence of a feature you might think is desirable.
The problem is that xtable can only deal with three different classes of columns: logical, character, and numeric. If you try to submit a table where the class of a column is Date, it cannot deal with it. The relevant code is the set of xtable methods, the most important of which are xtable.data.frame and xtable.matrix.
The first part of the code for those methods deals with checking the class of the columns being submitted so they can be treated appropriately.
It would be possible to add code to allow columns of class Date as well, but I am not willing to do that.
Firstly, there is an easy workaround (at least for straight R code; I can't say for Shiny applications), which is to change any Date column into a character column:
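For example, with the evolution data frame from the question:
evolution$date <- as.character(evolution$date)
xtable(evolution)  # no Date column left, so xtable proceeds normally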
Secondly, allowing columns of class Date would require the addition of an argument to xtable and the xtable methods (of which there are currently 31), as well as to xtableFtable and xtableList. That is fraught with problems because of the large number of reverse dependencies of xtable. (I haven't counted, but if you look at xtable on CRAN you will see a stack of depends, imports and suggests.) I am going to break some packages, maybe a lot of packages, if I make that sort of change. Backward compatibility is a serious problem with xtable.
Why is an extra argument necessary? Because the end result of using xtable, or more to the point print.xtable, is a string of characters. How the columns of the data frame, matrix or other structure submitted to xtable are treated is determined firstly by how they are classified (logical, character, or numeric), and then by the arguments align, digits and display, which can all be vectors to allow different treatment of different columns. So if dates were to be allowed, you would need an extra argument to specify how they should be formatted, because at some point they need to be converted to character to produce the final table output.
Same answer as above, but replacing sapply with vapply, which is slightly safer. This creates a new function xtable2 so you can compare the output. I don't quite understand @David Scott's reluctance to put this idea in xtable.
library(xtable)
xtable2 <- function(x, ...) {
# get the names of variables that are dates by inheritance
datevars <- colnames(x)[vapply(x, function(y) {
inherits(y, c("Date", "POSIXt", "POSIXct"))
}, logical(1))]
for (i in datevars){
x[ , i] <- as.character(x[, i])
}
xtable::xtable(x, ...)
}
example
> str(dat)
'data.frame': 200 obs. of 9 variables:
$ x5 : num 0.686 0.227 -1.762 0.963 -0.863 ...
$ x4 : num 1 3 3 4 4 4 4 5 6 1 ...
$ x3 : Ord.factor w/ 3 levels "med"<"lo"<"hi": 3 2 2 2 3 3 2 1 3 3 ...
$ x2 : chr "d" "c" "b" "d" ...
$ x1 : Factor w/ 5 levels "bobby","cindy",..: 3 2 4 2 3 5 2 2 5 5 ...
$ x7 : Ord.factor w/ 5 levels "a"<"b"<"c"<"d"<..: 4 2 2 2 4 5 4 5 5 4 ...
$ x6 : int 5 4 2 3 4 1 4 3 4 2 ...
$ date1: Date, format: "2020-03-04" "1999-01-01" ...
$ date2: POSIXct, format: "2020-03-04" "2005-04-04" ...
> xtable2(dat)
% latex table generated in R 4.0.3 by xtable 1.8-4 package
% Wed Dec 9 08:59:07 2020
\begin{table}[ht]
\centering
\begin{tabular}{rrrllllrll}
\hline
& x5 & x4 & x3 & x2 & x1 & x7 & x6 & date1 & date2 \\
\hline
1 & 0.69 & 1.00 & hi & d & greg & d & 5 & 2020-03-04 & 2020-03-04 \\
2 & 0.23 & 3.00 & lo & c & cindy & b & 4 & 1999-01-01 & 2005-04-04 \\
3 & -1.76 & 3.00 & lo & b & marcia & b & 2 & 2020-03-04 & 2020-03-04 \\
4 & 0.96 & 4.00 & lo & d & cindy & b & 3 & 2020-03-04 & 2020-03-04 \\
5 & -0.86 & 4.00 & hi & d & greg & d & 4 & 2005-04-04 & 2005-04-04 \\
6 & -0.30 & 4.00 & hi & b & peter & f & 1 & 2005-04-04 & 2020-03-04 \\
7 & -1.39 & 4.00 & lo & c & cindy & d & 4 & 1999-01-01 & 2005-04-04 \\
8 & -1.71 & 5.00 & med & f & cindy & f & 3 & 2005-04-04 & 2020-03-04 \\
[snip]
\hline
\end{tabular}
\end{table}
I'm having the hardest time generating confidence intervals for my glmer poisson model. After following several very helpful tutorials (such as https://drewtyre.rbind.io/classes/nres803/week_12/lab_12/) as well as stackoverflow posts, I keep getting very strange results, i.e. the upper and lower limits of the CI are identical.
Here is a reproducible example containing a response variable called "production," a fixed effect called "Treatment_Num" and a random effect called "Genotype":
df1 <- data.frame(production=c(15,12,10,9,6,8,9,5,3,3,2,1,0,0,0,0), Treatment_Num=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), Genotype=c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2))
#run the glmer model
library(lme4)
df1_glmer <- glmer(production ~ Treatment_Num + (1|Genotype),
                   data = df1, family = poisson(link = "log"))
#make an empty data set to predict from, that contains the explanatory variables but no response
require(magrittr)
df_empty <- df1 %>%
tidyr::expand(Treatment_Num, Genotype)
#create new column containing predictions
df_empty$PopPred <- predict(df1_glmer, newdata = df_empty, type="response",re.form = ~0)
#function for bootMer
myFunc_df1_glmer <- function(mm) {
predict(df1_glmer, newdata = df_empty, type="response",re.form=~0)
}
#run bootMer (lme4 is already loaded above)
merBoot_df1_glmer <- bootMer(df1_glmer, myFunc_df1_glmer, nsim = 10)
#get confidence intervals out of it
predCL <- t(apply(merBoot_df1_glmer$t, MARGIN = 2, FUN = quantile, probs = c(0.025, 0.975)))
#enter lower and upper limits of confidence interval into df_empty
df_empty$lci <- predCL[, 1]
df_empty$uci <- predCL[, 2]
#when viewing df_empty the problem becomes clear: the lci and uci are identical!
df_empty
Any insights you can give me will be much appreciated!
The issue is with the function you created to pass to bootMer(). You wrote:
myFunc_df1_glmer <- function(mm) {
predict(df1_glmer, newdata = df_empty, type="response",re.form=~0)
}
The argument mm should be a fitted model object derived from the bootstrapped data. However, you don't pass this object to predict(), but rather the original model object. If you change the function to:
myFunc_df1_glmer <- function(mm) {
predict(mm, newdata = df_empty, type="response",re.form=~0)
#^^ pass in the object created by bootMer
}
then it works:
> df_empty
# A tibble: 8 x 5
Treatment_Num Genotype PopPred lci uci
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 12.9 9.63 15.7
2 1 2 12.9 9.63 15.7
3 2 1 5.09 3.87 5.89
4 2 2 5.09 3.87 5.89
5 3 1 2.01 1.20 2.46
6 3 2 2.01 1.20 2.46
7 4 1 0.796 0.361 1.14
8 4 2 0.796 0.361 1.14
As an aside -- how many genotypes are in your actual data? If fewer than 5-7, you might do better using a straight-up glm() with genotype as a factor using sum-to-zero contrasts.
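For reference, a hedged sketch of that alternative using the df1 from the question (sum-to-zero contrasts on Genotype; confint() gives profile intervals on the link scale):
df1$Genotype <- factor(df1$Genotype)
contrasts(df1$Genotype) <- contr.sum(nlevels(df1$Genotype))
df1_glm <- glm(production ~ Treatment_Num + Genotype,
               data = df1, family = poisson(link = "log"))
confint(df1_glm)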
I have a SQLite database which I query using the RSQLite package for R. I have a categorical vector I would like to filter my query by such that my query would look something like this:
dbGetQuery(mydb,
'select PLT_CN, INVYR
from GRM
where ESTN_TYPE = "AL"')
This would normally work fine, and return all data where the level of ESTN_TYPE is AL.
HOWEVER.
It does not do this. This is because, within the .csv file in which the data are stored, the value AL is actually entered as "AL". So when I query for AL, my query returns zero rows. How can I fix this?
(Thanks to @Parfait for making me realize this was my real problem in a previous question.)
RSQLite
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
df <- data.frame(a=1:5, b=sprintf('"%s"', letters[1:5]), stringsAsFactors=F)
df
# a b
# 1 1 "a"
# 2 2 "b"
# 3 3 "c"
# 4 4 "d"
# 5 5 "e"
dbWriteTable(con, "tbl", df)
# [1] TRUE
dbGetQuery(con, 'select * from tbl')
# a b
# 1 1 "a"
# 2 2 "b"
# 3 3 "c"
# 4 4 "d"
# 5 5 "e"
dbGetQuery(con, 'select * from tbl where b="a"')
# [1] a b
# <0 rows> (or 0-length row.names)
Using parameterized queries is generally a good thing anyway, so two-birds-one-stone, so to speak:
dbGetQuery(con, 'select * from tbl where b=:x', params=list(x='"a"'))
# a b
# 1 1 "a"
dbGetQuery(con, 'select * from tbl where b in (:x)', params=list(x=c('"a"','"c"')))
# a b
# 1 1 "a"
# 2 3 "c"
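Alternatively, if you would rather repair the data once than quote in every query, SQLite's two-argument trim(X, Y) can strip the stray quotes in place (a sketch against the demo table above):
dbExecute(con, "UPDATE tbl SET b = trim(b, '\"')")
dbGetQuery(con, "select * from tbl where b = 'a'")  # now matches the unquoted value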
RMySQL
(I don't have an instance of mysql handy, so this is a guess.)
Use #x instead of :x:
dbGetQuery(con, 'select * from tbl where b=#x', params=list(x='"a"'))
I am using the code below, after creating my connection "mydb" to my MySQL server, to import the data into R, and it's working fine.
my_data <- dbReadTable(mydb, "ar_data")
But I don't want to import or read the whole table; I just want to skip the first 5 columns. How can I do that?
Maybe try dbSendQuery:
library(DBI)
library(RMySQL)
drv <- dbDriver("MySQL")
con <- dbConnect (drv, dbname="mydb", user="username")
dbWriteTable(con, "mtcars", mtcars)
dbReadTable(con, "mtcars") # full table
sql <- paste0("SELECT ", paste(dbListFields(con, "mtcars")[-(1:5)], collapse=","), " FROM mtcars LIMIT 5")
res <- dbSendQuery(con, sql)
dbFetch(res)
# drat wt qsec vs am gear carb
# 1 3.90 2.620 16.46 0 1 4 4
# 2 3.90 2.875 17.02 0 1 4 4
# 3 3.85 2.320 18.61 1 1 4 1
# 4 3.08 3.215 19.44 1 0 3 1
# 5 3.15 3.440 17.02 0 0 3 2
dbClearResult(res)
res <- dbSendQuery(con, 'DROP TABLE mtcars')
dbDisconnect(con)
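As an aside, dbGetQuery() is documented as a convenience wrapper around the dbSendQuery()/dbFetch()/dbClearResult() sequence, so those three calls can be collapsed into one:
dbGetQuery(con, sql)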
OK, to set the scene, I have written a function to import multiple tables from MySQL (using RODBC) and run randomForest() on them.
This function is run on multiple databases (as separate instances).
In one particular database, and one particular table, the "error in as.POSIXlt.character(x, tz,.....): character string not in a standard unambiguous format" error is thrown. The function runs on around 150 tables across two databases without any issues except this one table.
Here is a head() print from the table:
MQLTime bar5 bar4 bar3 bar2 bar1 pat1 baXRC
1 2014-11-05 23:35:00 184 24 8 24 67 147 Flat
2 2014-11-05 23:57:00 203 184 204 67 51 147 Flat
3 2014-11-06 00:40:00 179 309 49 189 75 19 Flat
4 2014-11-06 00:46:00 28 192 60 49 152 147 Flat
5 2014-11-06 01:20:00 309 48 9 11 24 19 Flat
6 2014-11-06 01:31:00 24 177 64 152 188 19 Flat
And here is the function:
GenerateRF <- function(db, countstable, RFcutoff) {
  # load required libraries
  library(RODBC)
  library(randomForest)
  library(caret)
  library(ff)
  library(stringi)
  # connection and data preparation
  connection <- odbcConnect('TTODBC', uid='root', pwd='password', case="nochange")
  # import count table and check if RF is allowed to be built
  query.str <- paste0('select * from ', db, '.', countstable, ' order by RowCount asc')
  row.counts <- sqlQuery(connection, query.str)
  # operate only on tables that have >= RFcutoff rows
  for (i in 1:nrow(row.counts)) {
    table.name <- as.character(row.counts[i, 1])
    col.count <- as.numeric(row.counts[i, 2])
    row.count <- as.numeric(row.counts[i, 3])
    if (row.count >= 20) {
      # delete old RFs and DFs for input pattern
      if (file.exists(paste0(table.name, '_RF.Rdata'))) {
        file.remove(paste0(table.name, '_RF.Rdata'))
      }
      if (file.exists(paste0(table.name, '_DF.Rdata'))) {
        file.remove(paste0(table.name, '_DF.Rdata'))
      }
      # import and clean data
      query.str2 <- paste0('select * from ', db, '.', table.name, ' order by mqltime asc')
      raw.data <- sqlQuery(connection, query.str2)
      # partition data into training/test sets
      set.seed(489)
      index <- createDataPartition(raw.data$baXRC, p=0.66, list=FALSE, times=1)
      data.train <- raw.data[index, ]
      data.test <- raw.data[-index, ]
      # find optimal trees to grow (without outcome and dates)
      data.mtry <- as.data.frame(tuneRF(data.train[, c(-1, -col.count)], data.train$baXRC, ntreetry=100,
                                        stepFactor=.5, improve=0.01, trace=TRUE, plot=TRUE, dobest=FALSE))
      best.mtry <- data.mtry[which(data.mtry[, 2] == min(data.mtry[, 2])), 1]
      # compress df
      data.ff <- as.ffdf(data.train)
      # run RF. Originally set to 1000 trees but the M1 dataset is too large for the laptop. Maybe train at the lab?
      data.rf <- randomForest(baXRC ~ ., data=data.ff[, -1], mtry=best.mtry, ntree=500, keep.forest=TRUE,
                              importance=TRUE, proximity=FALSE)
      # generate and print variable importance plot
      varImpPlot(data.rf, main = table.name)
      # predict on test data
      data.test.pred <- as.data.frame(predict(data.rf, data.test, type="prob"))
      # get dates and name date column
      data.test.dates <- data.frame(data.test[, 1])
      colnames(data.test.dates) <- 'MQLTime'
      # attach dates to prediction df
      data.test.res <- cbind(data.test.dates, data.test.pred)
      # force date coercion to attempt negating the unambiguous-format error
      data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")
      # delete row names, coerce to data frame, generate outcome table name and export outcomes to MySQL
      rownames(data.test.res) <- NULL
      data.test.res <- as.data.frame(data.test.res)
      root.table <- stri_sub(table.name, 0, -5)
      sqlUpdate(connection, data.test.res, tablename = paste0(db, '.', root.table, '_outcome'), index = "MQLTime")
      # save RF and test df/s for future use; save latest version of row_counts to MQL4 folder
      save(data.rf, file = paste0("C:/Users/user/Documents/RF_test2/", table.name, '_RF.Rdata'))
      save(data.test, file = paste0("C:/Users/user/Documents/RF_test2/", table.name, '_DF.Rdata'))
      write.table(row.counts, paste0("C:/Users/user/AppData/Roaming/MetaQuotes/Terminal/71FA4710ABEFC21F77A62A104A956F23/MQL4/Files/", db, "_m1_rowcounts.csv"), sep = ",", col.names = F,
                  row.names = F, quote = F)
    } # end of conditional block
  } # end of for loop
  # close all connections to MySQL
  odbcCloseAll()
  # clear workspace
  rm(list = ls())
}
At this line:
data.test.res$MQLTime <- format(data.test.res$MQLTime, format = "%Y-%m-%d %H:%M:%S")
I have tried coercing MQLTime using various functions including: as.character(), as.POSIXct(), as.POSIXlt(), as.Date(), format(), as.character(as.Date())
and have also tried:
"%y" vs "%Y" and "%OS" vs "%S"
All variants seem to have no effect on the error and the function is still able to run on all other tables. I have checked the table manually (which contains almost 1500 rows) and also in MySQL looking for NULL dates or dates like "0000-00-00 00:00:00".
Also, if I run the function line by line in the R terminal, this offending table is processed without any problems, which just confuses the hell out of me.
I've exhausted all the functions/solutions I can think of (and also all those I could find through Dr. Google) so I am pleading for help here.
I should probably mention that the MQLTime column is stored as varchar() in MySQL. This was done to try to get around issues with type conversions between R and MySQL.
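(For what it's worth, a hedged diagnostic sketch: parsing the imported column with an explicit format turns malformed values into NA instead of throwing, which makes the bad rows easy to list:)
bad <- is.na(as.POSIXct(as.character(raw.data$MQLTime),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
raw.data[bad, ]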
SHOW VARIABLES LIKE "%version%";
innodb_version, 5.6.19
protocol_version, 10
slave_type_conversions,
version, 5.6.19
version_comment, MySQL Community Server (GPL)
version_compile_machine, x86
version_compile_os, Win32
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
Edit: str() output on the data as imported from MySQL, showing MQLTime is already in POSIXct format:
> str(raw.data)
'data.frame': 1472 obs. of 8 variables:
$ MQLTime: POSIXct, format: "2014-11-05 23:35:00" "2014-11-05 23:57:00" "2014-11-06 00:40:00" "2014-11-06 00:46:00" ...
$ bar5 : int 184 203 179 28 309 24 156 48 309 437 ...
$ bar4 : int 24 184 309 192 48 177 48 68 60 71 ...
$ bar3 : int 8 204 49 60 9 64 68 27 192 147 ...
$ bar2 : int 24 67 189 49 11 152 27 56 437 67 ...
$ bar1 : int 67 51 75 152 24 188 56 147 71 0 ...
$ pat1 : int 147 147 19 147 19 19 147 19 147 19 ...
$ baXRC : Factor w/ 3 levels "Down","Flat",..: 2 2 2 2 2 2 2 2 2 3 ...
So I have tried declaring stringsAsFactors = FALSE in the data frame operations, and this had no effect.
Interestingly, if the offending table is removed from processing through an additional conditional statement in the first 'if' block, the function stops on the table immediately preceding the blocked table.
If both the original and the new offending tables are removed from processing, then the function stops on the table immediately prior to them. I have never seen this sort of behavior before and it really has me stumped.
I watched system resources during the function and they never seem to max out.
Could this be a problem with the 'for' loop and not necessarily date formats?
There appears to be some egg on my face. The table following the table where the function was stopping had a row with the value '0000-00-00 00:00:00'. I added another statement to my MySQL function to remove these rows when pre-processing the tables. Thanks to those who had a look at this.
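For anyone hitting the same wall, a hedged sketch of excluding such rows at query time instead, reusing the question's own query string (column name assumed to be MQLTime):
query.str2 <- paste0('select * from ', db, '.', table.name,
                     " where MQLTime <> '0000-00-00 00:00:00' order by mqltime asc")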