Function where the user supplies a column name and a tbl

I have a function that previously worked, in which I iterated the same operations over a user-supplied vector of column names, but it no longer works. Here's a very simplified scenario: a user should be able to pass in a data.frame and a column name, and the function should return an operation done on that particular column. This was working about a month ago.
library(dplyr)

df <- data.frame(X1 = 1:10, Y1 = letters[1:10])
df

df %>%
  mutate(X2 = X1 * 2)

tb.test <- function(tbl, column) {
  tbl2 <- tbl %>%
    mutate(X2 = {{ column }} * 2)
  return(tbl2)
}

tb.test(df, X1)
This no longer works; it now fails with the error "column X1 not found".
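For reference, the {{ }} ("curly-curly") operator used above needs rlang >= 0.4.0 (installed alongside recent dplyr), so one plausible cause of the breakage, though only a guess from the error text, is an outdated or mismatched package install; updating dplyr and rlang is worth trying first. Below is a hedged sketch of a string-based variant (tb.test.chr is a hypothetical name) that sidesteps tidy evaluation via the .data pronoun:

library(dplyr)

# take the column name as a string and look it up in the data mask
tb.test.chr <- function(tbl, column) {
  tbl %>%
    mutate(X2 = .data[[column]] * 2)
}

tb.test.chr(df, "X1")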

Related

How can I use the glue function to carry out the following SQL query in R?

I tried to use the glue function to pass a list of event IDs into the CALL for the stored procedure, but that did not work. I get the following error:
Error: unexpected string constant in "df <- dbGetQuery(mydb, glue::gluesql('CALL reportProfitAndLossDetails(3,' ',{eventids_list},TRUE);'"
Any suggestions on passing the list into the call?
# query eventIds for MLB (433) and MLS (446) events and collapse them into a list
eventids <- dbGetQuery(mydb, 'SELECT id
                              FROM Event
                              WHERE date >= CURDATE()
                                AND (EventTypeId = 433 OR EventTypeId = 446)
                              ORDER BY date ASC;')
eventids_list <- paste0(eventids$id, collapse = ',')

# execute reportProfitAndLossDetails for the above eventids
# (this is the line that fails: the quotes are mismatched, and the
# function is spelled gluesql instead of glue_sql)
df <- dbGetQuery(mydb, glue::gluesql('CALL reportProfitAndLossDetails(3,' ',{eventids_list},TRUE);'))
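A hedged sketch of a fix: the function in glue is glue_sql(), and it needs the connection via .con so values are quoted safely; the literal single quotes nested inside the single-quoted string are what broke the parse. This assumes the stored procedure expects the ids as one comma-separated string:

library(DBI)
library(glue)

# glue_sql() quotes {eventids_list} as a single SQL string literal,
# e.g. '101,102,103' (assumes the procedure takes the ids that way)
query <- glue_sql(
  "CALL reportProfitAndLossDetails(3, {eventids_list}, TRUE);",
  .con = mydb
)
df <- dbGetQuery(mydb, query)

If the procedure instead takes the ids as separate arguments, glue_sql's collapse form, {ids*} with ids = eventids$id, expands a vector into a comma-separated list.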

How to append a new table to the original table without overwriting it?

I want to append a new table to the original (existing) table without overwriting the original table. What query does this?
I have tried the following code.
## choose one document from the vector of file names
file <- x[j]
# read the csv for this file
doc <- read.csv(file, sep = ";")
# number of rows in this doc
n <- nrow(doc)
# create a data frame for the doc
df <- data.frame(doc_id = numeric(n), doc_name = character(n),
                 doc_number = character(n), stringsAsFactors = FALSE)
# loop to fill df
for (k in 1:nrow(doc)) {
  df$doc_id[k]     <- paste0(df$doc_id[k]) # note: this reads from df itself, not doc
  df$doc_name[k]   <- paste0(doc$titles[k])
  df$doc_number[k] <- paste0(doc$no[k])
}
# query for inserting a row into my_sql_table
query1 <- sprintf('INSERT IGNORE INTO my_sql_table VALUES ("%s","%s","%s");',
                  df[k, 1], df[k, 2], df[k, 3])
# query for appending df to the doc table
query2 <- sqlAppendTable(con, "doc", df)
# execute the queries
dbExecute(con, query1)
dbExecute(con, query2)
print(j)
} ## end of the for loop over j
I also tried the following queries for appending; unfortunately, they didn't work either.
INSERT IGNORE INTO my_sql_table VALUES ("%s","%s");
INSERT IGNORE INTO tableBackup(SELECT * FROM my_sql_table);
I expect the new table to be appended to the original table without deleting or overwriting the original.
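For what it's worth, a minimal sketch of the idiomatic DBI route: dbWriteTable() with append = TRUE adds the rows of a data frame to an existing table without dropping or overwriting it. This assumes con is an open connection and my_sql_table already exists with columns matching df:

library(DBI)

# append df's rows; append = TRUE means the existing table is kept
dbWriteTable(con, "my_sql_table", df, append = TRUE, row.names = FALSE)

# equivalent two-step version: build the INSERT, then execute it
query <- sqlAppendTable(con, "my_sql_table", df, row.names = FALSE)
dbExecute(con, query)

DBI >= 1.0 also offers dbAppendTable(con, "my_sql_table", df), which does the same in one call.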

R and R-SQL API to execute SQL query

For an assignment, I need to "use SQL to extract all tweets in twitter message-table under those 3 user ids in the previous step." I am currently stuck on grabbing the tweet info from MySQL using the vector x in R.
I keep getting this error message: "Error in .local(conn, statement, ...) : unused argument (c(18949452, 34713362, 477583514))".
# use SQL to get a list of unique user ids in the twitter message table
# as a vector in R
res <- dbSendQuery(con, statement = "select user_id from twitter_message")
user_id <- dbFetch(res)
user_id
nrow(user_id)

# randomly select: use R to generate 3 random user ids
x <- user_id[sample(nrow(user_id), 3, replace = FALSE, prob = NULL), ]
x

# this is the call that errors: x is passed as an extra argument
# instead of being built into the statement string
res2 <- dbSendQuery(con, statement = 'SELECT twitter_message WHERE user_id =', x)
tweets <- dbFetch(res2)
tweets
x is a vector, so maybe you should use dbSendQuery in a loop: for each element of x, paste its value into your statement. Does that make sense?
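Alternatively, a hedged sketch with an IN clause, which fetches all three users in one statement (this assumes user_id is numeric; character ids would need quoting):

# collapse the sampled ids into "id1, id2, id3" and query once
ids <- paste(x, collapse = ", ")
res2 <- dbSendQuery(con, paste0(
  "SELECT * FROM twitter_message WHERE user_id IN (", ids, ")"))
tweets <- dbFetch(res2)
dbClearResult(res2) # release the result set when done
tweets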

R loop to query all tables within a database

I am new to R and have a database, DatabaseX, which contains multiple tables (a, b, c, d, etc.). I want to use R to count the number of rows with the common attribute message_id in each of these tables and store the counts separately.
I can count message_id across all tables using the code below:
list <- dbListTables(con)
# print the list of tables for testing
print(list)
for (i in 1:length(list)) {
  query <- paste("SELECT COUNT(message_id) FROM ", list[i], sep = "")
  t <- dbGetQuery(con, query)
}
print(t)
This prints:
##   COUNT(message_id)
## 1             21519
but I want to keep a record of count(message_id) for each individual table, so that, for example, table a = 200, b = 300, c = 500, and so on.
Any suggestions on how to do this?
As #kaiten65 suggested, one option is to create a helper function that executes the COUNT query. Outside of your loop I have defined a numeric vector, counts, which stores the number of records for each table in your database. You can then perform descriptive stats on the tables alongside this vector of record counts.
doCountQuery <- function(con, table) {
  query <- paste("SELECT COUNT(message_id) FROM ", table, sep = "")
  t <- dbGetQuery(con, query)
  return(t)
}

list <- dbListTables(con)
counts <- numeric(0) # this will store the counts for all tables
for (i in 1:length(list)) {
  count <- doCountQuery(con, list[i])
  counts[i] <- count[[1]]
}
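The loop can also be collapsed into a single vapply() call that returns a named vector (table name mapped to its count), which keeps each count labelled; a minimal sketch, assuming every table really has a message_id column:

tables <- dbListTables(con)
# one COUNT query per table; the result's names are the table names
counts <- vapply(tables, function(tbl) {
  as.numeric(dbGetQuery(con, paste0("SELECT COUNT(message_id) FROM ", tbl))[[1]])
}, numeric(1))
counts # e.g. a = 200, b = 300, c = 500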
You can use functions to achieve what you need. For instance:
readDB <- function(db, i) {
  query <- paste("SELECT COUNT(message_id) FROM ", db, sep = "")
  t <- dbGetQuery(con, query)
  return(print(paste("Table ", i, " count: ", t)))
}

list <- dbListTables(con)
for (i in 1:length(list)) {
  readDB(list[i], i)
}
This prints the count for each table in turn, and the actual work sits in a nicely editable function. Your output will be:
"Table 1 count: 2519"
"Table 2 count: ---- "
More information on R functions here: http://www.statmethods.net/management/userfunctions.html

"In is(object, Cl) : error while fetching rows" in R

I have a MySQL table I am attempting to access with R using RMySQL.
There are 1690004 rows that should be returned from
dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'")
Unfortunately, I receive the following warning messages:
In is(object, Cl) : error while fetching row
In dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'", : pending rows
and I only receive ~400K rows.
If I break the query into several fetches using dbSendQuery, the warnings start appearing after ~400K rows have been received.
Any help would be appreciated.
It turned out to be due to a 60-second timeout imposed by my hosting provider (damn Arvixe!). I got around this by paging/chunking the output. Because my data has an auto-incrementing primary key, every row is returned in order, which lets me take the next X rows after each iteration.
To get the 1.6M rows I did the following:
library(RMySQL)

con <- MySQLConnect() # mysql connection helper (user-defined)
day <- '2015-01-29'   # date of interest
numofids <- 50000     # number of rows to include in each 'chunk'

# get the number of rows the query returns
count <- dbGetQuery(con, paste0("SELECT COUNT(*) as count FROM tablename WHERE export_date = '", day, "'"))$count
dbDisconnect(con)

ns <- seq(1, count, numofids) # sequence of starting rows to work over
tosave <- data.frame()        # data frame to bind results to

# iterate through the table, fetching the data in 50k-row chunks
for (nextseries in ns) {
  print(nextseries) # print the row it's on
  con <- MySQLConnect()
  # LIMIT's offset is 0-based, hence nextseries - 1
  d1 <- dbGetQuery(con, paste0("SELECT * FROM tablename WHERE export_date = '", day,
                               "' LIMIT ", nextseries - 1, ",", numofids))
  dbDisconnect(con)
  # bind the chunk to tosave (the if/else avoids an error from rbind-ing
  # d1 to an empty data frame on the first pass)
  if (nrow(tosave) > 0) {
    tosave <- rbind(tosave, d1)
  } else {
    tosave <- d1
  }
}
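One further refinement, sketched under the same assumptions (including the user-defined MySQLConnect() helper): growing tosave with rbind() inside the loop copies the accumulated data frame on every pass, whereas collecting the chunks in a pre-allocated list and binding once at the end avoids that:

chunks <- vector("list", length(ns)) # one slot per chunk
for (i in seq_along(ns)) {
  con <- MySQLConnect()
  chunks[[i]] <- dbGetQuery(con, paste0(
    "SELECT * FROM tablename WHERE export_date = '", day,
    "' LIMIT ", ns[i] - 1, ",", numofids)) # 0-based offset
  dbDisconnect(con)
}
tosave <- do.call(rbind, chunks) # single bind at the end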