I have three variables a, b, c (actually more than 300 variables in my case)
t<-c(a,b,d)
a<-dbGetQuery(con, "SELECT * FROM a")
b<-dbGetQuery(con, "SELECT * FROM b")
d<-dbGetQuery(con, "SELECT * FROM d")
How can I make a loop to request data from MySQL in R? The existing question does not explain how to write the results into the variable names. I need a, b, c in my environment.
Not tested, but something like the following should work.
myTables <- c("a","b","c")
res <- lapply(myTables,
function(myTable){
sqlStatement <- paste("select * from",myTable)
dbGetQuery(con, sqlStatement)
})
names(res) <- myTables
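To get each result into the environment under its table name, which is what the question asks for, base R's list2env() can turn the named list into variables. A minimal sketch, using small stand-in data frames instead of real dbGetQuery() results so that it runs without a database:

```r
# Stand-in for the list returned by lapply() above; in practice each
# element would be the data frame returned by dbGetQuery().
myTables <- c("a", "b", "c")
res <- lapply(myTables, function(myTable) {
  data.frame(source_table = myTable, stringsAsFactors = FALSE)
})
names(res) <- myTables

# Create one variable per list element in the global environment.
list2env(res, envir = .GlobalEnv)

ls(pattern = "^[abc]$")
```

With 300 tables, keeping them in the list and using res[["a"]] is often easier to manage than 300 separate variables.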
I have a SQL database with one column of ID numbers and another column that holds the corresponding info for each ID. I have a vector containing the IDs whose info I want. How do I query only those specific IDs, get their corresponding information, and store it in a table?
I've tried for loops, filtering, and hard-coding it.
con <- dbConnect(RSQLite::SQLite(), "data.db")
df<- tbl(con,"kv")
newish <- data.frame(df)
filter(person %in% IDs) %>%
collect()
After connecting, I want to look up all the IDs in the given vector, extract the corresponding information, and store it in a table.
When I tried a for loop, the table did not contain all of the information, only that of the last ID in the vector. Filtering didn't work because it complained that the vector had to be of length one instead of 90,000. The expected result is a table that contains only the patient IDs and the corresponding information for the people in my vector.
This is a SQL problem that you can break down in R as follows:
# not run
# con <- dbConnect(RSQLite::SQLite(), "data.db")
ids <- paste0("id_", sample(1:100, 10))
id_var <- "id_var" # you can put more ids here with a corresponding data.frame in ids
vars <- paste0("var", 1:10)
db_name <- "mydatabase"
tab_name <- "mytable"
whereq <- paste("where", id_var, "in", paste0("('", paste0(ids, collapse = "', '"), "')") )
rqt <- paste("select", paste(vars, collapse = ", "),
"from", paste0(db_name, ".", tab_name), whereq, ";")
# check the query
rqt
#> [1] "select var1, var2, var3, var4, var5, var6, var7, var8, var9, var10 from mydatabase.mytable where id_var in ('id_99', 'id_1', 'id_72', 'id_86', 'id_65', 'id_59', 'id_67', 'id_4', 'id_82', 'id_2') ;"
# to uncomment
# res <- DBI::dbGetQuery(con, rqt)
# res <- tibble::as_tibble(res) # OR data.table::data.table(res) OR as.data.frame(res)
Created on 2019-06-11 by the reprex package (v0.2.1)
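With 90,000 IDs a single IN list can become very long, so one option is to build the same query for batches of IDs and bind the results. A sketch in base R; the table name kv and column person are taken from the question, while the batch size and the helper name build_in_queries are assumptions for illustration:

```r
# Hypothetical helper: build one "WHERE ... IN (...)" query per batch of IDs.
build_in_queries <- function(ids, table, id_col, batch_size = 1000) {
  # Double any single quotes so the IDs are safe inside SQL string literals.
  quoted <- paste0("'", gsub("'", "''", ids, fixed = TRUE), "'")
  batches <- split(quoted, ceiling(seq_along(quoted) / batch_size))
  unname(vapply(batches, function(b) {
    paste0("SELECT * FROM ", table, " WHERE ", id_col,
           " IN (", paste(b, collapse = ", "), ")")
  }, character(1)))
}

queries <- build_in_queries(c("id_1", "id_2", "id_3"), "kv", "person",
                            batch_size = 2)
queries

# With a live connection (not run here):
# res <- do.call(rbind, lapply(queries, function(q) DBI::dbGetQuery(con, q)))
```

Building the strings separately from running them makes it easy to inspect a query before sending it, as the answer above does with rqt.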
For an assignment, I need to "use SQL to extract all tweets in twitter message-table under those 3 user ids in the previous step." I am confused about how to grab the tweet info from MySQL using the vector x in R.
I keep getting this error message, "Error in .local(conn, statement, ...) :
unused argument (c(18949452, 34713362, 477583514))."
#use SQL to get a list of unique user id in twitter message table as a
#vector in R.
res <- dbSendQuery(con, statement = "select user_id from
twitter_message")
user_id <- dbFetch(res)
user_id
nrow(user_id)
#randomly selects : use R to randomly generate 3 user id
x <- user_id[sample(nrow(user_id), 3, replace = FALSE, prob = NULL),]
x
res2 = dbSendQuery(con, statement = 'SELECT twitter_message WHERE
user_id =',x)
tweets <- dbFetch(res2)
tweets
x is a vector, so you should probably call dbSendQuery in a loop: for each element of x, pass its value in your dbSendQuery statement. Does that make sense?
I am new to R and have a database, DatabaseX, which contains multiple tables a, b, c, d, etc. I want to use R to count the number of rows for a common attribute, message_id, in each of these tables and store the counts separately.
I can count message_id for all tables using the code below:
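The error arises because x is passed as a separate argument instead of being part of the SQL string. A sketch of building the statements first, either one per id for a loop or a single IN query; the table and column names come from the question, and the ids are the ones shown in the error message:

```r
x <- c(18949452, 34713362, 477583514)

# format() avoids scientific notation when large ids are turned into text.
ids <- format(x, scientific = FALSE, trim = TRUE)

# One statement per id, for use in a loop.
per_id <- paste0("SELECT * FROM twitter_message WHERE user_id = ", ids)

# Or a single statement covering all ids.
all_ids <- paste0("SELECT * FROM twitter_message WHERE user_id IN (",
                  paste(ids, collapse = ", "), ")")
all_ids

# With a live connection (not run here):
# tweets <- dbGetQuery(con, all_ids)
```

The single IN query needs only one round trip to the server, which is usually simpler than collecting results from a loop.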
list<-dbListTables(con)
# print list of tables for testing
print(list)
for (i in 1:length(list)){
query <- paste("SELECT COUNT(message_id) FROM ",list[i], sep = "")
t <- dbGetQuery(con,query)
}
print(t)
This prints :
### COUNT(message_id)
## 1 21519
But I want to keep a record of COUNT(message_id) for each individual table, for example table a = 200, b = 300, c = 500, etc.
Any suggestions on how to do this?
As @kaiten65 suggested, one option is to create a helper function that executes the COUNT query. Outside the loop I have defined a numeric vector, counts, which stores the number of records for each table in your database. You can then compute descriptive statistics on the tables using this vector of record counts.
doCountQuery <- function(con, table) {
query <- paste("SELECT COUNT(message_id) FROM ", table, sep = "")
t <- dbGetQuery(con, query)
return(t)
}
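list <- dbListTables(con)
counts <- numeric(0) # this will store the counts for all tables
for (i in seq_along(list)) { # seq_along() is safe even if there are no tables
count <- doCountQuery(con, list[i])
counts[i] <- count[[1]]
}
names(counts) <- list # label each count with its table name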
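The same idea can return a named vector directly, so each count is labelled with its table. A sketch in which the query runner is passed in as an argument (count_rows and fake_query are hypothetical names) so the logic can be tried without a database; with a real connection the runner would wrap dbGetQuery(con, query):

```r
# Hypothetical helper: run one COUNT query per table and name the results.
count_rows <- function(tables, run_query) {
  counts <- vapply(tables, function(tbl) {
    query <- paste0("SELECT COUNT(message_id) FROM ", tbl)
    as.numeric(run_query(query)[[1]])
  }, numeric(1))
  names(counts) <- tables
  counts
}

# Stand-in for the database: returns a one-cell data frame like dbGetQuery().
fake_query <- function(query) {
  fake_counts <- c(a = 200, b = 300, c = 500)
  tbl <- sub(".* FROM ", "", query)
  data.frame(n = fake_counts[[tbl]])
}

table_counts <- count_rows(c("a", "b", "c"), fake_query)
table_counts

# With a live connection (not run here):
# count_rows(dbListTables(con), function(q) dbGetQuery(con, q))
```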
You can use functions to achieve what you need. For instance:
readDB <- function(db, i){
query <- paste("SELECT COUNT(message_id) FROM ",db, sep = "")
t <- dbGetQuery(con,query) # uses the connection con from the calling environment
print(paste("Table ", i, " count:", t[[1]]))
}
list<-dbListTables(con)
for (i in seq_along(list)){
readDB(list[i], i) # pass both the table name and its index
}
This should print the count for each table in turn, with the actual code kept in a nice editable function. Your output will be
"Table 1 count: 2519"
"Table 2 count: ---- "
More information on R functions here: http://www.statmethods.net/management/userfunctions.html
I have a MySQL table I am attempting to access with R using RMySQL.
There are 1690004 rows that should be returned from
dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'")
Unfortunately, I receive the following warning messages:
In is(object, Cl) : error while fetching row
In dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'", : pending rows
And only receive ~400K rows.
If I break the query into several "fetches" using dbSendQuery, the warning messages start appearing after ~400K rows are received.
Any help would be appreciated.
So, it looks like it was due to a 60 second timeout imposed by my hosting provider (damn Arvixe!). I got around this by "paging/chunking" the output. Because my data has an auto-incrementing primary key, every row returned is in order, allowing me to take the next X rows after each iteration.
To get 1.6M rows I did the following:
library(RMySQL)
con <- MySQLConnect() # mysql connection function
day <- '2015-01-29' # date of interest
numofids <- 50000 # number of rows to include in each 'chunk'
count <- dbGetQuery(con, paste0("SELECT COUNT(*) as count FROM tablename WHERE export_date = '",day,"'"))$count # get the number of rows returned from the table.
dbDisconnect(con)
ns <- as.integer(seq(0, count - 1, numofids)) # chunk offsets; LIMIT offsets are zero-based, and integers avoid "1e+05" when pasted
tosave <- data.frame() # data frame to bind results to
# iterate through table to get data in 50k row chunks
for(nextseries in ns){ # for each chunk offset
print(nextseries) # print the offset it's on
con <- MySQLConnect()
d1 <- dbGetQuery(con, paste0("SELECT * FROM tablename WHERE export_date = '",day,"' LIMIT ", nextseries,",",numofids)) # extract data in chunks of 50k rows
dbDisconnect(con)
# bind data to the tosave data frame (the if/else avoids an error when it tries to rbind d1 to an empty data frame on the first pass).
if(nrow(tosave)>0){
tosave <- rbind(tosave, d1)
}else{
tosave <- d1
}
}
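MySQL's LIMIT offset, row_count counts offsets from zero, so the chunk offsets matter. A sketch of the offsets and statements such a loop would issue, using a made-up row count of 120,000 so the result is easy to check; the ORDER BY column id is an assumed name for the auto-incrementing primary key:

```r
day <- "2015-01-29"
numofids <- 50000L  # rows per chunk
count <- 120000     # made-up total; in practice this comes from SELECT COUNT(*)

# Zero-based offsets: 0, 50000, 100000
offsets <- seq(0, count - 1, by = numofids)

# 'id' is an assumed name for the primary key; ORDER BY keeps chunks stable.
# format() stops 100000 from being pasted as "1e+05".
queries <- paste0("SELECT * FROM tablename WHERE export_date = '", day,
                  "' ORDER BY id LIMIT ",
                  format(offsets, scientific = FALSE, trim = TRUE),
                  ", ", numofids)
queries
```

Collecting each chunk in a list and calling do.call(rbind, chunks) once at the end would also avoid growing the data frame on every pass.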
This seems ridiculous, but I just can't get this right - any help much appreciated please!
Basically: I'm using RMySQL to do some simple SQL, in order to get my head around how SQL works. I'd like to chain together a few SQL select queries, as a simple example. This is covered in the RMySQL PDF - but the example therein seems to be the incorrect syntax (http://cran.r-project.org/web/packages/RMySQL/RMySQL.pdf , page 3, example 6).
If I have three queries, say like this:
q1 <- "SELECT db.table FROM table WHERE stuff = 'blah' "
q2 <- "SELECT db.other_table FROM other_table WHERE stuff = 'different blah' "
q3 <- "SELECT db.table2 FROM table2 WHERE table2 = 1000"
and try to paste them as follows:
script <- paste(q1, q2, q3, sep=";")
the result is
> script
[1] "SELECT db.table FROM table WHERE stuff = 'blah' ;SELECT fb.other_table FROM
other_table WHERE stuff = 'different blah' ;SELECT db.table2 FROM table2 WHERE table2 =
'1000'
and so invoking dbSendQuery clearly fails.
I've tried \", but this also doesn't work:
q1 <- "SELECT db.table FROM table WHERE stuff = 'blah' \" "
q2 <- "SELECT db.other_table FROM other_table WHERE stuff = 'different blah' \""
q3 <- "SELECT db.table2 FROM table2 WHERE table2 = 1000 \" "
script <- paste(q1, q2, q3, sep=";")
> script
[1] "SELECT db.table FROM table WHERE stuff = 'blah' \" ; ;SELECT db.other_table FROM
other_table WHERE stuff = 'different blah' \";SELECT db.table2 FROM table2 WHERE table2
= 1000 \" "
Can anyone please point out what I'm doing wrong?
EDIT: just for clarification, executing this via RMySQL as follows:
my.queries <- dbGetQuery(my.con, script, client.flag = CLIENT_MULTI_STATEMENTS)
as per the RMySQL manual, I get
RS-DBI driver: (could not run statement: You have an error in your SQL syntax;
Presumably, this is because the result of the paste function should be:
"SELECT db.table FROM table WHERE stuff = 'blah'" ;"SELECT fb.other_table FROM
other_table WHERE stuff = 'different blah'" ;"SELECT db.table2 FROM table2 WHERE table2
= '1000'"
Each of the individual queries works just fine, so I'm assuming that it's my paste command that's causing the issue.
EDIT: to simplify this: suppose I have two strings, as follows:
t1 <- "the 'stuff'"
t2 <- "more 'stuff'"
paste(t1, t2, sep=";")
[1] "the 'stuff' ; more 'stuff' "
what I'd like is for the result of the paste command to be "the 'stuff'";"more 'stuff'".
You have to pass the argument client.flag = CLIENT_MULTI_STATEMENTS to the function dbConnect, not to dbGetQuery.
Then, your first approach should work:
q1 <- "SELECT db.table FROM table WHERE stuff = 'blah' "
q2 <- "SELECT db.other_table FROM other_table WHERE stuff = 'different blah' "
q3 <- "SELECT db.table2 FROM table2 WHERE table2 = 1000"
script <- paste(q1, q2, q3, sep=";")
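The double quotes R shows around the pasted result are only how print() displays a character string; they are not part of the string sent to MySQL, so no extra escaping is needed. A small sketch with the simplified strings from the question:

```r
t1 <- "the 'stuff'"
t2 <- "more 'stuff'"
script <- paste(t1, t2, sep = ";")

print(script)      # displayed with surrounding double quotes
cat(script, "\n")  # the actual characters in the string

# The string holds exactly the two pieces joined by one semicolon.
pieces <- strsplit(script, ";", fixed = TRUE)[[1]]
pieces
```

Adding \" inside the queries, as in the second attempt, puts literal double-quote characters into the SQL, which is what leads to a syntax error.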