I am new to R and have a database, DatabaseX, which contains multiple tables a, b, c, d, etc. I want to use R to count the number of rows for a common attribute, message_id, across all these tables and store each count separately.
I can count message_id for all tables using the code below:
list <- dbListTables(con)
# print list of tables for testing
print(list)
for (i in 1:length(list)) {
  query <- paste("SELECT COUNT(message_id) FROM ", list[i], sep = "")
  t <- dbGetQuery(con, query)
}
print(t)
This prints:
##   COUNT(message_id)
## 1             21519
but I want to keep a record of COUNT(message_id) for each individual table. So, for example, table a = 200, b = 300, c = 500, etc.
Any suggestions on how to do this?
As @kaiten65 suggested, one option would be to create a helper function that executes the COUNT query. Outside of your loop I have defined a numeric vector counts which will store the number of records for each table in your database. You can then perform descriptive stats on the tables along with this vector of record counts.
doCountQuery <- function(con, table) {
  query <- paste("SELECT COUNT(message_id) FROM ", table, sep = "")
  t <- dbGetQuery(con, query)
  return(t)
}

list <- dbListTables(con)
counts <- numeric(0) # this will store the counts for all tables
for (i in 1:length(list)) {
  count <- doCountQuery(con, list[i])
  counts[i] <- count[[1]]
}
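You can then label each count with its table name so the per-table totals stay together. A small sketch building on the counts vector above:
names(counts) <- list # name each count after its table, e.g. a = 200, b = 300
print(counts)

# or collect everything in a data frame for further analysis
result <- data.frame(table = list, count = counts, row.names = NULL)
print(result)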
You can use functions to achieve what you need. For instance:
readDB <- function(db, i) {
  query <- paste("SELECT COUNT(message_id) FROM ", db, sep = "")
  t <- dbGetQuery(con, query)
  return(print(paste0("Table ", i, " count: ", t[[1]])))
}

list <- dbListTables(con)
for (i in 1:length(list)) {
  readDB(list[i], i)
}
This should print the count for each table in turn, and the actual code is in a nice editable function. Your output will be
"Table 1 count: 2519"
"Table 2 count: ---- "
More information on R functions here: http://www.statmethods.net/management/userfunctions.html
I had a function that iterated the same operations over a user-supplied set of column names; it previously worked, but no longer does. Here's a very simplified scenario in which a user should be able to input a data.frame and a column name, and the function should return an operation done on that particular column. This was working about a month ago.
library(dplyr)

df = data.frame(X1 = c(1:10), Y1 = letters[1:10])
df

df %>%
  mutate(X2 = X1 * 2)

tb.test = function(tbl, column) {
  tbl2 = tbl %>%
    mutate(X2 = {{column}} * 2)
  return(tbl2)
}

tb.test(df, X1)
This no longer seems to be working; it fails with the error column X1 not found.
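For reference, a minimal sketch of an equivalent function that captures the column with rlang::ensym() and injects it with !! rather than {{ }}; this is just an alternative spelling of the same tidy-eval idea, assuming a reasonably current dplyr/rlang installation:
library(dplyr)
library(rlang)

tb.test2 <- function(tbl, column) {
  col <- ensym(column)           # capture the bare column name as a symbol
  tbl %>% mutate(X2 = !!col * 2) # inject it into the mutate() expression
}

tb.test2(df, X1)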
I want to append a new table to the original (existing) table without overwriting the original table. What is the query for appending the new table?
I have tried the following code.
for (j in seq_along(x)) { # outer loop over the file names in x (assumed; the opening line was missing from the snippet)
  ## choose one document from vector of strings
  file <- x[j]
  # read csv for file
  doc <- read.csv(file, sep = ";")
  # indicate number of rows for each doc
  n <- nrow(doc)
  # create dataframe for doc
  df <- data.frame(doc_id = numeric(n), doc_name = character(n), doc_number = character(n), stringsAsFactors = FALSE)
  # loop to fill df
  for (k in 1:nrow(doc)) {
    df$doc_id[k] <- paste0(df$doc_id[k])
    df$doc_name[k] <- paste0(doc$titles[k])
    df$doc_number[k] <- paste0(doc$no[k])
  }
  # query for inserting one row into my_sql_table (note: i is not defined in this loop)
  query1 = sprintf('INSERT IGNORE INTO my_sql_table VALUES ("%s","%s","%s");', df[i, 1], df[i, 2], df[i, 3])
  # query for appending the data frame to the doc table
  query2 = sqlAppendTable(con, "doc", df)
  # execute the queries
  dbExecute(con, query1)
  dbExecute(con, query2)
  print(j)
} ## end of for loop
I also tried the following queries for appending; unfortunately, they didn't work.
INSERT IGNORE INTO my_sql_table VALUES ("%s","%s");
INSERT IGNORE INTO tableBackup(SELECT * FROM my_sql_table);
I expect to have appended new_table to original_table without deleting or overwriting the original table.
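For what it's worth, DBI can also append a data frame to an existing table directly, without hand-built INSERT statements. A minimal sketch, assuming the connection con and the df built above, and that the column names already match the target table:
library(DBI)

# append = TRUE adds the rows; overwrite = FALSE keeps the original table intact
dbWriteTable(con, "my_sql_table", df, append = TRUE, overwrite = FALSE)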
I have a SQL database with one column of ID #'s and another column that holds the corresponding info for each ID. I have a vector containing the ID #'s whose corresponding info I want. How do I query only those specific IDs, together with their corresponding information, and store the result in a table?
I've tried for loops, filtering, and hard-coding it.
library(DBI)
library(dplyr)

con <- dbConnect(RSQLite::SQLite(), "data.db")
df <- tbl(con, "kv")
newish <- df %>%
  filter(person %in% IDs) %>%
  collect()
After connecting, I want to look up all the IDs in the given vector, extract their corresponding information, and store it in a table.
When I tried a for loop, the table contained only the information for the last ID in the vector rather than all of them. The filtering wouldn't work because it complained that the vector had to be of length one, whereas mine has 90,000 elements. The expected result is a table containing only the patient ID #'s in my vector and the corresponding information for those people.
This is a SQL problem that you can decompose with R as follows:
# not run
# con <- dbConnect(RSQLite::SQLite(), "data.db")

ids <- paste0("id_", sample(1:100, 10))
id_var <- "id_var" # you can put more ids here with a corresponding data.frame in ids
vars <- paste0("var", 1:10)
db_name <- "mydatabase"
tab_name <- "mytable"

whereq <- paste("where", id_var, "in", paste0("('", paste0(ids, collapse = "', '"), "')"))
rqt <- paste("select", paste(vars, collapse = ", "),
             "from", paste0(db_name, ".", tab_name), whereq, ";")

# check the query
rqt
#> [1] "select var1, var2, var3, var4, var5, var6, var7, var8, var9, var10 from mydatabase.mytable where id_var in ('id_99', 'id_1', 'id_72', 'id_86', 'id_65', 'id_59', 'id_67', 'id_4', 'id_82', 'id_2') ;"
# to uncomment
# res <- DBI::dbGetQuery(con, rqt)
# res <- tibble::as_tibble(res) # OR data.table::data.table(res) OR as.data.frame(res)
Created on 2019-06-11 by the reprex package (v0.2.1)
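As a side note, if the IDs come from user input you may prefer a parameterized query to string pasting. A sketch of the same lookup using DBI placeholders (one ? per ID; RSQLite accepts this syntax, other backends may use a different placeholder style):
library(DBI)

# one "?" placeholder per id, e.g. "?, ?, ?"
placeholders <- paste(rep("?", length(ids)), collapse = ", ")
sql <- paste0("SELECT * FROM mytable WHERE id_var IN (", placeholders, ")")

# each element of the params list binds to one placeholder, in order
res <- dbGetQuery(con, sql, params = as.list(ids))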
For an assignment, I need to "use SQL to extract all tweets in twitter message-table under those 3 user ids in the previous step." I am currently confused about grabbing the tweet info from MySQL using the vector, x, in R.
I keep getting this error message: "Error in .local(conn, statement, ...) : unused argument (c(18949452, 34713362, 477583514))."
# use SQL to get a list of unique user ids in the twitter message table
# as a vector in R
res <- dbSendQuery(con, statement = "select user_id from twitter_message")
user_id <- dbFetch(res)
user_id
nrow(user_id)

# randomly select: use R to randomly generate 3 user ids
x <- user_id[sample(nrow(user_id), 3, replace = FALSE, prob = NULL), ]
x

res2 = dbSendQuery(con, statement = 'SELECT twitter_message WHERE user_id =', x)
tweets <- dbFetch(res2)
tweets
x is a vector, so maybe you should use the dbSendQuery function in a loop: for each element in x, pass its value into your dbSendQuery statement. Does that make sense?
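Something along these lines, as a sketch; it assumes the tweets live in the twitter_message table, that user_id is numeric, and it adds the SELECT * FROM that the original statement was missing:
tweets <- data.frame()
for (id in x) {
  res2 <- dbSendQuery(con, paste0("SELECT * FROM twitter_message WHERE user_id = ", id))
  tweets <- rbind(tweets, dbFetch(res2)) # collect this user's tweets
  dbClearResult(res2) # release the result set before the next query
}
tweets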
I have a MySQL table I am attempting to access with R using RMySQL.
There are 1690004 rows that should be returned from
dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'")
Unfortunately, I receive the following warning messages:
In is(object, Cl) : error while fetching row
In dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'", : pending rows
And only receive ~400K rows.
If I break the query into several "fetches" using dbSendQuery, the warning messages start appearing after ~400K rows are received.
Any help would be appreciated.
So, it looks like it was due to a 60-second timeout imposed by my hosting provider (damn Arvixe!). I got around this by "paging/chunking" the output. Because my data has an auto-incrementing primary key, every row returned is in order, allowing me to take the next X rows after each iteration.
To get 1.6M rows I did the following:
library(RMySQL)

con <- MySQLConnect() # mysql connection function
day <- '2015-01-29' # date of interest
numofids <- 50000 # number of rows to include in each 'chunk'

# get the number of rows returned from the table
count <- dbGetQuery(con, paste0("SELECT COUNT(*) as count FROM tablename WHERE export_date = '", day, "'"))$count
dbDisconnect(con)

# sequence of offsets to work over
# (MySQL LIMIT offsets are 0-based, so start at 0 to include the first row)
ns <- seq(0, count - 1, numofids)
tosave <- data.frame() # data frame to bind results to

# iterate through the table to get the data in 50k-row chunks
for (nextseries in ns) { # for each offset
  print(nextseries) # print the offset it's on
  con <- MySQLConnect()
  # extract data in chunks of 50k rows
  d1 <- dbGetQuery(con, paste0("SELECT * FROM tablename WHERE export_date = '", day, "' LIMIT ", nextseries, ",", numofids))
  dbDisconnect(con)
  # bind data to the tosave data frame (the if/else avoids an error when
  # it tries to rbind d1 to an empty data frame on the first pass)
  if (nrow(tosave) > 0) {
    tosave <- rbind(tosave, d1)
  } else {
    tosave <- d1
  }
}
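A slightly tidier variant of the same loop (a sketch under the same assumptions about MySQLConnect() and the table) collects the chunks in a pre-allocated list and binds them once at the end, which avoids both the empty-data-frame special case and the repeated copying that each rbind incurs:
chunks <- vector("list", length(ns)) # one slot per 50k-row chunk
for (i in seq_along(ns)) {
  con <- MySQLConnect()
  chunks[[i]] <- dbGetQuery(con, paste0("SELECT * FROM tablename WHERE export_date = '", day, "' LIMIT ", ns[i], ",", numofids))
  dbDisconnect(con)
}
tosave <- do.call(rbind, chunks) # bind all chunks in a single pass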