How to join table using a pool connection? - mysql

I am trying to use pool to join two remote tables (City and Country) using below code:
pool <- dbPool(
  drv = RMySQL::MySQL(),
  dbname = "shinydemo",
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  username = "guest",
  password = "guest"
)
src_pool(pool) %>%
  tbl('City') %>%
  left_join('Country', by = c('CountryCode' = 'Code'))
But this is the error I get when I run the code:
Error: x and y don't share the same src.
Set copy = TRUE to copy y into x's source (this may be time consuming).
In addition: Warning message:
In force(expr) : You have a leaked pooled object. Destroying it.
Below a working example of the same query using dplyr:
srccon <- src_mysql(
  host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
  dbname = "shinydemo",
  user = "guest",
  password = "guest"
)

tbl(srccon, 'City') %>%
  left_join(tbl(srccon, 'Country'), by = c('CountryCode' = 'Code'))
And here is another working example using pool::dbGetQuery:
sql <- "SELECT * FROM City LEFT JOIN Country ON (CountryCode=Code)"
dbGetQuery(pool, sql)

I'm not familiar with pool, but the error message:
Error: x and y don't share the same src.
Set copy = TRUE to copy y into x's source (this may be time consuming).
is due to the fact that one of your tables is being manipulated in-database (I'd guess City) while the other ('Country') is in memory rather than in the database, though it could be the other way around. Here x and y are your two tables, and the error message is simply telling you that they are not both in the same place. Setting copy = TRUE (temporarily) copies the tibble in y to wherever x is stored (either into the database or into RAM) so that the join can be performed.
You could try either bringing both objects into memory using dplyr::collect(), which means the join happens in memory, or ensuring that both are in your database, in which case the join will occur there instead.
i.e.
# Keep both in RAM
City <- tbl(src_pool(pool), 'City') %>% collect()
Country <- tbl(src_pool(pool), 'Country') %>% collect()
joined <- City %>% left_join(Country, by = c('CountryCode' = 'Code'))
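Or, to keep the join in the database, point both tbls at the same source; a minimal sketch using the src_pool() helper from the question (both tables must come from one shared src object for dplyr to translate the join into SQL):
# Do the join in-database: both tbls share one src, so dplyr
# can translate the left_join into a single SQL query
src <- src_pool(pool)
joined <- tbl(src, 'City') %>%
  left_join(tbl(src, 'Country'), by = c('CountryCode' = 'Code'))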
I would guess that the warning about a leaked pooled object relates to the same issue (one table is in memory, one is in the database).

RMySQL: Error in as.character.default() :

I am trying to use the following function in R:
heritblup <- function(name) {
  library(RMySQL)
  library(DBI)
  con <- dbConnect(RMySQL::MySQL(),
                   dbname = "mydab",
                   host = "localhost",
                   port = 3306,
                   user = "root",
                   password = "")
  value1 <- 23
  rss <- paste0("INSERT INTO namestable
                 (myvalue, person)
                 VALUES ('$value1', '", name, "')")
  rs <<- dbGetQuery(con, rss)
}
heritblup("Tommy")
heritblup("Tommy")
But I keep getting this error:
Error in as.character.default() :
  no method for coercing this S4 class to a vector
Called from: as.character.default()
I tried to change the paste function to this:
rss<- paste0 ("INSERT INTO namestable
(myvalue, person)
VALUES ($value1, ",name,")")
The error persists; I have no idea what's wrong.
Please help.
There are a couple of issues in the code. I'm not sure whether the OP is attempting to insert records into the database or fetch them from it.
Based on the query, I'll assume the goal is to insert data into the database table.
The rule is that the query should be prepared in R exactly the way it will be executed in MySQL. Any value substitution must be performed in R, because the MySQL engine has no idea about variables in R.
Hence, the query preparation step should be done as:
rss <- sprintf("INSERT INTO namestable (myvalue, person) VALUES (%d, '%s')", value1, name)
# "INSERT INTO namestable (myvalue, person) VALUES (23, 'test')"
If inserting data is the goal, then dbGetQuery is not the right option; per the R documentation, dbSendStatement() should be used for data manipulation. The reference from the help page suggests:
However, callers are strongly encouraged to use dbSendStatement() for
data manipulation statements.
Based on that, the query execution lines should be:
rs <- dbSendStatement(con, rss)
ret_val <- dbGetRowsAffected(rs)
dbClearResult(rs)
dbDisconnect(con)
return(ret_val)
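As an aside, the same insert can be written with bound parameters so the driver handles quoting itself. A minimal sketch, assuming a DBI backend that supports dbBind(), e.g. RMariaDB (RMySQL's support here is limited):
# Placeholders (?) are filled in by the driver, so values never
# need to be pasted into the SQL string by hand
rs <- dbSendStatement(con, "INSERT INTO namestable (myvalue, person) VALUES (?, ?)")
dbBind(rs, list(value1, name))
ret_val <- dbGetRowsAffected(rs)
dbClearResult(rs)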

How to stop large float numbers from being output in exponential form (scientific notation) in R?

I'm working with ID numbers such as: "868130240945684480", which are stored into a MySQL database.
Whenever I make a query with RMySQL, the output is "8.681302e+17", which R then reads as 868130200000000000, a totally different ID.
Is there any way to avoid this?
Here is the query I make to retrieve my data:
require(RMySQL)
con <- dbConnect(MySQL(), user='xxx', password='xxx', dbname='xxx', host='xxx')
rs <- dbSendQuery(con, "select sprinklr_twitter.Tweet_Id from sprinklr_twitter
inner join twitter
on sprinklr_twitter.Tweet_id=twitter.Tweet_id")
Tweet_id<-fetch(rs, n=-1)
Tweet_id<-as.data.frame(Tweet_id)
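The root problem is that 868130240945684480 is larger than 2^53, the largest integer an R double can represent exactly, so the precision is lost the moment the value arrives in R; no printing option can restore it. A common workaround is to have MySQL return the ID as text; a minimal sketch of the same query with a cast:
# CAST makes MySQL hand the ID back as a string, preserving every digit
rs <- dbSendQuery(con, "select CAST(sprinklr_twitter.Tweet_Id AS CHAR) AS Tweet_Id
                        from sprinklr_twitter
                        inner join twitter
                        on sprinklr_twitter.Tweet_id = twitter.Tweet_id")
Tweet_id <- fetch(rs, n = -1)  # Tweet_Id is now a character column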

Multiple DB connection in R

I was wondering if someone could help with this annoying issue.
I'm trying to create multiple connections to different databases.
I have a data.frame named conf with 3 sets of connection credentials. It works if I manually enter the connection variables like so:
conn <- dbConnect(MySQL(), user=conf$user, password=conf$passws, host=conf$host, dbname=conf$db)
which ends up creating a single connection.
However, what I want is to be able to refer to the connection as:
conf$conn <- dbConnect(MySQL(), user=conf$user, password=conf$passws, host=conf$host, dbname=conf$db)
Here is the error message I'm getting:
Error in rep(value, length.out = nrows) :
attempt to replicate an object of type 'S4'
I think the problem is how I'm adding conf$conn.
I used a combination of the pool and config package to solve a similar problem to set up a number of simultaneous PostgreSQL connections. Note that this solution needs a config.yml file with the connection properties for db1 and db2.
library(pool)
library(RPostgreSQL)

connect <- function(cfg) {
  config <- config::get(config = cfg)
  dbPool(
    drv = dbDriver("PostgreSQL", max.con = 100),
    dbname = config$dbname,
    host = config$host,
    port = config$port,
    user = config$user,
    password = config$password
  )
}

conn <- lapply(c("db1", "db2"), connect)
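Each element of conn is then a pool that the usual DBI verbs work on directly. A minimal usage sketch; keeping the pools in a plain list rather than a data.frame column also sidesteps the "attempt to replicate an object of type 'S4'" error from the question, since data.frame columns cannot hold S4 connection objects:
# Run a query against every pool, then release all connections
results <- lapply(conn, dbGetQuery, statement = "SELECT 1")
lapply(conn, poolClose)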

How to extract create statements from different tables of MySQL DBs?

I would like to extract all CREATE statements from my 50 MySQL databases via SHOW CREATE TABLE db.table, e.g. SHOW CREATE TABLE db1.mytable or SHOW CREATE TABLE db2.sometable or SHOW CREATE TABLE db3.mytable1. Each of the DBs has some tables inside: db1 (table, mytable, ...), db2 (table1, sometable), and so on.
To illustrate the DBs via a example query:
SELECT *
FROM db.table1 m
LEFT JOIN db1.sometable o ON m.id = o.id
LEFT JOIN db2.sometables t ON p.id=t.id
LEFT JOIN db3.sometable s ON s.column='john'
library(RMySQL)
library(DBI)
con <- dbConnect(RMySQL::MySQL(),
                 username = "",
                 password = "",
                 host = "",
                 port = 3306,
                 dbname = mydbname) # when using dbs <- dbGetQuery(con, "SHOW DATABASES") I have to set dbname = mydbname to get all DBs
Using dbs <- dbGetQuery(con, "SHOW DATABASES") I can extract all 50 databases in the dbConnection as a character vector. I would like to loop over each DB in dbs and apply SHOW CREATE TABLE to each row/db. I suppose I have to parse each row/db into dbname = mydbname and dbs <- dbGetQuery(con, "SHOW CREATE TABLE"). But I just can't figure out how to write the loops.
I tried:
apply(dbs, 1, function(row) {
  dbname <- row[]
  for (i in 1:length(dbname)) {
    create <- dbGetQuery(con, "SHOW CREATE TABLE")
  }
})
But that doesn't seem right. I suppose I have to include con in the loop somehow; otherwise I'll get:
Error in .local(drv, ...) : object 'dbname' not found
So I tried:
apply(dbs, 1, function(row) {
  dbname <- row[]
  for (i in 1:length(dbname)) {
    con <- dbConnect(RMySQL::MySQL(),
                     username = "",
                     password = "",
                     host = "",
                     port = 3306,
                     dbname = [i])
    create <- dbGetQuery(con, "SHOW CREATE TABLE")
  }
})
I suppose this comes close to the solution, but I'm missing something:
dbs <- dbGetQuery(con, "show databases")
library(foreach)
foreach(i = 1:length(dbs)) %dopar% {
  query <- paste("SHOW CREATE TABLE", dbs[i])
  creates <- dbGetQuery(con, query)
}
Consider this approach: import a data frame of each database (leaving out the system ones, INFORMATION_SCHEMA and MYSQL) and its corresponding tables, then run SHOW CREATE TABLE statements, and finally merge the original data frame with the row-bound data frame of create statements.
The one caveat is table names that repeat across databases. To return distinct values of such combinations, aggregate() with the head function is used.
con <- dbConnect(RMySQL::MySQL(),
                 username = "****", password = "****",
                 host = "****", port = 3306,
                 dbname = "****")

dbtbls <- dbGetQuery(con, "SELECT `TABLE_SCHEMA` AS `Database`,
                                  `TABLE_NAME` AS `Table`
                           FROM `INFORMATION_SCHEMA`.`TABLES`
                           WHERE `TABLE_TYPE` = 'BASE TABLE'
                             AND `TABLE_SCHEMA` NOT LIKE '%SCHEMA%'
                             AND `TABLE_SCHEMA` NOT LIKE '%MYSQL%'")
# LIST OF SQL STATEMENTS
sql <- paste0("SHOW CREATE TABLE ", dbtbls$Database, ".", dbtbls$Table)
# LIST OF DATAFRAMES
createstmts <- lapply(sql, function(x) dbGetQuery(con, x))
dbDisconnect(con)
# ROW BIND LIST INTO ONE DATAFRAME TO MERGE WITH ORIGINAL
stmtsdf <- do.call(rbind, createstmts)
finaldf <- merge(dbtbls, stmtsdf, by='Table')
# RETURN DISTINCT RECORDS
finaldf <- aggregate(.~Database+Table, finaldf, FUN=head, 1)
mysqldump --no-data
does exactly what you are asking for. (There may be other parameters desirable to avoid/include CREATE DATABASE, etc.)
If the requirement is to subsequently pull the CREATEs into R, then I ask whether this is a one-time task or a recurring one. For a one-time task, the mysqldump approach is probably simpler overall.
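If you want to drive that from R, a minimal sketch (assuming mysqldump is on the PATH; "myuser" is a placeholder):
# Write the CREATE statements for every database to schemas.sql;
# -p prompts for the password interactively
system("mysqldump --no-data --all-databases -u myuser -p > schemas.sql")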
First, you can simply use
for (i in 1:length(dbs)) { }
Or you can look into the apply functions, particularly sapply. There you can parse each dbConnection string, connect, and get all tables as a list or vector. Then you can loop inside those to get the CREATE TABLE statements.
So it is basically an apply inside an apply.
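A sketch of that nested-apply idea, reusing the single connection from the question (the second column of each SHOW CREATE TABLE result holds the statement itself):
# Outer apply over databases, inner apply over each database's tables
dbs <- dbGetQuery(con, "SHOW DATABASES")$Database
creates <- lapply(dbs, function(db) {
  tbls <- dbGetQuery(con, sprintf("SHOW TABLES FROM `%s`", db))[[1]]
  sapply(tbls, function(tb)
    dbGetQuery(con, sprintf("SHOW CREATE TABLE `%s`.`%s`", db, tb))[1, 2])
})
names(creates) <- dbs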
For a good explanation of apply functions, you can look into http://www.r-bloggers.com/using-apply-sapply-lapply-in-r/

RMySQL dbWriteTable adding columns to table (dynamically?)

I just started using the R package called RMySQL in order to get around some memory limitations on my computer. I am trying to take a matrix with 100 columns in R (called data.df), then make a new table on an SQL database that has "100 choose 2" (=4950) columns, where each column is a linear combination of two columns from the initial matrix. So far I have something like this:
countnumber <- 1
con <- dbConnect(MySQL(), user = "root", password = "password", dbname = "myDB")
temp <- as.data.frame(data.df[, 1] - data.df[, 2])
colnames(temp) <- paste(pairs[[countnumber]][1], pairs[[countnumber]][2], sep = "")
dbWriteTable(con, "spreadtable", temp, row.names = T, overwrite = T)
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    if (!((i == 1) && (j == 2))) { # this excludes the first iteration, already taken care of
      temp <- as.data.frame(data.df[, i] - data.df[, j])
      colnames(temp) <- "hola"
      dbWriteTable(con, "spreadtable", value = temp, append = TRUE, overwrite = FALSE, row.names = FALSE)
      countnumber <- countnumber + 1
    }
  }
}
I've also tried toying around with the "field.types" argument of RMySQL::dbWriteTable(), which was suggested at RMySQL dbWriteTable with field.types. Sadly it hasn't helped me out too much.
Questions:
Is making your own sql database a valid solution to the memory-bound nature of R, even if it has 4950 columns?
Is the dbWriteTable() the proper function to be using here?
Assuming the answer is "yes" to both of the previous questions...why isn't this working?
Thanks for any help.
[EDIT]: code with error output:
names <- as.data.frame(index)
names <- t(names)
#dim(names) is 1 409
con <- dbConnect(MySQL(), user = "root", password = "password", dbname = "taylordatabase")
dbGetQuery(con, dbBuildTableDefinition(MySQL(), name="spreadtable", obj=names, row.names = F))
#I would prefer these to be double types with 8 decimal spaces instead of text
#dim(temp) is 1 409
temp <- as.data.frame(data.df[,1] - (ratios[countnumber]*data.df[,2]))
temp <- t(temp)
temp <- as.data.frame(temp)
dbWriteTable(con, name = "spreadtable", temp, append = T)
The table is created successfully in the database (I will change variable type later), but the dbWriteTable() line produces the error:
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column 'row_names' in 'field list')
[1] FALSE
Warning message:
In mysqlWriteTable(conn, name, value, ...) : could not load data into table
If I make a slight change, I get a different error message:
dbWriteTable(con, name = "spreadtable", temp, append = T, row.names = F)
and
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column 'X2011_01_03' in 'field list')
[1] FALSE
Warning message:
In mysqlWriteTable(conn, name, value, ...) : could not load data into table
I just want to use names as a bunch of column labels (they were initially dates) and temp as the actual data.
Having a table with 4950 columns is OK; the problem is which columns you actually need.
If you always SELECT *, you will eventually exhaust all your system memory with a table that wide.
Also, why not share the error message whenever you encounter a problem?
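Both errors in the edit look like plain column-name mismatches: the first write defines the table with one set of options, and the append then sends columns the table does not have. A minimal sketch of a consistent pattern, assuming the index, ratios, data.df, and countnumber objects from the question:
library(RMySQL)
con <- dbConnect(MySQL(), user = "root", password = "password", dbname = "taylordatabase")

# as.data.frame()/t() run make.names(), which prefixes names starting
# with a digit ("2011_01_03" becomes "X2011_01_03"), so normalize the
# labels once and reuse the same vector for every write
col_names <- make.names(as.character(index))

temp <- as.data.frame(t(data.df[, 1] - ratios[countnumber] * data.df[, 2]))
colnames(temp) <- col_names

# Let the first write define the table, then append with identical
# settings; mixing row.names = TRUE and FALSE between the table
# definition and the appends is what yields "Unknown column 'row_names'"
dbWriteTable(con, "spreadtable", temp, row.names = FALSE, overwrite = TRUE)
# subsequent iterations:
# dbWriteTable(con, "spreadtable", temp, row.names = FALSE, append = TRUE)

dbDisconnect(con)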