Connection lost when connecting R to MySQL using collect - mysql

I would like to use dplyr and RMySQL to work with my large data set. There are no issues with the dplyr code. The problem (I think) is in exporting data from MySQL to R. My connection is dropped every time, even though I am using n = Inf in collect. My data should have more than 50K rows, but I only get around 15K back. Any suggestions are appreciated.
Approach 1
library(dplyr)
library(RMySQL)
# Connect to a database and select a table
my_db <- src_mysql(dbname='aermod_1', host = "localhost", user = "root", password = "")
my_tbl <- tbl(my_db, "db_table")
out_summary_station_raw <- select(my_tbl, -c(X, Y, AVERAGE_CONC))
out_station_mean_local <- collect(out_summary_station_raw)
Approach 2: using Pool
library(pool)
library(RMySQL)
library(dplyr)
pool <- dbPool(
drv = RMySQL::MySQL(),
dbname = "aermod_1",
host = "localhost",
username = "root",
password = ""
)
out_summary_station_raw <- src_pool(pool) %>% tbl("aermod_final") %>% select(-c(X, Y, AVERAGE_CONC))
out_station_mean_local <- collect(out_summary_station_raw, n = Inf)
Warning message (both approaches):
Warning messages:
1: In dbFetch(res, n) : error while fetching rows
2: Only first 15,549 results retrieved. Use n = Inf to retrieve all.
Update:
I checked the log and it looks fine on the server side. For my example, the slow log said Query_time: 79.348351 Lock_time: 0.000000 Rows_sent: 15552 Rows_examined: 16449696, but collect still could not retrieve the full data. I am able to replicate the same behavior using MySQL Bench.

As discussed here:
https://github.com/tidyverse/dplyr/issues/1968,
https://github.com/tidyverse/dplyr/blob/addb214812f2f45f189ad2061c96ea7920e4db7f/NEWS.md and https://github.com/tidyverse/dplyr/commit/addb214812f2f45f189ad2061c96ea7920e4db7f
this package issue seems to have been addressed.
What version of the dplyr package are you using?
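A quick way to answer the version question is to print the installed versions of the packages involved. This is only a sketch: the package list is an assumption based on the packages named in this thread, and packages that are not installed are reported as NA.

```r
# Packages named in this thread; adjust the vector as needed
pkgs <- c("dplyr", "dbplyr", "DBI", "RMySQL")

# installed.packages() lists what is available in the current library paths
installed <- rownames(installed.packages())

# Report the version of each package, or NA when it is not installed
versions <- vapply(
  pkgs,
  function(p) if (p %in% installed) as.character(packageVersion(p)) else NA_character_,
  character(1)
)
print(versions)
```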

After the most recent RMySQL update, I noticed that I could not collect() data from large views, and I reported it as an issue. Your problem might be related.
One thing to try is to roll back to the last version.
devtools::install_version("RMySQL", version = "0.10.9",
repos = "http://cran.us.r-project.org")

Related

How to manipulate/clean data located in a MySQL database using base R commands?

I've connected to a MySQL database using the RMariaDB package, and, thanks to the dbplyr package, am able to adjust the data using dplyr commands directly from R studio. However, there are some basic things I want to do that require base R functions (there are no equivalents in dplyr to my knowledge). Is there a way to clean this data using base R commands? Thanks in advance.
The answer to this arises from how the dbplyr package works. dbplyr translates certain dplyr commands into SQL. For example:
library(dplyr)
library(dbplyr)
data(mtcars)
# setup simulated database connection
df_postgre = tbl_lazy(mtcars, con = simulate_postgres())
# fetch first 5 records
first_five = df_postgre %>% head(5)
# view SQL translation
first_five %>% show_query()
# resulting SQL translation
<SQL>
SELECT *
FROM `df`
LIMIT 5
The major constraint of this approach is that dbplyr can only translate certain commands into SQL. So something like the following will fail:
# setup simulated database connection
df_postgre = tbl_lazy(mtcars, con = simulate_postgres())
# fetch first 5 records
first_five = df_postgre[1:5,]
While head(df, 5) and df[1:5,] produce identical output for data.frames in local R memory, dbplyr cannot infer the developer's intention; it can only translate specific dplyr commands. Hence these two commands behave very differently when working with database tables.
The other element to consider is that, when accessed through dbplyr, database tables are primarily read-only. In R we can do
df = df %>%
mutate(new_var = 2*old_var)
and this changes the data held in memory. However in databases, the original data is stored in the database and it is transformed based on your instructions when it is requested. There are ways to write completely new database tables from existing database tables - there are already several Q&A on this under the dbplyr tag.
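To make the point about transformation on request concrete, here is a minimal sketch using the same simulated connection as above. It assumes the mtcars column mpg stands in for old_var; the variable names df_lazy, with_new_var and generated_sql are made up for illustration. The mutate() call only extends the pending query, and sql_render() shows the SQL that would be sent when the data is actually requested.

```r
library(dplyr)
library(dbplyr)

# Simulated connection, as in the answer above; no real database is touched
df_lazy <- tbl_lazy(mtcars, con = simulate_postgres())

# mutate() only extends the pending query; nothing is written to any table
with_new_var <- df_lazy %>% mutate(new_var = 2 * mpg)

# sql_render() returns the SQL that would run when the data is requested
generated_sql <- sql_render(with_new_var)
print(generated_sql)
```

The stored table is never modified here; the new column exists only in the SELECT statement that dbplyr generates.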

How to run RMySQL in shinyapps (working fine locally)

I have a weird problem:
Using RMySQL from Shiny (running locally) I have no problem retrieving data from a MySQL database (small table, only a few rows). But once the app is deployed (shinyapps.io), the query result contains zero rows (the column names are fine). Looking at the shinyapps.io log:
Warning in dbFetch(rs, n = n, ...) : error while fetching rows
What am I doing wrong? The exact same thing was working before and now I can't get it running. The MySQL connection seems fine.
library(shiny)
library(DBI)
ui <- fluidPage(
numericInput("nrows", "Enter the number of rows to display:", 5),
tableOutput("tbl")
)
server <- function(input, output, session) {
output$tbl <- renderTable({
conn <- dbConnect(
drv = RMySQL::MySQL(),
dbname = "***",
host = "***",
username = "***",
password = "***")
on.exit(dbDisconnect(conn), add = TRUE)
dbGetQuery(conn, paste0(
"SELECT * FROM datasets LIMIT ", input$nrows, ";"))
})
}
shinyApp(ui, server)
EDIT:
When I use the Shiny dummy database (from this example) it works fine, so it looks like a problem with MySQL, but I can't figure out what... Any ideas?
dbname = "shinydemo",
host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
username = "guest",
password = "guest")
EDIT2
I tried everything: a new table, a new database (same hosting though), a different shinyapps account, a fresh R installation with all packages updated. Still the same problem. When the app is running locally, everything is fine. But from shinyapps: error and zero results (except colnames).
OK, I have no idea why, but it looks like changing the table engine fixes the issue:
ALTER TABLE table_name ENGINE = InnoDB

How to detect the type of data to create MySQL table from R

I have a large data set with 100+ columns and I want to determine the data type of each column so I can create a MySQL table accordingly.
RMariaDB is the answer: https://rmariadb.r-dbi.org/
With a minimal reprex:
library(DBI)
# Connect to my-db as defined in ~/.my.cnf
con <- dbConnect(RMariaDB::MariaDB(), group = "my-db")
dbListTables(con)
dbWriteTable(con, "mtcars", mtcars)
Generally speaking, for connection you can look there : https://db.rstudio.com/databases/
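If you want to inspect the column types before writing anything, DBI's dbDataType() reports the SQL type it would pick for each column of a data frame. The sketch below is an illustration under assumptions: the data frame is made up, and ANSI() is a dummy connection used so the code runs without a server. With a real connection object the type names come from the driver and may differ.

```r
library(DBI)

# Small illustrative data frame (made up for this sketch)
df <- data.frame(
  id = 1:3,
  conc = c(0.5, 1.2, 3.4),
  station = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# ANSI() is a dummy connection; with a real connection the returned
# type names would be whatever that driver chooses
col_types <- dbDataType(ANSI(), df)
print(col_types)
```

If an inferred type is not what you want, dbWriteTable() in RMySQL and RMariaDB accepts a field.types argument that can be used to override it for individual columns.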

Multiple (simultaneous) users in R Shiny with access to MySQL database

I have a Shiny application (hosted on shinyapps.io) that records a user's click of certain actionButtons to a MySQL database. I'd love some advice on a few things:
where to put the dbConnect code (i.e. inside or outside the shinyServer function)
when to close the connection (as I was running into the problem of too many open connections)
Each addition to the database just adds a new row, so users aren't accessing and modifying the same elements. The reason I ask is that I was running into the problem of multiple users not being able to use the app at the same time (with the error "Disconnected from server") and I wasn't sure if it was caused by the MySQL connections.
Thank you!
Someone in the comments posted about the pool package, which serves this exact purpose! Here's the relevant parts of my server.R code:
library(shiny)
library(RMySQL)
library(pool)
pool <- dbPool(
drv = RMySQL::MySQL(),
user='username',
password='password',
dbname='words',
host='blahblahblah')
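shinyServer(function(input, output) {
  ## function to write to database
  writeToDB <- function(word, vote) {
    query <- paste("INSERT INTO word_votes (vote, word) VALUES (", vote, ", '", word, "');", sep = "")
    # check a connection out of the pool and make sure it is always returned
    conn <- poolCheckout(pool)
    on.exit(poolReturn(conn), add = TRUE)
    # dbExecute() runs the statement without leaving an open result set
    dbExecute(conn, query)
  }
  ## rest of code
})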
I added the poolCheckout and poolReturn to run successfully and prevent leaks.
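One thing to watch in the query above is that word is pasted straight into the SQL string, so a word containing a quote would break the statement. A possible alternative, sketched here as an assumption rather than as what the original app does, is to build the statement with DBI::sqlInterpolate(), which quotes each value. The helper name build_vote_insert is hypothetical, and ANSI() is a dummy connection used only so the sketch runs without a server.

```r
library(DBI)

# Hypothetical helper: builds the INSERT with DBI::sqlInterpolate(),
# which quotes each value instead of pasting it in verbatim
build_vote_insert <- function(conn, word, vote) {
  sqlInterpolate(
    conn,
    "INSERT INTO word_votes (vote, word) VALUES (?vote, ?word)",
    vote = vote,
    word = word
  )
}

# ANSI() is a dummy DBI connection, used here so no server is needed
query <- build_vote_insert(ANSI(), word = "rock'n'roll", vote = 1L)
print(query)
```

In the app, the checked-out pool connection would be passed as conn and the resulting statement handed to dbExecute().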

Closing active connections using RMySQL

As per my question earlier today, I suspect I have an issue with unclosed connections that is blocking data from being injected into my MySQL database. Data is being allowed into tables that are not currently being used (hence I suspect many open connections preventing uploading into that particular table).
I am using RMySQL on Ubuntu servers to upload data onto a MySQL database.
I'm looking for a way to a) determine if connections are open b) close them if they are. The command exec sp_who and exec sp_who2 from the SQL command line returns an SQL code error.
Another note: I am able to connect, complete the uploading process, and end the R process successfully, and there is no data on the server (checked via the SQL command line) when I try only that table.
(By the way: if all else fails, would simply deleting the table and creating a new one with the same name fix it? It would be quite a pain, but doable.)
a. dbListConnections( dbDriver( drv = "MySQL"))
b. dbDisconnect( dbListConnections( dbDriver( drv = "MySQL"))[[index of MySQLConnection you want to close]]). To close all: lapply( dbListConnections( dbDriver( drv = "MySQL")), dbDisconnect)
Yes, you could just rewrite the table, of course you would lose all data. Or you can specify dbWriteTable(, ..., overwrite = TRUE).
I would also play with the other options, like row.names, header, field.types, quote, sep, eol. I've had a lot of weird behavior in RMySQL as well. I can't remember specifics, but it seems like I've had no error message when I had done something wrong, like forget to set row.names. HTH
Close all active connections:
dbDisconnectAll <- function(){
ile <- length(dbListConnections(MySQL()) )
lapply( dbListConnections(MySQL()), function(x) dbDisconnect(x) )
cat(sprintf("%s connection(s) closed.\n", ile))
}
executing:
dbDisconnectAll()
Simplest:
lapply(dbListConnections( dbDriver( drv = "MySQL")), dbDisconnect)
List all connections and disconnect them by lapply
Closing a connection
You can use dbDisconnect() together with dbListConnections() to disconnect those connections RMySQL is managing:
all_cons <- dbListConnections(MySQL())
for(con in all_cons)
dbDisconnect(con)
Check all connections have been closed
dbListConnections(MySQL())
You could also kill any connection you're allowed to (not just those managed by RMySQL):
dbGetQuery(mydb, "show processlist")
Where mydb is..
mydb = dbConnect(MySQL(), user='user_id', password='password',
dbname='db_name', host='host')
Close a particular connection
dbGetQuery(mydb, "kill 2")
dbGetQuery(mydb, "kill 5")
lapply(dbListConnections(MySQL()), dbDisconnect)
In current releases the dbListConnections function is deprecated and DBI no longer requires drivers to maintain a list of connections. As such, the above solutions may no longer work. E.g. in RMariaDB the above solutions produce errors.
I came up with the following alternative, which uses the MySQL server's own functionality and should work with current DBI / driver versions:
### listing all open connection to a server with open connection
query <- dbSendQuery(mydb, "SHOW processlist;")
processlist <- dbFetch(query)
dbClearResult(query)
### getting the id of your current connection so that you don't close that one
query <- dbSendQuery(mydb, "SELECT CONNECTION_ID();")
current_id <- dbFetch(query)
dbClearResult(query)
### making a list with all other open processes by a particular set of users
# E.g. when you are working on Amazon Web Services you might not want to close
# the "rdsadmin" connection to the AWS console. Here e.g. I choose only "admin"
# connections that I opened myself. If you really want to kill all connections,
# just delete the "processlist$User == "admin" &" bit.
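other_ids <- processlist[processlist$User == "admin" & processlist$Id != current_id[1,1],"Id"]
# paste0() treats an empty vector as "", which would yield "KILL ;", so guard against no matches
queries <- if (length(other_ids) > 0) paste0("KILL ", other_ids, ";") else character(0)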
### making function to kill connections
kill_connections <- function(x) {
query <- dbSendQuery(mydb, x)
dbClearResult(query)
}
### killing other connections
lapply(queries, kill_connections)
### killing current connection
dbDisconnect(mydb)