I am trying to utilize dbWriteTable to write from R to MySQL. When I connect to MySQL I created an odbc connection so I can just utilize the command:
abc <- DBI::dbConnect(odbc::odbc(),
dsn = "mysql_conn")
In which I can see all my schemas for the MySQL instance. This is great when I want to read in data such as:
test_query <- dbSendQuery(abc, "SELECT * FROM test_schema.test_file")
test_query <- dbFetch(test_query)
The problem I have is when I want to create a new table in one of the schema's how to declare the schema I want to write to in
dbWriteTable(abc, value = new_file, name = "new_file", overwrite=T)
I imagine I have to define the test_schema in the dbWriteTable portion but haven't been able to get it to work. Thoughts?
Related
I've connected to a MySQL database using the RMariaDB package, and, thanks to the dbplyr package, am able to adjust the data using dplyr commands directly from R studio. However, there are some basic things I want to do that require base R functions (there are no equivalents in dplyr to my knowledge). Is there a way to clean this data using base R commands? Thanks in advance.
The answer to this arises from how the dbplyr package works. dbplyr translates certain dplyr commands into SQL. For example:
library(dplyr)
library(dbplyr)
data(mtcars)
# setup simulated database connection
df_postgre = tbl_lazy(mtcars, con = simulate_postgres())
# fetch first 5 records
first_five = df_postgre %>% head(5)
# view SQL transaltion
first_five %>% show_query()
# resulting SQL translation
<SQL>
SELECT *
FROM `df`
LIMIT 5
The major constrain for this approach is that dbplyr can only translate certain commands into SQL. So something like the following will fail:
# setup simulated database connection
df_postgre = tbl_lazy(mtcars, con = simulate_postgres())
# fetch first 5 records
first_five = df_postgre[1:5,]
While head(df, 5) and df[1:5,] produce identical output for data.frames in local R memory, dbplyr can not translate developer intention, only specific dplyr commands. Hence these two commands are very different when working with database tables.
The other element to consider here is that databases are primarily read-only. In R we can do
df = df %>%
mutate(new_var = 2*old_var)
and this changes the data held in memory. However in databases, the original data is stored in the database and it is transformed based on your instructions when it is requested. There are ways to write completely new database tables from existing database tables - there are already several Q&A on this under the dbplyr tag.
I was able to implement a connection from R through RMariaDB and DBI to a remote MariaDB-database. However, I am currently encountering a strange change of numbers when querying the database through R. I'll explain the differences:
I inserted one simple entry in my database with the following command:
INSERT INTO respondent ( id, name ) VALUES ( 2388793051, 'testuser' )
When I connect to this database directly on the remote server and execute a statement like this:
SELECT * FROM respondent;
it delivers these value
id: 2388793051, name: testuser
So I should also be able to connect to the database via R and receive the same results. So when I execute the following code in R, I expect to receive this inserted and saved information displayed above:
library(DBI)
library(RMariaDB)
conn <- DBI::dbConnect(drv=RMariaDB::MariaDB(), user="myusername", password="mypassword", host="127.0.0.1", port="1111", dbname="mydbname")
res <- dbGetQuery(conn, "SELECT * FROM respondent")
print(res)
However, the result of this query is the following
id name
-1906174245 testuser
As you can see, the id is now -1906174245 instead of the saved 2388793051 in the database. I don't understand this weird conversion of integers in the id-field. Can someone explain how this problem emerges and how I might solve it?
EDIT: I don't expect this to be a problem, but just to inform you: I am using an SSH tunnel to enable a connection via these specified ports from my local to my remote machine.
SOLUTION: What made the difference was to specify the id of a respondent in the database specification already as BIGINT instead of INT. Thanks to #JonnyCrunch
I am trying to read a few excel files into a dataframe and then write to a MySQL database. The following program is able to read the files and create the dataframe but when it tries to write to the db using dbWriteTable command, I get an error message -
Error in .local(conn, statement, ...) :
could not run statement: The used command is not allowed with this MySQL version
library(readxl)
library(RMySQL)
library(DBI)
mydb = dbConnect(RMySQL::MySQL(), host='<ip>', user='username', password='password', dbname="db",port=3306)
setwd("<directory path>")
file.list <- list.files(pattern='*.xlsx')
print(file.list)
dat = lapply(file.list, function(i){
print(i);
x = read_xlsx(i,sheet=NULL, range=cell_cols("A:D"), col_names=TRUE, skip=1, trim_ws=TRUE, guess_max=1000)
x$file=i
x
})
df = do.call("rbind.data.frame", dat)
dbWriteTable(mydb, name="table_name", value=df, append=TRUE )
dbDisconnect(mydb)
I checked the definition of the dbWriteTable function and looks like it is using load data local inpath to store the data in the database. As per some other answered questions on Stackoverflow, I understand that the word local could be the cause for concern but since it is already in the function definition, I don't know what I can do. Also, this statement is using "," as separator. But my data has "," in some of the values and that is why I was interested in using the dataframes hoping that it would preserve the source structure. But now I am not so sure.
Is there any other way/function do write the dataframe to the MySQL tables?
I solved this on my system by adding the following line to the my.cnf file on the server (you may need to use root and vi to edit!). In my this is just below the '[mysqld]' line
local-infile=1
Then restart the sever.
Good luck!
You may need to change
dbWriteTable(mydb, name="table_name", value=df, append=TRUE )
to
dbWriteTable(mydb, name="table_name", value=df,field.types = c(artist="varchar(50)", song.title="varchar(50)"), row.names=FALSE, append=TRUE)
That way, you specify the field types in R and append data to your MySQL table.
Source:Unknown column in field list error Rmysql
I am working on Spark 2.0 and SparkR libs. I want to get a sample code on how can I do following things in SparkR?
Connect to a MySQL or any other SQL database using SparkR.
Write SQL queries like SELECT , UPDATE etc. to modify a table in that database.
I know to do it using R. However I would need some help to use Spark Sessions or SparkSQL context. I am using R Studio for the development.
Moreover, how do we submit this R code as Spark Batch to run continuously at a regular intervals?
jdbcurl <- "jdbc:mysql://xxx.xxx.x.x:xxxx/database"
data <- read.jdbc(jdbcurl, "tablename", user = "user", password = "password" )
I have a MySQL table that I am reading with the RMySQL package of R. I would like to be able to directly refer to the data frame stored in the table so I can seamlessly interact with it rather than having to execute RMySQL statement every time I want to do something. Is there a way to accomplish this? I tried:
data <- dbReadTable(conn = con, name = 'tablename')
For example, if I now want to check how many rows I have in this table I would run:
nrow(data)
Does this go through the database connection, or am I now storing the object "data" locally, defeating the whole purpose of using an external database?
data <- dbReadTable(conn = con, name = 'tablename')
This command downloads all the data into a local R dataframe (assuming you have enough RAM). Any operations with data from that point forward do not require the SQL connection.