Encoding issues when trying to write data from R to MySQL - mysql

My data contains special characters like German umlauts.
p=structure(list(ppl_code = c(992621L, 992381L, 992136L, 991989L,
991898L, 991759L, 991681L, 991593L, 991294L, 991036L, 990934L,
990751L, 990535L, 990411L, 990182L, 989507L), proj_name = c("klo",
"Dalbygda", "Oosterhorn", "Hån", "Yatir", "Montigny la Cour",
"Valle Hermoso", "Acciona Honawad - 120 MW", "Apfeltrang", "RiaBlades",
"General Acha", "Lindau-Böhlitz", "Apfeltrang", "Alcazar Round 2",
"Peckelsheim", "Linnich 3")), .Names = c("ppl_code", "proj_name"
), row.names = 15:30, class = "data.frame")
When I try to write it into MySQL database :
conn <- dbConnect(
drv = RMySQL::MySQL(),
dbname = "mydb",
host = "#####",
username = "#####",
password = "#####")
dbWriteTable(conn, value = p, name = "MyTable",row.names=FALSE)
I'm getting the Encoding error :
could not run statement: Invalid utf8 character string: 'Lindau-B'
I have checked several posts regarding this issue like here and here but they are all general explanation without a clear solution !
can anybody help me with a clear query that could solve this issue ?

You need to announce that UTF-8 is being used.
Tool -> Global Options -> Code -> Saving and put UTF-8
rs <- dbSendQuery(con, 'SET NAMES utf8')

Related

How to use sparklyr spark_write_jdbc to connect to MySql

spark_write_jdbc(members_df,
name = "Mbrs",
options = list(
url = paste0("jdbc:mysql://",mysql_host,":",mysql_port,"/",dbname),
user = mysql_user,
password = mysql_password),
mode = "append")
Results in the following exception:
Error: java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
The .jar file is in a folder on the server where RStudio is running, config details below. We're able to access MySql via the RMySql package so MySql is working and accessible.
config$`spark.sparklyr.shell.driver-class-path` <- "/dev/shm/temp/mysql-connector-java-5.1.44-bin.jar"
Even if the question is different, I think the answer still applies also to this:
How to use a predicate while reading from JDBC connection?
Code to connect to JDBC MySQL through sparklyr (I made some slight changes to simplify the code a bit, written by Jake Russ)
library(sparklyr)
library(dplyr)
config <- spark_config()
#config$`sparklyr.shell.driver-class-path` <- "E:\\spark232_hadoop27\\jars\\mysql-connector-java-5.1.47-bin.jar"
#in my case, using RStudio and Sparkly this seemed to be optional
sc <- spark_connect(master = "local")
db_tbl <- spark_read_jdbc(sc,
name = "table_name",
options = list(url = "jdbc:mysql://localhost:3306/schema_name",
user = "root",
password = "password",
dbtable = "table_name"))

Multiple DB connection in R

I was wondering if someone could help with this annoying issue.
I'm trying to create/make multiple connections to different database.
I have a data.frame with 3 connection credentials named conf - It works if I manually enter the connections variable like so:
conn <- dbConnect(MySQL(), user=conf$user, password=conf$passws, host=conf$host, dbname=conf$db)
which ends up creating a single connection.
However, what I want is to be able to refer to the connection as:
conf$conn <- dbConnect(MySQL(), user=conf$user, password=conf$passws, host=conf$host, dbname=conf$db)
here is the error message I'm getting.
Error in rep(value, length.out = nrows) :
attempt to replicate an object of type 'S4'
I think the problem is how I'm adding conf$conn
I used a combination of the pool and config package to solve a similar problem to set up a number of simultaneous PostgreSQL connections. Note that this solution needs a config.yml file with the connection properties for db1 and db2.
library(pool)
library(RPostgreSQL)
connect <- function(cfg) {
config <- config::get(config = cfg)
dbPool(
drv = dbDriver("PostgreSQL", max.con = 100),
dbname = config$dbname,
host = config$host,
port = config$port,
user = config$user,
password = config$password
)
}
conn <- lapply(c("db1", "db2"), connect)

How to add appending data to MySQL through R?

I am running an R script that adds data to mySQL database. I usually format the data and add it as as it comes in after couple of hours (data string is not continuous). My first set of data was added properly in MySQL database. The second string of data can not be added properly.
con = dbConnect(MySQL(), user='root', password='xxxxxx', dbname='test', host='localhost')
dbWriteTable(con, 'Tables', value = parseTweets(filterStream(file.name= "", track=c("lebron"), timeout=10, oauth=my_oauth)))
When I rerun the last code (dbWriteTable) again, it gives me following error
Error: Error in .local(conn, statement, ...) :
could not run statement: Table 'tables' already exists
I also used
dbWriteTable(con, 'Tables', value = parseTweets(filterStream(file.name= "", track=c("lebron"), timeout=10, oauth=my_oauth)), append = TRUE)
but it provides the same error
For some reason giving setting append to TRUE is not working. Instead give it a number. See the code below for better understanding
dbWriteTable(con, 'Tables', value = parseTweets(filterStream(file.name= "", track=c("lebron"), timeout=100, oauth=my_oauth)), overwrite = 0, row.names = 0, append = 1)

R Programming - Japanese Characters Insert into MySQL

I want to insert a tab-delimted file, which is conatining both japanese and english characters with special charcters. I am using RMySQL to do is. One of a solution i tried giving below error:
dbWriteTable(con, "japan_test2", d, append = T, row.names=FALSE);
Error in mysqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '˜¨å¤œã®ã‚³ãƒ³_ text)' at line 3)
In addition: Warning message:
In strsplit(msg, "\n") : input string 1 is invalid in this locale
[1] FALSE
Warning message:
In mysqlWriteTable(conn, name, value, ...) :
could not create table: aborting mysqlWriteTable
Current Locale: LC_COLLATE=English_United States.1252;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
Locale Tried: US, Japanese.
Encoding Tried: UTF-8,16,ASCII.
System: Windows7
RStudio Version 0.98.977
MySQL 5.4.27 CE
Probably you aren't setting properly the encoding of the connection. You can try this:
con <- dbConnect(MySQL(), user=user, password=password,dbname=dbname, host=host, port=port)
# With the next line I try to get the right encoding (it works for Spanish keyboards)
encoding <- if(grepl(pattern = 'utf8|utf-8',x = Sys.getlocale(),ignore.case = T)) 'utf8' else 'latin1'
dbGetQuery(con,paste("SET names",encoding))
dbGetQuery(con,paste0("SET SESSION character_set_server=",encoding))
dbGetQuery(con,paste0("SET SESSION character_set_database=",encoding))
dbWriteTable( con, value = dfr, name = table, append = TRUE, row.names = FALSE )
dbDisconnect(con)
Remember that you have to use your local encoding as the right encoding of the connection. I try to get my encoding in the third line of the proposed code and then set the encoding according to my local encoding. Good luck!

RMySQL dbWriteTable adding columns to table (dynamically?)

I just started using the R package called RMySQL in order to get around some memory limitations on my computer. I am trying to take a matrix with 100 columns in R (called data.df), then make a new table on an SQL database that has "100 choose 2" (=4950) columns, where each column is a linear combination of two columns from the initial matrix. So far I have something like this:
countnumber <- 1
con <- dbConnect(MySQL(), user = "root", password = "password", dbname = "myDB")
temp <- as.data.frame(data.df[,1] - data.df[,2])
colnames(temp) <- paste(pairs[[countnumber]][1], pairs[[countnumber]][2], sep = "")
dbWriteTable(con, "spreadtable", temp, row.names=T, overwrite = T)
for(i in 1:(n-1)){
for(j in (i+1):n){
if(!((i==1)&&(j==2))){ #this part excludes the first iteration already taken care of
temp <- as.data.frame(data.df[,i] - data.df[,j])
colnames(temp) <- "hola"
dbWriteTable(con, "spreadtable", value = temp, append = TRUE, overwrite = FALSE, row.names = FALSE)
countnumber <- countnumber + 1
}
}
}
I've also tried toying around with the "field.types" argument of RMySQL::dbWriteTable(), which was suggested at RMySQL dbWriteTable with field.types. Sadly it hasn't helped me out too much.
Questions:
Is making your own sql database a valid solution to the memory-bound nature of R, even if it has 4950 columns?
Is the dbWriteTable() the proper function to be using here?
Assuming the answer is "yes" to both of the previous questions...why isn't this working?
Thanks for any help.
[EDIT]: code with error output:
names <- as.data.frame(index)
names <- t(names)
#dim(names) is 1 409
con <- dbConnect(MySQL(), user = "root", password = "password", dbname = "taylordatabase")
dbGetQuery(con, dbBuildTableDefinition(MySQL(), name="spreadtable", obj=names, row.names = F))
#I would prefer these to be double types with 8 decimal spaces instead of text
#dim(temp) is 1 409
temp <- as.data.frame(data.df[,1] - (ratios[countnumber]*data.df[,2]))
temp <- t(temp)
temp <- as.data.frame(temp)
dbWriteTable(con, name = "spreadtable", temp, append = T)
The table is created successfully in the database (I will change variable type later), but the dbWriteTable() line produces the error:
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column 'row_names' in 'field list')
[1] FALSE
Warning message:
In mysqlWriteTable(conn, name, value, ...) : could not load data into table
If I make a slight change, I get a different error message:
dbWriteTable(con, name = "spreadtable", temp, append = T, row.names = F)
and
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column 'X2011_01_03' in 'field list')
[1] FALSE
Warning message:
In mysqlWriteTable(conn, name, value, ...) : could not load data into table
I just want to use "names" as a bunch of column labels. They were initially dates. The actual data I would like to be "temp."
Having a query with 4950 rows is ok, the problem is that what columns you need.
If you always "select * ", you will eventually exhaust all you system memory (in the case that the table has 100 columns)
Why not give us some error message if you have encountered any problems ?