RMySQL - dbWriteTable() - The used command is not allowed with this MySQL version - mysql

I am trying to read a few excel files into a dataframe and then write to a MySQL database. The following program is able to read the files and create the dataframe but when it tries to write to the db using dbWriteTable command, I get an error message -
Error in .local(conn, statement, ...) :
could not run statement: The used command is not allowed with this MySQL version
library(readxl)
library(RMySQL)
library(DBI)
mydb = dbConnect(RMySQL::MySQL(), host='<ip>', user='username', password='password', dbname="db",port=3306)
setwd("<directory path>")
file.list <- list.files(pattern='*.xlsx')
print(file.list)
dat = lapply(file.list, function(i){
print(i);
x = read_xlsx(i,sheet=NULL, range=cell_cols("A:D"), col_names=TRUE, skip=1, trim_ws=TRUE, guess_max=1000)
x$file=i
x
})
df = do.call("rbind.data.frame", dat)
dbWriteTable(mydb, name="table_name", value=df, append=TRUE )
dbDisconnect(mydb)
I checked the definition of the dbWriteTable function and looks like it is using load data local inpath to store the data in the database. As per some other answered questions on Stackoverflow, I understand that the word local could be the cause for concern but since it is already in the function definition, I don't know what I can do. Also, this statement is using "," as separator. But my data has "," in some of the values and that is why I was interested in using the dataframes hoping that it would preserve the source structure. But now I am not so sure.
Is there any other way/function do write the dataframe to the MySQL tables?

I solved this on my system by adding the following line to the my.cnf file on the server (you may need to use root and vi to edit!). In my this is just below the '[mysqld]' line
local-infile=1
Then restart the sever.
Good luck!

You may need to change
dbWriteTable(mydb, name="table_name", value=df, append=TRUE )
to
dbWriteTable(mydb, name="table_name", value=df,field.types = c(artist="varchar(50)", song.title="varchar(50)"), row.names=FALSE, append=TRUE)
That way, you specify the field types in R and append data to your MySQL table.
Source:Unknown column in field list error Rmysql

Related

How to manipulate/clean data located in a MySQL database using base R commands?

I've connected to a MySQL database using the RMariaDB package, and, thanks to the dbplyr package, am able to adjust the data using dplyr commands directly from R studio. However, there are some basic things I want to do that require base R functions (there are no equivalents in dplyr to my knowledge). Is there a way to clean this data using base R commands? Thanks in advance.
The answer to this arises from how the dbplyr package works. dbplyr translates certain dplyr commands into SQL. For example:
library(dplyr)
library(dbplyr)
data(mtcars)
# setup simulated database connection
df_postgre = tbl_lazy(mtcars, con = simulate_postgres())
# fetch first 5 records
first_five = df_postgre %>% head(5)
# view SQL transaltion
first_five %>% show_query()
# resulting SQL translation
<SQL>
SELECT *
FROM `df`
LIMIT 5
The major constrain for this approach is that dbplyr can only translate certain commands into SQL. So something like the following will fail:
# setup simulated database connection
df_postgre = tbl_lazy(mtcars, con = simulate_postgres())
# fetch first 5 records
first_five = df_postgre[1:5,]
While head(df, 5) and df[1:5,] produce identical output for data.frames in local R memory, dbplyr can not translate developer intention, only specific dplyr commands. Hence these two commands are very different when working with database tables.
The other element to consider here is that databases are primarily read-only. In R we can do
df = df %>%
mutate(new_var = 2*old_var)
and this changes the data held in memory. However in databases, the original data is stored in the database and it is transformed based on your instructions when it is requested. There are ways to write completely new database tables from existing database tables - there are already several Q&A on this under the dbplyr tag.

R MySQL to no-default schema

I am trying to utilize dbWriteTable to write from R to MySQL. When I connect to MySQL I created an odbc connection so I can just utilize the command:
abc <- DBI::dbConnect(odbc::odbc(),
dsn = "mysql_conn")
In which I can see all my schemas for the MySQL instance. This is great when I want to read in data such as:
test_query <- dbSendQuery(abc, "SELECT * FROM test_schema.test_file")
test_query <- dbFetch(test_query)
The problem I have is when I want to create a new table in one of the schema's how to declare the schema I want to write to in
dbWriteTable(abc, value = new_file, name = "new_file", overwrite=T)
I imagine I have to define the test_schema in the dbWriteTable portion but haven't been able to get it to work. Thoughts?

PostgreSQL multiple CSV import and add filename to each column

I've got 200k csv files and I need to import them all to a single postgresql table. It's a list of parameters from various devices and each csv's file name contains device's serial number and I need it to be in one of the colums for each row.
So to simplify, I've got few columns of data (no headers), let's say that columns in each csv file are: Date, Variable, Value and file name contains SERIALNUMBER_and_someOtherStuffIDontNeed.csv
I'm trying to use cygwin to write a bash script to iterate over files and do it for me, however for some reason it won't work, showing 'syntax error at or near "as" '
Here's my code:
#!/bin/bash
FILELIST=/cygdrive/c/devices/files/*
for INPUT_FILE in $FILELIST
do
psql -U postgres -d devices -c "copy devicelist
(
Date,
Variable,
Value,
SN as CURRENT_LOAD_SOURCE(),
)
from '$INPUT_FILE
delimiter ',' ;"
done
I'm learning SQL so it might be an obvious mistake, but I can't see it.
Also I know that in that form I will get full file name, not just the serial number bit I want but I can probably handle that somehow later.
Please advise.
Thanks.
I dont think there is a CURRENT_LOAD_SOURCE() function in postgres. A work-around is to leave the name-column NULL on copy, and patch is to the desired value just after the copy. I prefer a shell here-document because that make quoting inside the SQL body easier. (BTW: for 10K of files, the globbing needed to obtain FILELIST might exceed argmax for the shell ...)
#!/bin/bash
FILELIST="`ls /tmp/*.c`"
for INPUT_FILE in $FILELIST
do
echo "File:" $INPUT_FILE
psql -U postgres -d devices <<OMG
-- I have a schema "tmp" for testing purposes
CREATE TABLE IF NOT EXISTS tmp.filelist(name text, content text);
COPY tmp.filelist ( content)
from '/$INPUT_FILE' delimiter ',' ;
UPDATE tmp.filelist SET name = '$FILELIST'
WHERE name IS NULL;
OMG
done
For anyone interested in an answer, I've used a python script to change file names and then another script using psycopg2 to connect to the database and then done everyting in one connection. Took 10 minutes instead of 10 hours.
Here's the code:
Renaming files (also apparently to import from CSV you need all the rows to be filled and the information I needed was in first 4 columns anyway, therefore I've put together a solution to generate whole new CSVs instead of just renaming them):
import os
import csv
path='C:/devices/files'
os.chdir(path)
i=0
for file in os.listdir(path):
try:
i+=1
if i%10000 == 0:
#just to see the progress
print(i)
serial_number = (file[:8])
creader = csv.reader(open(file))
cwriter = csv.writer(open('processed_'+file, 'w'))
for cline in creader:
new_line = [val for col, val in enumerate(cline) if col not in (4, 5, 6, 7)]
new_line.insert(0, serial_number)
#print(new_line)
cwriter.writerow(new_line)
except:
print('problem with file: ' + file)
pass
Updating database:
import os
import psycopg2
path="C:\\devices\\files"
directory_listing = os.listdir(path)
conn = psycopg2.connect("dbname='devices' user='postgres' host='localhost'")
cursor = conn.cursor()
print(len(directory_listing))
i=100001
while i < 218792:
current_file=(directory_listing[i])
i+=1
full_path = "C:/devices/files/" + current_file
with open(full_path) as f:
cursor.copy_from(file=f, table='devicelistlive', sep=",")
conn.commit()
conn.close()
Don't mind while and weird numbers, it's just because I was doing it in portions for testing purposes. Can easily be replaced with for

How to parsimoniously refer to a data frame in RMySQL

I have a MySQL table that I am reading with the RMySQL package of R. I would like to be able to directly refer to the data frame stored in the table so I can seamlessly interact with it rather than having to execute RMySQL statement every time I want to do something. Is there a way to accomplish this? I tried:
data <- dbReadTable(conn = con, name = 'tablename')
For example, if I now want to check how many rows I have in this table I would run:
nrow(data)
Does this go through the database connection, or am I now storing the object "data" locally, defeating the whole purpose of using an external database?
data <- dbReadTable(conn = con, name = 'tablename')
This command downloads all the data into a local R dataframe (assuming you have enough RAM). Any operations with data from that point forward do not require the SQL connection.

How to Create Master script file to run mysql scripts

I want to create DB structure for my application in mysql, I have some 100 scripts which will create tables , sp, functions in different schemas.
Please suggest how can i run script only one after other and how can i stop if previous script failed. I am using MySQL 5.6 version.
I am currrently runnning them using a text file.
mysql> source /mypath/CreateDB.sql
which contains
tee /logout/session.txt
source /mypath/00-CreateSchema.sql
source /mypath/01-CreateTable1.sql
source /mypath/01-CreateTable2.sql
source /mypath/01-CreateTable3.sql
But they are running simultaniously and I have Foreign key in below tables due to which it is giving error.
The scripts are not running simultaneously. The mysql client does not execute in a multi-threaded manner.
But it's possible that you are sourcing the scripts in an order that causes foreign keys to reference tables that you haven't defined yet, and this is a problem.
You have two possible fixes for this problem:
Create the tables in the order to avoid this problem.
Create all the tables without their foreign keys, then run another script that contains ALTER TABLE ADD FOREIGN KEY... statements.
I wrote a Python function to execute SQL files:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Download it at http://sourceforge.net/projects/mysql-python/?source=dlp
# Tutorials: http://mysql-python.sourceforge.net/MySQLdb.html
# http://zetcode.com/db/mysqlpython/
import MySQLdb as mdb
import datetime, time
def run_sql_file(filename, connection):
'''
The function takes a filename and a connection as input
and will run the SQL query on the given connection
'''
start = time.time()
file = open(filename, 'r')
sql = s = " ".join(file.readlines())
print "Start executing: " + filename + " at " + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M")) + "\n" + sql
cursor = connection.cursor()
cursor.execute(sql)
connection.commit()
end = time.time()
print "Time elapsed to run the query:"
print str((end - start)*1000) + ' ms'
def main():
connection = mdb.connect('127.0.0.1', 'root', 'password', 'database_name')
run_sql_file("my_query_file.sql", connection)
connection.close()
if __name__ == "__main__":
main()
I haven't tried it with stored procedure or large SQL statements. Also if you have SQL files containing several SQL queries, you might have to split(";") to extract each query and call cursor.execute(sql) for each query. Feel free to edit this answer to incorporate these improvements.