Cannot populate MySQL database from pandas dataframe with if_exists='append'

I am trying to write a script to populate a MySQL database with multiple pandas dataframes. For the sake of simplicity, I will demonstrate here the code to populate the db with a single pandas df.
I am connecting to the db as follows:
import mysql.connector
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
# create the connection and the cursor
conn = mysql.connector.connect(
    host='localhost',
    user='root',
    password='my_password')
c = conn.cursor(buffered=True)
# Create the database
c.execute('CREATE DATABASE IF NOT EXISTS ss_json_interop')
# Connect now to the ss_json_interop database
conn = mysql.connector.connect(
    host='localhost',
    user='root',
    password='my_password',
    database='ss_json_interop')
c = conn.cursor(buffered=True)
#### Create the table
c.execute("""CREATE TABLE IF NOT EXISTS sample_sheet_stats_json (
ss_ID int NOT NULL AUTO_INCREMENT,
panel text,
run_ID text,
sample_ID text,
i7_index_ID text,
i7_index_seq text,
i5_index_ID text,
i5_index_seq text,
number_reads_lane1 varchar(255),
number_reads_lane2 varchar(255),
total_reads varchar(255),
PRIMARY KEY (ss_ID)
)""")
#### create the engine
# more here: https://stackoverflow.com/questions/16476413/how-to-insert-pandas-dataframe-via-mysqldb-into-database
database_username = 'root'
database_password = 'my_password'
database_ip = '127.0.0.1'
database_name = 'ss_json_interop'
database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username, database_password, database_ip, database_name))
# define the engine (this one uses the MySQLdb/mysqlclient driver)
engine = create_engine("mysql+mysqldb://root:my_password@localhost/ss_json_interop")
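A quick sanity check (my addition, not part of the original post) is to confirm that an engine can actually reach the server before calling to_sql. Note that with the MySQLdb driver, host localhost typically means a Unix-socket connection, so 127.0.0.1 forces TCP instead; the snippet below is a minimal sketch assuming the credentials above:
from sqlalchemy import create_engine, text
# assumption: same credentials as above; 127.0.0.1 forces a TCP connection
test_engine = create_engine("mysql+mysqldb://root:my_password@127.0.0.1/ss_json_interop")
with test_engine.connect() as connection:
    # a SELECT 1 round trip confirms host, credentials and database name
    print(connection.execute(text("SELECT 1")).scalar())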
I am trying to populate my df into a table called sample_sheet_stats_json. If I do:
df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='replace')
the command works and the table in the db is correctly populated. However, if I replace if_exists='replace' with if_exists='append':
df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append')
I get a long error message like the one below (it is truncated here; it continues, replicating the structure of my df):
(mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column 'index' in 'field list' [SQL: 'INSERT INTO sample_sheet_stats_json
Strangely enough, I can do df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append') as long as I first run df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='replace'), i.e. if the table is already populated.
The same problem was already reported here. However, if I do:
df.to_sql('sample_sheet_stats_json', engine, if_exists='append')
I get the following error message:
(_mysql_exceptions.OperationalError) (2002, "Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)") (Background on this error at: http://sqlalche.me/e/e3q8)
which does not make much sense, as I could already connect to the database with other commands, as shown above.
Does anyone know how I can fix it?

I have figured out what happened. The error message says that there is no column called index in the table, which is in fact true: by default to_sql writes the dataframe index as an extra column, and the table created above has no such column. (With if_exists='replace' this goes unnoticed, because pandas drops the table and recreates it with the index column included.)
Therefore I simply have to pass the argument index=False to df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append'):
df.to_sql('sample_sheet_stats_json', con=database_connection, if_exists='append', index=False)
And that solves the problem.
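As a side note (my addition, not from the original answer): if the index does carry data you want to keep, a small sketch of an alternative is to promote it to a regular column whose name matches a column that already exists in the table before appending:
# assumption: the index holds sample IDs; adjust the name to your schema
df = df.rename_axis('sample_ID').reset_index()
df.to_sql('sample_sheet_stats_json', con=database_connection,
          if_exists='append', index=False)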

Related

to_sql inserts more rows in SQL table than there are in pandas dataframe

So I have a MySQL database, let's call it "MySQLDB". When trying to create a new table (let's call it datatable) and insert data from a pandas dataframe, my code keeps adding rows to the SQL table, and I'm not sure if they are duplicates or not. For reference, there are around 50,000 rows in my pandas dataframe, but after running my code, the SQL table contains over 1 million rows. Note that I am using XAMPP to run a local MySQL server on which the database "MySQLDB" is stored. Below is a simplified/generic version of what I am running. Note I have removed the port number and replaced it with the generic [port] in this post.
import pandas as pd
from sqlalchemy import create_engine
import mysql.connector
pandas_db = pd.read_csv('filename.csv', index_col = [0])
engine = create_engine('mysql+mysqlconnector://root:@localhost:[port]/MySQLDB', echo=False)
pandas_db.to_sql(name='datatable', con=engine, if_exists = 'replace', chunksize = 100, index=False)
Is something wrong with the code? Or could it be something to do with XAMPP or the way I set up my database? If there is anything I could improve, please let me know.
I haven't found any other good posts that describe having the same issue.
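Not part of the original post, but a small diagnostic sketch along these lines could help narrow down where the extra rows come from, by comparing what is in the CSV with what actually lands in the table:
from sqlalchemy import text
print(len(pandas_db))                # rows in the dataframe
print(pandas_db.duplicated().sum())  # duplicates already present in the CSV
with engine.connect() as conn:
    # assumption: engine and table as defined above
    print(conn.execute(text("SELECT COUNT(*) FROM datatable")).scalar())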

creating mysql tables through python

I have a problem creating tables. I use the code below; on one machine it works perfectly well, while on another machine it does not give any error but also does not create the tables. I believe it has something to do with the conda environment, but I made a new environment and I still get the same behavior. There is no difference in library versions between the machine where it works and the one where it does not:
python=3.7
mysql-connector-python=8.0.18
The funny thing is that if I execute a SELECT statement I get valid results.
import mysql.connector
import configparser

config = configparser.RawConfigParser()
config.read('config.ini')
conn = mysql.connector.connect(host=config['mysql report server 8']['host'],
                               port=config['mysql report server 8']['port'],
                               user=config['mysql report server 8']['user'],
                               password=config['mysql report server 8']['password'],
                               allow_local_infile=True,
                               autocommit=1)
mycursor = conn.cursor()

def create_tables(mycursor, name_of_import: str):
    with open(r"../SupportFiles/Table_Create_Query.sql") as f:
        create_tables_str = f.read()
    create_tables_str = create_tables_str.replace("xxx_replaceme", name_of_import)
    mycursor.execute(create_tables_str, multi=True)

create_tables(mycursor, "my_test_import")
conn.commit()
conn.close()
the file Table_Create_Query.sql has the following contents:
use cb_bht3_0_20_048817_raw;
create table xxx_replaceme_categories (
    cid int,
    variable varchar(255),
    name varchar(255),
    value int,
    ordr int,
    label varchar(255)
);
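One detail worth checking (my observation, not something stated in the question): with mysql-connector-python, cursor.execute(..., multi=True) returns an iterator of result objects, and the statements only run as that iterator is consumed, which would explain tables silently not being created. A minimal sketch of the idea:
def create_tables(mycursor, name_of_import: str):
    with open(r"../SupportFiles/Table_Create_Query.sql") as f:
        create_tables_str = f.read()
    create_tables_str = create_tables_str.replace("xxx_replaceme", name_of_import)
    # multi=True yields one result per statement; iterate so each one executes
    for result in mycursor.execute(create_tables_str, multi=True):
        pass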

Compromised safeguarding of data due to bad encoding usage?

I am using Jupyter & Python 3.6.4 via Anaconda.
I want to be able to process and store data from Python to a MySQL database.
The libraries I am using to do this are pymysql and sqlalchemy.
For now, I am testing this locally with WAMP (MySQL version: 5.7.21); later I will apply it to a remote server.
Database creation function:
import pymysql

def create_raw_mysql_db(host, user, password, db_name):
    conn = pymysql.connect(host=host, user=user, password=password)
    conn.cursor().execute('DROP DATABASE IF EXISTS ' + db_name)
    conn.cursor().execute('CREATE DATABASE ' + db_name + ' CHARACTER SET utf8mb4')
Function to convert a Dataframe to a relational table in MySql:
from sqlalchemy import create_engine

def save_raw_to_mysql_db(df, table_name, db_name, if_exists, username, password, host_ip, port):
    engine = create_engine("mysql+pymysql://" + username + ":" + password + "@" + host_ip
                           + ":" + port + "/" + db_name + "?charset=utf8mb4")
    df.to_sql(name=table_name, con=engine, if_exists=if_exists, chunksize=10000)
The execution code:
import numpy as np
import pandas as pd

#DB info & credentials
host = "localhost"
port = "3306"
user = "root"
password = ""
db_name = "raw_data"
exade_light_tb = "exade_light"
#A simple dataframe
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)), columns=['a', 'b', 'c', 'd', 'e'])
create_raw_mysql_db(host,user,password,db_name)
save_raw_to_mysql_db(df,exade_light_tb,db_name,"replace",user,password,host,port)
The warning I receive when I run this code:
C:\Users.... : Warning: (1366, "Incorrect string value: '\x92\xE9t\xE9)' for column 'VARIABLE_VALUE' at row 481")
result = self._query(query)
From these threads: /questions/34165523/, /questions/47419943/ and /questions/2108824/, I could conclude the problem must be related to the utf8 charset, but I am using utf8mb4 to create my db, and I am not using Django (which reportedly also needs to be configured, according to /questions/2108824/).
My questions:
How is this warning really impacting my data and its integrity?
How come, even though I changed the charset from utf8 to utf8mb4, it doesn't seem to solve the warning? Do I need to configure something further? In that case, what parameters should I keep in mind to apply the same configuration to my remote server?
How do I get rid of this warning?
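An editing note on the likely root cause (an assumption, not from the original post): the bytes in the warning, \x92 and \xE9, are the cp1252 encodings of ’ and é, which suggests the text was read with the wrong encoding before it ever reached MySQL, so no connection charset can fix it. A minimal sketch of how to check, assuming the data comes from a file:
import pandas as pd

raw = b"\x92\xE9t\xE9"
print(raw.decode("cp1252"))  # -> ’été : the bytes are cp1252, not utf8
# hypothetical fix: decode the source with the right encoding before writing to MySQL
df = pd.read_csv("source.csv", encoding="cp1252")  # "source.csv" is a placeholder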

RMySQL - dbWriteTable() - The used command is not allowed with this MySQL version

I am trying to read a few Excel files into a dataframe and then write to a MySQL database. The following program is able to read the files and create the dataframe, but when it tries to write to the db using the dbWriteTable command, I get an error message:
Error in .local(conn, statement, ...) :
could not run statement: The used command is not allowed with this MySQL version
library(readxl)
library(RMySQL)
library(DBI)
mydb = dbConnect(RMySQL::MySQL(), host='<ip>', user='username', password='password', dbname="db",port=3306)
setwd("<directory path>")
file.list <- list.files(pattern='*.xlsx')
print(file.list)
dat = lapply(file.list, function(i) {
    print(i)
    x = read_xlsx(i, sheet=NULL, range=cell_cols("A:D"), col_names=TRUE, skip=1, trim_ws=TRUE, guess_max=1000)
    x$file = i
    x
})
df = do.call("rbind.data.frame", dat)
dbWriteTable(mydb, name="table_name", value=df, append=TRUE )
dbDisconnect(mydb)
I checked the definition of the dbWriteTable function, and it looks like it is using LOAD DATA LOCAL INFILE to store the data in the database. As per some other answered questions on Stack Overflow, I understand that the word LOCAL could be the cause for concern, but since it is already in the function definition, I don't know what I can do. Also, this statement uses "," as the separator, but my data has "," in some of the values, which is why I was interested in using dataframes, hoping they would preserve the source structure. But now I am not so sure.
Is there any other way/function to write the dataframe to the MySQL tables?
I solved this on my system by adding the following line to the my.cnf file on the server (you may need to use root and vi to edit it!). In my case this is just below the '[mysqld]' line:
local-infile=1
Then restart the server.
Good luck!
You may need to change
dbWriteTable(mydb, name="table_name", value=df, append=TRUE )
to
dbWriteTable(mydb, name="table_name", value=df,field.types = c(artist="varchar(50)", song.title="varchar(50)"), row.names=FALSE, append=TRUE)
That way, you specify the field types in R and append the data to your MySQL table.
Source: Unknown column in field list error Rmysql

How to avoid encoding warning when inserting binary data into a blob column in MySQL using Python 2.7 and MySQLdb

I'm trying to insert binary data into a longblob column in MySQL using MySQLdb from Python 2.7, but I'm getting an encoding warning that I don't know how to get around:
./test.py:11: Warning: Invalid utf8 character string: '8B0800'
curs.execute(sql, (blob,))
Here is the table definition:
CREATE TABLE test_table (
    id int(11) NOT NULL AUTO_INCREMENT,
    gzipped longblob,
    PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
And the test code:
#!/usr/bin/env python
import sys
import MySQLdb
blob = open("/tmp/some-file.gz", "rb").read()
sql = "INSERT INTO test_table (gzipped) VALUES (%s)"
conn = MySQLdb.connect(db="unprocessed", user="some_user", passwd="some_pass", charset="utf8", use_unicode=True)
curs = conn.cursor()
curs.execute(sql, (blob,))
I've searched here and elsewhere for the answer; unfortunately, although many questions seem like what I'm looking for, the posters don't appear to be having encoding issues.
Questions:
What is causing this warning?
How do I get rid of it?
After some more searching I've found the answers.
It is actually MySQL generating this warning.
It can be avoided by placing _binary before the binary parameter placeholder.
https://bugs.mysql.com/bug.php?id=79317
So the Python code needs to be updated as follows:
sql = "INSERT INTO test_table (gzipped) VALUES (_binary %s)"