Compromised safeguard of data due to bad encoding usage? - MySQL

I am using Jupyter and Python 3.6.4 via Anaconda.
I want to be able to process and store data from Python to a MySQL database.
The libraries I am using to do this are pymysql and sqlalchemy.
For now, I am testing this locally with WAMP (MySQL version: 5.7.21); later I will apply it to a remote server.
Database creation function:
def create_raw_mysql_db(host, user, password, db_name):
    # Recreate the database from scratch with a 4-byte-safe character set
    conn = pymysql.connect(host=host, user=user, password=password)
    with conn.cursor() as cursor:
        cursor.execute('DROP DATABASE IF EXISTS ' + db_name)
        cursor.execute('CREATE DATABASE ' + db_name + ' CHARACTER SET utf8mb4')
    conn.close()
Function to write a DataFrame to a relational table in MySQL:
def save_raw_to_mysql_db(df, table_name, db_name, if_exists, username, password, host_ip, port):
    # The URL must be user:password@host:port/db; a '#' in place of '@' breaks the connection string
    engine = create_engine("mysql+pymysql://" + username + ":" + password + "@" + host_ip + ":" + port + "/" + db_name + "?charset=utf8mb4")
    df.to_sql(name=table_name, con=engine, if_exists=if_exists, chunksize=10000)
The execution code:
#DB info & credentials
host = "localhost"
port = "3306"
user= "root"
password= ""
db_name= "raw_data"
exade_light_tb = "exade_light"
#A simple dataframe
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),columns=['a', 'b', 'c', 'd', 'e'])
create_raw_mysql_db(host,user,password,db_name)
save_raw_to_mysql_db(df,exade_light_tb,db_name,"replace",user,password,host,port)
The warning I receive when I run this code:
C:\Users.... : Warning: (1366, "Incorrect string value: '\x92\xE9t\xE9)' for column 'VARIABLE_VALUE' at row 481")
result = self._query(query)
From these threads: /questions/34165523/, /questions/47419943/ and /questions/2108824/, I concluded that the problem must be related to the utf8 charset, but I am using utf8mb4 to create my database, and I am not using Django (which reportedly also needs to be configured, according to /questions/2108824/).
My questions :
How is this warning really impacting my data and its integrity?
How come, even though I changed the charset from utf8 to utf8mb4, it doesn't seem to solve the warning? Do I need to configure something further? In that case, what parameters should I keep in mind to apply the same configuration to my remote server?
How do I get rid of this warning?
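For completeness, here is a minimal sketch of a check I can run (the helper name check_connection_charset is mine and not part of the code above): connect with charset='utf8mb4' explicitly and print the character_set_* variables the server negotiated for the session, to see whether anything still falls back to another charset.
import pymysql

def check_connection_charset(host, user, password, db_name):
    # Hypothetical helper: show which charset the session actually negotiated
    conn = pymysql.connect(host=host, user=user, password=password,
                           db=db_name, charset='utf8mb4')
    with conn.cursor() as cursor:
        cursor.execute("SHOW VARIABLES LIKE 'character_set_%'")
        for name, value in cursor.fetchall():
            print(name, '=', value)
    conn.close()

check_connection_charset("localhost", "root", "", "raw_data")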
Annex:

Related

Pandas to_sql to localhost table returns 'Engine' object has no attribute 'cursor'

I see lots of questions like this about sqlite, but mine is about MySQL.
My entire script is like this:
df = pd.read_csv("df.csv")
engine = sqlalchemy.create_engine('mysql+mysqlconnector://{0}:{1}@{2}/{3}'.
                                  format(config.user, config.passwd,
                                         config.host, config.db))
df.to_sql('SQL_table', con=engine, if_exists='append', index=False)
Then it returns the error:
'Engine' object has no attribute 'cursor'
I googled and followed some solutions; one of them is:
df = pd.read_csv("df.csv")
engine = sqlalchemy.create_engine('mysql+mysqlconnector://{0}:{1}@{2}/{3}'.
                                  format(config.user, config.passwd,
                                         config.host, config.db))
connection = engine.raw_connection()
df.to_sql('SQL_table', con=connection, if_exists='append', index=False)
Then the error changed to:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': Not all parameters were used in the SQL statement
I am using MySQL, not sqlite, so I don't understand why it returns this error.
So basically, I think that solution is not working. Would anyone please tell me how to fix this problem? My SQLAlchemy version is 1.4.27.
I have solved this: I reset my Mac, then came back to VS Code and started the notebook again, and the problem was gone.
But before that, I also tried using the command
reset
but that didn't do the trick. It had to be a machine hard reset.
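In case it helps someone else, a small sanity check (just a sketch, not something from my original post) is to print, inside the notebook, the versions and install path that the kernel actually imported; if they differ from what pip reports in a terminal, a kernel restart is usually enough instead of a machine reset.
import pandas as pd
import sqlalchemy

# Versions and install path as seen by the running kernel; a stale kernel
# that predates a package upgrade often explains this kind of mismatch.
print("pandas:", pd.__version__)
print("SQLAlchemy:", sqlalchemy.__version__)
print("loaded from:", sqlalchemy.__file__)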

Incorrect string value - MySql

I have a problem with MySQL.
My version of MySQL is: 5.7.33 - MySQL Community Server (GPL)
I have created a Discord bot in node.js, and I get an error when a new user has a pseudo like this: legoshi🌌🌧
So I tried to follow this topic: How to fix "Incorrect string value" errors?
So I converted my database to utf8mb4_unicode_ci.
And my error is still there.
At the beginning my database was in utf8 and I had the error too.
code: 'ER_TRUNCATED_WRONG_VALUE_FOR_FIELD',
errno: 1366,
sqlMessage: "Incorrect string value: '\\xF0\\x9F\\x8C\\x8C\\xF0\\x9F...' for column 'user' at row 1",
sqlState: 'HY000',
index: 0,
sql: 'INSERT INTO registre (id, user, autohit, ultimate, platinium, `Date Inscription`) VALUES (210490816542670849, "legoshi🌌🌧", 0, 0, 0, CURRENT_TIMESTAMP())'
}
So I don't know how to fix this. I have seen a lot of topics and they all seem to be fixed with utf8mb4_unicode_ci, but not in my case.
Thanks for your help.
In MySQL, there are several places where you can set up a character set:
On the server level
On the database level
On the table level (for each table)
On the field level for all character-based fields
On your connection (telling the server what charset will be used in packets you send to the server)
Basically, the server-level, database-level and table-level settings are just defaults for newly created items: new databases are created with the server's default, new tables with the database's default, and new fields with the table's default. However, only the field-level charset is what actually counts.
So first, you should make sure that the fields you want to store the data in are actually set up as utf8mb4_unicode_ci. Then, you need to connect to the server using exactly the same charset. Be aware that the collation should match as well.
You can find out what character set is in use by issuing the following query:
SHOW VARIABLES LIKE 'character_set_%'
You'll see several variables indicating which default is set for various scopes. Have a look especially to the variables character_set_client and character_set_connection. If the connection does not have the correct character set specified, you need to set it up on connection.
It's a good practice to have all character sets match identically. Mixed values will sooner or later cause trouble.
To check the character set which is set up for the field, have it displayed with the command
SHOW CREATE TABLE registre
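As an illustration of the two steps above, here is a sketch written with Python and pymysql rather than node.js ('mydb' is a placeholder database name; the table name registre comes from the question): convert the existing table to utf8mb4 and make sure the connection itself declares utf8mb4.
import pymysql

# Connect while explicitly declaring utf8mb4 on the connection side.
conn = pymysql.connect(host='localhost', user='root', password='',
                       db='mydb', charset='utf8mb4')  # 'mydb' is a placeholder
with conn.cursor() as cursor:
    # Convert the table (and its character columns) to utf8mb4 so that
    # 4-byte emoji such as the ones in the pseudo can be stored.
    cursor.execute("ALTER TABLE registre CONVERT TO CHARACTER SET utf8mb4 "
                   "COLLATE utf8mb4_unicode_ci")
    cursor.execute("SHOW CREATE TABLE registre")
    print(cursor.fetchone()[1])
conn.close()
With the node.js mysql driver, the connection-side equivalent is, as far as I know, the charset option passed to createConnection.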

RMySQL encoding issue on Windows - Spanish Character ñ

While using RMySQL::dbWriteTable function in R to write a table to MySQL on Windows I get an error message concerning the character [ñ].
The simplified example is:
table <- data.frame(a=seq(1:3), b=c("És", "España", "Compañía"))
table
a b
1 1 És
2 2 España
3 3 Compañía
db <- dbConnect(MySQL(), user = "####", password = "####", dbname ="test", host= "localhost")
RMySQL::dbWriteTable(db, name="test1", table, overwrite=T, append=F )
Error in .local(conn, statement, ...) :
could not run statement: Invalid utf8 character string: 'Espa'
As you can see, there is no problem with the accents ("És") but there is with the ñ character ("España").
On the other hand, there is no problem with MySQL since this query works fine:
INSERT INTO test.test1 (a,b)
values (1, "España");
Things I have already tried before writing the table:
Encoding(x) <- "UTF-8" for all table.
iconv(x, "UTF-8", "UTF-8") for all table.
Sent pre-query: dbSendQuery(db, "SET NAMES UTF8;")
Changed the MySQL table collation to: "utf-8-general", "latin-1", "latin-1-spanish", ...
Also tried "Latin-1" encoding and it didn't work either.
I have been looking for an answer to this question for a while with no luck.
Please help!
Versions:
MySQL 5.7.17
R version 3.3.0
Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=C"
PS: Works fine in Linux environment but I am stuck with Windows in my current project :(
In the end, it looks like it was a problem with the encoding setup of the connection. By default my connection was set up to utf-8 but my local encoding was set to latin1. Therefore, my final solution was:
con <- dbConnect(MySQL(), user=user, password=password,dbname=dbname, host=host, port=port)
# With the next line I try to get the right encoding (it works for Spanish keyboards)
encoding <- if(grepl(pattern = 'utf8|utf-8',x = Sys.getlocale(),ignore.case = T)) 'utf8' else 'latin1'
dbGetQuery(con,paste("SET names",encoding))
dbGetQuery(con,paste0("SET SESSION character_set_server=",encoding))
dbGetQuery(con,paste0("SET SESSION character_set_database=",encoding))
dbWriteTable( con, value = dfr, name = table, append = TRUE, row.names = FALSE )
dbDisconnect(con)
This works for me in Windows:
write.csv(table, file = "tmp.csv", fileEncoding = "utf8", quote = FALSE, row.names = FALSE)
db <- dbConnect(MySQL(), user = "####", password = "####", dbname ="test", host= "localhost")
dbWriteTable( db, value = "tmp.csv", name = "test1", append = TRUE, row.names = FALSE, sep = ",", quote='\"', eol="\r\n")
I ran into this problem with a data table of about 60 columns and 1.5 million rows; there were many computed values and reconciled and corrected dates and times so I didn't want to reformat anything I didn't have to reformat. Since the utf-8 issue was only coming up in character fields, I used a kludgy-but-quick approach:
1) copy the field list from the dbWriteTable statement into a word processor or text editor
2) on your copy, keep only the fields that have descriptions as VARCHAR and TEXT
3) strip those fields down to just field names
4) use paste0 to write a character vector of statements that will ensure all the fields are character fields:
dt$x <- as.character(dt$x)
5) then use paste0 again to write a character vector of statements that set the encoding to UTF-8
Encoding(dt$x) <- "UTF-8"
Run the as.character group before the Encoding group.
It's definitely a kludge and there are more elegant approaches, but if you only have to do this now and then (as I did), then it has three advantages:
1) it only changes what needs changing (important when, as with my project, there is a great deal of work already in the data table that you don't want to risk in a reformat),
2) it doesn't require a lot of space and read/writes in the intermediate stage, and
3) it's fast to write and runs at an acceptable speed, at least for the size of data table I'm working with.
Not elegant, but it will get you over this particular hitch very quickly.
The function dbConnect() has a parameter called encoding that can help you easily set up the connection encoding method.
dbConnect(MySQL(), user=user, password=password,dbname=dbname, host=host, port=port, encoding="latin1")
This has allowed me to insert "ñ" characters into my tables and also inserting data into columns that have "ñ" in their name. For example, I can insert data into a column named "año".

How to fix mysql uppercase query in php and mysql

I am currently working on a website that uses the ADOdb library. Across the entire website, all the queries are written in UPPERCASE.
The problem is that when I run the query it doesn't work, because the table name is UPPERCASE. But when I change the table name to lowercase, it works.
$sql = "SELECT * FROM MEMBERS where USERNAME = '$username'";
$db = ADONewConnection('mysql');
$db->debug = true;
$db->Connect(DB_HOSTNAME, DB_USERNAME, DB_PASSWORD, DB_NAME);
$resultFriends = $db->Execute($sql);
while ($row = $resultFriends->FetchRow()) {
    var_dump($row);
    die;
}
Here is the error I get:
ADOConnection._Execute(SELECT * FROM MEMBERS where USERNAME = 'fury', false) % line 1012, file: adodb.inc.php
ADOConnection.Execute(SELECT * FROM MEMBERS where USERNAME = 'fury') % line 15, file: index.php
Bear in mind I don't want to change the scripts. There are 1000 files and 10000 places.
Is there any library, or is there any way, that I can run these queries without error?
The live site was running on a Linux kernel, but the new dev site is Ubuntu.
I have tried this on the Ubuntu MySQL command line and it didn't work.
The solution is that I had to reconfigure the MySQL database in AWS RDS:
You have to modify the “lower_case_table_names” parameter for your DB Instance(s). Prior to today, the lower_case_table_names parameter was not modifiable, with a system default of zero (0) or “table names stored as specified and comparisons are case sensitive.” Beginning immediately, values of zero and one (table names are stored in lowercase and comparisons are not case sensitive) are allowed. See the MySQL documentation for more information on the lower_case_table_names parameter.
The lower_case_table_names parameter can be specified via the rds-modify-db-parameter-group API. Simply include the parameter name and specify the desired value, such as in the following example:
rds-modify-db-parameter-group example --parameters "name=lower_case_table_names, value=1, method=pending-reboot" --region us-east-1
Support for modifying parameters via the AWS Management Console is expected to be added later this year.
We recommend setting the lower_case_table_names parameter via a custom DB Parameter Group, and doing so before creating an associated DB Instance. Changing the parameter for existing DB Instances could cause inconsistencies with point-in-time recovery backups and with Read Replicas.
Amazon RDS
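Once the parameter group change has been applied and the instance rebooted, you can verify the value actually in effect with a simple query (the sketch below uses Python and pymysql purely for illustration; the endpoint and credentials are placeholders, and any MySQL client works):
import pymysql

# 0 = table names stored as given, comparisons case sensitive (the old default)
# 1 = table names stored in lowercase, comparisons case insensitive
conn = pymysql.connect(host='your-rds-endpoint', user='user', password='secret')
with conn.cursor() as cursor:
    cursor.execute("SHOW VARIABLES LIKE 'lower_case_table_names'")
    print(cursor.fetchone())
conn.close()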

How to set up MySQL on Windows to accept UTF-8 data via Groovy JDBC connections

I'm writing a script in Groovy that needs to store (and later retrieve) data which contains Czech characters (such as č, š, í, ě, ...).
I use a standard JDBC connection like
def sql = Sql.newInstance(
'jdbc:mysql://localhost/db',
'root',
'',
'com.mysql.jdbc.Driver'
);
sql.executeInsert(
'INSERT INTO ... VALUES ...', [ ... ]
);
I have downloaded the latest MySQL ZIP archive for 64bit Windows machines and extracted it on my local hard drive. I now run it (for testing purposes) via mysqld launched manually from the command line.
When I store and retrieve the data, some of the Czech characters are corrupted (?s are displayed instead of them). I believe the database is working with the wrong encoding. I would prefer the script to work with UTF-8 encoded data.
I have found a lot of (mutually different) information on the internet about how to set up MySQL to work with UTF-8 data. None of it worked for me, though.
Could you please provide instructions for my specific use case?
As long as your db/table/column is set to use UTF-8, you can try changing your connection params to:
def sql = Sql.newInstance(
'jdbc:mysql://localhost/db?useUnicode=true&characterEncoding=UTF-8',
'root',
'',
'com.mysql.jdbc.Driver'
)