Loading utf-8 encoded text into MySQL table - mysql

I have a large CSV file that I want to load into a MySQL table. The data is UTF-8 encoded, because it includes some non-English characters.
I have already set the character set of the corresponding column in the table to utf-8. But when I load my file, the non-English characters turn into weird characters (when I do a SELECT on my table rows). Do I need to encode my data before I load it into the table? If yes, how can I do this? I am using Python to load the data, with the LOAD DATA LOCAL INFILE command.
thanks

Try
LOAD DATA INFILE 'file'
IGNORE INTO TABLE table
CHARACTER SET UTF8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'

As described in http://dev.mysql.com/doc/refman/5.1/en/load-data.html,
you can specify the character set of your CSV file with the optional CHARACTER SET clause of LOAD DATA LOCAL INFILE.

You do not need to encode the characters in the file, but you do need to make sure the file is encoded as UTF-8 before loading it into the database.
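A small Python sketch can handle that re-encoding step up front (the file names and the latin-1 source encoding are assumptions; substitute whatever your file actually uses):

```python
def reencode(src, dst, src_encoding="latin-1"):
    """Rewrite a text file as UTF-8.

    src_encoding is an assumption about the input file; change it to
    whatever encoding your CSV was actually saved in.
    """
    with open(src, "r", encoding=src_encoding) as f_in, \
         open(dst, "w", encoding="utf-8", newline="") as f_out:
        for line in f_in:
            f_out.write(line)

# Hypothetical usage: produce a UTF-8 copy to feed to LOAD DATA LOCAL INFILE.
# reencode("data.csv", "data_utf8.csv")
```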

You should send
init_command = 'SET NAMES UTF8'
use_unicode = True
charset = 'utf8'
when doing MySQLdb.connect()
e.g.
dbconfig = {
    'host': 'localhost',
    'user': '',
    'passwd': '',
    'db': '',
    'init_command': 'SET NAMES UTF8',
    'use_unicode': True,
    'charset': 'utf8',
}
conn = MySQLdb.connect(**dbconfig)
edit: ah, sorry, I see you've added that you're using "LOAD DATA LOCAL INFILE" -- this wasn't clear from your initial question :)

Try something like,
LOAD DATA LOCAL INFILE "file"
INTO TABLE message_history
CHARACTER SET UTF8
COLUMNS TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"';
Reference:
https://dev.mysql.com/doc/refman/8.0/en/load-data.html

Related

MySQL Invalid UTF8 character string when importing csv table

I want to import a .csv file into a MySQL database with:
load data local infile 'C:\\Users\\t_lichtenberger\\Desktop\\tblEnvironmentLog.csv'
into table tblenvironmentlog
character set utf8
fields terminated by ';'
lines terminated by '\n'
ignore 1 lines;
The .csv file looks like:
But I am getting the following error and I cannot explain why:
Error Code: 1300. Invalid utf8 character string: 'M'
Any suggestions?
Nothing else I tried worked for me, including ensuring that my .csv was saved with UTF-8 encoding.
This worked:
When using LOAD DATA LOCAL INFILE, set CHARACTER SET latin1 instead of CHARACTER SET utf8mb4 as shown in https://dzone.com/articles/mysql-57-utf8mb4-and-the-load-data-infile
Here is a full example that worked for me:
TRUNCATE homestead_daily.answers;
SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE homestead_daily.questions;
SET FOREIGN_KEY_CHECKS = 1;
LOAD DATA LOCAL INFILE 'C:/Users/me/Desktop/questions.csv' INTO TABLE homestead_daily.questions
CHARACTER SET latin1
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(type, question, created_at, updated_at);
SELECT * FROM homestead_daily.questions;
See what the settings for the export were. Look for "UTF-8".
This suggests that the truncated text is caused by the data not being encoded as utf8mb4. Outside MySQL, look for "UTF-8". (Inside MySQL, utf8 and utf8mb4 work equally well for all European character sets, so the ü should not be a problem.)
If it was exported as "cp1252" (or any of a number of encodings), the byte for ü would not be valid for utf8mb4, leading to truncation.
If this analysis is correct, there are two solutions:
Plan A: Export as UTF-8.
Plan B: Import as latin1. (You do not need to change the column/table definition, just the LOAD DATA.)
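Why Plan B works: the single cp1252 byte for 'ü' is simply not a valid UTF-8 sequence. A quick Python check (illustrative, not from the original answer) makes this concrete:

```python
# 'ü' is one byte in cp1252 but two bytes in UTF-8.
assert "ü".encode("cp1252") == b"\xfc"
assert "ü".encode("utf-8") == b"\xc3\xbc"

# A lone 0xFC byte cannot be decoded as UTF-8, which is what trips
# up LOAD DATA when the file's real encoding is cp1252:
try:
    b"\xfc".decode("utf-8")
except UnicodeDecodeError:
    print("0xFC is not valid UTF-8")
```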
Just open the CSV file in a text editor (like Notepad++), change the file encoding to UTF-8, then import your CSV file.
It's complaining about 'M', but I think it's in 'München' and the actual problematic character is the next one, the umlaut 'ü'.
One simple way to test would be to try loading a file with just the first 2 rows & see if that works. Then add the 3rd row, try again & see if that fails.
If you can't or don't want to replace these special characters in your data, then you'll need to start investigating the character sets configured in your CSV file, database, table, columns, tools, etc.
Are you using MySQL 5.7 or above? Then something simple to try would be to change to character set utf8mb4 in your load data command.
See How MySQL 5.7 Handles 'utf8mb4' and the Load Data Infile for a similar issue.
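To speed up that row-by-row bisection, a short Python sketch (a hypothetical helper, not from any of the answers here) can report the first line and byte offset that is not valid UTF-8:

```python
def find_invalid_utf8(path):
    """Return (line_number, byte_offset) of the first byte that is not
    valid UTF-8, or None if the whole file decodes cleanly."""
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                return lineno, exc.start
    return None

# Hypothetical usage:
# print(find_invalid_utf8("tblEnvironmentLog.csv"))
```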
Also see:
import geonames allCountries.txt into MySQL 5.7 using LOAD INFILE - ERROR 1300 (HY000)
Trouble with utf8 characters; what I see is not what I stored
“Incorrect string value” when trying to insert UTF-8 into MySQL via JDBC?

How to use unicode character as field separator in mysql LOAD DATA INFILE query

I'm trying to import a file into a MySQL database using the query below. The fields in the file are delimited by '¦' (broken vertical bar), U+00A6.
I'm executing the query from PhpMyAdmin
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY '¦'
Somehow MySQL ignores the field separator and loads all records into the first field of the table.
You can represent it as the hex literal X'A6':
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY X'A6'
SET NAMES utf8; -- You were probably missing this (or the equivalent)
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY '¦'
(assuming you can type ¦ in utf8, not latin1)
The clue was the "Â" gibberish character at the end of every string: the '¦' separator, mojibaked, becomes 'Â¦'.
The source document is probably encoded with unwanted characters, such as what Excel produces. Try opening and saving it in TextEdit, or another bare-bones text editor, making sure the data is saved as UTF-8. Then:
LOAD DATA LOCAL INFILE "filepath/file.csv" INTO TABLE `table`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\¦';
With an escape character before the broken bar. Also use UTF-8 Unicode encoding on the entire table, not just the rows.
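For context, the broken bar's byte representation depends on the encoding, which is where both the X'A6' literal and the 'Â' gibberish come from. A small Python check (illustrative only):

```python
# One byte in latin1 (matching the X'A6' hex literal), two bytes in UTF-8.
assert "¦".encode("latin-1") == b"\xa6"
assert "¦".encode("utf-8") == b"\xc2\xa6"

# Decoding those UTF-8 bytes as latin1 is exactly the 'Â¦' mojibake:
assert b"\xc2\xa6".decode("latin-1") == "Â¦"
```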

MySQL LOAD DATA INFILE issue with special characters

I'm having problems with LOAD DATA LOCAL INFILE loading a UTF-8 file.
The query looks like this:
LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE artikel CHARACTER SET UTF8
FIELDS TERMINATED BY ';' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';
But fields with special characters like "Dämmung" will be cut to "D".
SHOW VARIABLES LIKE '%_set_%';
Shows that it's all set to utf8, and I even used notepad++ to make sure the file is utf8.
It would really help if someone has advice; it's a huge file and I would like to avoid entering the wrong fields manually :-/
I was having the same problem.
There is a "hack" that solves this: maintain everything as UTF-8. The exception is the LOAD DATA LOCAL INFILE statement: when I use LATIN1 instead of UTF8 there, it works fine.
It worked for me.
LOAD DATA LOCAL INFILE 'D:/File.csv'
INTO TABLE my_table
CHARACTER SET latin1
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
Unfortunately, LOAD DATA did not accept UTF-8 input for me, no matter what charset was passed in the CLI options and used by the table. I didn't find this in the documentation and assume it is a bug.
The workaround is to use mysqlimport instead. The shortcoming of this solution is that if you want to disable foreign key checks during import (this doesn't apply to MyISAM tables), you will have to disable them globally, which requires the SUPER privilege:
echo "SET GLOBAL FOREIGN_KEY_CHECKS=0;" | mysql -u root -p
Here is my solution from when I was troubleshooting this problem with special characters.
To explain:
you can change ';' to ',', since my CSV was semicolon-separated, unlike yours
I had line breaks inside my columns, so I escaped them with \\
The script is:
LOAD DATA LOCAL INFILE 'D:/File.csv'
INTO TABLE my_table
CHARACTER SET latin1
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
Please pay attention to the collation of the database and table columns!
I'm using the utf8mb4_unicode_520_ci for both database and table columns.
Everything worked well with keeping CHARACTER SET utf8 in the query.

UTF-8 characters display differently after import to mySQL

I have a mySQL database full of accented characters. The DB is populated periodically from MS Access 2010.
In Access you'll see é è à Ü. In the export process in Access UTF-8 encoding is specified.
Open the resulting text file in UltraEdit on my PC and you'll see "Vieux Carré" and UE says the encoding is U8-DOS.
The files are uploaded via FTP and imported via LOAD DATA LOCAL INFILE queries like this
LOAD DATA LOCAL INFILE '$dir/$t.$ext' INTO TABLE `$t` FIELDS OPTIONALLY ENCLOSED BY '|' TERMINATED BY ';' LINES TERMINATED BY '\n'
In mySQL the field collation is set to utf8_general_ci.
If you query mySQL from the command line or from phpMyAdmin you'll see "Vieux Carré".
What am I doing wrong?
If you're using LOAD DATA INFILE with a file that has a charset that's different from your database's default, you need to specify what character set the file is in:
LOAD DATA LOCAL INFILE '$dir/$t.$ext'
INTO TABLE `$t`
CHARACTER SET utf8
FIELDS OPTIONALLY ENCLOSED BY '|' TERMINATED BY ';'
LINES TERMINATED BY '\n'
Note that the database character set is database-wide, and is different than the table character set and the column character set. SHOW CREATE DATABASE database_name will show you the database charset.

How do I import a CSV file with accented characters into MySQL

I have a file.csv with 300,000 rows. Many of the rows are the names of foreign cities with accented characters. When I try to do an import in mysql, I get warnings on the fields, and it truncates the field at the point of the special character.
LOAD DATA LOCAL INFILE '/var/tmp/geoip/location.csv' INTO TABLE Geolocation2
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(id, country, region, city, postalCode, latitude, longitude, metrocode, areacode);
I had this problem with CSV files created by MS Excel. If you are using Excel, or need to convert CSV files to UTF-8 for importing into MySQL the answer here may be of help.
Open and Save As all your SQL query and data files with UTF-8 encoding.
This will solve BULK INSERT problems; use the option WITH (DATAFILETYPE = 'widenative').
It will also solve INSERT INTO problems, whether the data is in the same file as the CREATE TABLE instruction or chained via :r "X:\Path\InsertIntoMyTable.sql"
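If you'd rather script the conversion than re-save files by hand, a hedged Python sketch (assuming the Excel export is cp1252; the file names and the helper are illustrative) rewrites the CSV as UTF-8:

```python
import csv

def excel_csv_to_utf8(src, dst, src_encoding="cp1252"):
    """Rewrite a CSV (assumed cp1252, Excel's usual Western default)
    as UTF-8, preserving quoting via the csv module."""
    with open(src, newline="", encoding=src_encoding) as f_in, \
         open(dst, "w", newline="", encoding="utf-8") as f_out:
        csv.writer(f_out).writerows(csv.reader(f_in))

# Hypothetical usage:
# excel_csv_to_utf8("location.csv", "location_utf8.csv")
```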
You need to set the connection, database, table and column encodings to the same character set as the data was saved in the CSV file.
http://dev.mysql.com/doc/refman/5.0/en/charset.html
This sequence works for me.
create database {$databasename} DEFAULT CHARACTER SET latin1;
ALTER DATABASE {$databasename} DEFAULT CHARACTER SET latin1 DEFAULT COLLATE latin1_swedish_ci;
charset latin1;
load data infile '{$file.csv}' into table {$tablename} character set latin1 fields terminated by '|' enclosed by '"' lines terminated by '\n';
Tell MySQL what the codepage of the source file is when you import it. e.g. to import a file with codepage Windows-1252, use MySQL codepage latin1 (which is the same thing) like this:
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE imported_table
CHARACTER SET 'latin1'
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;