MySQL: West European Characters get Mangled using LOAD DATA LOCAL INFILE

When I run this at the MySQL command line, it works fine:
INSERT INTO MYTABLE VALUES(NULL,101942,'2015-05-08','sähkötupakalle');
The 'ä' and the 'ö' end up in the MySQL varchar column just fine.
However, when I put the same data in a file, and use
LOAD DATA LOCAL INFILE
then the 'ä' and the 'ö' get mangled, and I end up with data in the MySQL varchar column that looks like this:
sähkötupakalle
Any ideas for how I can get these characters to load correctly using "LOAD DATA LOCAL INFILE"? FYI, my table has CHARSET=utf8.

Apparently the file you are loading is correctly encoded as utf8, but you did not include the CHARACTER SET utf8 clause in your LOAD DATA statement.
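If so, the immediate fix is to name the file's encoding in the statement itself. A minimal sketch, assuming a hypothetical tab-delimited file (the path and delimiters are illustrative):
LOAD DATA LOCAL INFILE '/path/to/data.txt'
INTO TABLE MYTABLE
CHARACTER SET utf8    -- declares how the bytes in the file are encoded
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';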
Symptom of "Mojibake":
When SELECTing the text, each non-English character is replaced by 2-3 characters that you could call gibberish or garbage.
How you got in the mess:
The client's bytes to be INSERTed into the table were encoded as utf8 (good), and
The charset for the connection was latin1 (e.g., via SET NAMES latin1), and
The table column was declared CHARACTER SET latin1
How to fix the text and the table:
Do the 2-step ALTER:
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;
where the lengths are big enough and the other "..." have whatever else (NOT NULL, etc) was already on the column.
That converts the column definition while leaving the bits alone.
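For example, if the damaged column were a NOT NULL VARCHAR(100) named col in table Tbl (all names here are placeholders):
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(300) NOT NULL;   -- step 1: drop the charset label, keep the bytes
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(100) CHARACTER SET utf8 NOT NULL;   -- step 2: relabel the bytes as utf8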
How to fix the code (in general):
Change the client's declaration of charset to utf8 - via SET NAMES utf8 or equivalent.
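A sketch of the client-side fix, with a quick way to verify it took effect:
SET NAMES utf8;   -- the client now talks to the server in utf8
SHOW VARIABLES LIKE 'character_set_client';   -- should report utf8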


What is the correct character set to use for '\xE7'

I am trying to load data from a file into a MySQL table using
load data infile 'outfile2.txt'
into table parsed_ingredients
fields terminated by ","
lines terminated by "\n";
All appears fine until I reach a line where this error gets thrown
ERROR 1366 (HY000): Incorrect string value: '\xE7ao' for column 'ingredient' at row 5875
I looked at the line and it's
ç (c-cedilla)
I have tried all sorts of combinations of encoding
utf8
latin1
utf8mb4
They are all leading to the same error. What is the correct encoding to use and what needs to be set to this encoding?
Any of cp1250, cp1256, dec8, latin1, latin2, latin5 map the single byte E7 to ç. Use latin1.
Apparently your client is using E7, so you need to tell MySQL that fact. Do that in the LOAD DATA statement with this between lines 2 and 3:
CHARACTER SET latin1
Meanwhile, the column in the table can be latin1 or utf8 or utf8mb4; it does not matter. (Or, at least, it does not matter for this question.) The E7 will be translated to the suitable encoding for the table column. For utf8/utf8mb4, expect to find hex C3A7 in the table.
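Putting it together, the statement from the question with the clause added looks like this:
load data infile 'outfile2.txt'
into table parsed_ingredients
character set latin1
fields terminated by ","
lines terminated by "\n";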

MySQL Multilingual Encoding | Error Code: 1366. Incorrect string value: '\xCE\x09DIS'

I am trying to set up a database to store string data that is in multiple languages and includes Chinese characters among many others.
Steps I have taken so far:
I have created a schema which uses utf8mb4 character set and utf8mb4_unicode_ci collation.
I have created a table which includes CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; at the end of the CREATE statement.
I am attempting to LOAD DATA INFILE from a CSV file with CHARACTER SET utf8mb4 specified in the LOAD statement.
However, I am receiving an error Error Code: 1366. Incorrect string value: '\xCE\x09DIS' for column 'company_name' at row 43630.
Did it successfully parse 43629 rows? Then croak on that row? It may actually be garbage in the file.
Do you know what that company name should be? What does the rest of the line say?
Do you have another example? Remove that one line and run the LOAD again.
CE can be interpreted by any 1-byte charset, but not necessarily in a meaningful way.
09 is the "tab" character in virtually all charsets; is it reasonable to have a tab in a company name?
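One way to check is to stage the raw lines with no charset conversion and look at the bytes of the offending row. A sketch; the staging table, column name, and file path are hypothetical:
CREATE TABLE stage_lines (raw_line VARBINARY(4096));
LOAD DATA INFILE '/path/to/file.csv' INTO TABLE stage_lines
CHARACTER SET binary           -- no conversion: keep the bytes exactly as they are in the file
FIELDS TERMINATED BY '\b'      -- backspace, assumed not to occur, so each whole line lands in one column
LINES TERMINATED BY '\n'
(raw_line);
SELECT HEX(raw_line) FROM stage_lines LIMIT 1 OFFSET 43629;   -- the error pointed at row 43630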

MySQL Invalid UTF8 character string when importing csv table

I want to import a .csv file into a MySQL database by:
load data local infile 'C:\\Users\\t_lichtenberger\\Desktop\\tblEnvironmentLog.csv'
into table tblenvironmentlog
character set utf8
fields terminated by ';'
lines terminated by '\n'
ignore 1 lines;
I am getting the following error and I cannot explain why:
Error Code: 1300. Invalid utf8 character string: 'M'
Any suggestions?
Nothing else I tried worked for me, including ensuring that my .csv was saved with UTF-8 encoding.
This worked:
When using LOAD DATA LOCAL INFILE, set CHARACTER SET latin1 instead of CHARACTER SET utf8mb4 as shown in https://dzone.com/articles/mysql-57-utf8mb4-and-the-load-data-infile
Here is a full example that worked for me:
TRUNCATE homestead_daily.answers;
SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE homestead_daily.questions;
SET FOREIGN_KEY_CHECKS = 1;
LOAD DATA LOCAL INFILE 'C:/Users/me/Desktop/questions.csv' INTO TABLE homestead_daily.questions
CHARACTER SET latin1
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(type, question, created_at, updated_at);
SELECT * FROM homestead_daily.questions;
See what the settings for the export were. Look for "UTF-8".
This suggests that the truncated text in the error message is caused by the data not being encoded as utf8mb4. Outside MySQL, look for "UTF-8". (Inside MySQL, utf8 and utf8mb4 work equally well for all European character sets, so the ü should not be a problem.)
If it was exported as "cp1252" (or any of a number of encodings), the byte for ü would not be valid for utf8mb4, leading to truncation.
If this analysis is correct, there are two solutions:
Plan A: Export as UTF-8.
Plan B: Import as latin1. (You do not need to change the column/table definition, just the LOAD DATA.)
Just open the csv file in your text editor (like Notepad++), change the file encoding to UTF-8, and then import your csv file.
It's complaining about 'M', but I think the word is 'München' and the actual problematic character is the next one, the umlaut 'ü'.
One simple way to test would be to try loading a file with just the first 2 rows & see if that works. Then add the 3rd row, try again & see if that fails.
If you can't or don't want to replace these special characters in your data, then you'll need to start investigating the character sets configured in your CSV file, database, table, columns, tools etc...
Are you using MySQL 5.7 or above? Then something simple to try would be to change to character set utf8mb4 in your load data command.
See How MySQL 5.7 Handles 'utf8mb4' and the Load Data Infile for a similar issue.
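For this question, that means changing just the character set line of the original statement:
load data local infile 'C:\\Users\\t_lichtenberger\\Desktop\\tblEnvironmentLog.csv'
into table tblenvironmentlog
character set utf8mb4
fields terminated by ';'
lines terminated by '\n'
ignore 1 lines;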
Also see:
import geonames allCountries.txt into MySQL 5.7 using LOAD INFILE - ERROR 1300 (HY000)
Trouble with utf8 characters; what I see is not what I stored
“Incorrect string value” when trying to insert UTF-8 into MySQL via JDBC?

How to bypass invalid utf8 character string in mysql

I have a large text file containing Arabic text data. When I try to load it into a MySQL table, I get an error saying Error code 1300: invalid utf8 character string. This is what I have tried so far:
LOAD DATA INFILE '/var/lib/mysql-files/text_file.txt'
IGNORE INTO TABLE tblTest
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
I tried to ignore this error, but it did not work. I have tried LOCAL INFILE, but it did not work either. My database was created using DEFAULT CHAR SET UTF8 and DEFAULT COLLATE utf8_general_ci. The text file is utf-8 encoded.
I do not want the records which contain invalid utf8 characters. So how can I load the data while ignoring the records that contain such invalid characters?
Thanks in advance!
It would help to have the HEX of the naughty character.
A possible approach to reading all the text, then dealing with any bad characters (a sketch follows these steps):
Read into a column of type VARBINARY or BLOB.
Loop through the rows, trying to copy to a VARCHAR or TEXT column.
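A sketch of that two-step approach; the staging table and the column name txt are hypothetical, and the round-trip comparison in the last statement is one way to skip rows whose bytes are not valid utf8:
CREATE TABLE tblTest_stage (txt VARBINARY(1024));
LOAD DATA INFILE '/var/lib/mysql-files/text_file.txt'
INTO TABLE tblTest_stage
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
-- keep only rows that survive a binary -> utf8 -> binary round trip unchanged
INSERT INTO tblTest (txt)
SELECT CONVERT(txt USING utf8)
FROM tblTest_stage
WHERE CONVERT(CONVERT(txt USING utf8) USING binary) = txt;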
Another plan is to use utf8mb4 instead of utf8. It could be that the bad character is an Emoji or Chinese character that will work in utf8mb4, but not utf8.
Ignore errors
This may let you ignore errors:
SET @save := @@sql_mode;
SET @@sql_mode := '';   -- relax strict mode so bad values become warnings, not fatal errors
LOAD DATA ...;
SET @@sql_mode := @save;
I had this problem when trying to use MySQL 5.7.14, too.
I went back to MySQL 5.6 and the problem disappeared.

How to use unicode character as field separator in mysql LOAD DATA INFILE query

I'm trying to import a file into a MySQL database using the query below. The fields in the file are delimited by '¦' (broken vertical bar), U+00A6.
I'm executing the query from phpMyAdmin.
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY '¦'
Somehow MySQL ignores the field separator and loads all records into the first field of the table.
You can represent it as X'A6':
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY X'A6'
SET NAMES utf8; -- You were probably missing this (or the equivalent)
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY '¦'
(assuming you can type ¦ in utf8, not latin1)
The clue was the "Â" gibberish character at the end of every string: ¦ encoded as utf8 is the two bytes C2 A6, which, read as latin1, mojibake into Â¦.
The source document probably contains unwanted characters from its original encoding, as often happens with files saved by Excel. Try opening and saving it in TextEdit, or another bare-bones text editor, making sure the data is saved as UTF-8. Then:
LOAD DATA LOCAL INFILE "filepath/file.csv" INTO TABLE `table`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\¦';
Note the escape character before the broken bar. Also use UTF-8 encoding on the entire table, not just the rows.
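If the table itself still needs converting, one statement handles every text column at once; a sketch using the table name from the question:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8;   -- converts existing text columns and sets the table default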