How to bypass invalid utf8 character string in MySQL

I have a large text file containing Arabic text data. When I try to load it into a MySQL table, I get an error saying Error code 1300: invalid utf8 character string. This is what I have tried so far:
LOAD DATA INFILE '/var/lib/mysql-files/text_file.txt'
IGNORE INTO TABLE tblTest
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
I tried to ignore this error, but it does not work. I tried LOCAL INFILE as well, but that did not work either. My database was created using DEFAULT CHAR SET UTF8 and DEFAULT COLLATE utf8_general_ci. The text file is utf-8 encoded.
I do not want the records which contain invalid utf8 characters. So how can I load the data while ignoring the records containing such invalid characters?
Thanks in advance!

It would help to have the HEX of the naughty character.
A possible approach: read all the text as raw bytes, then deal with any bad characters (see the sketch below):
Read into a column of type VARBINARY or BLOB.
Loop through the rows, trying to copy to a VARCHAR or TEXT column.
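A minimal sketch of that approach, assuming a one-column staging table and a hypothetical target column col1; the round-trip CONVERT() test is one common way to detect invalid bytes, not the only one:
CREATE TABLE tblStaging (
  raw_line VARBINARY(1024)   -- raw bytes; no character-set validation on load
);

LOAD DATA INFILE '/var/lib/mysql-files/text_file.txt'
INTO TABLE tblStaging
LINES TERMINATED BY '\n' (raw_line);

-- Inspect the hex of the offending bytes (rows that fail the round trip):
SELECT HEX(raw_line) FROM tblStaging
WHERE CONVERT(CONVERT(raw_line USING utf8mb4) USING binary) <> raw_line;

-- Copy only the rows whose bytes are valid utf8mb4:
INSERT INTO tblTest (col1)
SELECT CONVERT(raw_line USING utf8mb4)
FROM tblStaging
WHERE CONVERT(CONVERT(raw_line USING utf8mb4) USING binary) = raw_line;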
Another plan is to use utf8mb4 instead of utf8. It could be that the bad character is an Emoji or Chinese character that will work in utf8mb4, but not utf8.
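For example, a hypothetical conversion of the target table (the collation here mirrors the utf8_general_ci in the question); the LOAD DATA statement should then also say CHARACTER SET utf8mb4:
ALTER TABLE tblTest CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;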
Ignore errors
This may let you ignore the errors by clearing strict SQL mode for the load:
SET @save := @@sql_mode;
SET @@sql_mode := '';   -- clear strict mode so bad values become warnings, not errors
LOAD DATA ...;
SET @@sql_mode := @save;

I had this problem when trying to use MySQL 5.7.14, too.
I went back to MySQL 5.6 and the problem disappeared.

Related

What is the correct character set to use for '\xE7'

I am trying to load data from a file into a MySQL table using
load data infile 'outfile2.txt'
into table parsed_ingredients
fields terminated by ","
lines terminated by "\n";
All appears fine until I reach a line where this error gets thrown
ERROR 1366 (HY000): Incorrect string value: '\xE7ao' for column 'ingredient' at row 5875
I looked at the line and it's
ç (c-cedilla)
I have tried all sorts of combinations of encoding
utf8
latin1
utf8mb4
They are all leading to the same error. What is the correct encoding to use and what needs to be set to this encoding?
Any of cp1250, cp1256, dec8, latin1, latin2, latin5 map the single byte E7 to ç. Use latin1.
Apparently your client is using E7, so you need to tell MySQL that fact. Do that in the LOAD DATA statement with this between lines 2 and 3:
CHARACTER SET latin1
Meanwhile, the column in the table can be latin1 or utf8 or utf8mb4; it does not matter. (Or, at least, it does not matter to this Question.) The E7 will be translated to the suitable encoding for the table column. For utf8/utf8mb4, expect to find hex C3A7 in the table.
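Putting that together, the statement from the question becomes:
load data infile 'outfile2.txt'
into table parsed_ingredients
character set latin1
fields terminated by ","
lines terminated by "\n";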

Special character data migration

I have to migrate a database with billions of rows from Oracle to MySQL. The strategy I found is to create a schema, get the data out of Oracle as CSV, and load that into MySQL. I have created a CSV file with fields enclosed by quotes (") and terminated by commas (,). Now the problem is that the CSV file contains special characters which will not import into MySQL.
I am using the command:
LOAD DATA LOCAL INFILE 'C:/Users/NNCP4659/export.csv' INTO TABLE dbk_address_master
CHARACTER SET utf8 FIELDS TERMINATED BY "," ENCLOSED BY '"' LINES TERMINATED
BY "\r\n" (id, country_code, address, city_id, latitude, longitude,
@is_active, google_address, old_address, building_number, street_name, created_by)
SET is_active = CAST(@is_active AS SIGNED);
My data is like:
4113973,"CHE","167 Bernerstrasse Süd","57066","47.3943271","8.4865849",1,"Bernerstrasse Süd 167, 8048 Zürich,
Switzerland","167 Bernerstrasse Süd","Y","167","Bernerstrasse Süd","migration"
And error is:
ERROR 1300 (HY000): Invalid utf8 character string: '"167
Bernerstrasse S'
167 Bernerstrasse S looks like the truncation of 167 Bernerstrasse Süd at the first non-utf8 character.
You have specified that the incoming data is utf8 via
LOAD DATA ... CHARACTER SET utf8 ...
I conclude that the incoming file is not encoded correctly. It is probably latin1, in which case the hex would be FC. Assuming this is the case, you should switch to
LOAD DATA ... CHARACTER SET latin1 ...
It does not matter if the CHARACTER SET in the target column is not latin1; MySQL will transcode it in flight.
(Alternatively, you could change the incoming data to have utf8 (hex: C3BC), but that may be more hassle.)
Reference: "truncated" in Trouble with UTF-8 characters; what I see is not what I stored
(As for how to check the hex, or do SHOW CREATE TABLE, we need to know what OS you are using and what tools you have available.)
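From inside MySQL, one quick check after a test load is to inspect the stored bytes with HEX(), using the table and sample row from the question; ü should show as C3BC if the column is utf8/utf8mb4, or FC if it is latin1:
SELECT address, HEX(address)
FROM dbk_address_master
WHERE id = 4113973;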

MySQL Error Code: 1300. Invalid utf8 character string: '' with '\' in front of unicode character

I have a bunch of csv-files and I came across a case where, strangely enough, a '\' appears in front of unicode characters, e.g.
Tom;\Éscobar;123
and when doing a bulk insert via:
LOAD DATA LOCAL INFILE 'test.csv'
INTO TABLE TEST_TABLE
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ';'
ENCLOSED BY '"';
the error: Invalid utf8 character string: '' is thrown.
I wonder if this is a bug in MySQL LOAD DATA or if I am missing something here.
MySQL-Version 5.7.16
If you can figure out what the pattern is, ...
load the CSV into VARBINARY or BLOB column(s);
run a SQL statement, probably using the REPLACE() function, to cleanse the data (remove the backslashes);
do a suitable ALTER, probably MODIFY COLUMN ... VARCHAR(...) CHARACTER SET utf8mb4, to get it into a text format.
A sketch of these steps follows.
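A minimal sketch of those three steps, with hypothetical column names based on the Tom;\Éscobar;123 sample:
CREATE TABLE TEST_TABLE (
  first_name VARBINARY(255),
  last_name  VARBINARY(255),
  num        VARBINARY(32)
);

LOAD DATA LOCAL INFILE 'test.csv'
INTO TABLE TEST_TABLE
FIELDS TERMINATED BY ';'
ENCLOSED BY '"';

-- Remove the stray backslashes ('\\' is a single literal backslash):
UPDATE TEST_TABLE SET last_name = REPLACE(last_name, '\\', '');

-- Reinterpret the cleansed bytes as text:
ALTER TABLE TEST_TABLE
  MODIFY COLUMN last_name VARCHAR(255) CHARACTER SET utf8mb4;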

MySQL Invalid UTF8 character string when importing csv table

I want to import a .csv file into a MySQL database by:
load data local infile 'C:\\Users\\t_lichtenberger\\Desktop\\tblEnvironmentLog.csv'
into table tblenvironmentlog
character set utf8
fields terminated by ';'
lines terminated by '\n'
ignore 1 lines;
The .csv file looks like this: [screenshot in the original post]
But I am getting the following error and I cannot explain why:
Error Code: 1300. Invalid utf8 character string: 'M'
Any suggestions?
Nothing else I tried worked for me, including ensuring that my .csv was saved with UTF-8 encoding.
This worked:
When using LOAD DATA LOCAL INFILE, set CHARACTER SET latin1 instead of CHARACTER SET utf8mb4 as shown in https://dzone.com/articles/mysql-57-utf8mb4-and-the-load-data-infile
Here is a full example that worked for me:
TRUNCATE homestead_daily.answers;
SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE homestead_daily.questions;
SET FOREIGN_KEY_CHECKS = 1;
LOAD DATA LOCAL INFILE 'C:/Users/me/Desktop/questions.csv' INTO TABLE homestead_daily.questions
CHARACTER SET latin1
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(type, question, created_at, updated_at);
SELECT * FROM homestead_daily.questions;
See what the settings for the export were; outside MySQL, look for "UTF-8".
The truncated text suggests that the data was not encoded as utf8mb4. (Inside MySQL, utf8 and utf8mb4 work equally well for all European character sets, so the ü should not be a problem.)
If it was exported as "cp1252" (or any of a number of other encodings), the byte for ü would not be valid for utf8mb4, leading to truncation.
If this analysis is correct, there are two solutions:
Plan A: Export as UTF-8.
Plan B: Import as latin1. (You do not need to change the column/table definition, just the LOAD DATA.)
Just open the csv file in your text editor (like Notepad++), change the file encoding to UTF-8, then import your csv file.
It's complaining about 'M' but I think it's the M in München, and the actual problematic character is the next one, the umlaut 'ü'.
One simple way to test would be to try loading a file with just the first 2 rows & see if that works. Then add the 3rd row, try again & see if that fails.
If you can't or don't want to replace these special characters in your data, then you'll need to start investigating the character sets configured in your CSV file, database, table, columns, tools etc...
Are you using MySQL 5.7 or above? Then something simple to try would be to change to character set utf8mb4 in your load data command.
See How MySQL 5.7 Handles 'utf8mb4' and the Load Data Infile for a similar issue.
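For example, the statement from the question with that one change (assuming MySQL 5.7 or later):
load data local infile 'C:\\Users\\t_lichtenberger\\Desktop\\tblEnvironmentLog.csv'
into table tblenvironmentlog
character set utf8mb4
fields terminated by ';'
lines terminated by '\n'
ignore 1 lines;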
Also see:
import geonames allCountries.txt into MySQL 5.7 using LOAD INFILE - ERROR 1300 (HY000)
Trouble with utf8 characters; what I see is not what I stored
“Incorrect string value” when trying to insert UTF-8 into MySQL via JDBC?

How to use unicode character as field separator in mysql LOAD DATA INFILE query

I'm trying to import a file into a MySql database using query below. The fields in the file are delimited by '¦' (broken vertical bar) U+00A6.
I'm executing the query from PhpMyAdmin
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY '¦'
Somehow MySQL ignores the field separator and loads all records into the first field of the table.
You can represent it as X'A6':
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY X'A6'
SET NAMES utf8; -- You were probably missing this (or the equivalent)
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY '¦'
(assuming you can type ¦ in utf8, not latin1)
The clue was the "Â" gibberish character at the end of every string: when ¦ is mojibaked, it becomes Â¦.
The source document is probably encoded with unwanted characters, such as what Excel produces. Try opening it and saving in TextEdit, or another bare-bones text editor, making sure the data is saved as UTF-8. Then:
LOAD DATA LOCAL INFILE "filepath/file.csv" INTO TABLE `table`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\¦';
Note the escape character before the broken bar. Also use UTF-8 Unicode encoding on the entire table, not just the rows.