How to set special characters with LOAD DATA LOCAL INFILE - mysql

How do I get special characters right in SQL with a LOAD DATA LOCAL INFILE query?
I am inserting my data with LOAD DATA LOCAL INFILE, but a few special characters get replaced:
“Ee-ock-ee” and “tweet” are examples of this "sounds like" literary device
becomes
?Ee-ock-ee? and ?tweet? are examples of this "sounds like" literary device

Looks like a character set problem. Read more about the LOAD DATA LOCAL INFILE options in the MySQL docs, which say:
The server uses the character set indicated by the
character_set_database system variable to interpret the information in
the file. SET NAMES and the setting of character_set_client do not
affect interpretation of input. If the contents of the input file use
a character set that differs from the default, it is usually
preferable to specify the character set of the file by using the
CHARACTER SET clause. A character set of binary specifies “no
conversion.”
Note: It is not possible to load data files that use the ucs2, utf16, utf16le, or utf32 character set.
It may be worth double checking that the file is saved in the character set you think it is! Then continue.
-- Show all available character sets
SHOW CHARACTER SET;
LOAD DATA LOCAL INFILE 'course.txt' INTO TABLE Course
CHARACTER SET swe7
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
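If you are unsure what the server will fall back to when no CHARACTER SET clause is given, a quick check from the same session (a small sketch; the table name is taken from the example above) is:
-- Character set used to interpret the file when no CHARACTER SET clause is given
SHOW VARIABLES LIKE 'character_set_database';
-- Declared character set of the target table and its columns
SHOW CREATE TABLE Course;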

Related

Special character data migration

I have to migrate a database from Oracle to MySQL containing billions of rows of data. The strategy I found is to create a schema, export the data from Oracle as CSV, and load the data into MySQL. I have created a CSV file with fields enclosed by quotes (") and terminated by commas (,). The problem is that the CSV file contains special characters which are not imported correctly into MySQL.
I am using the command:
LOAD DATA LOCAL infile 'C:/Users/NNCP4659/export.csv' INTO TABLE dbk_address_master
CHARACTER SET utf8 fields terminated BY "," enclosed by '"' lines terminated
BY "\r\n" (id, country_code, address, city_id, latitude, longitude,
@is_active, google_address, old_address, building_number, street_name, created_by)
set is_active = cast(@is_active as signed);
My data is like:
4113973,"CHE","167 Bernerstrasse Süd","57066","47.3943271","8.4865849",1,"Bernerstrasse Süd 167, 8048 Zürich,
Switzerland","167 Bernerstrasse Süd","Y","167","Bernerstrasse Süd","migration"
And error is:
ERROR 1300 (HY000): Invalid utf8 character string: '"167
Bernerstrasse S'
167 Bernerstrasse S looks like the truncation of 167 Bernerstrasse Süd at the first non-utf8 character.
You have specified that the incoming data is utf8 via
LOAD DATA ... CHARACTER SET utf8 ...
I conclude that the incoming file is not encoded correctly. It is probably latin1, in which case the hex would be FC. Assuming this is the case, you should switch to
LOAD DATA ... CHARACTER SET latin1 ...
It does not matter if the CHARACTER SET in the target column is not latin1; MySQL will transcode it in flight.
(Alternatively, you could change the incoming data to have utf8 (hex: C3BC), but that may be more hassle.)
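Applied to the statement from the question, the change is just the CHARACTER SET clause (a sketch only, assuming the file really is latin1/cp1252 encoded):
LOAD DATA LOCAL INFILE 'C:/Users/NNCP4659/export.csv' INTO TABLE dbk_address_master
CHARACTER SET latin1 FIELDS TERMINATED BY "," ENCLOSED BY '"' LINES TERMINATED
BY "\r\n" (id, country_code, address, city_id, latitude, longitude,
@is_active, google_address, old_address, building_number, street_name, created_by)
SET is_active = CAST(@is_active AS SIGNED);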
Reference: "truncated" in Trouble with UTF-8 characters; what I see is not what I stored
(As for how to check the hex, or do SHOW CREATE TABLE, we need to know what OS you are using and what tools you have available.)
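If some rows have already been loaded, the hex and the table definition can also be checked from the mysql client itself (a sketch; the id is taken from the sample row above):
-- Compare the hex of ü against FC (latin1) or C3BC (utf8), as mentioned above
SELECT id, address, HEX(address) FROM dbk_address_master WHERE id = 4113973;
SHOW CREATE TABLE dbk_address_master;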

MySQL Invalid UTF8 character string when importing csv table

I want to import a .csv file into a MySQL database with:
load data local infile 'C:\\Users\\t_lichtenberger\\Desktop\\tblEnvironmentLog.csv'
into table tblenvironmentlog
character set utf8
fields terminated by ';'
lines terminated by '\n'
ignore 1 lines;
The .csv file looks like:
But I am getting the following error and I cannot explain why:
Error Code: 1300. Invalid utf8 character string: 'M'
Any suggestions?
Nothing else I tried worked for me, including ensuring that my .csv was saved with UTF-8 encoding.
This worked:
When using LOAD DATA LOCAL INFILE, set CHARACTER SET latin1 instead of CHARACTER SET utf8mb4 as shown in https://dzone.com/articles/mysql-57-utf8mb4-and-the-load-data-infile
Here is a full example that worked for me:
TRUNCATE homestead_daily.answers;
SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE homestead_daily.questions;
SET FOREIGN_KEY_CHECKS = 1;
LOAD DATA LOCAL INFILE 'C:/Users/me/Desktop/questions.csv' INTO TABLE homestead_daily.questions
CHARACTER SET latin1
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(type, question, created_at, updated_at);
SELECT * FROM homestead_daily.questions;
See what the settings for the export were. Look for "UTF-8".
This suggests that the "truncated text" is caused by the data not being encoded as utf8mb4. Outside MySQL, look for "UTF-8". (Inside MySQL, utf8 and utf8mb4 work equally well for all European character sets, so the ü should not be a problem.)
If it was exported as "cp1252" (or any of a number of encodings), the byte for ü would not be valid for utf8mb4, leading to truncation.
If this analysis is correct, there are two solutions:
Plan A: Export as UTF-8.
Plan B: Import as latin1. (You do not need to change the column/table definition, just the LOAD DATA.)
Just open the csv file in your text editor (like Notepad++), change the file encoding to UTF-8, and then import your csv file.
It's complaining about 'M', but I think that is the M in München, and the actual problematic character is the next one, the umlaut 'ü'.
One simple way to test would be to try loading a file with just the first 2 rows & see if that works. Then add the 3rd row, try again & see if that fails.
If you can't or don't want to replace these special characters in your data, then you'll need to start investigating the character sets configured in your CSV file, database, table, columns, tools etc...
Are you using MySQL 5.7 or above? Then something simple to try would be to change to character set utf8mb4 in your load data command.
See How MySQL 5.7 Handles 'utf8mb4' and the Load Data Infile for a similar issue.
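In other words, the only change to the statement from the question would be the character set line (a sketch; whether it helps depends on how the file was actually saved):
load data local infile 'C:\\Users\\t_lichtenberger\\Desktop\\tblEnvironmentLog.csv'
into table tblenvironmentlog
character set utf8mb4
fields terminated by ';'
lines terminated by '\n'
ignore 1 lines;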
Also see:
import geonames allCountries.txt into MySQL 5.7 using LOAD INFILE - ERROR 1300 (HY000)
Trouble with utf8 characters; what I see is not what I stored
“Incorrect string value” when trying to insert UTF-8 into MySQL via JDBC?

How to bypass invalid utf8 character string in mysql

I have a large text file containing Arabic text data. When I try to load it into a MySQL table, I get an error saying Error code 1300: invalid utf8 character string. This is what I have tried so far:
LOAD DATA INFILE '/var/lib/mysql-files/text_file.txt'
IGNORE INTO TABLE tblTest
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
I tried to ignore this error, but it does not work. I have tried LOCAL INFILE, but it did not work either. My database was created using DEFAULT CHARACTER SET utf8 and DEFAULT COLLATE utf8_general_ci. The text file is UTF-8 encoded.
I do not want the records which contain invalid utf8 characters. So how can I load the data while ignoring the records that contain such invalid characters?
Thanks in advance!
It would help to have the HEX of the naughty character.
A possible approach is to read in all the text first, then deal with any bad characters:
Read into a column of type VARBINARY or BLOB.
Loop through the rows, trying to copy to a VARCHAR or TEXT column.
Another plan is to use utf8mb4 instead of utf8. It could be that the bad character is an Emoji or Chinese character that will work in utf8mb4, but not utf8.
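A minimal sketch of the first plan, shown with two placeholder staging columns (the staging table, its column names and sizes, the target column list, and the round-trip validity check are all assumptions, not part of the original answer; use one VARBINARY column per tab-separated field in the real file):
-- Stage the raw bytes without any character set interpretation
CREATE TABLE tblTest_raw (col1 VARBINARY(1000), col2 VARBINARY(1000));
LOAD DATA INFILE '/var/lib/mysql-files/text_file.txt'
IGNORE INTO TABLE tblTest_raw
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
-- Keep only rows whose bytes survive a round trip through utf8mb4 unchanged,
-- a rough check for valid UTF-8; the remaining rows are the records to discard
INSERT INTO tblTest
SELECT CONVERT(col1 USING utf8mb4), CONVERT(col2 USING utf8mb4)
FROM tblTest_raw
WHERE CONVERT(CONVERT(col1 USING utf8mb4) USING binary) = col1
  AND CONVERT(CONVERT(col2 USING utf8mb4) USING binary) = col2;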
Ignore errors
This may let you ignore errors:
SET @save := @@sql_mode;
LOAD DATA ...;
SET @@sql_mode := @save;
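The snippet only saves and restores the mode; the idea, as I read it, is to clear the strict flags in between so that bad characters produce warnings instead of errors. Roughly (a sketch of that interpretation, not a tested recipe):
SET @save := @@sql_mode;
SET @@sql_mode := '';   -- or remove just STRICT_TRANS_TABLES / STRICT_ALL_TABLES
LOAD DATA INFILE '/var/lib/mysql-files/text_file.txt'
IGNORE INTO TABLE tblTest
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
SET @@sql_mode := @save;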
I had this problem when trying to use MySQL 5.7.14, too.
I went back to MySQL 5.6 and the problem disappeared.

Special characters (Cyrillic, Chinese) to MySQL database

I have a csv file which contains some rows that I want to insert into a MySQL table using the LOAD DATA INFILE MySQL command. When I use the command and the insert is done, the special characters that were inserted are all messed up. The file stores the characters correctly (I think so, because when I open the file with an editor like EditPlus, the special characters are mangled, but when opening it with another editor, like EmEditor, the special characters appear correctly). The columns which will hold text with special characters have the collation utf8_general_ci and are either varchar or text columns. The table is an InnoDB table with its collation set to utf8_general_ci. I run the LOAD DATA INFILE command from the MariaDB command line with the following parameters:
LOAD DATA INFILE '/path/to/csv/file' INTO TABLE tablename FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';
EDIT: I also tried using the SET NAMES "utf8"; command before the LOAD DATA INFILE one, with no success.
MySQL needs to know what encoding (character set) the file is saved in in order to read and interpret it correctly.
The server uses the character set indicated by the
character_set_database system variable to interpret the information
in the file. SET NAMES and the setting of character_set_client do
not affect interpretation of input. If the contents of the input file
use a character set that differs from the default, it is usually
preferable to specify the character set of the file by using the
CHARACTER SET clause. A character set of binary specifies “no
conversion.”
Figure out what encoding your file is actually saved in, or explicitly save it in a specific encoding from your text editor (the editor that does interpret the characters correctly already), then add CHARACTER SET ... into the LOAD DATA statement. See the documentation for details: http://dev.mysql.com/doc/refman/5.7/en/load-data.html
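For example, if the file turns out to be saved as UTF-8 (an assumption here; substitute whatever encoding you actually find), the statement from the question becomes:
LOAD DATA INFILE '/path/to/csv/file' INTO TABLE tablename CHARACTER SET utf8 FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';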
Probably your file is not UTF-8. In your editor, when saving, check that the character encoding of the file is UTF-8. The fact that the editor renders the characters correctly does not mean it is saved as UTF-8. The character encoding is either an option when saving the file or a file property somewhere in the menus (it depends on the editor).

How to use unicode character as field separator in mysql LOAD DATA INFILE query

I'm trying to import a file into a MySQL database using the query below. The fields in the file are delimited by '¦' (broken vertical bar, U+00A6).
I'm executing the query from phpMyAdmin.
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY '¦'
Somehow MySQL ignores the field separator and loads each whole record into the first field of the table.
You can represent it as a hex literal, X'A6':
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY X'A6'
SET NAMES utf8; -- You were probably missing this (or the equivalent)
LOAD DATA LOCAL INFILE "/path/filename.csv" INTO TABLE tablename
CHARACTER SET 'UTF8'
FIELDS TERMINATED BY '¦'
(assuming you can type ¦ in utf8, not latin1)
The clue was the "Â" gibberish character at the end of every string: ¦, when mojibaked, becomes ¦.
The source document probably contains unwanted encoding artifacts, such as what Excel produces. Try opening it and saving it in TextEdit, or another bare-bones text editor, making sure the data is saved as UTF-8. Then:
LOAD DATA LOCAL INFILE "filepath/file.csv" INTO TABLE `table`
CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\¦';
Note the escape character before the broken bar. Also use UTF-8 Unicode encoding on the entire table, not just the rows.