MySQL Error #1366 -- Chinese Characters Fail with big5_chinese encoding

The idea: I'm just trying to save some Chinese characters to a MySQL database.
The issue: apparently, some save while others don't. I've tried to just put em in via phpMyAdmin, but when I try to save them, they turn out to be question marks "?".
The query: UPDATE a9286500_chinese.chinese SET chinese = '贵' WHERE chinese.id =23 LIMIT 1 ;
The error: Warning: #1366 Incorrect string value: '\xE8\xB4\xB5' for column 'chinese' at row 1
The collation of the table is big5_chinese_ci.
Characters like 我 (wo) and 你 (ni) work, whereas characters like 贵 (gui) don't.
Thoughts?

That character (贵) is not encodable in Big5. If you need to handle both Simplified and Traditional Chinese, then you should use a Unicode encoding, like UTF-8.
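A quick way to check this outside MySQL is to try the encoding in Python. This is only a sketch: it assumes Python's built-in `big5` codec covers roughly the same repertoire as MySQL's big5 character set, and `can_encode` is a hypothetical helper name, not part of any library.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The three characters from the question: two that saved, one that did not.
for ch in "我你贵":
    # Show the UTF-8 bytes (as they appear in MySQL's error) and Big5 support.
    print(ch, ch.encode("utf-8"), can_encode(ch, "big5"))
```

The UTF-8 bytes printed for 贵 are the same `\xE8\xB4\xB5` that appear in the error message, which suggests the client sent valid UTF-8 and the column's character set was the limiting factor.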

Related

Invalid unicode character causing MySQL string error

I need to add a record to our MySQL database (via Omeka) that includes an invalid unicode character (this one)
The error message I get via Omeka is:
Mysqli statement execute error : Incorrect string value: '\xF0\xAA\xA8\xA7\xE7\x94...' for column 'text' at row 1
The database field is longtext with collation utf8_unicode_ci. There are already a lot of records in this table and I'm not quite sure what I should change without affecting the other data already in it. Suggestions?
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4;
Meanwhile, the text for that row in that column is probably truncated or the whole row is missing.
As best as I can tell, F0AAA8A7 is a Chinese character rather than an Emoji (both need utf8mb4). It is Unicode code point U+2AA27, which falls in the CJK Unified Ideographs Extension C range.
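The hex bytes from an "Incorrect string value" message can be decoded to see which code point MySQL rejected. A minimal sketch, assuming the hex string holds exactly one complete UTF-8 character; `describe_utf8_bytes` is a made-up helper name:

```python
def describe_utf8_bytes(hex_bytes: str) -> str:
    """Decode one UTF-8 character given as hex and report its code point."""
    ch = bytes.fromhex(hex_bytes).decode("utf-8")
    # ord() requires a single character, so this fails loudly on longer input.
    return f"U+{ord(ch):04X} ({len(ch.encode('utf-8'))} bytes in UTF-8)"


print(describe_utf8_bytes("F0AAA8A7"))
```

Any character that needs 4 bytes in UTF-8 lies above U+FFFF and will not fit in a MySQL `utf8` (3-byte) column.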

detect with Python if the string will lead to "Incorrect string value" error in MySQL

I have a table in MySQL (5.7) database, which has collation utf8_unicode_ci,
and where I'm inserting some data with Python (3.6).
With some of the strings (for example, '\xCE\xA6') I get an "Incorrect string value" error. On the DB side, I can mute this error by turning off strict mode in MySQL, or by changing the field's character set to utf8mb4.
However, such strings are "anomalies", and it is not desirable to change a collation or the sql_mode.
How can I detect in Python 3 that a given string will lead to an "Incorrect string value" error with MySQL, before inserting it into a table?
Where do you get the error message? What operation is being performed?
C3A6 is the UTF-8 (cf MySQL's utf8 or utf8mb4) hex for æ; does it seem likely that that was the desired character?
To handle utf8 (or utf8mb4), you need to determine what the client's encoding is. It sounds like UTF-8. So, when connecting to MySQL, tell it that by passing these in the connect call:
charset="utf8", use_unicode=True
If the character is in the python source, you need
# -*- coding: utf-8 -*-
at the beginning of the source.
Also the column you are inserting into needs to be CHARACTER SET utf8 (or utf8mb4).
utf8mb4 is needed for Emoji and some Chinese characters; otherwise it is 'equivalent' to utf8.
Do not use decode() or any other conversion functions; that will just make things harder to fix. In this arena, two wrongs do not make a right; they make a worse mess.
If you have other symptoms of garbled characters, see Trouble with UTF-8 characters; what I see is not what I stored
To discuss further, please provide the connection call, the SQL statement involved, SHOW CREATE TABLE, and anything else involved.
C3A6 is a valid utf8/utf8mb4 character æ, and could be interpreted as valid, though unlikely, latin1 Ã¦. But it is invalid for CHARACTER SET ascii. (I don't know how the error message occurred unless the connection said ascii or some obscure charset.)
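For the narrower case where the column is MySQL's 3-byte utf8 and the connection really is UTF-8, a pre-insert check in Python could look like the sketch below. It only catches characters above U+FFFF (the ones that need utf8mb4); it does not detect connection-charset mismatches. Both function names are hypothetical.

```python
def fits_mysql_utf8mb3(text: str) -> bool:
    """True if no character needs more than 3 bytes in UTF-8 (all <= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def offending_chars(text: str) -> list:
    """List the characters that a 3-byte utf8 column would reject."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# b"\xCE\xA6" decodes to a single 2-byte character, so it passes this check.
sample = b"\xCE\xA6".decode("utf-8")
print(sample, fits_mysql_utf8mb3(sample))
print(offending_chars("a\U0001F480b"))
```

If a string such as the decoded `\xCE\xA6` passes this check and MySQL still rejects it, the cause is more likely the connection or column character set than the data itself.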

MYSQL: Inserting Traditional & Simplified Chinese in the same 'cell'

newbie here!
I have source data that contains both simplified and traditional Chinese in the same 'cell' (sorry, newbie using Excel speak here!), which I'm trying to load into MYSQL using "Load Data Infile".
The offending text is "到达广州新冶酒吧!一杯芝華士 嘈雜的音樂 行行色色的男女". It's got both simplified Chinese ("广") and traditional Chinese ("華").
When I load it into MySQL, I get the following error:
Error Code: 1366. Incorrect string value: '\xF0\xA3\x8E\xB4\xE8\x83...' for column 'Description' at row 2
The collation of the database is UTF-8 default collation, and the input file is also UTF-8 encoded.
Is there any way I can either:
a) Make SQL accept this row of data (ideal), or
b) Get SQL to skip inserting this line of data?
Thanks! Do let me know if you need further detail.
Kevin
If 😼 was tripping it up, that's because 😼 is not in the Basic Multilingual Plane of Unicode; it's in the Supplementary Multilingual Plane, which is above U+FFFF and takes up 4 bytes in UTF-8 instead of 3. Fully conformant Unicode implementations treat them no differently, but MySQL charset utf8 doesn't accept characters above U+FFFF. If you have a recent version of MySQL, you can ALTER TABLE to use utf8mb4 which properly handles all Unicode characters. There are some catches to changing, as MySQL allocates 4 bytes per character instead of 3; see http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html for the details.
This issue is a duplicate of Inserting UTF-8 encoded string into UTF-8 encoded mysql table fails with "Incorrect string value" .

How to get rid of all strange characters that can't get into mysql from a string in vb.net

I have a problem with a strange character.
In MySQL this character 💀 causes the error Incorrect string value: '\xF0\x9F\x92\x80'
Samples of these characters are from https://foursquare.com/v/shabushi-%E0%B8%8A%E0%B8%B2%E0%B8%9A%E0%B8%8A/4b72452cf964a5203c762de3
Say I want to analyze data from the web and I find some strange character.
How do I remove this character 💀?
The occurrence is rare.
In fact, how can I know all the characters that may be problematic for MySQL and remove them? I don't mean escapable characters. I mean characters that are neither numeric, alphabetic, Chinese script, nor punctuation. Characters that are totally bizarre.
How do I get rid of those from a string?
Older versions of MySQL can't deal with characters outside the BMP; upgrade your MySQL to at least 5.5 and set the column to use the utf8mb4 charset.
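If upgrading is not an option and dropping the characters is acceptable, they can be filtered out before the insert. The sketch below is in Python rather than VB.NET, so treat it as an illustration of the idea only; `strip_non_bmp` is a hypothetical helper, and it removes every character outside the BMP, including legitimate rare CJK ideographs.

```python
import re

# Matches any code point above U+FFFF, i.e. anything outside the BMP.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def strip_non_bmp(text: str, replacement: str = "") -> str:
    """Remove (or replace) characters that need 4 bytes in UTF-8."""
    return _NON_BMP.sub(replacement, text)


print(strip_non_bmp("skull \U0001F480 here"))
print(strip_non_bmp("skull \U0001F480 here", "?"))
```

Passing a replacement such as "?" keeps a visible marker where something was removed, which can help when auditing the data later.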

How can I process data to avoid MySQL "incorrect string value" error?

I am trying to use a Rake task to migrate some legacy data from MS Access to MySQL. I'm working on Windows XP, using Ruby 1.8.6.
I have the encoding for Rails set as "utf8" in database.yml.
Also, the default character set for MySQL is utf8.
99% of the data is coming in fine, but every now and then I'll get a column value that gives me an error something like this:
Mysql::Error: Incorrect string value: '\x92 Comm...' for column 'name'
at row 1:
INSERT INTO `organizations` ( [...] )
VALUES('Lawyers’ Committee', [...] )
It looks as though the thing that's giving MySQL trouble is the apostrophe immediately after the "s" in the word "Lawyers".
Here's another one...
Mysql::Error: Incorrect string value: '\x99 aoc' for column 'department'
at row 1:
INSERT INTO `addresses`
[...]
'TRInfo™ aoc'
[....]
Looks like it's choking on the "TM" after "TRInfo".
Is there any Ruby or Rails method that I can run the data through to cleanse from it any characters that MySQL will choke on?
Ideally, it would be great to replace them with more palatable characters -- replace the apostrophe with a single quote and the TM symbol with the string "(TM)".
Or, if I could somehow configure MySQL to store those characters as-is without errors that would be great too.
It looks like your input data is not in utf-8.
I did a little investigating and the styled quote used in Lawyer's is encoded as \x92 in the Windows-1252 encoding, but would be nonsense for utf-8 (when I decoded it and encoded it into utf8, I got \xe2\x80\x99).
Thus you will need to convert the input strings from windows-1252 to utf-8 (or to unicode).
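To illustrate the conversion, here is a sketch in Python (not the asker's Ruby; the same idea applies in any language). It assumes the Access export really is Windows-1252. The replacement table and the function names are made up for the example, and the table covers only the two characters from the question.

```python
# Optional substitutions the asker mentioned; extend as needed.
REPLACEMENTS = {
    "\u2019": "'",     # right single quotation mark -> plain apostrophe
    "\u2122": "(TM)",  # trade mark sign -> "(TM)"
}


def cp1252_to_text(raw: bytes) -> str:
    """Decode Windows-1252 bytes into a Unicode string."""
    return raw.decode("cp1252")


def asciify(text: str) -> str:
    """Swap the listed typographic characters for plainer equivalents."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    return text


text = cp1252_to_text(b"Lawyers\x92 Committee")
print(text.encode("utf-8"))  # the bytes to send over a UTF-8 connection
print(asciify(cp1252_to_text(b"TRInfo\x99 aoc")))
```

Once the strings are decoded correctly, the substitution step is optional: a utf8 column stores both ’ and ™ as-is, since each takes 3 bytes in UTF-8.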
I had the same problem when putting the contents of UTF-16 encoded files, which usually store one character per 16-bit block, into MySQL tables with Java. The problem was that the UTF-16 encoded string contained so-called surrogate pairs. That means two consecutive 16-bit UTF-16 blocks encode one special character, and neither block can be translated into a corresponding UTF-8 encoding individually. See Wikipedia for further explanation.
The solution was to simply replace these characters with spaces. This is the character range you might want to strip out of your string: U+D800–U+DFFF
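For reference, a surrogate pair is UTF-16's way of writing one code point above U+FFFF, and a proper decoder turns the pair into a single 4-byte UTF-8 sequence, which a 3-byte utf8 MySQL column then rejects. The sketch below shows this in Python rather than Java, using the two UTF-16 units for the 💀 character; `utf16_pair_to_utf8` is a made-up helper name.

```python
def utf16_pair_to_utf8(high: int, low: int) -> bytes:
    """Decode one UTF-16 surrogate pair and re-encode it as UTF-8."""
    raw = high.to_bytes(2, "little") + low.to_bytes(2, "little")
    # A valid pair decodes to exactly one character; a lone or mismatched
    # surrogate makes decode() raise UnicodeDecodeError.
    return raw.decode("utf-16-le").encode("utf-8")


encoded = utf16_pair_to_utf8(0xD83D, 0xDC80)
print(encoded, len(encoded))
```

Replacing such characters with spaces avoids the error but loses data; storing them in a utf8mb4 column keeps them.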
In general, this happens when you insert strings to columns with incompatible encoding/collation.
I got this error when I had TRIGGERs, which inherit server's collation for some reason.
And mysql's default is (at least on Ubuntu) latin-1 with swedish collation.
Even though I had database and all tables set to UTF-8, I had yet to set my.cnf:
/etc/mysql/my.cnf :
[mysqld]
character-set-server=utf8
[client]
default-character-set=utf8
And this must list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
And the variables listed by this should also show utf8 (no latin1 or other encoding):
show variables like 'char%';
It looks like your old database is in one string format (utf8?) and your Rails is expecting something else. If your input is in utf8, have you tried configuring Rails to support it?
I encountered the same problem today.
After many attempts, I finally found the reason and fixed it.
By default, MySQL stores data using the latin1 character set and latin1_swedish_ci collation, so you need to specify utf8/utf8_general_ci as the character set and collation when you create your database or table.
e.g.:
$sql = "CREATE TABLE " . $table_name . " (
id mediumint(9) NOT NULL AUTO_INCREMENT,
bookname varchar(128) NOT NULL,
author varchar(64) NOT NULL,
PRIMARY KEY (id),
KEY (bookname)
)CHARACTER SET utf8 COLLATE utf8_general_ci;";
Reference:
《mysql create table problem? SOLVED!!!!!!!!!!!》
http://forums.mysql.com/read.php?121,193883,193883
《10.1.5. Configuring the Character Set and Collation for Applications》
http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html
Hoping this can help you.
Adding binary before the weirdcolumn solves the problem.
In my case, I have an update trigger on tableA to insert data into other table.
There are some special characters in column weirdcolumn, and the update failed with message: "ERROR 1366 (HY000): Incorrect string value: '\xE7....'"
After a lot of digging, I found the solution: add binary before the string column name, or use cast(weirdcolumn as binary);
Hope this can help.
I had the same issue importing data from SQL Server to MySQL using PHP.
My solution was to use utf8_encode() when inserting into MySQL and utf8_decode() when retrieving from MySQL to display in the browser.
Here is my full code, which works well.
//For string values
$Gro2=(is_null($row["GrpNm"]))?"NULL":"\"".mysql_escape_string(utf8_encode($row["GrpNm"]))."\"";
$sqlMy ="INSERT INTO `tbl_name` VALUES ($Gro2)";
Please note: For new projects use
mysqli_escape_string()