mysql Incorrect string value for a column [duplicate] - mysql

This is my environment: Client -> iOS app, Server -> PHP and MySQL.
Data is sent from client to server via HTTP POST.
Data is sent from server to client as JSON.
I would like to add support for emojis or any utf8mb4 character in general. I'm looking for the right way for dealing with this under my scenario.
My questions are the following:
Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?
Should I try to work in the DB with utf8mb4 or is it safer/better/more supported to work in utf8 and encode symbols? If so, which encoding method should I use so that it works flawlessly in Objective-C and PHP (and java for the future android version)?
Right now I have the DB with utf8mb4 but I get errors when trying to store a raw emoji. On the other hand, I can store non-utf8 symbols such as ¿ or á.
When I retrieve these symbols in PHP I first need to execute SET CHARACTER SET utf8 (if I get them in utf8mb4 the json_decode function doesn't work), then such symbols are encoded (e.g., ¿ is encoded to \u00bf).

MySQL's utf8 charset is not actually UTF-8, it's a subset of UTF-8 only supporting the basic plane (characters up to U+FFFF). Most emoji use code points higher than U+FFFF. MySQL's utf8mb4 is actual UTF-8 which can encode all those code points. Outside of MySQL there's no such thing as "utf8mb4", there's just UTF-8. So:
Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
Again, no such thing as "utf8mb4". HTTP POST requests support any raw bytes, if your client sends UTF-8 encoded data you're fine.
If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?
Yes.
Should I try to work in the DB with utf8mb4 or is it safer/better/more supported to work in utf8 and encode symbols?
God no, use raw UTF-8 (utf8mb4) for all that is holy.
When I retrieve these symbols in PHP I first need to execute SET CHARACTER SET utf8
Well, there's your problem; channeling your data through MySQL's utf8 charset will discard any characters above U+FFFF. Use utf8mb4 all the way through MySQL.
if I get them in utf8mb4 the json_decode function doesn't work
You'll have to specify what that means exactly. PHP's JSON functions should be able to handle any Unicode code point just fine, as long as it's valid UTF-8:
echo json_encode('😀');
"\ud83d\ude00"
echo json_decode('"\ud83d\ude00"');
😀
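For comparison, here is a minimal sketch of the same round trip in Python, assuming the standard-library json module with its default settings. It is only meant to show that the \ud83d\ude00 form is the ordinary JSON escape (a UTF-16 surrogate pair) for a single code point above U+FFFF, not that PHP and Python behave identically in every detail.

```python
import json

emoji = "\U0001F600"  # 😀 GRINNING FACE, code point U+1F600

# With the default ensure_ascii=True, non-ASCII characters are escaped;
# a code point above U+FFFF is written as a UTF-16 surrogate pair.
encoded = json.dumps(emoji)
print(encoded)

# Decoding the escaped form restores the original single code point.
decoded = json.loads(encoded)
print(decoded == emoji)
```

If decoding fails on the PHP side, the input is more likely to be invalid UTF-8 (for example, truncated by MySQL's utf8 charset) than a limitation of JSON itself.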

Use utf8mb4 throughout MySQL:
SET NAMES utf8mb4
Declare the table/columns CHARACTER SET utf8mb4
Emoji and certain Chinese characters will work in utf8mb4, but not in MySQL's utf8.
Use UTF-8 throughout other things:
HTML: declare UTF-8 as the page's character encoding.
¿ or á are (or at least can be) encoded in utf8 (utf8mb4)
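To see why ¿ and á fit in MySQL's utf8 while emoji do not, it can help to count UTF-8 bytes per character. The sketch below is plain Python with no database involved; the helper names utf8_length and fits_mysql_utf8 are made up for illustration, and the check simply assumes that MySQL's utf8 accepts code points up to U+FFFF, as described above.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes one character needs in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every code point is at most U+FFFF (3 bytes or fewer in UTF-8).

    Hypothetical helper: it mirrors the limit of MySQL's 3-byte utf8 charset.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# ASCII, accented Latin, a BMP symbol, an emoji, and a CJK Extension B character.
for ch in ["a", "á", "¿", "★", "😀", "\U00020000"]:
    print(f"U+{ord(ch):04X} needs {utf8_length(ch)} byte(s) in UTF-8")

print(fits_mysql_utf8("¿ á ★"))
print(fits_mysql_utf8("hi 😀"))
```

Anything for which fits_mysql_utf8 returns False needs utf8mb4 on the column and on the connection.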

Related

Python3 show Unicode symbols loaded from Mysql

I have strings (English words + foreign word + emojis) stored in the Mysql DB.
The data is loaded with
charset = 'latin1'
Then I preprocess the data with
str = str.encode('latin-1').decode('utf-8')
After doing so everything looks good except for the Unicode symbols that look like \u'******'
I would appreciate any help.
Don't use encode/decode, it only adds to your woes.
Your description is not clear about the path the emoji took. Were they correctly encoded in UTF-8, but then mangled when stored into a latin1 column in the table?
Or was it something else?
See "Best practice" in Trouble with UTF-8 characters; what I see is not what I stored
If erroneously stored into latin1 column see "CHARACTER SET latin1, but have utf8 bytes in it; leave bytes alone while fixing charset" in http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases

MySQL Character Encodings for Connector/C

MySQL's character encoding mechanism is legendary in both its complexity and its opaqueness, and I have a question about how to correctly interpret string data being returned from a MySQL Connector/C query.
If my Connector/C code is set to UTF-8 (using mysql_set_character_set()), will the MySQL library (and/or server) transcode data in latin1 that's stored in the server to UTF-8 or am I still required to use mysql_fetch_field on a per-field basis to determine the character set of any string data?
Give this page a read: http://dev.mysql.com/doc/refman/5.7/en/charset-connection.html
Since mysql_set_character_set() works like SET NAMES statement, it will modify the character set that the server sends back to the client.
SET NAMES indicates what character set the client will use to send SQL statements to the server. Thus, SET NAMES 'cp1251' tells the server, “future incoming messages from this client are in character set cp1251.” It also specifies the character set that the server should use for sending results back to the client. (For example, it indicates what character set to use for column values if you use a SELECT statement.)
It is extremely improbable to have collisions between properly encoded strings in Latin1 and UTF-8. You can check proper UTF-8 encoding and assume Latin1 for badly encoded strings and convert them yourself on a case by case basis. Assuming correct configuration and encoding everywhere is risky.
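The "check for valid UTF-8, otherwise assume Latin1" idea from the last answer can be sketched in a few lines. This is Python rather than C, purely to illustrate the logic; decode_utf8_or_latin1 is a hypothetical helper name, and the fallback assumes the only other encoding in play is Latin-1.

```python
def decode_utf8_or_latin1(raw: bytes) -> str:
    """Decode bytes as UTF-8 when they are valid UTF-8, else as Latin-1.

    Hypothetical helper: it assumes the data is one of these two encodings.
    """
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never fails.
        return raw.decode("latin-1")


print(decode_utf8_or_latin1(b"caf\xc3\xa9"))  # UTF-8 bytes for "café"
print(decode_utf8_or_latin1(b"caf\xe9"))      # Latin-1 bytes for "café"
```

Pure ASCII input decodes the same way in both encodings, so the ambiguity only matters for bytes above 0x7F.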

Special characters in mysql - charset

I have a problem with the MySQL charset.
Here is how it should look.
(https://ctrlv.cz/2VV2)
And here is what I get from db
(https://ctrlv.cz/E7w6)
So which charset should I use? I tried utf8, utf8mb4...
Cheers
Mojibake. This is the classic case of:
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
Conversion:
CONVERT(CONVERT(BINARY('★') USING latin1) USING utf8) = '★'
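How Mojibake of this kind arises, and why it is reversible, can be reproduced outside MySQL. The sketch below uses Python's cp1252 codec as a stand-in for MySQL's latin1; that substitution is an assumption made for illustration, as is the choice of ★ as the sample character.

```python
original = "★"  # U+2605

# Step 1: the client holds correct UTF-8 bytes.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes)

# Step 2: those bytes get interpreted as a single-byte charset
# (cp1252 here, standing in for a latin1 connection). Three bytes
# become three unrelated characters: this is the Mojibake.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)

# Step 3: reversing the wrong interpretation restores the original bytes,
# which can then be decoded correctly as UTF-8.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The repair only works while the bytes are intact. Once a character has been replaced by a question mark or the string has been truncated, the information is gone.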

UTF8 database text not working on cakephp3

I upgraded my application from CakePHP 2.7.7 to CakePHP 3.1.5.
The old app (Cake 2) works perfectly with UTF-8 encoding. But on CakePHP 3, UTF-8 text coming from the MySQL db is not showing correctly.
I changed the encoding on app.php file and also change the db encoding config.
What can be the reason for wrong encoding after updating from CakePHP 2 to 3?
If the Unicode runes are coming back garbled in some cases, try changing your character set to "utf8mb4" instead.
It's important to note that on MySQL, the "utf8" character set does not actually encode the full UTF-8 set of Unicode runes. This is for historical reasons (specifically, UTF-8 wasn't entirely defined when MySQL implemented it).
The "utf8mb4" character set encodes the full Unicode rune set and most of the time is actually the one you want.
All that said, you must look carefully at all the character set settings for your connection. PHP and MySQL have very finicky character set interactions, and if PHP isn't properly telling MySQL the character set it wants to use, things can break even if you've done all the above correctly.
For more info about PHP and MySQL character sets:
http://php.net/manual/en/function.mysql-set-charset.php
This is my favourite resource on "utf8" vs "utf8mb4":
https://mathiasbynens.be/notes/mysql-utf8mb4

MYSQL not recognizing some special characters

Why won't MySQL recognize é and many other characters, including the em dash (—)? This is driving me nuts. I keep getting errors like Incorrect string value: '\xE9' for column
I am using MySQL 5.5.6, my tables are InnoDB, and they use utf8 with its default collation.
I don't know if this is important, but I am doing a bulk insert from a CSV file which contains special characters, and my fields are of type TEXT.
I had a similar problem with SELECT ... WHERE table_col LIKE "%–%" (long dash). It turned out it wasn't working because the .php file sending the query was saved as ANSI rather than UTF-8. Converting it to UTF-8 did the trick!
Your problem sounds like one I have dealt with in the past, and I concur with Synchro that the client connection settings may be where you need to look. You probably need to specify UTF8 character set when starting the connection.
I use PDO, and initiate the connection with this:
$this->dbConn = new PDO("mysql:host=$this->host;dbname=$this->dbname", $this->user, $this->pass, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
Before I started using PDO, I used this:
mysql_query("SET NAMES 'utf8'");
See http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
Just make sure the CSV file is in UTF-8 and not the default ANSI. To do this, open the CSV file in Notepad and, using the Save As option, make sure the encoding is set to UTF-8.
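If the file is too large for Notepad, the same conversion can be scripted. This is a rough sketch that assumes "ANSI" means Windows-1252 (cp1252), which is common on Western European systems but not guaranteed; reencode_to_utf8 is a hypothetical helper name. It also shows where the '\xE9' in the error message comes from: that single byte is é in cp1252, but it is not valid UTF-8 on its own.

```python
def reencode_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from source_encoding (assumed cp1252) to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# A cp1252-encoded row: 0xE9 is é and 0x97 is the em dash.
row = b"caf\xe9 \x97 menu\n"

try:
    row.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    # 0xE9 starts a multi-byte sequence in UTF-8 but is not followed by
    # continuation bytes here, so a UTF-8 consumer rejects it.
    valid_as_utf8 = False
print(valid_as_utf8)

converted = reencode_to_utf8(row)
print(converted)
```

For a real file, read it in binary mode, pass the contents through the helper, and write the result to a new file before running the bulk insert.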
It's probably down to your PHP MySQL client's connection settings. Rob Allen's post can probably sort you out.
Rather than using a SET NAMES utf8 query, which the PHP docs explicitly warn against, there is a built-in function to do this for you in the mysqli extension: $mysqli->set_charset('utf8');.
An alternative explanation for bad characters if you're already doing this is that MySQL's utf8 charset isn't actually proper UTF-8... It only supports up to 3-byte characters and there are some increasingly common ones that use 4, specifically Emojis. Fortunately MySQL has a fix for this as of version 5.5.3: use the utf8mb4 charset instead.
On a related note, the sort order in the default utf8 charset (with the utf8_general_ci collation) has a number of problems that may affect you in, for example, German. The fix here is to use the utf8mb4_unicode_ci collation, which provides a more accurate, though slightly slower collation.