Special characters in mysql - charset - mysql

I have problem with MySQL charset.
Here is how it should look.
(https://ctrlv.cz/2VV2)
And here is what I get from db
(https://ctrlv.cz/E7w6)
So which charset I should use? I try utf8, utf8mb4...
Cheers

Mojibake. This is the classic case of
The bytes you have in the client are correctly encoded in utf8 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8.)
The column in the tables may or may not have been CHARACTER SET utf8, but it should have been that.
Conversion:
CONVERT(CONVERT(BINARY('★') USING latin1) USING utf8) = '★'

Related

mysql Incorrect string value for a column [duplicate]

This is my environment: Client -> iOS App, Server ->PHP and MySQL.
The data from client to server is done via HTTP POST.
The data from server to client is done with json.
I would like to add support for emojis or any utf8mb4 character in general. I'm looking for the right way for dealing with this under my scenario.
My questions are the following:
Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?
Should I try to work in the DB with utf8mb4 or is it safer/better/more supported to work in utf8 and encode symbols? If so, which encoding method should I use so that it works flawlessly in Objective-C and PHP (and java for the future android version)?
Right now I have the DB with utf8mb4 but I get errors when trying to store a raw emoji. On the other hand, I can store non-utf8 symbols such ¿ or á.
When I retrieve this symbols in PHP I first need to execute SET CHARACTER SET utf8 (if I get them in utf8mb4 the json_decode function doesn't work), then such symbols are encoded (e.g., ¿ is encoded to \u00bf).
MySQL's utf8 charset is not actually UTF-8, it's a subset of UTF-8 only supporting the basic plane (characters up to U+FFFF). Most emoji use code points higher than U+FFFF. MySQL's utf8mb4 is actual UTF-8 which can encode all those code points. Outside of MySQL there's no such thing as "utf8mb4", there's just UTF-8. So:
Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
Again, no such thing as "utf8mb4". HTTP POST requests support any raw bytes, if your client sends UTF-8 encoded data you're fine.
If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?
Yes.
Should I try to work in the DB with utf8mb4 or is it safer/better/more supported to work in utf8 and encode symbols?
God no, use raw UTF-8 (utf8mb4) for all that is holy.
When I retrieve this symbols in PHP I first need to execute SET CHARACTER SET utf8
Well, there's your problem; channeling your data through MySQL's utf8 charset will discard any characters above U+FFFF. Use utf8mb4 all the way through MySQL.
if I get them in utf8mb4 the json_decode function doesn't work
You'll have to specify what that means exactly. PHP's JSON functions should be able to handle any Unicode code point just fine, as long as it's valid UTF-8:
echo json_encode('😀');
"\ud83d\ude00"
echo json_decode('"\ud83d\ude00"');
😀
Use utf8mb4 throughout MySQL:
SET NAMES utf8mb4
Declare the table/columns CHARACTER SET utf8mb4
Emoji and certain Chinese characters will work in utf8mb4, but not in MySQL's utf8.
Use UTF-8 throughout other things:
HTML:
¿ or á are (or at least can be) encoded in utf8 (utf8mb4)

How to enter the Indian Rupee symbol in MySQL Server 5.1?

I am unable to find the exact solution for MySQL
The thing is the column supports by default UTF-8 encoding which consists of 3 bytes. The Indian Rupee Symbol, since it is new has a 4 byte encoding. So we have to change the character encoding to utf8_general_ci by,
ALTER TABLE test_tb MODIFY COLUMN col VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
After executing the above query simply execute the following query to insert the symbol,
insert into test_tb values("₹");
Ta-Da!!!
You are talking Oracle, yet it is tagged MySQL. Which do you want? And what language and/or client tool are you using?
Copy and paste it. Which Rupee do you like? ৲ ৳ ૱ ௹ ₨ ꠸
Probably you want this one:
UNHEX('E282A8') = '₨'
which is U+20A8 or 8360 in non-MySQL contexts
You need to have CHARACTER SET utf8 on the table/column.
You need to have done SET NAMES utf8 (or equivalent) when connecting.
Simplest way to do it is, utf8mb4 stores all the symbols
ALTER TABLE AsinBuyBox CONVERT TO CHARACTER SET utf8mb4;

Can I convert MySQL database character set from latin1 to utf8 without losing data?

I want to convert my database to store unicode symbols.
Currently the tables have:
latin_swedish_ci collation and latin1 character set
OR
utf8_general_ci collation and utf8 character set
I am not sure how the existing data is encoded, but I suppose it is utf-8 encoded, as I am using Django which I think encodes the data in utf-8 before sending to the database.
My question is:
Can I convert the tables to utf8_unicode_ci collation and utf-8 character set using the following queries without messing up the existing data? (as sugested in this post)
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Considering latin1 is subset of utf-8, I think it sould work. What do you guys think?
Thank you in advance.
P.S: The version of MySQL is: 5.1
Latin1 is not a subset of UTF-8 - ASCII is. Latin1, however, is represented in Unicode.
CONVERT TO should work, as long as the data was stored in the correct encoding in the first place. Django may have used UTF-8 on the database connection, but the database should have re-encoded on the fly.
To check the actual encoding used - Use the mysql command-line tool to execute an SQL query that selects a row that you know contains non-ASCII characters. Then use the mysql HEX() function to check the bytes used. If you see bytes greater than > 0x7f, check that they don't correspond to valid characters in https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout
If you have c396 sitting in a latin1 column, and you want it to mean Ö, then you are half way to "double encoding". Do not use CONVERT TO; that will really get you into "double encoding".
Instead, you need the 2-step ALTER.
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8 ...;
If you have already messed it up further, and now the Ö is hex C383E28093, then you need to fix double encoding.
This gets you the latin1 byte in 2 steps:
CONVERT(CONVERT(UNHEX('C383E28093') USING utf8) USING latin1) --> 'Ö' (C396)
HEX(CONVERT(CONVERT(UNHEX('C396') USING utf8) USING latin1)) --> 'Ö' in latin1 (D6)
This gets you the 2-byte utf8 encoding:
CONVERT(BINARY(CONVERT(CONVERT(UNHEX('C383E28093') USING utf8) USING latin1)) USING utf8)
Do you want the column to be latin1? Or utf8?

Unicode characters emoticons in MySQL with 4 bytes

I have to insert in Mysql Strings that may contain characters like '😂' . I tried this:
ALTER TABLE `table_name`
DEFAULT CHARACTER SET utf8mb4,
MODIFY `colname` VARCHAR(200)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;
and when I insert '😂';
INSERT INTO `table_name` (`col_name`) VALUES ('😂');
I get the following
SELECT * FROM `table_name`;
????
How can I do to get the correct value in the select statements?
Thanks a lot.
You will need to set the connection encoding to utf8mb4 as well. It depends on how you connect to MySQL how to do this. SET NAMES utf8mb4 is the API-independent SQL query to do so.
What MySQL calls utf8 is a dumbed down subset of actual UTF-8, covering only the BMP (characters 0000 through FFFF). utf8mb4 is actual UTF-8 which can encode all Unicode code points. If your connection encoding is utf8, then all data is squeezed though this subset of UTF-8 and you cannot send or receive characters above the BMP to or from MySQL.

utf8 and utf8_general_ci

I have problem inserting rows to my DB.
When a row contains characters like: 'è', 'ò', 'ò', '€', '²', '³' .... etc ... it returns an error like this (charset set to utf8):
Incorrect string value: '\xE8 pass...' for column 'descrizione' at row 1 - INSERT INTO materiali.listino (codice,costruttore,descrizione,famiglia) VALUES ('E 251-230','Abb','Relè passo passo','Relè');
But, if I set the charset to latin1 or *utf8_general_ci* it works fine, and no errors are found.
Can somebody explain me why does this happens? I always thought that utf8 was "larger" than latin1
EDIT: I also tried to use mysql_real_escape_string, but the error was always the same!!!!
mysql_real_escape_string() is not relevant, as it merely escapes string termination quotes that would otherwise enable an attacker to inject SQL.
utf8 is indeed "larger" than latin1 insofar as it is capable of representing a superset of the latter's characters. However, not every byte-sequence represents valid utf8 characters; whereas every possibly byte sequence does represent valid latin1 characters.
Therefore, if MySQL receives a byte sequence it expects to be utf8 (but which isn't), some characters could well trigger this "incorrect string value" error; whereas if it expects the bytes to be latin1 (even if they're not), they will be accepted - but incorrect data may be stored in the table.
Your problem is almost certainly that your connection character set does not match the encoding in which your application is sending its strings. Use the SET NAMES statement to change the current connection's character set, e.g. SET NAMES 'utf8' if your application is sending strings encoded as UTF-8.
Read about connection character sets for more information.
As an aside, utf8_general_ci is not a character set: it's a collation for the utf8 character set. The manual explains:
A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set.
According to the doc for UTF-8, the default collation is utf8_general_ci.
If you want a specific order in your alphabet that is not the general_ci one, you should pick one of the utf8_* collation that are provided for the utf8 charset, whichever match your requirements in term of ordering.
Both your table and your connection to the DB should be encoded in utf8, preferably the same collation, read more about setting connection collation.
To be completely safe you should check your table collation and make sure it's utf8_* and that your connection is too, using the complete syntax of SET NAMES
SET NAMES 'utf8' COLLATE 'utf8_general_ci'
You can find information about the different collation here
mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
Eurika, the above did it :-)