I upgraded my application from CakePHP 2.7.7 to CakePHP 3.1.5.
The old app (Cake 2) worked perfectly with UTF-8 encoding, but in CakePHP 3 the UTF-8 text coming from the MySQL database is not displayed correctly.
I changed the encoding in the app.php file and also changed the database encoding config.
What can be the reason for wrong encoding after updating from CakePHP 2 to 3?
If Unicode characters are coming back garbled in some cases, try changing your character set to "utf8mb4" instead.
It's important to note that on MySQL, the "utf8" character set does not actually cover the full range of Unicode characters. This is for historical reasons (specifically, UTF-8 wasn't entirely defined when MySQL implemented it).
The "utf8mb4" character set encodes the full Unicode range and, most of the time, is the one you actually want.
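If an existing schema was created with the 3-byte utf8 charset, it can be converted in place. A sketch, assuming a database named app_db and a table named posts (substitute your own names):

```sql
-- Change the default charset used for tables created later
ALTER DATABASE app_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Convert an existing table, re-encoding all of its text columns
ALTER TABLE posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

Note that CONVERT TO rewrites the table, so plan a maintenance window for large tables.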
All that said, you must look carefully at all the character set settings for your connection. PHP and MySQL have very finicky character set interactions, and if PHP isn't properly telling MySQL the character set it wants to use, things can break even if you've done all the above correctly.
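For the CakePHP 3 case specifically, the connection charset is configured via the 'encoding' key of the datasource in config/app.php (in CakePHP 2 the equivalent lived in database.php). A sketch showing only the relevant keys; the database name is a placeholder:

```php
// config/app.php (CakePHP 3) -- only the charset-related keys shown
'Datasources' => [
    'default' => [
        'className' => 'Cake\Database\Connection',
        'driver' => 'Cake\Database\Driver\Mysql',
        'database' => 'app_db',      // placeholder
        'encoding' => 'utf8mb4',     // charset used for the MySQL connection
    ],
],
```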
For more info about PHP and MySQL character sets:
http://php.net/manual/en/function.mysql-set-charset.php
This is my favourite resource on "utf8" vs "utf8mb4":
https://mathiasbynens.be/notes/mysql-utf8mb4
Dear nodejs experts and database gurus,
We face issues with storing emojis and other special characters in our MySQL database. We get an error from Prisma, the ORM we use:
Conversion from collation utf8_general_ci into utf8mb4_unicode_520_ci impossible for parameter
prisma:query INSERT INTO `app`.`show` [...]
PrismaClientUnknownRequestError: Error occurred during query execution:
ConnectorError(ConnectorError { user_facing_error: None, kind: QueryError(Server(ServerError {
code: 3988,
message: "Conversion from collation utf8_general_ci into utf8mb4_unicode_520_ci impossible for parameter",
state: "HY000" })) })
at cb (/app/node_modules/@prisma/client/runtime/index.js:38695:17) { clientVersion: '3.8.1' }
prisma:query ROLLBACK
We tried the following solutions already:
Using different collations on the database (db level, table level, field level): utf8mb4_unicode_520_ci, utf8mb4_unicode_ci. The goal, though, is to use the most modern set, utf8mb4_0900_ai_ci.
Changing it to utf8 instead of utf8mb4 resulted in an error like Incorrect string value: '\xF0\x9F\x92\x95', and is not our goal anyway, since we want to be able to store all emojis.
Dumping, changing all encoding statements in the SQL dump, and re-importing with parameters. We used a bunch of parameters in the hope they would encode it differently, such as --default-character-set=utf8mb4 or --hex-blob.
Defining a charset in our backend, since we believe the database is not the issue so much as the input we send to it, like the old PHP way we knew: ini_set("default_charset", "UTF-8"). As we understand it, Node.js does not care about the encoding/charset, but maybe there is a way.
Defining a charset at the MySQL client. We did not find a way to do so, as we use prisma.io and may not have access to the MySQL client under the hood. Prisma itself seems to support utf8mb4, as stated in the docs https://www.prisma.io/docs/concepts/components/prisma-client/case-sensitivity#database-collation-and-case-sensitivity and we found a recent discussion about defining it on a more granular level here https://github.com/prisma/prisma/discussions/4743
Maybe two more valuable insights:
The data comes from a React Native app. Before the database migration we had no issues, so we hope it is not a problem on the client side; but we can certainly specify something there as well, e.g. setting header information on the API call, which we do with axios.
Before, we used the db-as-a-service tool Airtable, which was quite a hassle to "break free" from. We exported everything in CSV format, imported it into a local MySQL db powered by XAMPP (which had different options for encoding), and then imported the dump into our new MySQL cluster.
Our goal is to:
Set the right encoding character set in our new MySQL cluster
Send the data in the right format to the DB (setting the same set in NodeJS and React)
Hope you can help us! Thanks!
\xF0\x9F\x92\x95 (💕) needs the CHARACTER SET utf8mb4, not utf8. The COLLATION is irrelevant, other than that it must be consistent (that is, utf8mb4_...). (Note the 4-byte hex, and the 4 in utf8mb4.)
There are several places to specify the charset. It sounds like you missed a place.
I don't have any notes on Prisma or React, but for node.js:
var connection = mysql.createConnection({ ... , charset : 'utf8mb4'});
See "Emoji are not inserting in database node js mysql".
That's the connection; the table also needs it. Please provide SHOW CREATE TABLE so we can check that the column has the charset specified or, if not, then the table has the appropriate default.
Yes, the COLLATION utf8mb4_0900_ai_ci is probably best for MySQL 8.0. (It is not available in older versions of MySQL nor in any version of MariaDB.)
Note: Specifying just a COLLATION causes the CHARACTER SET to be set to the beginning part of the collation.
PHP's ini_set("default_charset", "UTF-8") should be fine, but is probably not relevant to the problem.
Please provide SHOW VARIABLES LIKE 'char%'; The _client, _connection and _results should all be utf8mb4.
Character set refers to 'encoding', as in what the bits represent. Collation deals with comparisons and sorting -- "A" < "a" or not, plus many other variants on that.
I had the exact same issue. I could solve it for the current connection by running the query SET NAMES utf8mb4 - but this doesn't solve it for the connection pool Prisma uses.
I'm currently experimenting with running SET NAMES utf8mb4 via init_connect, which is applied to all new connections to MySQL. It does seem to work, but it feels like a workaround rather than a solution. I'm still not sure what causes Prisma to use the "wrong" encoding.
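For anyone trying the init_connect route described above: it can be set at runtime, but be aware that MySQL skips init_connect for accounts with the SUPER (in MySQL 8.0, CONNECTION_ADMIN) privilege, so test with the same unprivileged account the application uses:

```sql
-- Executed by the server for each new connection from a non-SUPER account
SET GLOBAL init_connect = 'SET NAMES utf8mb4';
```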
This is my environment: client -> iOS app, server -> PHP and MySQL.
The data from client to server is sent via HTTP POST.
The data from server to client is sent as JSON.
I would like to add support for emojis, or any utf8mb4 character in general, and I'm looking for the right way to deal with this in my scenario.
My questions are the following:
Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?
Should I try to work in the DB with utf8mb4, or is it safer/better/more supported to work in utf8 and encode symbols? If so, which encoding method should I use so that it works flawlessly in Objective-C and PHP (and Java, for the future Android version)?
Right now I have the DB with utf8mb4, but I get errors when trying to store a raw emoji. On the other hand, I can store non-ASCII symbols such as ¿ or á.
When I retrieve these symbols in PHP, I first need to execute SET CHARACTER SET utf8 (if I get them in utf8mb4, the json_decode function doesn't work); such symbols then come back encoded (e.g., ¿ is encoded to \u00bf).
MySQL's utf8 charset is not actually UTF-8, it's a subset of UTF-8 only supporting the basic plane (characters up to U+FFFF). Most emoji use code points higher than U+FFFF. MySQL's utf8mb4 is actual UTF-8 which can encode all those code points. Outside of MySQL there's no such thing as "utf8mb4", there's just UTF-8. So:
Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
Again, no such thing as "utf8mb4". HTTP POST requests support any raw bytes, if your client sends UTF-8 encoded data you're fine.
If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?
Yes.
Should I try to work in the DB with utf8mb4 or is it safer/better/more supported to work in utf8 and encode symbols?
God no, use raw UTF-8 (utf8mb4) for all that is holy.
When I retrieve this symbols in PHP I first need to execute SET CHARACTER SET utf8
Well, there's your problem; channeling your data through MySQL's utf8 charset will discard any characters above U+FFFF. Use utf8mb4 all the way through MySQL.
if I get them in utf8mb4 the json_decode function doesn't work
You'll have to specify what that means exactly. PHP's JSON functions should be able to handle any Unicode code point just fine, as long as it's valid UTF-8:
echo json_encode('😀');
"\ud83d\ude00"
echo json_decode('"\ud83d\ude00"');
😀
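To keep the whole PHP-to-MySQL path on utf8mb4, the cleanest place to set it is when the connection is opened. A sketch; the host, database name, and credentials are placeholders:

```php
// PDO: charset in the DSN (supported since PHP 5.3.6) removes any need for
// SET CHARACTER SET / SET NAMES queries after connecting
$pdo = new PDO(
    'mysql:host=localhost;dbname=app_db;charset=utf8mb4',  // placeholders
    'user',
    'password'
);

// mysqli equivalent:
// $mysqli->set_charset('utf8mb4');
```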
Use utf8mb4 throughout MySQL:
SET NAMES utf8mb4
Declare the table/columns CHARACTER SET utf8mb4
Emoji and certain Chinese characters will work in utf8mb4, but not in MySQL's utf8.
Use UTF-8 throughout the other layers too:
In HTML, declare the page encoding as UTF-8 (e.g. via <meta charset="utf-8">).
Characters like ¿ or á are (or at least can be) encoded in utf8 as well as utf8mb4.
Whenever I try to save ñ it becomes ? in the MySQL database. After some reading, it was suggested that I change my JSP charset to UTF-8. For some reasons I have to stick to ISO-8859-1. My database table encoding is latin1. How can I fix this? Please help.
Go to your database administration tool (MySQL Workbench, for example), set the engine to InnoDB and the collation to utf8_general_ci (character set utf8).
You state in your question that you require an ISO-8859-1 backend (latin1) and a Unicode (UTF-8) frontend. This setup is crazy, because the character set on the frontend is much larger than the one allowed in the database. The sanest thing would be to use the same encoding throughout the software stack, but using Unicode only for storage would also make sense.
As you should know, a String is a human concept for a sequence of characters. In computer programs, a String is not that: it can be viewed as a sequence of characters, but it's really a pair data structure: a stream of bytes and an encoding.
Once you understand that passing a String is really passing bytes and a scheme, let's see who sends what:
Browser to HTTP server (usually same encoding as the form page, so UTF-8. The scheme is specified via Content-Type. If missing, the server will pick one based on its own strategy, for example default to ISO-8859-1 or a configuration parameter)
HTTP Server to Java program (it's Java to Java, so the encoding doesn't matter since we pass String objects)
Java client to MySQL server (the Connector/J documentation is quite convoluted - it uses the character_set_server system variable, possibly overridden by the characterEncoding connection parameter)
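For the Java-to-MySQL hop specifically, the Connector/J charset can be pinned in the JDBC URL rather than left to server variables; the host and database name below are placeholders:

```
jdbc:mysql://localhost:3306/app_db?useUnicode=true&characterEncoding=UTF-8
```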
To understand where the problem lies, first assure that the column is really stored as latin1:
SELECT character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = :DATABASE
AND table_name = :TABLE
AND column_name = :COLUMN;
Then write the Java string you get from the request to a log file:
logger.info(request.getParameter("word"));
And finally see what actually is in the column:
SELECT HEX(:column) FROM :table
At this point you'll have enough information to understand the problem. If it's really a question mark (and not a replacement character) likely it's MySQL trying to transcode a character from a larger set (let's say Unicode) to a narrower one which doesn't contain it. The strange thing here is that ñ belongs to both ISO-8859-1 (0xF1, decimal 241) and Unicode (U+00F1), so it'd seem like there's a third charset (maybe a codepage?) involved in the round trip.
More information may help (operating system, HTTP server, MySQL version)
Change your db table content encoding to UTF-8.
Here's the command for whole-database conversion:
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
And this is for single-table conversion:
ALTER TABLE db_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
Change your table collation to utf8_spanish_ci, in which ñ is not equal to n. If you want both characters to compare as equal, use utf8_general_ci instead.
I tried several combinations, but this is what works for me:
VARCHAR(255) BINARY CHARACTER SET utf8 COLLATE utf8_bin
When the data is retrieved in dbForge Express, it shows as:
NIÑA
but in the application it shows as:
NIÑA
I had the same problem. It turned out not to be an issue with UTF-8 encoding or any particular charset. I imported my data from Windows ANSI, and all my Ñ and ñ were put in the database perfectly, as they should be. For example, last names showed in the database as last_name = "MUÑOZ". I was able to select normally from the database with the query Select * from database where last_name LIKE "%muñoz%", and phpMyAdmin showed the results fine: it selected all "MUÑOZ" and "MUNOZ" without a problem. So phpMyAdmin displays all my Ñ and ñ without any issue.
The problem was the program itself. All the characters mentioned showed up with the funky question mark, as you describe: "MU�OZ". I had followed all the advice everywhere: set my headers correctly, tried every charset available, even used Google Fonts and every other available font to display those last names correctly, but with no success.
Then I remembered an old program that was able to do the trick back and forth transparently, and peeked into its code to figure it out. The database itself, showing all my special characters, was the problem. Remember, I uploaded using Windows ANSI encoding, and phpMyAdmin did as expected and uploaded everything as instructed.
The old program fixed this problem by translating Ñ to its HTML entity &Ntilde; (see the chart here https://www.compart.com/en/unicode/U+00D1 ), a process done back and forth between MySQL and the app.
So you just need to change your database strings containing the letters Ñ and ñ to their corresponding entities to have them reflected correctly in your browser with a UTF charset.
In my case, I solved my issues by replacing all Ñ and ñ with their corresponding entities in all the last names in my database:
UPDATE table_name
SET
last_name = REPLACE(last_name,
'MUÑOZ',
'MU&Ntilde;OZ');
Now I'm able to display, browse, and even search all my last names correctly, with the accents/tildes proper to the Spanish language. I hope this helps. It was a pain to figure out, but an old program solved the problem. Best regards and happy coding!
Why won't MySQL recognize é and a lot more characters, including the em dash (—)? This is driving me nuts. I keep getting errors like Incorrect string value: '\xE9' for column
I am using MySQL 5.5.6, my tables are InnoDB, and I'm using the default utf8 collation.
I don't know if this is important, but I am doing a bulk insert from a CSV file which contains special characters, and my fields are of type TEXT.
I had a similar problem trying to SELECT ... WHERE table_col LIKE "%–%" (long dash). It turned out it wasn't working because the .php file sending the query wasn't saved as UTF-8 but as ANSI. Converting it to UTF-8 did the trick!
Your problem sounds like one I have dealt with in the past, and I concur with Synchro that the client connection settings may be where you need to look. You probably need to specify UTF8 character set when starting the connection.
I use PDO, and initiate the connection with this:
$this->dbConn = new PDO("mysql:host=$this->host;dbname=$this->dbname", $this->user, $this->pass, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
Before I started using PDO, I used this:
mysql_query("SET NAMES 'utf8'");
See http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
Just make sure the CSV file is in UTF-8 and not the default ANSI. To do this, open the CSV file in Notepad and, using the Save As option, ensure the encoding is UTF-8.
It's probably down to your PHP MySQL client's connection settings. Rob Allen's post can probably sort you out.
Rather than using a SET NAMES utf8 query, which the PHP docs explicitly warns against, there is a built-in function to do this for you in the mysqli extension: $mysqli->set_charset('utf8');.
An alternative explanation for bad characters if you're already doing this is that MySQL's utf8 charset isn't actually proper UTF-8... It only supports up to 3-byte characters and there are some increasingly common ones that use 4, specifically Emojis. Fortunately MySQL has a fix for this as of version 5.5.3: use the utf8mb4 charset instead.
On a related note, the sort order in the default utf8 charset (with the utf8_general_ci collation) has a number of problems that may affect you in, for example, German. The fix here is to use the utf8mb4_unicode_ci collation, which provides a more accurate, though slightly slower collation.
In a very busy PHP script we have a call at the beginning to SET NAMES utf8, which sets the character set in which MySQL should interpret the data it receives and send data back from the server to the client.
http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html
I want to get rid of it, so I set default-character-set=utf8 in our server ini file (see the link above).
The setting seems to be working since the relevant server parameters are :
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'latin1'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'latin1'
'character_set_system', 'utf8'
But after this change, and after commenting out the SET NAMES utf8 call, the data still comes out garbled.
Please advise....
Setting the encoding of the connection between PHP and MySQL is PHP's job; I don't think the MySQL setting will affect that.
I'd really recommend keeping some code in the application to set the connection character set to UTF-8. The application needs to ensure that the encoding is UTF-8, because it will presumably be telling web browsers its pages are UTF-8, and if those don't match you've got problems.
Since it's already part of the application's responsibility to decide the charset, you might as well make the application the one to specify the database connection character set, rather than leave it as a deployment issue and one more thing to get wrong when installing the app on a new server.
However I would personally use mysql_set_charset to do this rather than SET NAMES.
Similarly, if the application has code in it to create the schema, make sure that code tells MySQL to make the tables UTF-8, rather than leaving it up to the database's default settings and giving yourself another deployment issue to worry about.
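That schema-creation code can pin the charset explicitly instead of inheriting the server default; a sketch with a hypothetical table name:

```sql
-- The explicit DEFAULT CHARSET makes the table independent of server settings
CREATE TABLE messages (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    body TEXT CHARACTER SET utf8 COLLATE utf8_general_ci
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```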
You could also use skip-character-set-client-handshake, which makes the server ignore the charset requested by the client and use the server's own settings instead.
But I don't think that the SET NAMES query is too much work. With that option you'd be creating a problem out of nowhere and limiting your database's functionality.