utf8mb4 characters lost in export/import in Sequel Pro - mysql

I'm using the Encoding UTF-8 Unicode (utf8mb4) and Collation utf8mb4_unicode_520_ci for both tables and fields in my MySQL database.
When I export the database from Sequel Pro and open the exported .sql file in a text editor, my test character 𝌆 appears correctly, but when I import the file back into Sequel Pro it appears as ???? both in Sequel Pro and in my PHP/MySQL app.
In the import window I've tried Autodetect and Unicode (UTF-8) without success. Any ideas?
Also, is there any newer encoding out there that I should use instead, and is there any benefit to using utf8mb4_unicode_520_ci instead of just utf8mb4_unicode_ci?
Edit: Here's a picture of what I'm trying to do. It seems like my "odd" character is on track all the way until I try to import the .sql file back into Sequel Pro.

The COLLATION does not matter except for ordering. The CHARACTER SET does matter, since this is a 4-byte code.
Somehow CHARACTER SET utf8 got involved, in spite of what you say. See the "question marks" section of "Trouble with utf8 characters; what I see is not what I stored" for the likely causes.
Do SELECT HEX(...) ... to verify that that character was actually stored as hex F09D8C86.
Provide SHOW CREATE TABLE so we can verify that the column is utf8mb4.
And, let's see the connection parameters.
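As a cross-check for the HEX() test, here is a minimal Python sketch (not part of the original answer) that computes the UTF-8 bytes of the test character, assuming it is U+1D306. The helper name needs_utf8mb4 is made up for illustration; it flags text containing characters outside the Basic Multilingual Plane, which take 4 bytes in UTF-8 and therefore do not fit MySQL's 3-byte utf8.

```python
# The test character from the question, assumed to be U+1D306.
ch = "\U0001D306"

utf8_bytes = ch.encode("utf-8")
hex_string = utf8_bytes.hex().upper()

print(hex_string)       # F09D8C86
print(len(utf8_bytes))  # 4


def needs_utf8mb4(text: str) -> bool:
    """Hypothetical helper: True if any character lies outside the BMP,
    i.e. needs 4 bytes in UTF-8."""
    return any(ord(c) > 0xFFFF for c in text)


print(needs_utf8mb4(ch))   # True
print(needs_utf8mb4("ü"))  # False
```

If SELECT HEX(col) returns something other than F09D8C86 for that character (for example 3F, which is "?"), the damage happened on the way into the table rather than on display.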

Related

How to get mysqldump to export a database in the right encoding and collation?

I'm having a problem with the encoding when dumping a database using mysqldump.
The issue is that the generated file breaks non-ASCII characters (e.g. German and Spanish characters). The data in the DB is right, but it is exported wrong.
I have tried the following:
Using --default-character-set with utf8, utf8mb4, and latin1 (the last option because, although the tables use the utf8_general_ci collation, the database itself is set to latin1; I don't know why). Weirdly enough, the output differs in file size, but the content (especially the problematic characters) shows the same issue in all three cases, as if the option were ignored.
Importing the dumped file into a new MySQL service, but since the characters are already broken in the file, the import is also broken. For example, the dump made with the utf8mb4 option is imported into a fresh database with character encoding utf8mb4, but since the source file is wrongly encoded, it is not "transcoded back" to a correct form.
Initially I thought that it could be an issue with the version of the mysql server being different (5.7 in the source, 8.0 in the destination server), but since the file seems to be already broken, I now think that this might not be the root-cause. Still lost, so I prefer to mention it just in case it helps.
An example of the command I'm running:
mysqldump --default-character-set=utf8mb4 --no-tablespaces -u database_user -p database_name > /home/username/database_name-utf8mb4-20220712.sql
No errors appear during either the export or the import on the new server. Everything seems to run smoothly, but the character encoding is messed up, so something isn't OK.
Any support is much appreciated. Thank you!
"but the character encoding is messed up"
Give us an example. Include a hex dump of a small portion of the file where garbage shows up.
It is likely that the original data was either in character set utf8 or latin1, but the dumping and/or reloading specified the wrong character set. Please provide more details of the dump and load.
Also see: Trouble with UTF-8 characters; what I see is not what I stored
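To make the hex dump request concrete, here is a small Python sketch (an illustration, not from the original answer) of what the bytes for "ü" look like in the three usual cases: correct UTF-8, single-byte latin1, and "double encoded" text where UTF-8 bytes were misread as cp1252 (which is what MySQL calls latin1) and then encoded to UTF-8 again. The helper hex_of is a made-up name.

```python
def hex_of(text: str, encoding: str) -> str:
    """Hypothetical helper: hex of `text` encoded with `encoding`."""
    return text.encode(encoding).hex()


original = "ü"

correct_utf8 = hex_of(original, "utf-8")    # what a healthy utf8 dump contains
single_byte = hex_of(original, "latin-1")   # what a latin1 dump contains

# Double encoding: the UTF-8 bytes are misread as cp1252, giving mojibake,
# and that mojibake is then encoded as UTF-8 a second time.
mojibake = original.encode("utf-8").decode("cp1252")
double_encoded = hex_of(mojibake, "utf-8")

print(correct_utf8)    # c3bc
print(single_byte)     # fc
print(mojibake)        # Ã¼
print(double_encoded)  # c383c2bc
```

Comparing the bytes in the dump against these patterns usually tells which step applied the wrong character set.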

Issue when deploying mysql db (utf8mb4_unicode_520_ci -> utf8mb4_unicode_ci)

I started working on a WordPress site on my dev machine. The MySQL version is 5.6 and WordPress is 4.7, so it already uses the utf8mb4_unicode_520_ci collation if it detects that it is available.
My problem is that on my hosting (MySQL 5.5) utf8mb4_unicode_520_ci is not recognized as a valid collation. So I'm trying to target utf8mb4_unicode_ci, since my hosting knows about that one and, if I understand correctly, this would (as opposed to going to utf8) allow me to keep the 4 bytes.
I tried several different combinations of encoding and collation for the db, but nothing worked (from here: How to convert an entire MySQL database characterset and collation to UTF-8?).
I tried several combinations of encoding and collation in wp-config, but nothing worked.
Everything that comes from the database (like post titles and post contents) displays badly encoded characters for all diacritics; anything else is displayed correctly.
Menu labels from the database display incorrectly, whereas the hardcoded/translated labels display correctly.
I think I need to convert the actual content of the database; changing the charset and collation does not seem to be enough.
I found this but it does not address my problem directly, or if it does I missed it.
Any help would be appreciated
--------------------------------
UPDATE :
here is the precise procedure I went through:
Initial situation:
I installed a wordpress (4.6.1) locally (on my dev machine, mysql 5.6.28).
I worked on the theme and plugin locally
(at this moment I have, locally, a database that is utf8_general_ci and tables that are utf8mb4_unicode_520_ci)
Problem:
I want to deploy my wordpress on my hosting (mysql: 5.5 - db collation seems to be utf8mb4_unicode_ci).
I mysqldump the db locally, then try to import it on my hostings' phpmyadmin.
This gives error :
Unknown collation: 'utf8mb4_unicode_520_ci'
Solution 1: change the tables' collation to utf8mb4_unicode_ci:
On my hosting sql server, utf8mb4_unicode_520_ci is not available and I can't get a more recent version of mysql.
utf8mb4_unicode_ci seems like the closest and is available on my hosting sql server.
From various SO questions, I adapted a bash script to change the charset and collation of my tables:
for tbl in wp_sij2017_commentmeta wp_sij2017_comments wp_sij2017_cwa wp_sij2017_links wp_sij2017_options wp_sij2017_postmeta wp_sij2017_posts wp_sij2017_term_relationships wp_sij2017_term_taxonomy wp_sij2017_termmeta wp_sij2017_terms wp_sij2017_usermeta wp_sij2017_users wp_sij2017_woocommerce_api_keys wp_sij2017_woocommerce_attribute_taxonomies wp_sij2017_woocommerce_downloadable_product_permissions wp_sij2017_woocommerce_order_itemmeta wp_sij2017_woocommerce_order_items wp_sij2017_woocommerce_payment_tokenmeta wp_sij2017_woocommerce_payment_tokens wp_sij2017_woocommerce_sessions wp_sij2017_woocommerce_shipping_zone_locations wp_sij2017_woocommerce_shipping_zone_methods wp_sij2017_woocommerce_shipping_zones wp_sij2017_woocommerce_tax_rate_locations wp_sij2017_woocommerce_tax_rates; do
mysql --execute="ALTER TABLE wp_sij_2017_original_copy.${tbl} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
done
I run this script on the local db
I now have all my tables set to collation utf8mb4_unicode_ci
My db collation is still utf8
I mysqldump the db, then import it to my hosting and...
Import is successful.
I search and replace siteurl in the db.
I then visit the online website, and SOME diacritics render as a "question mark char"
Any text coming from the db has decoding issue AT SOME POINT
The source/html markup also has those "question mark char"
I have no idea where to look or what to do next
Clarification: CHARACTER SETs utf8 and utf8mb4 specify how characters are encoded into bytes. COLLATIONs *_unicode_*, etc, specify how those characters compare.
The encoding for utf8mb4_unicode_ci and utf8mb4_unicode_520_ci is the same because both belong to the character set utf8mb4.
"database that is utf8_general_ci and tables that are utf8mb4_unicode_520_ci" -- that probably means that new tables in that database, unless specifically stated, will be CHARACTER SET utf8 COLLATION utf8_general_ci. That is, the database setting is just a default for CREATE TABLE. Since your tables are already CHARACTER SET utf8mb4 COLLATION utf8mb4_unicode_520_ci, the database default is not relevant to them.
As long as the CHARACTER SET stays utf8mb4, no Emoji, Chinese, etc will be lost or otherwise mangled.
Do not use mysql40; it did not know about any CHARACTER SETs. Do not use CONVERT or CAST. Etc.
I assume the 520 is coming from the output of mysqldump? Do you have an editor that can handle a file that big? If so, simply edit it to change utf8mb4_unicode_520_ci to utf8mb4_unicode_ci throughout. Then load the dump. Problem solved?
Your fix
You did ALTER ... CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci on your local machine. That is probably an even better way -- since it will put your dev and prod machine in line with each other. That should have worked. Don't worry about what the "database" claims.
I find 'utf8mb4_unicode_520_ci' and replace it with 'utf8mb4_unicode_ci' in the .sql file.
It's the simplest way to solve this.
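If the dump is too large for an editor, the same find-and-replace can be done by streaming the file. This is only a sketch under assumptions: the function name rewrite_collation and the file names in the usage comment are made up, and it replaces every occurrence of the string, including any that happen to appear inside row data. It works on raw bytes, so the data itself is never decoded or re-encoded.

```python
OLD = b"utf8mb4_unicode_520_ci"
NEW = b"utf8mb4_unicode_ci"


def rewrite_collation(src_path: str, dst_path: str) -> int:
    """Copy a dump from src_path to dst_path, swapping the collation name.

    Hypothetical helper; returns the number of occurrences replaced.
    Works in binary mode so multi-byte characters pass through untouched.
    """
    replaced = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # one line at a time, so the whole dump is never in memory
            replaced += line.count(OLD)
            dst.write(line.replace(OLD, NEW))
    return replaced


# Usage (file names are placeholders):
# n = rewrite_collation("dump.sql", "dump-fixed.sql")
# print(n, "occurrences replaced")
```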

incorrect output 4 bytes symbols in mysql table with utf8mb4 encode

I want to insert a 4-byte character into a table via phpMyAdmin (phpMyAdmin version is 5.5.33).
I set the server connection collation to utf8mb4_general_ci;
The database uses utf8mb4 encoding;
The table and column use utf8mb4 encoding;
I tried to insert the 𩸽 symbol and it succeeded without any errors! But the symbol is displayed in the table as ????.
Can someone help, please?
So I would recommend that you check the encoding used by the web application, because your problem is not the data itself but the program that is printing it. If your PHP administration tool or the web container (most probably Apache) hosting this application doesn't support your character encoding, you won't see your character. Most of these applications use just UTF8 as the encoding, so I suggest you change your database to just UTF8 and the collation to utf8_general_ci.
Your question is most probably related to this one: How to display UTF-8 characters in phpMyAdmin?

Problematic Turkish Characters in phpMyAdmin at Media Temple

I am using media temple and I create my tables like this using a PHP file (encoded in UTF-8 without BOM):
CREATE TABLE table_name (
...
) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_unicode_ci
I have two situations:
1 - I inserted some rows into the table via PHP code. Turkish characters are displayed weirdly in phpMyAdmin; however, when I print them in the browser, they look correct.
2 - I added some data with Turkish characters to the table via the phpMyAdmin SQL Query tab. This time I see the correct characters in phpMyAdmin; however, when I print the table rows in the browser, I get question marks instead of Turkish characters.
My browser uses UTF-8 as its character encoding. I tried "utf8_turkish_ci" as the collation for the table, but it had no effect. I changed the phpMyAdmin language to Turkish, but that didn't work either. When I export the database from Media Temple, all Turkish characters are replaced with weird ones. Am I missing something?
I solved my problem by using mysqli. I was stuck in a paradox while using the mysql extension in PHP: when I got correct characters in phpMyAdmin, I got wrong ones in the browser, or vice versa. I just converted the code, nothing more (connect, set names to utf8 and do whatever you want). Every operation looks fine now. I don't know why, but changing the extension solved the problem.

Transfer old 3.23.49 MySQL database to 5.0.51 MySQL database - Encoding in ANSI and UTF-8

I want to transfer a 3.23.49 MySQL database to a 5.0.51 MySQL database. I have exported the SQL file and I'm ready for the import. I looked at the SQL file and Notepad++ shows me that the file is encoded in ANSI. I looked at the values, and some of them are in ANSI and some of them are in UTF-8. What is the best way to proceed?
1. Should I change the encoding within Notepad++?
2. Should I use ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8;?
3. Should I use iconv?
4. Do I have to look through each table and make the necessary changes?
5. What are the settings for the import? MYSQL323 compatibility mode and encoding latin1?
6. Do I have to be aware of something if the PHP scripts are using another encoding?
Thank you for your hints!
If the problem is to import a utf8-encoded mysql dump, the solution is usually to add --default-character-set=utf8 to mysql options:
mysql --default-character-set=utf8 -Ddbname -uuser -p < dump.sql
UPD1: In case the dump file is corrupted, I would try to export the database once again table by table so that the dump would result in a correct utf8 encoded file.
I have converted a MySQL 4.0 database (which also had no notion of character encoding yet) to MySQL 5.0 four years ago, so BTDT.
But first of all, there is no "ANSI" character encoding; that is a misconception and a misnomer that has caught on from the early versions of Windows (there are ANSI escape sequences, but they have nothing to do with character encoding). You are most certainly looking at Windows‑1252-encoded text. You should convert that text to UTF‑8 as then you have the best chance of keeping all used characters intact (UTF‑8 is a Unicode encoding, and Unicode contains all characters that can be encoded with Windows-125x, but at different code points).
I had used both the iconv and recode programs (on the Debian GNU/Linux system that the MySQL server ran on) to convert Windows‑1252-encoded text of a MySQL export (created by phpMyAdmin) to UTF‑8. Use whatever program or combination of programs works best for you.
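For a dump that mixes Windows-1252 and UTF-8 values, a line-by-line heuristic is one option. The Python sketch below is an illustration only, not what I ran back then: the function names are made up, and it assumes that any line which decodes cleanly as UTF-8 is already UTF-8, while everything else is Windows-1252. A Windows-1252 line that happens to be valid UTF-8 would be left alone, so spot-check the result.

```python
def line_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: return `raw` as UTF-8 bytes.

    Lines that are already valid UTF-8 (including plain ASCII) pass through
    unchanged; anything else is treated as Windows-1252 and re-encoded.
    Bytes undefined in Windows-1252 become U+FFFD instead of raising.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace").encode("utf-8")


def convert_dump(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: convert a whole dump file line by line."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(line_to_utf8(line))


# Usage (file names are placeholders):
# convert_dump("export.sql", "export-utf8.sql")
```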
As to your questions:
1. You can try, but it might not work. In particular, you might have trouble opening a large database dump with Notepad++ or another text editor.
2. Depends. ALTER TABLE … CONVERT TO … does more than just converting encodings.
3. See the paragraph above.
4. Yes. You should set the character encoding of every table and every text field that you are importing data into, to utf8 (use whatever utf8_… collation fits your purpose or data best). ALTER TABLE … CONVERT TO … does that. (But see 2.)
5. I don't think MYSQL323 matters here, as your export would contain only CREATE, INSERT and ALTER statements. But check the manual first (the "?" icon next to the setting in phpMyAdmin). latin1 means "Windows-1252" in MySQL 5.0, so that might work and you must skip the manual conversion of the import then.
6. I don't think so; PHP is not yet Unicode-aware. What matters is how the data is processed by the PHP script. Usually the Content-Type header field for your generated text resources using that data should end with ; charset=UTF-8.
On an additional note, you should not be using MySQL 5.0.x anymore. The current stable version is MySQL 5.5.18. "Per the MySQL Support Lifecycle policy, active support for MySQL 5.0 ended on December 31, 2009. MySQL 5.0 is now in the Extended support phase." MySQL 5.0.0 Alpha having been released on 2003-12-22, Extended Support is expected to end 8 full years after that, on 2011‑12‑31 (this year).