Convert UTF8MB4 additional characters back to UTF8 counterpart - mysql

Because of problems with the driver of a specific software program (which we can't change or upgrade), we need to translate the supplementary characters of UTF8MB4 back to their UTF8 counterparts. For example, the bold and other styled characters in the MB4 range must be turned into regular characters.
Does anybody have a MySQL 8 query to do this, or is there a Python 3 library that we could use?
Or is there any other way we could do this? The software program itself can't solve the driver issue.
For example: 𝗖𝗼𝗿𝗼𝗻𝗮𝘁𝗲𝘀𝘁𝗲𝗻 to Coronatesten
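One approach that avoids a MySQL query entirely, offered as a sketch: Python's standard unicodedata module can apply NFKC normalization, which maps "styled" letters such as MATHEMATICAL SANS-SERIF BOLD CAPITAL C back to their plain ASCII letters. Characters without such a mapping (emoji, for instance) remain outside the Basic Multilingual Plane, so the sketch replaces them afterwards. The function name to_bmp and its replacement parameter are hypothetical, not part of any library.

```python
import unicodedata


def to_bmp(text: str, replacement: str = "") -> str:
    """Fold styled letters to plain ones, then replace what 3-byte utf8 cannot hold.

    Hypothetical helper, not a library function.
    """
    # NFKC applies Unicode compatibility mappings, e.g.
    # U+1D5D6 MATHEMATICAL SANS-SERIF BOLD CAPITAL C -> "C".
    folded = unicodedata.normalize("NFKC", text)
    # Anything still above U+FFFF (emoji, rare CJK ideographs) needs 4 bytes
    # in UTF-8, so it is swapped for `replacement`.
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in folded)


print(to_bmp("𝗖𝗼𝗿𝗼𝗻𝗮𝘁𝗲𝘀𝘁𝗲𝗻"))            # Coronatesten
print(to_bmp("Coronatesten \U0001F600", "?"))  # Coronatesten ?
```

NFKC also folds other compatibility characters (ligatures such as "ﬁ", full-width letters, superscript digits), so it is worth checking that this is acceptable for your data before running it over a whole table and writing the results back.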

Related

Chinese characters encoding error when moving database from AWS to Google Cloud SQL

We have a website that deals with Chinese characters and was hosted on AWS.
There I could save Chinese characters in the database without any problem.
Now we have moved to Google Cloud, and I am having trouble saving Chinese characters in the database.
They are displayed as garbled characters instead of 一地兩檢.
I am following all the usual rules, such as "the column collation should be utf8_unicode_ci" and "the database connection should use utf8".
It works fine on localhost.
Any idea what the problem could be?
Thanks.
If the column in the database holds the same UTF-8-encoded data in both cases, and the code/platform that renders the data on the web page is the same (not Python 2 on one host and Python 3 on the other, for example), the difference is probably a locale/charset setting: either on the Google server (environment variables), in the SQL client (its UTF-8 settings), or in the PHP settings.
Let's start with the SQL client:
Try running the PHP function
mysqli_character_set_name($link)
to get the connection encoding ($link being the connection returned by mysqli_connect()). If it is not UTF-8, set it with
mysqli_set_charset($link, 'utf8')
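If this does not help, make sure the HTML side uses UTF-8 as well, by setting the charset in the HTML meta tag to utf-8
charset=utf-8
and enforce it with
declare(encoding='utf8')
Note that declare(encoding=...) only declares the encoding the PHP source file itself is written in; it does not change the charset of the page output.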
It looks like latin1 is being used somewhere in the processing chain.
The garbled text is "Mojibake" for 一地兩檢.
See "Mojibake" in "Trouble with UTF-8 characters; what I see is not what I stored".
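To see why latin1 is the suspect, the damage can be reproduced in a few lines of Python: take the UTF-8 bytes of 一地兩檢 and decode them as MySQL's latin1, which is effectively Windows-1252 (Python codec name cp1252). The helper name is hypothetical.

```python
def misread_as_latin1(text: str) -> str:
    """Decode the UTF-8 bytes of `text` as if they were latin1 (Windows-1252).

    Hypothetical helper for illustration. Python's cp1252 codec leaves five byte
    values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined and raises on them, whereas
    MySQL's latin1 maps them, so this only works for inputs that avoid those bytes.
    """
    return text.encode("utf-8").decode("cp1252")


print(misread_as_latin1("一地兩檢"))  # ä¸€åœ°å…©æª¢
```

Each 3-byte Chinese character turns into three Western European characters, which is the typical Mojibake pattern.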
Some Chinese characters take 4 bytes, not just 3 bytes. So, I recommend you use utf8mb4, not simply utf8.
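The byte counts behind that recommendation can be checked directly. In this sketch (sample characters and helper name chosen for illustration), 一 (U+4E00) lies in the Basic Multilingual Plane and needs 3 bytes, while 𠀀 (U+20000, CJK Unified Ideographs Extension B) needs 4 and therefore does not fit in MySQL's 3-byte utf8.

```python
def utf8_byte_counts(text: str):
    """Return (character, code point, UTF-8 length in bytes) for each character."""
    return [(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


for ch, code_point, size in utf8_byte_counts("一\U00020000"):
    print(ch, code_point, size)
# 一 U+4E00 3
# 𠀀 U+20000 4
```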

Excel CSV String Not Fully Uploading To MySQL

I have this string in Excel (I've UTF-encoded it). When I save it as CSV and import it into MySQL, I only get the text below. I know it's probably a charset issue, but could you explain why? I'm having difficulty understanding it.
In Excel Cell:
PARTY HARD PAYDAY SPECIAL â UPTO £40 OFF EVENT PACKAGES INCLUDING HOTTEST EVENTS! MUST END SUNDAY! http://bit.ly/1Gzrw9H
Ends up in DB:
PARTY HARD PAYDAY SPECIAL
The field is a VARCHAR(10000) with utf8_general_ci collation.
MySQL's utf8 does not support full Unicode. Some 4-byte characters cannot be processed (and, I guess, stored) properly in regular utf8. I am assuming that on import the value is truncated after SPECIAL because MySQL does not know how to process or store the character that follows it.
To handle full UTF-8, including 4-byte characters, you will have to switch to utf8mb4.
This is from the MySQL documentation:
The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters. The utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters...
You can read more in the MySQL documentation at dev.mysql.com.
Also, here is a great, detailed explanation of the issues with regular utf8 in MySQL and how to switch to utf8mb4.
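The truncation guessed at in the answer can be imitated in Python to see where a 3-byte utf8 column would cut the value. The character that followed SPECIAL in the original string did not survive, so the sketch assumes a hypothetical emoji (U+1F389) in its place; the function name is also hypothetical. Real MySQL behaviour depends on the SQL mode: in strict mode the row is rejected with an "Incorrect string value" error instead of being truncated.

```python
def truncate_like_utf8mb3(text: str) -> str:
    """Cut `text` just before the first character that needs 4 bytes in UTF-8.

    Hypothetical helper imitating the truncation described in the answer.
    """
    for index, ch in enumerate(text):
        if len(ch.encode("utf-8")) == 4:
            return text[:index]
    return text  # nothing outside the 3-byte range


# U+1F389 is an assumed stand-in for the character that followed SPECIAL.
sample = "PARTY HARD PAYDAY SPECIAL \U0001F389 UPTO £40 OFF EVENT PACKAGES"
print(repr(truncate_like_utf8mb3(sample)))  # 'PARTY HARD PAYDAY SPECIAL '
```

The "£" (2 bytes in UTF-8) would survive on its own; only 4-byte characters fall outside what utf8 can hold.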

How to set unicode characters to database

I am working with the Twitter API in Java and want to save search results (tweets) in a MySQL database. I have changed the table's default encoding to utf-8 and its collation to utf8_unicode_ci, and for the column that receives the Unicode values I have also set the encoding to utf-8 and the collation to utf8_unicode_ci.
But I am still getting "data truncated" for that column, and my data is not saved properly.
Please help me out.
Thanks in advance
Try setting the connection character set and collation as well, using:
SET NAMES 'charset_name' [COLLATE 'collation_name']
and
SET CHARACTER SET charset_name
This post is quite old, but I stumbled onto your question while looking into the same issue today.
Since Twitter supports emoticons (aka emoji), you will have to switch to utf8mb4 instead of utf8. In a nutshell, it turns out that MySQL's utf8 charset only partially implements UTF-8: it can only store UTF-8-encoded symbols that consist of one to three bytes, and encoded symbols that take up four bytes aren't supported!
Since astral symbols (whose code points range from U+010000 to U+10FFFF) each consist of four bytes in UTF-8, you cannot store them using MySQL’s utf8 implementation.
Here is a link to a tutorial that discusses the matter and explains in detail how to do the conversion to utf8mb4.
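The claim that the astral range U+010000 to U+10FFFF is exactly the set of code points needing four bytes can be checked exhaustively with Python's built-in UTF-8 codec. This is an illustrative check; surrogate code points (U+D800 to U+DFFF) are skipped because they cannot be encoded on their own.

```python
def utf8_len(code_point: int) -> int:
    """Number of bytes UTF-8 uses to encode one code point."""
    return len(chr(code_point).encode("utf-8"))


# Last BMP code point, first astral code point, an emoji, and the last code point.
for cp in (0xFFFF, 0x10000, 0x1F600, 0x10FFFF):
    print(f"U+{cp:04X}: {utf8_len(cp)} bytes")
# U+FFFF: 3 bytes
# U+10000: 4 bytes
# U+1F600: 4 bytes
# U+10FFFF: 4 bytes

# Every astral code point takes exactly 4 bytes ...
assert all(utf8_len(cp) == 4 for cp in range(0x10000, 0x110000))
# ... and every encodable BMP code point takes at most 3.
assert all(utf8_len(cp) <= 3 for cp in range(0x10000) if not 0xD800 <= cp <= 0xDFFF)
```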

Rendering Chinese/UTF8 characters in MySQL Select using PuTTY & commandline client

Is there any good, straightforward way to connect to a MySQL database using MySQL's normal commandline client while connected using PuTTY and get it to render UTF8 fields that include non-Western characters properly?
My tables use UTF8 encoding, and in normal use the values come from an Android app and are displayed by an Android app. The problem is that occasionally, Things Go Wrong, and when they do, it's almost impossible for me to figure out what's going on because MySQL's commandline client forcibly casts UTF8 values to (what appears to be) ISO-8859-1 (ie, quasi-random gibberish when shown on the screen). For what it's worth, Toad for MySQL (both free and beta) seem to mangle UTF8 output the same way.
On a semi-related note, my favorite monospaced font is Andale Mono (yeah, I really like the forcibly-disambiguated 0/O and 1/l characters). I'm pretty sure it doesn't include CJK characters. Is there any (free) utility that can be used to rip the lower 127 or 256 characters from one Truetype font (like Andale Mono), and create a new Truetype font based on some UTF8 CJK Truetype font that replaces the lower 127 or 256 characters with the font data ripped from Andale Mono?
First, make sure that your console encoding is set to UTF-8.
Using PuTTY, you need to set the charset dropdown in "Window" > "Translation" to UTF-8.
Second, MySQL distinguishes between the data charset and the connection charset.
When your data is UTF-8 encoded but your connection charset is set to, e.g., "ISO-8859-1", MySQL will automatically convert the output.
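That conversion can be modelled in Python to show what the client actually receives. This is a simplified sketch: the helper name is hypothetical, and it assumes that characters the connection charset cannot represent are replaced with "?", as MySQL does when converting between character sets.

```python
def to_connection_charset(stored: str, charset: str) -> bytes:
    """Hypothetical model of the server converting a stored value for the client.

    Characters the target charset cannot represent are replaced with '?'.
    """
    return stored.encode(charset, errors="replace")


value = "Vieux Carré 一地"
print(to_connection_charset(value, "utf-8").decode("utf-8"))            # Vieux Carré 一地
print(to_connection_charset(value, "iso-8859-1").decode("iso-8859-1"))  # Vieux Carré ??
```

The é survives because ISO-8859-1 contains it, but the CJK characters do not, which is why both the console and the connection charset need to be UTF-8.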
The easiest way to set the charsets permanently is to update your client my.cnf with the following:
[client]
default-character-set=utf8
You can find detailed information about the connection charset here:
http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html
When using the MySQL API functions (from a PHP client, e.g.), you can set the connection charset by sending the query
SET NAMES utf8
Various implementations of the MySQL API also support setting the charset directly, e.g. http://www.php.net/manual/en/mysqli.set-charset.php

UTF-8 Character Encoding in SQL

I am trying out Bullzip's Access to MySQL app on an Access DB full of special characters like é and ä. The app allows you to specify UTF-8 encoding, but in the resulting SQL file "Vieux Carré" comes out with the é garbled.
I tried opening the SQL file in UltraEdit and doing a UTF-8 conversion, but it does not resolve the issue; I guess it converts the already-garbled characters and never sees the original "é"?
What is a Good™ solution for this?
The problem is in the UTF-8 to Unicode conversion into or out of Access. Access, like SQL Server, can only natively store data in ASCII format or as Unicode (UTF-16, with Unicode compression off). To ensure a given value is stored properly, you would need to convert it to Unicode on storage and convert it back to UTF-8 on retrieval. You may be able to use the StrConv function for this purpose.
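Separately from the Access side, text that has already been damaged by one round of "UTF-8 bytes read as Windows-1252" can often be repaired by reversing those two steps, which is what a plain UTF-8 conversion in an editor does not do. A minimal Python sketch with a hypothetical helper name; it is a heuristic and leaves the text unchanged when the reversal is not valid.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as Windows-1252' damage, if present.

    Hypothetical helper; a heuristic, not a guaranteed fix.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible this way: already correct, or damaged differently.
        return text


garbled = "Vieux Carré".encode("utf-8").decode("cp1252")
print(garbled)                         # Vieux CarrÃ©
print(repair_mojibake(garbled))        # Vieux Carré
print(repair_mojibake("Vieux Carré"))  # Vieux Carré (left unchanged)
```

The heuristic can misfire on text that legitimately contains sequences such as "Ã©", so it is safer to run it only on values known to be damaged.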
I ran into the same problem with the Bullzip converter just now, so this could still help someone.
It doesn't show the special characters correctly if my PC language is set to English. I had to switch it back to Czech (the language of the special characters); now it works and the SQL looks correct.