Rendering Chinese/UTF8 characters in MySQL Select using PuTTY & commandline client

Is there any good, straightforward way to connect to a MySQL database using MySQL's normal commandline client while connected using PuTTY and get it to render UTF8 fields that include non-Western characters properly?
My tables use UTF-8 encoding, and in normal use the values come from an Android app and are displayed by an Android app. The problem is that occasionally, Things Go Wrong, and when they do, it's almost impossible for me to figure out what's going on, because MySQL's command-line client forcibly casts UTF-8 values to (what appears to be) ISO-8859-1 (i.e., quasi-random gibberish when shown on the screen). For what it's worth, Toad for MySQL (both the free and beta versions) seems to mangle UTF-8 output the same way.
On a semi-related note, my favorite monospaced font is Andale Mono (yeah, I really like the forcibly disambiguated 0/O and 1/l characters). I'm pretty sure it doesn't include CJK characters. Is there any (free) utility that can rip the lower 127 or 256 characters from one TrueType font (like Andale Mono) and create a new TrueType font based on some CJK TrueType font, with its lower 127 or 256 characters replaced by the glyph data ripped from Andale Mono?

First, make sure that your terminal encoding is set to UTF-8.
In PuTTY, set the "Remote character set" dropdown under "Window" > "Translation" to UTF-8.
Second, MySQL distinguishes between the data charset and the connection charset.
When your data is UTF-8 encoded but your connection charset is set to e.g. "ISO-8859-1", MySQL will automatically convert the output.
The easiest way to set the charsets permanently is to add the following to your client my.cnf:
[client]
default-character-set=utf8
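If the server's own defaults are still latin1, you can also pin the server-side defaults in the same file. A sketch of the corresponding server section (this only affects newly created databases and tables; existing columns keep their charset):
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci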
Detailed information about the connection charset can be found here:
http://dev.mysql.com/doc/refman/5.5/en/charset-connection.html
When using the MySQL API functions (e.g. from a PHP client) you can set the connection charset by sending the query
SET NAMES utf8
Various implementations of the MySQL API also support setting the charset directly.
e.g. http://www.php.net/manual/en/mysqli.set-charset.php
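If the output still looks wrong after that, it helps to check what the server thinks the session is using and to compare the stored bytes with what the terminal renders. A small diagnostic sketch (your_table and utf8_col are placeholder names):
SHOW VARIABLES LIKE 'character_set%';
-- character_set_client, character_set_connection and character_set_results should all report utf8
SELECT utf8_col, HEX(utf8_col) FROM your_table LIMIT 5;
-- e.g. a correctly stored 中 always shows up as E4B8AD here, no matter how the terminal renders it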

Related

Convert UTF8MB4 additional characters back to UTF8 counterpart

Due to problems with the driver of a specific software program (which we can't change or upgrade), we need to translate the additional characters of UTF8MB4 back to their UTF8 counterparts; for example, the bold and other styled characters that exist in MB4 must turn into regular characters.
Does anybody have a MySQL 8 query to do this, or is there a Python 3 library that we could use?
Or any other way we could do this? The driver issue cannot be solved in the software program itself.
For example: 𝗖𝗼𝗿𝗼𝗻𝗮𝘁𝗲𝘀𝘁𝗲𝗻 to Coronatesten

Chinese characters encoding error when moving database from AWS to Google Cloud SQL

We have a website which deals with Chinese characters and was hosted on AWS.
There I could save Chinese characters in the database without any problem.
Now we have moved to Google Cloud, and I am facing an issue saving Chinese characters in the database.
They display as ä¸€åœ°å…©æª¢.
I am following all the rules, like "the column should be utf8_unicode_ci" and "the database connection should be utf8".
It works fine on localhost.
Any idea what the problem could be?
Thanks.
If the column in the database holds the same UTF-8-encoded data in both cases, and the code/platform that handles the data in the web page is the same (meaning not Python 2 vs. Python 3, for example), the difference might be a locale or charset setting: either the Google server's environment variables, the SQL client's UTF-8 settings, or the PHP settings.
Let's start with the SQL client:
Try to run the PHP function
mysqli_character_set_name()
to get the current connection encoding. If it is not UTF-8, then set it with
mysqli_set_charset('utf8')
If this does not help, check the HTML side by setting the charset in the META tag to utf-8
charset=utf-8
and enforce it in PHP with
declare(encoding='utf8')
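The same check can also be done with plain queries right after connecting, which is handy for ruling out the PHP layer. A sketch (utf8mb4 rather than utf8, because some Chinese characters need 4 bytes):
SELECT @@character_set_client, @@character_set_connection, @@character_set_results;
-- if any of these report latin1, the connection (not the column) is the problem
SET NAMES utf8mb4;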
Looks like you have latin1 somewhere in the processing.
ä¸€åœ°å…©æª¢ is "Mojibake" for 一地兩檢.
See "Mojibake" in Trouble with UTF-8 characters; what I see is not what I stored
Some Chinese characters take 4 bytes, not just 3 bytes. So, I recommend you use utf8mb4, not simply utf8.
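Before changing anything, it is worth confirming whether the stored bytes are merely displayed wrongly or were actually double-encoded on the way in. A sketch (your_table and your_col are placeholder names, and the UPDATE should be tried on a copy of the data first):
SELECT your_col, HEX(your_col) FROM your_table LIMIT 5;
-- 一 stored correctly in utf8mb4 begins with E4B880;
-- C3A4C2B8E282AC... instead means the value went through latin1 and was double-encoded
-- usual repair if (and only if) the values really are double-encoded:
UPDATE your_table
SET your_col = CONVERT(BINARY CONVERT(your_col USING latin1) USING utf8mb4);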

Perl application and queries with accented characters using Postgres

It's been a decade that I have worked with Postgres and Perl.
One of my oldest still-operated applications, a dictionary of government addresses and departmental responsibilities, has issues handling query terms containing accented characters, for example "köln". In other words, whenever a query term contains an accented character (mainly umlauts), zero results are returned.
I should mention that this behavior only occurs when the application uses Postgres as the database. If I switch to MySQL 5 (same data), the same queries work correctly.
Trying to track down the cause of this problem, I have checked the following:
The Postgres database is UTF-8 (using the command show server_encoding;)
The Postgres client encoding is also UTF8 (using show client_encoding;)
If I use the Postgres monitor and execute the same SQL query as the application does, using accented characters in the query term, I get correct results
The Perl application itself handles UTF-8: the HTML header is set correctly, and the output displays correctly and is not garbled
All Perl code files, scripts, .pm package files and templates are UTF-8 encoded (I verified that with file --mime perl_file_name)
I fiddled with the database connection, setting $self->{dbh}->{pg_enable_utf8} = 1; and/or $self->{dbh}->do("SET CLIENT_ENCODING TO 'UTF8';"); and/or $self->{dbh}->do("SET NAMES 'UTF8';"); with no change
I've updated the DBD::Pg module to version 3.6.2, no change.
So I am pretty much out of ideas about what else to check or try to get Postgres fully working. As mentioned in my intro, the same application works flawlessly when using MySQL as the database.
Two years ago the application was changed to handle UTF-8 data. I did not make the changes myself, but as far as I can see in the code (compared to the code in my Git repo) it is just the HTML UTF-8 header print "Content-type: text/html; charset=utf-8\n\n"; and a few unrelated template parts. Perhaps that change is somewhere the origin of all the problems, but I don't know what specifically to adjust for Postgres.
The current Perl version is 5.22.1, using Apache/2.2.22 (Ubuntu). The vhost configuration is simple:
AddHandler cgi-script .cgi .pl
ScriptAlias /...abs-path-to-app.../cgi-bin/
<Directory "/...abs-path-to-app.../cgi-bin/">
AllowOverride None
Options +Indexes +ExecCGI +MultiViews +SymLinksIfOwnerMatch
<IfVersion < 2.4>
Allow from all
</IfVersion>
<IfVersion >= 2.4>
Require all granted
</IfVersion>
Allow from all
</Directory>
Postgres is version 9.1.24.
Edit:
Collate and Ctype are set to en_US.UTF-8, and Encoding is set to UTF-8 for the database in question.
Taking a look into the tables, all character varying columns use the pg_catalog."default" collation. Executing show lc_collate; shows the already mentioned en_US.UTF-8.
Edit2:
Setting the DBD::Pg flag pg_enable_utf8 to 0 seems to work, and I get the expected results. Using a value other than 0, for example -1 or 1, does not work. I tried that flag (once again) right after the database connect. I still have to verify this, as I do not really understand yet what is going on.
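One check that can be done entirely on the SQL side, independent of DBD::Pg, is whether the rows you expect to match really contain multi-byte data. A sketch (addresses and city are placeholder names):
-- char_length < octet_length means the stored value really contains multi-byte (UTF-8) characters
SELECT city, char_length(city), octet_length(city)
FROM addresses
WHERE city ILIKE '%k%ln%';
If that query finds the row but the query sent from Perl does not, the bytes DBD::Pg sends for the umlaut differ from what psql sends, which points back at pg_enable_utf8 and the client encoding.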

Deciphering MySQL Encoding

I'm having an issue with encoding in MySQL, and I need some help in figuring out what's going on.
First, some parameters. The default encoding of the table is utf8. The character_set_client, character_set_connection, collation_connection, and character_set_server MySQL system variables, though, are all latin1.
I ssh into my MySQL server and connect to the local server using the local command-line client. I select a record/column, and in the returned string the character comes back as ş, which is correct. ş is represented in UTF-8 by the hex bytes C5 9F.
However, the PHP app that hits the server interprets it as ÅŸ. In the MySQL command-line client, if I send the command "SET NAMES utf8", it will also now display it as ÅŸ.
If I do a SELECT ... INTO OUTFILE and use hexedit to inspect the file, I see two hex bytes that map to Å, then two hex bytes that map to Ÿ ("C3 85" for Å and "C5 B8" for Ÿ). Basically, it's taking the two byte values and encoding each of them as its own UTF-8 character.
First and foremost, it looks like the database is indeed storing things as UTF-8, but the wrong kind of UTF-8, correct? Are they going in as raw Unicode, but somehow, maybe because of the system variables, not being translated to UTF-8?
Second, how/why is the MySQL command-line client correctly interpreting ÅŸ as ş?
Finally, regarding the successful interpretation by the MySQL command line, is there a chart that shows how C3 85 C5 B8 gets converted to ş, or how ÅŸ gets converted to ş?
Thanks a bunch for any insight.
Your question is kind of confusing, so I'll explain with an example of my own:
You connect to the database without issuing SET NAMES, so the connection is set to Latin-1. That means the database expects any communication between you and it to be encoded in Latin-1.
You send the bytes C3A2 to the database, which you want to mean "â" in the UTF-8 encoding.
The database, expecting Latin-1, interprets this as the two characters "Ã¢" (C3 and A2 in the Latin-1 encoding).
The database will store these two characters internally in whatever encoding the table is set to.
You connect to the database in a different fashion, running SET NAMES UTF-8. The database now expects to talk to you in UTF-8.
You query the data stored in the database, and you receive the characters "Ã¢" encoded in UTF-8 as C383 C2A2, because you told the database to store the characters "Ã¢" and you are now querying them over a UTF-8 connection.
If you connected to the database again using Latin-1 for the connection, the database would give you the characters "Ã¢" encoded in Latin-1, which are the bytes C3 A2. If the client that you used to connect is interpreting that as Latin-1, you'll see the characters "Ã¢". If the client is interpreting that as UTF-8, you'll see the character "â".
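The same round trip can be reproduced in the mysql client without relying on the terminal at all. A sketch using a throwaway table, where the _latin1 introducer stands in for a client whose connection charset is Latin-1:
CREATE TABLE enc_demo (s VARCHAR(10)) CHARACTER SET utf8;
-- _latin1 makes MySQL read the bytes C3 A2 as the Latin-1 characters "Ã¢",
-- exactly what happens when UTF-8 bytes for "â" arrive over a Latin-1 connection
INSERT INTO enc_demo VALUES (_latin1 X'C3A2');
-- the stored value is the UTF-8 encoding of "Ã¢": C3 83 C2 A2
SELECT s, HEX(s) FROM enc_demo;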
Essentially these are the points at which something can screw up:
the database will interpret any bytes it receives as characters in whatever encoding is set for the connection and convert the encoding of these characters to match the table they're supposed to be stored in
the database will convert the encoding of any characters from the encoding they're stored in into the encoding of the connection when retrieving data
the client may or may not interpret the bytes it receives from the database into the right characters to display on screen; command-line environments in particular aren't always set up to display UTF-8 data correctly
Hope that helps.

UTF-8 Character Encoding in SQL

I am trying out Bullzip's Access to MySQL app on an Access DB full of special characters like é and ä. The app allows you to specify UTF-8 encoding, but in the resulting SQL file I get "Vieux CarrÃ©" instead of "Vieux Carré".
I tried opening the SQL file in UltraEdit and doing a UTF-8 conversion, but it does not resolve the issue; I guess the conversion operates on the "Ã©" and never sees an actual "é"?
What is a Good™ solution for this?
The problem is in the UTF-8 to Unicode conversion into or out of Access. Access, like SQL Server, can only natively store data as ASCII or as Unicode (UTF-16, with Unicode compression off). In order to ensure a given value is stored properly, you would need to convert it to Unicode on storage and convert it back to UTF-8 on retrieval. You may be able to use the StrConv function for this purpose.
I have the same problem with the Bullzip converter now, so this could still help someone.
It doesn't show the special characters right if I have my PC language set to English. I have to switch it back to Czech (the language of the special characters); then it works and the SQL looks correct.