MySQL Workbench Unicode hell under Windows(R) - mysql

I'm tired, guys... In versions after 6.2.5, Workbench (WB) no longer supports Unicode in its configuration.
Let's compare:
6.2.5:
Workplace restores just fine. Cyrillic in paths and names is OK.
8.0.2:
The same workplace can't be restored: any Cyrillic in paths or names turns into unreadable hieroglyphs.
Almost 2 years ago I filed a bug report, but there has been no reaction, no fix, no attention...
So please suggest how to enable proper Unicode support, or suggest another functionally similar tool. Fonts aren't the problem: as you can see in the second screenshot, the 2-byte UTF-8 characters were converted to 2-character representations (note the repeating "D"-like symbol).

Related

Chinese character encoding error when moving a database from AWS to Google Cloud SQL

We have a website that deals with Chinese characters and was hosted on AWS.
There I could save Chinese characters in the database without any problem.
Now we have moved to Google Cloud, and I am having trouble saving Chinese characters in the database.
They display as ä¸€åœ°å…©æª¢
I am following all the rules, like "column should be utf8_unicode_ci" and "database connection as utf8".
It is working fine on localhost.
Any idea what the problem could be?
Thanks.
If the column holds the same UTF-8-encoded data in both cases, and the code/platform that handles the data in the web page is the same (not Python 2 vs. Python 3, for example), the difference might be a locale setting: on the Google server (environment variables), in the SQL client (UTF-8 settings), or in the PHP settings.
Let's start with the SQL client:
Try running the PHP function
mysqli_character_set_name($link)
to get the connection character set ($link is your mysqli connection). If it is not a UTF-8 character set, set it with
mysqli_set_charset($link, 'utf8')
If this does not work, make sure the PHP/HTML side is right by setting the charset in the HTML meta tag to utf-8
charset=utf-8
and enforce it with
declare(encoding='utf8')
Looks like you have latin1 somewhere in the processing.
ä¸€åœ°å…©æª¢ is "Mojibake" for 一地兩檢
See "Mojibake" in "Trouble with UTF-8 characters; what I see is not what I stored"
Some Chinese characters take 4 bytes, not just 3 bytes. So, I recommend you use utf8mb4, not simply utf8.
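The two points above (latin1 somewhere in the chain, and 3-byte utf8 versus utf8mb4) can be reproduced outside the database. The following is only a sketch: the helper names garble and needs_utf8mb4 are made up for illustration, and cp1252 is used as a stand-in because MySQL's latin1 is close to Windows cp1252.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as cp1252.

    This imitates a connection or column that is treated as latin1
    while the application is actually sending UTF-8.
    """
    return text.encode("utf-8").decode("cp1252")


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane.

    Such characters take 4 bytes in UTF-8 and do not fit MySQL's
    3-byte utf8 character set.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


original = "一地兩檢"
mangled = garble(original)
print(mangled)

# Reversing the wrong step recovers the text, which is a hint that the
# stored bytes may still be intact and only the interpretation is off.
print(mangled.encode("cp1252").decode("utf-8") == original)

# U+20000 is a CJK Extension B character and needs 4 bytes in UTF-8.
print(needs_utf8mb4(original), needs_utf8mb4("\U00020000"))
```

If the round trip succeeds on your own data, the connection character set is the first thing worth checking.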

Reading CSV file with Chinese Character [One character cannot be shown]

When I open a CSV file containing Chinese characters in Microsoft Excel, TextWrangler, or Sublime Text, some Chinese words cannot be displayed properly. I have no idea why this is the case.
Specifically, the CSV file can be found at the following link: https://www.hkex.com.hk/eng/plw/csv/List_of_Current_SEHK_EP.CSV
One of the words that cannot be displayed correctly is shown here:
As you can see, a ? appears.
Using the Mac file command, as suggested by
http://osxdaily.com/2015/08/11/determine-file-type-encoding-command-line-mac-os-x/ tells me that the CSV file's encoding is utf-16le.
I am wondering what the problem is: why can't I read that specific text?
Is it related to encoding, or to my laptop's settings? Neither macOS nor Windows 10 on the Mac (via Parallels Desktop) displays the word correctly.
Thanks for the help. I really want to know why this specific text cannot be displayed properly.
The actual name of HSBC Broking Securities is:
滙豐金融證券(香港)有限公司
The first character, U+6ED9 滙, is one of the troublesome HKSCS characters: characters that weren't available in standard pre-Unicode Big-5, which were grafted on in incompatible ways later.
For a while there was an unfortunate convention of converting these characters into Private Use Area characters when converting to Unicode. This data was presumably converted back then and is now mangled, with 滙 replaced by the Private Use Area character U+E05E.
For PUA cases that you're sure are the result of the HKSCS compatibility bodge, you can convert back to proper Unicode using this table.
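To check whether a string contains such leftovers, one option is to look for code points in the Unicode "private use" category. The sketch below assumes hypothetical helper names (find_private_use, repair) and carries only the single mapping mentioned in this answer, U+E05E to U+6ED9; a real fix would need the full HKSCS table.

```python
import unicodedata

# Only the one mapping named in the answer above; the complete HKSCS
# conversion table is NOT reproduced here.
PUA_TO_STANDARD = {"\ue05e": "\u6ed9"}  # U+E05E -> 滙


def find_private_use(text):
    """Return (index, code point) pairs for private-use characters.

    unicodedata.category() reports "Co" for Private Use Area code points.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


def repair(text, table=PUA_TO_STANDARD):
    """Replace known private-use characters; leave everything else as-is."""
    return "".join(table.get(ch, ch) for ch in text)


sample = "\ue05e" + "豐金融證券"
print(find_private_use(sample))
print(repair(sample))
```

Since the file is reported as UTF-16LE, it would have to be opened with a UTF-16 encoding (for example open(path, encoding="utf-16")) before a check like this is applied.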

Pinyin tone marked symbols and MySQL

I’m creating a MySQL database storing Chinese characters with associated pīnyīn pronunciations. I’ve set up everything to work in the UTF-8 charset, so I’m having no trouble with most of the symbols I’m using. Except, strangely, for certain Latin characters with tone marks, and only when I write them into the database from $_POST using PHP.
Those are: all characters with an acute accent (á, é, í, ó, ú), except ǘ (?!); and all characters with a grave accent (à, è, ì, ò, ù), again except ǜ. When they are typed into a form and that form is submitted to the db, those characters are just cut off, as if they never existed. E.g., cháng is submitted as chng. Any other characters (with a caron, like ǎ, or a macron, like ā) are written fine, and so are actual Chinese characters.
Again, I’m using UTF-8 everywhere possible, and so far this problem has only occurred when submitting data from a form. Earlier, I ran a script to manually insert an array containing those characters into the database, and everything went fine.
Any ideas?
I think you could post the pinyin in a numbered format,
e.g. cháng as cha2ng,
and handle the posted data in the PHP script with a mapping method.
Here's one way to do it:
Convert numbered to accentuated Pinyin?
Hopefully it helps.
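The mapping idea can be sketched in a few lines. This is an illustration in Python rather than PHP, and it assumes the format used in the example above, where the tone digit directly follows the vowel that carries the mark (cha2ng). The function name numbered_to_accented is hypothetical.

```python
import re
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}


def numbered_to_accented(text: str) -> str:
    """Turn 'vowel + tone digit' into the accented vowel.

    The vowel and the combining mark are composed with NFC so the
    result is a single precomposed character where one exists.
    """
    def mark(match):
        vowel, tone = match.group(1), match.group(2)
        return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])

    return re.sub(r"([aeiouüAEIOUÜ])([1-4])", mark, text)


print(numbered_to_accented("cha2ng"))
print(numbered_to_accented("lia3ng"))
```

Because the form then submits only ASCII letters and digits, whatever strips the accented characters in transit no longer has anything to strip; the conversion happens just before the value is written.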
I got a solution!
Before:
SELECT 'liàng' = 'liǎng';
Change to:
SELECT CONVERT('liàng' USING BINARY)= CONVERT('liǎng' USING BINARY) as equal;
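The reason the first query can report the two strings as equal is presumably an accent-insensitive collation, while the binary conversion compares raw bytes. The Python analogy below is only a rough illustration of that difference, not MySQL's actual collation algorithm; strip_marks is a made-up helper.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop combining marks after NFD decomposition (a crude accent fold)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


a, b = "liàng", "liǎng"

# Accent-insensitive view: both fold to the same base letters.
print(strip_marks(a), strip_marks(b), strip_marks(a) == strip_marks(b))

# Byte-wise view: the UTF-8 encodings differ, so the strings are distinct.
print(a.encode("utf-8") == b.encode("utf-8"))
```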

UTF-8 Character Encoding in SQL

I am trying out Bullzip's Access to MySQL app on an Access DB full of special chars like é and ä. The app allows you to specify UTF-8 encoding, but in the resulting SQL file I get "Vieux CarrÃ©" instead of "Vieux Carré".
I tried opening the SQL file in UltraEdit and doing a UTF-8 conversion, but it does not resolve the issue, as I guess it is converting "Ã©" and never sees the "é"?
What is a Good™ solution for this?
The problem is in the UTF-8 to Unicode conversion into or out of Access. Access, like SQL Server, can only natively store data in ASCII format or in Unicode (UTF-16, with Unicode compression off). To ensure a given value is stored properly, you would need to convert it to Unicode on storage and convert it back to UTF-8 on retrieval. You may be able to use the StrConv function for this purpose.
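If the SQL file has already been written with the garbled text, the damage in the question looks like UTF-8 bytes that were read as a Windows single-byte code page, and in that case it can often be undone after the fact. A minimal sketch, assuming cp1252 was the wrong code page; repair_mojibake is a hypothetical helper, not part of Bullzip or Access.

```python
def repair_mojibake(text: str) -> str:
    """Undo 'UTF-8 read as cp1252' if the text fits that pattern.

    If re-encoding or decoding fails, the text was probably not
    garbled this way, so it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake("Vieux CarrÃ©"))
print(repair_mojibake("Vieux Carré"))
```

This only helps when every byte survived the wrong decoding; anything already replaced by a question mark cannot be recovered.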
I have the same problem with the Bullzip converter now, so this could still help someone.
It doesn't show the special characters correctly if my PC language is set to English. I had to switch it back to Czech (the language of the special characters); now it works and the SQL looks correct.
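One possible explanation, and it is only a guess, is that the converter reads text through the system's ANSI code page, which is cp1252 on an English system and cp1250 on a Czech one. The sketch below shows how the same byte turns into a different character under each; decode_with is a hypothetical helper.

```python
def decode_with(raw: bytes, codepage: str) -> str:
    """Decode the same bytes under a given Windows code page."""
    return raw.decode(codepage)


# 0xF8 is "ř" in cp1250 (Central European) but "ø" in cp1252 (Western).
# 0xE9 is "é" in both, which is why some characters appear unaffected.
raw = bytes([0xF8, 0xE9])
for codepage in ("cp1250", "cp1252"):
    print(codepage, decode_with(raw, codepage))
```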

Migrating MS Access data to MySQL: character encoding issues

We have an MS Access .mdb file produced, I think, by an Access 2000 database. I am trying to export a table to SQL with mdbtools, using this command:
mdb-export -S -X \\ -I orig.mdb Reviewer > Reviewer.sql
That produces the file I expect, except for one thing: some of the characters are represented as question marks. This: "He wasn’t ready" shows up like this: "He wasn?t ready", but only in some cases (primarily single/double curly quotes), where maybe the content was pasted into the DB from MS Word. Otherwise, the data look great.
I have tried various values for "export MDB_ICONV=". I've tried using iconv on the resulting file, with ISO-8859-1 in the from/to, with UTF-8 in the from/to, with WINDOWS-1250 and WINDOWS-1252 and WINDOWS-1256 in the from, in various combinations. But I haven't succeeded in getting those curly quotes back.
Frankly, based on the way the resulting file looks, I suspect the issue is either in the original .mdb file, or in mdbtools. The malformed characters are all single question marks, but it is clear that they are not malformed versions of the same thing; so (my gut says) there's not enough data in the resulting file; so (my gut says) the issue can't be fixed in the resulting file.
Has anyone run into this one before? Any tips for moving forward? FWIW, I don't have and never have had MS Access -- the file is coming from a 3rd party -- so this could be as simple as changing something on the database, and I would be very glad to hear that.
Thanks.
Looks like "smart quotes" have claimed yet another victim.
MS Word takes plain ASCII quotes and translates them to the double-byte left-quote and right-quote characters, and translates a single quote into the double-byte apostrophe character. The double-byte characters in question belong to an MS code page which is roughly compatible with UTF-16 except for the silly quote characters.
There is a Perl script called 'demoroniser.pl' which undoes all this malarkey and converts the quotes back to plain ASCII.
It's most likely due to the fact that the data in the Access file is UTF, and MDB Tools is trying to convert it to ASCII, Latin, ISO-8859-1, or some other encoding. Since these encodings don't map all the UTF characters properly, you end up with question marks. The information here may help you fix your encoding issues by getting MDB Tools to use the correct encoding.
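Both answers can be illustrated with a short sketch: a curly apostrophe has no slot in ISO-8859-1, so a lossy conversion turns it into a question mark, and a quote-flattening step gives plain ASCII instead. The function names to_latin1_lossy and flatten_quotes are hypothetical; this is not the demoroniser script itself and not how MDB Tools is implemented.

```python
# Curly single and double quotes mapped to their ASCII counterparts.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def to_latin1_lossy(text: str) -> bytes:
    """Encode to ISO-8859-1, replacing unmappable characters with '?'."""
    return text.encode("iso-8859-1", errors="replace")


def flatten_quotes(text: str) -> str:
    """Replace curly quotes with plain ASCII quotes."""
    return text.translate(SMART_TO_ASCII)


sentence = "He wasn\u2019t ready"
print(to_latin1_lossy(sentence))
print(flatten_quotes(sentence))
```

Once the question mark has been written, the original character is gone, which matches the suspicion in the question that the output file cannot be repaired and the conversion itself has to change.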