MySQL Exporting Arabic/Persian Characters

I'm new to MySQL and I'm working with it through phpMyAdmin.
My problem is that I have imported some tables from a .sql file into a database with the utf8_general_ci collation, and they contain some Arabic and Persian characters. However, when I export the data to an Excel file, it appears like this:
The original value: أحمد الكمالي
The exported value: أحمد  الكمالي
I have searched around and tried to solve this by giving the output and the server connection the same collation, utf8_general_ci. But for some reason phpMyAdmin doesn't let me pick that one; it forces me to choose utf8mb4_general_ci.
Anyway, when I export the data I make sure the format is UTF-8, but it still comes out like that.
How can I fix this?
Note: here are some screenshots, organized by number, if you want to check:
http://www.megafileupload.com/rbt5/Screenshots.rar

I found an easier way: rebuild the Excel file with the correct characters.
Export your data from MySQL normally, in CSV format.
Open a new Excel workbook and go to the Data tab.
Select "From Text". If you can't find it, it is under "Get External Data".
Select your file.
Change the file origin to Unicode (UTF-8) and select Next ("Delimited" is checked by default).
Select the comma delimiter and press Finish.
You will see your language's characters correctly.
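If you'd rather not run the Text Import Wizard every time, another option is to write the CSV yourself with a UTF-8 byte-order mark, which lets Excel detect the encoding when the file is double-clicked. A minimal sketch, assuming a mysqli connection; the credentials and the table/column names are placeholders:

<?php
// Placeholder credentials and table/column names.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$mysqli->set_charset('utf8mb4');      // make the connection speak UTF-8

$out = fopen('export.csv', 'w');
fwrite($out, "\xEF\xBB\xBF");         // UTF-8 BOM so Excel detects the encoding
$result = $mysqli->query('SELECT id, name FROM people');
while ($row = $result->fetch_row()) {
    fputcsv($out, $row);              // comma-delimited, quoted as needed
}
fclose($out);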

Mojibake. Probably...
The bytes you have in the client are correctly encoded in utf8mb4 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8mb4.)
The columns in the tables may or may not have been declared CHARACTER SET utf8mb4, but they should have been.
(utf8 and utf8mb4 work equally well for Arabic/Persian.)
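A quick way to see which of these applies is to compare what the connection is set to with the bytes actually stored. A sketch, with placeholder credentials and a hypothetical people table:

<?php
// Placeholder credentials; the table/column names are hypothetical.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$mysqli->set_charset('utf8mb4');      // same effect as SET NAMES utf8mb4

// What the server thinks this connection uses:
$vars = $mysqli->query("SHOW VARIABLES LIKE 'character_set%'");
while ($row = $vars->fetch_row()) {
    echo $row[0], ' = ', $row[1], PHP_EOL;
}

// What is actually stored, byte by byte:
$rows = $mysqli->query('SELECT name, HEX(name) FROM people LIMIT 5');
while ($row = $rows->fetch_row()) {
    // Arabic/Persian stored correctly as utf8mb4 shows two-byte sequences
    // in the D8..DB range, e.g. D8A3D8ADD985D8AF for أحمد.
    echo $row[0], ' -> ', $row[1], PHP_EOL;
}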
Please provide more details if this explanation does not suffice.

Related

Unicode text confounding Mysql queries

I have a PHP script that extracts attachments (Unicode text CSV files) from Gmail and uploads them to a MySQL database. All of that works fine. But once the data is in the database, I cannot run the simplest of queries against it.
If I first bring the file into Excel and then export it as a CSV file, everything works fine: I can query and get the expected results.
I have done enough reading to understand (I think) that the issue is somehow related to the fact that Unicode text is either UTF8 or UTF16, but when I convert the table to either of those, the data comes in fine but I still cannot run a successful query.
Update:
I have an individual named White in the lastrep column of the data. The only way I can pull the associated records is by using wildcards between characters, as in:
SELECT * FROM `dailyactual` WHERE `lastrep` like "%W%h%i%t%e%"
Any help would be appreciated.
Jim
Use the utf8mb4 character set. Instructions: https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-upgrading.html
In the utf8 or utf8mb4 character set, 'White' would be stored as hex 57 68 69 74 65. In utf16, there would be (effectively) a zero byte before each character; hex: 0057 0068 0069 0074 0065.
Can you get a hex dump of part of the file?
If you can specify Excel's output encoding, do so. Otherwise, specify the input for MySQL to be utf16 or whatever the encoding turns out to be. Since there are many ways of bringing a CSV file into MySQL, I can't be more specific.
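Since the data is already in MySQL, one way to get that hex dump is HEX() from a quick PHP script (the table and column names are taken from the question; connection details are placeholders):

<?php
// Placeholder connection details; table/column names from the question.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
foreach ($pdo->query('SELECT lastrep, HEX(lastrep) FROM dailyactual LIMIT 10') as $row) {
    echo $row[0], ' -> ', $row[1], PHP_EOL;
}
// 'White' stored as utf8 prints 5768697465; a utf16 import would instead
// show a zero byte before each letter: 00570068006900740065.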

Using iconv to convert mysqldump-ed databases

Trying to quickly convert a latin1 mysql DB to utf8, I tried the following:
Dump the DB
run iconv -f latin1 -t utf8 on the resulting file
import into a fresh DB with UTF8 default encoding
This mostly works, except... some letters get converted wrong (for example, an uppercase accented 'U' becomes a garbled sequence starting with a question mark). Some conversion is taking place (running od on a query result shows a two-byte sequence where the latin1 byte was) and the latin1 version is alright. While I have so far been unsystematic in isolating the problem (late night; under deadline; etc.), the weirdness of the issue kills me: why would it fail on some letters and not all? Client connection? Column charset? Why am I not getting any diagnostics? I'm stymied.
Sure, I can work on isolating the issue and its details, but thought that maybe somebody ran into this already and can recognize it by this (admittedly rather poor) description.
Cheers
The data may have been stored as latin1, but it's possible that whatever client you used to dump the data has already exported it as UTF-8.
Open the dump file in a decent text editor (Notepad++, TextWrangler, Atom) and check which encoding allows all characters to be displayed properly.
Then, when it comes to importing the data back in, ensure your client is set to use UTF-8.
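For instance, if you happen to reload the dump from PHP rather than from the mysql command-line client, the essential part is setting the connection charset before any statements run. A rough sketch; the credentials and file name are placeholders:

<?php
// Placeholder credentials and file name.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$mysqli->set_charset('utf8');         // client-side charset for the import

$mysqli->multi_query(file_get_contents('dump.sql'));
do {                                  // drain every result set so all statements execute
    if ($res = $mysqli->store_result()) {
        $res->free();
    }
} while ($mysqli->more_results() && $mysqli->next_result());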
Don't use iconv, it only muddies the works.
Assuming that a table is declared to be latin1 and correctly contains latin1 bytes, but you would like to change it to utf8, do this to the table:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8;
It is also possible to do it with a dump and reload; it involves some changes to the arguments. Sorry I don't have the details.

Not able to display Chinese characters after loading it to Postgres DB

I have a source file which contains Chinese characters. After loading that file into a table in a Postgres DB, all the characters are garbled and I'm not able to see the Chinese characters. The encoding on the Postgres DB is UTF-8. I'm using the psql utility on my local Mac OS X machine to check the output. The source file was generated from a MySQL db using mysqldump and contains only INSERT statements.
INSERT INTO "trg_tbl" ("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col7",
"col8", "col9", "col10", "col11", "col12", "col13", "col14",
"col15", "col16", "col17", "col18", "col19", "col20", "col21",
"col22", "col23", "col24", "col25", "col26", "col27", "col28",
"col29", "col30", "col31", "col32", "col33")
VALUES ( 1, 1, '与é<U+009D>žç½‘_首页&频é<U+0081>“页顶部广告ä½<U+008D>(946×90)',
'通æ <U+008F>广告(Leaderboard Banner)',
0,3,'',946,90,'','','','',0,'f',0,'',NULL,NULL,NULL,NULL,NULL,
'2011-08-19 07:29:56',0,0,0,'',NULL,0,NULL,'CPM',NULL,NULL,0);
What can I do to resolve this issue?
The text was mangled before producing that SQL statement. You probably wanted the text to start with 与 instead of the "Mojibake" version: 与. I suggest you fix the dump either to produce utf8 characters or hex. Then the load may work, or there may be more places to specify utf8, such as SET NAMES or the equivalent.
Also, for Chinese, CHARACTER SET utf8mb4 is preferred in MySQL.
é<U+009D>ž is so mangled I don't want to figure out the second character.

phpMyAdmin won't display or insert Unicode characters properly into database

I'm using phpMyAdmin version 4.4.4 with MySQL 5.6 (the charset is set to UTF-8 Unicode). The table in question has its collation set to utf8_general_ci, and all of its fields use the utf8_general_ci collation as well. My php.ini file has default_charset = "UTF-8".
Despite all the UTF-8 settings across all three applications, Unicode characters appear garbled when viewing a table within phpMyAdmin. So, instead of seeing ...
Søren
... in phpMyAdmin I see ...
Søren
Even though it displays garbled in phpMyAdmin, it displays correctly on the website. The only problem is with phpMyAdmin.
If I attempt to Insert a new record using phpMyAdmin and enter Søren in a text field, it displays like this within phpMyAdmin...
Søren
Which looks correct there, but, on the web page, it displays like this...
S�ren
The ø character is replaced with a question mark inside a black diamond instead of displaying the proper unicode character on the website.
What the heck is going on? How do I make phpMyAdmin display and insert the unicode characters properly into the table without mangling them? Thanks!
My php.ini file has default_charset = "UTF-8".
That only affects the charset used for some PHP built-in functions like htmlentities.
MySQL uses its own charset to decode stuff you send it. This can be set using $mysqli->set_charset('utf8') for mysqli, or mysql_set_charset('utf8') for the deprecated mysql module, or using charset=utf8 in the connection string in PDO.
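For reference, the variants side by side (connection details are placeholders):

<?php
// mysqli:
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$mysqli->set_charset('utf8');

// PDO: the charset goes into the DSN.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');

// Deprecated mysql_* API:
// $link = mysql_connect('localhost', 'user', 'pass');
// mysql_set_charset('utf8', $link);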

ZF2 Doctrine2 MySql charset error

I've set up a MySQL DB with the utf8_unicode_ci collation, and all of its tables and columns have the same collation.
My Doctrine config has SET NAMES utf8 as a connection option, and my HTML files use the UTF-8 charset.
The text saved on those tables contain accented characters (á,è,etc).
The problem is that when I save content to the DB, it is saved with strange characters, as if I were saving ISO text into a UTF-8 table (e.g.: Notícias).
The only workaround I've found is to utf8_decode before saving and utf8_encode before printing.
That means that, for some reason, something in between is mixing up UTF-8 and ISO.
What might it be?
Thanks.
EDIT:
I've set it up to encode before saving and decode before printing, and it prints correctly, but in the DB my characters change to:
XPTÓ -> XPTÓ
This makes searching in DB for "XPTÓ" impossible...
I would print bin2hex($string); at each step of the original workflow (i.e. without encode/decode steps).
Go through each of:
the raw $_POST data
the values you get after form validation
the values that get put in your bound Entity
the values you'd get from the db if you query it directly using PDO (get this from $em->getConnection())
the values that get populated into your Entity on reload (can do this via $em->detach($entity); $entity = $em->find('Entity', $id);)
You'd be looking for the point at which the output changes, and focusing your search there.
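A sketch of that tracing idea; the field name, form, and entity are placeholders, and the exact Doctrine fetch call depends on your DBAL version:

<?php
// Print the raw bytes at each stage; the stage where the hex changes is
// where the corruption happens. All names here are placeholders.
function trace(string $label, string $value): void {
    echo $label, ': ', bin2hex($value), PHP_EOL;
}

trace('raw POST', $_POST['title'] ?? '');
// trace('after validation', $form->getData()['title']);
// trace('on the entity', $entity->getTitle());
// $conn = $em->getConnection();   // raw DBAL connection
// trace('straight from db', (string) $conn->fetchOne('SELECT title FROM articles WHERE id = ?', [$id]));

// For reference: 'XPTÓ' as utf8 is 585054c393; the double-encoded
// 'XPTÓ' seen in the question is 585054c383e2809c.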
I would also double check:
On the db: SHOW CREATE TABLE `table` shows CHARSET=utf8 for the whole table (and nothing different for the individual columns)
That the tool you use to view your database values (Navicat, phpMyAdmin) has the correct encoding set.
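You can also take the viewing tool out of the equation and check from PHP directly (placeholder credentials; `table` stands in for your real table name):

<?php
// Placeholder credentials.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$mysqli->set_charset('utf8');
$row = $mysqli->query('SHOW CREATE TABLE `table`')->fetch_row();
echo $row[1], PHP_EOL;   // expect CHARSET=utf8 and no per-column overrides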