Garbled special characters with SQL rendering of XML data - mysql

I have a DHTMLX grid on a page that saves data to a DB through a PHP connector file. The grid's data is served as XML that is generated by the same PHP connector file.
Japanese words in the grid show up in Japanese but get saved as: ーダー
However they do stay in Japanese in the grid! (somehow...)
If I save something in the DB through phpMyAdmin, it shows up in the grid as: ???
I checked and everything seems right...
DB fields: UTF-8 √
HTML headers: UTF-8 √
connector.php: UTF-8 √ (checked through network tab, devtools)
Is there anywhere else I should check?
When looking at the PHP file that gives me the DB values, I get XML data that's already garbled:
<rows><row id='00000000001'><cell><![CDATA[]]></cell><cell><![CDATA[??]]></cell><cell><![CDATA[33]]></cell><cell><![CDATA[]]></cell><cell><![CDATA[]]></cell><cell><![CDATA[?????????]]></cell>...
So maybe the problem lies before the data is received from the server. Does anyone know where I should look for the problem?

Were you expecting ーダー but got ーダー? (Mojibake.)
Other times, do you get question marks?
Those two symptoms come from different causes, but both usually involve not declaring that the client's bytes are utf8. In PHP, that can be done with mysqli_set_charset($conn, 'utf8'), where $conn is the mysqli connection.
Question marks usually also involve failing to declare the column to be utf8.
To further diagnose, please do
SELECT col, HEX(col) FROM tbl WHERE ...
so we can see whether the text was mangled as it was inserted.
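Both symptoms can be reproduced without a database. The following is a minimal Python sketch, not part of the original setup; it assumes the wrong charset in play is cp1252 (which is what MySQL calls latin1) and uses the word from the question as sample text.

```python
# Reproduce both symptoms with plain string encoding; no database involved.
original = "ーダー"

# Bytes a correctly configured UTF-8 client sends, shown the way HEX(col) would.
utf8_bytes = original.encode("utf-8")
correct_hex = utf8_bytes.hex().upper()

# Mojibake: the UTF-8 bytes are interpreted as cp1252 (assumed wrong charset).
mojibake = utf8_bytes.decode("cp1252")

# Question marks: characters with no latin1 equivalent are replaced by '?'.
question_marks = original.encode("latin-1", errors="replace").decode("latin-1")

print(correct_hex)
print(mojibake)
print(question_marks)
```

If HEX(col) shows E383BC... the text was stored as proper UTF-8; a result starting with C3A3C692C2BC suggests it was double-encoded on the way in; 3F3F3F means the characters were already replaced by question marks and cannot be recovered from the table.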

Related

How to fix garbled characters in PHPMyAdmin

My MySQL database contains some Chinese symbols and such (non-ASCII symbols). When I view them in PHPMyAdmin, they look garbled. However, if I display them on my website with PHP using the regular mysqli API, it looks fine so I assume the data is uploaded/stored properly in the database, so maybe the server connection collation is incorrect.
My PHP code for opening the database connection is:
function openConnection(): mysqli
{
    $databaseHost = "localhost";
    $databaseUser = "root";
    $databasePassword = '';
    $databaseName = "my-database-name";
    $connection = new mysqli($databaseHost, $databaseUser,
        $databasePassword, $databaseName);
    if ($connection->connect_error) {
        die("Connection failed: " . $connection->connect_error);
    }
    return $connection;
}
My PHPMyAdmin server connection collation is the default utf8mb4_unicode_ci which seems to be reasonable as well. My tables are also created with the default utf8mb4_general_ci. Shouldn't that work fine for any input users might make?
Calling $connection->get_charset() in PHP also returns the correct charset:
If I export the database data in PHPMyAdmin, the export is also garbled in Notepad++, even though I made sure to view it with UTF-8 encoding. If I import the garbled export again, the database shows the data as garbled once more, and the website now shows it as garbled too. In other words, the export itself was corrupted.
How can I solve this encoding problem? Clearly PHP can handle UTF-8 properly, my Apache web server is also serving UTF-8 and my database is configured seemingly correctly as well but there is an issue with PHPMyAdmin or the database/database table collation.
It looks like the issue was entirely elsewhere, since I'm supplying data to PHP with C++ code. The C++ code uses the nlohmann JSON library to build the data submitted to the PHP script. The problem was that I was not explicitly encoding std::strings as UTF-8 when putting data into a C++ JSON object. With that fixed, everything is now working as expected.
⚈ If using mysqli, do $mysqli_obj->set_charset('utf8mb4');
⚈ If using PDO, do something like $db = new PDO('mysql:host=host;dbname=db;charset=utf8mb4', $user, $pwd);
⚈ Alternatively, execute SET NAMES utf8mb4
Any of these will say that the bytes in the client are UTF-8 encoded. Conversion, if necessary, will occur between the client and the database if the column definition is something other than utf8mb4.
More notes on PHP: http://mysql.rjweb.org/doc.php/charcoll#php
If you have specific garbling, see Trouble with UTF-8 characters; what I see is not what I stored
If you suspect the data being fed from PHP to Notepad, dump a few Chinese characters in hex and show them to us. I would expect every 4th byte to be hex F0 or every 3rd to be between E3 and EA. (These are the first bytes of the 4-byte and 3-byte UTF-8 encodings of Chinese characters.)
Does Notepad properly handle UTF-8, or does it need a setting?
If you are in the "cmd" in Windows, you may need chcp 65001; see http://mysql.rjweb.org/doc.php/charcoll#entering_accents_in_cmd That way, more non-English characters will display correctly.
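The hex check suggested above can be done with a few lines of Python. This is only a sketch: the helper names hex_dump and lead_bytes and the sample string are made up for illustration, and in practice the text would come from whatever PHP outputs.

```python
def hex_dump(text):
    """Return the UTF-8 bytes of text as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def lead_bytes(text):
    """Return the first UTF-8 byte of each character, as hex strings."""
    return [f"{ch.encode('utf-8')[0]:02X}" for ch in text]


sample = "中文"  # hypothetical sample; substitute the characters you are checking
print(hex_dump(sample))
print(lead_bytes(sample))
```

For correctly encoded common Chinese characters, every third byte in the dump is a lead byte in the E3 to EA range; rarer characters outside the Basic Multilingual Plane start with F0 and take four bytes.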

UTF-8 encoded MS Access table

I have a MS Access table which is encoded in UTF-8 charset and the characters appear like this:
Participació en comissió
If I UTF-8 decode this text I get the correct text:
Participació en comissió
How can I utf-8 decode several Access table columns? I would like to end up with the same MS Access database but with the columns converted (utf-8 decoded). I cannot figure out an easy way to do this conversion.
Thanks in advance.
--
More clarifications:
So how did you decode the text that you have in the question?
I simply put the sentence into an online UTF-8 decoder, but it crashes when there is a lot of text. FYI, the Access table comes from an MS SQL Server database with Modern_Spanish_CI_AS collation and a varchar(MAX) field. Is there maybe a way to perform the conversion while exporting the table from MS SQL Server?
While searching for a solution I found a post with a function that decodes UTF-8 fields directly in MS SQL Server. I tested it and it works perfectly, although it is quite slow. Hope this helps someone else with the same problem.
Open a new query editor window, then copy and paste the function provided at this link:
Convert text value in SQL Server from UTF8 to ISO 8859-1
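For reference, the same decoding can be sketched outside SQL Server. This is a different technique from the linked T-SQL function: a minimal Python sketch that assumes the UTF-8 bytes were misread as Windows-1252. The function name fix_mojibake is hypothetical, and reading or writing the Access columns (for example through an ODBC driver) is not shown.

```python
def fix_mojibake(text):
    """Undo UTF-8 bytes that were misread as Windows-1252 (an assumption)."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake of this kind, or already correct: leave it untouched.
        return text


print(fix_mojibake("Participació en comissió"))
```

The try/except makes the function safe to run over a whole column: values that are already correct fail the round trip and are returned unchanged.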

MySQL Exporting Arabic/Persian Characters

I'm new to MySQL and I'm working with it through phpMyAdmin.
My problem is that I have imported some tables from .sql files into a database with the UTF8_general_ci collation, and they contain some Arabic or Persian characters. However, when I export this data to an Excel file, the characters appear as follows:
The original value: أحمد الكمالي
The exported value: أحمد  الكمالي
I have searched for this issue and tried to solve it by giving the output and the server connection the same collation, UTF8_general_ci. But for some reason I don't understand, phpMyAdmin doesn't allow me to select that collation; it forces me to choose this one: UTF8mb4_general_ci
Anyway, when I export the data, I make sure that the format is UTF8, but it still appears like that.
How can I fix it?
Note: Here are some screenshots if you want to check organized by numbers.
http://www.megafileupload.com/rbt5/Screenshots.rar
I found an easier way: rebuild the Excel file with the correct characters.
Export your data from MySQL as usual, in CSV format.
Open a new Excel workbook and go to the Data tab.
Select "From Text". If you can't find it, look under "Get External Data".
Select your file.
Change the file origin to Unicode (UTF-8) and select Next ("Delimited" is checked by default).
Select the Comma delimiter and press Finish.
You will see your language's characters displayed correctly.
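If the CSV is produced by a script rather than by phpMyAdmin, another option is to write it with a UTF-8 byte order mark, which Excel generally uses to detect the encoding when the file is opened directly. The sketch below is an assumption-laden illustration, not the method from the steps above; the function name rows_to_excel_csv and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_excel_csv(rows):
    """Serialize rows as CSV bytes encoded in UTF-8 with a leading BOM."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # "utf-8-sig" prepends the byte order mark EF BB BF to the output.
    return buffer.getvalue().encode("utf-8-sig")


data = rows_to_excel_csv([["id", "name"], [1, "أحمد الكمالي"]])
print(data[:3].hex().upper())
```

The resulting bytes can be written to a file opened in binary mode and then double-clicked, without going through the text import wizard.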
Mojibake. Probably...
The bytes you have in the client are correctly encoded in utf8mb4 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8mb4.)
The column in the tables may or may not have been CHARACTER SET utf8mb4, but it should have been that.
(utf8 and utf8mb4 work equally well for Arabic/Persian.)
Please provide more details if this explanation does not suffice.

Problems importing excel data into MySQL via CSV

I have 12 Excel files, each one with lots of data organized in 2 fields (columns): id and text.
Each Excel file uses a different language for the text field: Spanish, Italian, French, English, German, Arabic, Japanese, Russian, Korean, Chinese, Japanese and Portuguese.
The id field is a combination of letters and numbers.
I need to import every excel into a different MySQL table, so one table per language.
I'm trying to do it the following way:
- Save the excel as a CSV file
- Import that CSV in phpMyAdmin
The problem is that I'm running into all sorts of issues and can't get the files to import properly, probably because of encoding issues.
For example, with the Arabic one, I set everything to UTF-8 (the database table field and the CSV file), but when I do the import, I get weird characters instead of the normal Arabic ones (if I copy them in manually, they show fine).
Another problem is that some texts contain commas, and since the CSV file also uses commas to separate fields, the imported texts are truncated wherever there's a comma.
Yet another problem is that, when saving as CSV, the characters get messed up (as with the Chinese file), and I can't find an option to tell Excel which encoding to use for the CSV file.
Is there any "protocol" or "rule" that I can follow to make sure that I do it the right way? Something that works for each different language? I'm trying to pay attention to the character encoding, but even with that I still get weird stuff.
Maybe I should try a different method instead of CSV files?
Any advice would be much appreciated.
OK, how did I solve all my issues? FORGET ABOUT EXCEL!!!
I uploaded the Excel files to Google Docs spreadsheets, downloaded them as CSV, and all the characters were perfect.
Then I just imported them into the corresponding fields of the tables, using the "utf8_general_ci" collation, and now everything is stored correctly in the database.
One standard thing to do in a CSV is to enclose fields containing commas with double quotes. So
ABC, johnny can't come out, can he?, newfield
becomes
ABC, "johnny can't come out, can he?", newfield
I believe Excel does this if you choose to save as file type CSV. A problem you'll have is that Excel's CSV output is ANSI-only. I think you need to use the "Unicode Text" save-as option and either live with the tab delimiters or convert them to commas. The Unicode Text option also quotes comma-containing values. (Checked using Excel 2007.)
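The quoting rule is easy to check with any CSV library. As a minimal sketch, Python's standard csv module quotes only the fields that need it by default and undoes the quoting when reading the line back:

```python
import csv
import io

row = ["ABC", "johnny can't come out, can he?", "newfield"]

# Write one row; the default QUOTE_MINIMAL quotes only fields containing
# the delimiter, the quote character, or a line break.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)
line = buffer.getvalue()

# Reading the line back restores the original three fields.
parsed = next(csv.reader(io.StringIO(line)))

print(line, end="")
print(parsed)
```

An importer that honors the quotes (phpMyAdmin's CSV import has a "columns enclosed with" setting for this) will keep the comma-containing text in one field instead of truncating it.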
EDIT: Add specific directions
In Excel 2007 (the specifics may be different for other versions of Excel)
Choose "Save As"
In the "Save as type:" field, select "Unicode Text"
You'll get a Unicode file. UCS-2 Little Endian, specifically.
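Converting such a file to a UTF-8, comma-separated CSV can be scripted. The following Python sketch assumes the "Unicode Text" file is tab-delimited UTF-16 starting with a byte order mark, as described above; the function name and the in-memory sample are hypothetical stand-ins for a real file.

```python
import csv
import io


def unicode_text_to_utf8_csv(raw):
    """Convert 'Unicode Text' bytes (UTF-16, tab-delimited) to UTF-8 CSV bytes."""
    text = raw.decode("utf-16")  # the BOM, if present, selects the byte order
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    # The writer re-quotes any field that contains a comma.
    csv.writer(out).writerows(reader)
    return out.getvalue().encode("utf-8")


# Stand-in for the bytes of a file saved by Excel as "Unicode Text".
sample = "id\ttext\r\nA1\t日本語, テスト\r\n".encode("utf-16")
converted = unicode_text_to_utf8_csv(sample)
print(converted.decode("utf-8"), end="")
```

For a real file, read it in binary mode, pass the bytes to the function, and write the result in binary mode before importing it into MySQL as UTF-8.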

Data in db is in wrong encoding (using CKeditor) and greek

I am using ckeditor 3.4 to insert data (text) to database and then display it on a page.
Problem: when I write Greek text in CKEditor, everything is fine. When I press CKEditor's HTML button, everything is still fine (i.e. I see the actual text I typed, not HTML entities). However, when I save the data (and hence store it in the DB), the stored data looks like this:
"<p style="text-align: center;">
... σÏντομα πεÏισσότεÏες πληÏοφοÏίες...</p>
<p>
</p>"
Note: when I retrieve the data, it is displayed correctly on the web page.
Actions taken so far:
1- the connection file to the db has the following: $conn->query("SET NAMES 'utf8'");
2- In the config.js of the ckeditor I have added the following lines
CKEDITOR.editorConfig = function( config ) {
    config.entities = false;
    config.entities_greek = false;
    config.entities_latin = false;
    config.entities_processNumerical = false;
    // Define changes to default configuration here. For example:
    config.language = 'el';
    // config.uiColor = '#AADC6E';
};
3- my webpages are set to: content="text/html;charset=utf-8"
4- db collation: utf8_unicode_ci / type MyISAM
I've been searching around but no luck.
I'd appreciate any help
Thank you all for your answers.
Solution was much simpler.
What worked was writing SET NAMES UTF8 instead of SET NAMES 'utf8'
If you are using PHP or any other language that doesn't do this automatically, you need to invoke
SET NAMES 'UTF8'
on the connection before calling any statements, in order to use UTF-8 in your database.
Also make sure you are serving all pages as UTF-8 so that posted data is in UTF-8.
There are also some configuration parameters that control how the data is sent and processed by the server, but I have never managed to get it to work without this statement.
See more here: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
EDIT: Ah, sorry, didn't see that you actually did this. If it is displayed correctly when you output it and your charset is set to UTF-8 on the page, then I'm assuming that you only view it in the DB with a tool that doesn't support UTF-8, or isn't configured for it? So what exactly is the problem right now?