When we use the CommCare Export tool, the data is exported correctly to an Excel file and to a SQLite DB. However, when we try to export the data to a MySQL DB, the export breaks with the following error:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
(See the attached screenshot.)
The data is imported correctly into the DB until Hindi text is encountered. At that point the process breaks with the error above.
We understand that the error is probably caused by the Devanagari text being inserted into the DB, so we tried changing the collation of all the data columns to utf8_unicode_ci, but the problem persists.
How can we fix this?
The default mysqldb connection uses latin-1. According to the SQLAlchemy docs, you can set the connection encoding directly in the connection string:
http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#unicode
The CommCare HQ Export Tool Docs at
https://confluence.dimagi.com/display/commcarepublic/CommCare+Data+Export+Tool#CommCareDataExportTool-SQLURLformats
include this string, and suggest the format
mysql+pymysql://<username>:<password>@<host>/<database name>?charset=utf8
(in your case pymysql would be mysqldb)
Are you including the charset in that connection string? If not, adding it should fix the problem of the cursor expecting latin-1 encoded text.
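To see why the charset parameter matters, here is a minimal Python sketch that reproduces the encoding failure with the standard library alone and builds a URL in the format quoted above. The credentials, host, and database name are placeholders, and build_mysql_url is a hypothetical helper, not part of the export tool or of SQLAlchemy.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database, driver="pymysql", charset="utf8"):
    """Build a SQLAlchemy-style MySQL URL with an explicit connection charset.

    Hypothetical helper for illustration only.
    """
    # quote_plus escapes characters such as '@' or '/' inside the credentials.
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?charset={charset}"
    )


hindi = "नमस्ते"

# A latin-1 connection cannot represent Devanagari; this is the error from the question.
try:
    hindi.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 encodes the same text without loss.
utf8_bytes = hindi.encode("utf-8")

# Placeholder values; substitute your own credentials and database.
url = build_mysql_url("user", "p@ss", "localhost", "commcare", driver="mysqldb")
print(latin1_ok, url)
```

The resulting string is what you would pass to the export tool as the SQL URL.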
My MySQL database contains some Chinese characters and other non-ASCII symbols. When I view them in phpMyAdmin, they look garbled. However, if I display them on my website with PHP using the regular mysqli API, they look fine, so I assume the data is uploaded and stored properly in the database and that the server connection collation may be incorrect.
My PHP code for opening the database connection is:
function openConnection(): mysqli
{
    $databaseHost = "localhost";
    $databaseUser = "root";
    $databasePassword = '';
    $databaseName = "my-database-name";
    $connection = new mysqli($databaseHost, $databaseUser,
        $databasePassword, $databaseName);
    if ($connection->connect_error) {
        die("Connection failed: " . $connection->connect_error);
    }
    return $connection;
}
My phpMyAdmin server connection collation is the default utf8mb4_unicode_ci, which seems reasonable. My tables are also created with the default utf8mb4_general_ci. Shouldn't that work fine for any input users might make?
Calling $connection->get_charset() in PHP also returns the correct charset:
If I export the database data in phpMyAdmin, the export is also garbled in Notepad++, and I made sure to view it with UTF-8 encoding. If I import the garbled export again, the database shows the data as garbled once more, and the website now shows it as garbled too. In this case, the export really was corrupted.
How can I solve this encoding problem? Clearly PHP can handle UTF-8 properly, my Apache web server is also serving UTF-8, and my database seems to be configured correctly as well, but there is an issue with phpMyAdmin or with the database/table collation.
It looks like the issue was entirely elsewhere, since I'm supplying data to PHP with C++ code. The C++ code uses the nlohmann JSON library to build the data submitted to the PHP script. The problem was that I was not explicitly encoding std::strings as UTF-8 when putting data into a C++ JSON object. With that fixed, everything is now working as expected.
⚈ If using mysqli, do $mysqli_obj->set_charset('utf8mb4');
⚈ If using PDO, do something like $db = new PDO('mysql:host=host;dbname=db;charset=utf8mb4', $user, $pwd);
⚈ Alternatively, execute SET NAMES utf8mb4
Any of these tells MySQL that the bytes in the client are UTF-8 encoded. Conversion, if necessary, will occur between the client and the database if the column definition is something other than utf8mb4.
More notes on PHP: http://mysql.rjweb.org/doc.php/charcoll#php
If you have specific garbling, see Trouble with UTF-8 characters; what I see is not what I stored
If you suspect the data being fed from PHP to Notepad, dump a few Chinese characters in hex and show them to us. I would expect every 4th byte to be hex F0 or every 3rd to be between E3 and EA. (These are the first bytes of the 4-byte and 3-byte UTF-8 encodings of Chinese characters.)
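As an illustration of the hex check described above, here is a small Python sketch. The sample characters and the helper names utf8_hex and lead_bytes are my own choices for the example, not something from the original answer.

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


def lead_bytes(text):
    """Return the first UTF-8 byte of each character in text."""
    return [ch.encode("utf-8")[0] for ch in text]


sample = "中文"      # two common Chinese characters, 3 bytes each in UTF-8
rare = "\U00020000"  # a CJK Extension B character, 4 bytes in UTF-8

print(utf8_hex(sample))  # each group of three bytes starts in the E3..EA range
print(utf8_hex(rare))    # the four-byte group starts with F0
```

If the hex of your stored data does not follow this pattern, the bytes were probably converted or double-encoded before they reached the table.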
Does Notepad properly handle UTF-8, or does it need a setting?
If you are in the "cmd" in Windows, you may need chcp 65001; see http://mysql.rjweb.org/doc.php/charcoll#entering_accents_in_cmd That way, more non-English characters will display correctly.
When I use the import feature of PHPMyAdmin, it doesn't import non-ASCII characters such as ä, ö, ü, õ and the rest of the word after the characters.
When I open the CSV file with Notepad it displays the non-ASCII characters normally, but when I'm trying to import it - it doesn't work.
Entering those missing characters manually works and MySQL saves them just as it should. Any thoughts?
MySQL will truncate a value like this when it encounters a character that is invalid under the current character set.
You're not mentioning what tool you are using to import the data, but you should be able to specify a character set when importing. If that character set matches the database's, everything will be fine. Also, make sure the file is actually encoded in that character set.
If your import tool doesn't offer the option of selecting the character set, you could try phpMyAdmin which does.
Make sure you know what the encoding of your CSV file is; it should be UTF-8. Then, before you import, run SET NAMES utf8 so that the connection uses the same character set, and it should work fine.
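One way to check what the answers above ask for, namely whether the CSV really is UTF-8, is to try decoding its bytes. The sketch below is a hedged example: it assumes that a file which is not valid UTF-8 was saved in Windows-1252 ("ANSI"), which may not be true for your data, and both helper names are hypothetical.

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(raw, fallback="cp1252"):
    """Return raw as UTF-8 bytes, assuming `fallback` if it is not already UTF-8."""
    if is_valid_utf8(raw):
        return raw
    return raw.decode(fallback).encode("utf-8")


# The same CSV row saved two ways; only the second is valid UTF-8.
ansi_row = "Jüri,Õun\n".encode("cp1252")
utf8_row = "Jüri,Õun\n".encode("utf-8")

print(is_valid_utf8(ansi_row), is_valid_utf8(utf8_row))
```

For a real file, read it with open(path, "rb").read() and write the result of to_utf8 back in binary mode before importing.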
I have a problem with a MySQL database: I can't import a database dump from my friend.
I need some help.
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";
ERROR:
Unexpected beginning of statement. (near "phpMyAdmin" at position 0)
Unrecognized statement type. (near "SQL" at position 11)
#1064 - Something is wrong in your syntax near 'phpMyAdmin SQL Dump
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO"' at line 1
There's nothing wrong with your syntax, but there probably is with your file:
most likely the file was edited, and the text editor (Windows notepad.exe, of course) was too clever and added a BOM when saving.
Remove the first 3 bytes (hex: EF BB BF) and save the file without them (either use a hex editor, or use PSPad and switch the format to UNIX), and the importer should have no more problems.
The BOM fools the importer: the first - gets eaten, so the importer no longer recognizes the first comment as a comment.
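If no hex editor is at hand, the same three bytes can be removed with a few lines of Python. This is a minimal sketch; strip_utf8_bom is a hypothetical helper name and the dump contents are made up for the example.

```python
import codecs


def strip_utf8_bom(raw):
    """Return raw without a leading UTF-8 byte order mark (EF BB BF), if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


# Simulated beginning of a dump that an editor saved with a BOM.
dump = codecs.BOM_UTF8 + b'-- phpMyAdmin SQL Dump\nSET time_zone = "+00:00";\n'
clean = strip_utf8_bom(dump)

print(dump[:5], clean[:5])
```

For a real dump, read the file with open(path, "rb").read(), pass the bytes through strip_utf8_bom, and write them back with open(path, "wb").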
I encountered exactly the same problem. Apparently, you use a version of phpMyAdmin which has bugs in the import module (in my case it was phpMyAdmin 4.5.5.1 packaged in Wamp 3.0.4). More precisely, it interprets comments (valid syntax with space after --) as SQL code. This is the case at the beginning of a dump created by phpMyAdmin: it typically starts with
-- phpMyAdmin SQL Dump
which explains your error message.
The import module of phpMyAdmin 4.5.5.1 was not able to parse escaped single quotes either (see https://github.com/phpmyadmin/phpmyadmin/issues/11721).
There are many possible workarounds to this problem:
Update phpMyAdmin
Use another tool to import your DB dump, for example MySQL Command Line or MySQL workbench
Less advisable: execute the contents of the .sql file as a query in your current version of phpMyAdmin (it has fewer bugs)
Less advisable: strip all comments from your .sql file
Windows Notepad and some other editors change the encoding of the file.
To change it back to UTF-8, open your file with Notepad++, open the Encoding menu, and select UTF-8.
Then save your file.
I'm new to MySQL and I'm working with it through phpMyAdmin.
My problem is that I have imported some tables from .sql files into a database with the UTF8_general_ci collation, and they contain some Arabic or Persian characters. However, when I export this data into an Excel file, the values appear as follows:
The original value: أحمد الكمالي
The exported value: Ø£Øمد  الكمالي
I have searched for this issue and tried to solve it by giving the output and the server connection the same collation, UTF8_general_ci. But, for some reason I don't know, phpMyAdmin doesn't allow me to select that collation; it forces me to choose UTF8mb4_general_ci instead.
Anyway, when I export the data, I make sure that the format is UTF8, but it still appears garbled like that.
How can I solve or fix it?
Note: Here are some screenshots if you want to check organized by numbers.
http://www.megafileupload.com/rbt5/Screenshots.rar
I found an easier way: you can rebuild the Excel file with the correct characters.
Export your data from MySQL normally in CSV format.
Open a new Excel workbook and go to the Data tab.
Select "From Text". If you can't find it, look under "Get External Data".
Select your file.
Change the file origin to Unicode (UTF-8) and select Next ("Delimited" is checked by default).
Select the Comma delimiter and press Finish.
Your language's characters will now display correctly.
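The steps above fix the encoding at import time. As an alternative that this answer does not mention, you can write the CSV with a leading UTF-8 BOM, which Excel generally treats as a signal to read the file as UTF-8 when it is opened directly. The sketch below assumes that behaviour of your Excel version; rows_to_excel_csv is a hypothetical helper name.

```python
import csv
import io


def rows_to_excel_csv(rows):
    """Serialize rows as CSV bytes in UTF-8 with a leading BOM ('utf-8-sig' codec)."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # 'utf-8-sig' prepends EF BB BF to the encoded output.
    return buffer.getvalue().encode("utf-8-sig")


data = rows_to_excel_csv([["name"], ["أحمد الكمالي"]])
print(data[:3])
```

Write the returned bytes to a .csv file in binary mode. Note that this is the opposite of what an SQL importer wants, so use it only for files meant for Excel.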
Mojibake. Probably...
The bytes you have in the client are correctly encoded in utf8mb4 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8mb4.)
The column in the tables may or may not have been CHARACTER SET utf8mb4, but it should have been that.
(utf8 and utf8mb4 work equally well for Arabic/Persian.)
Please provide more details if this explanation does not suffice.
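The Mojibake described above can be reproduced in a few lines of Python, which may help confirm the diagnosis. This is a sketch under the assumption that the wrong charset was latin1; the function names are hypothetical.

```python
def garble(text, wrong="latin-1"):
    """Simulate Mojibake: UTF-8 bytes misread under the wrong single-byte charset."""
    return text.encode("utf-8").decode(wrong)


def repair(garbled, wrong="latin-1"):
    """Reverse the misreading, provided no bytes were lost along the way."""
    return garbled.encode(wrong).decode("utf-8")


original = "أحمد"
broken = garble(original)

# Each Arabic letter is two UTF-8 bytes, so it turns into two latin-1 characters.
print(len(original), len(broken))
print(repair(broken) == original)
```

If your exported value starts with the same Ø£ pair as the garbled string here, the data itself is intact and only the connection or viewer charset is wrong.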
I am currently working on a project that requires a large data migration for a company. We are in the process of planning and testing data imports from an existing Access database to a MySQL database for a CRM they will be using.
We have encountered errors when importing exported data in .csv format (using Load Data Infile) when the records contain accented or special characters, because the files being imported are in ANSI format while the rest of the MySQL database is in UTF8. I managed to fix this issue by using the Convert to UTF8 functionality in Notepad++, but this was before I knew we needed the existing primary key IDs from the Access database to be imported as well.
Doing this same process with the added IDs causes MySQL to throw an error:
Error Code: 1366. Incorrect integer value: '135' for column 'id' at row 1
Is there a way to convert all this data to UTF8 without having integer values throw errors?
Convert the file to UTF-8 without BOM and try again :)
The trick is that the UTF-8 file begins with a BOM sequence, so the number 135 at the start of the file is actually 0xEF 0xBB 0xBF 1 3 5, which causes the error in an importer that does not expect a BOM.
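A short Python sketch shows the effect described above: with the BOM in place, the first field of the first row is no longer a clean integer. The row contents are invented for the example.

```python
import codecs

# First line of a CSV file that an editor saved as "UTF-8 with BOM".
raw = codecs.BOM_UTF8 + "135,José\n".encode("utf-8")

# Decoding as plain UTF-8 keeps the BOM as an invisible U+FEFF character
# glued to the first field.
with_bom = raw.decode("utf-8").split(",")[0]

# The 'utf-8-sig' codec drops a leading BOM while decoding.
without_bom = raw.decode("utf-8-sig").split(",")[0]

print(repr(with_bom), repr(without_bom))
```

Re-encoding the 'utf-8-sig' decoded text as plain UTF-8 gives a file without the BOM, which should load the ID column as expected.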