Steps for saving and retriving utf8 to MySQL with Perl - mysql

I am confused about the Perl DBI settings for handling utf8:
$db->{mysql_enable_utf8} = 1
$db->do(qq{SET NAMES utf8});
I read I should issue them immediately after making the db connection this way:
my $db = DBI->connect($cstring, $user, $password);
$db->{mysql_enable_utf8} = 1
$db->do(qq{SET NAMES utf8});
Here is the issue:
1)-I have the web page with form which set to utf8, so user data are sent in utf8 to the script.
2)-the script uses CGI::Simple to read the form data. Should I decode the form data with utf8::decode() or just leave it?.
3)-Should I set these two or not:
$db->{mysql_enable_utf8} = 1
$db->do(qq{SET NAMES utf8});
I hope some one will explain the steps to save and read utf8 starting from getting the user input on the web page to the MySQL database.

I did some tests which may be helpful in answering part of the question.
Consider this variable in utf8 Arabic:
my $string = "السلام عليكم";
According to this uft8 counter:
https://mothereff.in/byte-counter
It is 12 characters, totaling 23 bytes.
These two statements
$strlen = length(($string));
say $strlen;
$strlen = length(decode_utf8($string));
say $strlen;
Prints 12, so Perl knows it is utf-8 characters because I've used use utf8; to tell perl my source code is encoded using utf-8. This is the equivalent of decoding your CGI inputs.
Now to the Mysql:
1)- I created test table with attributes:
ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
2)- Setting these two as following after connecting to the mysql db:
$dbh->{'mysql_enable_utf8'} = 1;
$dbh->do('SET NAMES utf8');
When I see the table in MySQL Windows Query Browser I see it stored as correct Arabic and total 23 bytes and I can read the text as .
السلام عليكم
3)- Without setting these two:
$dbh->{'mysql_enable_utf8'} = 1;
$dbh->do('SET NAMES utf8');
When I see the table in MySQL Windows Query Browser I see wrong encoded data total 50 bytes.
السلام عليكم
I am using Perl 5.10 on Windows.
This means that we have to set these two settings for correct storing and retrieving of utf8 data with mysql after connecting immediately:
$dbh->{'mysql_enable_utf8'} = 1;
$dbh->do('SET NAMES utf8');
This I think clears part of the question in storing and retrieving data from mysql, still the rest of the question about handling the data starting from receiving data from forms should be decoded first or just used as is.

Related

How to fix garbled characters in PHPMyAdmin

My MySQL database contains some Chinese symbols and such (non-ASCII symbols). When I view them in PHPMyAdmin, they look garbled. However, if I display them on my website with PHP using the regular mysqli API, it looks fine so I assume the data is uploaded/stored properly in the database, so maybe the server connection collation is incorrect.
My PHP code for opening the database connection is:
function openConnection(): mysqli
{
$databaseHost = "localhost";
$databaseUser = "root";
$databasePassword = '';
$databaseName = "my-database-name";
$connection = new mysqli($databaseHost, $databaseUser,
$databasePassword, $databaseName);
if ($connection->connect_error) {
die("Connection failed: " . $connection->connect_error);
}
return $connection;
}
My PHPMyAdmin server connection collation is the default utf8mb4_unicode_ci which seems to be reasonable as well. My tables are also created with the default utf8mb4_general_ci. Shouldn't that work fine for any input users might make?
Calling $connection->get_charset() in PHP also returns the correct charset:
If I export the database data in MyPHPAdmin, the export is also garbled in Notepad++, I made sure to view it with UTF-8 encoding. If I import the garbled export again, the database will show the data as garbled once more and on the website the data now also shows as garbled. In this case, an actually corrupted export happened.
How can I solve this encoding problem? Clearly PHP can handle UTF-8 properly, my Apache web server is also serving UTF-8 and my database is configured seemingly correctly as well but there is an issue with PHPMyAdmin or the database/database table collation.
It looks like the issue was entirely elsewhere since I'm supplying data to PHP with C++ code. The C++ code uses the nlohmann JSON libary to build the data submitted to the PHP script. The issue was my inability to specifically encode std::strings to UTF-8 like described here when putting data into a C++ JSON object. With that said, everything is now working as expected.
⚈ If using mysqli, do $mysqli_obj->set_charset('utf8mb4');
⚈ If using PDO do somethin like $db = new PDO('dblib:host=host;dbname=db;charset=utf8mb4', $user, $pwd);
⚈ Alternatively, execute SET NAMES utf8mb4
Any of these will say that the bytes in the client are UTF-8 encoded. Conversion, if necessary, will occur between the client and the database if the column definition is something other than utf8mb4.
More notes on PHP: http://mysql.rjweb.org/doc.php/charcoll#php
If you have specific garbling, see Trouble with UTF-8 characters; what I see is not what I stored
If you suspect the data being fed from PHP to Notepad, dump a few Chinese characters in hex and shown to us. I would expect every 4th character to be hex F0 or every 3rd to be between E3 and EA. (These are the first byte for 4-char and 3-char UTF-8 encoding of Chinese characters.)
Does Notepad properly handle UTF-8, or does it need a setting?
If you are in the "cmd" in Windows, you may need chcp 65001; see http://mysql.rjweb.org/doc.php/charcoll#entering_accents_in_cmd That way, more non-English characters will display correctly.

Issue with retrieving special characters of sql inserted entities from MySQL using in Spring Boot application

I created a MySQL database from a script, along with inserting rows of data that include special characters such as č, š and ž.
For that reason I set default character set on schema in the script:
CREATE SCHEMA IF NOT EXISTS schemaname DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci ;
Now if I go through MySQL console and try retrieving data from a table with these special characters:
SELECT name FROM tablename;
I'll get a response with these special characters:
+-------------------------------------------------+
| name |
+-------------------------------------------------+
| Test of š, and č, and ž |
+-------------------------------------------------+
I've also created a Spring Boot application (hibernate with Spring Boot Starter Data JPA) in which I implemented rest API for entities that I have in my database.
The problem arises when I try to retrieve (previously inserted) entities using JPARepository (EntityRepository.findAll()). Instead of correctly written special characters which are normal in MySQL console, I get:
Å¡, ž and such characters (should be š, ž)
However, if I store a new entity using the Rest API with special characters and retrieve that one, the special characters are fine (š, ž).
Any idea as to why there is a difference between these and how I might solve the issue?
Without having to insert every entity through Rest API instead of MySQL inserts.
My application.properties file contains (I don't think the issue is here, since entities created through rest API are fine):
spring.datasource.url=jdbc:mysql://localhost:3306/schemaname?useSSL=false&allowPublicKeyRetrieval=true&useUnicode=true&characterEncoding=UTF-8
Could be an issue: The .sql files were created on Windows, but I am running sql server (in docker) and my application on Linux (all .sql files have UTF-8 character encoding).
The solution I found was actually really simple, can't believe I missed this.
I just had to put N in front of every string with special character in the .sql file. Example:
INSERT INTO X(FIELD1) VALUE (N'ščž');
Simply means I'm passing a NCHAR/NVARCHAR/NTEXT value instead of normal CHAR/VARCHAR/TEXT.

\u00a0 becomes  in MYSQL database [duplicate]

I have my database properly set to UTF-8 and am dealing with a database containing Japanese characters. If I do SELECT *... from the mysql command line, I properly see the Japanese characters. When pulling data out of the database and displaying it on a webpage, I see it properly.
However, when viewing the table data in phpMyAdmin, I just see garbage text. ie.
ç§ã¯æ—¥æœ¬æ–™ç†ãŒå¥½ãã§ã™ã€‚日本料ç†ã‚...
How can I get phpMyAdmin to display the characters in Japanese?
The character encoding on the HTML page is set to UTF-8.
Edit:
I have tried an export of my database and opened up the .sql file in geany. The characters are still garbled even though the encoding is set to UTF-8. (However, doing a mysqldump of the database also shows garbled characters).
The character set is set correctly for the database and all tables ('latin' is not found anywhere in the file)
CREATE DATABASE `japanese` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
I have added the lines to my.cnf and restarted mysql but there is no change. I am using Zend Framework to insert data into the database.
I am going to open a bounty for this question as I really want to figure this out.
Unfortunately, phpMyAdmin is one of the first php application that talk to MySQL about charset correctly. Your problem is most likely due to the fact that the database does not store the correct UTF-8 strings at first place.
In order to correctly display the characters correctly in phpMyAdmin, the data must be correctly stored in the database. However, convert the database into correct charset often breaks web apps that does not aware charset-related feature provided by MySQL.
May I ask: is MySQL > version 4.1? What web app is the database for? phpBB? Was the database migrated from an older version of the web app, or an older version of MySQL?
My suggestion is not to brother if the web app you are using is too old and not supported. Only convert database to real UTF-8 if you are sure the web app can read them correctly.
Edit:
Your MySQL is > 4.1, that means it's charset-aware. What's the charset collation settings for you database? I am pretty sure you are using latin1, which is MySQL name for ASCII, to store the UTF-8 text in 'bytes', into the database.
For charset-insensitive clients (i.e. mysql-cli and php-mod-mysql), characters get displayed correctly since they are being transfer to/from database as bytes. In phpMyAdmin, bytes get read and displayed as ASCII characters, that's the garbage text you seem.
Countless hours had been spend years ago (2005?) when MySQL 4.0 went obsolete, in many parts of Asia. There is a standard way to deal with your problem and gobbled data:
Back up your database as .sql
Open it up in UTF-8 capable text editor, make sure they look correct.
Look for charset collation latin1_general_ci, replace latin1 to utf8.
Save as a new sql file, do not overwrite your backup
Import the new file, they will now look correctly in phpMyAdmin, and Japanese on your web app will become question marks. That's normal.
For your php web app that rely on php-mod-mysql, insert mysql_query("SET NAMES UTF8"); after mysql_connect(), now the question marks will be gone.
Add the following configuration my.ini for mysql-cli:
# CLIENT SECTION
[mysql]
default-character-set=utf8
# SERVER SECTION
[mysqld]
default-character-set=utf8
For more information about charset on MySQL, please refer to manual:
http://dev.mysql.com/doc/refman/5.0/en/charset-server.html
Note that I assume your web app is using php-mod-mysql to connect to the database (hence the mysql_connect() function), since php-mod-mysql is the only extension I can think of that still trigger the problem TO THIS DAY.
phpMyAdmin use php-mod-mysqli to connect to MySQL. I never learned how to use it because switch to frameworks* to develop my php projects. I strongly encourage you do that too.
Many frameworks, e.g. CodeIgniter, Zend, use mysqli or pdo to connect to databases. mod-mysql functions are considered obsolete cause performance and scalability issue. Also, you do not want to tie your project to a specific type of database.
If you're using PDO don't forget to initiate it with UTF8:
$con = new PDO('mysql:host=' . $server . ';dbname=' . $db . ';charset=UTF8', $user, $pass, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
(just spent 5 hours to figure this out, hope it will save someone precious time...)
I did a little more googling and came across this page
The command doesn't seem to make sense but I tried it anyway:
In the file /usr/share/phpmyadmin/libraries/dbi/mysqli.dbi.lib.php at the end of function PMA_DBI_connect() just before the return statement I added:
mysqli_query($link, "SET SESSION CHARACTER_SET_RESULTS =latin1;");
mysqli_query($link, "SET SESSION CHARACTER_SET_CLIENT =latin1;");
And it works! I now see Japanese characters in phpMyAdmin. WTF? Why does this work?
I had the same problem,
Set all text/varchar collations in phpMyAdmin to utf-8 and in php files add this:
mysql_set_charset("utf8", $your_connection_name);
This solved it for me.
the solution for this can be as easy as :
find the phpmysqladmin connection function/method
add this after database is conncted $db_conect->set_charset('utf8');
phpmyadmin doesn't follow the MySQL connection because it defines its proper collation in phpmyadmin config file.
So if we don't want or if we can't access server parameters, we should just force it to send results in a different format (encoding) compatible with client i.e. phpmyadmin
for example if both the MySQL connection collation and the MySQL charset are utf8 but phpmyadmin is ISO, we should just add this one before any select query sent to the MYSQL via phpmyadmin :
SET SESSION CHARACTER_SET_RESULTS =latin1;
Here is my way how do I restore the data without looseness from latin1 to utf8:
/**
* Fixes the data in the database that was inserted into latin1 table using utf8 encoding.
*
* DO NOT execute "SET NAMES UTF8" after mysql_connect.
* Your encoding should be the same as when you firstly inserted the data.
* In my case I inserted all my utf8 data into LATIN1 tables.
* The data in tables was like ДЕТСКИÐ.
* But my page presented the data correctly, without "SET NAMES UTF8" query.
* But phpmyadmin did not present it correctly.
* So this is hack how to convert your data to the correct UTF8 format.
* Execute this code just ONCE!
* Don't forget to make backup first!
*/
public function fixIncorrectUtf8DataInsertedByLatinEncoding() {
// mysql_query("SET NAMES LATIN1") or die(mysql_error()); #uncomment this if you already set UTF8 names somewhere
// get all tables in the database
$tables = array();
$query = mysql_query("SHOW TABLES");
while ($t = mysql_fetch_row($query)) {
$tables[] = $t[0];
}
// you need to set explicit tables if not all tables in your database are latin1 charset
// $tables = array('mytable1', 'mytable2', 'mytable3'); # uncomment this if you want to set explicit tables
// duplicate tables, and copy all data from the original tables to the new tables with correct encoding
// the hack is that data retrieved in correct format using latin1 names and inserted again utf8
foreach ($tables as $table) {
$temptable = $table . '_temp';
mysql_query("CREATE TABLE $temptable LIKE $table") or die(mysql_error());
mysql_query("ALTER TABLE $temptable CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci") or die(mysql_error());
$query = mysql_query("SELECT * FROM `$table`") or die(mysql_error());
mysql_query("SET NAMES UTF8") or die(mysql_error());
while ($row = mysql_fetch_row($query)) {
$values = implode("', '", $row);
mysql_query("INSERT INTO `$temptable` VALUES('$values')") or die(mysql_error());
}
mysql_query("SET NAMES LATIN1") or die(mysql_error());
}
// drop old tables and rename temporary tables
// this actually should work, but it not, then
// comment out this lines if this would not work for you and try to rename tables manually with phpmyadmin
foreach ($tables as $table) {
$temptable = $table . '_temp';
mysql_query("DROP TABLE `$table`") or die(mysql_error());
mysql_query("ALTER TABLE `$temptable` RENAME `$table`") or die(mysql_error());
}
// now you data should be correct
// change the database character set
mysql_query("ALTER DATABASE DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci") or die(mysql_error());
// now you can use "SET NAMES UTF8" in your project and mysql will use corrected data
}
Change latin1_swedish_ci to utf8_general_ci in phpmyadmin->table_name->field_name
This is where you find it on the screen:
First, from the client do
mysql> SHOW VARIABLES LIKE 'character_set%';
This will give you something like
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
where you can inspect the general settings for the client, connection, database
Then you should also inspect the columns from which you are retrieving data with
SHOW CREATE TABLE TableName
and inspecting the charset and collation of CHAR fields (though usually people do not set them explicitly, but it is possible to give CHAR[(length)] [CHARACTER SET charset_name] [COLLATE collation_name] in CREATE TABLE foo ADD COLUMN foo CHAR ...)
I believe that I have listed all relevant settings on the side of mysql.
If still getting lost read fine docs and perhaps this question which might shed some light (especially how I though I got it right by looking only at mysql client in the first go).
1- Open file:
C:\wamp\bin\mysql\mysql5.5.24\my.ini
2- Look for [mysqld] entry and append:
character-set-server = utf8
skip-character-set-client-handshake
The whole view should look like:
[mysqld]
port=3306
character-set-server = utf8
skip-character-set-client-handshake
3- Restart MySQL service!
Its realy simple to add multilanguage in myphpadmin if you got garbdata showing in myphpadmin, just go to myphpadmin click your database go to operations tab in operation tab page see collation section set it to utf8_general_ci, after that all your garbdata will show correctly. a simple and easy trick
The function and file names don't match those in newer versions of phpMyAdmin. Here is how to fix in the newer PHPMyAdmins:
Find file:
phpmyadmin/libraries/DatabaseInterface.php
In function: public function query
Right after the opening { add this:
if($link != null){
mysqli_query($link, "SET SESSION CHARACTER_SET_RESULTS =latin1;");
mysqli_query($link, "SET SESSION CHARACTER_SET_CLIENT =latin1;");
}
That's it. Works like a charm.
I had exactly the same problem. Database charset is utf-8 and collation is utf8_unicode_ci. I was able to see Unicode text in my webapp but the phpMyAdmin and sqldump results were garbled.
It turned out that the problem was in the way my web application was connecting to MySQL. I was missing the encoding flag.
After I fixed it, I was able to see Greek characters correctly in both phpMyAdmin and sqldump but lost all my previous entries.
just uncomment this lines in libraries/database_interface.lib.php
if (! empty($GLOBALS['collation_connection'])) {
// PMA_DBI_query("SET CHARACTER SET 'utf8';", $link, PMA_DBI_QUERY_STORE);
//PMA_DBI_query("SET collation_connection = '" .
//PMA_sqlAddslashes($GLOBALS['collation_connection']) . "';", $link, PMA_DBI_QUERY_STORE);
} else {
//PMA_DBI_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci';", $link, PMA_DBI_QUERY_STORE);
}
if you store data in utf8 without storing charset you do not need phpmyadmin to re-convert again the connection. This will work.
Easier solution for wamp is:
go to phpMyAdmin,
click localhost,
select latin1_bin for Server connection collation,
then start to create database and table
Add:
mysql_query("SET NAMES UTF8");
below:
mysql_select_db(/*your_database_name*/);
It works for me,
mysqli_query($con, "SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'");
ALTER TABLE table_name CONVERT to CHARACTER SET utf8;
*IMPORTANT: Back-up first, execute after

Laravel 4 seeding accents

I'm trying to seed categories in my database. I set the title as unique. However, in french it's spelled catÉgorie.
I try to seed Catégorie 1, Catégorie 2, Catégorie 3. I got an error when I php artisan db:seed because he read it has Cat?gorie 1, Cat?gorie 2, Cat?gorie 3 and that Cat is repeated (therefore not unique).
I've set my db to utf8_general_ci and did the change in config/database.php.
What did I miss here?
Although you do not specify that, Im assuming you are using mysql and the "Categorie 1" is the content of some filed in one of the seeded tables. Things to remember if you want to use Laravel with utf8 mysql data:
Laravel 4 Config
//file: app/config/database.php
'charset' => 'utf8',
'collation' => 'utf8_unicode_ci',
Seeder files encoding
obviously, all your files should be saved with utf8 encoding.
Data collation
the database collation is set to the utf8 you set in the above mentioned config file
the tables collation is the same utf8
the fields are also utf8 where needed
Database connection
the connection itself should be of utf8.
Regarding the last remark, ofcourse Laravel already deals with that while establishing mysql connection by executing "set names '$charset' collate '$collation'" . If for some reason you will still be facing problems, try enforcing utf8 charset and collation in mysql server config and it should ultimately solve your issues:
//file /etc/mysql/my.cnf
[mysqld]
...
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
...
Hints & Analysis
"Cat?gorie 1" or "Catégory 1" are processed as normal strings being inserted in a DB Table row collumn.
But Some of the modern query strings parameters are represented by a like
query("SELECT something FROM somewhere WHERE condition IN (?)", array('this', 'that'))
Here your query compiler may get lost if stuff is not well formatted.
.
Another point is, if you are using Artisan command line in Dos if you're on Windows for example, the output messages either mentioning success, errors or whatever will not be displayed correctly in your console, because the DOS output will not be supported by UTF-8.
Solution
1 . The seeder errors may be right, check flat your Database if the records really exists.
2 . If the records doesn't exists, please attach your : migration & seed classes.

ZF2 Doctrine2 MySql charset error

I've setup a MySQL DB with utf8_unicode_ci collation, and all the tables and columns on it have de same collation.
My Doctrine config have SET NAMES utf8 as a connection option and my html files use utf8 charset.
The text saved on those tables contain accented characters (á,è,etc).
The problem is that when I save the content to the DB, it saves with strange characters, like when I try to save ISO in UTF8 table. (e.g.: Notícias)
The only workaround that i've found is to, utf8_decode before save, and utf8_encode before printing.
That means that, for some reason, something in between is messing up utf8 with iso.
What might be?
Thanks.
EDIT:
I've setup to encode before saving and decode before printing, and it prints correctly but in DB my chars change to:
XPTÓ -> XPTÓ
This makes searching in DB for "XPTÓ" impossible...
I would print bin2hex($string); at each step of the original workflow (i.e. without encode/decode steps).
Go through each of:
the raw $_POST data
the values you get after form validation
the values that get put in your bound Entity
the values you'd get from the db if you query it directly using PDO (get this from
$em->getConnection())
the values that get populated into your Entity on reload (can do this via $em->detach($entity); $entity = $em->find('Entity', $id);)
You'd be looking at the point at which the output changes, and focus your search there.
I would also double check:
On the db: SHOW CREATE TABLE 'table' shows CHARSET=utf8 for the whole table (and nothing different for the individual columns)
That the tool you use to see your database values (Navicat, phpMyAdmin) has got the correct encoding set.