data in mysql database is not in proper format - mysql

I am trying to store data with multiple languages(mainly English, Hindi, Bengali, Punjabi). I am using UTF8mb4 charset and UTF8mb4_unicode_ci collation.I am able to insert and receive data from the database. on the HTML page, received data from database looks alike as I inserted it but, when I look at my database, data is not saved in the proper format.
Output on html page
data in database
Character set values

Earlier i had the same issue:
if you set to UTF 8 it will work as you expected it happens when special characters are involved,
Note: make sure the data entered properly while inserting
mysqli_query($con,'SET NAMES utf8');

Related

Detailed instructions on converting a MYSQL DB and its data from latin to UTF-8. Too much diff info out there

Can you someone please provide the best way to convert not only a mysql database and all its tables from latin1_swedish_ci to UTF-8, with their contents? I have been researching all over Stackoverflow as well as elsewhere and the suggestions are always different.
Some people suggest just using these commands on the tables and databases:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Others say that this just changes the database and tables, but not the contents.
Some suggest dumping the db, create a new table with the right char set and collation, and importing the old db into that. Does this actually convert the data as well?
mysqldump --skip-opt --set-charset --skip-set-charset
Others suggest running iconv against the dumped DB before importing? Is this really needed or would the import into a UTF-8 db do the conversion?
Finally, other suggest altering the database, convert char/blog tables to binary, and the converting back.
There are so many different methods that it has become very confusing.
Can someone please provide a concise step-by-step instruction, or point me to one, on how I can go about convert my latin DBs and their content to UTF-8? Even better if there is a script that automates this process against a database.
Thanks in advance.
The are two different problems which are often conflated:
change the specification of a table or column on how it should store data internally
convert garbled mojibake data to its intended characters
Each text column in MySQL has an associated charset attribute, which specifies what encoding text stored in this column should be stored as internally. This only really influences what characters can be stored in this column and how efficient the data storage is. For example, if you're storing a ton of Japanese text, sjis as an encoding may be a lot more efficient than utf8 and save you a bit of disk space.
The column encoding does not in any way influence in what encoding data is input and output to/from the database. This is a separate setting, the connection encoding, which is established for every individual client every time you connect to the database. MySQL will convert data on the fly between the connection encoding and the column/table charset as needed. You can connect to the database with a utf8 connection, send it Japanese text destined for an sjis column, and MySQL will convert from utf8 to sjis on the fly (and back in reverse on the way out).
Now, if you've screwed up the connection encoding (as happens way too often) and you've inserted text in a different encoding than your connection encoding specified (e.g. your connection encoding was latin1 but you actually sent UTF-8 encoded data), then you're storing garbage in your database and you need to recover that. If that's your issue, see How to convert wrongly encoded data to UTF-8?.
However, if all your data is peachy and all you want to do is tell MySQL to store data in a different encoding from now on, you only need this:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
MySQL will convert the current data from its current charset to the new charset and store future data in the new charset. That's all.
Here is an example from the Moodle community:
https://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8
(Scroll down to "Explained".)
The author does first an SQL dump, which is a big SQL file. Then he copies the file. After, he makes coding corrections with sed on the copied file. Finally he imports the copied and corrected SQL dump file back into the database.
I can recommend this because with this single steps it is easy to inspect if they have been done right. If something goes wrong, just go back to the last step and try it another way.
Use the MySQL Workbench to handle this. http://dev.mysql.com/doc/workbench/en/index.html
Run the migration wizard to produce a script that will create the database schema.
Edit that script to alter the collation and character set (notepad++ search replace is just fine for this) and the shema name so you don't overwrite the existing database.
Run the script to create the copy under a new name.
Use the migration wizard to bulk transfer the data to the new schema. It will handle all the conversion for you and ensure that your data is still good.

Convert Mysql stored data to correct utf8

I stored large Arabic database in Mysql using Perl in wrong format, here what happened:
1)-Mysql tables created with attributes:
ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
2)-My perl scripts are created in utf8 and I use "use utf8;" at the top.
3)-I can read the data from the table and display it normal in Arabic on html pages using charset utf8 meta tag.
The problem when I see the data in the database it is stored as encoded not readable, after search about Perl module DBI, I found I should do this:
$dbh->{'mysql_enable_utf8'} = 1;
$dbh->do('SET NAMES utf8');
Immediately after connecting to the database which I did not do.
Now if I do this, the new data stored in the table are shown correct everywhere even in the mysql browser windows application.
The problem now with the already data stored in the table, it seems it comes double utf8 encode or something like that. How to fix the already stored data when the above flags were not set.
After days of search, did not find anyway to fix the data in mysql direct or using programming like Perl.
The only solution I did is to export the data from mysql the same way I put it in which seems to be double utf8 encoded by Perl to text files but after I utf8 decode it in Perl first.
After that the data is correctly saved to text files in UTF8 format which can be imported or managed direct as a valid UTF8 data in Perl and mysql.

UTF-8 weird behaviour between HTML/PHP forms and MySQL (Hindi)

I have all my database/tables and columns set to UTF-8_general_ci collation set.
Conditions that I Faced :-
When I insert hindi data manually by phpmyadmin, I can see the the hindi characters in phpmyadmin, while question marks when seen on webpage generated by PHP
In the same table when I insert data by HTML/PHP Forms I see some unrecognizable words in english something like cc2faa;(something like this) and Correct Hindi on Webpage.
For the large data we have a script that reads from txt files and insert the data in the table in this , I see characters like जाना in phpmyadmin but Hindi On webpage.
Now the main problem is :-
Data has gone under changes online by forms and now I need this data to export to excel and give to the client but I am getting जाठin excel instead of Hindi Characters.
Note :-
All English characters are working fine and as it is everywhere.
My CHARACTER SET is utf8 For all tables.
I tried to change the collation to UTF-8_bin but that too doesn't helped me in anyway.
Encoding on the browser is UTF-8, and I have already sent the headers for UTF-8 encoding.
I have seen many posts about utf8 problem but no one seem to have this weird different behavior problem.
Please Do I have any rescue from this?? Or finally have to give the PHP reports of the data??
Please help!!
When I insert hindi data manually by phpmyadmin, I can see the the hindi characters in phpmyadmin, while question marks when seen on webpage generated by PHP
PHP probably generates the question marks because the encoding of the database connection is not utf-8. How to fix this depends on the database library you use; if you use MySQLi use mysqli_set_charset('utf8'), if PDO you add charset=utf8 to the DSN...
In the same table when I insert data by HTML/PHP Forms I see some unrecognizable words in english something like cc2faa;(something like this) and Correct Hindi on Webpage.
For the large data we have a script that reads from txt files and insert the data in the table in this , I see characters like जाना in phpmyadmin but Hindi On webpage.
These are likely caused by the same problem as above: the PHP forms and the script connect to the database using the default encoding, probably latin1. Then they insert utf-8 encoded text, but since MySQL thinks you are using latin1, it encodes the text into utf-8 again, and inserts this doubly encoded text into the table.
So: PHP sends "जाना" to MySQL telling it's latin1, and MySQL goes and converts it to utf-8, resulting in "जाना". Later PHP asks MySQL return the value, and since the connection is again using latin1, MySQL takes "जाना" and decodes it to latin1. Then PHP pretends that this latin1 string is actually utf-8 and displays "जाना".
Again, the solution is setting the encoding of the connection to utf-8. And this depends on what you use to access the database.
If you need to export your data as Excel file, use the PHP class php-export-data by Eli Dickinson, http://github.com/elidickinson/php-export-data. It is pretty nifty and so far I had have no problems exporting weird character sets with it.

CakePHP - Query not returning strange characters from MySQL database

I have a MySQL table of countries, which includes some country names that have strange characters like 'Åland Islands'.
When I perform a CakePHP find on all countries, the rows with strange characters just come back blank. My table and column collation is utf8_general_ci.
make sure that the output of your displaying is set to UTF-8
make sure that the page through which you input the data is set to UTF-8 so it doesn't send the data as ANSI or something else.
You could set up a little test page that you're sure off to enter the data in UTF-8 and displays in UTF-8 as well, then at least you'll know wether it's related to display or previous methods of data entry.
The answer was, as tigrang said, in Config/database.php you must have 'encoding' => 'utf8'

Databases: column encoding, when is it important?

We are importing data from .sql script containing UTF-8 encoded data to MySQL database:
mysql ... database_name < script.sql
Later this data is being displayed on page in our web application (connected to that database), again in UTF-8. But somewhere in the process something went wrong, because non-ascii characters was displayed incorrectly.
Our first attempt to solve it was to change mysql columns encoding to UTF-8 (as described for example here):
alter table wp_posts change post_content post_content LONGBLOB;`
alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET utf8;
But it didn't helped.
Finally we solved this problem by importing data from .sql script with additional command line flag which as I believe forced mysql client to treat data from .sql script as UTF-8.
mysql ... --default-character-set=utf8 database_name < script.sql
It helped but then we realized that this time we forgot to change column encoding to utf8 - it was set to latin1 even if utf-8 encoded data was flowing through database (from sql script to application).
So if data obtained from database is displayed correctly even if database character set is set incorrectly, then why the heck should I bother setting correct database encoding?
Especially I would like to know:
What parts of database rely on column encoding setting? When this setting has any real meaning?
On what occasions implicit conversion of column encoding is done?
How does trick with converting column to binary format and then to the destination encoding work (see: sql code snippet above)? I still don't get it.
Hope someone help me to clear things up...
The biggest reason, in my view, is that it breaks your DB consistency.
it happens way to often that you need to check data in the database. And if you cannot properly input UTF-8 strings coming from the web page to your MySQL CLI client, it's a pity;
if you need to use phpMyAdmin to administer your database through the “correct” web, then you're limiting yourself (might not be an issue though);
if you need to build a report on your data, then you're greatly limited by the number of possible choices, given only web is producing your the correct output;
if you need to deliver a partial database extract to your partner or external company for analysis, and extract is messed up — it's a pity.
Now to your questions:
When you ask database to ORDER BY some column of string data type, then sorting rules takes into account the encoding of your column, as some internal trasformation are applicable in case you have different encodings for different columns. Same applies if you're trying to compare strings, encoding information is essential here. Encoding comes together with collation, although most people don't use this feature so often.
As mentioned, if you have any set of columns in different encodings, database will choose to implicitly convert values to a common encoding, which is UTF8 nowadays. Strings' implicit encoding might be done in the client frameworks/libraries, depending on the client's environment encoding. Typically data is recoded into the database's encoding when sent to the server and back into client's encoding when results are delivered.
Binary data has no notion of encoding, it's just a set of bytes. So when you convert to binary, you're telling database to “forget” encoding, although you keep data without changes. Later, you convert to the string enforcing the right encoding. This trick helps if you're sure that data physically is in UTF-8, while by some accident a different encoding was specified.
Given that you've managed to load in data into the database by using --default-character-set=utf8 then there was something to do with your environment, I suggest it was not UTF8 setup.
I think the best practice today would be to:
have all your environments being UTF8 ready, including shells;
have all your databases defaulting to UTF8 encoding.
This way you'll have less field for errors.