MySQL: insert symbols like ½ or ° - mysql

I need to insert some data into a MySQL table. The table has collation utf8_general_ci. The insert lines are all written into a text document.
I am able to insert special symbols into the MySQL tables on my own computer when I copy-paste the inserts into the HeidiSQL editor and execute from there.
Now I need to insert them into the database on the server. I am using Webfaction. I don't seem to be able to get the symbols in there. Instead, the database inserts them as a '' (empty string). I upload the text file (using Filezilla), ssh to the server (putty), and then execute the inserts from MySQL as "source path/file.txt". The encoding of the file before uploading doesn't seem to make a difference (ansi or utf8).

i been seeing lots of encoding related issues the owner should google a bit before posting
general check list:
mysql table schema to use utf-8
mysql clients connection to use utf-8 mysql --default-character-set=utf8
php mysqli_set_charset to utf-8
html encoding to utf-8
putty, emac clients... to be in utf-8

Related

UTF-8 weird behaviour between HTML/PHP forms and MySQL (Hindi)

I have all my database/tables and columns set to UTF-8_general_ci collation set.
Conditions that I Faced :-
When I insert hindi data manually by phpmyadmin, I can see the the hindi characters in phpmyadmin, while question marks when seen on webpage generated by PHP
In the same table when I insert data by HTML/PHP Forms I see some unrecognizable words in english something like cc2faa;(something like this) and Correct Hindi on Webpage.
For the large data we have a script that reads from txt files and insert the data in the table in this , I see characters like जाना in phpmyadmin but Hindi On webpage.
Now the main problem is :-
Data has gone under changes online by forms and now I need this data to export to excel and give to the client but I am getting जाठin excel instead of Hindi Characters.
Note :-
All English characters are working fine and as it is everywhere.
My CHARACTER SET is utf8 For all tables.
I tried to change the collation to UTF-8_bin but that too doesn't helped me in anyway.
Encoding on the browser is UTF-8, and I have already sent the headers for UTF-8 encoding.
I have seen many posts about utf8 problem but no one seem to have this weird different behavior problem.
Please Do I have any rescue from this?? Or finally have to give the PHP reports of the data??
Please help!!
When I insert hindi data manually by phpmyadmin, I can see the the hindi characters in phpmyadmin, while question marks when seen on webpage generated by PHP
PHP probably generates the question marks because the encoding of the database connection is not utf-8. How to fix this depends on the database library you use; if you use MySQLi use mysqli_set_charset('utf8'), if PDO you add charset=utf8 to the DSN...
In the same table when I insert data by HTML/PHP Forms I see some unrecognizable words in english something like cc2faa;(something like this) and Correct Hindi on Webpage.
For the large data we have a script that reads from txt files and insert the data in the table in this , I see characters like जाना in phpmyadmin but Hindi On webpage.
These are likely caused by the same problem as above: the PHP forms and the script connect to the database using the default encoding, probably latin1. Then they insert utf-8 encoded text, but since MySQL thinks you are using latin1, it encodes the text into utf-8 again, and inserts this doubly encoded text into the table.
So: PHP sends "जाना" to MySQL telling it's latin1, and MySQL goes and converts it to utf-8, resulting in "जाना". Later PHP asks MySQL return the value, and since the connection is again using latin1, MySQL takes "जाना" and decodes it to latin1. Then PHP pretends that this latin1 string is actually utf-8 and displays "जाना".
Again, the solution is setting the encoding of the connection to utf-8. And this depends on what you use to access the database.
If you need to export your data as Excel file, use the PHP class php-export-data by Eli Dickinson, http://github.com/elidickinson/php-export-data. It is pretty nifty and so far I had have no problems exporting weird character sets with it.

Databases: column encoding, when is it important?

We are importing data from .sql script containing UTF-8 encoded data to MySQL database:
mysql ... database_name < script.sql
Later this data is being displayed on page in our web application (connected to that database), again in UTF-8. But somewhere in the process something went wrong, because non-ascii characters was displayed incorrectly.
Our first attempt to solve it was to change mysql columns encoding to UTF-8 (as described for example here):
alter table wp_posts change post_content post_content LONGBLOB;`
alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET utf8;
But it didn't helped.
Finally we solved this problem by importing data from .sql script with additional command line flag which as I believe forced mysql client to treat data from .sql script as UTF-8.
mysql ... --default-character-set=utf8 database_name < script.sql
It helped but then we realized that this time we forgot to change column encoding to utf8 - it was set to latin1 even if utf-8 encoded data was flowing through database (from sql script to application).
So if data obtained from database is displayed correctly even if database character set is set incorrectly, then why the heck should I bother setting correct database encoding?
Especially I would like to know:
What parts of database rely on column encoding setting? When this setting has any real meaning?
On what occasions implicit conversion of column encoding is done?
How does trick with converting column to binary format and then to the destination encoding work (see: sql code snippet above)? I still don't get it.
Hope someone help me to clear things up...
The biggest reason, in my view, is that it breaks your DB consistency.
it happens way to often that you need to check data in the database. And if you cannot properly input UTF-8 strings coming from the web page to your MySQL CLI client, it's a pity;
if you need to use phpMyAdmin to administer your database through the “correct” web, then you're limiting yourself (might not be an issue though);
if you need to build a report on your data, then you're greatly limited by the number of possible choices, given only web is producing your the correct output;
if you need to deliver a partial database extract to your partner or external company for analysis, and extract is messed up — it's a pity.
Now to your questions:
When you ask database to ORDER BY some column of string data type, then sorting rules takes into account the encoding of your column, as some internal trasformation are applicable in case you have different encodings for different columns. Same applies if you're trying to compare strings, encoding information is essential here. Encoding comes together with collation, although most people don't use this feature so often.
As mentioned, if you have any set of columns in different encodings, database will choose to implicitly convert values to a common encoding, which is UTF8 nowadays. Strings' implicit encoding might be done in the client frameworks/libraries, depending on the client's environment encoding. Typically data is recoded into the database's encoding when sent to the server and back into client's encoding when results are delivered.
Binary data has no notion of encoding, it's just a set of bytes. So when you convert to binary, you're telling database to “forget” encoding, although you keep data without changes. Later, you convert to the string enforcing the right encoding. This trick helps if you're sure that data physically is in UTF-8, while by some accident a different encoding was specified.
Given that you've managed to load in data into the database by using --default-character-set=utf8 then there was something to do with your environment, I suggest it was not UTF8 setup.
I think the best practice today would be to:
have all your environments being UTF8 ready, including shells;
have all your databases defaulting to UTF8 encoding.
This way you'll have less field for errors.

Getting incorrectly encoded characters when retrieving values from MySQL DB

I'm encoding all my files in UTF-8 without BOM.
All characters appear correctly when displaying my page, yet values retrieved from my MySQL database such as Árbol render Ás and nont ANSI characters incorrectly (as a black romboid). I'm storing values in database with PHP My Admin manually.
What can be causing this issue with the database? Are there any changes I must do to make the database "utf8-ready"?
After you connect to the server, use the following command to view your current client characterset:
status;
To set it, use the following command:
set names utf8;

Transfer old 3.23.49 MySQL database to 5.0.51 MySQL database - Encoding in ANSI and UTF-8

I want to transfer a 3.23.49 MySQL database to a 5.0.51 MySQL database. Now I have exported the SQL file and I'm ready for import. I looked in the sql-file and Notepad++ shows me that the files is encoded in ANSI. I looked in the values and some of them are in ANSI and some of them are in UTF-8. What is the best way to proceed?
Should I change the encoding within Notepad++?
Should I use ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8;?
Should I use iconv?
Do I have to look through each table and make the necessary changes?
Whate are the settings for the import? MYSQL323 compatibility mode and encoding latin1?
Do I have to be aware of something if the php-scripts are using another encoding?
Thank you for your hints!
If the problem is to import a utf8-encoded mysql dump, the solution is usually to add --default-character-set=utf8 to mysql options:
mysql --default-character-set=utf8 -Ddbname -uuser -p < dump.sql
UPD1: In case the dump file is corrupted, I would try to export the database once again table by table so that the dump would result in a correct utf8 encoded file.
I have converted a MySQL 4.0 database (which also had no notion of character encoding yet) to MySQL 5.0 four years ago, so BTDT.
But first of all, there is no "ANSI" character encoding; that is a misconception and a misnomer that has caught on from the early versions of Windows (there are ANSI escape sequences, but they have nothing to do with character encoding). You are most certainly looking at Windows‑1252-encoded text. You should convert that text to UTF‑8 as then you have the best chance of keeping all used characters intact (UTF‑8 is a Unicode encoding, and Unicode contains all characters that can be encoded with Windows-125x, but at different code points).
I had used both the iconv and recode programs (on the Debian GNU/Linux system that the MySQL server ran on) to convert Windows‑1252-encoded text of a MySQL export (created by phpMyAdmin) to UTF‑8. Use whatever program or combination of programs works best for you.
As to your questions:
You can try, but it might not work. In particular, you might have trouble opening a large database dump with Notepad++ or another text editor.
Depends. ALTER TABLE … CONVERT TO … does more than just converting encodings.
See the paragraph above.
Yes. You should set the character encoding of every table and every text field that you are importing data into, to utf8 (use whatever utf8_… collation fits your purpose or data best). ALTER TABLE … CONVERT TO … does that. (But see 2.)
I don't think MYSQL323 matters here, as your export would contain only CREATE, INSERT and ALTER statements. But check the manual first (the "?" icon next to the setting in phpMyAdmin). latin1 means "Windows-1252" in MySQL 5.0, so that might work and you must skip the manual conversion of the import then.
I don't think so; PHP is not yet Unicode-aware. What matters is how the data is processed by the PHP script. Usually the Content-Type header field for your generated text resources using that data should end with ; charset=UTF-8.
On an additional note, you should not be using MySQL 5.0.x anymore. The current stable version is MySQL 5.5.18. "Per the MySQL Support Lifecycle policy, active support for MySQL 5.0 ended on December 31, 2009. MySQL 5.0 is now in the Extended support phase." MySQL 5.0.0 Alpha having been released on 2003-12-22, Extended Support is expected to end 8 full years after that, on 2011‑12‑31 (this year).

Replace text in MySql database

I have a MySql database on my server with a table named table_1. However I imported a csv file which occasionally included "café". However the "é" was not inserted into the database table, so I have been left with the text "caf". So what I would like to know is how can I replace the word "caf" in my database table with "cafe"?
Looks like an encoding issue to me - make sure you're using UTF-8 throughout your DB, and reimport your CSV.
If you've used the LOAD DATA command in MySQL, you can pass it a CHARACTER SET, which, when set to 'utf8' should allow you to import that file correctly.
This is a common problem of encoding. I sugest that you change your mysql database to utf-8 via GUI or with this information