How can I stop MySQL from converting ’ into â€™ when I do an insert?
I believe it has something to do with the charset?
I am using PHP to do the MySQL insert.
The single quotation mark you posted is a typographic (curly) quote, U+2019, which some web applications substitute for the generic single quotation mark. It's a non-ASCII character that takes three bytes in UTF-8, and when those bytes are read as Latin-1 they show up as 'â€™'. This means that you need to change MySQL's charset to UTF-8, or alternatively change your website's charset to Latin-1. The former would be preferred:
ALTER DATABASE YourDatabase CHARACTER SET utf8;
ALTER TABLE YourTableOne CONVERT TO CHARACTER SET utf8;
ALTER TABLE YourTableTwo CONVERT TO CHARACTER SET utf8;
...
ALTER TABLE YourTableN CONVERT TO CHARACTER SET utf8;
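To see where the garbled sequence comes from, here is a minimal Python sketch (Python is used only for illustration, it is not part of the PHP setup in the question). It assumes the character is U+2019 and that the Latin-1 side behaves like Windows-1252, which is what MySQL's latin1 is based on.

```python
# The curly apostrophe as it leaves a UTF-8 web page.
curly_quote = "\u2019"  # ’

# UTF-8 turns it into three bytes.
utf8_bytes = curly_quote.encode("utf-8")

# Reading those same three bytes as cp1252 (standing in for MySQL's latin1)
# yields three separate characters instead of one.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex())  # e28099
print(misread)           # â€™
```

Nothing is "converted" in the usual sense: the bytes stay the same and only the interpretation changes, which is why making both sides agree on UTF-8 fixes it.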
Maybe someone will know the answer immediately, but I don't. However, here are a few suggestions on what to examine (and possibly expand the question on).
When dealing with encodings and escaping you should include the full history of the data:
how was it created
what happened to it before the problem (did it have to go through backup or e-mail, was it created on a different server or OS, etc.; if it was transferred, was it as a text file?)
The above is because anything that writes to a text file (browser, mysql client, web server, php application, to name a few layers that could have done it) can mess up character coding.
To troubleshoot, you can start eliminating layers, so the first step (in my book) is to:
connect to mysql server using mysql command line client.
check the output of SHOW VARIABLES LIKE 'character_set%'
(so even in this simple environment you have 7 values that can influence how the data is parsed, stored and/or displayed)
inspect SHOW CREATE TABLE TableName, and look for charset and collation info, both default for the table and explicit definition on columns
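As a rough aid for reading that SHOW VARIABLES output, here is a small Python sketch that compares the connection-side settings with the storage-side defaults. The dict values are made up for illustration, and the grouping into "connection" and "storage" keys is my own simplification. A mismatch is not an error in itself, since MySQL converts between the two, but it shows where a conversion takes place.

```python
# Hypothetical output of SHOW VARIABLES LIKE 'character_set%', copied by hand
# into a dict. The values below are invented for this example.
variables = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_database": "utf8",
    "character_set_filesystem": "binary",
    "character_set_results": "latin1",
    "character_set_server": "utf8",
    "character_set_system": "utf8",
}

# Settings that describe the text travelling between client and server.
CONNECTION_KEYS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)
# Settings that describe the defaults used for stored data.
STORAGE_KEYS = ("character_set_database", "character_set_server")


def charset_mismatches(settings):
    """Return (connection_key, storage_key) pairs whose charsets differ."""
    return [
        (conn, store)
        for conn in CONNECTION_KEYS
        for store in STORAGE_KEYS
        if settings[conn] != settings[store]
    ]


for conn_key, store_key in charset_mismatches(variables):
    print(f"{conn_key}={variables[conn_key]} differs from "
          f"{store_key}={variables[store_key]}")
```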
Having said all of the above, I don't think any western script would transcode a single quote character. So you might need to look at your escaping and other data processing.
EDIT
Most of the above is from the answer and discussion here
This is what I've done, and it worked for me:
First make sure that the column containing the ' uses utf8_general_ci
Then add mysql_set_charset to your code:
// Connect first, then set the connection charset before running any queries.
$db = mysql_connect("localhost", $your_username, $your_password);
mysql_set_charset('utf8', $db);
mysql_select_db($your_db_name, $db);
Related
I need to implement a sorted SELECT, on a specific encoding of a field, without CONVERT.
That is, normally I'd do it by
SELECT * FROM table ORDER BY CONVERT(field USING gbk) COLLATE gbk_chinese_ci
However for some reason CONVERT was not allowed. As a result, I tried to approach this by
ALTER TABLE table MODIFY field VARCHAR(xx) CHARACTER SET gbk COLLATE gbk_chinese_ci;
SELECT * FROM table ORDER BY field
It works. That's good. However I'm worried about encoding problems.
The connection to the MySQL server includes the parameters characterEncoding=utf8 and useUnicode=true. I couldn't find an explanation of these params in MySQL's official documentation yet, but I suppose they ensure that the communication between the client and the server is in utf-8.
That raises the question: does the MySQL server implicitly convert data from utf-8 to gbk when it receives it? Do these params only define the charset of the communication rather than that of the data as finally stored?
Edit
Comments say that the server does convert them! Thanks guys!
My further confusion is that, only one of the fields is set to use gbk, while everything else has been left to use utf8. That means the server's charset should still be utf8 globally but gbk locally for that field only.
Suppose now I fire this line of script to the server
INSERT INTO table (field_gbk, field_utf8) VALUES ("a", "b");
Does the server:
Receive the whole statement in utf8;
Convert only "a" to gbk and store it; and
Store "b" as-is in the database?
Many thanks guys!
Yes.
You specify the encoding of the client when you connect.
You specify the encoding ("Character set") of the column you are Inserting into.
MySQL converts from one encoding to the other as it INSERTs the rows. Similarly, it converts the other way when SELECTing.
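The three statements above can be modelled in a few lines of Python. This is only a sketch of the idea, not of MySQL internals: the column names come from the question, the sample character is my own choice, and Python's gbk and utf-8 codecs stand in for the MySQL character sets.

```python
# Each column's character set, mirroring the example table in the question.
COLUMN_CHARSETS = {"field_gbk": "gbk", "field_utf8": "utf-8"}


def store_row(raw_values, connection_charset="utf-8"):
    """Model the INSERT: decode bytes from the connection, re-encode per column."""
    stored = {}
    for column, raw in raw_values.items():
        text = raw.decode(connection_charset)  # what the client meant
        stored[column] = text.encode(COLUMN_CHARSETS[column])  # bytes kept in the column
    return stored


def read_row(stored, connection_charset="utf-8"):
    """Model the SELECT: decode per column, re-encode for the connection."""
    return {
        column: data.decode(COLUMN_CHARSETS[column]).encode(connection_charset)
        for column, data in stored.items()
    }


# The client sends the same character to both columns, encoded as UTF-8.
sent = {"field_gbk": "中".encode("utf-8"), "field_utf8": "中".encode("utf-8")}
stored = store_row(sent)

print(stored["field_gbk"].hex())   # d6d0
print(stored["field_utf8"].hex())  # e4b8ad
print(read_row(stored) == sent)    # True
```

The stored bytes differ per column, but the client gets back exactly what it sent, because the conversion runs in both directions.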
The CONVERT function should not (normally) be used for anything.
You are using Java? characterEncoding=utf8 and useUnicode=true is what it uses for declaring the client side.
"gbk" for a single column? Fine. That column will be handled differently than other columns.
Can someone please provide the best way to convert a MySQL database and all its tables, together with their contents, from latin1_swedish_ci to UTF-8? I have been researching all over Stack Overflow as well as elsewhere and the suggestions are always different.
Some people suggest just using these commands on the tables and databases:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Others say that this just changes the database and tables, but not the contents.
Some suggest dumping the db, creating a new table with the right char set and collation, and importing the old db into that. Does this actually convert the data as well?
mysqldump --skip-opt --set-charset --skip-set-charset
Others suggest running iconv against the dumped DB before importing. Is this really needed, or would the import into a UTF-8 db do the conversion?
Finally, others suggest altering the database, converting char/blob columns to binary, and then converting back.
There are so many different methods that it has become very confusing.
Can someone please provide concise step-by-step instructions, or point me to some, on how I can go about converting my latin1 DBs and their content to UTF-8? Even better if there is a script that automates this process against a database.
Thanks in advance.
There are two different problems which are often conflated:
change the specification of a table or column on how it should store data internally
convert garbled mojibake data to its intended characters
Each text column in MySQL has an associated charset attribute, which specifies what encoding text stored in this column should be stored as internally. This only really influences what characters can be stored in this column and how efficient the data storage is. For example, if you're storing a ton of Japanese text, sjis as an encoding may be a lot more efficient than utf8 and save you a bit of disk space.
The column encoding does not in any way influence in what encoding data is input and output to/from the database. This is a separate setting, the connection encoding, which is established for every individual client every time you connect to the database. MySQL will convert data on the fly between the connection encoding and the column/table charset as needed. You can connect to the database with a utf8 connection, send it Japanese text destined for an sjis column, and MySQL will convert from utf8 to sjis on the fly (and back in reverse on the way out).
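A quick way to get a feel for the storage difference is to compare byte counts in Python. This is only an illustration: the sample text is mine, and Python's shift_jis codec is assumed to be close enough to MySQL's sjis for this purpose.

```python
text = "日本語のテキスト"  # sample Japanese text, 8 characters

as_utf8 = text.encode("utf-8")      # 3 bytes per character here
as_sjis = text.encode("shift_jis")  # 2 bytes per character here

print(len(text), len(as_utf8), len(as_sjis))  # 8 24 16

# Converting between the two encodings loses nothing for this text.
round_trip = as_utf8.decode("utf-8").encode("shift_jis").decode("shift_jis")
print(round_trip == text)  # True
```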
Now, if you've screwed up the connection encoding (as happens way too often) and you've inserted text in a different encoding than your connection encoding specified (e.g. your connection encoding was latin1 but you actually sent UTF-8 encoded data), then you're storing garbage in your database and you need to recover that. If that's your issue, see How to convert wrongly encoded data to UTF-8?.
However, if all your data is peachy and all you want to do is tell MySQL to store data in a different encoding from now on, you only need this:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
MySQL will convert the current data from its current charset to the new charset and store future data in the new charset. That's all.
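Since the question asked about automating this, here is a hedged Python sketch that only builds the statements as strings. It does not connect to a database. The function name and the second table name are hypothetical, and in practice the table list would come from something like SHOW TABLES. Review the output before running it against anything.

```python
def build_conversion_statements(database, tables,
                                charset="utf8", collation="utf8_unicode_ci"):
    """Return the ALTER statements for one database and a list of its tables."""
    statements = [
        f"ALTER DATABASE `{database}` CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    return statements


# Placeholder names; substitute your own database and tables.
for statement in build_conversion_statements("databasename",
                                             ["tablename", "othertable"]):
    print(statement)
```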
Here is an example from the Moodle community:
https://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8
(Scroll down to "Explained".)
The author first makes an SQL dump, which is one big SQL file. Then he copies the file. After that, he corrects the encoding declarations with sed on the copied file. Finally he imports the copied and corrected SQL dump file back into the database.
I can recommend this because with these single steps it is easy to inspect whether each one has been done right. If something goes wrong, just go back to the last step and try it another way.
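The "correct the copy" step could look roughly like this in Python. This is my own approximation of the idea, not the sed command from the Moodle page, and the sample dump is invented. It only rewrites the charset declarations in the dump text, so whether the data bytes in the file are valid UTF-8 still has to be checked separately.

```python
import re


def rewrite_dump_charset(dump_sql, old="latin1", new="utf8"):
    """Replace charset declarations in the text of an SQL dump."""
    # Matches "CHARSET=latin1" and "CHARACTER SET latin1" as whole words only,
    # so collation names such as latin1_swedish_ci are left untouched.
    pattern = re.compile(rf"\b(CHARSET=|CHARACTER SET ){re.escape(old)}\b")
    return pattern.sub(lambda match: match.group(1) + new, dump_sql)


# A tiny made-up fragment of a dump file.
sample_dump = (
    "CREATE TABLE `notes` (\n"
    "  `body` text CHARACTER SET latin1\n"
    ") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n"
)

print(rewrite_dump_charset(sample_dump))
```

Working on a copy of the dump, as the Moodle author does, means the original file is still there if the result turns out wrong.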
Use the MySQL Workbench to handle this. http://dev.mysql.com/doc/workbench/en/index.html
Run the migration wizard to produce a script that will create the database schema.
Edit that script to alter the collation and character set (Notepad++ search and replace is just fine for this) and the schema name, so you don't overwrite the existing database.
Run the script to create the copy under a new name.
Use the migration wizard to bulk transfer the data to the new schema. It will handle all the conversion for you and ensure that your data is still good.
We are importing data from .sql script containing UTF-8 encoded data to MySQL database:
mysql ... database_name < script.sql
Later this data is displayed on a page in our web application (connected to that database), again in UTF-8. But somewhere in the process something went wrong, because non-ASCII characters were displayed incorrectly.
Our first attempt to solve it was to change mysql columns encoding to UTF-8 (as described for example here):
alter table wp_posts change post_content post_content LONGBLOB;
alter table wp_posts change post_content post_content LONGTEXT CHARACTER SET utf8;
But it didn't help.
Finally we solved this problem by importing the data from the .sql script with an additional command line flag which, I believe, forced the mysql client to treat the data in the .sql script as UTF-8:
mysql ... --default-character-set=utf8 database_name < script.sql
It helped, but then we realized that this time we had forgotten to change the column encoding to utf8: it was set to latin1 even though UTF-8 encoded data was flowing through the database (from the sql script to the application).
So if data obtained from database is displayed correctly even if database character set is set incorrectly, then why the heck should I bother setting correct database encoding?
Especially I would like to know:
What parts of the database rely on the column encoding setting? When does this setting have any real meaning?
On what occasions is an implicit conversion of the column encoding done?
How does the trick of converting a column to binary format and then to the destination encoding work (see the SQL snippet above)? I still don't get it.
Hope someone can help me clear things up...
The biggest reason, in my view, is that it breaks your DB consistency.
it happens way too often that you need to check data in the database. And if you cannot properly input UTF-8 strings coming from the web page into your MySQL CLI client, it's a pity;
if you need to use phpMyAdmin to administer your database through the “correct” web, then you're limiting yourself (might not be an issue though);
if you need to build a report on your data, then you're greatly limited in your choice of tools, given that only the web is producing the correct output;
if you need to deliver a partial database extract to your partner or external company for analysis, and extract is messed up — it's a pity.
Now to your questions:
When you ask the database to ORDER BY some column of string data type, the sorting rules take into account the encoding of your column, as some internal transformations are applied in case you have different encodings for different columns. The same applies if you're trying to compare strings: encoding information is essential here. Encoding comes together with collation, although most people don't use this feature so often.
As mentioned, if you have any set of columns in different encodings, database will choose to implicitly convert values to a common encoding, which is UTF8 nowadays. Strings' implicit encoding might be done in the client frameworks/libraries, depending on the client's environment encoding. Typically data is recoded into the database's encoding when sent to the server and back into client's encoding when results are delivered.
Binary data has no notion of encoding, it's just a set of bytes. So when you convert to binary, you're telling database to “forget” encoding, although you keep data without changes. Later, you convert to the string enforcing the right encoding. This trick helps if you're sure that data physically is in UTF-8, while by some accident a different encoding was specified.
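The difference between converting directly and going through binary can be shown with plain byte handling in Python. This is a sketch under two assumptions: the stored bytes really are UTF-8, and cp1252 stands in for MySQL's latin1. The sample character is my own.

```python
# Bytes that are physically UTF-8 but sit in a column labelled latin1.
stored_bytes = "é".encode("utf-8")  # b'\xc3\xa9'

# What the latin1 label makes of them.
as_labelled = stored_bytes.decode("cp1252")
print(as_labelled)  # Ã©

# A direct latin1 -> utf8 conversion transcodes the misread text,
# so the damage becomes permanent.
direct_conversion = as_labelled.encode("utf-8")
print(direct_conversion.hex())  # c383c2a9

# Going through binary keeps the bytes and only swaps the label.
via_binary = stored_bytes                # "forget" the encoding
relabelled = via_binary.decode("utf-8")  # attach the right one
print(relabelled)  # é
```

If the bytes in the column are not actually UTF-8, relabelling them this way will produce invalid data, so it is worth checking a few rows first.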
Given that you managed to load the data into the database by using --default-character-set=utf8, the problem had something to do with your environment; I suspect it was not set up for UTF8.
I think the best practice today would be to:
have all your environments UTF8-ready, including shells;
have all your databases defaulting to UTF8 encoding.
This way you'll have less room for errors.
I tried inserting Vietnamese characters into a MySQL database through my Java program. They are getting inserted, but certain characters are being inserted as junk. And while trying to retrieve them, I'm getting the same junk values in place of some characters. Can anyone tell me what should be done? Is there a problem in MySQL, or is there any DB that supports these characters?
Example of ‘junk’, and code?
In general you need to make sure:
your tables are created with a UTF-8 character set on all text columns. This can be done at several levels: config default-character-set=utf8, db CREATE DATABASE ... DEFAULT CHARACTER SET utf8, table CREATE TABLE ... DEFAULT CHARACTER SET utf8, and column column VARCHAR(255) CHARACTER SET utf8. After the initial creation you can only do it by ALTER on the columns; changing the default character sets won't change existing columns.
that your connection to the database is in UTF-8 encoding, by specifying useUnicode=true and characterEncoding=UTF-8 properties in your connection string or properties. Ensure you have an up-to-date MySQL Connector as there have been grievous bugs here in the past.
that nothing else in your processing stream is mangling the characters before they get to the database connection, or on the way back out. Ensure you aren't using the default encoding anywhere because it is probably wrong. Setting the flag -Dfile.encoding=UTF-8 may help with that as a temporary workaround, but you don't want to rely on it.
(And if part of your testing involves printing to the terminal, be aware that the Windows command prompt won't be able to do anything with UTF-8 so you will definitely see junk there.)
Hi there, there is no problem storing Vietnamese characters, but check the MySQL FAQ first:
http://dev.mysql.com/doc/refman/5.0/en/faqs-cjk.html
I'm using google translate with my website to translate short, frequently used phrases. Instead of asking google for a translation every time, I thought of caching the translations in a MySQL table.
Anyway, it works fine for Latin characters, but fails for others, like Asian ones. What collation/charset would be the best to use?
Also - I've tried the default (latin1_swedish_ci) and utf8_unicode_ci
One of those should do the trick:
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
Also, as seen in the MySQL documentation:
Client applications that need to communicate with the server using Unicode should set the client character set accordingly; for example, by issuing a SET NAMES 'utf8' statement.
So, if you select the utf8_unicode_ci collation, you will need to execute a SET NAMES 'utf8' query for every connection to your database (run it after a mysql_select_db() or whatever you're using).
Collation has nothing to do with international characters. Charset does.
The usual solution is utf8.
I don't know what you mean by "I've tried utf8_unicode_ci", but at the very least you have to tell the database what charset your data is in. A SET NAMES utf8 query can do that, if your data from Google uses that charset.
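To illustrate why the charset, and not the collation, decides whether Asian text survives, here is a small Python sketch. The function name and sample phrases are made up, cp1252 stands in for MySQL's latin1, and replacing unsupported characters with '?' is meant to mimic what MySQL typically does in non-strict mode.

```python
def store_in_charset(text, charset):
    """Encode text for a column, replacing unsupported characters with '?'."""
    return text.encode(charset, errors="replace")


phrase = "こんにちは"  # a short non-Latin phrase, 5 characters

# A latin1-like charset has no code points for these characters.
print(store_in_charset(phrase, "cp1252"))  # b'?????'

# UTF-8 can represent them, so the text round-trips unchanged.
print(store_in_charset(phrase, "utf-8").decode("utf-8") == phrase)  # True

# Western European text fits in the latin1-like charset without loss.
print(store_in_charset("café", "cp1252"))  # b'caf\xe9'
```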