I wish to save the IPA phonetic pronunciation against a word in a simple dictionary. I have a simple table where the IPA field is collated to UTF8_GENERAL_CI. I have a script that is running to extract the IPA from a source and write that into the table.
If I show the SQL prior to execution, the command looks fine. If I retrieve what IPA value after save, it is garbage. If I look at it using PHPMyAdmin, it is garbage. If I overwrite the value using PHPMyAdmin edit then it shows perfectly via PHPMyAdmin
UPDATE vocab_word SET phonetic='[ˈaːχənɐ]' WHERE id=4
What I see when retrieving it is '[ˈaËχənÉ]' and this is what I see within PHPMyAdmin.
So, I know I am getting it in the format I want, I know that if it is saved correctly, PHPMyAdmin can show it, but something is happening during the write to convert it.
I have tried :-
VARCHAR with UTF8_General_ci collation
VARCHAR with UTF8_bin
TEXT with UTF8_general_ci
TEXT with UTF8_bin
Within the SQL, I have tried simply using the value (as above) as well as using a UTF8_encode.
I am assuming there is a specific combination that I need to save the IPA text of encoding in the SQL and the database, and will continue to try combinations, however if someone can save me some time, I would really apprecaited it.
Regards
Chris H
It turned out to be a combination of things....
I needed to convert the character_set for the database to utf8mb4
I needed to convert the character_set for the table to utf8mb4
Collations needed to also be utf8mb4
plus the mysql connection needed to have it's character_set set.
Any one of these not done caused the characters to be mutated.
Related
Can you someone please provide the best way to convert not only a mysql database and all its tables from latin1_swedish_ci to UTF-8, with their contents? I have been researching all over Stackoverflow as well as elsewhere and the suggestions are always different.
Some people suggest just using these commands on the tables and databases:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Others say that this just changes the database and tables, but not the contents.
Some suggest dumping the db, create a new table with the right char set and collation, and importing the old db into that. Does this actually convert the data as well?
mysqldump --skip-opt --set-charset --skip-set-charset
Others suggest running iconv against the dumped DB before importing? Is this really needed or would the import into a UTF-8 db do the conversion?
Finally, other suggest altering the database, convert char/blog tables to binary, and the converting back.
There are so many different methods that it has become very confusing.
Can someone please provide a concise step-by-step instruction, or point me to one, on how I can go about convert my latin DBs and their content to UTF-8? Even better if there is a script that automates this process against a database.
Thanks in advance.
The are two different problems which are often conflated:
change the specification of a table or column on how it should store data internally
convert garbled mojibake data to its intended characters
Each text column in MySQL has an associated charset attribute, which specifies what encoding text stored in this column should be stored as internally. This only really influences what characters can be stored in this column and how efficient the data storage is. For example, if you're storing a ton of Japanese text, sjis as an encoding may be a lot more efficient than utf8 and save you a bit of disk space.
The column encoding does not in any way influence in what encoding data is input and output to/from the database. This is a separate setting, the connection encoding, which is established for every individual client every time you connect to the database. MySQL will convert data on the fly between the connection encoding and the column/table charset as needed. You can connect to the database with a utf8 connection, send it Japanese text destined for an sjis column, and MySQL will convert from utf8 to sjis on the fly (and back in reverse on the way out).
Now, if you've screwed up the connection encoding (as happens way too often) and you've inserted text in a different encoding than your connection encoding specified (e.g. your connection encoding was latin1 but you actually sent UTF-8 encoded data), then you're storing garbage in your database and you need to recover that. If that's your issue, see How to convert wrongly encoded data to UTF-8?.
However, if all your data is peachy and all you want to do is tell MySQL to store data in a different encoding from now on, you only need this:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
MySQL will convert the current data from its current charset to the new charset and store future data in the new charset. That's all.
Here is an example from the Moodle community:
https://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8
(Scroll down to "Explained".)
The author does first an SQL dump, which is a big SQL file. Then he copies the file. After, he makes coding corrections with sed on the copied file. Finally he imports the copied and corrected SQL dump file back into the database.
I can recommend this because with this single steps it is easy to inspect if they have been done right. If something goes wrong, just go back to the last step and try it another way.
Use the MySQL Workbench to handle this. http://dev.mysql.com/doc/workbench/en/index.html
Run the migration wizard to produce a script that will create the database schema.
Edit that script to alter the collation and character set (notepad++ search replace is just fine for this) and the shema name so you don't overwrite the existing database.
Run the script to create the copy under a new name.
Use the migration wizard to bulk transfer the data to the new schema. It will handle all the conversion for you and ensure that your data is still good.
I'm trying migrate a MSSQL database to MySQL. Using MySQL Workbench I moved the schema and data over but having problems converting the character encoding. During the migration I had the tool put text into BLOBS when there was problems with the encoding.
I believe I've confirmed that the data that is now in MySQL is *latin1_swedish_ci*. To simplify the problem I'm looking at ® symbols in one of the columns.
I wanted to convert the BLOBS to VARCHAR or TEXT with UTF8 encoding. I'm running this SQL command on one of the columns:
ALTER TABLEbookdetailsMODIFYBookNameVARCHAR(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Instead of converting the ® it is just removing them which is not what I want. What am I doing wrong? Not that reading half the internet trying to find a solution isn't fun but 3 days in and I think my eyes are about to give out.
MySQL workbench has a UI that is relatively simple to nav. If you need to change the collation of the tables or schemas, you can right click them on the Object Browser and go to alter table, or alter schema there you can change the data types, and set the collation to whatever you want.
I tried inserting Vietnamese characters into MySQL database through my java program. It is getting inserted but certain characters are being inserted as junk. And while trying to retrieve, i'm getting the same junk values in place of some characters. Can anyone tel me what should be done? Is there a problem in MySQL or is there any DB that supports these characters?
Example of ‘junk’, and code?
In general you need to make sure:
your tables are created with UTF-8 collation on all text columns. This can be done at several levels: config default-character-set=utf8, db CREATE DATABASE ... DEFAULT CHARACTER SET utf8, table CREATE TABLE ... DEFAULT CHARACTER SET utf8, and column column VARCHAR(255) CHARACTER SET utf8. After the initial creation you can only do it by ALTER on the columns; changing the default character sets won't change the column.
that your connection to the database is in UTF-8 encoding, by specifying useUnicode=true and characterEncoding=UTF-8 properties in your connection string or properties. Ensure you have an up-to-date MySQL Connector as there have been grievous bugs here in the past.
that nothing else in your processing stream is mangling the characters before they get to the database connection, or on the way back out. Ensure you aren't using the default encoding anywhere because it is probably wrong. Setting the flag -Dfile.encoding=UTF-8 may help with that as a temporary workaround, but you don't want to rely on it.
(And if part of your testing involves printing to the terminal, be aware that the Windows command prompt won't be able to do anything with UTF-8 so you will definitely see junk there.)
Hi there no problem to store vietnamese characters, but check mysql FAQ first:
http://dev.mysql.com/doc/refman/5.0/en/faqs-cjk.html
How can I stop mysql from converting ' into ’ when I do an insert?
i believe it has something to do with charset or something?
I am using php to do the mysql_insert.
The single quotation mark you posted is called an 'acute accent', which is often converted from the generic single quotation mark by some web applications. It's a UTF8 character, which when inserted into a Latin-1 database translates to '’'. This means that you need to change MySQL's charset to UTF8, or alternatively change your website's charset to Latin-1. The former would be preferred:
ALTER DATABASE YourDatabase CHARACTER SET utf8;
ALTER TABLE YourTableOne CONVERT TO CHARACTER SET utf8;
ALTER TABLE YourTableTwo CONVERT TO CHARACTER SET utf8;
...
ALTER TABLE YourTableN CONVERT TO CHARACTER SET utf8;
Maybe someone will know the answer immediately, but I don't. However here are a few suggestions on what to examine (and possibly expand the question on)
When dealing with encodings and escaping you should include the full history of data
how was it created
what happened to it before the problem (did it have to go through backup, e-mail, was it created on a different server, OS, etc..; if it was transferred then was it as text file?)
The above is because anything that writes to a text file (browser, mysql client, web server, php application, to name a few layers that could have done it) can mess up character coding.
To troubleshoot, you can start eliminating, and thus the first step (in my book), is to
connect to mysql server using mysql command line client.
check the output of SHOW VARIABLES LIKE 'character_set%'
(so even in this simple environment you have 7 values that can influence how the data is parsed, stored and/or displayed
inspect SHOW CREATE TABLE TableName, and look for charset and collation info, both default for the table and explicit definition on columns
Having said all of the above, I don't think any western script would transcode a single quote character. So you might need to look at your escaping and other data processing.
EDIT
Most of the above from answer and discussion here
This is what I've done, and it worked for me:
First make sure that column containing ' is utf8_general_ci
Then add the mysql_set_charset to your code
$db=mysql_connect("localhost", $your_username, $your_password);
mysql_set_charset('utf8',$db);
mysql_select_db($your_db_name, $db);
I'm using google translate with my website to translate short, frequently used phrases. Instead of asking google for a translation every time, I thought of caching the translations in a MySQL table.
Anyway, it works fine for latin characters, but fails for others like asian. What collation/charset would be the best to use?
Also - I've tried the default (latin1_swedish_ci) and utf8_unicode_ci
One of those should do the trick:
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
Also, as seen in the MySQL documentation:
Client applications that need to
communicate with the server using
Unicode should set the client
character set accordingly; for
example, by issuing a SET NAMES 'utf8'
statement.
So, if you select the utf8_unicode_ci encoding, you will need to execute a SET NAMES 'utf8' query for every connection to your database (run it after a mysql_select_db() or whatever you're using).
Collation has nothing to do with international characters. Charset does.
Usual solution is utf8.
Dunno what do you mean "I've tried utf8_unicode_ci", but at least you have to tell database, what charset your data is. SET NAMES utf8 query can do that, if your data from google uses that charset