Converting data from LATIN1 to UTF8 in MySQL

Converting data from LATIN1 to UTF8 in MySQL - mysql

I'm trying migrate a MSSQL database to MySQL. Using MySQL Workbench I moved the schema and data over but having problems converting the character encoding. During the migration I had the tool put text into BLOBS when there was problems with the encoding.
I believe I've confirmed that the data that is now in MySQL is *latin1_swedish_ci*. To simplify the problem I'm looking at ® symbols in one of the columns.
I wanted to convert the BLOBS to VARCHAR or TEXT with UTF8 encoding. I'm running this SQL command on one of the columns:
ALTER TABLEbookdetailsMODIFYBookNameVARCHAR(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Instead of converting the ® it is just removing them which is not what I want. What am I doing wrong? Not that reading half the internet trying to find a solution isn't fun but 3 days in and I think my eyes are about to give out.

MySQL workbench has a UI that is relatively simple to nav. If you need to change the collation of the tables or schemas, you can right click them on the Object Browser and go to alter table, or alter schema there you can change the data types, and set the collation to whatever you want.

Related

Strange character appearing in MySQL database field

I've imported some data from an old FileMaker program into a mysql database and I've noticed some strange characters. What is the best way to get rid of all these?

It seems the data you imported into your MySQL DB has a different character set (Collation) then what you set when you built the table in MySQL.
My guess would be those "strange characters" are an emoji. Which for example in utf8 charset, aren't defined. But utf8mb4 they would be.
The short answer is, find what charset the data was originally made in, and build your new data table with the same charset.
MySQL Workbench
The Charset can be manipulated easily in MySQL Workbench by right clicking the schema then table in question, click on the wrench.
SQL Syntax
Or via something similar to :
ALTER TABLE YourTableName CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE YourTableName modify name text charset utf8mb4;

I believe your varchar length are overflowing.
For example: field varchar(10) and you put "STRING OVER 10 CHARACTERS" this may cause trash on String.

How to convert mysql latin1 to utf8

I inherited a web system that I need to develop further.
The system seems to be created by someone who read two chapters of a PHP tutorial and thought he could code...
So... the webpage itself is in UTF8 and displays and inputs everything in it. The database tables have been created with UTF8 character set. But, in the config, there is "SET NAMES LATIN1". In other words, UTF8 encoded strings are populated into the database with forced latin1 coding.
Is there a way to convert this mess to actually store in utf8 and get rid of latin1?
I tried this, but since the database table is set to utf8, this does not work. Also tried this one without success.
I maybe able to do this by reading all tables in PHP with latin1 encoding then write them back to a new database in utf8, but I want to avoid it if possible.

I managed to solve it by running updates on text fields like this:
UPDATE table SET title = CONVERT(CONVERT(CONVERT(title USING latin1) USING binary) USING UTF8)

The situation isn't as bad as you think it is, unless you already have lots of non-Roman characters (that is, characters that aren't representable in Latin-1) in your database already. Latin-1 is a proper subset of utf8. Your web app works in utf8 and your tables' contents are in utf8 as well. So there's no need to convert the tables.
So, try changing the SET NAMES latin1 to SET NAMES utf8. It will probably solve your problem, by allowing your php program's connection to work with the same character set as the code on either end of the connection.
Read this. http://dev.mysql.com/doc/refman/5.7/en/charset-connection.html

Change column data type
to VARBINARY
and it's automatically convert the latin1 datas
Thank you guys.
I hope it's helpful.

Detailed instructions on converting a MYSQL DB and its data from latin to UTF-8. Too much diff info out there

Can you someone please provide the best way to convert not only a mysql database and all its tables from latin1_swedish_ci to UTF-8, with their contents? I have been researching all over Stackoverflow as well as elsewhere and the suggestions are always different.
Some people suggest just using these commands on the tables and databases:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Others say that this just changes the database and tables, but not the contents.
Some suggest dumping the db, create a new table with the right char set and collation, and importing the old db into that. Does this actually convert the data as well?
mysqldump --skip-opt --set-charset --skip-set-charset
Others suggest running iconv against the dumped DB before importing? Is this really needed or would the import into a UTF-8 db do the conversion?
Finally, other suggest altering the database, convert char/blog tables to binary, and the converting back.
There are so many different methods that it has become very confusing.
Can someone please provide a concise step-by-step instruction, or point me to one, on how I can go about convert my latin DBs and their content to UTF-8? Even better if there is a script that automates this process against a database.
Thanks in advance.

The are two different problems which are often conflated:
change the specification of a table or column on how it should store data internally
convert garbled mojibake data to its intended characters
Each text column in MySQL has an associated charset attribute, which specifies what encoding text stored in this column should be stored as internally. This only really influences what characters can be stored in this column and how efficient the data storage is. For example, if you're storing a ton of Japanese text, sjis as an encoding may be a lot more efficient than utf8 and save you a bit of disk space.
The column encoding does not in any way influence in what encoding data is input and output to/from the database. This is a separate setting, the connection encoding, which is established for every individual client every time you connect to the database. MySQL will convert data on the fly between the connection encoding and the column/table charset as needed. You can connect to the database with a utf8 connection, send it Japanese text destined for an sjis column, and MySQL will convert from utf8 to sjis on the fly (and back in reverse on the way out).
Now, if you've screwed up the connection encoding (as happens way too often) and you've inserted text in a different encoding than your connection encoding specified (e.g. your connection encoding was latin1 but you actually sent UTF-8 encoded data), then you're storing garbage in your database and you need to recover that. If that's your issue, see How to convert wrongly encoded data to UTF-8?.
However, if all your data is peachy and all you want to do is tell MySQL to store data in a different encoding from now on, you only need this:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
MySQL will convert the current data from its current charset to the new charset and store future data in the new charset. That's all.

Here is an example from the Moodle community:
https://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8
(Scroll down to "Explained".)
The author does first an SQL dump, which is a big SQL file. Then he copies the file. After, he makes coding corrections with sed on the copied file. Finally he imports the copied and corrected SQL dump file back into the database.
I can recommend this because with this single steps it is easy to inspect if they have been done right. If something goes wrong, just go back to the last step and try it another way.

Use the MySQL Workbench to handle this. http://dev.mysql.com/doc/workbench/en/index.html
Run the migration wizard to produce a script that will create the database schema.
Edit that script to alter the collation and character set (notepad++ search replace is just fine for this) and the shema name so you don't overwrite the existing database.
Run the script to create the copy under a new name.
Use the migration wizard to bulk transfer the data to the new schema. It will handle all the conversion for you and ensure that your data is still good.

The dreaded � character and displaying info from database in UTF8

So, I have a database and I use Navicat. We have a simple PHP website which is a few years old and we've upgraded the site to UTF8.
We have 'activities' on the site which handle UTF8 special characters perfectly, but we also have 'comments' on the site and curly single quotes and other special characters show me a �.
The database was converted to UTF via:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
When I look at both databases in Navicat, I can see both are UTF8 and utf8_general_ci.
When I design the table I can see the 'activities' table I can see the cell is a mediumText and is setup with UTF8. When I design the 'comments' section, the cell that isn't working is a Blob and it doesn't have any character encoding info.
We're doing a pretty basic SELECT and then displaying via $vairable[column].
Does anyone know why the 'activities' would work perfectly with UTF8 and the 'comments' would have issues? We're not doing anything super fancy to either of them.
I have tried converting the Blob to a text field, but when I do that the database then escapes it'self when it's outputting to the page, so as soon as there is a single quote in the text it cuts off.
I have tried things like utf8_encode, stripslashes, mysql_real_escape_string, htmlentities, htmlspecialchars, but I'm not sure any of them would help anyway.
Thanks!

blob means binary large object. Binary data does not have any encoding in raw.
So you have latin1 or whatever data in a blob, and you show it and treat it like utf-8 data.
You need to manually convert the data using PHP or whatever.
Here is a good article from the performanceblog that describes what you can do:
http://www.mysqlperformanceblog.com/2013/10/16/utf8-data-on-latin1-tables-converting-to-utf8-without-downtime-or-double-encoding/
If you have problems firing your queries, use the console instead of phpMyAdmin and don't forget the connection encoding through SET NAMES
master> ALTER TABLE t CONVERT TO CHARACTER SET utf8, CHANGE comment comment TEXT;
master> SET NAMES utf8;

Is this a safe way to convert MySQL tables from latin1 to utf-8?

I need to change all the tables in one of my databases from latin1 to utf-8 (with utf8_bin collation).
I have dumped the database, created a test database from it, and run the following without any errors or warnings for each table:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATION utf8_bin
Is it safe for me to repeat this on the real database? The data seems fine by inspection...

There are 3 different cases to consider:
The values are indeed encoded using Latin1
This is the consistent case: declared charset and content encoding match. This was the only case I covered in my initial answer.
Use the command you suggested:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATE utf8_bin
Note that the CONVERT TO CHARACTER SET command only appeared in MySQL 4.1.2, so anyone using a database installed before 2005 had to use an export/import trick. This is why there are so many legacy scripts and document on Internet doing it the old way.
The values are already encoded using utf8
In this case, you don't want mysql to convert any data, you only need to change the column's metadata.
For this, you have to change the type to BLOB first, then to TEXT utf8 for each column, so that there are no value conversions:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8
This is the recommended way, and it is explicitely documented in Alter Table Syntax Documentation.
The values use in a different encoding
The default encoding was Latin1 for several years on a some Linux distributions. In this case, you have to use a combination of the two techniques:
Fix the table meta-data, using the BLOB type trick
Convert the values using CONVERT TO.

A straightforward conversion will potentially break any strings with non-utf7 characters.
If you don't have any of those (i.e. all of your text is english), you'll usually be fine.
If you've any of those, however, you need to convert all char/varchar/text fields to blob in an initial run, and to convert them to utf8 in a subsequent run.
See this article for detailed procedures:
http://codex.wordpress.org/Converting_Database_Character_Sets

I've done this a few times on production databases in the past (converting from the old standard encoding swedish to latin1), and when MySQL encounters a character that cannot be translated to the target encoding, it aborts the conversion and remains in the unchanged state. Therefor, I'd deem the ALTER TABLE statement working.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008