I have a huge database of book authors in which the names of French authors have not been stored correctly: the French characters have been replaced by strange characters!
Can I solve the problem with a SQL query? If so, I would appreciate a clue.
Thanks,
1. Export your table data with mysqldump.
2. In the dump file, change the character set in the CREATE TABLE statement to utf8.
3. Drop the table, or rename it to something like tablename_old (I recommend keeping the old table until after the experiment ;)).
4. Import the modified dump file.
Since French characters are all representable in UTF-8, and your table probably doesn't use a multi-byte character set, this should fix the issue.
You might be able to just run an ALTER TABLE to change the encoding, but in my experience that can be a roll of the dice.
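Here is a minimal sketch of that sequence, assuming a database named books and a table named authors (both placeholder names):
# 1. export the table
mysqldump -u root -p books authors > authors_dump.sql
# 2. edit authors_dump.sql so the CREATE TABLE statement says
#    DEFAULT CHARSET=utf8 instead of the old charset (e.g. latin1)
# 3. keep the original table around until you have verified the result
mysql -u root -p books -e "RENAME TABLE authors TO authors_old;"
# 4. import the modified dump, which recreates the table as utf8
mysql -u root -p books < authors_dump.sql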
Can someone please provide the best way to convert not only a MySQL database but also all its tables and their contents from latin1_swedish_ci to UTF-8? I have been researching all over Stack Overflow as well as elsewhere, and the suggestions are always different.
Some people suggest just using these commands on the tables and databases:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Others say that this just changes the database and tables, but not the contents.
Some suggest dumping the DB, creating a new table with the right character set and collation, and importing the old DB into that. Does this actually convert the data as well?
mysqldump --skip-opt --set-charset --skip-set-charset
Others suggest running iconv against the dumped DB before importing. Is this really needed, or would the import into a UTF-8 DB do the conversion?
Finally, others suggest altering the database, converting char/blob columns to binary, and then converting back.
There are so many different methods that it has become very confusing.
Can someone please provide concise step-by-step instructions, or point me to some, on how I can go about converting my latin1 DBs and their content to UTF-8? Even better if there is a script that automates this process against a database.
Thanks in advance.
There are two different problems which are often conflated:
changing the specification of how a table or column should store data internally
converting garbled mojibake data back to its intended characters
Each text column in MySQL has an associated charset attribute, which specifies the encoding in which text in this column is stored internally. This only really influences which characters can be stored in the column and how efficient the storage is. For example, if you're storing a ton of Japanese text, sjis as an encoding may be a lot more efficient than utf8 and save you a bit of disk space.
The column encoding does not in any way influence in what encoding data is input and output to/from the database. This is a separate setting, the connection encoding, which is established for every individual client every time you connect to the database. MySQL will convert data on the fly between the connection encoding and the column/table charset as needed. You can connect to the database with a utf8 connection, send it Japanese text destined for an sjis column, and MySQL will convert from utf8 to sjis on the fly (and back in reverse on the way out).
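To illustrate that separation (the table, column, and sample text below are made up):
-- the column stores its text internally as sjis ...
CREATE TABLE jp_notes (body TEXT CHARACTER SET sjis);
-- ... but this particular client talks to the server in utf8
SET NAMES utf8;
-- MySQL converts the utf8 bytes to sjis on the way in ...
INSERT INTO jp_notes (body) VALUES ('日本語');
-- ... and back to utf8 on the way out
SELECT body FROM jp_notes;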
Now, if you've screwed up the connection encoding (as happens way too often) and you've inserted text in a different encoding than your connection encoding specified (e.g. your connection encoding was latin1 but you actually sent UTF-8 encoded data), then you're storing garbage in your database and you need to recover that. If that's your issue, see How to convert wrongly encoded data to UTF-8?.
However, if all your data is peachy and all you want to do is tell MySQL to store data in a different encoding from now on, you only need this:
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
MySQL will convert the current data from its current charset to the new charset and store future data in the new charset. That's all.
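You can check the effect with SHOW CREATE TABLE before and after (tablename is a placeholder):
SHOW CREATE TABLE tablename; -- ... DEFAULT CHARSET=latin1
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
SHOW CREATE TABLE tablename; -- ... DEFAULT CHARSET=utf8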
Here is an example from the Moodle community:
https://docs.moodle.org/23/en/Converting_your_MySQL_database_to_UTF8
(Scroll down to "Explained".)
The author first makes an SQL dump, which is one big SQL file, and copies the file. He then makes the encoding corrections with sed on the copy. Finally, he imports the copied and corrected SQL dump file back into the database.
I can recommend this because with these simple steps it is easy to check whether each of them was done right. If something goes wrong, just go back to the last step and try it another way.
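A minimal sketch of those steps, assuming the dump declares latin1 (file and database names are placeholders; check what charset your dump actually declares before substituting):
mysqldump -u root -p mydb > dump.sql
cp dump.sql dump_fixed.sql
# correct the charset declarations in the copy only
sed -i 's/CHARSET=latin1/CHARSET=utf8/g' dump_fixed.sql
mysql -u root -p mydb < dump_fixed.sql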
Use MySQL Workbench to handle this: http://dev.mysql.com/doc/workbench/en/index.html
1. Run the migration wizard to produce a script that will create the database schema.
2. Edit that script to alter the collation, the character set, and the schema name so you don't overwrite the existing database (Notepad++ search-and-replace is just fine for this; see the sketch after these steps).
3. Run the script to create the copy under the new name.
4. Use the migration wizard to bulk-transfer the data to the new schema. It will handle all the conversion for you and ensure that your data is still good.
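The edit in step 2 might look like this (the schema name is hypothetical):
-- before: CREATE SCHEMA `mydb` DEFAULT CHARACTER SET latin1;
-- after:
CREATE SCHEMA `mydb_utf8` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;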
I am inserting some data, including Turkish characters, into a table, and the appearance of the Turkish characters confuses me. For example, the character 'ğ' appears in the database as 'ð'. But when I fetch the data and print it in the browser for testing, the character shows up correctly as 'ğ'. Do I have to worry about this situation? (Btw, I am using utf8mb4 as described in the title.)
If you are using phpMyAdmin to look at database entries, this is a common misconception.
You shouldn't worry, and you can configure it from php.ini and phpMyAdmin's settings. But, as you said, it is not much of a problem. I recommend using Toad or Navicat for MySQL for better results.
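To confirm that the stored data is fine, you can also compare the connection and column charsets on the server itself (a quick sanity check, not a required step):
SHOW VARIABLES LIKE 'character_set%';
-- if the client/connection values disagree with your utf8mb4 columns,
-- align them for the current session:
SET NAMES utf8mb4;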
Over the years the databases I use for my research have been migrated from SyBase to MySQL to PostgreSQL back to MySQL.
It was done very carefully so that data wasn't broken by various encoding issues, but unfortunately a bunch of records did get damaged.
For example, one of the records says Jòzefina, but it should be Józefina.
Does anyone know if I could fix this particular encoding problem programmatically?
I'm not that strong in encodings, but it looks like I could somehow map the byte sequence ò to ó, and so on.
I wonder if anyone knows in which encoding ò corresponds to ó, so that I don't have to create the mapping table from broken text to correct text manually, but can instead do it automatically.
Change collation to unicode. Do this:
ALTER TABLE `t1` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
Your table is using Windows-1252. As you can see:
Dec: 242
Hex: F2
UTF-8: ò
Windows-1252: ò
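If the damage is a single consistent mis-conversion, the usual in-place fix is to reinterpret the stored bytes. This is only a sketch (authors and name are placeholder names), it assumes UTF-8 bytes were written over a latin1 connection into a utf8 column, and you should test it on a copy of the table first:
UPDATE authors
SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8);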
I have a database that contains data in different languages. Some languages use accents (like áéíóú), and I need to search this data as if the accents didn't exist (a search for 'campeon' should return 'campeón' as a valid result).
The problem is that the tables in my database (utf8_unicode_ci) are not storing UTF-8 characters. If you look at the data through phpMyAdmin, the words with accents look like this: campeón
After some research, I found (in a Stack Overflow question) that the problem is related to a missing SET NAMES [charset]. In fact, I've done some testing, and if I set names to utf8, everything works as expected.
Well, I have the solution, so what's the problem? The problem is that the database is in production, so there are thousands of strings in it. If I change the character set the client uses, all the already existing strings will become invalid. The question is: is there any way to:
perform accent-insensitive searches in a database that uses a wrong charset like mine?
transform safely the data in the tables to the appropriate charset?
continue working with mixed charsets (latin1 and utf8) in the database, assuming that latin1 data will not be accent-insensitive?
If anybody has experience with any of the solutions I propose, or has a new one, I'll be very thankful if you share it.
The problem being that the data was inserted using the wrong connection encoding, you can fix it by
Exporting the data using the wrong connection encoding, just like you have used it thus far, followed by
Importing the data using the correct utf8 connection encoding.
That will fix the encoding problem, after which search will work as expected. A sketch of the export/import pair follows.
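A minimal sketch, assuming the wrong connection encoding was latin1 (mydb is a placeholder, and the sed step depends on what your dump actually declares):
# export through the same wrong (latin1) connection encoding the app used
mysqldump --default-character-set=latin1 --skip-set-charset -u root -p mydb > dump.sql
# make the dumped table definitions say utf8 (inspect the file first)
sed -i 's/CHARSET=latin1/CHARSET=utf8/g' dump.sql
# import through a correct utf8 connection
mysql --default-character-set=utf8 -u root -p mydb < dump.sql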
What if you create a copy of the table at the beginning of your session, alter the copy's charset, run all your queries against the copy, and then drop it at the end of your session? I don't know how practical this would be; it depends on how often you need to perform these queries and how big the table is. Something like the sketch below.
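A hedged sketch of that idea (table names are placeholders):
CREATE TABLE authors_session LIKE authors;
ALTER TABLE authors_session CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
INSERT INTO authors_session SELECT * FROM authors;
-- ... run the accent-insensitive searches against authors_session ...
DROP TABLE authors_session;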
I exported and then imported a WordPress MySQL database from one server to another.
On the new server, a lot of the apostrophes have turned into question mark symbols (?). When I look at the data in the database itself, the apostrophes look normal ('), so what would be causing those characters to look messed up?
Thanks
Perhaps run SHOW CREATE TABLE tablename on both servers. The difference might be related to the CHARSET.
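For example (wp_posts is just an example WordPress table name):
-- run on both servers and compare the DEFAULT CHARSET clause at the end
SHOW CREATE TABLE wp_posts;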
You should use the corresponding collation in the select, for example
SELECT k
FROM t1
ORDER BY k COLLATE latin1_german2_ci;
One common collation name could be SQL_Latin1_General_Cp1254_CS_AS.
Here is a list of collation names: http://msdn.microsoft.com/en-us/library/ms180175.aspx