Converting latin1 MySQL data to utf8 - mysql

I want to switch to UTF-8 now, but all my data is latin1. What is an efficient way to convert the data? I already know how to change the database structure (charset) to utf8; what I want to do is convert the charset of the existing data.
Update:
Here are my old settings:
HTML output: utf8
HTML input: utf8
PHP - MySQL connection: latin1
MySQL (fields and tables): latin1
Here are my new settings, and I hope this is the best way to build a multi-language website:
HTML output: utf8
HTML input: utf8
PHP - MySQL connection: utf8
MySQL (fields and tables): utf8

If you apply utf8_encode() to an already-UTF-8 string it will return garbled UTF-8 output.
I made a function that addresses all these issues. It's called forceUTF8().
You don't need to know what the encoding of your strings is. It can be Latin1 (ISO-8859-1) or UTF-8, or the string can have a mix of the two. forceUTF8() will convert everything to UTF-8.
I did it because a service was giving me a data feed that was all messed up, mixing UTF-8 and Latin1 in the same string.
Usage:
$utf8_string = forceUTF8($utf8_or_latin1_or_mixed_string);
$latin1_string = forceLatin1($utf8_or_latin1_or_mixed_string);
I've included another function, fixUTF8(), which will fix every UTF-8 string that looks garbled.
Usage:
$utf8_string = fixUTF8($garbled_utf8_string);
Examples:
echo fixUTF8("Fédération Camerounaise de Football");
echo fixUTF8("Fédération Camerounaise de Football");
echo fixUTF8("FÃÂédÃÂération Camerounaise de Football");
echo fixUTF8("Fédération Camerounaise de Football");
will output:
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Fédération Camerounaise de Football
Update: I converted these into a static class, and they live on GitHub now:
https://github.com/neitanod/forceutf8
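For the curious, the core idea can be sketched with PHP's mbstring extension. This is not the library's actual code (forceUTF8() works byte by byte, which is how it also repairs strings that mix both encodings); it is only a simplified illustration, and myForceUTF8() is a made-up name:
function myForceUTF8($string) {
    // If the bytes already form valid UTF-8, leave them untouched
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }
    // Otherwise assume Latin-1 bytes and convert them to UTF-8
    return mb_convert_encoding($string, 'UTF-8', 'ISO-8859-1');
}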

You need to change the collation (to utf8). Here is a script that does that easily:
http://blog.vision4web.net/2008/11/change-collation-on-all-tables-and-columns-in-mysql/
I have experience with this script; it works perfectly.

Do you actually use the latin1 part, or is your data actually ASCII?
It would seem that there's a command for this:
http://dev.mysql.com/doc/refman/5.1/en/alter-table.html
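Presumably the command in question is the CONVERT TO CHARACTER SET form of ALTER TABLE, roughly (tbl is a placeholder table name):
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8;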
...but be careful, I also found this:
http://bugs.mysql.com/bug.php?id=21681
Failing the command that seems to be there for this sort of thing, an alternative might be to dump the table(s) to a file, convert that, and then re-import it. (Or, if you can convince it to dump to UTF-8 directly, even better...)
There seems to be a lot of information out there for this: http://www.google.com/search?q=mysql+convert+table+to+utf8

Your best solution is to create a new database called dbname_new and do an SQL dump of your old database.
Then take that dump and replace the charset info with utf8, and make sure you re-save the SQL file itself as UTF-8.
Then load it into the new database, check that everything worked OK, and then rename it.
This can be a lengthy process over the 'net, so I recommend you do it in an SSH shell session and take full advantage of bash pipes and the like.
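A rough sketch of that pipeline, with placeholder names; verify each step against a copy of your data before trusting it:
# 1. dump the old database
mysqldump -u user -p dbname > dump.sql
# 2. point the table definitions at utf8 instead of latin1
sed -i 's/CHARSET=latin1/CHARSET=utf8/g' dump.sql
# 3. re-save the file contents themselves as UTF-8
iconv -f latin1 -t utf8 dump.sql > dump-utf8.sql
# 4. load it into the new database
mysql --default-character-set=utf8 -u user -p dbname_new < dump-utf8.sql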

Excellent resource on the subject:
Turning MySQL data in latin1 to utf8

If you can/want to live with data stored as latin1, but just want to present it as UTF-8, specifying UTF-8 as the connection character set should work too. One way to test this is to issue the query
SET NAMES 'utf8'
once you establish a connection, before reading/writing any data.
More details on this here http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
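From PHP with mysqli, for example, that test could look like this (assuming an already-open mysqli connection named $connection):
$connection->query("SET NAMES 'utf8'");
// or, equivalently, let the driver negotiate the charset:
$connection->set_charset('utf8');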

Related

How to fix garbled characters in PHPMyAdmin

My MySQL database contains some Chinese symbols and other non-ASCII symbols. When I view them in PHPMyAdmin, they look garbled. However, if I display them on my website with PHP using the regular mysqli API, they look fine, so I assume the data is uploaded/stored properly in the database; maybe the server connection collation is incorrect.
My PHP code for opening the database connection is:
function openConnection(): mysqli
{
    $databaseHost = "localhost";
    $databaseUser = "root";
    $databasePassword = '';
    $databaseName = "my-database-name";
    $connection = new mysqli($databaseHost, $databaseUser,
        $databasePassword, $databaseName);
    if ($connection->connect_error) {
        die("Connection failed: " . $connection->connect_error);
    }
    return $connection;
}
My PHPMyAdmin server connection collation is the default utf8mb4_unicode_ci which seems to be reasonable as well. My tables are also created with the default utf8mb4_general_ci. Shouldn't that work fine for any input users might make?
Calling $connection->get_charset() in PHP also returns the correct charset.
If I export the database data in PHPMyAdmin, the export is also garbled in Notepad++ (I made sure to view it with UTF-8 encoding). If I import the garbled export again, the database shows the data as garbled once more, and on the website the data now also shows as garbled. In this case an actually corrupted export has happened.
How can I solve this encoding problem? Clearly PHP can handle UTF-8 properly, my Apache web server is also serving UTF-8 and my database is configured seemingly correctly as well but there is an issue with PHPMyAdmin or the database/database table collation.
It turned out the issue was entirely elsewhere, since I'm supplying data to PHP from C++ code. The C++ code uses the nlohmann JSON library to build the data submitted to the PHP script. The issue was my failure to explicitly encode std::strings to UTF-8, as described here, when putting data into a C++ JSON object. With that fixed, everything is now working as expected.
⚈ If using mysqli, do $mysqli_obj->set_charset('utf8mb4');
⚈ If using PDO, do something like $db = new PDO('mysql:host=host;dbname=db;charset=utf8mb4', $user, $pwd);
⚈ Alternatively, execute SET NAMES utf8mb4
Any of these will say that the bytes in the client are UTF-8 encoded. Conversion, if necessary, will occur between the client and the database if the column definition is something other than utf8mb4.
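Applied to the openConnection() function from the question, the mysqli option is a one-line addition (a sketch, not tested against that database):
function openConnection(): mysqli
{
    $databaseHost = "localhost";
    $databaseUser = "root";
    $databasePassword = '';
    $databaseName = "my-database-name";
    $connection = new mysqli($databaseHost, $databaseUser,
        $databasePassword, $databaseName);
    if ($connection->connect_error) {
        die("Connection failed: " . $connection->connect_error);
    }
    // Tell MySQL that this client sends and expects UTF-8 encoded bytes
    $connection->set_charset('utf8mb4');
    return $connection;
}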
More notes on PHP: http://mysql.rjweb.org/doc.php/charcoll#php
If you have specific garbling, see Trouble with UTF-8 characters; what I see is not what I stored
If you suspect the data being fed from PHP to Notepad, dump a few Chinese characters in hex and show them to us. I would expect every 4th byte to be hex F0, or every 3rd byte to be between E3 and EA. (These are the first bytes of the 4-byte and 3-byte UTF-8 encodings of Chinese characters.)
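For example (the table and column names here are invented):
// Fetch one value and dump its raw bytes to inspect the encoding;
// for UTF-8 Chinese text, expect lead bytes in the E3-EA range or F0, as described above
$row = $connection->query("SELECT name FROM my_table LIMIT 1")->fetch_assoc();
echo bin2hex($row['name']);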
Does Notepad properly handle UTF-8, or does it need a setting?
If you are in the "cmd" in Windows, you may need chcp 65001; see http://mysql.rjweb.org/doc.php/charcoll#entering_accents_in_cmd That way, more non-English characters will display correctly.

Using iconv to convert mysqldump-ed databases

Trying to quickly convert a latin1 mysql DB to utf8, I tried the following:
Dump the DB
run iconv -f latin1 -t utf8 on the resulting file
import into a fresh DB with UTF8 default encoding
This mostly works, except... some letters get converted wrong (an example: an uppercase accented 'U' becomes some garbled sequence starting with a question mark). Some conversion is taking place (od on a query result shows a two-byte sequence where the latin1 byte was) and the latin1 version is alright. While I have so far been unsystematic in isolating the problem (late night; under deadline; etc.), the weirdness of the issue kills me: why would it fail on some letters and not all? Client connection? Column charset? Why am I not getting any diagnostics? I'm stymied.
Sure, I can work on isolating the issue and its details, but thought that maybe somebody ran into this already and can recognize it by this (admittedly rather poor) description.
Cheers
The data may have been stored as latin1, but it's possible that whatever client you used to dump the data has already exported it as UTF-8.
Open the dump file in a decent text editor (Notepad++, TextWrangler, Atom) and check which encoding allows all characters to be displayed properly.
Then, when it comes time to import the data back in, ensure your client is set to use UTF-8 for the import.
Don't use iconv, it only muddies the works.
Assuming that a table is declared to be latin1 and correctly contains latin1 bytes, but you would like to change it to utf8, do this to the table:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8;
It is also possible to do it with a dump and reload; it involves some changes to the arguments. Sorry I don't have the details.

Perl DBI/Mysql Unicode Bug

I'm not sure if it's a bug or I'm doing something wrong:
I read the data with
open my $fh, "<:encoding(iso-latin1)", $file or die "Failed to open $file: $!";
$file is definitely in iso-latin1.
Then I have a mysql table which is
ENGINE=InnoDB AUTO_INCREMENT=53072 DEFAULT CHARSET=latin1
I check the connection settings:
$dbh->prepare("show variables");
Which gives
character_set_client, latin1
character_set_connection, latin1
character_set_database, latin1
character_set_filesystem, binary
character_set_results, latin1
character_set_server, latin1
character_set_system, utf8
So to me everything should be fine:
Table is iso-latin1
Data was iso-latin1 (it should be in Perl's internal character format now)
Connection info shows the right settings
Output to STDOUT (terminal is iso-latin1) is correct
But: the data in the table is plain utf8 (most probably Perl's internal format in this case).
Did I miss something, or is this maybe a bug in DBI/DBD::mysql?
My guess would be that you're right and this data is in Perl's internal character format. The sequence goes like this.
Data in input file stored as Latin-1 bytes
Data read from input file and auto-converted to Perl characters because of the encoding option on your open statement
Data sent to MySQL as Perl characters
MySQL slightly confused by getting UTF8 instead of Latin-1, but stores it anyway as best it can
The step you're missing is to encode your Perl characters back into Latin-1 before sending them to the database. The obvious solution is to call encode('iso-8859-1', $string) on every value you send to the database. It would be nice if there was some kind of auto-encode option, but I can't find one.
Of course, if your data is all going to be Latin-1, then you could consider just ignoring any decoding/encoding issues. It should all just work without that complication.
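A sketch of that encode() step in an insert loop (the table and column are invented; Encode ships with Perl):
use Encode qw(encode);

my $sth = $dbh->prepare("INSERT INTO my_table (name) VALUES (?)");
while (my $line = <$fh>) {        # $fh was opened with <:encoding(iso-latin1)
    chomp $line;
    # Turn the Perl character string back into Latin-1 bytes,
    # matching the latin1 table and connection settings
    $sth->execute(encode('iso-8859-1', $line));
}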

change charset from UTF-8 to ISO-8859-1 in mysql

I started working on a legacy MySQL database whose collation is the latin1 default, but whose tables are the utf8 default. Even though the tables are declared utf8 (the universal standard encoding), it doesn't render Swedish characters. It seems the application tied to this database uses ISO-8859-1 encoding, so I would like to convert this database and the data in it to ISO-8859-1. I tried this command:
iconv -f UTF-8 -t ISO-8859-1 webtest_backu_01.sql > converted-file.sql
It gives the error: illegal input sequence at position
Any help is appreciated. Thanks.
Please take a look at this link: http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html
You can use the alter table command to make this conversion per-table if it is possible. I used this before successfully.
Example from the link:
ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
Also an important detail: conversion may be lossy if the column contains characters that are not in both character sets, but I don't think that's an issue going from ISO-8859-1 to UTF-8.
Give this a try for one of the tables and see if it works.
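For the direction asked about here (utf8 down to ISO-8859-1/latin1), the table-level form would be along these lines, keeping the lossy-conversion caveat in mind (t is a placeholder table name):
ALTER TABLE t CONVERT TO CHARACTER SET latin1;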

Perl: How to treat a certain MySQL table as utf8

I have a system that handles everything in latin1, but I want this one particular table to be read as utf8 and then properly encoded into JSON.
How do I switch the connection to utf8, then read it, then switch the connection back?
I know how to do the JSON, but the MySQL I do not know about.
I am using the DBI MySQL driver, and this is an old CGI program.
You could try something like this:
$dbh->do('set names utf8');
You can change the connection encoding like this:
$dbh->do("set names 'utf8';");
# ...do something with utf8 tables...
$dbh->do("set names 'latin1';");
# do something with latin1 tables
$dbh->disconnect;