Exporting a latin1 database and importing as utf8 / converting to utf8 - MySQL

I am working on a website that uses an old database, running on an external MySQL 4.1 server (Server A). The database uses the latin1_swedish_ci collation, as do the tables and columns. There is a new server B running MySQL 5 that is to replace server A. The target collation should be utf8_unicode_ci.
I export the DB on Server A:
mysqldump -u root -p --opt --quote-names --skip-set-charset --default-character-set=latin1 db_a -r db_a.sql
Transfer db_a.sql via scp from server A to server B
Replace latin1 with utf8 in the dump:
sed -e 's/CHARSET\=latin1/CHARSET\=utf8\ COLLATE\=utf8_general_ci/g' db_a.sql > db_a2.sql
Convert the file to UTF-8:
iconv -f latin1 -t utf8 db_a2.sql > db_a3.sql
Import db_a3.sql
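The sed substitution from these steps can be verified locally on a sample line before it is run over the real dump (a sketch; the escaped `=` and `\ ` in the original pattern are harmless but unnecessary, so the same substitution is written plainly here):

```shell
# Verify the CHARSET rewrite on one representative schema line
# before applying it to the full dump file.
echo 'ENGINE=MyISAM DEFAULT CHARSET=latin1;' \
  | sed 's/CHARSET=latin1/CHARSET=utf8 COLLATE=utf8_general_ci/g'
# -> ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
```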
In phpMyAdmin everything is displayed correctly, but the new client application shows artifacts in the text columns.
I tried different variations of the steps above without success, including importing as latin1 and using MySQL's CONVERT function. Does anyone know a solution to my problem?

It would be better to load it as latin1, then fix things afterwards.
However, this is not straightforward, because there are multiple scenarios to consider. See this: http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
Note in particular that there are at least 2 different ways to do the ALTER. If you pick the wrong one, the data will become garbled even worse.
To see what you have, use this for a sample of the data:
SELECT col, HEX(col) FROM ... WHERE ...
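To interpret the HEX() output, it helps to know what healthy versus double-encoded UTF-8 looks like at the byte level. This can be reproduced locally without a server (a sketch, using é as the sample character):

```shell
# A correctly stored UTF-8 'é' is the two bytes C3 A9.
printf 'é' | od -An -tx1
# Double-encoded text (latin1 bytes mistakenly re-encoded as UTF-8)
# becomes the four bytes C3 83 C2 A9. If HEX(col) shows patterns like
# this, the data has been converted one time too many.
printf 'é' | iconv -f latin1 -t utf8 | od -An -tx1
```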

Related

MySQL database with wrong character set and LONGTEXT's with binary data

I have a Percona XtraDB 5.6 server with a very old database whose charset is set to utf8 but whose data is encoded in a different charset (probably latin1).
I tried to migrate the database to a new Percona 8.0 server, but after importing the SQL file all diacritic marks were broken on the 8.0 server. I resolved the issue by executing this query on every column in every table:
UPDATE table SET col = convert(cast(convert(col using latin1) as binary) using UTF8) WHERE 1;
BUT there is one table with binary data (specifically GZIP-compressed data) saved in LONGTEXT columns. Data from these columns always becomes damaged after import on the new server.
This is what I have tried so far:
changing the column type to LONGBLOB before the dump;
using the above query to convert the data before/after the column type change.
This is the command I'm using to export DB:
mysqldump --events --routines --triggers --add-drop-database --hex-blob --opt --skip-comments --single-transaction --skip-set-charset --default-character-set=utf8 --databases "%s" > db.sql
Please note the "--hex-blob" option, which still results in binary data being exported as strings instead of hex.
It would not have been damaged by zip/unzip. But it could have been damaged in any of a number of other ways.
"--hex-blob" turns data into strings such that they will not be mangled until you reload them.
Dumping, loading, INSERTing, SELECTing all need to be told what character set to use.
The particular UPDATE you did may or may not have made things worse. Here is a list of cases that I have identified, and the minimal "fix":
http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
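The byte-level effect of that convert/cast/convert UPDATE can be reproduced locally with iconv (a sketch only, not the server-side fix itself): converting stored UTF-8 to latin1 bytes and re-reading them as UTF-8 is what un-garbles double-encoded text, and applying the same chain to already-correct text is exactly how things get worse.

```shell
# Build a double-encoded 'é' (bytes C3 83 C2 A9) as the broken input...
garbled=$(printf 'é' | iconv -f latin1 -t utf8)
# ...then repair it: back to latin1 bytes, which are then valid UTF-8.
# This mirrors convert(cast(convert(col using latin1) as binary) using UTF8).
printf '%s' "$garbled" | iconv -f utf8 -t latin1
# -> é
```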

How to change mysql DB to convert saved text to readable version of UTF-8

I have a problem with my MySQL database and need to get the server configuration right. My MySQL configuration has
init-connect='SET NAMES latin2'
my tables are utf8_general_ci
and everything stored by PHP scripts using UTF-8 is unreadable in the database, like:
Ĺwietnie
but it should be
świetnie
I can use this DB from PHP and read it as it should be. BUT when I remove init-connect from my.cnf (the MySQL configuration), everything is unreadable.
I can manually export the DB and convert it with iconv:
iconv -f UTF-8 -t latin2 in.txt > out.txt
then I have readable Polish characters and can import this DB and use it with the MySQL configuration.
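That iconv conversion is lossless for Polish characters, since they all exist in latin2, so the step can be undone later if needed. A quick local round-trip check (sketch):

```shell
# UTF-8 -> latin2 -> UTF-8 round-trips Polish text without loss,
# so converting the export with iconv is a reversible operation.
printf 'świetnie' | iconv -f UTF-8 -t latin2 | iconv -f latin2 -t UTF-8
# -> świetnie
```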
My question is: how can I use my.cnf without
init-connect='SET NAMES latin2'
and convert the databases to readable Polish characters?

Corrupted international characters because of no "SET NAMES utf8;" in TYPO3

I've got a problem with character encoding on one of my Polish TYPO3 sites. The setDBinit=SET NAMES utf8; parameter was not set in the configuration.
Everything works okay (frontend & backend) except the export from the database. All international characters are corrupted when I search the database via phpMyAdmin or try to export the database with data.
The official page http://wiki.typo3.org/UTF-8_support#SET_NAMES_utf8.3B says:
Without SET NAMES utf8; your TYPO3 UTF-8 setup might work, but chances are that database content entered after the conversion to UTF-8 has each international character stored as two separate, garbled latin1 chars.
If you check your database using phpMyAdmin and find umlauts in new content being shown as two garbled characters, this is the case. If this happens to you, you cannot just add the above statement any more. Your output for the new content will be broken. Instead you have to correct the newly added special chars first. This is done most easily by just deleting the content, setting the option as described above and re-entering it.
Is there any other way to repair the corrupted characters? There is a lot of content to edit now...
I have tried almost every combination of export encoding, converting to another encoding, and so on, and so far I have failed.
You can try mysqldump to convert from ISO-8859-1 to UTF-8:
mysqldump --user=username --password=password --default-character-set=latin1 --skip-set-charset dbname > dump.sql
chgrep latin1 utf8 dump.sql (or, if you prefer sed: sed -i "" 's/latin1/utf8/g' dump.sql)
mysql --user=username --password=password --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"
mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql

mysqldump to csv in utf8

I'm using the following command to export data from my database to csv:
mysqldump -u root -p -T/home/xxx/stock_dump -t --fields-terminated-by=";" products stock
But the database is in UTF-8 and this command exports UTF-8 characters incorrectly, e.g.
ŻYWIEC ZDRÓJ is ĂÂťYWIEC ZDRĂâJ
How do I export it in the correct UTF-8 format?
Adding --default-character-set=utf8 did not help at all.
Or if it is not possible, how do I postprocess it the easiest way? Can iconv do anything about it?
mysqldump produces UTF-8 encoded dumps unless told otherwise (or unless you use a really, really old version of mysqldump -- anyway, your correct use of --default-character-set=utf8 settles it).
Either your text editor is not recognizing UTF-8 correctly, or the data in the database is already wrongly encoded.
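One way to tell those two cases apart is to check whether the exported file is valid UTF-8 at all; iconv can validate the bytes without modifying anything (a sketch; stock.txt stands in for the real exported file):

```shell
# If the iconv pass succeeds, the file is valid UTF-8 and the problem is
# likely the viewer/editor; if it fails, the bytes themselves are wrong.
printf 'ŻYWIEC ZDRÓJ\n' > stock.txt   # stand-in for the exported data
iconv -f utf8 -t utf8 stock.txt > /dev/null && echo 'valid UTF-8'
# -> valid UTF-8
```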

mysql dump - character encoding

I have a MySQL dump that contains non-ASCII characters like überprüft. My problem is that I cannot make another dump, and every suggestion I have found on the net involves making another dump set up as UTF-8. Is there a way to convert an existing dump file?
Is the entire dump encoded like that, that is, in UTF-8? In that case you can simply set the encoding when you import the dump.
If you use the mysql command line client to import the dump, use the --default-character-set command line switch when importing, for example:
mysql -u user --default-character-set=utf8 < dump.sql
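If the importing client cannot be given that switch, an alternative (a sketch, assuming the dump does not already declare its encoding) is to make the dump self-describing by prepending a SET NAMES statement:

```shell
# Prepend SET NAMES so the dump declares its own encoding instead of
# relying on every client passing --default-character-set=utf8.
printf 'INSERT INTO t VALUES (1);\n' > dump.sql   # stand-in for the real dump
printf 'SET NAMES utf8;\n' | cat - dump.sql > dump_utf8.sql
head -1 dump_utf8.sql
# -> SET NAMES utf8;
```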