How can I fix a mal-encodede MySQL dump? - mysql

I received a dump from a MySQL database encoded in latin_swedish_1. The dump encoding is UTF-8 but it shows mal-encoded characters. The targeted database in which I'm gonna import the dump has its charset set to utf8_general_ci.
However, I keep the mal-encoded characters when I import the dump.
My question is : how can I fix this dump before importing it ?

I just realised it: if you can see the malformed characters in a utf8-aware text editor, then the data was already malformed in the source database.
I am assuming you see strings like "é" in your dump, instead of "é". UTF-8 sequences were stored in a latin1 table in your source database. Such sequences were converted again to UTF-8 when generating the dump.
You need to fix your dump by rolling-back one stage of conversion. Under Linux, this can be done easliy:
iconv -f utf8 -t ISO88591 < broken.sql > fixed.sql
Import fixed.sql.

Related

Can I upload characters like ö to a MYSQL database via SSH? [duplicate]

When I use the import feature of PHPMyAdmin, it doesn't import non-ASCII characters such as ä, ö, ü, õ and the rest of the word after the characters.
When I open the CSV file with Notepad it displays the non-ASCII characters normally, but when I'm trying to import it - it doesn't work.
Entering those missing characters manually works and MySQL saves them just as it should. Any thoughts?
mySQL will do this when it encounters a character that is invalid under the current character set.
You're not mentioning what tool you are using to import the data, but you should be able to specify a character set when importing. If that character set matches the database's, everything will be fine. Also, make sure the file is actually encoded in that character set.
If your import tool doesn't offer the option of selecting the character set, you could try phpMyAdmin which does.
Make sure you know what the encoding of your CSV file is - it should be UTF-8. Then before you import, type 'use utf8', and it should work fine.

Using iconv to convert mysqldump-ed databases

Trying to quickly convert a latin1 mysql DB to utf8, I tried the following:
Dump the DB
run iconv -f latin1 -t utf8 on the resulting file
import into a fresh DB with UTF8 default encoding
This mostly works except... some letters get converted wrong (an example: uppercase accented 'U' becomes some garbled sequence starting with a question mark). Some conversion is taking place (od an a query result shows a two byte sequence where the latin1 byte was) and te latin1 version is alright. While I have so far been unsystematic in isolating the problem (late night; under deadline; etc.) the weirdness of the issue kills me: why would it fail on some letters and not all? Client connection? Column charset? Why I am not getting any diagnostics? I'm stymied.
Sure, I can work on isolating the issue and its details, but thought that maybe somebody ran into this already and can recognize it by this (admittedly rather poor) description.
Cheers
The data may have been stored as latin1 but it's possible that what ever client you used to dump the data has already exported it as UTF-8.
Open the dump file in a decent text editor (Notepad++, TextWrangler, Atom) and check which encoding allows all characters to be displayed properly.
Then when it comes to import the data back in, ensure your client is set to use UTF-8 on the import.
Don't use iconv, it only muddies the works.
Assuming that a table is declared to be latin1 and correctly contains latin1 bytes, but you would like to change it to utf8, do this to the table:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8;
It is also possible to do it with a dump and reload; it involves some changes to the arguments. Sorry I don't have the details.

Not able to display Chinese characters after loading it to Postgres DB

I have a source file which contains Chinese characters. After loading that file into a table in Postgres DB, all the characters are garbled and I'm not able to see the Chinese characters. The encoding on Postgres DB is UTF-8. I'm using the psql utility on my local mac osx to check the output. The source file was generated from mySql db using mysqldump and contains only insert statements.
INSERT INTO "trg_tbl" ("col1", "col2", "col3", "col4", "col5", "col6", "col7", "col7",
"col8", "col9", "col10", "col11", "col12", "col13", "col14",
"col15", "col16", "col17", "col18", "col19", "col20", "col21",
"col22", "col23", "col24", "col25", "col26", "col27", "col28",
"col29", "col30", "col31", "col32", "col33")
VALUES ( 1, 1, '与é<U+009D>žç½‘_首页&频é<U+0081>“页顶部广告ä½<U+008D>(946×90)',
'通æ <U+008F>广告(Leaderboard Banner)',
0,3,'',946,90,'','','','',0,'f',0,'',NULL,NULL,NULL,NULL,NULL,
'2011-08-19 07:29:56',0,0,0,'',NULL,0,NULL,'CPM',NULL,NULL,0);
What can I do to resolve this issue?
The text was mangled before producing that SQL statement. You probably wanted the text to start with 与 instead of the "Mojibake" version: 与. I suggest you fix the dump either to produce utf8 characters or hex. Then the load may work, or there may be more places to specify utf8, such as SET NAMES or the equivalent.
Also, for Chinese, CHARACTER SET utf8mb4 is preferred in MySQL.
é<U+009D>ž is so mangled I don't want to figure out the second character.

change charset from UTF-8 to ISO-8859-1 in mysql

I started working on a legacy mysql database whose collation: latin1-default but tables are utf-8-default. Even though tables are mentioned with utf-8 (universal standard encoding) it doesn't render Swedish characters. It seems application related to this database encoding is ISO-8859-1. So, I would like to convert this database and data in it to ISO-8859-1 encoding. I tried with this command
iconv -f UTF-8 -t ISO-8859-1 webtest_backu_01.sql > converted-file.sql
it gives error: illegal input sequence at position
any help is appreciated. thanks.
Please take a look at this link: http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html
You can use the alter table command to make this conversion per-table if it is possible. I used this before successfully.
Example from the link:
ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
Also an important detail... Conversion may be lossy if the column contains characters that are not in both character sets... but I don't think ISO-8859-1 to UTF-8.
Give this a try for one of the tables and see if it works.

MySQL LOAD DATA INFILE with fields terminated by non-ASCII character

I have a lowercase thorn separated file that I need to load into a MySQL database (5.1.54) using the LOAD DATA INFILE ... query.
The file I'm trying to load is located on the same server as the MySQL database, and I'm issuing the query from a Windows machine using SQLYog, which uses the MySQL C client library.
I'm having some major issues, I've tried using the FIELDS TERMINATED BY 0x00FE syntax using all the variations of the thorn character I can think of, and I've tried changing the character set of the connection (SET NAMES ...), but I consistently get the warning...
Warning Code : 1638
Non-ASCII separator arguments are not fully supported
...and all the data loads into the first column.
Is there any way around this at all? Or am I resigned to pre-processing the file with sed to replace all the thorn's with a more sensible character before loading?
I have succeeded to load this data with Data Import tool (CSV format) in dbForge Studio for MySQL. I just set 'Þ' as custom delimiter. The import from the CSV format is fully supported in free Express Edition.
I decided to fix the file by replacing the non-ASCII character with a character that MySQL's LOAD DATA INFILE ... would understand.
Use od to get the octal byte value of the offending character - od -b file.log - in this case it's 376.
Use grep to make sure the character you want to replace it with doesn't already exist in the file - grep -n '|' file.log.
Use sed and printf to replace the non-ASCII character - sed -i 's/'$(printf '\376')'/|/g' file.log.