Does mysqldump handle binary data reliably? - mysql

I have some tables in MySQL 5.6 that contain large binary data in some fields. I want to know whether I can trust dumps created by mysqldump and be sure those binary fields will not be corrupted when transferring the dump files through systems like FTP, SCP and such. Also, should I force such systems to treat the dump files as binary transfers instead of ASCII?
Thanks in advance for any comments!

No, it is not always reliable when you have binary blobs. In that case you MUST use the "--hex-blob" flag to get correct results.
Caveat from comment below:
If you combine --hex-blob with the -T flag (one file per table), the hex-blob flag is silently ignored.
I have a case where these calls fail (importing on a different server, but both running CentOS 6/MariaDB 10):
mysqldump --single-transaction --routines --databases myalarm -uroot -p"PASSWORD" | gzip > /FILENAME.sql.gz
gunzip < FILENAME.sql.gz | mysql -p"PASSWORD" -uroot --comments
It produces a file that silently fails to import. Adding --skip-extended-insert gives me a file that's much easier to debug, and I find that this line is generated but can't be read back (yet no error is reported during either export or import):
INSERT INTO `panels` VALUES (1003,1,257126,141,6562,1,88891,'??\\\?ŖeV???,NULL);
Note that the terminating quote on the binary data is missing in the original.
select hex(packet_key) from panels where id=1003;
--> DE77CF5C075CE002C596176556AAF9ED
The column is binary data:
CREATE TABLE `panels` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`enabled` tinyint(1) NOT NULL DEFAULT '1',
`serial_number` int(10) unsigned NOT NULL,
`panel_types_id` int(11) NOT NULL,
`all_panels_id` int(11) NOT NULL,
`installers_id` int(11) DEFAULT NULL,
`users_id` int(11) DEFAULT NULL,
`packet_key` binary(16) NOT NULL,
`user_deleted` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
...
So no, not only can you not necessarily trust mysqldump, you can't even rely on it to report an error when one occurs.
An ugly workaround I used was to run mysqldump excluding the two afflicted tables by adding options like this to the dump:
--ignore-table=myalarm.panels
Then use this bash hack: basically, run a SELECT that produces the INSERT values, with the NULL columns handled and the binary column turned into an UNHEX() call, like so:
(123,45678,UNHEX("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"),"2014-03-17 00:00:00",NULL),
Paste it into your editor of choice to play with it if you need to.
echo "SET UNIQUE_CHECKS=0;SET FOREIGN_KEY_CHECKS=0;DELETE FROM panels;INSERT INTO panels VALUES " > all.sql
mysql -uroot -p"PASSWORD" databasename -e "SELECT CONCAT('(',id,',', enabled,',', serial_number,',', panel_types_id,',', all_panels_id,',', IFNULL(CONVERT(installers_id,CHAR(20)),'NULL'),',', IFNULL(CONVERT(users_id,CHAR(20)),'NULL'), ',UNHEX(\"',HEX(packet_key),'\"),', IF(ISNULL(user_deleted),'NULL',CONCAT('\"', user_deleted,'\"')),'),') FROM panels" >> all.sql
echo "SET UNIQUE_CHECKS=1;SET FOREIGN_KEY_CHECKS=1;" > all.sql
That gives me a file called "all.sql" that needs the final comma in the INSERT turned into a semicolon; then it can be run as above. I needed the "large import buffer" tweaks set in both the interactive mysql shell and on the command line to process that file, because it's large.
mysql ... --max_allowed_packet=1GB
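If you want to avoid that manual comma edit, the trailing comma can be fixed in the pipeline itself. A sketch (the SELECT is abbreviated here; use the full CONCAT query from above):
echo "SET UNIQUE_CHECKS=0;SET FOREIGN_KEY_CHECKS=0;DELETE FROM panels;INSERT INTO panels VALUES " > all.sql
# same CONCAT(...) SELECT as above; --skip-column-names drops the header row,
# and sed turns the trailing comma on the last emitted line into a semicolon
mysql --skip-column-names -uroot -p"PASSWORD" databasename -e "SELECT ... FROM panels" | sed '$ s/,$/;/' >> all.sql
echo "SET UNIQUE_CHECKS=1;SET FOREIGN_KEY_CHECKS=1;" >> all.sql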
When I reported the bug I was eventually pointed at the --hex-blob flag, which does the same as my workaround but with no effort on my side. Add that option, blobs get dumped as hex, the end.
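For reference, the failing dump command from above with that flag added (same placeholders as before):
mysqldump --hex-blob --single-transaction --routines --databases myalarm -uroot -p"PASSWORD" | gzip > /FILENAME.sql.gz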

The dumps generated by mysqldump can be trusted.
To avoid problems with encodings, binary transfers, etc., use the --hex-blob option, so that each byte is written as a hex number (for example, 'abc' becomes 0x616263). It will make the dump bigger, but it is the most compatible and safest way to preserve the info, since the dump stays pure text with no risk of special bytes in the binary data being misinterpreted in a text file.
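You can see the encoding it uses with a quick one-liner (a sketch):
# HEX('abc') returns 616263; with --hex-blob, mysqldump writes the column value as the literal 0x616263
mysql -uroot -p -e "SELECT HEX('abc');"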
You can ensure the integrity of the dump files (and speed up the transfer) by packing them into a rar or zip archive. That way you can easily detect whether a file got corrupted during the transfer.
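Alternatively (or in addition), a checksum detects transfer corruption without repacking the file. A minimal sketch, assuming coreutils on both hosts:
# before transfer, on the source host
sha256sum dump.sql > dump.sql.sha256
# after transfer, on the destination host (with both files copied over)
sha256sum -c dump.sql.sha256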
When you load it on your server, check that your my.cnf server config file has
[mysqld]
max_allowed_packet=600M
or more if needed.
BTW, I just did a migration right now, dumped lots of binary data with mysqldump, and it worked perfectly.

Yes, you can trust dumps generated by mysqldump.
Yes, you should use binary transfer in order to avoid any encoding conversion during transfer. mysqldump adds control commands to the dump so that the server interprets the file with a specific encoding when reimporting. You do not want to change this encoding.
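In practice that means using scp or rsync (which copy bytes verbatim), or forcing binary mode in an FTP client. A quick sketch (host and path are placeholders):
# scp copies the file byte for byte; no line-ending or charset translation happens
scp dump.sql.gz user@backuphost:/backups/
# with a classic command-line ftp client, issue "binary" before "put" to disable ASCII-mode translation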

Related

How does mysqldump write binary data into files for MySQL logical backup?

I am using mysqldump to back up a table. The schema is as follows:
CREATE TABLE `student` (
`ID` bigint(20) unsigned DEFAULT NULL,
`DATA` varbinary(64) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I can use the following command to back up my data in the table:
mysqldump -uroot -p123456 tdb > dump.sql
Now I want to write my own code using the MySQL C interface to generate a file similar to dump.sql.
So I just
read the data and store it in a char *p (using mysql_fetch_row);
write the data to a file using fprintf(f,"%s",p);
However, when I check the table fields written into the file, I find that the files generated by mysqldump and by my own program are different. For example,
one data field in the file generated by mysqldump
'[[ \\^X\í^G\ÑX` C;·Qù^Dô7<8a>¼!{<96>aÓ¹<8c> HÀaHr^Q^^½n÷^Kþ<98>IZ<9f>3þ'
one data field in the file generated by my program
[[ \^Xí^GÑX` C;·Qù^Dô7<8a>¼!{<96>aÓ¹<8c> HÀaHr^Q^^½n÷^Kþ<98>IZ<9f>3þ
So, my question is: why is writing the data with fprintf(f,"%s",xx) not correct for a backup? Is it enough to just add ' at the front and end of the string? If so, what if the data in that field happens to contain ' itself?
Also, I wonder what it means to write some unprintable characters into a text file.
Also, I read stackoverflow.com/questions/16559086 and tried the --hex-blob option. Is it OK if I transform every byte of the binary data into hex form and then write plain text strings into dump.sql?
Then, instead of getting
'[[ \\^X\í^G\ÑX` C;·Qù^Dô7<8a>¼!{<96>aÓ¹<8c> HÀaHr^Q^^½n÷^Kþ<98>IZ<9f>3þ'
I got something like
0x5B5B095C18ED07D1586009433BB751F95E44F4378ABC217B9661D3B98C0948C0614872111EBD6EF70BFE98495A9F33FE
All the characters are printable now!
However, if I choose this method, I wonder whether I might run into problems when I use encoding schemes other than latin1.
Also, the above are just my own ideas; I also wonder whether there are other ways to back up the data using the C interface.
Thank you for your help!
latin1, utf8, etc are CHARACTER SETs. They apply to TEXT and VARCHAR columns, not BLOB and VARBINARY columns.
Using --hex-blob is a good idea.
If you have "unprintable characters" in TEXT or CHAR, then either you have been trying to put a BLOB into such a column -- naughty -- or the print mechanism is not set for the appropriate charset.
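If you go the hex route from your own code, one simple pattern (a sketch, not how mysqldump itself does it) is to let the server do the encoding, so the program only ever handles printable text and the NUL-byte and quoting problems with %s disappear:
-- the client receives only printable hex text; it can be written out verbatim as an 0x literal
SELECT ID, CONCAT('0x', HEX(DATA)) AS DATA_HEX FROM student;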

My flat files should be UCS-2, but I can't import into MySQL database

I have twenty pipe-delimited text files that I would like to convert into a MySQL database. The manual that came with the data says
Owing to the difficulty of displaying data for characters outside of
standard Latin Character Sets, all data is displayed using Unicode
(UCS-2) character encoding. All CSV files are structured using
commercial standards with the preferred format being pipe delimiter
(“|”) and carriage return + line feed (CRLF) as row terminators.
I am using MySQL Workbench 6.2.5 on Win 8.1, but the manual provides example SQL Server scripts to create the twenty tables. Here's one.
/****** Object: Table [dbo].[tbl_Company_Profile_Stocks] Script Date:
12/12/2007 08:42:05 ******/
CREATE TABLE [dbo].[tbl_Company_Profile_Stocks](
[BoardID] [int] NULL,
[BoardName] [nvarchar](255) NULL,
[ClientCompanyID] [int] NULL,
[Ticker] [nvarchar](255) NULL,
[ISIN] [nvarchar](255) NULL,
[OrgVisible] [nvarchar](255) NULL
)
Which I adjust as follows for MySQL.
/****** Object: Table dbo.tbl_Company_Profile_Stocks Script Date:
12/12/2007 08:42:05 ******/
CREATE TABLE dbo.tbl_Company_Profile_Stocks
(
BoardID int NULL,
BoardName varchar(255) NULL,
ClientCompanyID int NULL,
Ticker varchar(255) NULL,
ISIN varchar(255) NULL,
OrgVisible varchar(255) NULL
);
Because the manual says that the flat files are UCS-2, I set the dbo schema to a UCS-2 default collation when I create it. This works fine AFAIK. It is the LOAD DATA INFILE that fails. Because the data are pipe-delimited with CRLF line endings, I try the following.
LOAD DATA LOCAL INFILE 'C:/Users/Richard/Dropbox/Research/BoardEx_data/unzipped/Company_Profile_Stocks20100416.csv'
INTO TABLE dbo.tbl_company_profile_stocks
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
But in this case no rows are imported and the message is 0 row(s) affected Records: 0 Deleted: 0 Skipped: 0 Warnings: 0. So I try \n line endings instead. This imports something, but my integer values become zeros and the text becomes very widely spaced. The message is 14121 row(s) affected, 64 warning(s): 1366 Incorrect integer value: <snip> Records: 14121 Deleted: 0 Skipped: 0 Warnings: 28257.
If I open the flat text file in Sublime Text 3, the Encoding Helper package suggests that the file has UTF-16 LE with BOM encoding. If I repeat the above with UTF-16 default collation when I create the dbo schema, then my results are the same.
How can I fix this? Encoding drives me crazy!
Probably the main problem is that the LOAD DATA needs this clause (see reference):
CHARACTER SET ucs2
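For reference, the statement from the question with the clause in place (it goes between INTO TABLE and the FIELDS clause):
LOAD DATA LOCAL INFILE 'C:/Users/Richard/Dropbox/Research/BoardEx_data/unzipped/Company_Profile_Stocks20100416.csv'
INTO TABLE dbo.tbl_company_profile_stocks
CHARACTER SET ucs2
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;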
In case that does not suffice, ...
Can you get a hex dump of a little of the csv file? I want to make sure it is really ucs2. (ucs2 is very rare. Usually text is transferred in utf8.) If it looks readable when you paste text into this forum, then it is probably utf8 instead.
There is no "dbo" ("database owner"), only database, in MySQL.
Please provide SHOW CREATE TABLE tbl_Company_Profile_Stocks
(just a recommendation) Don't prefix table names with "tbl_"; it does more to clutter than to clarify.
Provide a PRIMARY KEY for the table.
@Rick James had the correct answer (i.e., set the encoding for LOAD DATA with the CHARACTER SET option). But in my case this didn't work, because MySQL doesn't support loading UCS-2 files.
Note
It is not possible to load data files that use the ucs2 character set.
Here are a few approaches that work here. In the end I went with SQLite rather than MySQL, but the last solution should work with MySQL, or any other DB that accepts flat files.
SQLiteStudio
SQLiteStudio was the easiest solution in this case. I prefer command line solutions, but the SQLiteStudio GUI accepts UCS-2 encoding and any delimiter. This keeps the data in UCS-2.
Convert to ASCII in Windows command line
The easiest conversion to ASCII is in the Windows command line with TYPE.
for %%f in (*.csv) do (
echo %%~nf
type "%%~nf.csv" > "%%~nf.txt"
)
This may cause problems with special characters. In my case it left in single and double quotes that caused some problems with the SQLite import. This is the crudest approach.
Convert to ASCII in Python
import codecs
import glob
import os

# convert each UTF-16LE CSV to ASCII, dropping quotes and any unconvertible characters
for fileOld in glob.glob('*.csv'):
    print 'Reading: %s' % fileOld
    fileNew = os.path.join('converted', fileOld)
    with codecs.open(fileOld, 'r', encoding='utf-16le') as old, codecs.open(fileNew, 'w', encoding='ascii', errors='ignore') as new:
        print 'Writing: %s' % fileNew
        for line in old:
            new.write(line.replace("\'", '').replace('"', ''))
This is the most extensible approach and would allow you more precisely control which data you convert or retain.
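On Linux or macOS (or with GNU tools on Windows), iconv can do a similar conversion in one line. A sketch, assuming the files really are UTF-16LE as Sublime suggested; converting to UTF-8 instead of ASCII keeps non-Latin characters, and the result can then be loaded with CHARACTER SET utf8:
# convert one file from UTF-16LE to UTF-8; loop over *.csv as needed
iconv -f UTF-16LE -t UTF-8 Company_Profile_Stocks20100416.csv > converted/Company_Profile_Stocks20100416.csv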

Error when migrating MySQL database to SQLite

I have access to a MySQL database hosted on a remote server. I am attempting to migrate this to a local SQLite database. To do this, I am using this script, as suggested by this question. The usage is
./mysql2sqlite mysqldump-opts db-name | sqlite3 database.sqlite
I tried doing exactly that (with no dump options) and sqlite3 returned an error:
Error: near line 4: near "SET": syntax error
So far, I have found that when I only specify one of my tables in the dump options like so
./mysql2sqlite db-name table-B | sqlite3 database.sqlite
It appears to work fine, but when I specify the first table (let's call it table-A) it returns this error. I'm pretty sure it's returning this error because of the output of mysql2sqlite. The 4th line (I guess the 4th logical line, or the command that starts on the 4th actual line) of the dump file looks like this:
CREATE TABLE "Association_data_interaction" (
"id" int(10) DEFAULT NULL,
...
"Comments" text CHARACTER SET latin1,
...
"Experiment" text CHARACTER SET latin1,
"Methods" text CHARACTER SET latin1,
...
);
With many other rows removed. I don't really know SQL that well, but as far as I can tell, the migration script is trying to output a dump file with commands that can create a new database, but the script has to translate between MySQL's output commands and the commands sqlite3 wants to create a database, and it is failing to properly handle the text fields. I know that when I run SHOW COLUMNS in the MySQL database, the Comments, Experiment, and Methods columns are of the "text" type. What can I do to make sqlite3 accept the database?
Note: I have editing access to the database, but I would much prefer to avoid that if at all possible. I do not believe I have administrative access to the database. Also, if it's relevant, the database has about 1000 tables, most of which have about 10,000 rows and 10-50 columns. I'm not too interested in the performance characteristics of the database; they're currently good enough for me.
That script is buggy; one of the bugs is that it expects a space before the final comma:
gsub( /(CHARACTER SET|character set) [^ ]+ /, "" )
Replace that line with:
gsub( /(CHARACTER SET|character set) [^ ]+/, "" )

MySQL Collation Issue

In my company, the tables in the database were poorly created. Each table has a different collation and charset.
This is very bad, sure, but it also makes queries lose a lot of performance, to the point that the server crashes (and it isn't even a great database...).
I would like to know if there are any good MySQL tools, commands or procedures for converting table collation and charset.
Just executing the ALTER TABLE and running the convert is breaking special characters. Is that normal, or am I doing something wrong?
EDIT:
As an example: I have a table finance with utf8 collation and a table expense with latin1 (Swedish collation). Each table has between 1000 and 5000 rows. The following query takes about 15 seconds to execute:
select ex.* from expense ex
inner join finance fin on fin.ex_id = ex.id
Executing much more complex queries on bigger tables runs much faster when the tables have the same collation.
EDIT 2:
Another error in the database: row ids are all varchar(15), not int.
I know the fun of inheriting legacy schemas created by folks who think 'collation' is some form of illness.
The best option is to export the table with its data to a SQL dump file using good ol' mysqldump. Then modify the CREATE statements manually in the dump file to set the character set and collation. I'm a big fan of utf8. If the dump file is huge, use command-line tools like sed to efficiently edit the file without having to open it in an editor.
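For example, a sketch of that sed edit (review the result before importing; this only rewrites the table-level defaults, not column-level CHARACTER SET clauses or the data itself):
# rewrite the table default charset in the dump in place (GNU sed)
sed -i 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g' dump.sql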
Then drop the existing table and re-import the modified dump.
Any other way you do this in my experience can be a roll of the dice.
This might be a good time to convert them all to the same storage engine as well or upgrade your MySQL server to 5.5.
I don't recommend using a "tool" to fix this.
BEFORE YOU DO ANYTHING DUMP YOUR DB TO HAVE A BACKUP IN CASE YOU MESS IT UP ;)
You can streamline your character sets and collations in two ways.
Method 1: Move your data
Create a completely new database with correct character sets and collations configured in all tables
Fill your new tables with INSERT SELECT statements
e.g.
INSERT INTO newdatabase.table SELECT * FROM olddatabase.table
MySQL will automatically convert your data into the correct character set
Method 2: Alter your tables
If you change the character set of an existing table, all existing contents will be converted as well
e.g.
old table
CREATE TABLE `myWrongCharsetTable` (
`name` varchar(255) COLLATE latin1_german1_ci NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
put some data in for demo
INSERT INTO `myWrongCharsetTable` (`name`) VALUES ( 'I am a latino string' );
INSERT INTO `myWrongCharsetTable` (`name`) VALUES ( 'Mein Name ist Müller' );
INSERT INTO `myWrongCharsetTable` (`name`) VALUES ( 'Mein Name ist Möller' );
SELECT * FROM myWrongCharsetTable INTO outfile '/tmp/mylatinotable.csv';
On a UTF-8 console I do this
# cat /tmp/mylatinotable.csv
I am a latino string
Mein Name ist M▒ller
Mein Name ist M▒ller
right, strange charset.. this is latin 1 displayed on a utf-8 console
# cat /tmp/mylatinotable.csv | iconv -f latin1 -t utf-8
I am a latino string
Mein Name ist Müller
Mein Name ist Möller
Yep, all good
So how do I fix this now??
ALTER TABLE myWrongCharsetTable
MODIFY name varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
DEFAULT CHARSET = utf8 COLLATE utf8_unicode_ci;
That's it :)
Writing the outfile again
mysql> SELECT * FROM myWrongCharsetTable INTO outfile '/tmp/latinoutf8.csv';
Query OK, 3 rows affected (0.01 sec)
mysql> exit
Bye
dbmaster-001 ~ # cat /tmp/latinoutf8.csv
I am a latino string
Mein Name ist Müller
Mein Name ist Möller
Worked, all fine and we're happy
EDIT:
There's actually another method
Method 3: Dump, modify and reload your data
If you're good with sed and awk you can automate this, or edit the file manually
# dump the structure, possibly routines and triggers
mysqldump -h yourhost -p -u youruser --no-data --triggers --skip-comments --routines yourdatabase > database_structure_routines.sql
# dump the data
mysqldump -h yourhost -p -u youruser --no-create-info --skip-triggers --skip-routines yourdatabase > database_data.sql
Now open the database_structure_routines.sql in an editor of your choice and modify the tables to your needs
I recommend removing conditional comments like /*!40101 SET character_set_client = utf8 */ from your dump file, because they can override the defaults you just set
When you're done, create a new database and structure
mysql > CREATE DATABASE `newDatabase` DEFAULT CHARSET utf8 COLLATE utf8_unicode_ci;
mysql > use `newDatabase`
mysql > SOURCE database_structure_routines.sql;
Don't forget to recheck your tables
mysql > SHOW CREATE TABLE `table`;
If that's all right, you can reimport your data; the charset conversion will again be done automatically
mysql -h yourhost -p -u youruser newDatabase < database_data.sql
Hope this helps
You could try using CONVERT or CAST to change the charset - create a new column and use CAST/CONVERT to fill the new column with the data in the corrected charset.
http://dev.mysql.com/doc/refman/5.0/en/charset-convert.html
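A sketch of that approach, using a hypothetical name column on the expense table from the question (the column names here are only illustrative):
-- add a new utf8 column and fill it from the old one; CONVERT(... USING ...) performs the charset conversion
ALTER TABLE expense ADD COLUMN name_utf8 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
UPDATE expense SET name_utf8 = CONVERT(name USING utf8);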

The dreaded MySQL import encoding issue - revisited

I'm having the standard MySQL import encoding issue, but I can't seem to solve it.
My client has had a WordPress installation running for some time. I've dumped the database to a file, and imported it locally. The resulting pages have a splattering of � characters throughout.
I've inspected the database properties on both sides:
production: show create database wordpress;
CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */
local: show create database wordpress;
CREATE DATABASE `wordpress` /*!40100 DEFAULT CHARACTER SET latin1 */
production: show create table wp_posts;
CREATE TABLE `wp_posts` (
`ID` bigint(20) unsigned NOT NULL auto_increment,
...
KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7932 DEFAULT CHARSET=utf8
local: show create table wp_posts;
CREATE TABLE `wp_posts` (
`ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
...
KEY `post_date_gmt` (`post_date_gmt`)
) ENGINE=MyISAM AUTO_INCREMENT=7918 DEFAULT CHARSET=utf8
I've spent hours reading forums on how to squash the �, but I can't get anything to work. 99% of the answers say to match the character set between the databases. What I think should work is the following:
mysqldump --opt --compress --default-character-set=latin1 -uusername -ppassword wordpress | ssh username@anotherserver.net mysql --default-character-set=latin1 -uusername -ppassword wordpress
I've done it using the utf8 char-set as well. Still with the �'s.
I've tried modifying the SQL dump directly, putting either utf8 or latin1 in the "SET NAMES" line. Still with the �'s.
Strange Symptoms
I'd expect these � characters to appear in place of special characters in the content, like ñ or ö, but I've seen it where there would normally be just a space. I've also seen it in place of apostrophes (but not all apostrophes), double quotes, and trademark symbols.
The � marks are pretty rare. They appear on average three to four times per page.
I don't see any �'s when viewing the database through Sequel Pro (locally or live). I don't see any �'s in the SQL when viewing through Textmate.
What am I missing?
EDIT
More info:
I've tried to determine what the live database thinks the encoding is. I ran SHOW TABLE STATUS, and it seems that the collations are a mix of utf8_general_ci, utf8_bin and latin1_swedish_ci. Why are they different? Does it matter?
I also ran: show variables like "character_set_database" and got latin1;
This is how I ended up solving my problem:
First mysqldump -uusername -ppassword --default-character-set=latin1 database -r dump.sql
Then run this script:
<?php
// replace "latin1" with "utf8" and encode every byte above 0x7F as an HTML entity
$search = array('/latin1/');
$replace = array('utf8');
foreach (range(128, 255) as $dec) {
    $search[] = "/\x".dechex($dec)."/";
    $replace[] = "&#$dec;";
}
$input = fopen('dump.sql', 'r');
$output = fopen('result.sql', 'w');
while (!feof($input)) {
    $line = fgets($input);
    $line = preg_replace($search, $replace, $line);
    fwrite($output, $line);
}
fclose($input);
fclose($output);
The script finds all the bytes above 127 and encodes them as their HTML entities (it also switches the latin1 references in the dump to utf8).
Then mysql -uusername -ppassword database < result.sql
A common problem with older WordPress databases, and even newer ones, is that the database tables get set as latin1 but the contents are actually encoded as UTF-8. If you try to export as UTF-8, MySQL will attempt to convert the (supposedly) latin1 data to UTF-8, resulting in double-encoded characters, since the data was already UTF-8.
The solution is to export the tables as latin1. Since MySQL thinks they are already latin1, it will do a straight export with no conversion.
Change the declared character set in the dump from 'latin1' to 'utf8'. Since the dumped data was not converted during the export process, it is actually UTF-8 encoded data.
Create your new table as UTF-8: if your CREATE TABLE command is in your SQL dump file, change its character set from 'latin1' to 'utf8'.
Import your data normally. Since you've got UTF-8 encoded data in your dump file, the declared character set in the dump file is now utf8, and the table you're importing into is utf8, so everything will go smoothly.
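Command-wise, that recipe looks roughly like this (a sketch; the database and user names are placeholders):
# dump as latin1 so MySQL does a straight byte copy (no conversion)
mysqldump --default-character-set=latin1 -uusername -p wordpress > wordpress.sql
# edit wordpress.sql: change "SET NAMES latin1" and "DEFAULT CHARSET=latin1" to utf8, then import
mysql --default-character-set=utf8 -uusername -p wordpress < wordpress.sql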
I was able to resolve this issue by modifying my wp-config.php as follows:
/** Database Charset to use in creating database tables. */
define('DB_CHARSET', 'utf8');
/** The Database Collate type. Don't change this if in doubt. */
define( 'DB_COLLATE', 'utf8_general_ci' );
I think you can fix this issue this way:
$link = mysql_connect('localhost', 'mysql_user', 'mysql_password');
$db = mysql_select_db('mysql_db', $link);
mysql_query('set names utf8', $link);