Table files transferred between servers flag the table as crashed - mysql

My workplace has a web site that uses large data sets, load balanced between two MySQL 5.6.16-64.2 servers using MyISAM, running on Linux (2.6.32-358.el6.x86_64 GNU/Linux). The data is updated hourly from a text-based file set received from an MS-SQL database. To avoid disrupting reads on the web site, and at the same time make sure the updates don't take too long, the following process was put in place:
Have the data on a third Linux box (used only for data update processing), update the different data tables as needed, move a copy of the physical table files to the production servers under a temporary name, and then do a table swap with MySQL's RENAME TABLE.
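For reference, the swap itself is a single RENAME TABLE statement, which MySQL applies atomically (database and table names here are placeholders):
mysql -e "RENAME TABLE mydb.big_table TO mydb.big_table_old, mydb.big_table_tmp TO mydb.big_table;"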
But every time, the table (under its temporary name) is seen by the destination MySQL servers as crashed and in need of repair. The repair takes too long, so forcing a repair before the table swap is not an option.
The processing is programmed in Ruby 1.8.7 with one thread per server (as an FYI, the same thing happens without threads against a single server).
The steps to perform the file copy are as follows:
Use Net::SFTP to transfer the files to a destination folder that is not the database folder (done because of permissions). Below is a code example of the file transfer for the main table files (if the table also has partition files, they are transferred separately and rcpFile is assigned differently to match the temporary name). For speed, the files are uploaded in parallel:
Net::SFTP.start(host_server, host_user, :password => host_pwd) do |sftp|
  uploads = fileList.map { |f|
    rcpFile = File.basename(f, File.extname(f)) + prcExt + File.extname(f)
    sftp.upload(f, "/home/#{host_user}/#{rcpFile}")
  }
  uploads.each { |u| u.wait }
end
Then the files are assigned to the mysql owner and group and moved into the MySQL database folder, using Net::SSH to execute sudo shell commands:
Net::SSH.start(host_server, host_user, :port => host_port.to_i, :password => host_pwd) do |ssh|
  doSSHCommand(ssh, "sudo sh -c 'chown mysql /home/#{host_user}/#{prcLocalFiles}'", host_pwd)
  doSSHCommand(ssh, "sudo sh -c 'chgrp mysql /home/#{host_user}/#{prcLocalFiles}'", host_pwd)
  doSSHCommand(ssh, "sudo sh -c 'mv /home/#{host_user}/#{prcLocalFiles} #{host_path}'", host_pwd)
end
The doSSHCommand method:
def doSSHCommand(ssh, cmd, pwd)
  result = ""
  ssh.open_channel do |channel|
    # request a pty so sudo will prompt for the password on this channel
    channel.request_pty do |c, success|
      raise "could not request pty" unless success
      channel.exec "#{cmd}" do |c, success|
        raise "could not execute command '#{cmd}'" unless success
        channel.on_data do |c, data|
          if (data[/\[sudo\]|Password/i]) then
            # answer the sudo password prompt
            channel.send_data "#{pwd}\n"
          else
            result += data unless data.nil?
          end
        end
      end
    end
  end
  ssh.loop
  result
end
If the same steps are done manually (using scp to transfer the files, changing the owner/group, and moving the files), the table is never marked as crashed. Comparing file sizes between scp and Net::SFTP shows no difference.
Other processing methods have been tried, but they take too long compared to the method described above. Does anyone have an idea why the tables are being marked as crashed, and whether there is a way to avoid it without having to run a table repair?

The tables are marked as crashed because you're probably getting race conditions as you copy the files. That is, there are writes pending to the tables while your Ruby script runs, so the resulting copy is incomplete.
The safer way to copy MyISAM tables is to run the SQL commands FLUSH TABLES followed by FLUSH TABLES WITH READ LOCK first, to ensure that all pending changes are written to the table on disk, and then block any further changes until you release the table lock. Then perform your copy script, and then finally unlock the tables.
This does mean that no one can update the tables while you're copying them. That's the only way you can ensure you get uncorrupted files.
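Roughly, on the server the files are copied from, the sequence could look like the sketch below; the copy has to run from the same client session that holds the lock, because the lock is released as soon as that session ends (database, table and paths are placeholders):
mysql mydb <<'SQL'
FLUSH TABLES big_table;
FLUSH TABLES big_table WITH READ LOCK;
-- copy the MyISAM files while the lock is held, via the mysql client's "system" command
system cp /var/lib/mysql/mydb/big_table.frm /var/lib/mysql/mydb/big_table.MYD /var/lib/mysql/mydb/big_table.MYI /data/staging/
UNLOCK TABLES;
SQL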
But I have to comment that it seems like you're reinventing MySQL replication. Is there any reason you're not using that? It could probably work faster, better, and more efficiently, incrementally and continually updating only the parts of the tables that have changed.

The issue was found and solved:
The process database had its table files copied from one of the production databases; the tables did not show as crashed on the process server, and there were no issues when querying or updating the data.
While searching the web, the following SO answer was found: MySQL table is marked as crashed
The guess was that when the tables were copied from production to the process server, the header info stayed the same and interfered when the files were copied back to the production servers during processing. So the tables were repaired on the process server and a few tests were run on our staging environment, where the issue had also been seen. Sure enough, that corrected the issue.
So the final solution was to repair the tables once on the process server before having the process script run hourly.
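For reference, a one-time repair like this can be done with a standard REPAIR TABLE per table on the process server (names are placeholders); with mysqld stopped, myisamchk --recover against the .MYI file does the same job directly on the MyISAM files:
mysql -e "REPAIR TABLE mydb.big_table;"
myisamchk --recover /var/lib/mysql/mydb/big_table.MYI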

I see you've already found an answer, but two things struck me about this question.
One, you should look at rsync, which gives you many more options, not the least of which is a speedier transfer, and it may suit this problem better. Transferring files between servers is basically why rsync exists.
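For example, something along these lines could replace the Net::SFTP upload step; the paths, host and temporary-name suffix are assumptions:
rsync -avz --partial /data/staging/big_table_tmp.* webuser@db1.example.com:/home/webuser/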
Second, and I'm not trying to re-engineer your system, but you may have outgrown MySQL. It may not be the best fit for this problem. This problem may be better served by Riak, where you have multiple nodes, or Mongo, where you can deal with large files and have multiple nodes. Just two thoughts I had while reading your question.

Related

MySQL can take more than an hour to start

I have a MySQL (Percona) 5.7 instance with over 1 million tables.
When I start the database, it can take more than an hour to start.
The error log doesn't show anything, but when I traced mysqld_safe, I found out that MySQL is doing a stat on every file in the DB.
Any idea why this may happen?
Also, please no suggestions to fix my schema; this is a black box.
Thanks
This turned out to be two issues (other than the millions of tables)!
When MySQL starts and a crash recovery is needed, as of 5.7.17 it needs to traverse your datadir to build its dictionary. This will be lifted in a future release (8.0), as MySQL will have its own catalog and will not rely on the datadir contents anymore. The doc states that this isn't done anymore, which is both true and false: it doesn't read the first page of the .ibd files, but it does a file stat. Filed Bug
Once (1) is finished, it starts a new step, "Executing 'SELECT * FROM INFORMATION_SCHEMA.TABLES;' to get a list of tables using the deprecated partition engine." That of course opens all the files again. Use disable-partition-engine-check if you think you don't need the check. Doc
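For example, in my.cnf (the variable exists as of MySQL 5.7.17; double-check it against your exact version):
[mysqld]
disable_partition_engine_check = ON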
All this can be observed using sysdig, a very powerful and handy dtrace-like tool.
sysdig proc.name=mysqld | grep "open fd="
Ok, now, it's time to reduce the number of files.

Transfer MySQL from development to production

I need to sync a development MySQL DB with the production one.
The production DB gets updated by user clicks and other data generated via the web.
The development DB gets updated with processing data.
What's the best practice to accomplish this?
I found some diff tools (e.g. mySQL diff), but they don't manage updated records.
I also found an application-based solution: http://www.isocra.com/2004/10/dumptosql/
but I'm not sure it's good practice, as in this case I would need to retest my code each time I add new InnoDB-related tables.
Any ideas?
Take a look at mysqldump. It may serve you well enough for this.
Assuming your tables are all indexed with some sort of unique key, you could do a dump and have it leave out the 'drop/create table' bits. Have it run as 'insert ignore' and you'll get the new data without affecting the existing data.
Another option would be to use the query part of mysqldump to dump only the new records from the production side. Again - have mysqldump leave off the 'drop/create' bits.
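For example (the database, table names and WHERE condition are just placeholders):
# dump data only, as INSERT IGNORE, so existing development rows are left alone
mysqldump -u produser -p --no-create-info --insert-ignore proddb > prod_data.sql
# or dump only the records matching a condition from selected tables
mysqldump -u produser -p --no-create-info --insert-ignore --where="id > 100000" proddb orders clicks > new_rows.sql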

Copying data from PostgreSQL to MySQL

I currently have a PostgreSQL database, because one of the pieces of software we're using only supports this particular database engine. I then have a query which summarizes and splits the data from the app into a more useful format.
In my MySQL database, I have a table which contains an identical schema to the output of the query described above.
What I would like to develop is an hourly cron job which will run the query against the PostgreSQL database, then insert the results into the MySQL database. During the hour period, I don't expect to ever see more than 10,000 new rows (and that's a stretch) which would need to be transferred.
Both databases are on separate physical servers, continents apart from one another. The MySQL instance runs on Amazon RDS - so we don't have a lot of control over the machine itself. The PostgreSQL instance runs on a VM on one of our servers, giving us complete control.
The duplication is, unfortunately, necessary because the PostgreSQL database only acts as a collector for the information, while the MySQL database has an application running on it which needs the data. For simplicity, we're wanting to do the move/merge and delete from PostgreSQL hourly to keep things clean.
To be clear - I'm a network/sysadmin guy - not a DBA. I don't really understand all of the intricacies necessary in converting one format to the other. What I do know is that the data being transferred consists of 1xVARCHAR, 1xDATETIME and 6xBIGINT columns.
The closest guess I have for an approach is to use some scripting language to make the query, convert results into an internal data structure, then split it back out to MySQL again.
In doing so, are there any particular good or bad practices I should be wary of when writing the script? Or - any documentation that I should look at which might be useful for doing this kind of conversion? I've found plenty of scheduling jobs which look very manageable and well-documented, but the ongoing nature of this script (hourly run) seems less common and/or less documented.
Open to any suggestions.
Use the same database system on both ends and use replication
If your remote end was also PostgreSQL, you could use streaming replication with hot standby to keep the remote end in sync with the local one transparently and automatically.
If the local end and remote end were both MySQL, you could do something similar using MySQL's various replication features like binlog replication.
Sync using an external script
There's nothing wrong with using an external script. In fact, even if you use DBI-Link or similar (see below) you probably have to use an external script (or psql) from a cron job to initiate replication, unless you're going to use PgAgent to do it.
Either accumulate rows in a queue table maintained by a trigger procedure, or make sure you can write a query that always reliably selects only the new rows. Then connect to the target database and INSERT the new rows.
If the rows to be copied are too big to comfortably fit in memory, you can use a cursor and read them with FETCH.
I'd do the work in this order (a minimal sketch of this flow follows the list):
Connect to PostgreSQL
Connect to MySQL
Begin a PostgreSQL transaction
Begin a MySQL transaction. If your MySQL is using MyISAM, go and fix it now.
Read the rows from PostgreSQL, possibly via a cursor or with DELETE FROM queue_table RETURNING *
Insert them into MySQL
DELETE any rows from the queue table in PostgreSQL if you haven't already.
COMMIT the MySQL transaction.
If the MySQL COMMIT succeeded, COMMIT the PostgreSQL transaction. If it failed, ROLLBACK the PostgreSQL transaction and try the whole thing again.
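A minimal shell sketch of that ordering, without the two-phase commit refinement described just below; the hosts, credentials, queue/target table names and the id column are all assumptions:
#!/bin/bash
# Hourly pull from PostgreSQL into MySQL; stops (and leaves the queue alone)
# if any step fails, so the next run can retry.
set -euo pipefail

# 1. Export the currently queued rows; COPY writes tab-separated text with \N
#    for NULL, which LOAD DATA on the MySQL side also understands.
psql -h pg.example.com -U collector -d collectdb \
     -c "COPY (SELECT * FROM sync_queue ORDER BY id) TO STDOUT" > /tmp/batch.tsv

# Remember the highest id exported so we later delete only what was copied.
max_id=$(cut -f1 /tmp/batch.tsv | tail -n 1)
[ -z "$max_id" ] && exit 0    # nothing queued this hour

# 2. Load the batch into MySQL (credentials are expected in ~/.my.cnf here).
mysql --local-infile=1 -h rds.example.com appdb \
      -e "LOAD DATA LOCAL INFILE '/tmp/batch.tsv' INTO TABLE target_table"

# 3. Only after a successful load, remove exactly the rows that were copied.
psql -h pg.example.com -U collector -d collectdb \
     -c "DELETE FROM sync_queue WHERE id <= $max_id"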
The PostgreSQL COMMIT is incredibly unlikely to fail because it's a local database, but if you need perfect reliability you can use two-phase commit on the PostgreSQL side, where you:
PREPARE TRANSACTION in PostgreSQL
COMMIT in MySQL
then either COMMIT PREPARED or ROLLBACK PREPARED in PostgreSQL depending on the outcome of the MySQL commit.
This is likely too complicated for your needs, but is the only way to be totally sure the change happens on both databases or neither, never just one.
BTW, seriously, if your MySQL is using MyISAM table storage, you should probably remedy that. It's vulnerable to data loss on crash, and it can't be transactionally updated. Convert to InnoDB.
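The conversion itself is one statement per table (the name is a placeholder), though it rewrites the table and can take a while on large ones:
mysql -e "ALTER TABLE appdb.target_table ENGINE=InnoDB;"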
Use DBI-Link in PostgreSQL
Maybe it's because I'm comfortable with PostgreSQL, but I'd do this using a PostgreSQL function that used DBI-link via PL/Perlu to do the job.
When replication should take place, I'd run a PL/PgSQL or PL/Perl procedure that uses DBI-Link to connect to the MySQL database and insert the data from the queue table.
Many examples exist for DBI-Link, so I won't repeat them here. This is a common use case.
Use a trigger to queue changes and DBI-link to sync
If you only want to copy new rows and your table is append-only, you could write a trigger procedure that appends all newly INSERTed rows into a separate queue table with the same definition as the main table. When you want to sync, your sync procedure can then in a single transaction LOCK TABLE the_queue_table IN EXCLUSIVE MODE;, copy the data, and DELETE FROM the_queue_table;. This guarantees that no rows will be lost, though it only works for INSERT-only tables. Handling UPDATE and DELETE on the target table is possible, but much more complicated.
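A rough sketch of that trigger setup on the PostgreSQL side (main_table, sync_queue and the function name are illustrative; the sync side is as described above):
psql -d collectdb <<'SQL'
-- queue table with the same row layout as the main table
CREATE TABLE sync_queue (LIKE main_table);

-- append every newly inserted row to the queue
CREATE FUNCTION queue_new_row() RETURNS trigger AS $$
BEGIN
    INSERT INTO sync_queue SELECT NEW.*;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER main_table_queue
    AFTER INSERT ON main_table
    FOR EACH ROW EXECUTE PROCEDURE queue_new_row();
SQL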
Add MySQL to PostgreSQL with a foreign data wrapper
Alternately, for PostgreSQL 9.1 and above, I might consider using the MySQL Foreign Data Wrapper, ODBC FDW or JDBC FDW to allow PostgreSQL to see the remote MySQL table as if it were a local table. Then I could just use a writable CTE to copy the data.
WITH moved_rows AS (
    DELETE FROM queue_table RETURNING *
)
INSERT INTO mysql_table
SELECT * FROM moved_rows;
In short, you have two scenarios:
1) Make the destination pull the data from the source into its own structure
2) Make the source push the data from its structure out to the destination
I'd rather try the second one: look around and find a way to create a PostgreSQL trigger, some special "virtual" table, or maybe a PL/pgSQL function. Then, instead of an external script, you'll be able to execute the procedure by running a query from cron, or possibly from inside Postgres; there are some options for scheduling operations there.
I'd choose the second scenario because Postgres is much more flexible, and when manipulating data in special, DIY ways you simply have more possibilities.
An external script probably isn't a good solution, e.g. because you will need to treat binary data with special care, and convert dates and times from DATE to VARCHAR and then back to DATE. Inside an external script, various text-stored data will probably be plain strings, and you will need to quote them as well.

mysqldump without interrupting live production INSERT

I'm about to migrate our production database to another server. It's about 38 GB and uses MyISAM tables. Since I have no physical access to the new server's file system, we can only use mysqldump.
I have looked through this site to see whether an online mysqldump backup will bring down our production website. According to this post: Run MySQLDump without Locking Tables, mysqldump will obviously lock the DB and prevent inserts. But after a few tests, I found that it shows otherwise.
If I use
mysqldump -u root -ppassword --flush-logs testDB > /tmp/backup.sql
mysqldump will by default use '--lock-tables', which takes READ LOCAL locks (refer to the MySQL 5.1 docs), so concurrent inserts are still possible. I ran a loop that inserted into one of the tables every second while mysqldump took one minute to complete, and a record was inserted every second during that period. That means mysqldump does not interrupt the production server and INSERTs can still go on.
Has anyone had a different experience? I want to make sure of this before carrying on with my production server, so I would be glad to know if I have done anything wrong that makes my test incorrect.
[My version of mysql-server is 5.1.52, and mysqldump is 10.13]
Now, you may have a database with disjoint tables, or a data warehouse where nothing is normalized (at all) and there are no links whatsoever between the tables. In that case, any dump would work.
I ASSUME that a production database containing 38 GB of data contains graphics in some form (BLOBs), and that you then, ubiquitously, have links from other tables. Right?
Therefore, as far as I can see it, you are at risk of losing important links between tables (usually primary/foreign key pairs): you may capture one table at the point of being updated/inserted into while its dependent (which uses that table as its primary source) has not been updated yet. Thus, you will lose the so-called integrity of your database.
More often than not, it is extremely cumbersome to re-establish integrity, usually because the system using/generating/maintaining the database was not built as a transaction-oriented system, so relationships in the database cannot be tracked except via the primary/foreign key relations.
Thus, you may well get away with copying your tables without locks, and with many of the other proposals above, but you are at risk of burning your fingers; depending on how sensitive the system's operations are, you may burn yourself severely or just get a surface scratch.
Example: if your database is a mission-critical system containing recommended heart rates for life-support devices in an ICU, I would think more than twice before making the migration.
If, however, the database contains pictures from Facebook or a similar site, you may be able to live with the consequences of anything from 0 up to 129,388 lost links :-).
Now, so much for the analysis. Solution:
YOU WOULD HAVE to create software which does the dump for you with full integrity, table set by table set, tuple by tuple. You need to identify the cluster of data which can be copied from your current online 24/7/365 database to your new database, copy it, and then mark it as copied.
IFFF changes then occur to the records you have already copied, you will need to copy those again. It can be a tricky affair.
IFFF you are running a more advanced version of MySQL, you can actually create another site and/or a replica, or a distributed database, and get away with it that way.
IFFF you have a window of, let's say, 10 minutes (which you can create if you need it), then you can also just COPY the physical files located on the drive; for MyISAM that means the .frm, .MYD and .MYI files. You can shut the server down for a few minutes and then copy.
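A minimal sketch of that cold-copy window (the service name and paths are assumptions):
service mysql stop
cp -a /var/lib/mysql/mydb /backup/mydb-$(date +%F)
service mysql start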
Now to a cardinal question:
You need to do maintenance on your machines from time to time. Doesn't your system have room for that kind of operation? If not, what will you do when the hard disk crashes? Pay attention to the 'when', not the 'if'.
1) Use of --opt is the same as specifying --add-drop-table, --add-locks, --create-options, --disable-keys, --extended-insert, --lock-tables, --quick, and --set-charset. All of the options that --opt stands for also are on by default because --opt is on by default.
2) mysqldump can retrieve and dump table contents row by row, or it can retrieve the entire content from a table and buffer it in memory before dumping it. Buffering in memory can be a problem if you are dumping large tables. To dump tables row by row, use the --quick option (or --opt, which enables --quick). The --opt option (and hence --quick) is enabled by default, so to enable memory buffering, use --skip-quick.
3) --single-transaction: this option issues a BEGIN SQL statement before dumping data from the server (for transactional tables such as InnoDB).
If your schema is a combination of both InnoDB and MyISAM, the following example will help you:
mysqldump -uuid -ppwd --skip-opt --single-transaction --max_allowed_packet=512M db > db.sql
I've never done it before but you could try --skip-add-locks when dumping.
Though it might take longer, you could dump in several batches, each of which would take very little time to complete. Adding --skip-add-drop-table would allow you to upload these multiple smaller dumps into the same table without re-creating it. Using --extended-insert would make the SQL file smaller to boot.
Possibly try something like mysqldump -u ${user} -p${password} --skip-add-drop-table --extended-insert --where='id between 0 and 20000' test_db test_table > test.sql. You would need to dump the table structures and upload them first in order to do it this way, or remove the --skip-add-drop-table for the first dump.
mysqldump doesn't add --lock-tables by default. Try to use --lock-tables
Let me know if it helped
BTW, you should also use --add-locks, which will make your import faster!

I/O operations are blocking mysql transactions

Here we have an x64 Debian Lenny box with MySQL 5.1.47 and some InnoDB databases on it. The ibdata files and other stuff are on the same file system (ext3). I've noticed that in some situations there are many processes in the MySQL process list which hang in the "freeing items" state. This happens when I do the following in a shell (file1 and file2 are about 2.5 GB each):
cat file1 file2 >new_file
or execute the following SQL statement
SELECT 'name' AS col UNION SELECT col FROM db_name.table_name INTO OUTFILE ('/var/xxx/yyy')
When one of these two things is running, I can see many MySQL processes stuck endlessly in the "freeing items" state (I'm using innotop). When I kill the shell process (or the SQL statement), the blocked transactions disappear.
On the internet I found some hints about disabling the InnoDB adaptive hash index and the query cache, but this doesn't help. Has anyone had the same experience?
Regards
We've found turning on the deadline I/O scheduler to be of great help in keeping our DB from starving during high external load on the file system. Try
echo deadline > /sys/block/sda/queue/scheduler
and test whether the problem gets smaller. (Replace sda with the device your DB is on.)
Just stepping back a bit, have you done some basic InnoDB tuning? The MySQL defaults can be pretty limiting. A good overview:
http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/
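Typical starting points from that kind of overview look something like this in my.cnf; the values are purely illustrative and depend on how much RAM the box has:
[mysqld]
innodb_buffer_pool_size = 4G
# on 5.1, changing the log file size requires a clean shutdown and removing the old ib_logfile* first
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table = 1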
The interesting part is:
all databases on this server are kept in one InnoDB file (innodb_file_per_table is disabled)
we wrote a small shell script that inserts 10 rows per second into a new table in a new database on this server
then we copied some big files via "cat" (as described above) in parallel...
and: nothing happened, no locks, no endless "freeing items"
so we changed the shell script to insert the rows into a new table in the existing database, which causes these problems, and copied the files again
and: the locks are back! huh?
last try: we copied the "buggy" database on the server itself into a new database (with full structure and data) and started the shell script again
and: no "freeing items" processes!?
I don't understand this behavior: inserting into the old database makes these "freeing items" processes appear, but inserting into a new database (a full copy of the old one) is fine?