Best way to archive live MySQL database

We have a live MySQL database that is 99% INSERTs, around 100 per second. We want to archive the data each day so that we can run queries on it without affecting the main, live database. In addition, once the archive is completed, we want to clear the live database.
What is the best way to do this without (if possible) locking INSERTs? We use INSERT DELAYED for the queries.

http://www.maatkit.org/ has mk-archiver
archives or purges rows from a table to another table and/or a file. It is designed to efficiently “nibble” data in very small chunks without interfering with critical online transaction processing (OLTP) queries. It accomplishes this with a non-backtracking query plan that keeps its place in the table from query to query, so each subsequent query does very little work to find more archivable rows.
Another option is simply to create a new database table each day. MyISAM has some advantages here, since INSERTs at the end of the table generally don't block anyway, and there is a MERGE table type to bring them all back together. A number of websites log their httpd traffic to tables like that.
With MySQL 5.1, there are also partitioned tables that can do much the same.
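A minimal sketch of the MERGE approach, with hypothetical daily log tables (the schema and names are illustrative, not from the question):

-- Each day gets its own MyISAM table with an identical definition.
CREATE TABLE log_2024_06_01 (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    logged_at DATETIME NOT NULL,
    message VARCHAR(255)
) ENGINE=MyISAM;

CREATE TABLE log_2024_06_02 LIKE log_2024_06_01;

-- A MERGE table presents the daily tables as one for querying;
-- INSERT_METHOD=LAST routes inserts to the most recent table.
CREATE TABLE log_all (
    id INT NOT NULL AUTO_INCREMENT,
    logged_at DATETIME NOT NULL,
    message VARCHAR(255),
    INDEX (id)
) ENGINE=MERGE UNION=(log_2024_06_01, log_2024_06_02) INSERT_METHOD=LAST;

Archiving then becomes a matter of adjusting the UNION list and dropping or dumping old daily tables, rather than deleting rows from a live table.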

I use MySQL partitioned tables and I've achieved wonderful results in all respects.

Sounds like replication is the best solution for this. After the initial sync the slave gets updates via the Binary Log, thus not affecting the master DB at all.
More on replication.

mk-archiver is an elegant tool for archiving MySQL data.
http://www.maatkit.org/doc/mk-archiver.html

MySQL replication would work perfectly for this.
Master -> the live server.
Slave -> a different server on the same network.
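A minimal sketch of wiring the slave up, assuming the initial data copy has already been loaded onto it and the master's binlog coordinates noted (the hostname, user, password, and coordinates below are placeholders):

-- Run on the slave; the master needs log-bin enabled and a unique server-id.
CHANGE MASTER TO
    MASTER_HOST = 'live-server',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'replpass',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 4;
START SLAVE;

-- Confirm both replication threads are running and check the lag.
SHOW SLAVE STATUS\G

From then on, all archive queries run against the slave and never touch the live master.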

Could you keep two mirrored databases around? Write to one and keep the second as an archive. Switch every, say, 24 hours (or however long you deem appropriate). Into the database that was the archive, insert all of today's activity; then the two databases match again. Use this as the new live DB. Take the archived database and do whatever you want with it: you can back it up, extract from it, or read it all you want now that it's not being actively written to.
It's kind of like having mirrored RAID, where you can take one drive offline for backup, resync it, then take the other drive out for backup.
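One way to implement the same swap idea at the table level rather than with two whole databases is MySQL's atomic multi-table RENAME; a sketch with hypothetical table names:

-- Prepare an empty table with the same structure as the live one.
CREATE TABLE events_next LIKE events;

-- Swap both names in a single atomic operation; inserts now land in the
-- fresh, empty "events" table while "events_archive" holds the old data.
RENAME TABLE events TO events_archive, events_next TO events;

The archived table can then be queried, dumped, or cleared without touching the live insert path.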

Related

Giant unpartitioned MySQL table issues

I have a MySQL table which is about 8TB in size. As you can imagine, querying is horrendous.
I am thinking about:
Create a new table with partitions
Loop through a series of queries to dump data into those partitions
But the loop will require lots of queries to be submitted, and each will be REALLY slow.
Is there a better way to do this? Repartitioning a production database in situ isn't going to work; this seemed like an OK option, but it's slow.
And is there a tool that will make life easier, rather than a Python job looping and submitting jobs?
Thanks a lot in advance
You could use pt-online-schema-change. This free tool allows you to partition the table with an ALTER TABLE statement, but it does not block clients from using the table while it's restructuring it.
Another useful tool could be pt-archiver. You would create a new table with your partitioning idea (a sketch of what that might look like is below), then use pt-archiver to gradually copy or move data from the old table to the new table.
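A minimal sketch of what the new partitioned table might look like, assuming a date-based layout; the column names, types, and partition boundaries are purely illustrative:

CREATE TABLE events_partitioned (
    id BIGINT NOT NULL AUTO_INCREMENT,
    created_at DATETIME NOT NULL,
    payload TEXT,
    -- The partitioning column must be part of every unique key,
    -- which is why created_at appears in the primary key.
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p2022 VALUES LESS THAN (TO_DAYS('2023-01-01')),
    PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);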
Of course try out using these tools in a test environment on a much smaller table first, so you get some practice using them. Do not try to use them for the first time on your 8TB table.
Regardless of what solution you use, you are going to need enough storage space to store the entire dataset twice, plus binary logs. The old table will not shrink, even as you remove data from it. So I hope your filesystem is at least 24TB. Or else the new table should be stored on a different server (or ideally several other servers).
It will also take a long time no matter which solution you use. I expect at least 4 weeks, and perhaps longer if you don't have a very powerful server with direct-attached NVMe storage.
If you use remote storage (like Amazon EBS) it may not finish before you retire from your career!
In my opinion, 8TB for a single table is a problem even if you try partitioning. Partitioning doesn't magically fix performance, and it can make some queries worse. Do you have experience with querying partitioned tables? Do you understand how partition pruning works, and when it doesn't?
Before you choose partitioning as your solution, I suggest you read the whole chapter on partitioning in the MySQL manual (https://dev.mysql.com/doc/refman/8.0/en/partitioning.html), especially the page on limitations (https://dev.mysql.com/doc/refman/8.0/en/partitioning-limitations.html). Then try it out with a smaller table.
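For example, with a table like the one sketched above, the partitions column of EXPLAIN shows whether pruning happens (a hypothetical illustration):

-- Filtering on the partitioning column lets the optimizer prune to one partition.
EXPLAIN SELECT COUNT(*)
FROM events_partitioned
WHERE created_at >= '2023-03-01' AND created_at < '2023-04-01';
-- partitions: p2023

-- A predicate on a non-partitioning column still has to read every partition.
EXPLAIN SELECT COUNT(*)
FROM events_partitioned
WHERE payload LIKE '%error%';
-- partitions: p2022,p2023,pmax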
A better strategy than partitioning for data at this scale is to split the data into shards, and store each shard on one of multiple database servers. You need a strategy for adding more shards as I assume the data will continue to grow.

Performing Heavy Crunching On a Table Without Affecting the Table

I'm looking for some general advice on the best way to perform heavy crunching/data-mining on a database table, without affecting the performance of regular site queries on the table. Some of the calculations may involve joining several tables, and involve complex sorting and ordering. So "use better indexes" isn't always the solution.
This question isn't really specific; I'm looking for a general way to solve a problem that's come up many times over the years, so I don't have a specific table schema or query to show. I've considered dumping the table first using mysqldump, then re-importing it under a different name and performing my heavy crunching on that temp table. My sysadmin hates the idea, so I'm looking for any other solutions people have come up with for this type of problem.
If your "heavy crunching" is all read only and you are not doing anything that needs to be written back into your production data, use a Master/Slave replication and use the Slave for all your reporting and data analysis needs. The replication link will keep the values up to date on the Slave, and you can hit the Slave with as much load as you want without slowing down the Master which is serving your production system.
If you want to avoid affecting performance of your production database, the only solution I have used previously is to run your queries on another database server.
I would take a backup of the entire database and then restore it on a separate server.
Obviously, you cannot do this if you want to analyze real-time data. But for most analysis, a snapshot from the previous day is sufficient.

MySQL: time needed to add an index to a column in an InnoDB table of about 100,000 rows?

I'm curious how long it will take to add an index to a column in a table that is not empty and has about 100,000 rows. I understand it depends, but is it on the order of 0.1 seconds, 1 second, or 10 minutes? Should I stop any actions from users while indexing the column?
The reason I ask is that I don't know yet which columns should be indexed (to get an answer I should measure it in practice, as the table grows), because the table is almost empty and the project hasn't launched yet.
Thank you.
1) Create an SQL dump, load the data into another test database, and try it out.
2) Yes, you should prevent user actions during any sort of DB maintenance. Even if the update is fast, something could go wrong, and you should give yourself scope for the unexpected. It's also a good practice to get into.
3) The best way to figure out which indexes are needed doesn't require much data:
a) Work out the most frequently called and most data-intensive queries from a design point of view. Establish what they filter and join on, and the index requirements should be a logical next step.
b) Learn how to read query plans and look out for table scans, which you want to avoid (see the sketch below).
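A minimal sketch of that workflow, with hypothetical table and column names:

-- Before: a plan with type ALL means a full table scan.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Add the candidate index.
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- After: the plan should now show type ref and key idx_orders_customer,
-- meaning the index is actually being used.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;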
If you're not sure which fields to index, my first suggestion would be to read about indexes on the MySQL web site.
Secondly, I would suggest taking a dump of your data, inserting it into a TEST environment that is completely separate from your live environment, and performing your tests on that machine.
You can even consider creating a slave instance for your live MySQL instance. You stop the slave, do your indexing on it, then wait for it to synchronize with the master again. Then you can switch the master and slave instances so that your live environment isn't affected, and finally index the ex-master instance. To do this, though, you have to know master-slave replication and how to convert a master into a slave and vice versa.
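A minimal sketch of the indexing step on the slave (the table and index names are placeholders; the promotion itself is omitted):

-- On the slave: pause replication, build the index, then resume and let it catch up.
STOP SLAVE;
ALTER TABLE users ADD INDEX idx_users_email (email);
START SLAVE;

-- Watch Seconds_Behind_Master drop back to 0 before switching roles.
SHOW SLAVE STATUS\G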

Reducing priority of MySQL commands/jobs (add an index/other commands)?

We have a moderately large production MySQL database. Periodically, we will run commands, usually via a rails migration, that while running, bog down the database. As a specific example, we might add an index to a large table.
Is there any method that can reduce the priority MySQL gives to a specific task? A sort of "nice" within MySQL itself? I found this, which is what inspired the question:
PostgreSQL tips and tricks
Since adding an index causes the work to be done within the DB and MySQL process, lowering the priority of the Rails migration process seems like it won't help. Are there other ways we can lower the priority?
We use multiple, replicated database servers to make changes like this.
In our case, db1 is the master, replicated to db2. (db1->db2).
Start by making the change to db2. If things lock, replication will stall, but that's OK.
Move your traffic to db2. Any remnant traffic going to db1 will replicate over, so you won't lose anything.
Once there's no traffic on db1, rebuild it as a slave of db2 (db2->db1).
That's the general idea and you get very little downtime and you don't have to pull an all-nighter! We actually have three servers, so it's a little more complicated, but not much.
Good luck.
Unfortunately, there is no simple way to do this: commands that alter the database structure don't have a priority option.
If your tables are MyISAM, you can try this:
mysqlhotcopy to make a backup of the table
import that backup into a different database server (one that's not under load)
make the changes there
make a mysqlhotcopy backup of the altered table
import it into the live server
Note that this may or may not be faster than adding the index on the live server, depending on the time it takes you to transfer the table back and forth.

Replication with lots of temporary table writes

I've got a database which I intend to replicate for backup reasons (performance is not a problem at the moment).
We've set up the replication correctly and tested it and all was fine.
Then we realized that it replicates all the writes to the temporary tables, which in effect meant that replication of one day's worth of data took almost two hours for the idle slave.
The reason for that is that we recompute some of the data in our db via cronjob every 15 mins to ensure it's in sync (it takes ~3 minutes in total, so it is unacceptable to do those operations during a web request; instead we just store the modifications without attempting to recompute anything while in the web request, and then do all of the work in bulk). In order to process that data efficiently, we use temporary tables (as there's lots of interdependencies).
Now, the first problem is that temporary tables do not persist if we restart the slave while it's in the middle of processing transactions that use that temp table. That can be avoided by not using temporary tables, although this has its own issues.
The more serious problem is that the slave could easily catch up in less than half an hour if it weren't for all that recomputation (which it does one run after another, so there's no benefit to rebuilding the data every 15 minutes... you can literally see it stuck at, say, 11:15, only to quickly catch up and get stuck again at 11:30, etc.).
One solution we came up with is to move all that recomputation out of the replicated db, so that the slave doesn't replicate it. But it has disadvantages in that we'd have to prune the tables it eventually updates, making our slave in effect "castrated", ie. we'd have to recompute everything on it before we could actually use it.
Did anyone have a similar problem and/or how would you solve it? Am I missing something obvious?
I've come up with a solution. It makes use of the replicate-do-db option mentioned by Nick. Writing it down here in case somebody has a similar problem.
The problem with just using the replicate-(wild-)do* options in this case (like I said, we use temp tables to repopulate a central table) is that either you ignore the temp tables and repopulate the central one with no data (which causes further problems, as all the queries relying on the central table being up to date will produce different results), or you ignore the central table, which has a similar problem. Not to mention that you have to restart MySQL after adding any of those options to my.cnf. We wanted something that would cover all those cases (and future ones) without the need for any further restarts.
So, what we decided to do is to split the database into the "real" and a "workarea" databases. Only the "real" database is replicated (I guess you could decide on a convention of table names to be used for replicate-wild-do-table syntax).
All the temporary-table work happens in the "workarea" DB, and to avoid the dependency problem mentioned above, we don't populate the central table (which sits in the "real" DB) by INSERT ... SELECT or RENAME TABLE, but rather query the tmp tables to generate a sort of diff against the live table (i.e. generate INSERT statements for new rows, DELETEs for the old ones, and UPDATEs where necessary).
This way the only queries that are replicated are exactly the updates that are required, nothing else; some (most?) of the recomputation queries happening every fifteen minutes might not even make their way to the slave, and the ones that do are minimal and not computationally expensive at all, just simple INSERTs and DELETEs.
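A minimal sketch of the diff step, assuming a replicated table `real`.central and a freshly recomputed copy workarea.central_new (the database, table, and column names are illustrative):

-- Rows present in the recomputed copy but missing from the live table;
-- these become plain INSERT statements against the central table.
SELECT n.id, n.value
FROM workarea.central_new AS n
LEFT JOIN `real`.central AS c ON c.id = n.id
WHERE c.id IS NULL;

-- Rows that have disappeared from the recomputed copy; these become DELETEs.
SELECT c.id
FROM `real`.central AS c
LEFT JOIN workarea.central_new AS n ON n.id = c.id
WHERE n.id IS NULL;

-- Rows whose values changed; these become UPDATEs.
SELECT n.id, n.value
FROM workarea.central_new AS n
JOIN `real`.central AS c ON c.id = n.id
WHERE c.value <> n.value;

Only the generated INSERT/DELETE/UPDATE statements touch the replicated "real" database, so they are all the slave ever sees.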
In MySQL, as of 5.0 I believe, you can do table wildcards to replicate specific tables. There are a number of command-line options that can be set but you can also do this via your MySQL config file.
[mysqld]
replicate-do-db = db1
replicate-do-table = db2.mytbl2
replicate-wild-do-table= database_name.%
replicate-wild-do-table= another_db.%
The idea is that you tell it not to replicate any tables other than the ones you specify.