Periodically replacing a running MySQL database - mysql

I've got a large-ish MySQL database which contains weather information. The DB is about 3.5 million rows, and 486 MB in size.
I get a refresh of this data every 6 hours, in the form of a mysql dump file that I can import. It takes more than 2 minutes to import the data and probably a similar amount of time to create the index.
Any thoughts on how I can import this data while still keeping the DB available and not losing responsiveness? My first thought was to have two databases within the same MySQL instance. I'd be running off DB1 and would load data into DB2 and then switch. However, I'm concerned that the load process would make DB1 unresponsive (or significantly slow).
So, my next thought is to have two different MySQL instances running on different ports. While DB instance 1 is serving queries, DB instance 2 can be loaded with the next dataset. Then on the next query, the code switches to DB2.
That seems like it would work to me, but I wanted to check with people who have tried similar things in the past to see if there were any "gotchas" I was missing.
Thoughts?

Have two databases and switch between them after the import finishes each time.
Load on one database shouldn't make the other database unresponsive. 486 MB is small enough to fit in memory a couple of times over, unless perhaps you're on a small virtual server.
Even so, two MySQL instances on one server shouldn't perform any differently from two databases on one instance, except that two instances may actually take more memory and be more complicated to set up.
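As a minimal sketch of the swap, assuming the dump loads into a single table and the application always reads from one fixed database (all names here are hypothetical), you could import into a staging database and switch with an atomic RENAME TABLE:

    -- One-time setup: a staging database to import into (names are hypothetical).
    CREATE DATABASE IF NOT EXISTS staging;

    -- Each refresh: load the dump into staging, e.g. mysql staging < weather_dump.sql,
    -- and let the index build finish there, away from the live table.

    -- Then swap. RENAME TABLE can move tables between databases and performs
    -- all the renames in one atomic step, so readers of live.weather never see a gap.
    RENAME TABLE live.weather    TO live.weather_old,
                 staging.weather TO live.weather;

    -- Drop the previous copy once no long-running queries still reference it.
    DROP TABLE live.weather_old;

Because both renames happen in one statement, the heavy import and index build never touch the live table, and queries never see a moment where it is missing.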

Related

MySQL db reports a vastly different total size of table on prod and local

I've got a production database with a wp_options table reportedly totalling around 951,679,500,288 bytes (900+ GB) in data length. However, when I export the database and examine it locally, it only reports a few MB (usually 3-7 MB).
There are about 2,000-10,000 rows of data in this table. The reason the row count fluctuates is that a large amount of transient cache data is stored in this table and a cron job routinely removes it; that's why the two screenshots show different row counts. Otherwise, I have checked numerous times and the non-transient data is exactly the same in both environments.
It's like there's almost a TB of garbage data hiding in this table that I can't access or see, and it's only present on production. Staging and local environments with the same database operate just fine without the missing ~TB of data.
(Screenshots omitted: table summary on production, table summary on local, and a comparison of both database sizes.)
What could be causing the export of a SQL file to disregard 900 GB of data? I've exported SQL and CSV via Adminer as well as with the 'wp db export' command.
And how could there be 900GB of data on production that I cannot see or account for other than when it calculates the total data length of the table?
It seems like deleted rows have not been purged. You can try OPTIMIZE TABLE.
Some WP plugins create "options" but fail to clean up after themselves. Suggest you glance through that huge table to see what patterns you find in the names. (Yeah, that will be challenging.) Then locate the plugin, and bury it more than 6 feet under.
OPTIMIZE TABLE might clean it up. But you probably don't know what the setting of innodb_file_per_table was when the table was created. So, I can't predict whether it will help a lot, take a long time, not help at all, or even crash.
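As a hedged sketch of the cleanup, assuming the bloat really is transient cache rows (the LIKE patterns below are the standard WordPress transient prefixes; back up the table first):

    -- See what is actually filling the table, grouped by a rough option-name prefix:
    SELECT SUBSTRING_INDEX(option_name, '_', 2)  AS name_prefix,
           COUNT(*)                              AS row_count,
           SUM(LENGTH(option_value))             AS bytes_used
    FROM   wp_options
    GROUP  BY name_prefix
    ORDER  BY bytes_used DESC
    LIMIT  20;

    -- If transients dominate, delete them (WordPress regenerates what it needs):
    DELETE FROM wp_options
    WHERE  option_name LIKE '\_transient\_%'
       OR  option_name LIKE '\_site\_transient\_%';

    -- Then rebuild the table; the file only shrinks on disk if the table lives in
    -- its own .ibd (innodb_file_per_table), as noted above.
    OPTIMIZE TABLE wp_options;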

How to do a one-time load for 4 billion records from MySQL to SQL Server

We need to do the initial data copy of a table that has 4+ billion records from source MySQL (5.5) to target SQL Server (2014). The table in question is pretty wide at 55 columns, but none of them are LOBs. I'm looking for options for copying this data in the most efficient way possible.
We've tried loading via Attunity Replicate (which has worked wonderfully for tables not this large), but if the initial data copy fails, it starts over from scratch, losing whatever time was spent copying the data. With patching and the possibility of this table taking 3+ months to load, Attunity wasn't the solution.
We've also tried smaller batch loads with a linked server. This is working but doesn't seem efficient at all.
Once the data is copied we will be using Attunity Replicate to handle CDC.
For something like this I think SSIS would be the simplest option. It's designed for large loads, even as big as 1 TB. In fact, I'd recommend the MSDN article "We Loaded 1TB in 30 Minutes and so can you".
Doing simple things like dropping indexes and performing other optimizations like partitioning would make your load faster. While 30 minutes isn't a feasible time to shoot for, it would be a very straightforward task to have an SSIS package run outside of business hours.
My business doesn't have a load on the scale you do, but we do refresh databases of more than 100M rows nightly, and it takes no more than 45 minutes even though the process is poorly optimized.
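As a rough illustration of the index trick on the SQL Server side (table and index names here are hypothetical), disable the nonclustered indexes before the load and rebuild them once afterwards:

    -- Disable the nonclustered indexes so the bulk insert doesn't have to maintain them:
    ALTER INDEX IX_bigtable_col1 ON dbo.bigtable DISABLE;
    ALTER INDEX IX_bigtable_col2 ON dbo.bigtable DISABLE;

    -- ... run the SSIS package / bulk load here ...

    -- Rebuild everything once, after all the rows are in:
    ALTER INDEX ALL ON dbo.bigtable REBUILD;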
One of the most efficient ways to load a huge volume of data is to read it in chunks.
I have answered many similar questions for SQLite, Oracle, DB2 and MySQL. You can refer to one of them to get more information on how to do that using SSIS:
Reading Huge volume of data from Sqlite to SQL Server fails at pre-execute (SQLite)
SSIS failing to save packages and reboots Visual Studio (Oracle)
Optimizing SSIS package for millions of rows with Order by / sort in SQL command and Merge Join (MySQL)
Getting top n to n rows from db2 (DB2)
There are also other common suggestions, such as dropping indexes on the destination table and recreating them after the insert, creating the needed indexes on the source table, and using the fast-load option to insert the data.
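As a sketch of what one chunk might look like on the MySQL side, assuming the table has an indexed, ever-increasing id column (table and column names are hypothetical):

    -- One chunk of a keyset-paginated read; the package remembers the highest id copied
    -- so far, and a failed chunk can simply be retried from that point:
    SELECT id, col1, col2   -- plus the remaining columns
    FROM   bigtable
    WHERE  id > @last_id_loaded
    ORDER  BY id
    LIMIT  1000000;

Keyset pagination like this stays fast across billions of rows, unlike LIMIT ... OFFSET, and it gives you natural restart points if a load fails partway through.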

Rails Writes Take 100% Longer After Postgres Migration

I'm working on a migration from MySQL to Postgres on a large Rails app; most operations are performing at a normal rate. However, we have a particular operation that generates job records every 30 minutes or so. There are usually about 200 records generated and inserted, after which separate workers pick up the jobs and work on them from another server.
Under MySQL it takes about 15 seconds to generate the records, and then another 3 minutes for the worker to perform and write back the results, one at a time (so 200 more updates to the original job records).
Under Postgres it takes around 30 seconds, and then another 7 minutes for the worker to perform and write back the results.
The table being written to has roughly 2 million rows and a sequence-backed ID column.
I have tried tweaking checkpoint timeouts and sizes with no luck.
The table is heavily indexed and really shouldn't be any different than it was before.
I can't post code samples as it's a huge codebase, and without posting pages and pages of code it wouldn't make sense.
My question is, can anyone think of why this would possibly be happening? There is nothing in the Postgres log and the process of creating these objects has not changed really. Is there some sort of blocking synchronous write behavior I'm not aware of with Postgres?
I've added all sorts of logging in my code to spot errors or transaction failures, but I'm coming up with nothing; it just takes twice as long to run, which doesn't seem right to me.
The Postgres instance is hosted on AWS RDS on a M3.Medium instance type.
We also use New Relic, and it's showing nothing of interest here, which is surprising.
Why does your job queue contain 2 million rows? Are they all live, or have you not moved them to an archive table to keep your reporting simpler?
Have you used EXPLAIN on your SQL from a psql prompt or your preferred SQL IDE/tool?
Postgres is a completely different RDBMS than MySQL. It allocates and manipulates space differently, so it may need to be indexed differently.
Additionally there's a tool called pgtune that will suggest configuration changes.
edit: 2014-08-13
Also, rails comes with a profiler that might add some insight. Here's a StackOverflow thread about rails profiling.
You also want to watch your DB server at the disk IO level. Does your job fulfillment perform a large number of updates? Postgres creates a new row version when you update an existing row and marks the old version as dead, instead of overwriting the row in place. So you may be seeing a lot more IO as a result of your RDBMS switch.
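A quick, hedged way to check whether that row-versioning churn is the culprit (the jobs table name and values below are made up):

    -- How many dead row versions are piling up, and when autovacuum last ran:
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM   pg_stat_user_tables
    WHERE  relname = 'jobs';

    -- Time one of the worker's writes; EXPLAIN ANALYZE really executes the statement,
    -- so roll it back afterwards:
    BEGIN;
    EXPLAIN (ANALYZE, BUFFERS)
        UPDATE jobs SET status = 'done' WHERE id = 12345;
    ROLLBACK;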

Move very large MySQL table

I have a few very large MySQL tables on an Amazon Std EBS 1TB Volume (the file-per-table flag is ON and each ibd file is about 150 GB). I need to move all these tables from database db1 to database db2. Along with this, I would also like to move the tables out to a different Amazon Volume (which I think counts as a different partition/filesystem, even if the filesystem type is the same). The reason I am moving to another volume is to get another 1 TB of space.
Things I have tried:
RENAME TABLE db1.tbl1 TO db2.tbl1 does not help because I cannot move it out to a different volume. I cannot mount a Volume at db2 because then it is considered a different filesystem and MySQL fails with an error:
"Invalid cross-device link" error 18
Created a stub db2.tbl1, stopped mysql, deleted db2's tbl1 and copied over db1's tbl1.ibd. Doesn't work (the db information is buried in the .ibd?).
I do not want to try the obvious mysqldump import or SELECT ... INTO OUTFILE / LOAD DATA INFILE, because each table takes a day and a half to move even with most optimizations (foreign-key checks off, etc.). If I drop the indexes before the import, re-indexing takes a long time and the overall time is still too long.
Any suggestions would be much appreciated.
Usually what I would suggest in this case is to create an EC2 snapshot of the volume and restore that snapshot onto your larger volume.
You'll need to resize the partition afterwards.
As a side note, if your database is that large, EBS might be a major bottleneck. You're better off getting locally attached storage, but unfortunately the process is a bit different.
You might want to use Percona xtrabackup for this:
https://www.percona.com/doc/percona-xtrabackup/LATEST/index.html
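Another route worth considering, assuming the server is MySQL 5.6 or later (the feature does not exist in 5.5), is transportable tablespaces, which copy the .ibd files in a supported way and so avoid both the dump/reload cost and the cross-device rename problem:

    -- 1) In db2 (whose data directory sits on the new volume), create tbl1 with the
    --    exact same definition as db1.tbl1, then detach its empty tablespace:
    ALTER TABLE db2.tbl1 DISCARD TABLESPACE;

    -- 2) Quiesce the source table; this also writes a tbl1.cfg metadata file:
    USE db1;
    FLUSH TABLES tbl1 FOR EXPORT;

    -- 3) From the shell, copy (not rename) tbl1.ibd and tbl1.cfg from db1's directory
    --    into db2's directory; a plain copy works fine across filesystems.

    -- 4) Release the source table and attach the copied file on the destination:
    UNLOCK TABLES;
    ALTER TABLE db2.tbl1 IMPORT TABLESPACE;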

What if the database size limit is exceeded?

I was just wondering what will happen to my app if the MySQL database it runs on exceeds its size limit.
My hosting only allows 1 GB of space per database. I know that's plenty, but what if I make an app where people discuss something, and after many years the database exceeds the limit?
What will I do then? And approximately how much text data can be stored in 1 GB?
And can I have 2 databases behind one application? Like one database contains usernames, profiles and that sort of stuff, and the other contains questions and answers? And will that slow down the process of getting everything?
Update: can I set up MySQL on my own server to overcome the size limitation?
Thanks.
There will be no speed disadvantage from splitting your tables across two databases (assuming both databases are on the same MySQL server), but if the data are logically part of the same application then it is more sensible they be grouped together.
When you want to refer to a table in another database, you also have to qualify it with the appropriate database name, which you could see as an inefficiency.
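For example (database and table names here are hypothetical), a cross-database query is just a matter of qualifying the names:

    -- A query spanning both databases only needs qualified table names:
    SELECT u.username, q.title
    FROM   accounts_db.users    AS u
    JOIN   content_db.questions AS q ON q.author_id = u.id;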
My guess is that if you approach 1GB with two databases or with one, it's not going to make a difference how your host treats you (it shouldn't make a difference for MySQL, after all). I suggest you not worry about it unless you're going to be generating data like nobody's business, and in that case you require a more dedicated host.
If you figure out years down the line that you're coming to the limit, you can make a decision then whether to dump some of your older data or move to a host that permits you to store more data.
I don't think your application would stop working immediately when you hit 1GB. I think it more likely that your host would start writing you emails telling you off and suggesting you upgrade packages, or something.
Most of this is specific to your host. 1 GB is ~1 billion bytes (one letter usually = one byte). Having 2 databases will not slow anything down, as long as they're both on the same host and properly set up.
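If you want to keep an eye on how close you are to the 1 GB cap, a standard information_schema query will tell you (the database name below is hypothetical):

    -- Total size (data + indexes) of one database, in MB:
    SELECT table_schema,
           ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS size_mb
    FROM   information_schema.tables
    WHERE  table_schema = 'myapp'
    GROUP  BY table_schema;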