I am having a huge amount of problems moving a large mysql database and would love to know anyones experience with such.
Situation:
Server 1 is housing the Mysql database. Size: 3TB
Attempts to move it:
1. Mysqldump does NOT work. It simply takes too long (I waited 3 days and got a ~100 gig sql file and it didn't seem like it was going to finish).
2. Attempts to copy directly the data directory, but ensuring it works and is consistent is a very difficult for me, and this manual process seems to be risky.
3. Which lead me to mysqldbcopy
I found the mysqldbcopy command and is wondering if anyone knows the perforamnce and how it works internally. Does anyone have any suggestions?
Can you provide more information? Do you mean to say, your set up has only one server with all the data?
If its a MyISAM table copy the files will work. I have tried this.
Have you considered setting up a slave DB, but this would require work like enabling replication logs etc. (This won't be too much work)
I once remember in a master slave db set up doing something like Load data from master, for the slave and that the master had hundreds of millions of rows.
If you can setup master-slave db setup, you might be able to continue to use the existing db while the Slave instance catches up, if this
option is available.
This might be a good example of master slave replication, to look at: https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-in-mysql
I have never played with mysqldbcopy, but I would love to know more.
Related
We hired an intern and want to let him play around with our data to generate useful reports. Currently we just took a database snapshot and created a new RDS instance that we gave him access to. But that is out of date almost immediately due to changes on the production database.
What we'd like is a live (or close-to-live) mirror of our actual database that we can give him access to without worrying about him modifying any real data or accidentally bringing down our production database (eg by running a silly query like SELECT (*) FROM ourbigtable or a really slow join).
Would a read replica be suitable for this purpose? It looks like it would at least be staying up to date but I'm not clear what would happen if a read replica went down or if data was accidentally changed on it or any other potential liabilities.
The only thing I could find related to this was this SO question and this has me a bit worried (emphasis mine):
If you're trying to pre-calculate a lot of data and otherwise modify
what's on the read replica you need to be really careful you're not
changing data -- if the read is no longer consistent then you're in
trouble :)
TL;DR Don't do it unless you really know what you're doing and you
understand all the ramifications.
And bluntly, MySQL replication can be quirky in my experience, so even
knowing what is supposed to happen and what does happen if there's as
the master tries to write updated data to slave you've also
updated.... who knows.
Is there any risk to the production database if we let an intern have at it on an unreferenced read replica?
We've been running read-replicas of our production databases for a couple years now without any significant issues. All of our sales, marketing, etc. people who need the ability to run queries are provided access to the replica. It's worked quite well and has been stable for the most part. The production databases are locked down so that only our applications can connect to it, and the read-replicas are accessible only via SSL from our office. Setting up the security is pretty important since you would be creating all the user accounts on the master database and they'd then get replicated to the read-replica.
I think we once saw a read-replica get into a bad state due to a hardware-related issue. The great thing about read-replicas though is that you can simply terminate one and create a new one any time you want/need to. As long as the new replica has the exact same instance name as the old one its DNS, etc. will remain unchanged, so aside from being briefly unavailable everything should be pretty much transparent to the end users. Once or twice we've also simply rebooted a stuck read-replica and it was able to eventually catch up on its own as well.
There's no way that data on the read-replica can be updated by any method other than processing commands sent from the master database. RDS simply won't allow you to run something like an insert, update, etc. on a read-replica no matter what permissions the user has. So you don't need to worry about data changing on the read-replica causing things to get out of sync with the master.
Occasionally the replica can get a bit behind the production database if somebody submits a long running query, but it typically catches back up fairly quickly once the query completes. In all our production environments we have a few monitors set up to keep an eye on replication and to also check for long running queries. We make use of the pmp-check-mysql-replication-delay command in the Percona Toolkit for MySQL to keep an eye on replication. It's run every few minutes via Nagios. We also have a custom script that's run via cron that checks for long running queries. It basically parses the output of the "SHOW FULL PROCESSLIST" command and sends out an e-mail if a query has been running for a long period of time along with the username of the person running it and the command to kill the query if we decide we need to.
With those checks in place we've had very little problem with the read-replicas.
The MySQL replication works in a way that what happens on the slave has no effect on the master.
A replication slave asks for a history of events that happened on the master and applies them locally. The master never writes anything on the slaves: the slaves read from the master and do the writing themselves. If the slave fails to apply the events it read from the master, it will stop with an error.
The problematic part of this style of data replication is that if you modify the slave and later modify the master, you might have a different value on the slave than on the master. This can be avoided by turning on the global read_onlyvariable.
I am trying to build a website that uses MySQL DB. What I am trying to do is make my database accessed by two servers, which means when server 1 is down server 2 can access the same database and the website continues working normally. I've read about multimaster replication but it does not seem to be what I need. And what happens when using a master slave replication and the master server goes down ? How it can be restored ?
Thanks for your help.
I think the master slave pattern is exactly what you're looking for. The master handles all the writes and the slaves handle all the reads. If your cloud hosting with someone like Rackspace or AWS they make it very easy to set up the data replication across each mode. As for your last sub question about what happens if the master goes down, I believe it is pretty straight forward to set up fallbacks for that too. There are likely several approaches but at the most basic level I know you can set up multiple db nodes (with a fallback algorithm) just like any other instance.
A final note... If its your first time doing this I highly recommend Rackspace because their support is amazing and they make a huge effort when you start to explain all your option and help you pick the best strategy.
Ps: retreading your question, it's a little unclear what you're trying to accomplish. You mention two servers accessing one DB and you also talk about redundant setups for multiple db instances. They're really two separate issues. The former is trivially easy because you can always just point more than one server to a db. As long as the credentials are right it will work. But the tricky part is keeping the data synched properly. If both are reading and writing the same tables things are going to bang together. That's where the master slave pattern comes into play. All the writes go through the master but anyone can read from any slave because the data gets replicated.
I have 200GB / 400Mrows mysql/innodb database - far beyond what's reasonable as I found out.
One surprising problem is restoring backups. mysqldump generates huge sql files, and they take about a week to import back into a fresh database (attempts at making it faster like bigger/smaller transactions, turning off keys during import etc., network compression etc. failed so far, myisam import seems 2x faster but then there would be no transactions).
What's worse - and I hope to get some help with this - a network connection which transfers >200GB over a time period of a week has a non-trivial chance of breaking, and sql import process cannot be continued in any non-trivial way.
What would be the best way of dealing with it? Right now if I notice a broken connection I manually try to figure out when it ended by checking highest primary key of the last imported table, and then have a perlscript which basically does this:
perl -nle 'BEGIN{open F, "prelude.txt"; #a=<F>; print #a; close F;}; print if $x; $x++ if /INSERT.*last-table-name.*highest-primary-key/'
This really isn't the way to go, so what would be the best way?
Does your MySQL box have enough hard drive space for all the data doubled? Local storage would be best here, but if it's not an option, you could also try some sort of NAS device utilizing iSCSI. It's still happening over the network, but in this case you get more throughput and reliability, because you're only relying on a NAS which has a pretty slim OS and almost never has to be rebooted.
You can't use mysqldump to backup large databases - 200G is feasible but bigger ones it gets worse and worse.
Your best bet is to take a volume snapshot of the database directory and zip that up somehow - that's what we've generally done - or rsync it somewhere else.
If your filesystem or block device does not support snapshots then you're basically in trouble. You can shut the db down to take a backup, but I don't imagine you want to do that.
To restore it, just do the opposite then restart and wait (possibly some time) for innodb recovery to fix things up.
The maatkit mk-parallel-dump and restore tools are a bit better than mysqldump, speed-wise - but I'm not 100% convinced of their correctness
Edit: re-reading the question, I think filesystem snapshot + rsync is probably the best way to go; you can do this without impacting the live system (you'll only need to transfer what changed since the last backup) too much and you can resume the rsync if the connection fails, and it'll continue where it left off.
Do you need everything in the database?
Can you push some of the information to an archive database and add something into your application that would allow people to view records in the archive,
Obviously this depends a lot upon your application and set up, but it may be a solution? Your DB is probably only going to get bigger....
I've had a tough time setting up my replication server. Is there any program (OS X, Windows, Linux, or PHP no problem) that lets me monitor and resolve replication issues? (btw, for those following, I've been on this issue here, here, here and here)
My production database is several megs in size and growing. Every time the database replication stops and the databases inevitably begin to slide out of sync i cringe. My last resync from dump took almost 4 hours roundtrip!
As always, even after sync, I run into this kind of show-stopping error:
Error 'Duplicate entry '252440' for key 1' on query.
I would love it if there was some way to closely monitor whats going on and perhaps let the software deal with it. I'm even all ears for service companies which may help me monitor my data better. Or an alternate way to mirror altogether.
Edit: going through my previous questions i found this which helps tremendously. I'm still all ears on the monitoring solution.
To monitor the servers we use the free tools from Maatkit ... simple, yet efficient.
The binary replication is available in 5.1, so I guess you've got some balls. We still use 5.0 and it works OK, but of course we had our share of issues with it.
We use a Master-Master replication with a MySql Proxy as a load-balancer in front, and to prevent it from having errors:
we removed all unique indexes
for the few cases where we really needed unique constraints we made sure we used REPLACE instead of INSERT (MySql Proxy can be used to guard for proper usage ... it can even rewrite your queries)
scheduled scripts doing intensive reports are always accessing the same server (not the load-balancer) ... so that dangerous operations are replicated safely
Yeah, I know it sounds simple and stupid, but it solved 95% of all the problems we had.
We use mysql replication to replicate data to close to 30 servers. We monitor them with nagios. You can probably check the replication status and use an event handler to restart it with 'SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; Start Slave;'. That will fix the error, but you'll lose the insert that caused the error.
About the error, do you use memory tables on your slaves? I ask this because the only time we ever got a lot of these error they where caused by a bug in the latests releases of mysql. 'Delete From Table Where Field = Value' will delete only one row in memory tables even though they where multiple rows.
mysql bug descritpion
Currently I have two Linux servers running MySQL, one sitting on a rack right next to me under a 10 Mbit/s upload pipe (main server) and another some couple of miles away on a 3 Mbit/s upload pipe (mirror).
I want to be able to replicate data on both servers continuously, but have run into several roadblocks. One of them being, under MySQL master/slave configurations, every now and then, some statements drop (!), meaning; some people logging on to the mirror URL don't see data that I know is on the main server and vice versa. Let's say this happens on a meaningful block of data once every month, so I can live with it and assume it's a "lost packet" issue (i.e., god knows, but we'll compensate).
The other most important (and annoying) recurring issue is that, when for some reason we do a major upload or update (or reboot) on one end and have to sever the link, then LOAD DATA FROM MASTER doesn't work and I have to manually dump on one end and upload on the other, quite a task nowadays moving some .5 TB worth of data.
Is there software for this? I know MySQL (the "corporation") offers this as a VERY expensive service (full database replication). What do people out there do? The way it's structured, we run an automatic failover where if one server is not up, then the main URL just resolves to the other server.
We at Percona offer free tools to detect discrepancies between master and server, and to get them back in sync by re-applying minimal changes.
pt-table-checksum
pt-table-sync
GoldenGate is a very good solution, but probably as expensive as the MySQL replicator.
It basically tails the journal, and applies changes based on what's committed. They support bi-directional replication (a hard task), and replication between heterogenous systems.
Since they work by processing the journal file, they can do large-scale distributed replication without affecting performance on the source machine(s).
I have never seen dropped statements but there is a bug where network problems could cause relay log corruption. Make sure you dont run mysql without this fix.
Documented in the 5.0.56, 5.1.24, and 6.0.5 changelogs as follows:
Network timeouts between the master and the slave could result
in corruption of the relay log.
http://bugs.mysql.com/bug.php?id=26489