Full complete MySQL database replication? Ideas? What do people do?

Currently I have two Linux servers running MySQL, one sitting on a rack right next to me under a 10 Mbit/s upload pipe (main server) and another some couple of miles away on a 3 Mbit/s upload pipe (mirror).
I want to be able to replicate data on both servers continuously, but have run into several roadblocks. One of them is that, under MySQL master/slave configurations, every now and then some statements drop (!), meaning some people logging on to the mirror URL don't see data that I know is on the main server, and vice versa. Let's say this happens on a meaningful block of data once a month, so I can live with it and assume it's a "lost packet" issue (i.e., god knows, but we'll compensate).
The other most important (and annoying) recurring issue is that when, for some reason, we do a major upload or update (or reboot) on one end and have to sever the link, LOAD DATA FROM MASTER doesn't work, and I have to manually dump on one end and upload on the other; quite a task nowadays, moving some 0.5 TB worth of data.
Is there software for this? I know MySQL (the "corporation") offers this as a VERY expensive service (full database replication). What do people out there do? The way it's structured, we run an automatic failover where if one server is not up, then the main URL just resolves to the other server.

We at Percona offer free tools to detect discrepancies between master and slave, and to get them back in sync by re-applying minimal changes.
pt-table-checksum
pt-table-sync
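A rough sketch of the usual workflow (the host, user, and password below are placeholders): pt-table-checksum runs against the master and stores per-chunk checksums in a table that replicates to the slaves; pt-table-sync then reads those checksums and generates only the statements needed to bring a slave back in line.

    # Checksum all tables; results go to the percona.checksums table by default.
    pt-table-checksum --replicate=percona.checksums h=master.example.com,u=checksum_user,p=secret

    # Show the minimal statements that would fix the slaves (switch --print
    # to --execute once you trust the output).
    pt-table-sync --replicate percona.checksums h=master.example.com,u=checksum_user,p=secret --print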

GoldenGate is a very good solution, but probably as expensive as the MySQL replicator.
It basically tails the journal and applies changes based on what's committed. They support bi-directional replication (a hard task) and replication between heterogeneous systems.
Since they work by processing the journal file, they can do large-scale distributed replication without affecting performance on the source machine(s).

I have never seen dropped statements, but there is a bug where network problems could cause relay log corruption. Make sure you don't run MySQL without this fix.
Documented in the 5.0.56, 5.1.24, and 6.0.5 changelogs as follows:
Network timeouts between the master and the slave could result in corruption of the relay log.
http://bugs.mysql.com/bug.php?id=26489
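A quick sanity check, as a sketch (assumes the mysql client can reach the slave; adjust credentials as needed): confirm the server is at or past the fixed versions, and look at the slave's error fields for signs of relay-log trouble.

    mysql -e "SELECT VERSION();"
    mysql -e "SHOW SLAVE STATUS\G" | grep -i "error"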


RDS Read Replica Considerations

We hired an intern and want to let him play around with our data to generate useful reports. Currently we just took a database snapshot and created a new RDS instance that we gave him access to. But that is out of date almost immediately due to changes on the production database.
What we'd like is a live (or close-to-live) mirror of our actual database that we can give him access to, without worrying about him modifying any real data or accidentally bringing down our production database (e.g., by running a silly query like SELECT (*) FROM ourbigtable or a really slow join).
Would a read replica be suitable for this purpose? It looks like it would at least be staying up to date but I'm not clear what would happen if a read replica went down or if data was accidentally changed on it or any other potential liabilities.
The only thing I could find related to this was this SO question and this has me a bit worried (emphasis mine):
If you're trying to pre-calculate a lot of data and otherwise modify
what's on the read replica you need to be really careful you're not
changing data -- if the read is no longer consistent then you're in
trouble :)
TL;DR Don't do it unless you really know what you're doing and you
understand all the ramifications.
And bluntly, MySQL replication can be quirky in my experience, so even
knowing what is supposed to happen and what does happen if there's a
conflict as the master tries to write updated data to a slave you've also
updated.... who knows.
Is there any risk to the production database if we let an intern have at it on an unreferenced read replica?
We've been running read-replicas of our production databases for a couple of years now without any significant issues. All of our sales, marketing, etc. people who need the ability to run queries are given access to the replica. It's worked quite well and has been stable for the most part. The production databases are locked down so that only our applications can connect to them, and the read-replicas are accessible only via SSL from our office. Setting up the security is pretty important, since you create all the user accounts on the master database and they then get replicated to the read-replica.
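For example, a SELECT-only account created on the master will replicate down to the read-replica (a sketch; the account and schema names are made up):

    mysql -h master.example.com -u admin -p \
          -e "GRANT SELECT ON reports.* TO 'analyst'@'%' IDENTIFIED BY 'ChangeMe123';"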
I think we once saw a read-replica get into a bad state due to a hardware-related issue. The great thing about read-replicas, though, is that you can simply terminate one and create a new one any time you want or need to. As long as the new replica has the exact same instance name as the old one, its DNS, etc. will remain unchanged, so aside from being briefly unavailable, everything should be pretty much transparent to the end users. Once or twice we've also simply rebooted a stuck read-replica and it was able to eventually catch up on its own.
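Terminating and recreating a replica is just a couple of commands with today's AWS CLI (a sketch; the instance identifiers are invented):

    # Drop the bad replica, then recreate it under the same identifier so its
    # endpoint stays stable.
    aws rds delete-db-instance --db-instance-identifier myapp-replica --skip-final-snapshot
    aws rds create-db-instance-read-replica \
        --db-instance-identifier myapp-replica \
        --source-db-instance-identifier myapp-master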
There's no way that data on the read-replica can be updated by any method other than processing commands sent from the master database. RDS simply won't allow you to run something like an insert, update, etc. on a read-replica no matter what permissions the user has. So you don't need to worry about data changing on the read-replica causing things to get out of sync with the master.
Occasionally the replica can get a bit behind the production database if somebody submits a long running query, but it typically catches back up fairly quickly once the query completes. In all our production environments we have a few monitors set up to keep an eye on replication and to also check for long running queries. We make use of the pmp-check-mysql-replication-delay command in the Percona Toolkit for MySQL to keep an eye on replication. It's run every few minutes via Nagios. We also have a custom script that's run via cron that checks for long running queries. It basically parses the output of the "SHOW FULL PROCESSLIST" command and sends out an e-mail if a query has been running for a long period of time along with the username of the person running it and the command to kill the query if we decide we need to.
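The long-running-query check boils down to something like this (a sketch; the 600-second threshold is arbitrary, and information_schema.processlist is available from MySQL 5.1.7 on):

    mysql -e "SELECT id, user, time, LEFT(info, 100) AS query
              FROM information_schema.processlist
              WHERE command <> 'Sleep' AND time > 600;"
    # To kill an offender: mysql -e "KILL <id>;"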
With those checks in place we've had very little problem with the read-replicas.
MySQL replication works in such a way that what happens on the slave has no effect on the master.
A replication slave asks for a history of events that happened on the master and applies them locally. The master never writes anything on the slaves: the slaves read from the master and do the writing themselves. If the slave fails to apply the events it read from the master, it will stop with an error.
The problematic part of this style of data replication is that if you modify the slave and later modify the master, you might have a different value on the slave than on the master. This can be avoided by turning on the global read_only variable.
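On a self-managed slave you would enforce that yourself (RDS does it for you on read replicas). A minimal sketch; note that accounts with the SUPER privilege bypass read_only, so don't grant it to report users:

    mysql -e "SET GLOBAL read_only = ON;"
    # Also put read_only = 1 under [mysqld] in my.cnf so it survives a restart.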

MySQL single DB accessed by two servers

I am trying to build a website that uses a MySQL DB. What I am trying to do is make my database accessible from two servers, which means that when server 1 is down, server 2 can access the same database and the website continues working normally. I've read about multi-master replication, but it does not seem to be what I need. And what happens when using master-slave replication if the master server goes down? How can it be restored?
Thanks for your help.
I think the master-slave pattern is exactly what you're looking for. The master handles all the writes and the slaves handle all the reads. If you're cloud hosting with someone like Rackspace or AWS, they make it very easy to set up the data replication across each node. As for your last sub-question about what happens if the master goes down, I believe it is pretty straightforward to set up fallbacks for that too. There are likely several approaches, but at the most basic level I know you can set up multiple DB nodes (with a fallback algorithm) just like any other instance.
A final note... if it's your first time doing this, I highly recommend Rackspace, because their support is amazing and they make a huge effort when you start to explain all your options and help you pick the best strategy.
P.S.: rereading your question, it's a little unclear what you're trying to accomplish. You mention two servers accessing one DB, and you also talk about redundant setups for multiple DB instances. They're really two separate issues. The former is trivially easy, because you can always just point more than one server at a DB; as long as the credentials are right, it will work. But the tricky part is keeping the data synced properly. If both are reading and writing the same tables, things are going to bang together. That's where the master-slave pattern comes into play: all the writes go through the master, but anyone can read from any slave, because the data gets replicated.
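For anyone wiring this up by hand instead of through a hosting panel, pointing a slave at a master is only a few statements (a sketch; the host, credentials, and binlog coordinates are placeholders you'd take from your own master):

    mysql <<'SQL'
    CHANGE MASTER TO
      MASTER_HOST     = 'master.example.com',
      MASTER_USER     = 'repl',
      MASTER_PASSWORD = 'ChangeMe123',
      MASTER_LOG_FILE = 'mysql-bin.000001',
      MASTER_LOG_POS  = 4;
    START SLAVE;
    SQL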

Best design for distributed databases

I have a project where we have one central system that exposes an API on top of MySQL. We now need to replicate that same service locally on several different boxes (which could be 50+). We wanted to have a local cache of the DB on each of those boxes to ensure quick responses and failover if the "central" system goes down.
Any idea what's the best design for this? I was thinking of some sort of master/slave setup, but I'm not sure if that works with 50+ servers, and I'm not sure what the best approach is.
What about MySQL's own replication solution? If you've already ruled that out, you should say why.
With the replication that I've seen, you have a master and slave(s). If the master goes down, one of the slaves takes over. With 50+ slaves, you'd have a long (and confusing) chain of masters.
Not knowing anything about the type of data you have or the read/write percentages, I would suggest one of the following:
Cache static data locally (memcached, etc.). Reads would be local, with writes going back to the MySQL master. This works for mostly-static configuration information. I have 6 servers in that setup now.
Shard your data. With 50 servers, set them up in 25 master/slave pairs and put 1/25th of the data on each shard (see the sketch below). Get one more server for N+1 redundancy.
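A sketch of what the sharding option looks like from the application side (every name here is invented): the customer key picks one of the 25 pairs, and the query goes to that pair's master.

    customer_id=12345
    shard=$(( customer_id % 25 ))
    mysql -h "db-shard-${shard}-master.example.com" \
          -e "SELECT * FROM orders WHERE customer_id = ${customer_id};"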
Hope that helps.

MySQL remote sync

We currently have an application located on a remote server, and our call center uses this application to perform customer transactions.
We plan to set up Asterisk on a local server to help us with all the call routing and recording; for Asterisk to work smoothly, we have to move our application from the remote server to the local one.
It will be easy to move all data to the local server and do transactions locally, but there is an option for users to do transactions online too, which will hit the remote server database.
The reason we still have the remote application is the reliable infrastructure and backup solution provided by Rackspace.
If we move the application to the local server, I am looking for a reliable solution for syncing the remote and local databases so that we can handle local as well as online transactions.
Why not use MySQL master-master replication and hold definitive data at both ends? (Note: you'll have to do some reading on auto_increment_increment and auto_increment_offset.)
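The point of those two variables is to keep the two masters from ever generating the same auto-increment key. A minimal sketch (put the same settings in each server's my.cnf so they persist across restarts):

    mysql -h serverA -e "SET GLOBAL auto_increment_increment = 2;
                         SET GLOBAL auto_increment_offset    = 1;"
    mysql -h serverB -e "SET GLOBAL auto_increment_increment = 2;
                         SET GLOBAL auto_increment_offset    = 2;"
    # serverA now hands out 1, 3, 5, ... and serverB 2, 4, 6, ... so inserts
    # on the two ends can never collide on the primary key.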
symcbean's answer is basically correct. I'd add this article as a good starting place to understand master-master replication. I'd further recommend High Performance MySQL as a good reference for a deeper understanding of the techniques and issues.
There are some issues that you will have to face doing writes to two non-colocated MySQL servers. You'll have replication lag to deal with, so the databases won't necessarily be completely in sync, but will only be "eventually consistent". Also, if you have both sides doing updates on content, you can end up with data integrity issues. If your system leans towards INSERTs more than UPDATEs for the write operations, it is less likely that you'll run into issues. Also, if the subset of data that is likely to be modified tends to be localized around one or the other of the servers, you'll run into fewer issues.
Otherwise, you'll probably want to roll your own solution that is designed towards the specific use cases of your application.

Securely deleting/wiping MySQL data from hard disk

We're running MySQL 5.1 on CentOS 5, and I need to securely wipe data. Simply issuing a DELETE query isn't an option; we need to comply with DoD file deletion standards. This will be done on a live production server without taking MySQL down. Short of taking the server down and using a secure deletion utility on the DB files, is there a way to do this?
Update
The data sanitization will be done once per database, when we remove some of the tables. We don't need to delete data continuously. CPU time isn't an issue; these servers are nowhere near capacity.
If you need a really secure open-source database, you could take a look at Security Enhanced PostgreSQL running on SELinux. A very aggressive vacuum strategy can ensure your data gets overwritten quickly. Strong encryption can be of help as well; pgcrypto has some fine PGP functions.
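For instance, pgcrypto's symmetric PGP functions let you keep sensitive columns encrypted at rest, so a disk-level wipe matters less (a sketch; the value and passphrase are placeholders):

    psql -c "SELECT pgp_sym_encrypt('sensitive value', 'a-long-passphrase');"
    # Decrypt later with pgp_sym_decrypt(ciphertext, 'a-long-passphrase').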
Not as far as I know. Secure deletion requires the CPU to do a bit of work, especially the DoD standard, which I believe is 3 passes of alternating 1's and 0's. You can, however, encrypt the hard drive. Given that, a user would need physical access and a password for the CentOS box to recover the data. As long as you routinely monitor access logs for suspicious activity on the server, this should be "secure".
While searching found this article: Six Steps to Secure Sensitive Data in MySQL
Short of that though, I do not think a DoD standard wipe is viable or even possible without taking the server down.
EDIT
One thing I found is this software: data wiper. If there is a comparable Linux version of that, it might work; it "wipes unused disk space". But again, this may take a major performance toll on your server, so it may be advisable to run it at night at a set time, and I do not know what the repercussions (if any) are of doing this too often to a hard drive.
One other resource is this forum thread. It talks about wiping unused space, etc. From that thread, one resource stands out in particular: the secure_deletion toolkit's sfill. The man page should be helpful.
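The basic invocation is short (a sketch; the path assumes a default MySQL datadir, and sfill is I/O-heavy, so run it off-peak):

    # Wipe all free space on the filesystem holding the datadir. -l reduces
    # the default multi-pass wipe to a faster two-pass one; -v shows progress.
    sfill -l -v /var/lib/mysql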
If it's on a disk, you could just use: http://lambda-diode.com/software/wipe/