I'm setting up SQL Server 2008 on a production server. What is the best way to back up the data? Should I use replication and then back up the replica? Should I just use a simple command-line script and export the data? Which replication method should I use?
The server is going to be heavily loaded, so I need an efficient method.
I have access to multiple computers that I can use.
A very simple yet good solution is to run a full backup using sqlcmd (formerly osql) locally, then copy the BAK file over the network to a NAS or other store. It's sub-optimal in terms of network/disk usage, but it's very safe because every backup is independent, and because the process is so simple it is also very robust.
Moreover, this even works in Express editions.
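The backup-then-copy approach above can be sketched as a small script. This is only a minimal sketch, not a definitive implementation: the function names, the server name, and the directory paths are assumptions made for illustration, and it assumes sqlcmd is on the PATH and that Windows authentication (-E) is acceptable.

```python
import datetime
import shutil
import subprocess
from pathlib import Path


def backup_statement(database: str, bak_path: str) -> str:
    """Build a T-SQL full-backup statement for one database."""
    # Escape ] in the database name and ' in the path so odd names don't break the statement.
    db = database.replace("]", "]]")
    path = bak_path.replace("'", "''")
    # WITH INIT overwrites any backup sets already in the target file.
    return f"BACKUP DATABASE [{db}] TO DISK = N'{path}' WITH INIT"


def sqlcmd_args(server: str, database: str, bak_path: str) -> list[str]:
    """sqlcmd arguments: -S server, -E Windows auth, -b exit with an error code on failure, -Q run query and exit."""
    return ["sqlcmd", "-S", server, "-E", "-b", "-Q", backup_statement(database, bak_path)]


def backup_and_copy(server: str, database: str, local_dir: str, remote_dir: str) -> Path:
    """Take a full backup locally, then copy the BAK file to another store (e.g. a NAS share)."""
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    bak = Path(local_dir) / f"{database}_{stamp}.bak"
    subprocess.run(sqlcmd_args(server, database, str(bak)), check=True)  # raises if the backup fails
    shutil.copy2(bak, Path(remote_dir) / bak.name)
    return bak


# Hypothetical usage (names and paths are placeholders):
# backup_and_copy("localhost", "Sales", r"D:\Backups", r"\\nas\sqlbackups")
```

Each run produces an independent, timestamped file, which matches the point above that every backup stands on its own.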
The "best" backup solution depends upon your recovery criteria.
If you need immediate access to the data in the event of a failure, a three-server database mirroring scenario (live, mirror and witness) would seem to fit - although your application may need to be adapted to make use of automatic failover. "Log shipping" may produce similar results (although without automatic failover, or the need for a witness).
If, however, there's some wiggle room in the recovery time, regular scheduled backups of the database (e.g., via SQL Agent) and its transaction logs will allow you to do point-in-time restores. The frequency of backups would be determined by database size, how frequently the data is updated, and how far you are willing to roll back the database in the event of complete failure (unless you can extract a transaction log backup out of a failed server, you can only recover to the latest backup).
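To illustrate how full and transaction log backups combine for a point-in-time restore, here is a rough sketch that picks which backups would be needed. It is a simplification under stated assumptions: backups are represented only by the time they finished, the function name is made up for this example, and it does not talk to SQL Server at all.

```python
from datetime import datetime


def restore_plan(full_backups, log_backups, target):
    """Return (base_full, log_chain, reached) for restoring to `target`.

    full_backups / log_backups: lists of datetimes at which each backup finished.
    reached: the point in time that can actually be recovered.
    """
    fulls = [t for t in full_backups if t <= target]
    if not fulls:
        raise ValueError("no full backup exists before the target time")
    base = max(fulls)  # most recent full backup before the target

    chain = []
    for t in sorted(t for t in log_backups if t > base):
        chain.append(t)
        if t >= target:
            # This log backup contains the target moment; stop applying logs here.
            return base, chain, target

    # The logs end before the target: only the latest backup can be reached.
    reached = chain[-1] if chain else base
    return base, chain, reached
```

With hourly log backups, for example, a total failure loses at most about an hour of changes, which is the trade-off between backup frequency and acceptable data loss described above.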
If you're looking to simply roll back to known-good states after, say, user error, you can make use of database snapshots as a lightweight "backup" scenario - but these are useless in the event of server failure. They're near instantaneous to create, and only take up space as the data changes - but incur a slight performance overhead.
Of course, these aren't the only backup solutions, nor are they mutually exclusive - just the ones that came to mind.
We are running a Java PoS (Point of Sale) application at various shops, with a MySQL backend. I want to keep the databases in the shops synchronised with a database on a host server.
When some changes happen in a shop, they should get updated on the host server. How do I achieve this?
Replication is not very hard to set up.
Here are some good tutorials:
http://www.ghacks.net/2009/04/09/set-up-mysql-database-replication/
http://dev.mysql.com/doc/refman/5.5/en/replication-howto.html
http://www.lassosoft.com/Beginners-Guide-to-MySQL-Replication
Here are some simple rules you will have to keep in mind (there are more, of course, but this is the main concept):
Set up 1 server (master) for writing data.
Set up 1 or more servers (slaves) for reading data.
This way, you will avoid errors.
For example:
If your script inserts into the same tables on both master and slave, you will get duplicate primary key conflicts.
You can view the "slave" as a "backup" server which holds the same information as the master but cannot add data directly; it only follows the master server's instructions.
NOTE: Of course you can read from the master and you can write to the slave, but make sure you don't write to the same tables (master to slave and slave to master).
I would recommend monitoring your servers to make sure everything is fine.
Let me know if you need additional help
Three different approaches:
Classic client/server approach: don't put any database in the shops; simply have the applications access your server. Of course it's better if you set up a VPN, but simply wrapping the connection in SSL or ssh is reasonable. Pro: it's the way databases were originally designed to be used. Con: if you have high latency, complex operations could get slow, and you might have to use stored procedures to reduce the number of round trips.
replicated master/master: as #Book Of Zeus suggested. Cons: somewhat more complex to set up (especially if you have several shops), and a break-in on any shop machine could potentially compromise the whole system. Pros: better responsiveness, as read operations are totally local and write operations are propagated asynchronously.
offline operations + sync step: do all work locally and from time to time (might be once an hour, daily, weekly, whatever) write a summary with all new/modified records since the last sync operation and send it to the server. Pros: can work without network, fast, easy to check (if the summary is readable). Cons: you don't have real-time information.
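The third approach (offline operations plus a sync step) could look roughly like this. It is a sketch only: it assumes every record carries an `updated_at` timestamp, and the record layout and function name are invented for illustration. How the summary is transported to the host server is left out.

```python
import json
from datetime import datetime, timezone


def build_sync_summary(records, last_sync):
    """Collect records created or modified after `last_sync` into a readable summary."""
    changed = [r for r in records if r["updated_at"] > last_sync]
    changed.sort(key=lambda r: r["updated_at"])  # oldest change first
    return {
        "since": last_sync.isoformat(),
        "count": len(changed),
        # Timestamps are converted to ISO strings so the summary can be dumped as JSON.
        "records": [{**r, "updated_at": r["updated_at"].isoformat()} for r in changed],
    }


# Hypothetical usage in a shop:
# summary = build_sync_summary(local_sales, last_sync=datetime(2024, 1, 1, tzinfo=timezone.utc))
# payload = json.dumps(summary, indent=2)   # human-readable, easy to check before sending
```

Because the summary is plain JSON, it is easy to inspect, which is the "easy to check" advantage mentioned above.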
SymmetricDS is the answer. It supports multiple subscribers with one direction or bi-directional asynchronous data replication. It uses web and database technologies to replicate tables between relational databases, in near real time if desired.
It also has a comprehensive and robust Java API to suit your needs.
Have a look at the Schema and Data Comparison tools in dbForge Studio for MySQL. These tools will help you compare two databases, see the differences, generate a synchronization script, and synchronize them.
We currently have an application located on a remote server, and our call center uses this application to perform customer transactions.
We plan to set up Asterisk on a local server to help us with all the call routing and recording. For Asterisk to work smoothly, we have to move our application from the remote server to the local one.
It will be easy to move all the data to the local server and do transactions locally, but there is an option for users to do transactions online too, which will hit the remote server database.
The reason we still have the remote application is the reliable infrastructure and backup solution provided by Rackspace.
If we move the application to the local server, I am looking for a reliable solution for syncing the remote and local databases so that we can handle local as well as online transactions.
Why not use MySQL master-master replication and hold definitive data at both ends? (Note you'll have to do some reading up on auto_increment_increment and auto_increment_offset.)
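As a rough illustration of why those two settings matter, the sketch below simulates the IDs two masters would hand out on an empty table. It is a plain Python model, not MySQL itself; the generator name is made up, and it relies only on the documented behaviour that generated values follow auto_increment_offset + N × auto_increment_increment.

```python
from itertools import islice


def auto_increment_ids(offset, increment):
    """Yield the sequence offset, offset + increment, offset + 2*increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


# Two masters, both with auto_increment_increment = 2 but different offsets.
local_ids = list(islice(auto_increment_ids(offset=1, increment=2), 5))
remote_ids = list(islice(auto_increment_ids(offset=2, increment=2), 5))

print(local_ids)   # [1, 3, 5, 7, 9]
print(remote_ids)  # [2, 4, 6, 8, 10]
print(set(local_ids) & set(remote_ids))  # set(): the two servers never generate the same key
```

With the default settings (offset 1, increment 1) on both servers, each side would generate 1, 2, 3, ... and rows inserted on both ends would collide on the primary key when replicated.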
symcbean's answer is basically correct. I'd add this article as a good starting place to understand master-master replication. I'd further recommend High Performance MySQL as a good reference for a deeper understanding of the techniques and issues.
There are some issues that you will have to face when doing writes to two non-colocated MySQL servers. You'll have replication lag to deal with, so the databases won't necessarily be completely in sync, but will only be "eventually consistent". Also, if you have both sides doing updates on the same content, you can end up with data integrity issues. If your system leans towards INSERTs more than UPDATEs for the write operations, it is less likely that you'll run into issues. Also, if the subset of data that is likely to be modified tends to be localized around one or the other of the servers, you'll run into fewer issues.
Otherwise, you'll probably want to roll your own solution that is designed towards the specific use cases of your application.
We're running MySQL 5.1 on CentOS 5 and I need to securely wipe data. Simply issuing a DELETE query isn't an option; we need to comply with DoD file deletion standards. This will be done on a live production server without taking MySQL down. Short of taking the server down and using a secure deletion utility on the DB files, is there a way to do this?
Update
The data sanitization will be done once per database when we remove some of the tables. We don't need to delete data continuously. CPU time isn't an issue; these servers are nowhere near capacity.
If you need a really secure open source database, you could take a look at Security Enhanced PostgreSQL running on SELinux. A very aggressive vacuum strategy can ensure your data gets overwritten quickly. Strong encryption can be of help as well; pgcrypto has some fine PGP functions.
Not as far as I know. Secure deletion requires the CPU to do a fair bit of work, especially the DoD standard, which I believe is 3 passes of overwriting with 1s and 0s. You can, however, encrypt the hard drive. A user would then need physical access and a password for the CentOS machine to recover the data. As long as you routinely monitor access logs for suspicious activity on the server, this should be "secure".
While searching, I found this article: Six Steps to Secure Sensitive Data in MySQL
Short of that though, I do not think a DoD standard wipe is viable or even possible without taking the server down.
EDIT
One thing I found is this software: data wiper. If there is a comparable Linux version, it might work, since it "wipes unused disk space". But again, this may take a major performance toll on your server, so it may be advisable to run it at night at a set time, and I do not know what the repercussions (if any) are of doing this too often to a hard drive.
One other resource is this forum thread. It talks about wiping unused space etc. From that thread one resource stands out in particular: secure_deletion toolkit - sfill. The man page should be helpful.
If it's on a disk, you could just use: http://lambda-diode.com/software/wipe/
I have a fairly good feel for what MySQL replication can do. I'm wondering what other databases support replication, and how they compare to MySQL and others?
Some questions I would have are:
Is replication built in, or an add-on/plugin?
How does the replication work (high-level)? MySQL provides statement-based replication (and row-based replication in 5.1). I'm interested in how other databases compare. What gets shipped over the wire? How do changes get applied to the replicas?
Is it easy to check consistency between master and slaves?
How easy is it to get a failed replica back in sync with the master?
Performance? One thing I hate about MySQL replication is that it's single-threaded, and replicas often have trouble keeping up, since the master can be running many updates in parallel, but the replicas have to run them serially. Are there any gotchas like this in other databases?
Any other interesting features...
MySQL's replication is weak inasmuch as one needs to sacrifice other functionality to get full master/master support (due to the restriction on supported backends).
PostgreSQL's replication is weak inasmuch as only master/standby is supported built-in (using log shipping); more powerful solutions (such as Slony or Londiste) require add-on functionality. Archive log segments are shipped over the wire; these are the same records used to make sure that a standalone database is in a working, consistent state after an unclean startup. This is what I'm using presently, and we have resynchronization (and setup, and other functionality) fully automated. None of these approaches are fully synchronous. More complete support will be built in as of PostgreSQL 8.5. Log shipping does not allow databases to come out of synchronization, so there is no need for processes to test the synchronized status. Bringing the two databases back into sync involves setting the backup flag on the master, rsyncing to the slave (with the database still running; this is safe), and unsetting the backup flag (and restarting the slave process) with the archive logs generated during the backup process available. My shop has this process (like all other administration tasks) automated. Performance is a nonissue, since the master has to replay the log segments internally anyhow in addition to doing other work; thus, the slaves will always be under less load than the master.
Oracle's RAC (which isn't properly replication, as there's only one storage backend -- but you have multiple frontends sharing the load, and can build redundancy into that shared storage backend itself, so it's worthy of mention here) is a multi-master approach far more comprehensive than other solutions, but is extremely expensive. Database contents aren't "shipped over the wire"; instead, they're stored to the shared backend, which all the systems involved can access. Because there is only one backend, the systems cannot come out of sync.
Continuent offers a third-party solution which does fully synchronous statement-level replication with support for all three of the above databases; however, the commercially supported version of their product isn't particularly cheap (though vastly less expensive). Last time I administered it, Continuent's solution required manual intervention for bringing a cluster back into sync.
I have some experience with MS-SQL 2005 (publisher) and SQLEXPRESS (subscribers) with overseas merge replication. Here are my comments:
1 - Is replication built in, or an add-on/plugin?
Built in
2 - How does the replication work (high-level)?
There are different ways to replicate, from snapshot (giving static data at the subscriber level) to transactional replication (each INSERT/DELETE/UPDATE instruction is executed on all servers). Merge replication replicates only the final changes (successive UPDATEs on the same record are applied as one change during replication).
3 - Is it easy to check consistency between master and slaves?
Something I have never done ...
4 - How easy is it to get a failed replica back in sync with the master?
The basic resync process is just a double-click .... But if you have 4 GB of data to reinitialize over a 64 Kb connection, it will be a long process unless you customize it.
5 - Performance?
Well ... You will of course have a bottleneck somewhere, be it your connection performance, the volume of data, or finally your server performance. In my configuration, users only write to subscribers, which all replicate with the main database = publisher. This server is then never accessed directly by end users, and its CPU is strictly dedicated to data replication (to multiple servers) and backup. Subscribers are dedicated to clients and one replication (to the publisher), which gives a very interesting result in terms of data availability for end users. Replications between publisher and subscribers can be launched together.
6 - Any other interesting features...
It is possible, with some anticipation, to keep on developing the database without even stopping the replication process: tables (in an indirect way), fields and rules can be added and replicated to your subscribers.
Configurations with a main publisher and multiple subscribers can be VERY cheap (when compared to some others...), as you can use the free SQLEXPRESS on the subscriber's side, even when running merge or transactional replications.
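To make the merge behaviour described under question 2 more concrete, here is a toy model of collapsing successive UPDATEs on the same record into one net change. It only illustrates the idea; it is not how SQL Server implements merge replication, and the log format and function name are assumptions.

```python
def net_changes(update_log):
    """Collapse a log of (primary_key, changed_columns) updates into one net change per row."""
    net = {}
    for key, changed_columns in update_log:
        # Later updates overwrite earlier values for the same column.
        net.setdefault(key, {}).update(changed_columns)
    return net


# Three updates hit row 1 between two synchronizations, one hits row 2.
log = [
    (1, {"qty": 5}),
    (2, {"qty": 1}),
    (1, {"qty": 7}),
    (1, {"price": 9.5}),
]

print(net_changes(log))  # {1: {'qty': 7, 'price': 9.5}, 2: {'qty': 1}}
```

Transactional replication would instead ship all four statements, which matters on a slow link such as the 64 Kb connection mentioned above.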
Try Sybase SQL Anywhere
Just adding to the options with SQL Server (especially SQL 2008, which has Change Tracking features now). Something to consider is the Sync Framework from Microsoft. There are a few options there, from the basic hub-and-spoke architecture, which is great if you have a single central server and sometimes-connected clients, right through to peer-to-peer sync, which gives you the ability to do much more advanced syncing with multiple 'master' databases.
The reason you might want to consider this instead of traditional replication is that you have a lot more control from code, for example you can get events during the sync progress for Update/Update, Update/Delete, Delete/Update, Insert/Insert conflicts and decide how to resolve them based on business logic, and if needed store the loser of the conflict's data somewhere for manual or automatic processing. Have a look at this guide to help you decide what's possible with the different methods of replication and/or sync.
For the keen programmers the Sync Framework is open enough that you can have the clients connect via WCF to your WCF Service which can abstract any back-end data store (I hear some people are experimenting using Oracle as the back-end).
My team has just released a large project that involves multiple SQL Express databases syncing sub-sets of data from a central SQL Server database via WAN and Internet (slow dial-up connection in some cases) with great success.
MS SQL 2005 Standard Edition and above have excellent replication capabilities and tools. Take a look at:
http://msdn.microsoft.com/en-us/library/ms151198(SQL.90).aspx
It's pretty capable. You can even use SQL Server Express as a read-only subscriber.
There are a lot of different things which databases CALL replication. Not all of them actually involve replication, and those which do work in vastly different ways. Some databases support several different types.
MySQL supports asynchronous replication, which is very good for some things. However, there are weaknesses. Statement-based replication is not the same as what most (any?) other databases do, and doesn't always result in the expected behaviour. Row-based replication is only supported by a non production-ready version (but is more consistent with how other databases do it).
Each database has its own take on replication; some involve other tools plugging in.
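The point about statement-based replication not always giving the expected result can be shown with a toy model. This is a Python simulation, not MySQL: the tables are plain dicts, the function name is invented, and the statement stands in for something like UPDATE t SET token = UUID(), which yields a different value on every server that executes it.

```python
import uuid


def run_statement(table):
    """Stand-in for 'UPDATE t SET token = UUID()': the result depends on where it runs."""
    for row in table.values():
        row["token"] = str(uuid.uuid4())


master = {1: {"token": None}, 2: {"token": None}}
statement_replica = {1: {"token": None}, 2: {"token": None}}

# The master executes the statement.
run_statement(master)

# Statement-based: the replica re-executes the same statement and gets its own values.
run_statement(statement_replica)

# Row-based: the replica receives the rows as they ended up on the master.
row_replica = {key: dict(row) for key, row in master.items()}

print(statement_replica == master)  # False: the replica has diverged
print(row_replica == master)        # True: identical data
```

Deterministic statements replicate fine either way; the divergence only appears when the statement's result depends on the server executing it.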
A bit off-topic but you might want to check Maatkit for tools to help with MySQL replication.
All the main commercial databases have decent replication - but some are more decent than others. IBM Informix Dynamic Server (version 11 and later) is particularly good. It actually has two systems - one for high availability (HDR - high-availability data replication) and the other for distributing data (ER - enterprise replication). And the Mach 11 features (RSS - remote standalone secondary, and SDS - shared disk secondary) are excellent too, doubly so in 11.50 where you can write to either the primary or secondary of an HDR pair.
(Full disclosure: I work on Informix software.)
I haven't tried it myself, but you might also want to look into OpenBaseSQL, which seems to have some simple to use replication built-in.
Another way to go is to run in a virtualized environment. I thought the data in this blog article was interesting:
http://chucksblog.typepad.com/chucks_blog/2008/09/enterprise-apps.html
It's from an EMC executive, so obviously, it's not independent, but the experiment should be reproducible
Here's the data specific for Oracle
http://oraclestorageguy.typepad.com/oraclestorageguy/2008/09/to-rac-or-not-to-rac-reprise.html
Edit: If you run virtualized, then there are ways to make anything replicate
http://chucksblog.typepad.com/chucks_blog/2008/05/vmwares-srm-cha.html
Currently I have two Linux servers running MySQL, one sitting on a rack right next to me under a 10 Mbit/s upload pipe (main server) and another a couple of miles away on a 3 Mbit/s upload pipe (mirror).
I want to be able to replicate data on both servers continuously, but have run into several roadblocks. One of them is that, under MySQL master/slave configurations, every now and then some statements drop (!), meaning that some people logging on to the mirror URL don't see data that I know is on the main server, and vice versa. Let's say this happens on a meaningful block of data once every month, so I can live with it and assume it's a "lost packet" issue (i.e., god knows, but we'll compensate).
The other most important (and annoying) recurring issue is that, when for some reason we do a major upload or update (or reboot) on one end and have to sever the link, then LOAD DATA FROM MASTER doesn't work and I have to manually dump on one end and upload on the other, quite a task nowadays moving some .5 TB worth of data.
Is there software for this? I know MySQL (the "corporation") offers this as a VERY expensive service (full database replication). What do people out there do? The way it's structured, we run an automatic failover where if one server is not up, then the main URL just resolves to the other server.
We at Percona offer free tools to detect discrepancies between master and slave, and to get them back in sync by re-applying minimal changes.
pt-table-checksum
pt-table-sync
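The general idea behind checksum-based discrepancy detection can be sketched as follows. This is only a conceptual illustration and not how the Percona tools are implemented: rows are modelled as a dict of integer primary key to value tuple, and the function names and chunking scheme are assumptions.

```python
import hashlib


def chunk_checksums(rows, chunk_size):
    """Group rows into primary-key ranges and compute one checksum per range."""
    chunks = {}
    for key, values in sorted(rows.items()):
        chunks.setdefault(key // chunk_size, []).append((key, values))
    return {
        chunk_no: hashlib.sha256(repr(items).encode()).hexdigest()
        for chunk_no, items in chunks.items()
    }


def differing_chunks(master_rows, replica_rows, chunk_size=2):
    """Return the chunk numbers whose checksums differ between master and replica."""
    m = chunk_checksums(master_rows, chunk_size)
    r = chunk_checksums(replica_rows, chunk_size)
    return sorted(c for c in m.keys() | r.keys() if m.get(c) != r.get(c))


master_rows = {1: ("a",), 2: ("b",), 3: ("c",), 4: ("d",)}
replica_rows = {1: ("a",), 2: ("b",), 4: ("d",)}  # row 3 was lost on the replica

print(differing_chunks(master_rows, replica_rows))  # [1]  (the chunk holding keys 2 and 3)
```

Only the chunks that differ need to be examined and repaired, which is why this kind of approach avoids a full dump and reload of a large data set.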
GoldenGate is a very good solution, but probably as expensive as the MySQL replicator.
It basically tails the journal, and applies changes based on what's committed. They support bi-directional replication (a hard task), and replication between heterogeneous systems.
Since they work by processing the journal file, they can do large-scale distributed replication without affecting performance on the source machine(s).
I have never seen dropped statements, but there is a bug where network problems could cause relay log corruption. Make sure you don't run MySQL without this fix.
Documented in the 5.0.56, 5.1.24, and 6.0.5 changelogs as follows:
Network timeouts between the master and the slave could result
in corruption of the relay log.
http://bugs.mysql.com/bug.php?id=26489