I have a script that is part of my deployment process to push DB changes to the production server. If the script corrupts my data for some reason (a bad update), it is tough to recover.
One way to solve this is to shut down the application to users while updating, so if a problem occurs, just go back to the backup I made before deploying.
But I have heard of others who deploy and keep their site live... how would you go about doing this, and if you failed, how could you recover the data that came in since you took your backup before deploying?
This is a tricky problem in general, like with many things in database administration. There are basically three ways to approach this:
Avoid failure at all costs.
Lock everything down (and make the upgrade really fast).
It's OK to lose data.
If you have a complex system, isolate your components according to these or similar categories.
Have a staging system to test upgrades. The staging system is more or less a copy of the production system; it's separate from the test system. Another thing is to have an audit or logging system that you can refer to if you need to replay data.
The real problem is if you notice much later that your upgrade was faulty. Then you're quite screwed.
How big is your database? Can you afford to lose data that was updated while the customer were using it and before you had to go to the backup? Every plan for deployment involves some compromises somewhere, and you've got to decide which compromises are the least painful for what you want to do.
For simple websites running just pgsql, you can disconnect clients, and run the entire update in one big transaction. If any part fails, the whole thing rolls back and it's like you never did anything. Sadly this doesn't work exactly the same for other dbs, but with flashback or whatever oracle calls it you can get something similar.
For bigger, complex websites running on top of a replicated db server set, things get much more complex much more quickly. Where I've worked we've used Slony, and it doesn't take nicely to playing with others when you're deploying DDL changes, and you pretty much HAVE to take all the customers offline while deploying DDL. However, the downtime is measured in minutes for us, even with databases approaching 1TB in size.
Related
I'm working on a group project where we all have a mysql database working on a local machine. The table mainly has filenames and stats used for image processing. We all will run some processing, which updates the database locally with results.
I want to know what the best way is to update everyone else's database, once someone has changed theirs.
My idea is to perform a mysqldump after each processing run, and let that file be tracked by git (which we use religiously). I've written a bunch of python utils for the database, and it would be simple enough to read this dump into the database when we detect that the db is behind. I don't really want to do this though, less it clog up our git repo with unnecessary 10-50Mb files with every commit.
Does anyone know a better way to do this?
*I'll also note that we are Aerospace students. I have some DB experience, but it only comes out of need. We're busy and I'm not looking to become an IT networking guru. Just want to keep it hands off for them since they are DB noobs and get the glazed over look of fear whenever I tell them to do anything with the database. I made it hands off for them thus far.
You might want to consider following the Rails-style database migration concept, whereby as you are developing you provide roll-forward and roll-back SQL statements that work as patches, allowing you to roll your database to any particular revision state that is required.
Of course, this is typically meant for dealing with schema changes only (i.e. you don't worry about revisioning data that might be dynamically populated into tables.). For configuration tables or similar tables that are basically static in content, you can certainly add migrations as well.
A Google search for "rails migrations for python" turned up a number of results, including the following tool:
http://pypi.python.org/pypi/simple-db-migrate
I would suggest to create a DEV MySQL server on any shared hosting. (No DB experience is required).
Allow remote access to this server. (again, no experience is required, everything could be done through Control Panel)
And you and your group of developers will have access to the database at any time from any place and from any device. (As long as you have internet connection)
What processes do you put in place when collaborating in a small team on websites with databases?
We have no problems working on site files as they are under revision control, so any number of our developers can work from any location on this aspect of a website.
But, when database changes need to be made (either directly as part of the development or implicitly by making content changes in a CMS), obviously it is difficult for the different developers to then merge these database changes.
Our approaches thus far have been limited to the following:
Putting a content freeze on the production website and having all developers work on the same copy of the production database
Delegating tasks that will involve database changes to one developer and then asking other developers to import a copy of that database once changes have been made; in the meantime other developers work only on site files under revision control
Allowing developers to make changes to their own copy of the database for the sake of their own development, but then manually making these changes on all other copies of the database (e.g. providing other developers with an SQL import script pertaining to the database changes they have made)
I'd be interested to know if you have any better suggestions.
We work mainly with MySQL databases and at present do not keep track of revisions to these databases. The problems discussed above pertain mainly to Drupal and Wordpress sites where a good deal of the 'development' is carried out in conjunction with changes made to the database in the CMS.
You put all your database changes in SQL scripts. Put some kind of sequence number into the filename of each script so you know the order they must be run in. Then check in those scripts into your source control system. Now you have reproducible steps that you can apply to test and production databases.
While you could put all your DDL into the VC, this can get very messy very quickly if you try to manage lots and lots of ALTER statements.
Forcing all developers to use the same source database is not a very efficient approach either.
The solution I used was to maintain a file for each database entity specifying how to create the entity (primarily so the changes could be viewed using a diff utility), then manually creating ALTER statements by comparing the release version with the current version - yes, it is rather labour intensive but the only way I've found to solve the problem.
I had a plan to automate the generation of the ALTER statements - it should be relatively straightforward - indeed a quick google found this article and this one. Never got round to implementing one myself since the effort of doing so was much greater than the frequency of schema changes on the projects I was working on.
Where i work, every developer (actually, every development virtual machine) has its own database (or rather, its own schema on a shared Oracle instance). Our working process is based around complete rebuilds. We don't have any ability to modify an existing database - we only ever have the nuclear option of blowing away the whole schema and building from scratch.
We have a little 'drop everything' script, which uses queries on system tables to identify every object in the schema, constructs a pile of SQL to drop them, and runs it. Then we have a stack of DDL files full of CREATE TABLE statements, then we have a stack of XML files containing the initial data for the system, which are loaded by a loading tool. All of this is checked into source control. When a developer does an update from source control, if they see incoming database changes (DDL or data), they run the master build script, which runs them in order to create a fresh database from scratch.
The good thing is that this makes life simple. We never need to worry about diffs, deltas, ALTER TABLE, reversibility, etc, just straightforward DDL and data. We never have to worry about preserving the state of the database, or keeping it clean - you can get back to a clean state at the push of a button. Another important feature of this is that it makes it trivial to set up a new platform - and that means that when we add more development machines, or need to build an acceptance system or whatever, it's easy. I've seen projects fail because they couldn't build new instances from their muddled databases.
The main bad thing is that it takes some time - in our case, due to the particular depressing details of our system, a painfully long time, but i think a team that was really on top of its tools could do a complete rebuild like this in 10 minutes. Half an hour if you have a lot of data. Short enough to be able to do a few times during a working day without killing yourself.
The problem is what you do about data. There are two sides to this: data generated during development, and live data.
Data generated during development is actually pretty easy. People who don't work our way are presumably in the habit of creating that data directly in the database, and so see a problem in that it will be lost when rebuilding. The solution is simple: you don't create the data in the database, you create it in the loader scripts (XML in our case, but you could use SQL DML, or CSV with your database's import tool, or whatever). Think of the loader scripts as being source code, and the database as object code: the scripts are the definitive form, and are what you edit by hand; the database is what's made from them.
Live data is tougher. My company hasn't developed a single process which works in all cases - i don't know if we just haven't found the magic bullet yet, or if there isn't one. One of our projects is taking the approach that live is different to development, and that there are no complete rebuilds; rather, they have developed a set of practices for identifying the deltas when making a new release and applying them manually. They release every few weeks, so it's only a couple of days' work for a couple of people that often. Not a lot.
The project i'm on hasn't gone live yet, but it is replacing an existing live system, so we have a similar problem. Our approach is based on migration: rather than trying to use the existing database, we are migrating all the data from it into our system. We have written a rather sprawling tool to do this, which runs queries against the existing database (a copy of it, not the live version!), then writes the data out as loader scripts. These then feed into the build process just like any others. The migration is scripted, and runs every night as part of our daily build. In this case, the effort needed to write this tool was necessary anyway, because our database is very different in structure to the old one; the ability to do repeatable migrations at the push of a button came for free.
When we go live, one of our options will be to adapt this process to migrate from old versions of our database to new ones. We'll have to write completely new queries, but they should be very easy, because the source database is our own, and the mapping from it to the loader scripts is, as you would imagine, straightforward, even as the new version of the system drifts away from the live version. This would let us keep working in the complete rebuild paradigm - we still wouldn't have to worry about ALTER TABLE or keeping our databases clean, even when we're doing maintenance. I have no idea what the operations team will think of this idea, though!
You can use the replication module of the database engine, if it has one.
One server will be the master, changes are to be made on it.
Developers copies will be slaves.
Any changes on the master will be duplicated on the slaves.
It's a one way replication.
Can be a bit tricky to put into place as any changes on the slaves will be erased.
Also it means that the developers should have two copy of the database.
One will be the slave and another the "development" database.
There are also tools for cross database replications.
So any copies can be the master.
Both solutions can lead to disasters (replication errors).
The only solution is see fit is to have only one database for all developers and save it several times a day on a rotating history.
Won't save you from conflicts but you will be able to restore the previous version if it happens (and it always do...).
Where I work we are using Dotnetnuke and this poses the same problems. i.e. once released the production site has data going into the database as well as files being added to the file system by some modules and in the DNN file system.
We are versioning the site file system with svn which for the most part works ok. However, the database is a different matter. The best method we have come across so far is to use RedGate tools to synchronise the staging database with the production database. RedGate tools are very good and well worth the money.
Basically we all develop locally with a local copy of the database and site. If the changes are major we branch. Then we commit locally and do a RedGate merge to put our DB changes on the the shared dev server.
We use a shared dev server so others can do the testing. Once complete we then update the site on staging with svn and then merge the database changes from the development server to the staging server.
Then to go live we do the same from staging to prod.
This method works but is prone to error and is very time consuming when small changes need to be made. The prod DB is always backed up so we can roll back easily if a delivery goes wrong.
One major headache we have is that Dotnetnuke uses identity cols in many tables and if you have data going into tables on development and production such as tabs and permissions and module instances you have a nightmare syncing them. Ideally you want to find or build a cms that uses GUI's or something else in the database so you can easily sync tables that are in use concurrently.
We'd love to find a better method! As we have a lot of trouble with branching and merging when projects are concurrent.
Gus
I've been doing a lot of research, reading on replication, etc but just not sure as to what mysql solution would work.
This is what I'm looking at:
when my mysql fails for some reason or there are certain queries that are taking really long to execute and locking some tables, I want the other insert/update/select queries to still function at normal speed without having to wait for locks to be released or for the main database to be back up. I'm thinking there should be a second mysql server for this to happen, but is what I mentioned possible even if there is and would it involve a lot of change in my existing programming logic?
when my database is being backed up, I would still like my site to function normally, all inserts/selects/updates should function as normal.
when I need to alter a large table, I wouldn't like it to affect my application, there should be a backup server to work from.
So what do I need to do to get all this done and also would it require changing plenty of existing coding to suit the new set up? [My site has a lot of reads and writes]
There's no easy way. You're asking for a highly-available MySQL-based setup, and that requires a lot of work at the server and client ends.
Some issues, for example:
when I need to alter a large table, I wouldn't like it to affect my application, there should be a backup server to work from.
If you're altering the table, you can't trivially create a copy to work from during the update. What about the changes that are made to your copy while the first update is taking place?
Have a search for "High Availability MySQL". It's mostly a solved problem, but the solution depends heavily on your exact requirements. You cannot just ask for "I want my SQL server to run at full speed always forever no matter what I throw at it".
Not a MySQL specific answer, but a general one. Have a read only copy of your DB for site to render, which will be synced to the master DB regularly. This way, you can keep your site working even if the master DB is under load/lock due to insert/delete. For efficiency, keep this copy as denormalized as you can.
I have several databases hosted on a shared server, and a local testing server which I use for development.
I would like to keep both set of databases somewhat synchronized (more or less daily).
So far, my ideas to solve the problem seem very clumsy. Anyway, for reference, here is what I have considered so far:
Make a database dump from online databases, trash local databases, and recreate the databases from the dump. It's a lot of work and requires a lot of download time (which guarantees I won't do it as much as I would like it to be done)
Write a small web service to access the new data, and write a small application locally to communicate with said web service, download the newest data, and update the local databases.
Both solutions sound like a lot of work for a problem that is probably already solved a zillion times over. Or maybe it's even an existing feature which I completely overlooked.
Is there an easy way to keep databases more or less in synch? Ideally something that I can set up once, schedule and forget about.
I am using MySQL 5 (MyISAM) databases on both servers.
=============
Edit: I had a look at replication, but it seems that I can't go that route because the shared hosting does not give me enough control on the server itself (I got most permissions on my databases, but not on the MySQL server itself)
I only need to keep the data synchronized, nothing else. Is there any other solution that doesn't require full control on the server?
Edit 2:
Sorry, I forgot to mention I am running on a LAMP stack on the shared server, so Windows-only solutions won't work.
I am surprised to see that there is no obvious off-the-shelves solution for this problem.
Have you considered replication? It's not to be trifled with but may be what you want. See here for more details... http://dev.mysql.com/doc/refman/5.0/en/replication-configuration.html
Take a look at Microsoft Sync Framework - you will need to code in .net, but it can resolve your issues.
http://msdn.microsoft.com/en-in/sync/default(en-us).aspx
Here is a sample for SQL server, but it can be adapted to mysql as well using ado.net provider for Mysql.
http://code.msdn.microsoft.com/sync/Release/ProjectReleases.aspx?ReleaseId=4835
You will need the additional tables for change tracking and anchors (keeping track of last synchronization) for this to work, in your mysql database, but you wont need full control as long as you can access the db.
Replication would have simpler :), but this might just work in your case.
I'm working with PHP & mySQL. I've finally got my head around source control and am quite happy with the whole development (testing) v production v repository thing for the PHP part.
My new quandary is what to do with the database. Do I create one for the test environment and one for the production environment? I currently have just the one which both environments use, leaving my test data sitting there. I kind of feel that I should have two, but I'm nervous in terms of making sure that my production database looks and feels exactly the same as my test one.
Any thoughts on which way to go? And, if you think the latter, what the best way is to keep the two databases the same (apart from the data, of course...)?
Each environment should have a separate database. Script all of the database objects (tables, views, procedures, etc) and store the scripts in source control. The scripts are applied first to the development database, then promoted to test (QA, UAT, etc), then production. By applying the same scripts to each database, they should all be the same in the end.
If you have data that needs to be loaded (code tables, lookup values, etc), script that data load as part of the database creation process.
By scripting everything and keeping it in source control, a database structure can be recreated at any time for any given build level.
You should definitely have two. As far as keeping them in sync, you should always create DDL for creating your database objects. Treat these scripts as you do you PHP code - keep them in version control. Anytime you have to modify the test database, make a script to do so, and check it in. Then you can propogate those changes to the production system once you are ready.
As a minimum one database for each development workstation and one for production. Besides that you should have one for the test environment unless you are only one developer and have a similar setup as the production environment.
See also
How do you version your database schema?
It's a common question and has been asked and answered many times.
Thomas Owens: Replication is not usable for versioning schemas - it is for duplicating data. You never want to replicate from dev to production or vice versa.
Once I've deployed my database, any changes made to my development database(s), are done in an SQL script (not a tool), and the script is saved, and numbered.
deploy.001.description.sql
deploy.002.description.sql
deploy.003.description.sql
... etc..
Then I run each of those scripts in order when I deploy.
Then I archive them into a directory called something like
\deploy.YYMMDD\
And start all over.
If I make a mistake, I never go back to the previous deploy script, I'll create a new script and put my fix in there.
Good luck
One thing I've been working with is creating a VM with the database installed. you can save the VM as a playfile, including its data. What you can do then is take a snapshot of the playfile, and start up as many different VM's as you want. They can all be identical, or you can modify one or another. Here's the good thing: assuming you have a dev version of the database that you want to go out, you can simply start that VM on your production server instead of the current server.
It's another problem altogether if you have production data that is not on your dev machines. In that case though, one thing you can do is set up a tracking VM. Run replication from your main DB to the tracking VM. When you get to a point where you need to run some alters on the production database, first stop the slave and save a snapshot.
Start an instance of that snapshot, take it out of slave mode entirely, apply your changes, and point your QA box at that database. If it works as intended, you can run the patches against your main production database. If not, bring up the snapshot, and get it replicating off the master again until you are ready to repeat the update test.
I was having the same dilemmas. I got stuck thinking that there was a clear dichotomy between production db versus development db. I.e they were two sides of a coin and never the twain shall meet.
A lot of problems disappeared when I stopped making my application 'think' in terms of "Either production db OR development db". Instead my application uses a local db.
When its running on my virtual (dev) machine, that local db happens to be a dev db. My application doesn't really 'know' that though.
So, for the main part, the problem disappears.
But sometimes I want to run tests using live data, or move data from the code into the live production db and see the results quickly.
This is when I added the concept of a live-read-only db connection. The application treats this differently. Its a bit like how your application might treat a web service like Google Apps. Its 'some external resource that your app uses'.
By default my app uses the local db and in some very special conditions (in the test suite) it also uses the live-readonly db. (Because its a read-only connection I don't fear making a mess of the live data during tests).
So rather than asking the question "dev db OR production db?", my app asks "local db OR live-read-only db".
Obviously my situation could be different to yours, but I found this 'breakthrough in understanding' to be most helpful for me.