Sofware Engineering: Combining many modular programs in Unix [closed] - mysql

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
Upfront my question is: Are there any standard/common methods for implementing a software package that maintains and updates a MySQL database?
I'm an undergraduate research assistant and I've been tasked with creating a cron job that updates one of our university's in house bioinformatics databases.
Instead of building on monolithic binary that does every aspect of the work, I've divided the problem into subtasks and written a few python/c++ modules to handle the different tasks, as listed in the pipeline below:
Query the remote database for a list of updated files and return the result for the given time interval (monthly updated files / weekly / daily);
Module implemented in python. URL of updated file(s) output to stdout
Read in relative URL's of updated files and download to local directory
Module implemented in python
Unzip each archive of new files
Implemented as bash script
Parse files into CSV format
Module implemented in C++
Run MySQL query to insert CSV files into database
Obviously just a bash script
I'm not sure how to go about combining these modules into one package that can be easily moved to another machine, say if our current servers run out of space and the DB needs to be copied to another file-system (It's already happened once before).
My first thought is to create a bash script that pipes all of these modules together given that they all operate with stdin/stdout anyway, but this seems like an odd way of doing things.
Alternatively, I could write my C++ code as a python extension, package all of these scripts together and just write one python file that does this work for me.
Should I be using a package manager so that my code is easily installed on different machines? Does a simple zip archive of the entire updater with an included makefile suffice?
I'm extremely new to database management, and I don't have a ton of experience with distributing software, but I want to do a good job with this project. Thanks for the help in advance.

Inter-process communication (IPC) is a standard mechanism of composing many disparate programs into a complex application. IPC includes piping the output of one program to the input of another, using sockets (e.g. issuing HTTP requests from one application to another or sending data via TCP streams), using named FIFOs, and other mechanisms. In any event, using a Bash script to combine these disparate elements (or similarly, writing a Python script that accomplishes the same thing with the subprocess module) would be completely reasonable. The only thing that I would point out with this approach is that, since you are reading/writing to/from a database, you really do need to consider security/authentication with this approach (e.g. can anyone who can invoke this application write to the database? How do you verify that the caller has the appropriate access).
Regarding distribution, I would say that the most important thing is to ensure that you can find -- at any given version and prior versions -- a snapshot of all components and their dependencies at the versions that they were at the time of release. You should set up a code repository (e.g. on GitHub or some other service that you trust), and create a release branch at the time of each release that contains a snapshot of all the tools at the time of this release. That way if, God forbid, the one and only machine in which you have installed the tools fails, you will still be able to instantly grab a copy of the code and install it on a new machine (and if something breaks, you will be able to go back to an earlier release and binary search until you find out where the breakage happened)
In terms of installation, it really depends on how many steps are involved. If it is as simple as unzipping a folder and ensuring that the folder is in your PATH environment variable, then it is probably not worth the hassle to create any special distribution mechanism (unless you are able to do so easily). What I would recommend, though, is clearly documenting the installation steps in the INSTALL or README documentation in the repository (so that the instructions are snapshotted) as well as on the website for your repository. If the number of steps is small and easy to accomplish, then I wouldn't bother with much more. If there are many steps involved (like downloading and installing a large number of dependencies), then I would recommend writing a script that can automate the installation process. That being said, it's really about what the University wants in this case.

Related

How to reconcile WordPress data between staging and production [duplicate]

I've had a hard time trying to find good examples of how to manage database schemas and data between development, test, and production servers.
Here's our setup. Each developer has a virtual machine running our app and the MySQL database. It is their personal sandbox to do whatever they want. Currently, developers will make a change to the SQL schema and do a dump of the database to a text file that they commit into SVN.
We're wanting to deploy a continuous integration development server that will always be running the latest committed code. If we do that now, it will reload the database from SVN for each build.
We have a test (virtual) server that runs "release candidates." Deploying to the test server is currently a very manual process, and usually involves me loading the latest SQL from SVN and tweaking it. Also, the data on the test server is inconsistent. You end up with whatever test data the last developer to commit had on his sandbox server.
Where everything breaks down is the deployment to production. Since we can't overwrite the live data with test data, this involves manually re-creating all the schema changes. If there were a large number of schema changes or conversion scripts to manipulate the data, this can get really hairy.
If the problem was just the schema, It'd be an easier problem, but there is "base" data in the database that is updated during development as well, such as meta-data in security and permissions tables.
This is the biggest barrier I see in moving toward continuous integration and one-step-builds. How do you solve it?
A follow-up question: how do you track database versions so you know which scripts to run to upgrade a given database instance? Is a version table like Lance mentions below the standard procedure?
Thanks for the reference to Tarantino. I'm not in a .NET environment, but I found their DataBaseChangeMangement wiki page to be very helpful. Especially this Powerpoint Presentation (.ppt)
I'm going to write a Python script that checks the names of *.sql scripts in a given directory against a table in the database and runs the ones that aren't there in order based on a integer that forms the first part of the filename. If it is a pretty simple solution, as I suspect it will be, then I'll post it here.
I've got a working script for this. It handles initializing the DB if it doesn't exist and running upgrade scripts as necessary. There are also switches for wiping an existing database and importing test data from a file. It's about 200 lines, so I won't post it (though I might put it on pastebin if there's interest).
There are a couple of good options. I wouldn't use the "restore a backup" strategy.
Script all your schema changes, and have your CI server run those scripts on the database. Have a version table to keep track of the current database version, and only execute the scripts if they are for a newer version.
Use a migration solution. These solutions vary by language, but for .NET I use Migrator.NET. This allows you to version your database and move up and down between versions. Your schema is specified in C# code.
Your developers need to write change scripts (schema and data change) for each bug/feature they work on, not just simply dump the entire database into source control. These scripts will upgrade the current production database to the new version in development.
Your build process can restore a copy of the production database into an appropriate environment and run all the scripts from source control on it, which will update the database to the current version. We do this on a daily basis to make sure all the scripts run correctly.
Have a look at how Ruby on Rails does this.
First there are so called migration files, that basically transform database schema and data from version N to version N+1 (or in case of downgrading from version N+1 to N). Database has table which tells current version.
Test databases are always wiped clean before unit-tests and populated with fixed data from files.
The book Refactoring Databases: Evolutionary Database Design might give you some ideas on how to manage the database. A short version is readable also at http://martinfowler.com/articles/evodb.html
In one PHP+MySQL project I've had the database revision number stored in the database, and when the program connects to the database, it will first check the revision. If the program requires a different revision, it will open a page for upgrading the database. Each upgrade is specified in PHP code, which will change the database schema and migrate all existing data.
You could also look at using a tool like SQL Compare to script the difference between various versions of a database, allowing you to quickly migrate between versions
Name your databases as follows - dev_<<db>> , tst_<<db>> , stg_<<db>> , prd_<<db>> (Obviously you never should hardcode db names
Thus you would be able to deploy even the different type of db's on same physical server ( I do not recommend that , but you may have to ... if resources are tight )
Ensure you would be able to move data between those automatically
Separate the db creation scripts from the population = It should be always possible to recreate the db from scratch and populate it ( from the old db version or external data source
do not use hardcode connection strings in the code ( even not in the config files ) - use in the config files connection string templates , which you do populate dynamically , each reconfiguration of the application_layer which does need recompile is BAD
do use database versioning and db objects versioning - if you can afford it use ready products , if not develop something on your own
track each DDL change and save it into some history table ( example here )
DAILY backups ! Test how fast you would be able to restore something lost from a backup (use automathic restore scripts
even your DEV database and the PROD have exactly the same creation script you will have problems with the data, so allow developers to create the exact copy of prod and play with it ( I know I will receive minuses for this one , but change in the mindset and the business process will cost you much less when shit hits the fan - so force the coders to subscript legally whatever it makes , but ensure this one
This is something that I'm constantly unsatisfied with - our solution to this problem that is. For several years we maintained a separate change script for each release. This script would contain the deltas from the last production release. With each release of the application, the version number would increment, giving something like the following:
dbChanges_1.sql
dbChanges_2.sql
...
dbChanges_n.sql
This worked well enough until we started maintaining two lines of development: Trunk/Mainline for new development, and a maintenance branch for bug fixes, short term enhancements, etc. Inevitably, the need arose to make changes to the schema in the branch. At this point, we already had dbChanges_n+1.sql in the Trunk, so we ended up going with a scheme like the following:
dbChanges_n.1.sql
dbChanges_n.2.sql
...
dbChanges_n.3.sql
Again, this worked well enough, until we one day we looked up and saw 42 delta scripts in the mainline and 10 in the branch. ARGH!
These days we simply maintain one delta script and let SVN version it - i.e. we overwrite the script with each release. And we shy away from making schema changes in branches.
So, I'm not satisfied with this either. I really like the concept of migrations from Rails. I've become quite fascinated with LiquiBase. It supports the concept of incremental database refactorings. It's worth a look and I'll be looking at it in detail soon. Anybody have experience with it? I'd be very curious to hear about your results.
We have a very similar setup to the OP.
Developers develop in VM's with private DB's.
[Developers will soon be committing into private branches]
Testing is run on different machines ( actually in in VM's hosted on a server)
[Will soon be run by Hudson CI server]
Test by loading the reference dump into the db.
Apply the developers schema patches
then apply the developers data patches
Then run unit and system tests.
Production is deployed to customers as installers.
What we do:
We take a schema dump of our sandbox DB.
Then a sql data dump.
We diff that to the previous baseline.
that pair of deltas is to upgrade n-1 to n.
we configure the dumps and deltas.
So to install version N CLEAN we run the dump into an empty db.
To patch, apply the intervening patches.
( Juha mentioned Rail's idea of having a table recording the current DB version is a good one and should make installing updates less fraught. )
Deltas and dumps have to be reviewed before beta test.
I can't see any way around this as I've seen developers insert test accounts into the DB for themselves.
I'm afraid I'm in agreement with other posters. Developers need to script their changes.
In many cases a simple ALTER TABLE won't work, you need to modify existing data too - developers need to thing about what migrations are required and make sure they're scripted correctly (of course you need to test this carefully at some point in the release cycle).
Moreover, if you have any sense, you'll get your developers to script rollbacks for their changes as well so they can be reverted if need be. This should be tested as well, to ensure that their rollback not only executes without error, but leaves the DB in the same state as it was in previously (this is not always possible or desirable, but is a good rule most of the time).
How you hook that into a CI server, I don't know. Perhaps your CI server needs to have a known build snapshot on, which it reverts to each night and then applies all the changes since then. That's probably best, otherwise a broken migration script will break not just that night's build, but all subsequent ones.
Check out the dbdeploy, there are Java and .net tools already available, you could follow their standards for the SQL file layouts and schema version table and write your python version.
We are using command-line mysql-diff: it outputs a difference between two database schemas (from live DB or script) as ALTER script. mysql-diff is executed at application start, and if schema changed, it reports to developer. So developers do not need to write ALTERs manually, schema updates happen semi-automatically.
If you are in the .NET environment then the solution is Tarantino (archived). It handles all of this (including which sql scripts to install) in a NANT build.
I've written a tool which (by hooking into Open DBDiff) compares database schemas, and will suggest migration scripts to you. If you make a change that deletes or modifies data, it will throw an error, but provide a suggestion for the script (e.g. when a column in missing in the new schema, it will check if the column has been renamed and create xx - generated script.sql.suggestion containing a rename statement).
http://code.google.com/p/migrationscriptgenerator/ SQL Server only I'm afraid :( It's also pretty alpha, but it is VERY low friction (particularly if you combine it with Tarantino or http://code.google.com/p/simplescriptrunner/)
The way I use it is to have a SQL scripts project in your .sln. You also have a db_next database locally which you make your changes to (using Management Studio or NHibernate Schema Export or LinqToSql CreateDatabase or something). Then you execute migrationscriptgenerator with the _dev and _next DBs, which creates. the SQL update scripts for migrating across.
For oracle database we use oracle-ddl2svn tools.
This tool automated next process
for every db scheme get scheme ddls
put it under version contol
changes between instances resolved manually

MySQL & GIT - Is it possible to use GIT versioning to merge MySQL databases? [duplicate]

I am a WordPress Designer/Developer, who is getting more and more heavily involved with using version control, notably Git, though I do use SVN for some projects. I am currently using Beanstalk for my remote repo.
Adding all of the WordPress files to my repo is no problem, if I wanted to I know I could .gitignore the wp-config file, but since I'm the only developer, currently, and these projects are closed source, it really makes little sense.
WordPress relies heavily on the database, as any CMS does, to keep textual content, and many settings depending on the specific plugin/theme configuration I'm using. I'm wondering what the best way of using version control on the database would be, if it's even possible. I guess I could do a SQL dump, though my MySQL server is running on Windows (read as: I don't know how to do it), and then add the SQL dump to my repository. But when I push something live, that poses huge security threats.
Is there an accepted practice of doing this?
You can backup your database within a git repository. Of course, if you place the data into git in a binary form, you will lose all of git's ability to efficiently store the data using diffs (changes). So the number one best practice is this: store the data in a text serialised format.
mysqldump is a suitable program to help you do this. It isn't perfect though. If anything disturbs the serialisation order of items (eg. as a result of creating new tables, etc.) then artificial breaks will enter into the diff. That will decrease the efficiency of storage. You could write a custom serialiser to serialise changes only -- but then you are doing the hard work that git is already good at. Just use the sql dump.
That being said, what you are wanting to do isn't what devs normally mean when they talk about putting the database in git. For instance, if you read the link posted by #eggyal (link to codinghorror) you will see that what is actually placed in git are the scripts needed to generate the initial database. There may be additional scripts, like those to populate the database data with a clean state, or to populate it with testing data. All such sql scripts are text files, and pretty much the same format as the sql dump you would get from mysqldump. So there's no reason you can't do it that way with your day-to-day data as well.
There are not many software available to version control databases like MySQL and MongoDB.
But one is under development and the beta version is about to be launched soon. Check out Klonio - Version Control for databases
The article How to Sync A Local & Remote WordPress Blog Using Version Control gives advice on how to automate sync between two instances (development, production) of a WordPress blog using Mercurial. Is mentions that for this scenario, Git and Mercurial are very similar.
Step 4 (Synchronizing The Databases) is of interest here.
The database content will be exported to a file that is tracked by the revision control. Each time we pull changes, the database content will be replaced by this file, making our database up-to-date.
Then, it elaborates on conflicts and the scripting part of the job.
There is a version control tutorial in Mercurial out there, if you're not familiar with it.
If you are only interested in schema changes under version control, there is a nice stuff SqlRog. It extracts schema into the project files that can be put under the git.
Be aware that Wordpress stores all news feed content in the database, so even if you don't make any changes, there will be a lot of changing content.

How to deploy ms access form in mulituser network server? Cant update because file in use

I save the ms access file form in the network where is accessible by everyone. Everyone has a link on their desktop to the file in the network. Every time I change the file I have to ensure every one has the ms access file closed.
I really don't want to go looking everywhere to check if the app is closed. I've also set the file to read only, but no luck...
The basic concept for MOST software in the market places is you have two parts to application:
Word documents (data) + the word program executable
Excel document (data) + the word program executable.
So for 20 years in your company, how do you deploy software?
Answer:
You install the program on EACH computer.
So you are talking about a SOFTWARE program that you developed. Just because you use c++, vb.net or Access to CREATE SOFTWARE, then you don't break this rule. There is a difference between a data file a document and COMPUTER PROGRAM.
A COMPUTER program has code, forms and user interface. So if your computer support people don't know the difference between a document and a computer program then this would be the source of your problems.
As a result, you don't allow multiple people into the one program. I mean, if one person in your building has trouble with word does everyone go home?
And while you work on the NEXT great version of your software, users should not care.
So, this means:
Like most of your software, you deploy it to each user's workstation. This will allow you to work on the next great version of your software.
When you are done your changes and added new code and new features you then:
Compile the program into an executable (in Access, this is an mde, or now an accDE. You then deploy this to each users workstation.
The above approach thus allows you to work on the NEXT GREAT version of your software. You can add to your software some update code, or even just adopt a logon script that copies down the next great version of this software to each user's desktop.
The result is:
You don't care if users are in their current version of their software, since you are working on that copy which is not locked.
So, you should be distribution a compiled version of your software, and like most of your software you REALLY WANT to distribute that software to each user's workstation.
Last but not least:
Programs like Word, or Excel have what is called re-entrant code. So if you are working in a terminal server environment, then those office programs tolerate multiple users on that server, but your program created with Access DOES NOT. So even in this case, each logged on user is to receive their OWN SEPARATE copy of the COMPILED program you created.
So how c++ works, vb.net, VB6 or in this case Access works is the SAME for most programs:
You distribute that program to EACH user's desktop, and thus the issue of you working on the next great version is a non-issue. You simply have to setup and adopt a means to check for a new version. That means you distribute a program update, not a single form update. Even in the case of changing come code, or modify a report, that will cause you to like MOST software to have to issue a new version.
So, if you issue a new version of your software you not have the problem you outline. Those users don't have to get out of their current version since you are working on a copy of the program.
You thus need to adopt some system and code that will allow you to roll out that update to each user.
I explain in details not only should you run a split database with the data and tables separate (as you have), but ALSO why you distribute a compiled edition and ALSO why EACH desktop should receive a copy of this new program here:
http://www.kallal.ca/Articles/split/index.htm
So the ONLY basic concept you need to grasp here is that in the computer and software industry we have a concept of a computer program. Once you are able to grasp that you are creating a computer program, then you can NOW grasp the concept that such programs AFTER you create such a program or AFTER you modify or create a new version of that program, you THEN distribute that NEW program to your user base. This is quite much how all software works, and Access is no different in this regards.
You don't try to include one form into the production software, but add the new form, modify code and then COMPILE the software into an executable. In fact, using a compiled Access application is strongly recommend since un-handled errors NEVER re-set global variables and as a result your software becomes far more reliable and that even includes code without error handling since local and global vars are never re-set.
One solution would be to maintain a copy of all forms, reports, modules, queries, etc (front-end code) in a separate database from the tables. You can then link the tables to a 2nd database (back-end) in a shared network Access DB. The front-end database would be deployed to each workstation, while the back-end database is shared. Obviously, this introduces the need to update each workstation, but it means that when you need to roll out new front-end logic, you don't have to go around kicking everybody out at once.
Otherwise, sadly, you need to acquire an exclusive lock on the database, and the only way to do that, is to be the only one with it open.
With the help of a script, you could automate pulling the latest "client". You could just ask all users to exit the app and double-click a batch file which copies the front-end from some network location.

Using version control (Git) on a MySQL database

I am a WordPress Designer/Developer, who is getting more and more heavily involved with using version control, notably Git, though I do use SVN for some projects. I am currently using Beanstalk for my remote repo.
Adding all of the WordPress files to my repo is no problem, if I wanted to I know I could .gitignore the wp-config file, but since I'm the only developer, currently, and these projects are closed source, it really makes little sense.
WordPress relies heavily on the database, as any CMS does, to keep textual content, and many settings depending on the specific plugin/theme configuration I'm using. I'm wondering what the best way of using version control on the database would be, if it's even possible. I guess I could do a SQL dump, though my MySQL server is running on Windows (read as: I don't know how to do it), and then add the SQL dump to my repository. But when I push something live, that poses huge security threats.
Is there an accepted practice of doing this?
You can backup your database within a git repository. Of course, if you place the data into git in a binary form, you will lose all of git's ability to efficiently store the data using diffs (changes). So the number one best practice is this: store the data in a text serialised format.
mysqldump is a suitable program to help you do this. It isn't perfect though. If anything disturbs the serialisation order of items (eg. as a result of creating new tables, etc.) then artificial breaks will enter into the diff. That will decrease the efficiency of storage. You could write a custom serialiser to serialise changes only -- but then you are doing the hard work that git is already good at. Just use the sql dump.
That being said, what you are wanting to do isn't what devs normally mean when they talk about putting the database in git. For instance, if you read the link posted by #eggyal (link to codinghorror) you will see that what is actually placed in git are the scripts needed to generate the initial database. There may be additional scripts, like those to populate the database data with a clean state, or to populate it with testing data. All such sql scripts are text files, and pretty much the same format as the sql dump you would get from mysqldump. So there's no reason you can't do it that way with your day-to-day data as well.
There are not many software available to version control databases like MySQL and MongoDB.
But one is under development and the beta version is about to be launched soon. Check out Klonio - Version Control for databases
The article How to Sync A Local & Remote WordPress Blog Using Version Control gives advice on how to automate sync between two instances (development, production) of a WordPress blog using Mercurial. Is mentions that for this scenario, Git and Mercurial are very similar.
Step 4 (Synchronizing The Databases) is of interest here.
The database content will be exported to a file that is tracked by the revision control. Each time we pull changes, the database content will be replaced by this file, making our database up-to-date.
Then, it elaborates on conflicts and the scripting part of the job.
There is a version control tutorial in Mercurial out there, if you're not familiar with it.
If you are only interested in schema changes under version control, there is a nice stuff SqlRog. It extracts schema into the project files that can be put under the git.
Be aware that Wordpress stores all news feed content in the database, so even if you don't make any changes, there will be a lot of changing content.

How do you manage databases in development, test, and production?

I've had a hard time trying to find good examples of how to manage database schemas and data between development, test, and production servers.
Here's our setup. Each developer has a virtual machine running our app and the MySQL database. It is their personal sandbox to do whatever they want. Currently, developers will make a change to the SQL schema and do a dump of the database to a text file that they commit into SVN.
We're wanting to deploy a continuous integration development server that will always be running the latest committed code. If we do that now, it will reload the database from SVN for each build.
We have a test (virtual) server that runs "release candidates." Deploying to the test server is currently a very manual process, and usually involves me loading the latest SQL from SVN and tweaking it. Also, the data on the test server is inconsistent. You end up with whatever test data the last developer to commit had on his sandbox server.
Where everything breaks down is the deployment to production. Since we can't overwrite the live data with test data, this involves manually re-creating all the schema changes. If there were a large number of schema changes or conversion scripts to manipulate the data, this can get really hairy.
If the problem was just the schema, It'd be an easier problem, but there is "base" data in the database that is updated during development as well, such as meta-data in security and permissions tables.
This is the biggest barrier I see in moving toward continuous integration and one-step-builds. How do you solve it?
A follow-up question: how do you track database versions so you know which scripts to run to upgrade a given database instance? Is a version table like Lance mentions below the standard procedure?
Thanks for the reference to Tarantino. I'm not in a .NET environment, but I found their DataBaseChangeMangement wiki page to be very helpful. Especially this Powerpoint Presentation (.ppt)
I'm going to write a Python script that checks the names of *.sql scripts in a given directory against a table in the database and runs the ones that aren't there in order based on a integer that forms the first part of the filename. If it is a pretty simple solution, as I suspect it will be, then I'll post it here.
I've got a working script for this. It handles initializing the DB if it doesn't exist and running upgrade scripts as necessary. There are also switches for wiping an existing database and importing test data from a file. It's about 200 lines, so I won't post it (though I might put it on pastebin if there's interest).
There are a couple of good options. I wouldn't use the "restore a backup" strategy.
Script all your schema changes, and have your CI server run those scripts on the database. Have a version table to keep track of the current database version, and only execute the scripts if they are for a newer version.
Use a migration solution. These solutions vary by language, but for .NET I use Migrator.NET. This allows you to version your database and move up and down between versions. Your schema is specified in C# code.
Your developers need to write change scripts (schema and data change) for each bug/feature they work on, not just simply dump the entire database into source control. These scripts will upgrade the current production database to the new version in development.
Your build process can restore a copy of the production database into an appropriate environment and run all the scripts from source control on it, which will update the database to the current version. We do this on a daily basis to make sure all the scripts run correctly.
Have a look at how Ruby on Rails does this.
First there are so called migration files, that basically transform database schema and data from version N to version N+1 (or in case of downgrading from version N+1 to N). Database has table which tells current version.
Test databases are always wiped clean before unit-tests and populated with fixed data from files.
The book Refactoring Databases: Evolutionary Database Design might give you some ideas on how to manage the database. A short version is readable also at http://martinfowler.com/articles/evodb.html
In one PHP+MySQL project I've had the database revision number stored in the database, and when the program connects to the database, it will first check the revision. If the program requires a different revision, it will open a page for upgrading the database. Each upgrade is specified in PHP code, which will change the database schema and migrate all existing data.
You could also look at using a tool like SQL Compare to script the difference between various versions of a database, allowing you to quickly migrate between versions
Name your databases as follows - dev_<<db>> , tst_<<db>> , stg_<<db>> , prd_<<db>> (Obviously you never should hardcode db names
Thus you would be able to deploy even the different type of db's on same physical server ( I do not recommend that , but you may have to ... if resources are tight )
Ensure you would be able to move data between those automatically
Separate the db creation scripts from the population = It should be always possible to recreate the db from scratch and populate it ( from the old db version or external data source
do not use hardcode connection strings in the code ( even not in the config files ) - use in the config files connection string templates , which you do populate dynamically , each reconfiguration of the application_layer which does need recompile is BAD
do use database versioning and db objects versioning - if you can afford it use ready products , if not develop something on your own
track each DDL change and save it into some history table ( example here )
DAILY backups ! Test how fast you would be able to restore something lost from a backup (use automathic restore scripts
even your DEV database and the PROD have exactly the same creation script you will have problems with the data, so allow developers to create the exact copy of prod and play with it ( I know I will receive minuses for this one , but change in the mindset and the business process will cost you much less when shit hits the fan - so force the coders to subscript legally whatever it makes , but ensure this one
This is something that I'm constantly unsatisfied with - our solution to this problem that is. For several years we maintained a separate change script for each release. This script would contain the deltas from the last production release. With each release of the application, the version number would increment, giving something like the following:
dbChanges_1.sql
dbChanges_2.sql
...
dbChanges_n.sql
This worked well enough until we started maintaining two lines of development: Trunk/Mainline for new development, and a maintenance branch for bug fixes, short term enhancements, etc. Inevitably, the need arose to make changes to the schema in the branch. At this point, we already had dbChanges_n+1.sql in the Trunk, so we ended up going with a scheme like the following:
dbChanges_n.1.sql
dbChanges_n.2.sql
...
dbChanges_n.3.sql
Again, this worked well enough, until we one day we looked up and saw 42 delta scripts in the mainline and 10 in the branch. ARGH!
These days we simply maintain one delta script and let SVN version it - i.e. we overwrite the script with each release. And we shy away from making schema changes in branches.
So, I'm not satisfied with this either. I really like the concept of migrations from Rails. I've become quite fascinated with LiquiBase. It supports the concept of incremental database refactorings. It's worth a look and I'll be looking at it in detail soon. Anybody have experience with it? I'd be very curious to hear about your results.
We have a very similar setup to the OP.
Developers develop in VM's with private DB's.
[Developers will soon be committing into private branches]
Testing is run on different machines ( actually in in VM's hosted on a server)
[Will soon be run by Hudson CI server]
Test by loading the reference dump into the db.
Apply the developers schema patches
then apply the developers data patches
Then run unit and system tests.
Production is deployed to customers as installers.
What we do:
We take a schema dump of our sandbox DB.
Then a sql data dump.
We diff that to the previous baseline.
that pair of deltas is to upgrade n-1 to n.
we configure the dumps and deltas.
So to install version N CLEAN we run the dump into an empty db.
To patch, apply the intervening patches.
( Juha mentioned Rail's idea of having a table recording the current DB version is a good one and should make installing updates less fraught. )
Deltas and dumps have to be reviewed before beta test.
I can't see any way around this as I've seen developers insert test accounts into the DB for themselves.
I'm afraid I'm in agreement with other posters. Developers need to script their changes.
In many cases a simple ALTER TABLE won't work, you need to modify existing data too - developers need to thing about what migrations are required and make sure they're scripted correctly (of course you need to test this carefully at some point in the release cycle).
Moreover, if you have any sense, you'll get your developers to script rollbacks for their changes as well so they can be reverted if need be. This should be tested as well, to ensure that their rollback not only executes without error, but leaves the DB in the same state as it was in previously (this is not always possible or desirable, but is a good rule most of the time).
How you hook that into a CI server, I don't know. Perhaps your CI server needs to have a known build snapshot on, which it reverts to each night and then applies all the changes since then. That's probably best, otherwise a broken migration script will break not just that night's build, but all subsequent ones.
Check out the dbdeploy, there are Java and .net tools already available, you could follow their standards for the SQL file layouts and schema version table and write your python version.
We are using command-line mysql-diff: it outputs a difference between two database schemas (from live DB or script) as ALTER script. mysql-diff is executed at application start, and if schema changed, it reports to developer. So developers do not need to write ALTERs manually, schema updates happen semi-automatically.
If you are in the .NET environment then the solution is Tarantino (archived). It handles all of this (including which sql scripts to install) in a NANT build.
I've written a tool which (by hooking into Open DBDiff) compares database schemas, and will suggest migration scripts to you. If you make a change that deletes or modifies data, it will throw an error, but provide a suggestion for the script (e.g. when a column in missing in the new schema, it will check if the column has been renamed and create xx - generated script.sql.suggestion containing a rename statement).
http://code.google.com/p/migrationscriptgenerator/ SQL Server only I'm afraid :( It's also pretty alpha, but it is VERY low friction (particularly if you combine it with Tarantino or http://code.google.com/p/simplescriptrunner/)
The way I use it is to have a SQL scripts project in your .sln. You also have a db_next database locally which you make your changes to (using Management Studio or NHibernate Schema Export or LinqToSql CreateDatabase or something). Then you execute migrationscriptgenerator with the _dev and _next DBs, which creates. the SQL update scripts for migrating across.
For oracle database we use oracle-ddl2svn tools.
This tool automated next process
for every db scheme get scheme ddls
put it under version contol
changes between instances resolved manually