How to handle database changes made by automatic update script while using Liquibase? - auto-update

I'm developing a web application that also use Wordpress as part of it. I want to use Liquibase to track my database changes.
How to handle database changes made by automatic update script of Wordpress?
Can I just ignore them? and put only my own changes in Liquibase changelog file?

You could do a diffChangelog of the schemas after each WordPress upgrade so that Liquibase could keep track of the changes. You can just ignore them though - Liquibase doesn't really care about unknown schema objects. The only issue would be if your changes and the WordPress changes conflicted.

You can and should just ignore them.
Liquibase just does one thing. It keeps track of the fact that:
a certain command (say, createTable)...
...that looked a certain way at time 0 (the name of the table, its columns, etc.)...
...was definitively executed at time 0 (it stores this record in DATABASECHANGELOG).
That's it. It is not a structure enforcer or a database state reconstitution engine. It is quite possible—and permitted, and often expected—that the database will be changed by other tools and Liquibase will have no idea what went on.
So just keep your commands in your changelogs, don't worry about preexisting database structure, use preconditions to control whether or not your changesets run, and ignore everything else that might be going on in the database that happened due to other tools.

Related

How to reconcile WordPress data between staging and production [duplicate]

I've had a hard time trying to find good examples of how to manage database schemas and data between development, test, and production servers.
Here's our setup. Each developer has a virtual machine running our app and the MySQL database. It is their personal sandbox to do whatever they want. Currently, developers will make a change to the SQL schema and do a dump of the database to a text file that they commit into SVN.
We're wanting to deploy a continuous integration development server that will always be running the latest committed code. If we do that now, it will reload the database from SVN for each build.
We have a test (virtual) server that runs "release candidates." Deploying to the test server is currently a very manual process, and usually involves me loading the latest SQL from SVN and tweaking it. Also, the data on the test server is inconsistent. You end up with whatever test data the last developer to commit had on his sandbox server.
Where everything breaks down is the deployment to production. Since we can't overwrite the live data with test data, this involves manually re-creating all the schema changes. If there were a large number of schema changes or conversion scripts to manipulate the data, this can get really hairy.
If the problem was just the schema, It'd be an easier problem, but there is "base" data in the database that is updated during development as well, such as meta-data in security and permissions tables.
This is the biggest barrier I see in moving toward continuous integration and one-step-builds. How do you solve it?
A follow-up question: how do you track database versions so you know which scripts to run to upgrade a given database instance? Is a version table like Lance mentions below the standard procedure?
Thanks for the reference to Tarantino. I'm not in a .NET environment, but I found their DataBaseChangeMangement wiki page to be very helpful. Especially this Powerpoint Presentation (.ppt)
I'm going to write a Python script that checks the names of *.sql scripts in a given directory against a table in the database and runs the ones that aren't there in order based on a integer that forms the first part of the filename. If it is a pretty simple solution, as I suspect it will be, then I'll post it here.
I've got a working script for this. It handles initializing the DB if it doesn't exist and running upgrade scripts as necessary. There are also switches for wiping an existing database and importing test data from a file. It's about 200 lines, so I won't post it (though I might put it on pastebin if there's interest).
There are a couple of good options. I wouldn't use the "restore a backup" strategy.
Script all your schema changes, and have your CI server run those scripts on the database. Have a version table to keep track of the current database version, and only execute the scripts if they are for a newer version.
Use a migration solution. These solutions vary by language, but for .NET I use Migrator.NET. This allows you to version your database and move up and down between versions. Your schema is specified in C# code.
Your developers need to write change scripts (schema and data change) for each bug/feature they work on, not just simply dump the entire database into source control. These scripts will upgrade the current production database to the new version in development.
Your build process can restore a copy of the production database into an appropriate environment and run all the scripts from source control on it, which will update the database to the current version. We do this on a daily basis to make sure all the scripts run correctly.
Have a look at how Ruby on Rails does this.
First there are so called migration files, that basically transform database schema and data from version N to version N+1 (or in case of downgrading from version N+1 to N). Database has table which tells current version.
Test databases are always wiped clean before unit-tests and populated with fixed data from files.
The book Refactoring Databases: Evolutionary Database Design might give you some ideas on how to manage the database. A short version is readable also at http://martinfowler.com/articles/evodb.html
In one PHP+MySQL project I've had the database revision number stored in the database, and when the program connects to the database, it will first check the revision. If the program requires a different revision, it will open a page for upgrading the database. Each upgrade is specified in PHP code, which will change the database schema and migrate all existing data.
You could also look at using a tool like SQL Compare to script the difference between various versions of a database, allowing you to quickly migrate between versions
Name your databases as follows - dev_<<db>> , tst_<<db>> , stg_<<db>> , prd_<<db>> (Obviously you never should hardcode db names
Thus you would be able to deploy even the different type of db's on same physical server ( I do not recommend that , but you may have to ... if resources are tight )
Ensure you would be able to move data between those automatically
Separate the db creation scripts from the population = It should be always possible to recreate the db from scratch and populate it ( from the old db version or external data source
do not use hardcode connection strings in the code ( even not in the config files ) - use in the config files connection string templates , which you do populate dynamically , each reconfiguration of the application_layer which does need recompile is BAD
do use database versioning and db objects versioning - if you can afford it use ready products , if not develop something on your own
track each DDL change and save it into some history table ( example here )
DAILY backups ! Test how fast you would be able to restore something lost from a backup (use automathic restore scripts
even your DEV database and the PROD have exactly the same creation script you will have problems with the data, so allow developers to create the exact copy of prod and play with it ( I know I will receive minuses for this one , but change in the mindset and the business process will cost you much less when shit hits the fan - so force the coders to subscript legally whatever it makes , but ensure this one
This is something that I'm constantly unsatisfied with - our solution to this problem that is. For several years we maintained a separate change script for each release. This script would contain the deltas from the last production release. With each release of the application, the version number would increment, giving something like the following:
dbChanges_1.sql
dbChanges_2.sql
...
dbChanges_n.sql
This worked well enough until we started maintaining two lines of development: Trunk/Mainline for new development, and a maintenance branch for bug fixes, short term enhancements, etc. Inevitably, the need arose to make changes to the schema in the branch. At this point, we already had dbChanges_n+1.sql in the Trunk, so we ended up going with a scheme like the following:
dbChanges_n.1.sql
dbChanges_n.2.sql
...
dbChanges_n.3.sql
Again, this worked well enough, until we one day we looked up and saw 42 delta scripts in the mainline and 10 in the branch. ARGH!
These days we simply maintain one delta script and let SVN version it - i.e. we overwrite the script with each release. And we shy away from making schema changes in branches.
So, I'm not satisfied with this either. I really like the concept of migrations from Rails. I've become quite fascinated with LiquiBase. It supports the concept of incremental database refactorings. It's worth a look and I'll be looking at it in detail soon. Anybody have experience with it? I'd be very curious to hear about your results.
We have a very similar setup to the OP.
Developers develop in VM's with private DB's.
[Developers will soon be committing into private branches]
Testing is run on different machines ( actually in in VM's hosted on a server)
[Will soon be run by Hudson CI server]
Test by loading the reference dump into the db.
Apply the developers schema patches
then apply the developers data patches
Then run unit and system tests.
Production is deployed to customers as installers.
What we do:
We take a schema dump of our sandbox DB.
Then a sql data dump.
We diff that to the previous baseline.
that pair of deltas is to upgrade n-1 to n.
we configure the dumps and deltas.
So to install version N CLEAN we run the dump into an empty db.
To patch, apply the intervening patches.
( Juha mentioned Rail's idea of having a table recording the current DB version is a good one and should make installing updates less fraught. )
Deltas and dumps have to be reviewed before beta test.
I can't see any way around this as I've seen developers insert test accounts into the DB for themselves.
I'm afraid I'm in agreement with other posters. Developers need to script their changes.
In many cases a simple ALTER TABLE won't work, you need to modify existing data too - developers need to thing about what migrations are required and make sure they're scripted correctly (of course you need to test this carefully at some point in the release cycle).
Moreover, if you have any sense, you'll get your developers to script rollbacks for their changes as well so they can be reverted if need be. This should be tested as well, to ensure that their rollback not only executes without error, but leaves the DB in the same state as it was in previously (this is not always possible or desirable, but is a good rule most of the time).
How you hook that into a CI server, I don't know. Perhaps your CI server needs to have a known build snapshot on, which it reverts to each night and then applies all the changes since then. That's probably best, otherwise a broken migration script will break not just that night's build, but all subsequent ones.
Check out the dbdeploy, there are Java and .net tools already available, you could follow their standards for the SQL file layouts and schema version table and write your python version.
We are using command-line mysql-diff: it outputs a difference between two database schemas (from live DB or script) as ALTER script. mysql-diff is executed at application start, and if schema changed, it reports to developer. So developers do not need to write ALTERs manually, schema updates happen semi-automatically.
If you are in the .NET environment then the solution is Tarantino (archived). It handles all of this (including which sql scripts to install) in a NANT build.
I've written a tool which (by hooking into Open DBDiff) compares database schemas, and will suggest migration scripts to you. If you make a change that deletes or modifies data, it will throw an error, but provide a suggestion for the script (e.g. when a column in missing in the new schema, it will check if the column has been renamed and create xx - generated script.sql.suggestion containing a rename statement).
http://code.google.com/p/migrationscriptgenerator/ SQL Server only I'm afraid :( It's also pretty alpha, but it is VERY low friction (particularly if you combine it with Tarantino or http://code.google.com/p/simplescriptrunner/)
The way I use it is to have a SQL scripts project in your .sln. You also have a db_next database locally which you make your changes to (using Management Studio or NHibernate Schema Export or LinqToSql CreateDatabase or something). Then you execute migrationscriptgenerator with the _dev and _next DBs, which creates. the SQL update scripts for migrating across.
For oracle database we use oracle-ddl2svn tools.
This tool automated next process
for every db scheme get scheme ddls
put it under version contol
changes between instances resolved manually

merge design of mysql between localhost and server?

I'm kinda new to this kind of problem. I'm developing a web-app and changing DB design trying to improve it and add new tables.
well since we had not published the app since some days ago,
what I would do was to dump all the tables in server and import my local version but now we've passed the version 1 and users are starting to use it.
so I can't dump the server, but I still would need to update design of server DB when I want to publish a new version. What are the best practices here?
I like to know how I can manage differences between local and server in mysql?
I need to preserve data in server and just change the design, data on local DB are only for test.
Before this all my other apps were small and I would change a single table or column but I can't keep track of all changes now, since I might revert many of them later and managing all team members on this is impossible.
Assuming you are not using a framework that provides a migration tool for database, you need to keep track of the changes manually.
Create a folder sql_upgrades (or whatever name you name) in your code repository
Whenever a team member updates the SQL schema, he creates a file in this folder with the corresponding ALTER statements, and possibly UPDATE, CREATE TABLE etc. So basically the file contains all the statements used to update the dev database.
Name the files so that it's easy to manage, and that statements for the same feature are grouped together. I suggest something like YYYYMMDD-description.sql, e.g. 20150825-queries-for-feature-foobar.sql
When you push to production, execute the files to upgrade you SQL schema in production. Only execute the files that have been created since your last deployment, and execute them in the order they have been created.
Should you need to rollback a file, check the queries it contains, and write queries to undo what was done (drop added columns, re-create dropped columns, etc.). Note that this is "non-trivial", as many changes cannot be rolled back fully (e.g. you can recreate a dropped column, but you will have lost the data inside).
Many web frameworks (such as Ruby of Rails) have tools that will do exactly that process for you. They usually work together with the ORM provided by the framework. Keeping track of the changes manually in SQL works just as well.

What's a good way to publish database definition changes from a development copy to a production copy of a database?

How can I track changes to a development database and apply those changes to a production database (SQL Server 2008)?
I keep a local copy of a database on my development server, and as I'm adding new features, I may add new tables or change field and table names in the database. What's a good way to track such changes and then apply them to the main database?
Is there some way to do a "diff"-like operation between two databases and merge definition changes?
I considered merge-replication, but I'm not sure how well than handles schema changes. For example, here: http://technet.microsoft.com/en-us/library/ms151870.aspx it mentions that I basically cannot use SSMS to make definition changes, because it drops and recreates tables, which is not allowed for published objects.
A smart piece of software could compare column counts, types, positions, and apply other fuzzy matching/logical deduction methods to figure out that a table was renamed or a new table was added or a column name changed, after which it could present the differences to the user for confirmation and automatic application.
Does anything like what I've described above exist, or am I stuck remembering to save DDL statements in SSMS and running them manually in the production database?
Maybe you need a migration tool like (for example) FluentMigrator, which helps you track database changes in source code.
Here is a tutorial from the original author of Fluent Migrator, explaining what Fluent Migrator is, why you might need it and how it works.
Another alternative would be what you already mentioned:
A smart piece of software could compare column counts, types,
positions, and apply other fuzzy matching/logical deduction methods to
figure out that a table was renamed or a new table was added or a
column name changed, after which it could present the differences to
the user for confirmation and automatic application.
I never tried it myself, but I've seen lots of recommendations for Redgate SQL Compare (which apparently does exactly what you asked for) here at Stack Overflow.

How do you maintain revision control of your database structure?

What is the simplest way of keeping track of changes to a projects database structure?
When I change something about the database (eg, add a new table, add a new field to an existing table, add an index etc), I want that to be propagated to the rest of the team, and ultimately the production server, with the minimal fuss and effort.
At the moment, the solution is pretty weak and relies of people remembering to do things, which is an accident waiting to happen.
Everything else is managed with standard revision control software (Perforce in our case).
We use MySQL, so tools that understand that would be helpful, though I would also be interested to learn how other places are handling this anyway, regardless of database engine.
You can dump the schema and commit it -- and the RCS will take care of the changes between versions.
You can get a tool like Sql Compare from Red-Gate which allows you to point to two databases and it will let you know what is different, and will build alter scripts for you.
If you're using .NET(Visual Studio), you can create a Database project and check that into source control.
This has alrady been discussed a lot I think. Anyhow I really like Rails approach to the issue. It's code that has three things:
The version number
The way of applying the changes (updates a version table)
The way of rolling the changes back (sets the version on the version table to the previous)
So, each time you make a changeset you create this code file that can rollback or upgrade the database schema when you execute it.
This, being code, you can commit in any revision control system. You commit the first dump and then the scripts only.
The great thing about this approach is that you can easily distribute the database changes to customers, whereas with a standard just dump the schema and update it approach generating an upgrade/rollback script is a nuisance
In my company each developer is encouraged to save all db sctructure changes to a script files in the folder containing module's revision number. These scripts are kept in svn repository.
When application starts, the db upgrade code compares current db version and code version and if the code is newer - looks into scripts folder and applies all db changes automatically.
This way every instance of application (on production or developers machines) always upgrades db to their code version and it works great.
Of course, some automation could be added - if we find a suitable tool.
Poor mans version control:
Separate file for each object (table, view, etc)
When altering tables, you want to diff CREATE TABLE to CREATE TABLE. Source code history is for communicating a story. You can't do a meaningful diff of CREATE TABLE and ALTER TABLE
Try to make changes to the files, then commit them to source control, then commit them to the SQL database. Most tools poorly support this because you shouldn't commit to source control until you test and you can't test without putting the code into SQL. So in practice, you try to use SQL Redgate to compare your files to the SQL database. Failing that, you adopt a harsh policy of dropping everything in the database and replacing it with what made it into source control.
Change scripts usually are single use, but applications exist, like wordpress, where you need to move the schema from 1.0 to 1.1, 1.1 to 1.5, etc. Each of those should be under source control and modified as such (i.e. as you find bugs in the script that moves you from 1.0 to 1.1, you create a new version of that script, not yet-another script)

How do you manage databases in development, test, and production?

I've had a hard time trying to find good examples of how to manage database schemas and data between development, test, and production servers.
Here's our setup. Each developer has a virtual machine running our app and the MySQL database. It is their personal sandbox to do whatever they want. Currently, developers will make a change to the SQL schema and do a dump of the database to a text file that they commit into SVN.
We're wanting to deploy a continuous integration development server that will always be running the latest committed code. If we do that now, it will reload the database from SVN for each build.
We have a test (virtual) server that runs "release candidates." Deploying to the test server is currently a very manual process, and usually involves me loading the latest SQL from SVN and tweaking it. Also, the data on the test server is inconsistent. You end up with whatever test data the last developer to commit had on his sandbox server.
Where everything breaks down is the deployment to production. Since we can't overwrite the live data with test data, this involves manually re-creating all the schema changes. If there were a large number of schema changes or conversion scripts to manipulate the data, this can get really hairy.
If the problem was just the schema, It'd be an easier problem, but there is "base" data in the database that is updated during development as well, such as meta-data in security and permissions tables.
This is the biggest barrier I see in moving toward continuous integration and one-step-builds. How do you solve it?
A follow-up question: how do you track database versions so you know which scripts to run to upgrade a given database instance? Is a version table like Lance mentions below the standard procedure?
Thanks for the reference to Tarantino. I'm not in a .NET environment, but I found their DataBaseChangeMangement wiki page to be very helpful. Especially this Powerpoint Presentation (.ppt)
I'm going to write a Python script that checks the names of *.sql scripts in a given directory against a table in the database and runs the ones that aren't there in order based on a integer that forms the first part of the filename. If it is a pretty simple solution, as I suspect it will be, then I'll post it here.
I've got a working script for this. It handles initializing the DB if it doesn't exist and running upgrade scripts as necessary. There are also switches for wiping an existing database and importing test data from a file. It's about 200 lines, so I won't post it (though I might put it on pastebin if there's interest).
There are a couple of good options. I wouldn't use the "restore a backup" strategy.
Script all your schema changes, and have your CI server run those scripts on the database. Have a version table to keep track of the current database version, and only execute the scripts if they are for a newer version.
Use a migration solution. These solutions vary by language, but for .NET I use Migrator.NET. This allows you to version your database and move up and down between versions. Your schema is specified in C# code.
Your developers need to write change scripts (schema and data change) for each bug/feature they work on, not just simply dump the entire database into source control. These scripts will upgrade the current production database to the new version in development.
Your build process can restore a copy of the production database into an appropriate environment and run all the scripts from source control on it, which will update the database to the current version. We do this on a daily basis to make sure all the scripts run correctly.
Have a look at how Ruby on Rails does this.
First there are so called migration files, that basically transform database schema and data from version N to version N+1 (or in case of downgrading from version N+1 to N). Database has table which tells current version.
Test databases are always wiped clean before unit-tests and populated with fixed data from files.
The book Refactoring Databases: Evolutionary Database Design might give you some ideas on how to manage the database. A short version is readable also at http://martinfowler.com/articles/evodb.html
In one PHP+MySQL project I've had the database revision number stored in the database, and when the program connects to the database, it will first check the revision. If the program requires a different revision, it will open a page for upgrading the database. Each upgrade is specified in PHP code, which will change the database schema and migrate all existing data.
You could also look at using a tool like SQL Compare to script the difference between various versions of a database, allowing you to quickly migrate between versions
Name your databases as follows - dev_<<db>> , tst_<<db>> , stg_<<db>> , prd_<<db>> (Obviously you never should hardcode db names
Thus you would be able to deploy even the different type of db's on same physical server ( I do not recommend that , but you may have to ... if resources are tight )
Ensure you would be able to move data between those automatically
Separate the db creation scripts from the population = It should be always possible to recreate the db from scratch and populate it ( from the old db version or external data source
do not use hardcode connection strings in the code ( even not in the config files ) - use in the config files connection string templates , which you do populate dynamically , each reconfiguration of the application_layer which does need recompile is BAD
do use database versioning and db objects versioning - if you can afford it use ready products , if not develop something on your own
track each DDL change and save it into some history table ( example here )
DAILY backups ! Test how fast you would be able to restore something lost from a backup (use automathic restore scripts
even your DEV database and the PROD have exactly the same creation script you will have problems with the data, so allow developers to create the exact copy of prod and play with it ( I know I will receive minuses for this one , but change in the mindset and the business process will cost you much less when shit hits the fan - so force the coders to subscript legally whatever it makes , but ensure this one
This is something that I'm constantly unsatisfied with - our solution to this problem that is. For several years we maintained a separate change script for each release. This script would contain the deltas from the last production release. With each release of the application, the version number would increment, giving something like the following:
dbChanges_1.sql
dbChanges_2.sql
...
dbChanges_n.sql
This worked well enough until we started maintaining two lines of development: Trunk/Mainline for new development, and a maintenance branch for bug fixes, short term enhancements, etc. Inevitably, the need arose to make changes to the schema in the branch. At this point, we already had dbChanges_n+1.sql in the Trunk, so we ended up going with a scheme like the following:
dbChanges_n.1.sql
dbChanges_n.2.sql
...
dbChanges_n.3.sql
Again, this worked well enough, until we one day we looked up and saw 42 delta scripts in the mainline and 10 in the branch. ARGH!
These days we simply maintain one delta script and let SVN version it - i.e. we overwrite the script with each release. And we shy away from making schema changes in branches.
So, I'm not satisfied with this either. I really like the concept of migrations from Rails. I've become quite fascinated with LiquiBase. It supports the concept of incremental database refactorings. It's worth a look and I'll be looking at it in detail soon. Anybody have experience with it? I'd be very curious to hear about your results.
We have a very similar setup to the OP.
Developers develop in VM's with private DB's.
[Developers will soon be committing into private branches]
Testing is run on different machines ( actually in in VM's hosted on a server)
[Will soon be run by Hudson CI server]
Test by loading the reference dump into the db.
Apply the developers schema patches
then apply the developers data patches
Then run unit and system tests.
Production is deployed to customers as installers.
What we do:
We take a schema dump of our sandbox DB.
Then a sql data dump.
We diff that to the previous baseline.
that pair of deltas is to upgrade n-1 to n.
we configure the dumps and deltas.
So to install version N CLEAN we run the dump into an empty db.
To patch, apply the intervening patches.
( Juha mentioned Rail's idea of having a table recording the current DB version is a good one and should make installing updates less fraught. )
Deltas and dumps have to be reviewed before beta test.
I can't see any way around this as I've seen developers insert test accounts into the DB for themselves.
I'm afraid I'm in agreement with other posters. Developers need to script their changes.
In many cases a simple ALTER TABLE won't work, you need to modify existing data too - developers need to thing about what migrations are required and make sure they're scripted correctly (of course you need to test this carefully at some point in the release cycle).
Moreover, if you have any sense, you'll get your developers to script rollbacks for their changes as well so they can be reverted if need be. This should be tested as well, to ensure that their rollback not only executes without error, but leaves the DB in the same state as it was in previously (this is not always possible or desirable, but is a good rule most of the time).
How you hook that into a CI server, I don't know. Perhaps your CI server needs to have a known build snapshot on, which it reverts to each night and then applies all the changes since then. That's probably best, otherwise a broken migration script will break not just that night's build, but all subsequent ones.
Check out the dbdeploy, there are Java and .net tools already available, you could follow their standards for the SQL file layouts and schema version table and write your python version.
We are using command-line mysql-diff: it outputs a difference between two database schemas (from live DB or script) as ALTER script. mysql-diff is executed at application start, and if schema changed, it reports to developer. So developers do not need to write ALTERs manually, schema updates happen semi-automatically.
If you are in the .NET environment then the solution is Tarantino (archived). It handles all of this (including which sql scripts to install) in a NANT build.
I've written a tool which (by hooking into Open DBDiff) compares database schemas, and will suggest migration scripts to you. If you make a change that deletes or modifies data, it will throw an error, but provide a suggestion for the script (e.g. when a column in missing in the new schema, it will check if the column has been renamed and create xx - generated script.sql.suggestion containing a rename statement).
http://code.google.com/p/migrationscriptgenerator/ SQL Server only I'm afraid :( It's also pretty alpha, but it is VERY low friction (particularly if you combine it with Tarantino or http://code.google.com/p/simplescriptrunner/)
The way I use it is to have a SQL scripts project in your .sln. You also have a db_next database locally which you make your changes to (using Management Studio or NHibernate Schema Export or LinqToSql CreateDatabase or something). Then you execute migrationscriptgenerator with the _dev and _next DBs, which creates. the SQL update scripts for migrating across.
For oracle database we use oracle-ddl2svn tools.
This tool automated next process
for every db scheme get scheme ddls
put it under version contol
changes between instances resolved manually