How do I manage database evolutions when I'm using JPA?

I learned play by following the tutorial on their website for building a little blogging engine.
It uses JPA and in its bootstrap calls Fixtures.deleteModels() (or something like that).
It basically nukes all the tables every time it runs and I lose all the data.
I've deployed a production system like this (sans the nuke statement).
Now I need to deploy a large update to the production system. Lots of classes have changed, been added, or been removed. In my local testing, without nuking the tables on every run, I ran into sync issues. When I would try to write to or read from tables, Play would throw errors. I opened up MySQL and sure enough the tables had only been partially modified, and modified incorrectly in some cases. Even with the DDL mode set to "create" in my config, JPA can't seem to "figure out" how to reconcile the changes and modify my schema accordingly.
So I have to put back in the bootstrap statement that nukes all my tables.
So I started looking into database evolutions within Play and read an article on the play framework website about database evolutions. The article talked about version scripts, but it said, "If you work with JPA, Hibernate can handle database evolutions for you automatically. Evolutions are useful if you don’t use JPA".
So if JPA is supposed to be taking care of this for me, how do I deploy large updates to a large Play app? So far JPA has not been able to make the schema changes correctly and the app will throw errors. I'm not interested in losing all my data so the fix on dev "Fixtures.deleteModels()" can't really be used in prod.
Thanks in advance,
Josh

No, JPA should not take care of it for you. It's not a magic tool. If you decide to rename the table "client" to "customer", rename the column "street" to "line1", and switch the values of the customer type column from 1, 2, 3 to "bronze", "silver", "gold", there is no way for JPA to read your mind and figure out all the changes to make automagically.
To migrate from one schema to another, you use the same tools as if you didn't use JPA: SQL scripts, more advanced schema and data migration tools, or even custom JDBC migration code.
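To make that concrete, a hand-rolled migration for the hypothetical rename above could be written in plain JDBC, roughly like this (the connection URL, credentials and column types are made up for the sketch):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RenameClientToCustomer {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost/app", "user", "secret");
             Statement st = c.createStatement()) {
            // Schema changes JPA could never infer on its own.
            st.executeUpdate("RENAME TABLE client TO customer");
            st.executeUpdate("ALTER TABLE customer CHANGE street line1 VARCHAR(255)");
            // Data migration: recode the numeric type column as text.
            st.executeUpdate("ALTER TABLE customer MODIFY type VARCHAR(10)");
            st.executeUpdate("UPDATE customer SET type = 'bronze' WHERE type = '1'");
            st.executeUpdate("UPDATE customer SET type = 'silver' WHERE type = '2'");
            st.executeUpdate("UPDATE customer SET type = 'gold' WHERE type = '3'");
        }
    }
}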

Have a look at Flyway. You can trigger database migrations from your code or from Maven.
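As an illustration, kicking off Flyway from Java can look roughly like this (the fluent API below is from Flyway 5.x and later; older releases use new Flyway() plus setDataSource(), and the JDBC URL and credentials are placeholders):

import org.flywaydb.core.Flyway;

public class Migrate {
    public static void main(String[] args) {
        // Versioned scripts (V1__init.sql, V2__add_customer.sql, ...) are
        // expected on the classpath under db/migration by default.
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:mysql://localhost/app", "user", "secret")
                .load();
        flyway.migrate(); // applies any pending migrations and records them in the history table
    }
}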

There is a Hibernate property, hbm2ddl.auto=update, which will update your schema automatically.
I would STRONGLY suggest not using this setting in production, as it introduces a whole new level of problems if something goes wrong.
It's perfectly fine for development though.
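For development only, the setting can be passed in when the persistence unit is created; a minimal sketch (the persistence unit name "myUnit" is a placeholder):

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class DevBootstrap {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Lets Hibernate alter the schema to match the entities - dev only!
        props.put("hibernate.hbm2ddl.auto", "update");
        EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("myUnit", props);
        // ... use emf ...
        emf.close();
    }
}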

When a JPA container starts (say, EclipseLink or any other), it expects to find a database which matches the @Entity classes you've defined in your code. If the database has been migrated already, everything will work smoothly; otherwise it will probably fail.
So, long story short, you need to perform database migrations (or evolutions, if you prefer) before the JPA container starts. Apparently, Play performs migrations for you before it kicks off the database manager you configured. So, in theory, regardless of the ORM you are using, Play decides when it's time for the ORM to start its work. Conceptually, it should work.
For a good presentation about this subject, please have a look at the second video at: http://flywaydb.org/documentation/videos.html

Related

What's the appropriate way to test code that uses MySQL-specific queries internally

I am collecting data and storing it in a MySQL database using Java. Additionally, I use Maven for building the project, TestNG as a test framework, and Spring-Jdbc for accessing the database. I've implemented a DAO layer that encapsulates the access to the database. Besides adding data using the DAO classes, I want to execute some queries which aggregate the data and store the results in some other tables (like materialized views).
Now, I would like to write some test cases which check whether the DAO classes are working as they should. Therefore, I thought of using an in-memory database which would be populated with some test data. Since I am also using MySQL-specific SQL queries for aggregating data, I ran into some trouble:
Firstly, I thought of simply using the embedded-database functionality provided by Spring-Jdbc to instantiate an embedded database. I decided to use the H2 implementation. There I ran into trouble because of the aggregation queries, which use MySQL-specific content (e.g. time-manipulation functions like DATE()). Another disadvantage of this approach is that I need to maintain two DDL files - the actual DDL file defining the tables in MySQL (here I define the encoding and add comments to tables and columns, both features being MySQL-specific), and the test DDL file that defines the same tables but without comments etc., since H2 does not support comments.
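(For reference, the Spring-Jdbc embedded-database setup described above looks roughly like this; the script and table names are placeholders:)

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabase;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

public class EmbeddedH2Example {
    public static void main(String[] args) {
        // H2-compatible DDL and test data, kept separately from the MySQL DDL.
        EmbeddedDatabase db = new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.H2)
                .addScript("schema-h2.sql")
                .addScript("test-data.sql")
                .build();
        JdbcTemplate jdbc = new JdbcTemplate(db);
        System.out.println(jdbc.queryForObject("SELECT COUNT(*) FROM my_table", Integer.class));
        db.shutdown();
    }
}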
I found a description of using MySQL as an embedded database which I could use within the test cases (http://literatitech.blogspot.de/2011/04/embedded-mysql-server-for-junit-testing.html). That sounded really promising to me. Unfortunately, it didn't work: a MissingResourceException occurred, "Resource '5-0-21/Linux-amd64/mysqld' not found". It seems that the driver is not able to find the database daemon on my local machine. But I don't know what I have to look for to find a solution to that issue.
Now I am a little bit stuck, and I am wondering if I should have designed the architecture differently. Does anyone have tips on how I should set up an appropriate system? I have two other options in mind:
Instead of using an embedded database, I could go with a native MySQL instance and set up a database that is only used for the test cases. This option sounds slow. Actually, I might want to set up a CI server later on, and I thought that using an embedded database would be more appropriate since the tests run faster.
I could erase all the MySQL-specific stuff from the SQL queries and use H2 as an embedded database for testing. If this option is the right choice, I would need to find another way to test the SQL queries that aggregate the data into materialized views.
Or is there a 3rd option which I don't have in mind?
I would appreciate any hints.
Thanks,
XComp
I've created a Maven plugin exactly for this purpose: jcabi-mysql-maven-plugin. It starts a local MySQL server in the pre-integration-test phase and shuts it down in post-integration-test.
If it is not possible to get the in-memory MySQL database to work, I suggest using the H2 database for the "simple" tests and a dedicated MySQL instance to test the MySQL-specific queries.
Additionally, the tests against the real MySQL database can be configured as integration tests in a separate Maven profile so that they are not part of the regular Maven build. On the CI server you can create an additional job that runs the MySQL tests periodically, e.g. daily or every few hours. With such a setup you can keep and test your product-specific queries while your regular build does not slow down. You can also run a normal build even if the test database is not available.
There is a nice Maven plugin for integration tests called maven-failsafe-plugin. It provides pre- and post-integration-test steps that can be used to set up the test data before the tests and to clean up the database afterwards.
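A sketch of what such a failsafe-run integration test could look like (classes named *IT are picked up in the integration-test phase by default; the test.db.url property and the table and column names are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.testng.Assert;
import org.testng.annotations.Test;

public class AggregationDaoIT {

    @Test
    public void aggregatesEventsByDay() throws Exception {
        // The build (or CI job) passes in the dedicated MySQL test instance.
        String url = System.getProperty("test.db.url", "jdbc:mysql://localhost/testdb");
        try (Connection c = DriverManager.getConnection(url, "test", "test");
             Statement st = c.createStatement();
             // One of the MySQL-specific aggregation queries from the question.
             ResultSet rs = st.executeQuery(
                     "SELECT DATE(created_at), COUNT(*) FROM events GROUP BY DATE(created_at)")) {
            Assert.assertTrue(rs.next());
        }
    }
}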

What is the advantage of CodeFirst over Database First?

I was watching some videos and tutorials for EF 4.1, and I do not see any benefit to CodeFirst (except perhaps when the DB is very small, 3-4 tables, and I am too lazy to create the DB first).
To me, the best approach so far is to create the database in some sort of database editor, which is surely faster than editing in the entity model, and EF picks up every relationship and creates the associations correctly. I know there are challenges with naming conventions etc., but I find it very confusing to manage Code First, as everything looks like code and there is too much to code.
What is it that CodeFirst can do and Db first cannot?
CodeFirst cannot do anything that DB first cannot. At the end of the day they are both using the Entity Framework.
The main advantages of using codefirst are:
Development speed - You do not have to worry about creating a DB; you just start coding. This is good for developers coming from a programming background without much DBA experience. It also supports automatic database updates, so whenever your model changes the DB is automatically updated as well.
POCOs - The code is a lot cleaner; you do not end up with loads of auto-generated code. You have full control of each of your classes.
Simple - you do not have an edmx model to update or maintain
For more info see Code-first vs Model/Database-first
and here Code-First or Database-First, how to choose?
Coming from a DataCentric approach, I will always find it strange that people like to create with a Code First approach. When I design my database, I am already thinking about what each of the tables is, as if they were classes anyway: how they link together and how the data will flow. I can picture the whole system through the database.
I have always been taught that you work from the ground up: get your foundations right and everything else will follow. I create lots and lots of different systems for lots of different companies, and the speed at which I do it comes from the fact that once I have a strong database model, I then run my custom code generator that creates the Views/Stored Procedures as well as my Controller/BusinessLayer/DataLayer for me. Put all of these together and all I have to do is create the front end.
If I had to create the whole system code-first to generate the database, as well as all of the other items, then I would imagine it taking a lot longer. I am not saying that I am right in any terms, and I am sure that there are probably faster and more experienced ways of developing systems, but so far I haven't found one.
Thanks for letting me speak and I hope my views have helped a little.
Migrations were enabled for CodeFirst in Entity Framework 4.3, so
you can easily push changes from the model to the database seamlessly. Reference 1
Detailed video: Complete Reference Video
Well, it depends on your project. I'll try to synthesize some ideas:
You have total control over the entity classes. They are no longer generated, and you don't have to update T4 templates or use partial classes…
The EDMX model will disappear in EF7 in favor of the CodeFirst model. Keep that in mind if you plan to migrate to EF7 or have projects starting in the near future that could use EF7.
It is easier to merge when multiple devs are working on the model.
+/- Annotations and mapping have to be done manually. I would say the code first approach seems lighter (less bloat) and we can keep things simple (a visual model can hide undesired complexity). It is open to the Fluent API.
You can still visualize the model via Power Tools, but the model is read-only. Any change to the model has to be done manually (even though the initial entities can be generated from scratch). You don't have partial models (diagrams), but your models should be small enough anyway.
It seems Database First is better integrated with stored procedures and function results (some improvements were made in EF6).

Migrating subsets of production data back to dev

In our rails app we sometimes have db entries created by users that we'd like to make part of our dev environment, without exporting the whole table. So, we'd like to be able to have a special 'dev and testing' dump.
Any recommended best practices? mysqldump seems pretty cumbersome, and we'd like to pull in rails associations as well, so maybe a rake task would make more sense.
Ideas?
You could use an ETL tool like Pentaho Kettle. Once you have the initial transformation set up the way you want, you can easily run it with different parameters in the future. This way you can also keep all your associations. I wrote a little blurb about Pentaho for another question here.
If you provide a rough schema I could probably help you get started on what your transformation would look like.
I had a similar need and I ended up creating a plugin for that. It was developed for Rails 2.x and worked fine for me, but I haven't had much use for it lately.
The documentation is lacking, but it's pretty simple. You basically install the plugin and then have a to_sql method available on all your models. The options are explained in the README.
You can try it out and let me know if you have any issues, I'll try to help.
I'd go after it using a Rails runner script. That will allow your code to access the same things your Rails app would, including the database initializations. ActiveRecord will be able to take advantage of the model relationships you've defined.
Create some "transfer" tables in your production database and copy the desired data into those using the "runner" script. From there you could serialize the data, or use a dump tool, since you'll be dealing with a reduced amount of records. Reverse the process in the development environment to move the data into the database.
I had a need to populate the database in one of my apps from remote web logs, so I wrote a runner script that fires off periodically via cron, FTPs the data from my site, and inserts it.

MySQL Version Control - Subversion

Wondering if it is possible to have version control for a MySQL database.
I realize this question has been asked before however the newest is almost a year ago, and at the rate things change...
The problem is that each developer has Apache/MySQL/PHP on their own computer, and they sometimes edit the database. It's rather inconvenient if they have to send an email to all the other developers and then manually edit the test server's database.
How do you deal with this problem?
Thanks
This is not a MySQL-related solution in itself, but we've had a lot of success with a product called Liquibase (http://www.liquibase.org/).
It's a migration solution which covers many different database vendors, allowing all database changes to be coded in configuration files, all of which are kept in Subversion. Since all configuration is kept in XML files, it's easy to merge other people's changes into the mainline script and it plays well with tags and branches.
The database can be brought up to the current revision level by running the "update database" command. Most changes also have the ability to roll-back a database change, which can be helpful too. I would recommend following the practice of making sure you get current before running the migration, as this would likely be easiest.
Finally, when it comes to a production delivery, you can choose to have all the database changes output as a full SQL script so it can allow DBAs to run it and maintain a separation of duties.
So far, it's worked like a charm.
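Most people drive it from the command line or a build tool, but for reference the same "update database" step can also be triggered from Java code, roughly like this (Liquibase 3.x-era API; the changelog path and connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

import liquibase.Liquibase;
import liquibase.database.Database;
import liquibase.database.DatabaseFactory;
import liquibase.database.jvm.JdbcConnection;
import liquibase.resource.ClassLoaderResourceAccessor;

public class RunLiquibase {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost/app", "user", "secret")) {
            Database db = DatabaseFactory.getInstance()
                    .findCorrectDatabaseImplementation(new JdbcConnection(c));
            // The changelog XML is the file kept in Subversion.
            Liquibase liquibase =
                    new Liquibase("db/changelog.xml", new ClassLoaderResourceAccessor(), db);
            liquibase.update(""); // bring the database up to the current revision
        }
    }
}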
Well, we use Rails, which keeps all the changes in migration files. I know that a couple of PHP frameworks do the same thing - Symfony, for instance. So when all the changes are merged into our repository (we use Mercurial), we can see all the migrations that need to be, or were, applied to the database in development. Then the person responsible for production rolls the code out to production after a full backup is made. However, if you don't use a PHP framework that takes care of this, then awied's suggestion sounds very interesting - I hadn't heard of Liquibase before, but I will definitely check it out.
There is a tool called iBatis, now called MyBatis, that handles database versioning perfectly.
It takes a little work to have all your changes in script instead of with a graphical tool, but, if you are familiar with coding, it's not a problem.
When you have multiple databases (like dev-test-prod), you just make 3 environment files and you can update one environment with only one command-line instruction.

How do you manage databases in development, test, and production?

I've had a hard time trying to find good examples of how to manage database schemas and data between development, test, and production servers.
Here's our setup. Each developer has a virtual machine running our app and the MySQL database. It is their personal sandbox to do whatever they want. Currently, developers will make a change to the SQL schema and do a dump of the database to a text file that they commit into SVN.
We're wanting to deploy a continuous integration development server that will always be running the latest committed code. If we do that now, it will reload the database from SVN for each build.
We have a test (virtual) server that runs "release candidates." Deploying to the test server is currently a very manual process, and usually involves me loading the latest SQL from SVN and tweaking it. Also, the data on the test server is inconsistent. You end up with whatever test data the last developer to commit had on his sandbox server.
Where everything breaks down is the deployment to production. Since we can't overwrite the live data with test data, this involves manually re-creating all the schema changes. If there were a large number of schema changes or conversion scripts to manipulate the data, this can get really hairy.
If the problem were just the schema, it would be easier; but there is "base" data in the database that is updated during development as well, such as meta-data in security and permissions tables.
This is the biggest barrier I see in moving toward continuous integration and one-step-builds. How do you solve it?
A follow-up question: how do you track database versions so you know which scripts to run to upgrade a given database instance? Is a version table like Lance mentions below the standard procedure?
Thanks for the reference to Tarantino. I'm not in a .NET environment, but I found their DataBaseChangeMangement wiki page to be very helpful. Especially this Powerpoint Presentation (.ppt)
I'm going to write a Python script that checks the names of *.sql scripts in a given directory against a table in the database and runs the ones that aren't there, in order, based on an integer that forms the first part of the filename. If it is a pretty simple solution, as I suspect it will be, then I'll post it here.
I've got a working script for this. It handles initializing the DB if it doesn't exist and running upgrade scripts as necessary. There are also switches for wiping an existing database and importing test data from a file. It's about 200 lines, so I won't post it (though I might put it on pastebin if there's interest).
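(For illustration only - this is not the poster's script - the idea of checking numbered *.sql files against a tracking table can be sketched in Java roughly as follows; the directory, table name and connection details are assumptions, and the script files are assumed to use zero-padded prefixes such as 001_create_users.sql so that lexical order matches numeric order:)

import java.io.File;
import java.nio.file.Files;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SchemaUpgrader {
    public static void main(String[] args) throws Exception {
        File dir = new File("db/scripts");
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost/app", "user", "secret");
             Statement st = c.createStatement()) {
            st.executeUpdate("CREATE TABLE IF NOT EXISTS schema_version (script VARCHAR(255) PRIMARY KEY)");
            // Collect the scripts that have already been applied.
            Set<String> applied = new HashSet<>();
            try (ResultSet rs = st.executeQuery("SELECT script FROM schema_version")) {
                while (rs.next()) {
                    applied.add(rs.getString(1));
                }
            }
            File[] scripts = dir.listFiles((d, name) -> name.endsWith(".sql"));
            Arrays.sort(scripts); // zero-padded prefix => lexical order == numeric order
            for (File script : scripts) {
                if (applied.contains(script.getName())) {
                    continue;
                }
                // Naive statement splitting; fine for simple DDL scripts.
                String contents = new String(Files.readAllBytes(script.toPath()));
                for (String sql : contents.split(";")) {
                    if (!sql.trim().isEmpty()) {
                        st.executeUpdate(sql);
                    }
                }
                st.executeUpdate("INSERT INTO schema_version (script) VALUES ('" + script.getName() + "')");
            }
        }
    }
}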
There are a couple of good options. I wouldn't use the "restore a backup" strategy.
Script all your schema changes, and have your CI server run those scripts on the database. Have a version table to keep track of the current database version, and only execute the scripts if they are for a newer version.
Use a migration solution. These solutions vary by language, but for .NET I use Migrator.NET. This allows you to version your database and move up and down between versions. Your schema is specified in C# code.
Your developers need to write change scripts (schema and data change) for each bug/feature they work on, not just simply dump the entire database into source control. These scripts will upgrade the current production database to the new version in development.
Your build process can restore a copy of the production database into an appropriate environment and run all the scripts from source control on it, which will update the database to the current version. We do this on a daily basis to make sure all the scripts run correctly.
Have a look at how Ruby on Rails does this.
First there are so-called migration files, which basically transform the database schema and data from version N to version N+1 (or, in the case of downgrading, from version N+1 to N). The database has a table which records the current version.
Test databases are always wiped clean before unit-tests and populated with fixed data from files.
The book Refactoring Databases: Evolutionary Database Design might give you some ideas on how to manage the database. A short version is readable also at http://martinfowler.com/articles/evodb.html
In one PHP+MySQL project I've had the database revision number stored in the database, and when the program connects to the database, it will first check the revision. If the program requires a different revision, it will open a page for upgrading the database. Each upgrade is specified in PHP code, which will change the database schema and migrate all existing data.
You could also look at using a tool like SQL Compare to script the difference between various versions of a database, allowing you to quickly migrate between versions
Name your databases as follows - dev_<<db>>, tst_<<db>>, stg_<<db>>, prd_<<db>> (obviously you should never hardcode DB names).
Thus you would be able to deploy even the different types of DBs on the same physical server (I do not recommend that, but you may have to if resources are tight).
Ensure you are able to move data between those automatically.
Separate the DB creation scripts from the data population: it should always be possible to recreate the DB from scratch and populate it (from the old DB version or an external data source).
Do not hardcode connection strings in the code (not even in the config files) - use connection string templates in the config files, which you populate dynamically; every reconfiguration of the application layer which needs a recompile is BAD.
Do use database versioning and DB object versioning - if you can afford it, use ready-made products; if not, develop something of your own.
Track each DDL change and save it into some history table (example here).
DAILY backups! Test how fast you would be able to restore something lost from a backup (use automatic restore scripts).
Even if your DEV database and PROD have exactly the same creation script, you will have problems with the data, so allow developers to create an exact copy of prod and play with it (I know I will receive minuses for this one, but a change in the mindset and the business process will cost you much less when shit hits the fan - so make the coders sign whatever is legally required, but ensure this one).
This is something that I'm constantly unsatisfied with - our solution to this problem that is. For several years we maintained a separate change script for each release. This script would contain the deltas from the last production release. With each release of the application, the version number would increment, giving something like the following:
dbChanges_1.sql
dbChanges_2.sql
...
dbChanges_n.sql
This worked well enough until we started maintaining two lines of development: Trunk/Mainline for new development, and a maintenance branch for bug fixes, short term enhancements, etc. Inevitably, the need arose to make changes to the schema in the branch. At this point, we already had dbChanges_n+1.sql in the Trunk, so we ended up going with a scheme like the following:
dbChanges_n.1.sql
dbChanges_n.2.sql
...
dbChanges_n.3.sql
Again, this worked well enough, until one day we looked up and saw 42 delta scripts in the mainline and 10 in the branch. ARGH!
These days we simply maintain one delta script and let SVN version it - i.e. we overwrite the script with each release. And we shy away from making schema changes in branches.
So, I'm not satisfied with this either. I really like the concept of migrations from Rails. I've become quite fascinated with LiquiBase. It supports the concept of incremental database refactorings. It's worth a look and I'll be looking at it in detail soon. Anybody have experience with it? I'd be very curious to hear about your results.
We have a very similar setup to the OP.
Developers develop in VM's with private DB's.
[Developers will soon be committing into private branches]
Testing is run on different machines (actually in VMs hosted on a server).
[Will soon be run by Hudson CI server]
Test by loading the reference dump into the db.
Apply the developers' schema patches,
then apply the developers' data patches.
Then run unit and system tests.
Production is deployed to customers as installers.
What we do:
We take a schema dump of our sandbox DB.
Then an SQL data dump.
We diff that against the previous baseline.
That pair of deltas upgrades n-1 to n.
We configure the dumps and deltas.
So to install version N CLEAN we run the dump into an empty db.
To patch, apply the intervening patches.
(Juha mentioned that Rails' idea of having a table recording the current DB version is a good one and should make installing updates less fraught.)
Deltas and dumps have to be reviewed before beta test.
I can't see any way around this as I've seen developers insert test accounts into the DB for themselves.
I'm afraid I'm in agreement with other posters. Developers need to script their changes.
In many cases a simple ALTER TABLE won't work - you need to modify existing data too - so developers need to think about what migrations are required and make sure they're scripted correctly (of course you need to test this carefully at some point in the release cycle).
Moreover, if you have any sense, you'll get your developers to script rollbacks for their changes as well so they can be reverted if need be. This should be tested as well, to ensure that their rollback not only executes without error, but leaves the DB in the same state as it was in previously (this is not always possible or desirable, but is a good rule most of the time).
How you hook that into a CI server, I don't know. Perhaps your CI server needs to have a known build snapshot on, which it reverts to each night and then applies all the changes since then. That's probably best, otherwise a broken migration script will break not just that night's build, but all subsequent ones.
Check out dbdeploy; there are Java and .NET tools already available. You could follow their standards for the SQL file layouts and the schema version table, and write your Python version.
We are using the command-line mysql-diff tool: it outputs the difference between two database schemas (from a live DB or a script) as an ALTER script. mysql-diff is executed at application start, and if the schema has changed, it reports to the developer. So developers do not need to write ALTERs manually; schema updates happen semi-automatically.
If you are in the .NET environment, then the solution is Tarantino (archived). It handles all of this (including which SQL scripts to install) in a NAnt build.
I've written a tool which (by hooking into Open DBDiff) compares database schemas, and will suggest migration scripts to you. If you make a change that deletes or modifies data, it will throw an error, but provide a suggestion for the script (e.g. when a column is missing in the new schema, it will check whether the column has been renamed and create xx - generated script.sql.suggestion containing a rename statement).
http://code.google.com/p/migrationscriptgenerator/ SQL Server only, I'm afraid :( It's also pretty alpha, but it is VERY low friction (particularly if you combine it with Tarantino or http://code.google.com/p/simplescriptrunner/)
The way I use it is to have a SQL scripts project in your .sln. You also have a db_next database locally which you make your changes to (using Management Studio or NHibernate Schema Export or LinqToSql CreateDatabase or something). Then you execute migrationscriptgenerator with the _dev and _next DBs, which creates the SQL update scripts for migrating across.
For Oracle databases we use the oracle-ddl2svn tools.
This tool automates the following process:
for every DB schema, get the schema DDLs
put them under version control
changes between instances are resolved manually