How do you version and sync your MySQL data model?

What's the best way to save my MySQL data model and automatically apply changes to my development database server as they are made (or at least nightly)?
For example, today I'm working on my project and create this table in my database, and save the statement to a SQL file to deploy to production later:
create table dog (
uid int,
name varchar(50)
);
And tomorrow, I decide I want to record the breed of each dog too. So I change the SQL file to read:
create table dog (
uid int,
name varchar(50),
breed varchar(30)
);
That script will work in production for the first release, but it won't help me update my development database, because it fails with ERROR 1050 (42S01): Table 'dog' already exists. Furthermore, it won't work in production if this change was made after the first release. So I really need to ALTER the table now.
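For instance, what I presumably should be writing for the breed change is an incremental script along these lines (just a sketch of what I mean):
alter table dog add column breed varchar(30);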
So now I have two concerns:
1. Is this how I should be saving my data model (a bunch of create statements in a SQL file), and
2. How should I be applying changes like this to my database?
My goal is to release changes accurately and enable continuous integration. I use a tool called DDLSYNC to find and apply differences in an Oracle database, but I'm not sure what similar tools exist for MySQL.

At work, we developed a small script to manage our database versioning. Every change to any table or set of data gets its own SQL file.
The files are numbered sequentially. We keep track of which update files have been run by storing that information in the database. The script inserts a row with the filename when the file is about to be executed, and updates the row with a completion timestamp when the execution finishes. This is wrapped inside a transaction. (It's worth remembering that DDL commands in MySQL can not occur within a transaction. Any attempt to perform DDL in a transaction causes an implicit commit.)
Because the SQL files are part of our source code repository, we can make running the update script part of the normal rollout process. This makes keeping the database and the code in sync easy as pie. Honestly, the hardest part is making sure another dev hasn't grabbed the next number in a pending commit.
We combine this update system with an (optional) nightly wipe of our dev database, replacing the contents with last night's live system backup. After the backup is restored, the update script runs, applying any pending update files in the process.
The restoration occurs in such a way that only tables that were in the live database get overwritten. Any update that adds a table therefore also has to be responsible for only adding it if it doesn't exist. DROP TABLE IF EXISTS is handy. Unfortunately not all databases support that, so the update system also allows for execution of scripts written in our language of choice, not just SQL.
All of this in about 150 lines of code. It's as easy as reading a directory, comparing the contents to a table, and executing anything that hasn't already been executed, in a determined order.
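To give a rough idea (the table and file names here are illustrative, not our actual script), the bookkeeping boils down to something like:
create table schema_change_log (
    filename     varchar(255) not null primary key,
    started_at   datetime not null,
    completed_at datetime null
);
-- for each pending file, in filename order:
insert into schema_change_log (filename, started_at) values ('0042_add_breed_to_dog.sql', now());
-- ...execute the statements in 0042_add_breed_to_dog.sql...
update schema_change_log set completed_at = now() where filename = '0042_add_breed_to_dog.sql';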

There are standard tools for this in many frameworks: Rails has Migrations, a pattern that's easily replicated in PHP or any similar language.

Related

Database Version / Change Control for Data not Schema?

After reading a few articles here and around, I have realised that database version control in a development team is actually of high importance.
Until now I have been doing a simple dump of the whole database each time there is an update; if only one table was altered, we can sometimes get away with dumping just that table and reimporting it. It's not the best approach, but it works quite well for additive changes, and we haven't had any hiccups yet.
Now, I save a .mwb (MySQL Workbench diagram) file in the git repository of the project I'm working on.
Then I also use dbv for schema management, along with git, with each branch being named based on the project, and it's working quite well. This allows me to version schema changes with the ability to revert or roll back.
However, what about the data contained in the tables? How can this be maintained? Maybe I'm better off just sticking with the old method. I understand that on projects with the same DB structure but different data this is fine, but what about sites with specific database data that needs to be versioned and managed?
Also, what about the case of already deployed sites that need database changes? How can this be seamless? Some have suggested the use of update/alter scripts, and that works fine with default values and such. But what if I have made a change on a website platform that requires every website's database to be changed, while keeping the data intact?
I've worked mostly in business application development and configuration management. Your question is representative of the challenges in such an environment; when you upgrade, for instance, Microsoft Word, you don't need to change all documents right away from doc to docx. And those documents even have a simpler structure than a full relational database.
Not so for business applications; users skip releases, make unauthorized changes to the data model and the system needs to keep running and providing the correct numbers...
For our own applications (the largest is around 600 tables) we use a self-developed CASE tool which includes branching/merging, but the approach can also be done manually.
Versioning the Data Model
The data model can be written down in a structured way. For instance as table contents (CSV to be loaded in a table with meta data) or as code that detects the version in use and adds columns and tables when missing, including non-trivial migrations.
This even allows multiple users at the same time to change the data model.
When you use auto-detection (for instance, we use a call named "verify_column" instead of "add_column"), this even allows smooth migration independent of the release number the customer is starting the upgrade from. Such a procedure analyzes the table to be changed and issues the correct DDL such as alter table t1 add col1 number not null when a column is missing or alter table t1 modify col1 not null when the column was already present but nullable.
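To make this concrete for MySQL, a verify-style call could be approximated with a stored procedure that consults information_schema before issuing any DDL. This is only a sketch under that assumption; the procedure name mirrors our naming convention, but the details are not our actual tool:
delimiter //
create procedure verify_column(
    in p_table      varchar(64),
    in p_column     varchar(64),
    in p_definition varchar(255)
)
begin
    -- only issue the ALTER when the column is missing
    if not exists (select 1
                   from information_schema.columns
                   where table_schema = database()
                     and table_name   = p_table
                     and column_name  = p_column) then
        set @ddl = concat('alter table ', p_table, ' add column ', p_column, ' ', p_definition);
        prepare stmt from @ddl;
        execute stmt;
        deallocate prepare stmt;
    end if;
end //
delimiter ;
-- example: call verify_column('dog', 'breed', 'varchar(30) null');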
For Oracle and SQL Server I can provide you with a few sample procedures. In MySQL I would code this using a client side language, preferably OS independent to allow installations to run on Windows and Linux. Maybe using Apache Ant when you have experience with that.
Versioning Data
We split the tables in four categories:
R: referential data; data the application site must provide before actually using the system. For instance, general ledger account codes. The referential data seldom changes after go-live and does not continuously grow in size. The contents reflect the site's business model where the application is used.
T: transaction data; data the site registers, changes and removes during use of the application. For instance, general ledger entries. The transaction data starts at zero and grows continuously. When the company doubles in revenue, the transaction data also doubles.
S: seeded data; data NOT maintained by the user at the site but provided and maintained by the developing party. Essentially this is code turned into data. For example, 'F' stands for 'Female'. Errors in seeded data can lead to system errors.
O: the rest (ideally not needed, since they are technical, but some systems require a temporary table A or a scratch table B).
The contents of tables in category 'S' (seeded data) are placed under version control. We normally register these as metadata in our CASE tool, there called 'data sets', but you can also use, for instance, Microsoft Excel or even code.
For example, in Excel you would have a list of rows of seeded data. In column A you might enter an Excel function like =B..&"|"&C..& "|" & ... which concatenates everything and makes it suitable for loading by a loader tool.
For example in code, you might have a call like:
verifySeed('TABLE_A', 'CODE', 'VALUE')
The Excel file is a little bit hard to bring under version control while allowing multiple users to change its contents at the same time. The approach with code is very simple.
Please remember to also add features to remove obsoleted seeded data. For instance, by explicitly listing obsoleted seeded data or by automatically removing all seeded data present in the tables but not touched by the last installation.
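In MySQL, a verifySeed call like the one above could boil down to an upsert for each row, assuming the code column carries a primary or unique key (the table and column names below are purely illustrative):
insert into table_a (code, value)
values ('F', 'Female')
on duplicate key update value = values(value);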
You would need to keep a journal of transactions on your data model that is synchronised to your code versions. For each update that adds information (i.e. a new field) you can simply enter statements like 'ALTER TABLE x ADD COLUMN y ...' and provide a DEFAULT value (with a function perhaps) in an update script, and an 'ALTER TABLE x DROP COLUMN y ...' in the downgrade script. You would need to export your data before you truncate information in a table; you can convert the dumped table data to SQL for the inverse transaction so that you can add the missing information back using it.
You can use a 'journal' table within your data model to keep track of these transactions, using simple ordinals that denote the applied scripts. Whenever the software is installed, it can compare these numbers to create a list of transactions to apply to move the database from state N to state X, backwards or forwards, without losing any data!
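A minimal sketch of such a journal table and the check the installer would perform (names are illustrative):
create table schema_journal (
    script_no  int      not null primary key,
    applied_at datetime not null
);
-- the installer applies, in order, every script numbered higher than this:
select coalesce(max(script_no), 0) from schema_journal;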

Perl: How to copy/mirror remote MySQL table(s) to another database? Possibly different structure too?

I am very new to this and a good friend is in a bind. I am at my wits' end. I have used GUIs like Navicat and SQLyog to do this, but only manually.
His band info data (schedules and whatnot) is in a MySQL database on a server (admin server).
I am putting together a basic site for him written in Perl that grabs data from a database that resides on my server (public server) and displays schedule info, previous gig newsletters and some fan interaction.
He uses an administrative interface, which he likes and desires to keep, to manage the data on the admin server.
The admin server db has a bunch of tables and even table data the public db does not need.
So, I created tables on the public side that only contain relevant data.
I basically used a GUI to export the data, then insert it into the public side whenever he made updates to the admin db (copy and paste).
(FYI, I am using the DBI module to access the data in/via my public db Perl script.)
I could access the admin server directly to grab only the data I need, but the whole purpose of this is to "mirror" the data, not access the admin server on every query. Also, some tables are THOUSANDS of rows, and parsing every row in a loop seemed too "bulky" to me. There is, however, a "time" column which could be used for comparison.
I cannot "sync" because the structures are different; I only need the relevant data from three tables.
SO...... I desire to automate!
I read that "copy" was a fast way, but what I found on how to implement it was too advanced for my level.
I do not have the luxury of placing a script on the admin server to notify when there was an update.
1- I would like to set up a script to check a table to see if a row was updated or added in the admin server's db.
I would then want to update or insert the new or changed data into the public server's db.
This "check" could be set up in a cron job, I guess, or triggered when a specific page loads on the public side (the same subroutine would be called by the cron, I assume).
This data does not need to be "real time", but if he updates something it would be nice to have it appear as quickly as possible.
I have done much reading, module research and experimenting, but here I am again at Stack Overflow, where I always get great advice and examples.
Much of the terminology is still quite over my head so verbose examples with explanations really help me learn quicker.
Thanks in advance.
The two terms you are looking for are either "replication" or "ETL".
First, replication approach.
Let's assume your admin server has tables T1, T2, T3 and your public server has tables TP1, TP2.
So, what you want to do (since you have different table structures, as you said) is:
Take the tables from public server, and create exact copies of those tables on the admin server (TP1 and TP2).
Create a trigger on the admin server's original tables to populate the data from T1/T2/T3 into the admin server's copies of TP1/TP2 (a sketch of such a trigger follows this list).
You will also need to do initial data population from T1/T2/T3 into admin server's copy of TP1/TP2. Duh.
Set up the "replication" from the admin server's TP1/TP2 to the public server's TP1/TP2.
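A sketch of one such trigger, assuming T1 and TP1 share a key column id and TP1 keeps a subset of T1's columns (the column names are made up for the example; you would need matching update and delete triggers as well):
create trigger t1_after_insert
after insert on t1
for each row
    insert into tp1 (id, name, gig_date)
    values (new.id, new.name, new.gig_date)
    on duplicate key update name = new.name, gig_date = new.gig_date;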
A different approach is to write a program (such programs are called ETL - Extract-Transform-Load) which will extract the data from T1/T2/T3 on the admin server (the "E" part of "ETL"), massage the data into a format suitable for loading into the TP1/TP2 tables (the "T" part of "ETL"), transfer (via ftp/scp/whatnot) those files to the public server, and the second half of the program (the "L" part) will load the files into the tables TP1/TP2 on the public server. Both halves of the program would be launched by cron or your scheduler of choice.
There's an article with a very good example of how to start building Perl/MySQL ETL: http://oreilly.com/pub/a/databases/2007/04/12/building-a-data-warehouse-with-mysql-and-perl.html?page=2
If you prefer not to build your own, here's a list of open source ETL systems, never used any of them so no opinions on their usability/quality: http://www.manageability.org/blog/stuff/open-source-etl
I think you've misunderstood ETL as a problem domain, which is complicated, versus ETL as a one-off solution, which is often not much harder than writing a report. Unless I've totally misunderstood your problem, you don't need a general ETL solution, you need a one-off solution that works on a handful of tables and a few thousand rows. ETL and Schema mapping sound scarier than they are for a single job. (The generalization, scaling, change-management, and OLTP-to-OLAP support of ETL are where it gets especially difficult.) If you can use Perl to write a report out of a SQL database, you probably know enough to handle the ETL involved here.
1- I would like to set up a script to check a table to see if a row was updated or added in the admin server's db. I would then want to update or insert the new or changed data into the public server's db.
If every table you need to pull from has an update timestamp column, then your cron job includes some SELECT statements with WHERE clauses based on the last time the cron job ran to get only the updates. Tables without an update timestamp will probably need a full dump.
I'd use a one-to-one table mapping unless normalization was required... just simpler, in my opinion. Why complicate it with "big" schema changes if you don't have to?
some tables are THOUSANDS of rows and parsing every row in a loop seemed too "bulky" to me.
Limit your queries to only the columns you need, and if there are no BLOBs or exceptionally big columns in what you need, a few thousand rows should not be a problem via DBI with one of the fetchall methods. Loop all you want locally; just make as few trips to the remote database as possible.
If a row has a newer date, update it. I will also have to check for new rows for insertion.
Each table needs one SELECT ... WHERE updated_timestamp_columnname > last_cron_run_timestamp. That result set will contain all rows with newer timestamps, which contains newly inserted rows (if the timestamp column behaves like I'd expect). For updating your local database, check out MySQL's ON DUPLICATE KEY UPDATE syntax... this will let you do it in one step.
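Roughly, the two statements the cron job runs per table would look like this (table and column names are placeholders; the ? values are bound from Perl/DBI):
-- on the admin server: pull only rows changed since the last run
select id, venue, gig_date, updated_at
from   schedule
where  updated_at > ?;          -- last_cron_run_timestamp
-- on the public server: apply each pulled row in one step
insert into public_schedule (id, venue, gig_date)
values (?, ?, ?)
on duplicate key update venue = values(venue), gig_date = values(gig_date);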
... how to implement were too advanced for my level ...
Yes, I have actually done this already, but I have to manually update...
Some questions to help us understand your level... Are you hitting the database from the mysql client command-line or from a GUI? Have you gotten to the point where you've wrapped your SQL queries in Perl and DBI, yet?
If the two databases have different schemas, you'll need an ETL solution to map from one schema to another.
If the schemas are the same, all you have to do is replicate the data from one to the other.
Why not just create a structure on the 'slave' server identical to the master server's? Then create a small table that keeps track of the last timestamp or id for the updated tables.
Then select from the master all rows changed since the last timestamp or greater than the id. Insert them into the matching table on the slave server.
You will need to be careful of updated rows. If a row on the master is updated but the timestamp doesn't change, then how will you tell which rows to fetch? If that's not an issue, the process is quite simple.
If it is an issue, then you need to be more sophisticated, but without knowing the data structure and update mechanism it's a wild goose chase to give pointers on it.
The script could be called by cron every so often to update the changes.
If the database structures must be different on the two servers, then a simple translation step may need to be added, but most of the time that can be done within the SQL SELECT statement, maybe with a join or two (see the sketch below).
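A sketch of that bookkeeping table and an incremental pull that does the translation in the SELECT itself (all names here are invented for the example):
-- on the slave: remember how far each table has been synced
create table sync_state (
    table_name  varchar(64) not null primary key,
    last_synced datetime    not null
);
-- against the master: pull changed rows, reshaping them with a join
select g.id, g.gig_date, v.name as venue_name
from   gigs g
join   venues v on v.id = g.venue_id
where  g.updated_at > ?;   -- the last_synced value read from sync_state beforehand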

Managing mysql schema changes with SQL scripts and transactions

I am working with multiple databases in a PHP/MySQL application. I have development, testing, staging and production databases to keep in sync.
Currently we're still building the thing, so it's easy to keep them in sync. I use my dev db as the master and when I want to update the others I just nuke them and recreate them from mine. However, in future once there's real data I can't do this.
I would like to write SQL scripts as text files that I can version with the PHP changes that accompany them in svn, then apply the scripts to each db instance as I update them.
I would like to use transactions so that if there are any errors during the script, it will roll back any partial changes made. All tables are InnoDB.
When I try to add a column that already exists and add one new column, like this:
SET FOREIGN_KEY_CHECKS = 0;
START TRANSACTION;
ALTER TABLE `projects` ADD COLUMN `foo1` varchar(255) NOT NULL after `address2`;
ALTER TABLE `projects` ADD COLUMN `foo2` varchar(255) NOT NULL after `address2`;
COMMIT;
SET FOREIGN_KEY_CHECKS = 1;
... it still commits the new column even though it failed to add the first one, of course, because I issued COMMIT instead of ROLLBACK.
I need it to issue the rollback command conditionally upon error. How can I do this in an adhoc SQL script?
I know of the 'declare exit handler' feature of stored procs but I don't want to store this; I just want to run it as an adhoc script.
Do I need to make it into a stored proc anyway in order to get conditional rollbacks, or is there another way to make the whole transaction atomic in a single adhoc SQL script?
Any links to examples welcome - I've googled but am only finding stored proc examples so far
Many thanks
Ian
EDIT - This is never going to work; ALTER TABLE causes an implicit commit when encountered: http://dev.mysql.com/doc/refman/5.0/en/implicit-commit.html Thanks to Brian for the reminder
I learned the other day that data definition language statements in MySQL are always acted on immediately and cause the current transaction to be committed when they are applied. I think you'll probably have to do this interactively if you want to be sure of success.
I can't find the question on this website where this was discussed (it was only a couple of days ago).
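Since you can't get atomicity here, one fallback (not a transaction, just a guard that makes the script safe to re-run) is to check information_schema before each ALTER. A sketch for the first column, using the names from your script:
SET @need := (SELECT COUNT(*) = 0
              FROM information_schema.columns
              WHERE table_schema = DATABASE()
                AND table_name   = 'projects'
                AND column_name  = 'foo1');
SET @ddl := IF(@need,
               'ALTER TABLE `projects` ADD COLUMN `foo1` varchar(255) NOT NULL AFTER `address2`',
               'SELECT 1');
PREPARE stmt FROM @ddl;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;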
If you need to keep multiple databases in synch, you could look into replication. Although replication isn't to be trifled with, it may be what you need. See http://dev.mysql.com/doc/refman/5.0/en/replication-features.html

Large scale MySQL changes to active sites

Just some pointers here.
I am making fairly extensive modifications to a site, including the MySQL database.
My plan is to do everything on my development server, export the new MySQL structure for the db and import it onto the clients server.
Basically I need to know that performing a structure only import will not overwrite/delete existing data. I am not making changes to the data type or field length.
In my experience, when you export a database (through phpMyAdmin, for instance), part of the SQL script that is created includes a "DROP TABLE IF EXISTS `table_name`;" before the "CREATE TABLE `table_name` ...;" that builds the new table.
My guess is that this is not what you want to do! Certainly use the dev system to alter the structure in order to make everything correct, but then look around for a database synchronisation routine to which you can provide the old structure and the new structure, and which will create the appropriate "ALTER TABLE `table_name` ...;" scripts to make the required changes.
You should then really examine these change files before executing them on the live database, and of course BACKUP the live database, and ensure you are able to fully recover from the backup before starting any of the alterations!
I've had to do this a lot, and it always goes like this:
1. Make a backup of the live database, complete with data.
2. Make a backup of the live database schema only.
3. Calculate the differences between the old (live) schema and the new (devel) schema.
4. Create all of the 'ALTER TABLE ...' DDL statements necessary to upgrade from the old schema to the new one. Keep in mind that if you rename a field, you probably won't be able to just rename it -- you'll need to create the new field, copy the data from the old field, and then drop the old field (see the sketch after this list).
5. If you changed relationships between tables, you'll probably need to drop indexes and foreign key relationships first, and then add them back afterwards.
6. You'll need to populate any new fields based upon their default values, if any.
7. Once you've got all the pieces working, combine them into one large script and run it on a copy of the live database.
8. Dump the schema and compare it against the desired new schema -- if they don't match, go back to step 3 and repeat.
9. Dump the data and compare it against the expected changes -- again, if they don't match, go back to step 3 and repeat.
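A sketch of the rename case from step 4, with placeholder table and column names (MySQL can also rename in place via ALTER TABLE ... CHANGE, but a schema diff will usually present a rename as a drop plus an add):
ALTER TABLE customers ADD COLUMN family_name varchar(100) NULL;
UPDATE customers SET family_name = surname;
ALTER TABLE customers MODIFY COLUMN family_name varchar(100) NOT NULL;
ALTER TABLE customers DROP COLUMN surname;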
You're going to learn a lot more about SQL DDL/DML during this process than you ever thought you'd learn. (For one project, where we were switching from natural keys to UUID keys for 50+ tables, I ended up writing programs to generate all of the DDL/DML.)
Good luck, and make frequent backups.
I'd recommend preparing a SQL script for every change you make on the development server, so you will be able to reproduce it on production. You shouldn't get to the point where you need to calculate differences between database structures.
This is how I do it, all changes are reflected in sql scripts, and I can reconstruct the history of my database running all these files if needed.
Test the final release version on a "staging" MySQL server. Make a copy of your production server on another machine and test your script to make sure everything's OK.
Of course, preliminary database backup is a must.

Is there any way to automatically create a trigger on creation of new table in MySQL?

Is there any way to automatically create a trigger on creation of new table in MySQL?
As I've pointed out in your other question, I think a process and security review is in order here. It's an audited database, so nobody (especially third-party service providers) should be creating tables in your database without your knowledge.
The issue you've got is that, as well as the new table being created, you will also need another table to store the audited/changed records, which will have a structure identical to the original table, with possibly a time/date and user column. If a third-party provider is creating this table, they won't know to create the auditing table, so even if you could generate your triggers dynamically, they wouldn't work.
It's impossible to create a single table that will hold all change records for all other tables in your database, because the structure inevitably differs between tables.
Therefore: have all change requests come to yourself and/or your team (e.g. the provider wants to create TableX, so they submit a change request, including the SQL script, explaining the reason for the change).
You execute the SQL on a test copy of your database, and use the same structure to create another table to hold the modified records.
You then create and test the necessary triggers, generate a new SQL script to create the two tables and your triggers and execute that on your live database. You give your provider permissions to use the new table and away they go.
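For illustration, the pair of tables plus one of the triggers might look like this (TableX and its columns are hypothetical; you'd also want insert and delete triggers):
-- the table the provider asked for
create table tablex (
    id     int primary key,
    status varchar(20)
);
-- companion audit table: same columns plus action/user/time
create table tablex_audit (
    id           int,
    status       varchar(20),
    audit_action varchar(10),
    audit_user   varchar(100),
    audit_when   datetime
);
create trigger tablex_after_update
after update on tablex
for each row
    insert into tablex_audit
    values (old.id, old.status, 'UPDATE', user(), now());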
Everyone's happy. Yes, it may take a little while longer, and yes, you'll have more work to do, but that's a hell of a lot less work than trying to parse query logs to re-create records that have already been changed/deleted, or parsing the binary log to keep up to date with every change and modifying your code whenever the format of the log file changes, etc.