Modern, smart ways of updating MySQL table definitions with software updates?

A common occurrence when rolling out the next version of a software package is that some of the data structures change. When you are using an SQL database, an appropriate series of ALTERs and UPDATEs may be required. I've seen (and created myself) many ways of doing this over the years. For example, Ruby on Rails has the concept of migrations. However, everything I've done so far seems a bit hairy to maintain or has other shortcomings.
In a magical world I'd be able to specify the desired schema definition and have something automatically work out what ALTERs, UPDATEs, etc. are needed to get there from the existing database layout...
What modern methodologies/practices/patterns exist for rolling out table definition changes with software updates? Do any MySQL-specific tools/scripts/commands exist for this kind of thing?

Have you looked into Flyway or dbdeploy? Flyway is Java-specific but, I believe, works with any database; dbdeploy supports more languages and, again, multiple databases.
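For a flavour of what these tools expect: a Flyway-style migration is typically just a plain SQL file whose version is encoded in the file name (e.g. V2__add_status_to_orders.sql); the tool records which versions have already been applied and runs the pending ones in order. A minimal sketch, with made-up table and column names:

```sql
-- V2__add_status_to_orders.sql
-- Hypothetical versioned migration; `orders` and `status` are made-up names.
ALTER TABLE orders
    ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'new';

-- Backfill existing rows so the new column carries meaningful values.
UPDATE orders
SET status = 'shipped'
WHERE shipped_at IS NOT NULL;

CREATE INDEX idx_orders_status ON orders (status);
```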

Related

How to keep two databases with different schemas up-to-date

Our company has a really old legacy system with a very bad database design (no foreign keys, columns with serialized PHP arrays, etc.). We decided to rewrite the system from scratch with a new database schema.
We want to rewrite the system in parts, so we will split the old monolithic application into many smaller ones.
The problem is: we want to have live data in both databases, the old schema and the new.
I'd like to ask if any of you know best practices for doing this.
What we are considering:
asynchronous data synchronization with a message queue
making a REST API in the new system and having the legacy application use it instead of direct DB calls
some kind of table replication
Thank you very much
I had to deal with a similar problem in the past. There was a system which no longer had support, but there were people still using it because it had some features (security holes) which allowed them certain functionality. However, they also needed new functionality.
I selected the tables involved in the new system and created triggers to cross-update them, so when a record was created in the old system the trigger created a copy in the new system, and vice versa. If you design this properly, you can have both systems working at the same time, in real time.
The drawback is that while both systems are running, everything becomes slower, since you have to maintain the integrity of two databases on every operation.
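A minimal sketch of the kind of cross-update trigger described above, assuming a hypothetical legacy `users` table and a new `accounts` table (all names made up); the reverse trigger on `accounts` would need a guard so the two triggers don't feed each other:

```sql
-- Hypothetical example: copy new rows from the legacy `users` table
-- into the new `accounts` table as soon as they are inserted.
DELIMITER $$

CREATE TRIGGER users_after_insert
AFTER INSERT ON users
FOR EACH ROW
BEGIN
    INSERT INTO accounts (legacy_user_id, email, created_at)
    VALUES (NEW.id, NEW.email, NOW());
END$$

DELIMITER ;
```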
I would start by adding a database layer to accept API calls from the business layer, then write to both the old schema and the new. This adds complexity up front, but it lets you guarantee that the data stays in sync.
This would require changing the legacy system to call an API instead of issuing SQL statements. If they did not have the foresight to do that originally, you may not be able to take my approach. But, you should do it going forward.
Triggers may or may not work out. In older versions of MySQL, there can be only one trigger of a given type on a given table. This forces you to lump unrelated things into a single trigger.
Replication can handle some changes -- engine, datatypes, etc. -- but it cannot help with splitting one table into two. Be careful about how triggers replicate and where a trigger takes effect (on the master versus the slave). In general, a stored routine should run on the master, letting its effects replicate to the slave, but it may be worth considering having the trigger run on the slave instead, or having different triggers on the two servers.
Another thought is to do the transformation in stages. By careful planning of schema changes versus application of triggers versus code changes versus database layer, you can do partial transformations one at a time, sometimes without having a big outage to update everything simultaneously (with your fingers crossed). A simple example: (1) change code to dynamically handle either new or old schema, (2) change the schema, (3) clean up the code (remove handling of old schema).
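As a sketch of step (1), the application or a deployment script can detect which schema it is talking to, for example by checking information_schema (the `orders`/`status` names are hypothetical):

```sql
-- Returns 1 once the new schema (with the hypothetical `status` column) is in place.
SELECT COUNT(*) AS has_new_schema
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME   = 'orders'
  AND COLUMN_NAME  = 'status';
```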
Doing a database migration may be a tedious task considering the complexity of the data and the structure of the tables, which of course have no constraints or proper design. But given that your legacy application was doing its job, the amount of corrupt data should be minimal.
For this problem I would suggest a DB migration task which converts all the old legacy data into the new form, and then developing the new application. The advantages being:
1) There is no need to keep two different applications.
2) No need to change code in the legacy application, which can get messy.
3) The DB migration gives you a chance to correct any corrupt data (if needed).
DB migration may not be practical in every scenario, but if you can do it with less effort than building the database sync and new APIs for the legacy application, I would suggest going for it.

SQLite3 database per customer

Scenario:
Building a commercial app consisting of a RESTful backend in Symfony2 and a frontend in AngularJS.
This app will never be used by many customers (if I get to sell 100, that would be fantastic; hopefully many more, but in any case it will never be massive).
I want a multi-tenant structure for the database, with one schema per customer (they store sensitive information about their own customers).
I'm aware of the problems with updating schemas, but I will have to live with that.
Today I have a MySQL demo database that I will clone each time a new customer purchases the app.
There is no relationship between my customers, so I don't need to communicate with multiple shards for any query.
A single customer may be using the app from several devices at a time, but there won't be massive write operations on the DB.
My question
While trying to set up some functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems like a good idea.
However, I wonder if it's also a good idea to switch from MySQL to SQLite3 as the main database for the application, and if it's common practice to have one dedicated SQLite3 database PER CLIENT. I've never used SQLite and I have no idea whether updating a schema and replicating the changes across all the databases works the same way as for other RDBMSs.
Is this a suitable scenario for SQLite?
Any suggestions (e.g. a tutorial) on how to achieve this?
[I wonder] if it's common practice to have one dedicated SQLite3 database PER CLIENT
Only if the database is deployed along with the application, like on a phone. Otherwise I've never heard of such a thing.
I've never used SQLite and I have no idea whether updating a schema and replicating the changes across all the databases works the same way as for other RDBMSs
SQLite is an SQL database and responds to ALTER TABLE and the like. As for updating all the schemas, you'll have to re-run the update against each database.
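For example, the same migration statement would have to be applied to every per-customer SQLite file in turn (hypothetical names; note that SQLite's ALTER TABLE is limited, with older versions supporting little more than ADD COLUMN and RENAME):

```sql
-- Applied once per customer database file.
-- SQLite requires a non-NULL default when adding a NOT NULL column.
ALTER TABLE invoices ADD COLUMN currency TEXT NOT NULL DEFAULT 'EUR';
```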
Schema syncing is usually handled by an outside utility; often your ORM will have something. Some are server-agnostic, some only support specific servers. There are also dedicated database change management tools such as Sqitch.
However, I wonder if it's also a good idea to switch from MySQL to SQLite3 as the main database for the application, and
SQLite's main advantage is not requiring you to install and run a server. That makes sense for quick projects or where you have to deploy the database with the application, as in a phone app. For a server-based application there's no problem running a database server, and SQLite's very restricted set of SQL features becomes a disadvantage. It will also likely run slower than a server database for anything but the simplest queries.
While trying to set up some functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems like a good idea.
Under no circumstances should you test with a different database than the production database. Databases do not all implement SQL the same way -- MySQL is particularly bad about this -- and your tests will not reflect reality. Running a MySQL instance for testing is not much work.
This separate schema thing claims three advantages...
Extensibility (you can add fields whenever you like)
Security (a query cannot accidentally show data for the wrong tenant)
Parallel Scaling (you can potentially split each schema onto a different server)
What they're proposing is equivalent to having a separate, customized copy of the code for every tenant. You wouldn't do that; it's obviously a maintenance nightmare. Code at least has the advantage of version control systems with branching and merging. I know of only one database change management tool that supports branching: Sqitch.
Let's imagine you've made a custom change to tenant 5's schema. Now you have a general schema change you'd like to apply to all of them. What if the change to 5 conflicts with this? What if the change to 5 requires special data migration different from everybody else? Now let's imagine you've made custom changes to ten schemas. A hundred. A thousand? Nightmare.
Different schemas will require different queries. The application will have to know which schema each tenant is using, there will have to be some sort of schema version map you'll need to maintain. And every different possible query for every different possible schema will have to be maintained in the application code. Nightmare.
Yes, putting each tenant in a separate schema is more secure, but that only protects against writing bad queries or including a query builder (which is a bad idea anyway). There are better ways to mitigate the problem, such as the view filter suggested in the docs. There are many other ways an attacker can access tenant data that this doesn't address: gaining a database connection, gaining access to the filesystem, sniffing network traffic. I don't see the small security gain being worth the maintenance nightmare.
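A rough sketch of the view-filter idea (all names hypothetical): each tenant's application account is only granted access to a view that filters the shared table, never to the base table itself.

```sql
-- Hypothetical shared table `appdb.orders` with a `tenant_id` column.
-- One filtered view per tenant; WITH CHECK OPTION keeps writes inside the tenant.
CREATE VIEW appdb.tenant_42_orders AS
SELECT id, tenant_id, total, created_at
FROM appdb.orders
WHERE tenant_id = 42
WITH CHECK OPTION;

-- The tenant's application account can only touch its own view.
CREATE USER 'tenant_42_app'@'%' IDENTIFIED BY 'change-me';
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.tenant_42_orders TO 'tenant_42_app'@'%';
```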
As for scaling, the article is ten years out of date. There are far, far better ways to achieve parallel scaling than to coarsely put schemas on different servers. There are entire databases dedicated to this idea. Fortunately, you don't need any of this! Scaling won't be a problem for you until you have tens of thousands to millions of tenants. The idea of front-loading your design with a schema maintenance nightmare for a hypothetical big parallel scaling problem is putting the cart so far before the horse, it's already at the pub having a pint.
If you want to use a relational database, I would recommend PostgreSQL. It has a very rich SQL implementation, it's fast and scales well, and it has something that renders this whole idea of separate schemas moot: a built-in JSON type. This can be used to implement the "extensibility" mentioned in the article. Each table can have a meta column using the JSON type into which you can throw any extra data you like. The application does not need special queries, since the meta column is always there. PostgreSQL's JSON operators make working with the meta data very easy and efficient.
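A small sketch of the meta-column idea in PostgreSQL (hypothetical names), using jsonb, the indexable variant of the JSON type:

```sql
CREATE TABLE customers (
    id   bigserial PRIMARY KEY,
    name text NOT NULL,
    meta jsonb NOT NULL DEFAULT '{}'::jsonb   -- per-tenant extras go here
);

-- Tenant-specific fields need no schema change at all.
INSERT INTO customers (name, meta)
VALUES ('Acme Ltd', '{"vat_number": "GB123456789", "preferred_channel": "email"}');

-- The ->> operator extracts a field as text.
SELECT name
FROM customers
WHERE meta ->> 'preferred_channel' = 'email';

-- A GIN index keeps containment queries (@>) on meta fast.
CREATE INDEX customers_meta_idx ON customers USING gin (meta);
```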
You could also look into a NoSQL database. There are plenty to choose from and many support custom schemas and parallel scaling. However, it's likely you will have to change your choice of framework to use one that supports NoSQL.

PostgreSQL to MySQL Conversion (Ruby on Rails)

Is there any Ruby script for converting a PostgreSQL database to a MySQL database? I have searched many sites to no avail.
To be honest these migrations can be tricky. I don't know that there are any good tools to do it. Also note that this can be a major pain, and you end up giving up on a lot of nice features that PostgreSQL has for agile development (like transactional DDL). This being said, here's the way to go about it:
Rebuild your schema on MySQL. Do not try to convert schema files per se. Use your existing approaches to generate a new schema using MySQL's syntax.
Write a script which pulls data from PostgreSQL and inserts it one row at a time into MySQL. MySQL has some thread-locking problems that interfere with bulk loads, index updates, etc., where multiple rows are inserted per statement. For the table order, I have usually started with the order in which the tables are listed by pg_dump, though in Rails you may be able to use your model definitions instead.
Review your indexing strategies to make sure they are still applicable.
On the whole these databases are very different, so I would not expect the migration to be easy.
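To illustrate the schema-rebuilding step above, a hand translation of a single, hypothetical table might look like the following (the PostgreSQL original is shown in the comment); the exact type choices are judgment calls, not mechanical conversions:

```sql
-- PostgreSQL original (for reference):
--   CREATE TABLE users (
--       id         serial PRIMARY KEY,
--       email      text NOT NULL UNIQUE,
--       is_active  boolean NOT NULL DEFAULT true,
--       created_at timestamptz NOT NULL DEFAULT now()
--   );
-- Rough MySQL equivalent, written by hand rather than converted mechanically:
CREATE TABLE users (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    email      VARCHAR(255) NOT NULL UNIQUE,  -- a unique TEXT needs a length, so use VARCHAR
    is_active  TINYINT(1) NOT NULL DEFAULT 1, -- MySQL has no real BOOLEAN storage
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP  -- loses time zone awareness
) ENGINE=InnoDB;
```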

How to replicate two different database systems?

I'm not sure if this fits Stack Overflow exactly; however, as I'm looking for some code rather than a tool, I think it does.
I'm looking for a way to replicate / synchronize different database systems -- in this case MySQL and MongoDB. We are running both for different purposes. We started with a MySQL database and added MongoDB later on for special applications. There's data we would like to have in both databases, where we want constraints in MySQL and, correspondingly, DBRefs in MongoDB. For example: we need a user record in MySQL, but also in MongoDB for references between tables and objects, respectively. At the moment we have a cron job which dumps the MySQL data and imports it into MongoDB. Though it works quite well, that's not the solution we would like to have.
I think for the moment a one-way replication would be enough -- MySQL -> MongoDB. The important part is that the replication works in "realtime", much like MySQL master -> slave replication does.
Are there already any solutions to this problem, or does anyone have ideas on how to achieve this?
Thanks!
SymmetricDS is open-source, Java-based, web-enabled, database-independent data synchronization/replication software that might do the trick with a few tweaks. It has an extension point called IDataLoaderFilter which you could use to implement a MongodbDataLoader.
This would help with one-way database replication. It might be a little more difficult to synchronize from MongoDB -> relational database, but the SymmetricDS team would be very helpful in trying to find a solution.
What you're looking for is called EAI (Enterprise Application Integration). There are a lot of commercial tools around but, under the provided link, you'll also find a couple of OSS solutions. The basis of EAI is that you have data sources and data sinks, and the EAI framework offers tools to build custom pumps between the two.
I suggest either using a DB trigger to start the synchronization or sending a trigger signal from your applications. Note that there is no off-the-shelf solution, since synchronization can become arbitrarily complex (for example, how do you make sure that all rows are copied?).
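One common shape for the trigger-driven variant (all names hypothetical): the trigger only records what changed in a queue table, and the pump/daemon drains that queue into MongoDB. Matching triggers would be needed for INSERT and DELETE as well.

```sql
-- Outbox/queue table the sync daemon polls.
CREATE TABLE sync_queue (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    table_name VARCHAR(64) NOT NULL,
    row_id     BIGINT UNSIGNED NOT NULL,
    action     ENUM('insert', 'update', 'delete') NOT NULL,
    queued_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DELIMITER $$

-- Record every change to the hypothetical `users` table; the daemon turns
-- queue rows into MongoDB writes.
CREATE TRIGGER users_after_update_sync
AFTER UPDATE ON users
FOR EACH ROW
BEGIN
    INSERT INTO sync_queue (table_name, row_id, action)
    VALUES ('users', NEW.id, 'update');
END$$

DELIMITER ;
```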
As far as I can see, you need to develop some sort of "control program" that has drivers for each DBMS and runs as a daemon. The daemon should be trigger-driven or have a very small recheck interval to keep the DBs synchronized.
Technically, you could set up a process which parses the binary log of the MySQL server and replays the relevant SQL queries. I've never done such a thing with a different database as the slave, but maybe it is worth a shot?

How do I migrate easily from MySQL to PostgreSQL?

I'd like to migrate an existing MySQL database (around 40 tables, 400 MB of data) to Postgres before it gets bigger. I searched the web and tried some migration scripts (some of them can be found here). None of them works seamlessly -- if it were just a few glitches I had to fix manually, it wouldn't be a problem, but the resulting dumps don't look like valid PostgreSQL at all.
Did anybody succeed in migrating a production database without spending a full workday on it -- is there an easy solution to this problem?
Note: I would also consider commercial products (as long as the pricing is still feasible).
Although SQL is a standard, it is not full-featured enough for server software to do without implementing its own extensions. The translation from MySQL to PostgreSQL is not simple unless your schema is trivial, and automated translation scripts will only get you so far.
The very best approach is to hand-translate the schema and then write your own transfer scripts for the data itself. You should also write verification scripts to make sure the schema and data come over correctly.
This isn't a cop-out answer. If your database is important enough to migrate, then it's important enough to spend some time on yourself. In the end you would spend at least as much time figuring out the quirks and subtle messes an automated migration script would cause as you would migrating the data yourself. And by doing it yourself you get the chance to take advantage of features in PostgreSQL that aren't present in MySQL, as well as the chance to make the kinds of improvements that only come from doing something a second time.
Bite the bullet and do it.
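As a sketch of what the hand translation and verification can look like (hypothetical table): typical MySQL-isms such as AUTO_INCREMENT, unsigned integers and TINYINT(1) booleans need deliberate choices on the PostgreSQL side, and a quick aggregate run on both servers catches gross transfer errors.

```sql
-- MySQL original (for reference):
--   CREATE TABLE orders (
--       id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
--       paid       TINYINT(1) NOT NULL DEFAULT 0,
--       total      DECIMAL(10,2) NOT NULL,
--       created_at DATETIME NOT NULL
--   ) ENGINE=InnoDB;
-- Hand-written PostgreSQL version:
CREATE TABLE orders (
    id         bigserial PRIMARY KEY,           -- no unsigned types; use a wider integer
    paid       boolean NOT NULL DEFAULT false,  -- TINYINT(1) was really a boolean
    total      numeric(10, 2) NOT NULL,
    created_at timestamp NOT NULL
);

-- Verification: run on both servers after the transfer and compare the output.
SELECT COUNT(*) AS row_count, SUM(total) AS total_sum FROM orders;
```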