Is it possible to have Alembic auto-generate migrations without it having access to the database?
For example django / south are able to do this by comparing the current version of a Model against a previous snapshot of the Model.
No, this isn't possible. In the relevant issue, zzzeek said:
while the reflection-based comparison has its issues, it really is a very fundamental assumption these days particularly in the openstack world where autogen features are used in unit test suites to ensure the migrated schema matches the model. I don't have plans right now to pursue the datafile-based approach, it would be an enormous undertaking for a system that people seem to be mostly OK with as is.
Though an alternative approach could be to spin up a new database on demand, run the migrations from empty to head, generate against it, then discard the database.
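For anyone who wants to try that route, here is a minimal sketch using Alembic's Python API. It assumes a standard alembic.ini and env.py already exist in the working directory; the scratch-database path and revision message are placeholders.

```python
import os
import tempfile

from alembic import command
from alembic.config import Config

with tempfile.TemporaryDirectory() as tmp:
    cfg = Config("alembic.ini")
    # Point Alembic at a throwaway database instead of the real one.
    scratch_url = "sqlite:///" + os.path.join(tmp, "scratch.db")
    cfg.set_main_option("sqlalchemy.url", scratch_url)
    # Replay all existing migrations so the scratch DB matches "head"...
    command.upgrade(cfg, "head")
    # ...then diff the models against it to produce the next revision.
    command.revision(cfg, message="auto", autogenerate=True)
# The scratch database is discarded when the context manager exits.
```

One caveat: if production runs on a different engine (say MySQL), autogenerating against SQLite can produce spurious type differences, so a scratch database on the same engine as production is safer.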
Our company has a really old legacy system with a very bad database design (no foreign keys, columns with serialized PHP arrays, etc. :(). We decided to rewrite the system from scratch with a new database schema.
We want to rewrite the system in parts, so we will split the old monolithic application into many smaller ones.
The problem is: we want to have live data in two databases, the old and the new schema.
I'd like to ask if anyone knows best practices for how to do this.
What we are considering:
asynchronous data synchronization with a message queue
building a REST API in the new system and making the legacy application use it instead of direct DB calls
some kind of table replication
Thank you very much
I had to deal with a similar problem in the past. There was an unsupported system that people kept using because it had some features (security holes, really) that gave them certain functionality. However, they also needed new functionality.
I selected the tables involved in the new system and created triggers to cross-update them, so when a record was created in the old system, a trigger created a copy in the new system, and vice versa. If you design this properly, you have both systems working at the same time, in real time.
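As a rough illustration of one direction of the cross-update (the reverse trigger is symmetric), here is a sketch using a MySQL trigger created from Python with mysql-connector-python; the database, table, and column names are hypothetical:

```python
import mysql.connector

# Credentials are placeholders.
conn = mysql.connector.connect(host="localhost", user="root", password="...")
cur = conn.cursor()
# Mirror inserts on the old table into the new one. INSERT IGNORE breaks
# the loop: when the reverse trigger bounces the row back, the duplicate
# key is silently ignored and no further trigger fires.
cur.execute("""
    CREATE TRIGGER old_db.customer_after_insert
    AFTER INSERT ON old_db.customer
    FOR EACH ROW
      INSERT IGNORE INTO new_db.customers (id, name, email)
      VALUES (NEW.id, NEW.name, NEW.email)
""")
conn.close()
```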
The drawback is that while both systems are running, everything becomes slower, since you have to maintain the integrity of two databases on every operation.
I would start by adding a database layer to accept API calls from the business layer, then write to both the old schema and the new. This adds complexity up front, but it lets you guarantee that the data stays in sync.
This would require changing the legacy system to call an API instead of issuing SQL statements. If they did not have the foresight to do that originally, you may not be able to take my approach. But, you should do it going forward.
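A rough sketch of what such a layer might look like, with hypothetical connections and table names (sqlite3-style DB-API connections and placeholders shown); real code would need error handling and a reconciliation strategy for when one write succeeds and the other fails:

```python
class CustomerStore:
    """Single entry point for writes; keeps both schemas in sync."""

    def __init__(self, old_conn, new_conn):
        self.old = old_conn
        self.new = new_conn

    def create_customer(self, customer_id, name):
        # Each database commits in its own transaction; a failure in the
        # second write leaves the stores out of sync until reconciled.
        with self.old:
            self.old.execute(
                "INSERT INTO customer (id, name) VALUES (?, ?)",
                (customer_id, name),
            )
        with self.new:
            self.new.execute(
                "INSERT INTO customers (id, full_name) VALUES (?, ?)",
                (customer_id, name),
            )
```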
Triggers may or may not work out. In older versions of MySQL, there can be only one trigger of a given type on a given table. This forces you to lump unrelated things into a single trigger.
Replication can solve some changes -- engine, datatypes, etc. But it cannot help with splitting one table into two. Be careful with the replication of triggers and with where a trigger takes effect (on the Master versus the Slave). In general, a stored routine should run on the Master, letting its effects replicate to the Slave. But it may be worth considering how to have the trigger run on the Slave instead, or different triggers in the two servers.
Another thought is to do the transformation in stages. By careful planning of schema changes versus application of triggers versus code changes versus database layer, you can do partial transformations one at a time, sometimes without having a big outage to update everything simultaneously (with your fingers crossed). A simple example: (1) change code to dynamically handle either new or old schema, (2) change the schema, (3) clean up the code (remove handling of old schema).
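Step (1) might look like the following sketch, with hypothetical column names; once the schema change in step (2) has been applied everywhere, the shim is deleted in step (3):

```python
def get_full_name(row):
    """Tolerate both schemas while the migration is in flight."""
    if "full_name" in row:          # new schema: a single column
        return row["full_name"]
    # old schema: separate first/last columns
    return row["first_name"] + " " + row["last_name"]
```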
A database migration can be a tedious task given the complexity of the data and the structure of the tables, which of course lack constraints and a proper design. But given that your legacy application was doing its job, the amount of corrupt data should be minimal.
For this problem I would suggest a DB migration task that converts all the old legacy data into the new form, and then developing the new application. The advantages:
1) There is no need to keep two different applications.
2) There is no need to change code in the legacy application, which can get messy.
3) The DB migration gives us a chance to correct any corrupt data (if needed).
A DB migration may not be practical in every scenario, but if you can do it with less effort than building database sync and new APIs for the legacy application, I would suggest going for it.
Scenario:
Building a commercial app consisting of a RESTful backend in Symfony2 and a frontend in AngularJS.
This app will never be used by many customers (if I get to sell 100, that would be fantastic; hopefully many more, but in no case will it be massive).
I want a multi-tenant structure for the database, with one schema per customer (they store sensitive information about their own customers).
I'm aware of the problems with updating schemas, but I will have to live with that.
Today I have a MySQL demo database that I will clone each time a new customer purchases the app.
There is no relationship between my customers, so I don't need to communicate with multiple shards for any query.
For one customer, the app may be used from several devices at a time, but there won't be massive write operations on the DB.
My question
While setting up functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems like a good idea.
However, I wonder if it's also a good idea to switch from MySQL to SQLite3 as the main database for the application, and whether it's common practice to have one dedicated SQLite3 database PER CLIENT. I've never used SQLite, and I have no idea whether updating a schema and replicating the change across all the databases works the same way as in other RDBMSs.
Is this a suitable scenario for SQLite?
Any suggestions (or tutorials) on how to achieve this?
[I wonder] whether it's common practice to have one dedicated SQLite3 database PER CLIENT
Only if the database is deployed along with the application, like on a phone. Otherwise I've never heard of such a thing.
I've never used SQLite, and I have no idea whether updating a schema and replicating the change across all the databases works the same way as in other RDBMSs
SQLite is a SQL database and supports ALTER TABLE and the like (though its ALTER TABLE is more limited than most). As for updating all the schemas, you'll have to re-run the update against each database.
Schema syncing is usually handled by an outside utility; your ORM will usually have something. Some are server-agnostic, some only support specific servers. There are also dedicated database change management tools such as Sqitch.
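With one SQLite file per tenant, "re-run the update for all schemas" can be as simple as a loop. A minimal sketch, assuming a tenants/ directory of database files and a placeholder DDL statement:

```python
import glob
import sqlite3

DDL = "ALTER TABLE orders ADD COLUMN shipped_at TEXT"  # placeholder change

for path in sorted(glob.glob("tenants/*.db")):
    conn = sqlite3.connect(path)
    try:
        with conn:  # one transaction per tenant database
            conn.execute(DDL)
    finally:
        conn.close()
# A real tool would also record which changes each file has already
# received (e.g. via PRAGMA user_version) so reruns are safe.
```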
However I wonder if it's also a good idea to switch from MySQL to SQLite3 as the main database for the application
SQLite's main advantage is that it doesn't require you to install and run a server. That makes sense for quick projects, or where you have to deploy the database with the application, as in a phone app. For a server-based application, there's no problem running a database server, and SQLite's very restricted set of SQL features becomes a disadvantage. It will also likely be slower than a server database for anything but the simplest queries.
While setting up functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems like a good idea.
Under no circumstances should you test against a different database than the one used in production. Databases do not all implement SQL the same way; MySQL is particularly bad about this, and your tests will not reflect reality. Running a MySQL instance for testing is not much work.
This separate schema thing claims three advantages...
Extensibility (you can add fields whenever you like)
Security (a query cannot accidentally show data for the wrong tenant)
Parallel Scaling (you can potentially split each schema onto a different server)
What they're proposing is equivalent to having a separate, customized copy of the code for every tenant. You wouldn't do that; it's obviously a maintenance nightmare. Code at least has the advantage of version control systems with branching and merging. I know of only one database change management tool that supports branching: Sqitch.
Let's imagine you've made a custom change to tenant 5's schema. Now you have a general schema change you'd like to apply to all of them. What if the change to 5 conflicts with this? What if the change to 5 requires special data migration different from everybody else? Now let's imagine you've made custom changes to ten schemas. A hundred. A thousand? Nightmare.
Different schemas will require different queries. The application will have to know which schema each tenant is using, there will have to be some sort of schema version map you'll need to maintain. And every different possible query for every different possible schema will have to be maintained in the application code. Nightmare.
Yes, putting each tenant in a separate schema is more secure, but that only protects against badly written queries or including a query builder (which is a bad idea anyway). There are better ways to mitigate the problem, such as the view filter suggested in the docs. And there are many other ways an attacker can get at tenant data that separate schemas don't address: gaining a database connection, gaining access to the filesystem, sniffing network traffic. I don't see the small security gain being worth the maintenance nightmare.
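I don't have those docs in front of me, but such a view filter commonly looks something like the following sketch (psycopg2 against PostgreSQL, hypothetical names): the application is granted access to the view, never the underlying table, and declares the tenant per connection.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder connection string
cur = conn.cursor()
# The view only exposes rows for whichever tenant the session declares.
cur.execute("""
    CREATE VIEW my_invoices AS
    SELECT * FROM invoices
    WHERE tenant_id = current_setting('app.tenant_id')::int
""")
# Each request sets the tenant before querying through the view.
cur.execute("SET app.tenant_id = '42'")
cur.execute("SELECT * FROM my_invoices")
conn.commit()
```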
As for scaling, the article is ten years out of date. There are far, far better ways to achieve parallel scaling than to coarsely put schemas on different servers; there are entire databases dedicated to this idea. Fortunately, you don't need any of this! Scaling won't be a problem for you until you have tens of thousands to millions of tenants. Front-loading your design with a schema maintenance nightmare for a hypothetical big parallel-scaling problem is putting the cart so far before the horse, it's already at the pub having a pint.
If you want to use a relational database, I would recommend PostgreSQL. It has a very rich SQL implementation, it's fast and scales well, and it has something that renders this whole idea of separate schemas moot: a built-in JSON type. This can be used to implement the "extensibility" mentioned in the article. Each table can have a meta column using the JSON type into which you can throw any extra data you like. The application does not need special queries; the meta column is always there. PostgreSQL's JSON operators make working with the metadata easy and efficient.
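A small sketch of that meta-column idea with psycopg2 (table and key names are illustrative only; jsonb is PostgreSQL's indexable binary JSON type):

```python
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=app")  # placeholder connection string
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS customer (
        id     serial PRIMARY KEY,
        tenant integer NOT NULL,
        name   text NOT NULL,
        meta   jsonb NOT NULL DEFAULT '{}'
    )
""")
# Tenant-specific extra fields go into meta; no ALTER TABLE required.
cur.execute(
    "INSERT INTO customer (tenant, name, meta) VALUES (%s, %s, %s)",
    (5, "Acme", Json({"loyalty_tier": "gold"})),
)
# JSON operators query the extra fields directly.
cur.execute(
    "SELECT name FROM customer WHERE meta->>'loyalty_tier' = %s",
    ("gold",),
)
conn.commit()
```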
You could also look into a NoSQL database. There are plenty to choose from and many support custom schemas and parallel scaling. However, it's likely you will have to change your choice of framework to use one that supports NoSQL.
I must design a system that unifies 4 applications. These applications share a lot of information (in the current system, that information is duplicated across their databases).
My first idea was to use a distributed database system in order to avoid all this duplication and the manual synchronization among the systems. The thing is that almost everything would need to be implemented from the beginning (since the database is the heart of these systems), so I don't know whether the time/money/implementation trade-off makes this the best solution.
The technologies I have in mind:
MySQL's FEDERATED engine to achieve the distribution across databases
CakePHP: 2 of the 4 applications are in CakePHP, so I will keep the same language.
Python: 1 application is in Python
Java: 1 application is in Java
Will I have any problems with the above languages and database engine?
Any ideas or suggestions?
Any feedback will be appreciated!
You design from the top down. You build from the bottom up.
Databases are the bottom layer: the last stage of design and the first stage of construction. Data modeling, database design, and database administration are fundamental to good data management, and without good data management the rest of the project is doomed. While the database is what you will build first, you need a clear idea of what you are going to do with the data. Look at the needs from the top down, and do this before you select particular technologies. You may have done this but just didn't mention it in your question.
Unfortunately, databases designed with a narrow scope in mind seem to be the rule today rather than the exception. Integrating disjoint databases into a coherent unified database (whether it's distributed or not) is far from a trivial task. There will be trivial differences in things such as naming and composition, and non-trivial differences in the conceptual data model.
Good luck!
A common occurrence when rolling out the next version of a software package is that some of the data structures change. When you are using a SQL database, an appropriate series of ALTERs and UPDATEs may be required. I've seen (and created myself) many ways of doing this over the years; for example, RoR has the concept of migrations. However, everything I've done so far seems a bit hairy to maintain or has other shortcomings.
In a magical world I'd be able to specify the desired schema definition, and have something automatically sort out what alters, updates, etc. are needed to move from the existing database layout...
What modern methodologies/practices/patterns exist for rolling out table definition changes with software updates? Do any MySQL-specific tools/scripts/commands exist for this kind of thing?
Have you looked into Flyway or dbdeploy? Flyway is Java-specific but, I believe, works with any DB; dbdeploy supports more languages and, again, multiple databases.
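This isn't Flyway itself, but the pattern both tools implement is easy to picture: numbered migration files applied in order, with applied versions recorded in the database. A minimal sketch, assuming files named like migrations/001_create_users.sql (SQLite used for brevity):

```python
import glob
import os
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
)
applied = {v for (v,) in conn.execute("SELECT version FROM schema_version")}

for path in sorted(glob.glob("migrations/*.sql")):
    version = int(os.path.basename(path).split("_")[0])
    if version in applied:
        continue  # this change has already been rolled out
    with open(path) as f:
        conn.executescript(f.read())
    conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()
conn.close()
```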
I used to build Ruby on Rails apps with MySQL.
MongoDB is becoming more and more popular, and I am now starting to give it a try.
The problem is, I don't know the underlying theory of how MongoDB works (I am using the mongoid gem, if it matters).
So I would like a comparison of the performance of MySQL + ActiveRecord versus models generated by the mongoid gem. Could anyone help me figure it out?
The article entitled What the heck are you actually using NoSQL for? does a very good job of presenting the pros and cons of using NoSQL.
Edit: also read this blog post: http://blog.fatalmind.com/2011/05/13/choosing-nosql-for-the-right-reason/
Re-edit: I found some recent material (published in 2014) on this topic that I consider to be relevant: What’s left of NoSQL?
I don't know much of the underlying theory. But this is the advice I got: only use MongoDB if you run it across multiple servers; that's when it'll shine. As far as I understand, the NoSQL movement appeared in no small part due to the pain of load-balancing relational databases across multiple servers. So if you're hosting your application on no more than one server, MySQL would be the preferred choice.
The good people over at the Doctrine project recently wrote a quite useful blog post on the subject.
From what I have read so far... here is my take on it.
Standard SQL trades performance for feature richness: it allows you to do joins and transactions across data sets (tables, or collections if you will), among other things.
This lets an application developer push some of the application's complexity into the database layer. The advantage is that the application does not have to worry about data integrity and the rest of the ACID properties; it depends on proven technology instead.
That limited scalability is acceptable for pretty much all projects, as long as you can keep the application working within expected time limits, though it may sometimes mean having to purchase high-performance, expensive relational database systems.
MongoDB, on the other hand, deliberately excludes much of the inherent complexity of relational databases, thereby allowing for better, more scalable performance.
This approach forces the application developer to re-architect the application to work around the missing relational features, which can be a good thing in itself, but the effort involved is generally only worth it if you have the scalability requirements. Note that with MongoDB, depending on the data's requirements w.r.t. ACID properties, the application will have to step up and handle them as necessary.