My application works with huge databases for which we can't always migrate to the latest schema version at software-upgrade time. We do use a database migration tool (Alembic), but that alone doesn't let the Python application code handle multiple schema versions. At some point, when downtime is acceptable, a migration to the latest version will be performed, but in the meantime the application code must be able to handle both (multiple) versions.
So, for example, we can offer Feature X only if the database migration has been performed. The application should still function if the migration hasn't been performed yet, but then it doesn't offer Feature X and prints a warning in the log. I see several ways of doing this with SQLAlchemy, but they all feel hackish and ugly. I'd like some advice on how to handle this properly.
Example:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class MyTable(Base):
    __tablename__ = 'my_table'
    # MyCustomType is our own type decorator (definition omitted here)
    id = Column(MyCustomType, nullable=False, primary_key=True)
    column_a = Column(Integer, nullable=True)
    column_b = Column(String(32))      # old schema
    column_b_new = Column(String(64))  # new schema
The new schema version has a new column that replaces the old one. Note that both the column name and the column specification (type/length) change.
Another requirement is that use of this class from other parts of the code must stay transparent, for backwards compatibility. Other components of the product will only become aware of the new column/datatype later. This means that even when initialized against the new schema, the old attribute still has to work: creating a new object with MyTable(column_a=123, column_b="abc") should work with both the new and the old schema.
What would be the best way to move forward from here? Options I see to support the two schemas:
1. Define two MyTable classes, one for each schema version, then determine the schema version (how?) and use the matching class. I think this approach puts the logic for choosing the schema in every place the MyTable class is used, so it breaks easily. Would linking the class attributes to each other (column_b = column_b_new) for backward compatibility actually work? (A rough sketch of this idea follows the list below.)
2. Initialize the database mapping normally and alter the MyTable class based on the schema version detected. I'm not sure whether SQLAlchemy supports changing the attributes (columns) of a declarative class after initialization.
3. Create a custom Mapper configuration as described here: http://docs.sqlalchemy.org/en/rel_1_1/orm/extensions/declarative/basic_use.html#mapper-configuration I'm not sure how to get from this SQLAlchemy feature to my desired solution. Perhaps a custom attribute set dynamically could be checked in a custom mapper function?
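A rough sketch of what I mean with options 1 and 2, assuming the live schema can be detected with SQLAlchemy's inspector; has_new_schema() is a made-up helper, the SQLite URL is only a placeholder, and I've simplified the primary key to Integer:

from sqlalchemy import create_engine, inspect, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import synonym

engine = create_engine('sqlite:///example.db')  # placeholder URL

def has_new_schema(engine):
    # Made-up helper: look at the live table to see which column exists.
    columns = {c['name'] for c in inspect(engine).get_columns('my_table')}
    return 'column_b_new' in columns

Base = declarative_base()

# Only one of the two classes is defined, so the table is only mapped once.
if has_new_schema(engine):
    class MyTable(Base):
        __tablename__ = 'my_table'
        id = Column(Integer, primary_key=True)  # simplified PK
        column_a = Column(Integer, nullable=True)
        column_b_new = Column(String(64))
        # The old attribute name keeps working for legacy callers:
        column_b = synonym('column_b_new')
else:
    class MyTable(Base):
        __tablename__ = 'my_table'
        id = Column(Integer, primary_key=True)  # simplified PK
        column_a = Column(Integer, nullable=True)
        column_b = Column(String(32))

# MyTable(column_a=123, column_b="abc") now works against either schema.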
Related
I use SQLAlchemy 1.4 and Alembic for migrations.
Previously, my column type looked like this:
has_bubble_in_countries = sa.Column((ARRAY(sa.Enum(Country))), nullable=False, default=[])
This did not allow me to add or remove elements of the array and have the change persisted (in-place changes were not detected).
Then I made the column mutable by wrapping it in MutableList, like this:
has_bubble_in_countries = sa.Column(MutableList.as_mutable(ARRAY(sa.Enum(Country))), nullable=False, default=[])
Does this change require a migration? If so, which Alembic setting detects this type of change?
My first thought was that this does not alter the column type, so I assumed no migration is needed.
Found the answer. Wrapping the ARRAY in MutableList is only a Python-side behavioural change (SQLAlchemy now tracks in-place changes to the list); it does not change the database schema, so no migration is needed.
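To convince myself, a quick sketch (using Integer elements instead of Enum(Country) to keep it self-contained) shows the rendered DDL is identical with and without the MutableList wrapper:

from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.ext.mutable import MutableList
from sqlalchemy.schema import CreateTable

metadata = MetaData()

plain = Table('t_plain', metadata,
              Column('id', Integer, primary_key=True),
              Column('countries', ARRAY(Integer), nullable=False))

tracked = Table('t_tracked', metadata,
                Column('id', Integer, primary_key=True),
                Column('countries', MutableList.as_mutable(ARRAY(Integer)), nullable=False))

# Both render "countries INTEGER[] NOT NULL"; MutableList only adds
# Python-side change tracking so in-place edits mark the attribute dirty.
print(CreateTable(plain).compile(dialect=postgresql.dialect()))
print(CreateTable(tracked).compile(dialect=postgresql.dialect()))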
The answer to this is "it depends".
If your database supports the ALTER COLUMN ... TYPE command, the type of a column can be changed in place, without creating a new column and copying the data over. If it does not (SQLite, for example), the table has to be recreated and the data copied across.
Alembic covers both cases: op.alter_column() with type_= emits the in-place ALTER, while batch mode (op.batch_alter_table()) recreates the table and copies the data for databases that cannot alter columns. You can read more about them here:
https://alembic.sqlalchemy.org/en/latest/ops.html#alembic.operations.Operations.alter_column
https://alembic.sqlalchemy.org/en/latest/batch.html
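For reference, a minimal migration sketch; the table and column names and the String(32) to String(64) change are illustrative only:

import sqlalchemy as sa
from alembic import op

def upgrade():
    # Databases with ALTER COLUMN ... TYPE (e.g. PostgreSQL) change in place:
    op.alter_column('my_table', 'column_b',
                    existing_type=sa.String(32),
                    type_=sa.String(64))

    # On SQLite, use batch mode instead, which recreates the table and copies the data:
    # with op.batch_alter_table('my_table') as batch_op:
    #     batch_op.alter_column('column_b',
    #                           existing_type=sa.String(32),
    #                           type_=sa.String(64))

def downgrade():
    op.alter_column('my_table', 'column_b',
                    existing_type=sa.String(64),
                    type_=sa.String(32))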
I'm using Knex, because I'm working on an application that I would like to use with multiple database servers, currently Sqlite3, Postgres and MySQL.
I'm realizing that this might be more difficult than I expected.
On MySQL, it appears this syntax returns an array containing the new id:
knex('table').insert({ field: 'value'}, 'id');
On postgres I need something like this:
knex('table').insert({ field: 'value'}, 'id').returning(['id']);
In each case, the structure they return is different. The latter doesn't break MySQL, but on SQLite it throws a fatal error.
The concept of 'insert a record, get back an id' seems to exist everywhere, though. What am I missing in Knex that lets me write this once and use it everywhere?
Way back in 2007, I implemented the database access class for a PHP framework. It was to support MySQL, PostgreSQL, SQLite, Microsoft SQL Server, Oracle, and IBM DB2.
When it came time to support auto-incremented columns, I discovered that each of these databases implements the feature differently. Some have SERIAL, some have AUTO_INCREMENT (or AUTOINCREMENT), some have SEQUENCE, some have GENERATED, and some support more than one of these.
The solution was not to try to write one implementation that worked with all of them. I wrote classes using the Adapter Pattern, one per brand of SQL database, so each adapter class could be tailored to the features of the respective database. Every adapter satisfied an interface I defined in my framework, allowing the primary key column to be defined and the last inserted id to be fetched in a consistent manner, while the internal implementation varied.
This was the only sane way to develop that code, in my opinion. When it comes to variations of SQL implementations, it's a fallacy that one can develop "portable" code that works on multiple brands.
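Applied to the Knex example above, the adapter idea might look roughly like this in TypeScript. This is only a sketch: the interface and class names are made up, it assumes a recent Knex with bundled TypeScript types, and the exact shape returned by .returning() varies between Knex versions:

import { Knex } from "knex";

// Common interface every adapter satisfies.
interface InsertAdapter {
    insertAndGetId(table: string, row: Record<string, unknown>): Promise<number>;
}

// PostgreSQL needs an explicit RETURNING clause.
class PostgresInsertAdapter implements InsertAdapter {
    constructor(private readonly knex: Knex) {}

    async insertAndGetId(table: string, row: Record<string, unknown>): Promise<number> {
        const rows = await this.knex(table).insert(row).returning("id");
        const first = rows[0] as { id: number } | number;
        return typeof first === "number" ? first : first.id;
    }
}

// MySQL and SQLite report the last inserted id directly.
class MySqlInsertAdapter implements InsertAdapter {
    constructor(private readonly knex: Knex) {}

    async insertAndGetId(table: string, row: Record<string, unknown>): Promise<number> {
        const [id] = await this.knex(table).insert(row);
        return id;
    }
}

// Callers only ever see the interface:
// const adapter: InsertAdapter = new PostgresInsertAdapter(knex);
// const id = await adapter.insertAndGetId("table", { field: "value" });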
Using SQLAlchemy with flask_sqlalchemy and alembic for PostgreSQL. I have the following definition for a field:
date_updated = db.Column(db.DateTime, server_default=db.func.now(), server_onupdate=db.func.now())
However, the field never updates when the record is modified. It is set on create and never updated. This is what Alembic generates to create the table:
sa.Column('date_updated', sa.DateTime(), server_default=sa.text('now()'), nullable=True),
So it's no wonder it's not being updated, since the server_onupdate parameter doesn't seem to make it past Alembic.
I'm not sure of the right way to do this. The SQLAlchemy documentation is frustratingly complex and unclear where this is concerned.
Edit: From looking at how to do this in PostgreSQL directly, it looks like it requires a trigger. I would prefer to do it at the DB level rather than at the application level if possible, but I don't know whether I can add a trigger through SQLAlchemy. I can add it directly in the DB, but that could break when migrations are applied.
When you say "I'm not sure of the right way to do this", I'm not sure whether you mean updating the date specifically on the server side, or just updating it in general.
If you just want to update it and it doesn't matter how, the cleanest way in my opinion is to use event listeners.
Here's an example using plain SQLAlchemy; it will probably be the same (or at least very similar) in flask_sqlalchemy:
from datetime import datetime
from sqlalchemy import event

@event.listens_for(YourModel, 'before_insert')
@event.listens_for(YourModel, 'before_update')
def date_insert(mapper, connection, target):
    target.date_updated = datetime.utcnow()
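If you specifically want the server-side behaviour mentioned in the question's edit, one option is to create the PostgreSQL trigger inside an Alembic migration with op.execute(), so it is applied and removed together with the rest of your migrations. This is only a sketch; the table, function and trigger names are illustrative:

from alembic import op

def upgrade():
    op.execute("""
        CREATE OR REPLACE FUNCTION set_date_updated() RETURNS trigger AS $$
        BEGIN
            NEW.date_updated = now();
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;
    """)
    op.execute("""
        CREATE TRIGGER trg_set_date_updated
        BEFORE UPDATE ON my_table
        FOR EACH ROW EXECUTE PROCEDURE set_date_updated();
    """)

def downgrade():
    op.execute("DROP TRIGGER IF EXISTS trg_set_date_updated ON my_table")
    op.execute("DROP FUNCTION IF EXISTS set_date_updated()")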
I am using TypeORM with MySQL and am setting up automatic auditing of all columns and database tables via MySQL Triggers - not TypeORM's "Logger" feature (unless you have some extra info)...
Without getting bogged down, the MySQL Triggers approach works very well and means no app-side code is required.
The problem: I cannot provide MySQL queries with the logged-in app user's ID in a way that does not require applying it to every query created in this app. We do have a central "CRUD" class, but that is for generic CRUD, so our more "specialist" queries would require special treatment - undesired.
Each of our tables has an int field "editedBy" that we would like to update with the ID of the user who edited the row (via our app).
Question: Is there a way to intercept all non-read queries in TypeORM (regardless of whether they use Active Record or the query builder) and update a column ('editedBy', an int field) in the affected tables?
This would allow our Triggers solution to be complete.
P.S. I tried out TypeORM's custom logging function:
import { createConnection } from "typeorm";

createConnection({
    // ...
    logger: new MyCustomLogger(),
});

class MyCustomLogger {  // extending TypeORM's Logger had an issue; it works without it anyway
    logQuery(query, parameters, somethingelse) {  // WORKS
        // ...
    }
}
logQuery does appear to fire before the query is sent to MySQL (I think), but I cannot find a way to extract the JSON-like JavaScript object from it in order to modify each table's "editedBy". It would be great if there were a way to find all affected tables within this function and adjust editedBy. Happy to try other options that don't entail updating the many files we have containing database calls.
Thanks
IMHO it is not correct to use TypeOrm's logging feature to modify your queries; it is very dangerous, even if it could be made to work with a bit of effort.
If you want to control the way upsert queries are done in TypeOrm, the best practice is to use custom repositories and then always call them (don't spawn vanilla repositories afterwards as in entityManager.getRepository(Specialist); instead use yours with entityManager.getCustomRepository(SpecialistRepository)).
The official documentation on the subject should help you a lot: https://github.com/typeorm/typeorm/blob/master/docs/custom-repository.md
Then, in your custom repository, you can override the save method and add whatever you want. Your code will be explicit, and a nice advantage is that it does not apply to every entity, so if you have other cases where you want to save differently you are not stuck (you can also add custom save methods).
If you want to generalize the processing of the save methods, you can create an abstract repository that extends TypeOrm's Repository, and then extend it with your custom repositories; put the shared code there so you don't end up copying it into every custom repository (see the sketch after the chain below).
SpecialistRepository<Specialist> -> CustomSaveRepository<T> -> Repository<T>
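A rough sketch of the concrete end of that chain, using TypeORM 0.2.x's custom-repository API. The Specialist entity, the editedBy stamping and getCurrentUserId() are made up for illustration, and I use a dedicated saveWithEditor() method rather than overriding the full overloaded save() signature:

import {
    Column,
    Entity,
    EntityRepository,
    PrimaryGeneratedColumn,
    Repository,
    getCustomRepository,
} from "typeorm";

@Entity()
class Specialist {
    @PrimaryGeneratedColumn()
    id!: number;

    @Column("varchar")
    name!: string;

    @Column("int", { nullable: true })
    editedBy?: number;
}

// Stand-in for however the app resolves the logged-in user (request context, etc.).
function getCurrentUserId(): number {
    return 42;
}

@EntityRepository(Specialist)
class SpecialistRepository extends Repository<Specialist> {
    // Stamp the editing user on every save; shared logic like this can be
    // hoisted into an abstract CustomSaveRepository<T> base as in the chain above.
    async saveWithEditor(entity: Specialist): Promise<Specialist> {
        entity.editedBy = getCurrentUserId();
        return this.save(entity);
    }
}

// Usage: always fetch the custom repository, never the vanilla one.
// const repo = getCustomRepository(SpecialistRepository);
// await repo.saveWithEditor(someSpecialist);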
I used a combination of the https://github.com/skonves/express-http-context node module (to pass the user ID) and TypeORM's Event Subscribers feature to update the data about to be submitted to the DB: https://github.com/typeorm/typeorm/blob/master/sample/sample5-subscribers/subscriber/EverythingSubscriber.ts
I know EF checks the EdmMetadata table to determine whether the version of the model classes is the same as that of the database tables.
I want to know exactly how EF detects that the model has changed. In other words, what does EF compare against the model hash stored in the database?
Have a look at this blog post about the EdmMetadata table.
For your question, these are the relevant parts:
The EdmMetadata table is a simple way for Code First to tell if the model used to create a database is the same model that is now being used to access the database. As of EF 4.1 the only thing stored in the table is a single row containing a hash of the SSDL part of the model used to create the database.

(Geek details: when you look in an EDMX file, the SSDL is the part of that file that represents the database (store) schema. This means that the EdmMetadata model hash only changes if the database schema that would be generated changes; changes to the conceptual model (CSDL) or the mapping between the conceptual model and the database (MSL) will not affect the hash.)