Issue with seed data in Alembic - sqlalchemy

I have been using SQLAlchemy for a year now, and I use Alembic for migrating schema changes.
I am also using Alembic to seed data. Take the authorization seeds as an example:
def upgrade():
    op.bulk_insert(Role.__table__, ROLE_LIST)
    op.bulk_insert(Permission.__table__, PERMISSION_LIST)
    op.bulk_insert(RolePermission.__table__, ROLE_PERMISSION_LIST)

def downgrade():
    for a_table in [RolePermission, Permission, Role]:
        op.execute(a_table.__table__.delete())
This works perfectly if we assume there will be no changes to my data set (permissions, roles, etc. in my case):
ROLE_LIST = [
    dict(id=generate_business_id(), name='customer', default=1),
    dict(id=generate_business_id(), name='admin', default=0),
]

PERMISSION_LIST = [
    dict(id=generate_business_id(), name='profile'),
    dict(id=generate_business_id(), name='delete_user'),
    dict(id=generate_business_id(), name='make_payment'),
]
But that is not the case: I am going to add new roles, permissions, and role_permission mappings over time, and alembic upgrade head will not migrate the new data because my head has moved far past the seed migration.
I could keep the data in the respective revision only, but I am also using ROLE_LIST and PERMISSION_LIST in my business logic.
Is there any way I can do a replace-style (upsert) query in migrations? Is there a better option?
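To illustrate the kind of thing I have in mind, a rough sketch (assuming PostgreSQL and a unique constraint on the name column; NEW_ROLES is a made-up delta since the original seed revision):

from alembic import op
from sqlalchemy.dialects.postgresql import insert

NEW_ROLES = [  # only the rows added since the original seed revision
    dict(id=generate_business_id(), name='support', default=0),
]

def upgrade():
    # insert-or-skip, so running against an already-seeded database is safe
    for row in NEW_ROLES:
        op.execute(
            insert(Role.__table__)
            .values(**row)
            .on_conflict_do_nothing(index_elements=['name'])
        )

def downgrade():
    op.execute(
        Role.__table__.delete().where(
            Role.__table__.c.name.in_([row['name'] for row in NEW_ROLES])
        )
    )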


SQLModel in Fastapi using __table_args__ is unable to create tables for unit testing [duplicate]

I have a Pylons project and a SQLAlchemy model that implements schema qualified tables:
class Hockey(Base):
    __tablename__ = "hockey"
    __table_args__ = {'schema': 'winter'}

    hockey_id = sa.Column(sa.types.Integer, sa.Sequence('score_id_seq', optional=True), primary_key=True)
    baseball_id = sa.Column(sa.types.Integer, sa.ForeignKey('summer.baseball.baseball_id'))
This code works great with PostgreSQL but fails with SQLite on the table and foreign key names (due to SQLite's lack of schema support):
sqlalchemy.exc.OperationalError: (OperationalError) unknown database "winter" 'PRAGMA "winter".table_info("hockey")' ()
I'd like to continue using SQLite for dev and testing.
Is there a way to have this fail gracefully on SQLite?
It's hard to know where to start with that kind of question. So . . .
Stop it. Just stop it.
There are some developers who don't have the luxury of developing on their target platform. Their life is a hard one--moving code (and sometimes compilers) from one environment to the other, debugging twice (sometimes having to debug remotely on the target platform), gradually coming to an awareness that the gnawing in their gut is actually the start of an ulcer.
Install PostgreSQL.
When you can use the same database environment for development, testing, and deployment, you should.
Not to mention the QA team. Why on earth are they testing stuff they're not going to ship? If you're deploying on PostgreSQL, assure the quality of your work on PostgreSQL.
Seriously.
I'm not sure if this works with foreign keys, but someone could try to use SQLAlchemy's Multi-Tenancy Schema Translation for Table objects. It worked for me, but I used custom primaryjoin and secondaryjoin expressions in combination with composite primary keys.
The schema translation map can be passed directly to the engine creator:
...
if dialect == "sqlite":
    url = lambda: "sqlite:///:memory:"
    execution_options = {"schema_translate_map": {"winter": None, "summer": None}}
else:
    url = lambda: f"postgresql://{user}:{password}@{host}:{port}/{name}"
    execution_options = None

engine = create_engine(url(), execution_options=execution_options)
...
Here is the doc for create_engine. There is another question on SO which might be related in that regard.
But one might get colliding table names if all schema names are mapped to None.
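As an aside, and this is only a sketch reusing the Hockey model from the question, the same translate map can also be applied on a single connection instead of at engine-creation time:

from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")

# Apply the schema translation on this connection only.
conn = engine.connect().execution_options(
    schema_translate_map={"winter": None, "summer": None}
)
# On this connection, "winter.hockey" compiles to plain "hockey".
conn.execute(Hockey.__table__.select())
conn.close()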
I'm just a beginner myself, and I haven't used Pylons, but...
I notice that you are combining the table and the associated class together. How about if you separate them?
import sqlalchemy as sa

meta = sa.MetaData('sqlite:///tutorial.sqlite')
schema = None

hockey_table = sa.Table('hockey', meta,
    sa.Column('score_id', sa.types.Integer, sa.Sequence('score_id_seq', optional=True), primary_key=True),
    sa.Column('baseball_id', sa.types.Integer, sa.ForeignKey('summer.baseball.baseball_id')),
    schema=schema,
)

meta.create_all()
Then you could create a separate
class Hockey(object):
    ...
and
mapper(Hockey, hockey_table)
Then just set schema = None everywhere above if you are using SQLite, and the value(s) you want otherwise.
You don't have a working example, so the example above isn't a working one either. However, as other people have pointed out, trying to maintain portability across databases is in the end a losing game. I'd add a +1 to the people suggesting you just use PostgreSQL everywhere.
HTH, Regards.
I know this is a 10+ year old question, but I ran into the same problem recently: Postgres in production and SQLite in development.
The solution was to register an event listener for when the engine calls the "connect" method.
@sqlalchemy.event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
    dbapi_connection.execute('ATTACH "your_data_base_name.db" AS "schema_name"')
Running the ATTACH statement only once will not work, because it affects only a single connection. This is why we need the event listener: to issue the ATTACH statement on every connection.
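A minimal end-to-end sketch of that idea, assuming a file-based SQLite engine for development; the attached file names "winter.db" and "summer.db" are made up to match the schemas in the question:

from sqlalchemy import create_engine, event

engine = create_engine("sqlite:///dev.db")

@event.listens_for(engine, "connect")
def attach_schemas(dbapi_connection, connection_record):
    # Each attached database becomes addressable as a "schema" in SQLite,
    # so "winter.hockey" and "summer.baseball" resolve on every connection.
    dbapi_connection.execute('ATTACH DATABASE "winter.db" AS "winter"')
    dbapi_connection.execute('ATTACH DATABASE "summer.db" AS "summer"')

# After this, Base.metadata.create_all(engine) can create the schema-qualified
# tables from the question on SQLite as well.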

Data Migration approaches with Alembic and SQLAlchemy

I've been fiddling with SQLAlchemy + Alembic for around two days and need some guidance. I know Alembic is for schema migrations and the autogenerate stuff is awesome, but my question now is: how do data migrations work?
An example: I create a table and I want to insert data into that table in an automated way (upgrade & downgrade). Ideally, when someone upgrades to the latest migration, all of this is done for the developer.
From my research I can either:
Write normal code that updates the tables and inserts or deletes data, but I lose the automation here, i.e. the developer would have to run a script that inserts the data, and there isn't really an option to do rollbacks, etc. The use of the ORM is nice and cleaner than operations, though. So something like this:
def create_users():
    with session_maker() as session:
        for user in users:
            session.add(user)
        session.commit()
I can make use of Alembic operations like bulk_insert, batch_alter_table, etc. With this approach the developer would just need to run the latest migrations, but the use of Alembic operations seems clunky; the functions themselves aren't as shiny as the ORM approach.
op.bulk_insert(accounts_table,
    [
        {'id': 1, 'name': 'John Smith',
         'create_date': date(2010, 10, 5)},
        {'id': 2, 'name': 'Ed Williams',
         'create_date': date(2007, 5, 27)},
        {'id': 3, 'name': 'Wendy Jones',
         'create_date': date(2008, 8, 15)},
    ]
)
Have a hybrid approach and use ORM stuff inside of a migration? So something like this:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

session_maker = sessionmaker(bind=create_engine(db_url))

business_units = [
    BusinessUnits(name="Marketing"),
    BusinessUnits(name="Sales"),
    BusinessUnits(name="Software"),
]

def create_business_units():
    with session_maker() as session:
        for unit in business_units:
            session.add(unit)
        session.commit()

def upgrade() -> None:
    create_business_units()
    # ### commands auto generated by Alembic - please adjust! ###
    pass
    # ### end Alembic commands ###

def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    pass
    # ### end Alembic commands ###
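(I've also seen a variant of the hybrid approach that binds the session to the migration's own connection via op.get_bind() instead of creating a second engine; a rough sketch, reusing the BusinessUnits model from above:)

from alembic import op
from sqlalchemy.orm import Session

def upgrade() -> None:
    # Reuse the connection Alembic is already running this migration on,
    # so the seed data joins the same transaction as any schema changes.
    session = Session(bind=op.get_bind())
    session.add_all([
        BusinessUnits(name="Marketing"),
        BusinessUnits(name="Sales"),
        BusinessUnits(name="Software"),
    ])
    session.flush()  # in a typical online env.py, Alembic commits the surrounding transaction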
Questions:
Is there something wrong with the third approach? Because why would option 2 exist otherwise? Approach 3 seems to be the best of both worlds.
Which approach (1, 2, or 3) is best, or is there no best approach and it varies from place to place? I understand this is subjective, but there must be some sort of measuring stick used to prefer one approach over another?
Please bear in mind I'm new to this and I know this is subjective, but yeah... I just need some guidance and perspective. Not too sure where else to post / ask this question.

SQLAlchemy automap: Best practices for performance

I'm building a Python app around an existing (MySQL) database and am using automap to infer tables and relationships:
base = automap_base()
self.engine = create_engine(
    'mysql://%s:%s@%s/%s?charset=utf8mb4' % (
        config.DB_USER, config.DB_PASSWD, config.DB_HOST, config.DB_NAME
    ), echo=False
)

# reflect the tables
base.prepare(self.engine, reflect=True)

self.TableName = base.classes.table_name
Using this I can do things like session.query(TableName) etc...
However, I'm worried about performance, because every time the app runs it will do the whole inference again.
Is this a legitimate concern?
If so, is there a possibility to 'cache' the output of Automap?
I think that "reflecting" the structure of your database is not the way to go. Unless your app tries to "infer" things from the structure, like static code analysis would for source files, then it is unnecessary. The other reason for reflecting it at run-time would be the reduced time to begin "using" the database using SQLAlchemy. However:
Another option would be to use something like SQLACodegen (https://pypi.python.org/pypi/sqlacodegen):
It will "reflect" your database once and create a 99.5% accurate set of declarative SQLAlchemy models for you to work with. However, this does require that you keep the model subsequently in-sync with the structure of the database. I would assume that this is not a big concern seeing as the tables you're already-working with are stable-enough such that run-time reflection of their structure does not impact your program much.
Generating the declarative models is essentially a "cache" of the reflection. It's just that SQLACodegen saved it into a very readable set of classes + fields instead of data in-memory. Even with a changing structure, and my own "changes" to the generated declarative models, I still use SQLACodegen later-on in a project whenever I make database changes. It means that my models are generally consistent amongst one-another and that I don't have things such as typos and data-mismatches due to copy-pasting.
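To make that concrete, the kind of model sqlacodegen emits looks roughly like this; the table and columns below are invented for illustration:

# Typically generated once with something like:
#     sqlacodegen mysql://user:password@host/dbname > models.py
# (adjust the URL to your setup)
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class TableName(Base):  # invented example table
    __tablename__ = 'table_name'

    id = Column(Integer, primary_key=True)
    name = Column(String(255))

The app then imports TableName directly instead of looking it up in base.classes at run time.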
Performance can be a legitimate concern. If the database schema is not changing, it can be time consuming to reflect the database every time a script is run. This is more of an issue during development, not starting up a long running application. It's also a significant time saver if your database is on a remote server (again, particularly during development).
I use code that is similar to the answer here (as noted by @ACV). The general plan is to perform the reflection the first time, then pickle the metadata object. The next time the script is run, it will look for the pickle file and use that. The file can be anywhere, but I place mine in ~/.sqlalchemy_cache. This is an example based on your code.
import os
import pickle

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.ext.declarative import declarative_base

self.engine = create_engine(
    'mysql://%s:%s@%s/%s?charset=utf8mb4' % (
        config.DB_USER, config.DB_PASSWD, config.DB_HOST, config.DB_NAME
    ), echo=False
)

metadata_pickle_filename = "mydb_metadata"
cache_path = os.path.join(os.path.expanduser("~"), ".sqlalchemy_cache")
cached_metadata = None

if os.path.exists(cache_path):
    try:
        with open(os.path.join(cache_path, metadata_pickle_filename), 'rb') as cache_file:
            cached_metadata = pickle.load(file=cache_file)
    except IOError:
        # cache file not found - no problem, reflect as usual
        pass

if cached_metadata:
    base = declarative_base(bind=self.engine, metadata=cached_metadata)
else:
    base = automap_base()
    base.prepare(self.engine, reflect=True)  # reflect the tables

    # save the metadata for future runs
    try:
        if not os.path.exists(cache_path):
            os.makedirs(cache_path)
        # make sure to open in binary mode - we're writing bytes, not str
        with open(os.path.join(cache_path, metadata_pickle_filename), 'wb') as cache_file:
            pickle.dump(base.metadata, cache_file)
    except:
        # couldn't write the file for some reason
        pass

self.TableName = base.classes.table_name
For anyone using declarative table class definitions, assuming a Base object defined as e.g.
Base = declarative_base(bind=engine)

metadata_pickle_filename = "ModelClasses_trilliandb_trillian.pickle"

# ------------------------------------------
# Load the cached metadata if it's available
# ------------------------------------------
# NOTE: delete the cached file if the database schema changes!!
cache_path = os.path.join(os.path.expanduser("~"), ".sqlalchemy_cache")
cached_metadata = None
if os.path.exists(cache_path):
    try:
        with open(os.path.join(cache_path, metadata_pickle_filename), 'rb') as cache_file:
            cached_metadata = pickle.load(file=cache_file)
    except IOError:
        # cache file not found - no problem
        pass

# ------------------------------------------
# define all tables
# ------------------------------------------
class MyTable(Base):
    if cached_metadata:
        __table__ = cached_metadata.tables['my_schema.my_table']
    else:
        __tablename__ = 'my_table'
        __table_args__ = {'autoload': True, 'schema': 'my_schema'}

...

# ----------------------------------------
# If no cached metadata was found, save it
# ----------------------------------------
if cached_metadata is None:
    # cache the metadata for future loading
    # - MUST DELETE IF THE DATABASE SCHEMA HAS CHANGED
    try:
        if not os.path.exists(cache_path):
            os.makedirs(cache_path)
        # make sure to open in binary mode - we're writing bytes, not str
        with open(os.path.join(cache_path, metadata_pickle_filename), 'wb') as cache_file:
            pickle.dump(Base.metadata, cache_file)
    except:
        # couldn't write the file for some reason
        pass
Important note!! If the database schema changes, you must delete the cached file to force the code to autoload and create a new cache. If you don't, the changes will not be reflected in the code. It's an easy thing to forget.
The answer to your first question is largely subjective. You are adding database queries to fetch the reflection metadata to the application load time. Whether or not that overhead is significant depends on your project requirements.
For reference, I have an internal tool at work that uses a reflection pattern because the load time is acceptable for our team. That might not be the case if it were an externally-facing product. My hunch is that for most applications the reflection overhead will not dominate the total application load time.
If you decide it is significant for your purposes, this question has an interesting answer where the user pickles the database metadata in order to locally cache it.
Adding to this: what @Demitri answered is close to correct, but (at least in SQLAlchemy 1.4.29) the example will fail on the last line, self.TableName = base.classes.table_name, when building from the cached file. In this case declarative_base() has no classes attribute.
The fix is as simple as altering:
if cached_metadata:
    base = declarative_base(bind=self.engine, metadata=cached_metadata)
else:
    base = automap_base()
    base.prepare(self.engine, reflect=True)  # reflect the tables
to
if cached_metadata:
    base = automap_base(declarative_base(bind=self.engine, metadata=cached_metadata))
    base.prepare()
else:
    base = automap_base()
    base.prepare(self.engine, reflect=True)  # reflect the tables
This will create your automap object with the proper attributes.

Handling multiple models.py with alembic

Our web app is based on SQLAlchemy in the Pyramid framework, and we are looking to use Alembic for managing database migrations. The web application consists of various packages that operate on one database. This consequently means we have multiple models.py files that need to be migrated. I am confused as to how to handle this. I could get reasonably far using the following in my env.py:
from sqlalchemy import MetaData

from pkg_a.app.models import Base as pkg_a_base
from pkg_b.app.models import Base as pkg_b_base
from pkg_c.app.models import Base as pkg_c_base

def combine_metadata(*args):
    m = MetaData()
    for metadata in args:
        for t in metadata.tables.values():
            t.tometadata(m)
    return m

target_metadata = combine_metadata(
    pkg_a_base.metadata, pkg_b_base.metadata, pkg_c_base.metadata
)
This works great the first time. However, if I add one more model later, just adding that to this list doesn't do much. I was expecting that running
alembic revision -m "added a new model pkg_d.models" --version-path=migrations/versions --autogenerate
would generate a new version file containing the code for adding the tables from pkg_d.models. But it doesn't. What am I doing wrong here?
If your packages are completely independent and separate, then each of them should have a separate migration history - either stored inside each package (pkg_a.migrations, pkg_b.migrations, etc.) or at least stored in a separate top-level migrations directory, via a separate section in alembic.ini and the -n parameter to the alembic command to specify which section to use:
[pkg_a]
# path to migration scripts
script_location = migrations_a
sqlalchemy.url = xxx
[pkg_b]
script_location = migrations_b
sqlalchemy.url = xxx
[pkg_c]
script_location = migrations_c
sqlalchemy.url = xxx
And then you'll be able to use alembic revision -n pkg_a -m "added a new model pkg_a.models"
If, however, your models are dependent in any way then they should use a common Base - you do realize you don't have to keep all your SQLAlchemy stuff in a single models.py file, don't you? I would create a separate "base" package which would contain a common MetaData, Base and other SQLAlchemy configuration stuff which would then be imported by your other packages.
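A minimal sketch of that layout (package and module names are made up; pkg_base stands for the shared package):

# pkg_base/meta.py -- the one place that defines the shared MetaData and Base
from sqlalchemy import MetaData
from sqlalchemy.ext.declarative import declarative_base

metadata = MetaData()
Base = declarative_base(metadata=metadata)

Each package's models.py then does from pkg_base.meta import Base, and env.py only needs to import every models module (so their tables register on the shared metadata) and set target_metadata = Base.metadata; a newly added pkg_d is picked up by autogenerate as soon as its models module is imported there.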

What does an Alembic revision ID represent?

I have just started looking at Alembic. Coming from Django, where we have South (soon to be included in Django itself) to migrate our database schemas, which uses a friendly old fixed-width number like 0037_fix_my_schema.py to express the order in which migrations are applied, I am naturally intrigued by Alembic's revision ID. Is there a DAG backing Alembic, or can someone give a little overview of its internals in this respect?
I took a look myself. The source says:
def rev_id():
    val = int(uuid.uuid4()) % 100000000000000
    return hex(val)[2:-1]
Not so fascinating.
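So the revision ID itself is just a shortened random UUID. The ordering the question asks about comes from each revision file pointing at its parent through down_revision; Alembic walks those pointers to build the migration graph (a DAG once branches and merges are involved) rather than sorting the IDs. For illustration, the header of a revision file looks roughly like this (identifiers made up):

"""add roles table

Revision ID: 1f2d3c4b5a69
Revises: 0aa1b2c3d4e5
"""

# These two module-level attributes chain revisions together;
# Alembic orders migrations by following them, not by the IDs themselves.
revision = '1f2d3c4b5a69'
down_revision = '0aa1b2c3d4e5'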