SQLAlchemy ORM - map an object to dynamically created schema

I'm considering porting my app to SQLAlchemy, as it's much more extensive than my own ORM implementation, but all the examples I could find show how to set the schema name at class declaration rather than dynamically at runtime.
I need to map my objects to Postgres tables from multiple schemas. Moreover, the application creates new schemas at runtime, and I need to map new instances of the class to rows of the table from that new schema.
Currently, I use my own ORM module where I just provide the schema name as an argument when creating new instances of a class (I call a class method with the schema name as an argument and it returns an object, or objects, holding the schema name). The class describes a table that can exist in many schemas. The class declaration doesn't contain information about the schema, but instances of that class do contain it and include it when generating SQL statements.
This way, the application can work with many schemas simultaneously and even create foreign keys from tables in "other" schemas to the "main" table in the public schema. It also makes it possible to cascade-delete data in the other schemas when deleting a row in the public schema.
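Roughly, the current usage looks like this (a made-up sketch with purely illustrative names, just to show the idea):

# Illustrative only - the shape of my current ORM's API, not real code.
class Measurement(MyORMBase):
    _table = "measurement"          # this table exists in many schemas
    _columns = ("id", "value")

# The schema is chosen per instance / per query, never in the class declaration:
m = Measurement.new(schema="unit_mm", value=1.5)    # INSERT INTO unit_mm.measurement ...
rows = Measurement.select(schema="unit_km")         # SELECT ... FROM unit_km.measurement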
SQLAlchemy gives this example for setting the schema on a table (from the documentation):
metadata_obj = MetaData(schema="remote_banks")

financial_info = Table(
    "financial_info",
    metadata_obj,
    Column("id", Integer, primary_key=True),
    Column("value", String(100), nullable=False),
)
But at the ORM level, when I declare the class, I have to pass an already constructed table (example from the documentation):
metadata = MetaData()

group_users = Table(
    "group_users",
    metadata,
    Column("user_id", String(40), nullable=False),
    Column("group_id", String(40), nullable=False),
    UniqueConstraint("user_id", "group_id"),
)


class Base(DeclarativeBase):
    pass


class GroupUsers(Base):
    __table__ = group_users
    __mapper_args__ = {"primary_key": [group_users.c.user_id, group_users.c.group_id]}
So, the question is: is it possible to map class instances to tables/rows from dynamically created database schemas (created at runtime) in SQLAlchemy? Altering the connection to set the current schema is not acceptable to me; I want to work with all schemas simultaneously.
I'm free to use the newest SQLAlchemy 2.0 (currently in beta).

You can set the schema per table, so I think you have to make a table and a class per schema. Here is a made-up example. I have no idea what the ramifications are of changing the mapper registry at runtime, especially mid-transaction as I have done below, or what would happen with thread safety. You could probably use a master schema list table in public and lock it (or lock the same row across connections) to synchronize the schema list and provide thread safety when adding a schema. I'm surprised it works. Kind of cool.
import sys

from sqlalchemy import (
    create_engine,
    Integer,
    MetaData,
    Float,
    event,
)
from sqlalchemy.schema import (
    Column,
    CreateSchema,
    Table,
)
from sqlalchemy.orm import Session
from sqlalchemy.orm import registry

username, password, db = sys.argv[1:4]

engine = create_engine(f"postgresql+psycopg2://{username}:{password}@/{db}", echo=True)

metadata = MetaData()

mapper_registry = registry()


def map_class_to_some_table(cls, table, entity_name, **mapper_kwargs):
    newcls = type(entity_name, (cls,), {})
    mapper_registry.map_imperatively(newcls, table, **mapper_kwargs)
    return newcls


class Measurement(object):
    pass


units = []
cls_for_unit = {}
tbl_for_unit = {}


def add_unit(unit, create_bind=None):
    units.append(unit)
    schema_name = f"unit_{unit}"
    if create_bind:
        create_bind.execute(CreateSchema(schema_name))
    else:
        event.listen(metadata, "before_create", CreateSchema(schema_name))
    cols = [
        Column("id", Integer, primary_key=True),
        Column("value", Float, nullable=False),
    ]
    # One table per schema.
    tbl_for_unit[unit] = Table("measurement", metadata, *cols, schema=schema_name)
    if create_bind:
        tbl_for_unit[unit].create(create_bind)
    # One class per schema.
    cls_for_unit[unit] = map_class_to_some_table(
        Measurement, tbl_for_unit[unit], Measurement.__name__ + f"_{unit}"
    )


for unit in ["mm", "m"]:
    add_unit(unit)

metadata.create_all(engine)

with Session(engine) as session, session.begin():
    # Create a value for each unit (schema).
    session.add_all([cls(value=i) for i, cls in enumerate(cls_for_unit.values())])

with Session(engine) as session, session.begin():
    # Read back a value for each unit (schema).
    print(
        [
            (unit, cls.__name__, cls, session.query(cls).first().value)
            for (unit, cls) in cls_for_unit.items()
        ]
    )

with Session(engine) as session, session.begin():
    # Add another unit, add a value, flush and then read back.
    add_unit("km", create_bind=session.bind)
    session.add(cls_for_unit["km"](value=100.0))
    session.flush()
    print(session.query(cls_for_unit["km"]).first().value)
Output of the last add_unit():
2022-12-16 08:16:13,446 INFO sqlalchemy.engine.Engine CREATE SCHEMA unit_km
2022-12-16 08:16:13,446 INFO sqlalchemy.engine.Engine [no key 0.00015s] {}
2022-12-16 08:16:13,447 INFO sqlalchemy.engine.Engine COMMIT
2022-12-16 08:16:13,469 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-16 08:16:13,469 INFO sqlalchemy.engine.Engine
CREATE TABLE unit_km.measurement (
    id SERIAL NOT NULL,
    value FLOAT NOT NULL,
    PRIMARY KEY (id)
)

Ian Wilson posted a great answer to my question which I'm going to use.
Around the same time I got an idea of how it could work and would like to post it here as a very simple example. I think the same mechanism is behind it as in Ian's answer.
This example only "reads" an object from a schema that can be chosen at runtime.
from sqlalchemy import create_engine, Column, Integer, String, MetaData
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import sessionmaker
import psycopg

engine = create_engine(f"postgresql+psycopg://user:password@localhost:5432/My_DB", echo=True)
Session = sessionmaker(bind=engine)
session = Session()


class Base(DeclarativeBase):
    pass


class A(object):
    __tablename__ = "my_table"
    id = Column("id", Integer, primary_key=True)
    name = Column("name", String)

    def __repr__(self):
        return f"A: {self.id}, {self.name}"


metadata_obj = MetaData(schema="my_schema")  # here we create a new mapping
A1 = type("A1", (A, Base), {"metadata": metadata_obj})  # here we make a new subclass with the desired mapping

data = session.query(A1).all()
print(data)
This info helped me to come to this solution:
https://github.com/sqlalchemy/sqlalchemy/wiki/EntityName
"... SQLAlchemy mapping makes modifications to the mapped class, so it's not really feasible to have many mappers against the exact same class ..."
This means a separate class must be created at runtime for each schema.
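To make that concrete, here is a rough sketch (my own illustration building on the example above, not part of either answer) that caches one mapped subclass per schema name so the rest of the application can simply ask for it:

# Hypothetical helper: reuses the A and Base classes from the example above.
_schema_classes = {}

def class_for_schema(schema_name):
    """Return a class mapped to my_table in the given schema, creating it on first use."""
    cls = _schema_classes.get(schema_name)
    if cls is None:
        metadata_obj = MetaData(schema=schema_name)
        cls = type(f"A_{schema_name}", (A, Base), {"metadata": metadata_obj})
        _schema_classes[schema_name] = cls
    return cls

# Work with several schemas in the same session:
rows_main = session.query(class_for_schema("public")).all()
rows_other = session.query(class_for_schema("my_schema")).all()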

Related

SQLAlchemy ORM MetaData, when used with multiple engines, keeps the first engine's representation

I am using the SQLAlchemy ORM and have multiple users who can query my API. I keep all user engines stored separately and accessible only via JWT verification. I use their engines when booting up the API to form a dictionary of the following format:
{
    "user1": {
        "table1": {
            "column1": table1.c.column1
        }
    }
}
and repeat this for every user in my database whom I want to be able to access the API. The code is as follows:
def build_translation_per_db(connections):
    metadata = MetaData()
    database_engines = {}
    translation_per_db = {}
    for connection in connections:
        db_tag = connection['db_tag']
        client = aws_connect_function()
        secret = function_get_secret()
        if type(secret) == list:
            for db in secret['aliases']:
                db_engine_object, db_connection_object = function_return_engine(secret=db, db_tag=db_tag)
                database_engines[db] = {
                    'engine': db_engine_object
                }
        else:
            db_engine_object, db_connection_object = function_return_engine(secret=secret, db_tag=db_tag)
            database_engines[connection['name']] = {
                'engine': db_engine_object
            }
    # at this point we should have all engines and connections
    for database in database_engines.keys():
        # name of the database
        table_translation, column_translation_from_table = build_table_translation(database_engines[database]['engine'], metadata)
        translation_per_db[database] = {
            'table_translation': table_translation,
            'column_translation_from_table': column_translation_from_table,
            'engine': database_engines[database]['engine']
        }
    return translation_per_db


def build_table_translation(db_engine_object, metadata):
    table_translation = {}
    tables = db_engine_object.table_names()
    # print(tables)
    for table in tables:
        table_translation[table] = Table(table, metadata, autoload=True, autoload_with=db_engine_object)
    column_translation_from_table = {}
    for table in table_translation.keys():
        column_translation_from_table[table] = {}
        for col in table_translation[table].c:
            column_translation_from_table[table][col.name] = col
    return table_translation, column_translation_from_table
where the metadata is built before all engines have been acquired. This was resulting in an error where all engines' Tables were following the first engine's schema (i.e., a column present in user2's table2 would not be picked up if user1's table2 did not have that column).
This problem was solved by building the metadata directly in the function build_table_translation rather than passing it in. While it is good that the bug is resolved, I don't understand why the bug was present in the first place - clearly I missed something in SQLAlchemy's docs on MetaData. I would appreciate an explanation!
From a comment to the question:
which user will the metadata act on as it is placed in my above code?
Since build_translation_per_db() does metadata = MetaData() and then passes that object to each invocation of build_table_translation(), all tables will share the same MetaData instance and that instance will contain table information for all users/engines:
from pprint import pprint
from sqlalchemy import Column, Integer, MetaData, Table


def build_translation_per_db(connections):
    # for demonstration purposes, connections is just a list of strings
    metadata = MetaData()
    return [build_table_translation(conn, metadata) for conn in connections]


def build_table_translation(db_engine_object, metadata):
    # for demonstration purposes, db_engine_object is just a string
    return Table(
        f"{db_engine_object}_table",
        metadata,
        Column("id", Integer, primary_key=True, autoincrement=False),
    )


conns = ["engine_1", "engine_2"]
table_1, table_2 = build_translation_per_db(conns)

# Do the tables share the same metadata object?
print(table_1.metadata == table_2.metadata)  # True

# What does it contain?
pprint(table_1.metadata.tables)
"""
{'engine_1_table': Table('engine_1_table', MetaData(), Column('id', Integer(), table=<engine_1_table>, primary_key=True, nullable=False), schema=None),
 'engine_2_table': Table('engine_2_table', MetaData(), Column('id', Integer(), table=<engine_2_table>, primary_key=True, nullable=False), schema=None)}
"""
If different users can have tables with the same name but different columns then those tables may represent the first user processed, or maybe the last user processed, or perhaps some crazy mish-mash of attributes, but in any case it's not something you want.
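To illustrate the underlying behavior (my own addition, not part of the original answer): once a table name is registered in a MetaData, a later Table() call with the same name hands back the already-registered object rather than building a new one, and adding new columns without extend_existing=True raises an error. The same name lookup happens before reflection, which is essentially why the second user's table came back with the first user's columns.

from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.exc import InvalidRequestError

metadata = MetaData()
first = Table("table2", metadata, Column("a", Integer, primary_key=True))

# Same name, no new columns: the existing Table object is returned as-is.
again = Table("table2", metadata)
print(again is first)  # True

# Same name with extra columns and no extend_existing=True: an error is raised.
try:
    Table("table2", metadata, Column("b", Integer))
except InvalidRequestError as exc:
    print(exc)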

How to save Scrapy items from pipeline in MySQL table by order from another table (multiple tables)?

This is my first question on Stack Overflow ever. :P Everything works just fine except the crawl order; I added a priority method but it didn't work correctly. I need to first write all the author data, then all the album and song data, and store them in the DB in that order. I want to order items in a MySQL table by items from another one.
Database structure: https://i.postimg.cc/GhF4w32x/db.jpg
Example: first write all author items into the Author table, and then order album items in the Album table by authorId from the Author table.
Github repository: https://github.com/markostalma/discogs/tree/master/discogs
P.S. I have three item classes for the author, album and song parsers.
I also tried to make another spider flow and put everything into one item class, but with no success. The order was the same. :(
Sorry for my bad English.
You need to set up an item pipeline for this. I would suggest using SQLAlchemy to build the SQL item and connect to the DB. Your SQLAlchemy class will reflect all the table relationships you have in your DB schema. Let me show you. This is a working example of a similar pipeline that I have, except you would set up your SQLAlchemy class to contain the m2m or foreign-key relationships you need. You'll have to refer to their documentation [1].
An even more Pythonic way of doing this would be to keep your SQLAlchemy column names and item field names the same and do something like for k, v in item.items(): (see the sketch after the full example below).
This way you can just loop over the item and set whatever is there. The code below is long and violates DRY on purpose, though.
# -*- coding: utf-8 -*-
from scrapy.exceptions import DropItem
from sqlalchemy import create_engine, Column, Integer, String, DateTime, ForeignKey, Boolean, Sequence, Date, Text
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
import datetime

DeclarativeBase = declarative_base()


def db_connect():
    """
    This function connects to the database. Tables will automatically be created if they do not exist.
    See __tablename__ under RateMds class
    MySQL example: engine = create_engine('mysql://scott:tiger@localhost/foo')
    """
    return create_engine('sqlite:///reviews.sqlite', echo=True)


class GoogleReviewItem(DeclarativeBase):
    __tablename__ = 'google_review_item'

    pk = Column('pk', String, primary_key=True)
    query = Column('query', String(500))
    entity_name = Column('entity_name', String(500))
    user = Column('user', String(500))
    review_score = Column('review_score', Integer)
    description = Column('description', String(5000))
    top_words = Column('top_words', String(10000), nullable=True)
    bigrams = Column('bigrams', String(10000), nullable=True)
    trigrams = Column('trigrams', String(10000), nullable=True)
    google_average = Column('google_average', Integer)
    total_reviews = Column('total_reviews', Integer)
    review_date = Column('review_date', DateTime)
    created_on = Column('created_on', DateTime, default=datetime.datetime.now)


engine = db_connect()
Session = sessionmaker(bind=engine)


def create_individual_table(engine):
    # checks for table existence and creates them if they do not already exist
    DeclarativeBase.metadata.create_all(engine)


create_individual_table(engine)
session = Session()


def get_row_by_pk(pk, model):
    review = session.query(model).get(pk)
    return review


class GooglePipeline(object):
    def process_item(self, item, spider):
        review = get_row_by_pk(item['pk'], GoogleReviewItem)
        if review is None:
            googlesite = GoogleReviewItem(
                query=item['query'],
                google_title=item['google_title'],
                review_score=item['review_score'],
                review_count=item['review_count'],
                website=item['website'],
                website_type=item['website_type'],
                top_words=item['top_words'],
                bigrams=item['bigrams'],
                trigrams=item['trigrams'],
                text=item['text'],
                date=item['date']
            )
            session.add(googlesite)
            session.commit()
            return item
        else:
            raise DropItem()
[1]: https://docs.sqlalchemy.org/en/13/core/constraints.html
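And here is a rough sketch of the for k, v in item.items() idea mentioned above (my own illustration; it assumes every item field name matches a mapped column on GoogleReviewItem):

class GooglePipelineGeneric(object):
    """Hypothetical variant: set attributes generically instead of listing each field."""

    def process_item(self, item, spider):
        review = get_row_by_pk(item['pk'], GoogleReviewItem)
        if review is not None:
            raise DropItem()
        googlesite = GoogleReviewItem()
        for k, v in item.items():
            # assumes every item key is also a mapped column on GoogleReviewItem
            setattr(googlesite, k, v)
        session.add(googlesite)
        session.commit()
        return item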

Getting empty sqlite DB and "(sqlite3.OperationalError) no such table:..:" when trying to add item

I wrote a general dbhandler module that can entangle data containers and upload them to a MySQL database, and is independent of the DB structure. Now I want to add a default, or at least the possibility, to shove the data into an SQLite DB instead. Structure-wise this is related to this question. The package looks like this:
dbhandler\
    dbhandler.py
    models\
        meta.py
        default\
            default_DB_map.py
            default_DB.cfg
default_DB.cfg is the config file that describes the database for the dbhandler script. default_DB_map.py contains a map for each table of the DB, each of which inherits from BASE:
from sqlalchemy import BigInteger, Column, Integer, String, Float, DateTime
from sqlalchemy import Date, Enum
from ..meta import BASE


class db_info(BASE):
    __tablename__ = "info"

    id = Column(Integer, primary_key=True)
    name = Column(String)
    project = Column(String)
    manufacturer = Column(String)
    ...


class db_probe(BASE):
    __tablename__ = "probe"

    probeid = Column(Integer, primary_key=True)
    id = Column(Integer)
    paraX = Column(String)
    ...
In meta.py I initialize the declarative_base object:
from sqlalchemy.ext.declarative import declarative_base
BASE = declarative_base()
And eventually, I import BASE within the dbhandler.py and create the engine and session:
"DBHandler module"
...
import sqlalchemy
from sqlalchemy.orm import sessionmaker
from models import meta #pylint: disable=E0401
....
class DBHandler(object):
"""Database handling
Methods:
- get_dict: returns table row
- add_item: adds dict to DB table
- get_table_keys: gets list of all DB table keys
- get_values: returns all values of key in DB table
- check_for_value: checks if value is in DB table or not
- upload: uploads data container to DB
- get_dbt: returns DBTable object
"""
def __init__(self, db_cfg=None):
"""Load credentials, DB structure and name of DB map from cfg file,
create DB session. Create DBTable object to get table names of DB
from cfg file, import table classes and get name of primary keys.
Args:
- db_cfg (yaml) : contains infos about DB structure and location
of DB credentials.
Misc:
- cred = {"host" : "...",
"database" : "...",
"user" : "...",
"passwd" : "..."}
"""
...
db_cfg = self.load_cfg(db_cfg)
if db_cfg["engine"] == "sqlite":
engine = sqlalchemy.create_engine("sqlite:///mySQlite.db")
meta.BASE.metadata.create_all(engine)
session = sessionmaker(bind=engine)
self.session = session()
elif db_cfg["engine"] == "mysql+mysqlconnector":
cred = self.load_cred(db_cfg["credentials"])
engine = sqlalchemy.create_engine(db_cfg["engine"]
+ "://"
+ cred["user"] + ":"
+ cred["passwd"] + "#"
+ cred["host"] + ":"
+ "3306" + "/"
+ cred["database"])
session = sessionmaker(bind=engine)
self.session = session()
else:
self.log.warning("Unkown engine in DB cfg...")
# here I'm importing the table classes stated in the config file
self.dbt = DBTable(map_file=db_cfg["map"],
table_dict=db_cfg["tables"],
cr_dict=db_cfg["cross-reference"])
I'm obviously doing something wrong within the if db_cfg["engine"] == "sqlite": branch, but I can't figure out what.
The script works just fine with the MySQL engine. When I initialize the handler object with the SQLite engine I get an empty mySQlite.db file.
Adding something with that session yields:
(sqlite3.OperationalError) no such table: info....
I can, however, use something like sqlalchemy.inspect on a table object without any errors. So I have the correct table objects at hand, but they are somehow not connected to the base?
For SQLite, apparently the import of the table classes needs to happen before the DB is created.
# here I'm importing the table classes stated in the config file
self.dbt = DBTable(map_file=db_cfg["map"],
                   table_dict=db_cfg["tables"],
                   cr_dict=db_cfg["cross-reference"])
(which is done via pydoc.locate btw) has to be done before
engine = sqlalchemy.create_engine("sqlite:///mySQlite.db")
meta.BASE.metadata.create_all(engine)
session = sessionmaker(bind=engine)
self.session = session()
is called. I thought this was not important since I imported BASE at the beginning and since it works just fine when using a different engine.
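A minimal illustration of why the order matters (my own example, not from the original answer): create_all() only emits CREATE TABLE for classes that are already registered on the Base metadata at the moment it is called. The MySQL path presumably appeared to work only because those tables already existed on the server.

from sqlalchemy import Column, Integer, create_engine, inspect
from sqlalchemy.ext.declarative import declarative_base

BASE = declarative_base()
engine = create_engine("sqlite://")

# create_all() before any mapped class exists: nothing is registered yet,
# so no tables are created and later inserts fail with "no such table".
BASE.metadata.create_all(engine)
print(inspect(engine).get_table_names())  # []

class DbInfo(BASE):  # hypothetical stand-in for the classes loaded via pydoc.locate
    __tablename__ = "info"
    id = Column(Integer, primary_key=True)

# Now that the class (and its Table) is registered, create_all() creates it.
BASE.metadata.create_all(engine)
print(inspect(engine).get_table_names())  # ['info']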

Unique sequential number for a column

I need to create a sequence, but in a generic way, not using the Sequence class.
USN = Column(Integer, nullable=False, default=nextusn, server_onupdate=nextusn)
This nextusn function needs to generate the func.max(table.USN) value over the rows in the model.
I tried using this:
class nextusn(expression.FunctionElement):
    type = Numeric()
    name = 'nextusn'

@compiles(nextusn)
def default_nextusn(element, compiler, **kw):
    return select(func.max(element.table.c.USN)).first()[0] + 1
but in this context the element does not know element.table. Is there a way to resolve this?
this is a little tricky, for these reasons:
your SELECT MAX() will return NULL if the table is empty; you should use COALESCE to produce a default "seed" value. See below.
the whole approach of inserting rows with SELECT MAX is entirely not safe for concurrent use - so you need to make sure only one INSERT statement at a time runs against the table, or you may get constraint violations (you should definitely have a constraint of some kind on this column).
from the SQLAlchemy perspective, you need your custom element to be aware of the actual Column element. We can achieve this either by assigning the "nextusn()" function to the Column after the fact, or, as below, with a more sophisticated approach using events.
I don't understand what you're going for with "server_onupdate=nextusn". "server_onupdate" in SQLAlchemy doesn't actually run any SQL for you; it is a placeholder for the case where, for example, you created a trigger. Also, the "SELECT MAX(id) FROM table" thing is an INSERT pattern - I'm not sure you mean for anything to be happening here on an UPDATE.
The @compiles extension needs to return a string, running the select() there through compiler.process(). See below.
example:
from sqlalchemy import Column, Integer, create_engine, select, func, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.expression import ColumnElement
from sqlalchemy.schema import ColumnDefault
from sqlalchemy.ext.compiler import compiles
from sqlalchemy import event


class nextusn_default(ColumnDefault):
    "Container for a nextusn() element."

    def __init__(self):
        super(nextusn_default, self).__init__(None)


@event.listens_for(nextusn_default, "after_parent_attach")
def set_nextusn_parent(default_element, parent_column):
    """Listen for when nextusn_default() is associated with a Column,
    assign a nextusn().
    """
    assert isinstance(parent_column, Column)
    default_element.arg = nextusn(parent_column)


class nextusn(ColumnElement):
    """Represent "SELECT MAX(col) + 1 FROM TABLE".
    """

    def __init__(self, column):
        self.column = column


@compiles(nextusn)
def compile_nextusn(element, compiler, **kw):
    return compiler.process(
        select([
            func.coalesce(func.max(element.column), 0) + 1
        ]).as_scalar()
    )


Base = declarative_base()


class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, default=nextusn_default(), primary_key=True)
    data = Column(String)


e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

# will normally pre-execute the default so that we know the PK value;
# result.inserted_primary_key will be available
e.execute(A.__table__.insert(), data='single row')

# will run the default expression inline within the INSERT
e.execute(A.__table__.insert(), [{"data": "multirow1"}, {"data": "multirow2"}])

# will also run the default expression inline within the INSERT;
# result.inserted_primary_key will not be available
e.execute(A.__table__.insert(inline=True), data='single inline row')

Association Proxy SQLAlchemy

This source details how to use association proxies to create views and objects with values of an ORM object.
However, when I append a value that matches an existing object in the database (and said value is either unique or a primary key), it creates a conflicting object, so I cannot commit.
So, in my case, is this only useful as a view, and will I need to use ORM queries to retrieve the object to be appended?
Is this my only option or can I use merge (I may only be able to do this if it's a primary key and not a unique constraint), OR set up the constructor such that it will use an existing object in the database if it exists instead of creating a new object?
For example from the docs:
user.keywords.append('cheese inspector')
# Is translated by the association proxy into the operation:
user.kw.append(Keyword('cheese inspector'))
But I'd like it to be translated to something more like this (of course the query could fail):
keyword = session.query(Keyword).filter(Keyword.keyword == 'cheese inspector').one()
user.kw.append(keyword)
OR ideally
user.kw.append(Keyword('cheese inspector'))
session.merge() # retrieves identical object from the database, or keeps new one
session.commit() # success!
I suppose this may not even be a good idea, but it could be in certain use cases :)
The example shown on the documentation page you link to is a composition type of relationship (in OOP terms) and as such represents the owns type of relationship rather than uses, in terms of verbs. Therefore each owner would have its own copy of the same (in terms of value) keyword.
In fact, you can use exactly the suggestion from the documentation you link to in your question to create a custom creator method and hack it to reuse an existing object for a given key instead of just creating a new one. In this case the sample code for the User class and the creator function will look like below:
def _keyword_find_or_create(kw):
    keyword = Keyword.query.filter_by(keyword=kw).first()
    if not keyword:
        keyword = Keyword(keyword=kw)
        # if autoflush=False is used in the session, then uncomment below
        #session.add(keyword)
        #session.flush()
    return keyword


class User(Base):
    __tablename__ = 'user'

    id = Column(Integer, primary_key=True)
    name = Column(String(64))
    kw = relationship("Keyword", secondary=lambda: userkeywords_table)
    keywords = association_proxy('kw', 'keyword',
                                 creator=_keyword_find_or_create,  # @note: this is the
                                 )
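With that creator in place, appending an existing keyword string reuses the stored row instead of inserting a duplicate, for example (assuming the documentation's Keyword model and an active session):

user = session.query(User).first()
user.keywords.append('cheese inspector')  # looks up an existing Keyword('cheese inspector') first
session.commit()                          # no unique-constraint violation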
I recently ran into the same problem. Mike Bayer, creator of SQLAlchemy, referred me to the "unique object" recipe but also showed me a variant that uses an event listener. The latter approach modifies the association proxy so that UserKeyword.keyword temporarily points to a plain string and only creates a new Keyword object if the keyword doesn't already exist.
from sqlalchemy import event

# Same User and Keyword classes from documentation


class UserKeyword(Base):
    __tablename__ = 'user_keywords'

    # Columns
    user_id = Column(Integer, ForeignKey(User.id), primary_key=True)
    keyword_id = Column(Integer, ForeignKey(Keyword.id), primary_key=True)
    special_key = Column(String(50))

    # Bidirectional attribute/collection of 'user'/'user_keywords'
    user = relationship(
        User,
        backref=backref(
            'user_keywords',
            cascade='all, delete-orphan'
        )
    )

    # Reference to the 'Keyword' object
    keyword = relationship(Keyword)

    def __init__(self, keyword=None, user=None, special_key=None):
        self._keyword_keyword = keyword  # temporary, will turn into a
                                         # Keyword when we attach to a
                                         # Session
        self.special_key = special_key

    @property
    def keyword_keyword(self):
        if self.keyword is not None:
            return self.keyword.keyword
        else:
            return self._keyword_keyword


@event.listens_for(Session, "after_attach")
def after_attach(session, instance):
    # when UserKeyword objects are attached to a Session, figure out what
    # Keyword in the database it should point to, or create a new one
    if isinstance(instance, UserKeyword):
        with session.no_autoflush:
            keyword = session.query(Keyword).\
                filter_by(keyword=instance._keyword_keyword).\
                first()
            if keyword is None:
                keyword = Keyword(keyword=instance._keyword_keyword)
            instance.keyword = keyword