SQLAlchemy clone table row with relations

Following on from this question: SQLAlchemy: Modification of detached object.
That approach makes a copy of the object fine, but it loses any many-to-many relationships the original object had. Is there a way to copy the object along with its many-to-many relationships as well?
Cheers!

I got this to work by walking the object graph and doing the expunge(), make_transient() and id = None steps on each object in the graph as described in SQLAlchemy: Modification of detached object.

Here is my sample code. The agent has at most one campaign.
from sqlalchemy.orm.session import make_transient

def clone_agent(id):
    s = app.db.session
    agent = s.query(Agent).get(id)
    c = None
    # Grab the child before expunging the agent, otherwise the collection will be empty.
    if agent.campaigns:
        c = agent.campaigns[0]
        s.expunge(c)
        make_transient(c)
        c.id = None
    s.expunge(agent)
    agent.id = None
    # There is a unique constraint on the following column.
    agent.name = agent.name + '_clone'
    agent.externalId = -agent.externalId  # Pick a number that is not already in the db.
    make_transient(agent)
    s.add(agent)
    s.commit()  # Commit so the agent is saved to the database and gets an id.
    if c:
        assert agent.id
        c.agent_id = agent.id  # Attach the child to the parent (agent_id is a foreign key).
        s.add(c)
        s.commit()
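For a true many-to-many relationship you can usually skip cloning the related objects themselves: capture the collection, make the parent transient, and reassign the collection so SQLAlchemy inserts fresh association rows pointing at the same related rows. A minimal sketch, assuming hypothetical Post and Tag models linked through a post_tags association table:

from sqlalchemy.orm import make_transient

def clone_post(session, post_id):
    post = session.query(Post).get(post_id)
    tags = list(post.tags)   # capture the related objects before detaching
    session.expunge(post)
    make_transient(post)
    post.id = None           # let the database assign a new primary key
    post.tags = tags         # fresh rows in post_tags, pointing at the same Tag rows
    session.add(post)
    session.commit()
    return post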

Related

SQLAlchemy ORM - map an object to dynamically created schema

I'm considering porting my app to SQLAlchemy as it's much more extensive than my own ORM implementation, but all the examples I could find show how to set the schema name at class declaration rather than dynamically at runtime.
I need to map my objects to Postgres tables from multiple schemas. Moreover, the application creates new schemas in runtime and I need to map new instances of the class to rows of the table from that new schema.
Currently, I use my own ORM module where I simply provide the schema name as an argument when creating new instances of a class (I call a class method with the schema name as an argument and it returns an object, or objects, that holds the schema name). The class describes a table that can exist in many schemas. The class declaration doesn't contain information about the schema, but instances of that class do contain it and include it when generating SQL statements.
This way, the application can work with many schemas simultaneously and even create foreign keys in tables from "other" schemas to the "main" table in the public schema. In such a way it is also possible to delete data in other schemas cascaded when deleting the row in the public schema.
SQLAlchemy gives this example of setting the schema for a table (documentation):
metadata_obj = MetaData(schema="remote_banks")
financial_info = Table(
    "financial_info",
    metadata_obj,
    Column("id", Integer, primary_key=True),
    Column("value", String(100), nullable=False),
)
But at the ORM level, when I declare the class, I have to pass an already constructed table (example from the documentation):
metadata = MetaData()
group_users = Table(
    "group_users",
    metadata,
    Column("user_id", String(40), nullable=False),
    Column("group_id", String(40), nullable=False),
    UniqueConstraint("user_id", "group_id"),
)

class Base(DeclarativeBase):
    pass

class GroupUsers(Base):
    __table__ = group_users
    __mapper_args__ = {"primary_key": [group_users.c.user_id, group_users.c.group_id]}
So, the question is: is it possible to map class instances to tables/rows from dynamically created database schemas (in runtime) in SQLAlchemy? The way of altering the connection to set the current schema is not acceptable to me. I want to work with all schemas simultaneously.
I'm free to use the newest SQLAlchemy 2.0 (currently in beta).
You can set the schema per table, so I think you have to make a table and class per schema. Here is a made-up example. I have no idea what the ramifications are of changing the mapper registry at runtime, especially mid-transaction as I have done below, or what would happen with thread safety. You could probably use a master schema-list table in public and lock it, or lock the same row across connections, to synchronize the schema list and provide thread safety when adding a schema. I'm surprised it works. Kind of cool.
import sys

from sqlalchemy import (
    create_engine,
    Integer,
    MetaData,
    Float,
    event,
)
from sqlalchemy.schema import (
    Column,
    CreateSchema,
    Table,
)
from sqlalchemy.orm import Session
from sqlalchemy.orm import registry

username, password, db = sys.argv[1:4]
engine = create_engine(f"postgresql+psycopg2://{username}:{password}@/{db}", echo=True)
metadata = MetaData()
mapper_registry = registry()

def map_class_to_some_table(cls, table, entity_name, **mapper_kwargs):
    newcls = type(entity_name, (cls,), {})
    mapper_registry.map_imperatively(newcls, table, **mapper_kwargs)
    return newcls

class Measurement(object):
    pass

units = []
cls_for_unit = {}
tbl_for_unit = {}

def add_unit(unit, create_bind=None):
    units.append(unit)
    schema_name = f"unit_{unit}"
    if create_bind:
        create_bind.execute(CreateSchema(schema_name))
    else:
        event.listen(metadata, "before_create", CreateSchema(schema_name))
    cols = [
        Column("id", Integer, primary_key=True),
        Column("value", Float, nullable=False),
    ]
    # One table per schema.
    tbl_for_unit[unit] = Table("measurement", metadata, *cols, schema=schema_name)
    if create_bind:
        tbl_for_unit[unit].create(create_bind)
    # One class per schema.
    cls_for_unit[unit] = map_class_to_some_table(
        Measurement, tbl_for_unit[unit], Measurement.__name__ + f"_{unit}"
    )

for unit in ["mm", "m"]:
    add_unit(unit)

metadata.create_all(engine)

with Session(engine) as session, session.begin():
    # Create a value for each unit (schema).
    session.add_all([cls(value=i) for i, cls in enumerate(cls_for_unit.values())])

with Session(engine) as session, session.begin():
    # Read back a value for each unit (schema).
    print(
        [
            (unit, cls.__name__, cls, session.query(cls).first().value)
            for (unit, cls) in cls_for_unit.items()
        ]
    )

with Session(engine) as session, session.begin():
    # Add another unit, add a value, flush and then read back.
    add_unit("km", create_bind=session.bind)
    session.add(cls_for_unit["km"](value=100.0))
    session.flush()
    print(session.query(cls_for_unit["km"]).first().value)
Output of the last add_unit():
2022-12-16 08:16:13,446 INFO sqlalchemy.engine.Engine CREATE SCHEMA unit_km
2022-12-16 08:16:13,446 INFO sqlalchemy.engine.Engine [no key 0.00015s] {}
2022-12-16 08:16:13,447 INFO sqlalchemy.engine.Engine COMMIT
2022-12-16 08:16:13,469 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2022-12-16 08:16:13,469 INFO sqlalchemy.engine.Engine
CREATE TABLE unit_km.measurement (
id SERIAL NOT NULL,
value FLOAT NOT NULL,
PRIMARY KEY (id)
)
Ian Wilson posted a great answer to my question which I'm going to use.
Around the same time I got an idea of how it can work and would like to post it here as a very simple example. I think the same mechanism is behind it as in Ian's post.
This example only "reads" an object from a schema that is chosen at runtime.
from sqlalchemy import create_engine, Column, Integer, String, MetaData
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import sessionmaker
import psycopg

engine = create_engine("postgresql+psycopg://user:password@localhost:5432/My_DB", echo=True)
Session = sessionmaker(bind=engine)
session = Session()

class Base(DeclarativeBase):
    pass

class A(object):
    __tablename__ = "my_table"
    id = Column("id", Integer, primary_key=True)
    name = Column("name", String)

    def __repr__(self):
        return f"A: {self.id}, {self.name}"

metadata_obj = MetaData(schema="my_schema")  # here we create the new mapping
A1 = type("A1", (A, Base), {"metadata": metadata_obj})  # a new subclass with the desired mapping

data = session.query(A1).all()
print(data)
This info helped me to come to this solution:
https://github.com/sqlalchemy/sqlalchemy/wiki/EntityName
"... SQLAlchemy mapping makes modifications to the mapped class, so it's not really feasible to have many mappers against the exact same class ..."
This means a separate class must be created at runtime for each schema.
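A minimal sketch of that, assuming the A and Base classes from the example above (class_for_schema is my own hypothetical helper): build and cache one mapped subclass per schema name, since a single class cannot carry more than one mapper.

_schema_classes = {}  # schema name -> mapped subclass

def class_for_schema(schema_name):
    if schema_name not in _schema_classes:
        metadata_obj = MetaData(schema=schema_name)
        _schema_classes[schema_name] = type(
            f"A_{schema_name}", (A, Base), {"metadata": metadata_obj}
        )
    return _schema_classes[schema_name]

rows = session.query(class_for_schema("my_schema")).all()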

Update relationship in SQLAlchemy

I have this kind of model:
class A(Base):
    id = Column(UUID(as_uuid=True), primary_key=True, server_default=text("uuid_generate_v4()"))
    name = Column(String, nullable=False, unique=True)
    property = Column(String)
    parent_id = Column(UUID(as_uuid=True), ForeignKey(id, ondelete="CASCADE"))
    children = relationship(
        "A", cascade="all,delete", backref=backref("parent", remote_side=[id])
    )
An id is created automatically by the server.
I have a relationship from the model to itself (parent and children).
In the background I run a task that periodically receives a message with the id of a parent and a list of (name, property) pairs for its children. I would like to update the parent's children in the table (identified by name). Is there a way to do so without reading all the children and checking which ones are missing (name not present in the message), which need to be updated (name exists but the property has changed), and which are new (name not present in the db)?
Do I need to set name to be my primary key and get rid of the UUID?
Thanks
I'd do a single query and compare the result against the message you receive. That way it's easier to handle additions, removals, and updates alike.
msg_parent_id = 5
msg_children = [('name', 'property'), ('name2', 'property2')]

stmt = select(A).where(A.parent_id == msg_parent_id)
children = session.execute(stmt).scalars()

# Example of determining what to change
name_map = {row.name: row for row in children}

for child_name, child_prop in msg_children:
    # Child exists
    if child_name in name_map:
        # Edit child
        if name_map[child_name].property != child_prop:
            print(child_name, 'has changed to', child_prop)
        del name_map[child_name]
    # Add child
    else:
        print(child_name, 'was added')

# Remove child
for child in name_map.values():
    print(child, 'was removed')
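And a hedged sketch of actually applying that diff instead of printing it, under the same assumptions (the A model from the question and the message shape above):

from sqlalchemy import select

def sync_children(session, parent_id, msg_children):
    stmt = select(A).where(A.parent_id == parent_id)
    existing = {row.name: row for row in session.execute(stmt).scalars()}
    for name, prop in msg_children:
        row = existing.pop(name, None)
        if row is None:
            session.add(A(name=name, property=prop, parent_id=parent_id))  # new child
        elif row.property != prop:
            row.property = prop  # changed; the unit of work emits an UPDATE on commit
    for stale in existing.values():
        session.delete(stale)  # in the db but missing from the message
    session.commit()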
Do I need to set name to be my primary key and get rid of the UUID?
Personally I'd add a unique constraint on the name, but still have a separate ID column for the sake of relationships.
Edit for a more ORM-oriented way: I believe you can already use A.children = [val1, val2], which is really what you need.
In the past I have used this answer on how to intercept the call, parse the input data, and fetch the existing record from the database if it exists. As part of that call you could update the property of that record.
Finally use a cascade on the relationship to automatically delete records when parent_id is set to None.
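A minimal sketch of that relationship tweak, assuming the A model from the question; with delete-orphan in the cascade, children dropped from parent.children are deleted on flush rather than left behind with a NULL parent_id:

children = relationship(
    "A",
    cascade="all, delete-orphan",
    backref=backref("parent", remote_side=[id]),
)
# parent.children = rebuilt_list  # removed items are deleted when the session flushes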

Apply similar options in SQLAlchemy as contains_eager by default, to query without specifying the relationship path

The following pytest passes, as expected:
import pytest
from sqlalchemy.orm import Session, contains_eager

@pytest.fixture(scope="function", autouse=True)
def populate_db(test_db: Session):
    """Populate db with a single parent with two children that go to different schools."""
    parent_1 = Parent(uuid="1")
    child_1 = Child(uuid="2", parent_uuid="1", school="north")
    child_2 = Child(uuid="3", parent_uuid="1", school="south")
    test_db.add(parent_1)
    test_db.commit()
    test_db.add(child_1)
    test_db.add(child_2)
    test_db.commit()

def test_eager(db: Session):
    """Query for parents based on whether their child goes to school 'north'."""
    parent = db.query(Parent).join(Child).filter(Child.school == "north").first()
    # both of the parent's children are returned
    assert len(parent.children) == 2

    parent = (
        db.query(Parent)
        .join(Child)
        .options(contains_eager(Parent.children))
        .filter(Child.school == "north")
        .first()
    )
    # only one of the parent's children is returned
    assert len(parent.children) == 1
What I would like is for the behaviour of contains_eager to be applied by default (so in this case the first assertion would fail, since only one of the parent.children relationships would be returned).
Also, I have many places where I would want this applied, so I would prefer not to have to spell out the relationship path each time with .options(contains_eager(Parent.children)).
For example if I added a School model, I wouldn't want to be required to update the options with School.children.
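As far as I know there is no loader-strategy default for this, since contains_eager relies on a join that only exists in the specific query; one hedged workaround is a small helper so the path is written once (query_filtered is a hypothetical name; Parent and Child are the models from the test):

from sqlalchemy.orm import contains_eager

def query_filtered(db, model, rel, *criteria):
    # join the relationship and populate it from that same join
    return (
        db.query(model)
        .join(rel)
        .options(contains_eager(rel))
        .filter(*criteria)
    )

# usage:
# parent = query_filtered(db, Parent, Parent.children, Child.school == "north").first()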

Django save() does not work

I'm writing an elections application. In the process, I've defined an Election model and a Candidate model.
Note: I'm using Django version 1.3.7, Python 2.7.1.
One of Election's methods,
Election.count_first_place(self)
is intended to count the number of first-place votes each candidate receives and update the candidates' numVotes attribute. But for some reason they all stay at zero, no matter what the ballots contain.
Note: I'm implementing STV, so each ballot contains an array (ballot.voteArray) of Candidates in order of most preferred (position zero) to least preferred (position n). I've implemented this list with a PickledObjectField (see link).
models.py
from django.db import models
# PickledObjectField comes from the snippet linked above

class Candidate(models.Model):
    election = models.ForeignKey("Election")
    numVotes = models.FloatField(blank=True)

class Ballot(models.Model):
    election = models.ForeignKey("Election", related_name="ballot_set")
    voteArray = PickledObjectField(null=True, blank=True)

class Election(models.Model):
    position = models.CharField(max_length=50)
    candidates = models.ManyToManyField(Candidate, related_name="elections_in", null=True, blank=True)

    def count_first_place(self):
        # retrieve all of the ballots cast in this election
        ballots = Ballot.objects.filter(election=self)
        for ballot in ballots.all():
            # the first element of a ballot's voteArray is a Candidate object
            first_place_choice = ballot.voteArray[0]
            first_place_choice.numVotes += 1
            first_place_choice.save()
            ballot.save()
        self.save()
Here is what happens when I run a test:
Note: I realize that I am saving way more often than is necessary. Just being absolutely sure while I test this thing that it saves when it needs to.
elec = Election(position="Student Body President")
elec.save()

j = Candidate(election=elec, numVotes=0)
j.save()
e = Candidate(election=elec, numVotes=0)
e.save()
b = Candidate(election=elec, numVotes=0)
b.save()

elec.candidates.add(j, e, b)
elec.save()

ballot1 = Ballot(election=elec, voteArray=[j, e, b])
ballot1.save()
ballot2 = Ballot(election=elec, voteArray=[j, b, e])
ballot2.save()
ballot3 = Ballot(election=elec, voteArray=[e, b, j])
ballot3.save()
So after this bit, j has 2 first-place votes, and e has 1. But when I run
elec.count_first_place()
j still has zero votes, as do e and b.
What's up with that?
This is a very strange table structure. Pickling other model instances is a very bad idea: the pickled versions will not update when their database rows do. Really you should be storing an array of candidate IDs, or even better create a many-to-many relationship from Ballot to Candidate with a through table indicating position.
But I think your problem is simpler than that. You say that the objects still have zero votes: that is because you have not updated those particular instances. Again, there is no direct relationship between a Django instance and the database row, other than on loading and saving. You'll need to reload the objects from the database to see any updates.
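A minimal sketch of that reload, under the question's setup (Django 1.3 predates refresh_from_db(), so re-fetch by primary key):

elec.count_first_place()

# the in-memory instances are stale; reload them to see the saved counts
j = Candidate.objects.get(pk=j.pk)
e = Candidate.objects.get(pk=e.pk)
print(j.numVotes, e.numVotes)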

Association Proxy SQLAlchemy

This source details how to use association proxies to create views and objects with values of an ORM object.
However, when I append a value that matches an existing object in the database (and said value is either unique or a primary key), it creates a conflicting object, so I cannot commit.
So in my case, is this only useful as a view, meaning I'll need to use ORM queries to retrieve the object to be appended?
Is that my only option, or can I use merge (I may only be able to do this if it's a primary key and not a unique constraint), OR set up the constructor such that it will use an existing object from the database if one exists instead of creating a new object?
For example from the docs:
user.keywords.append('cheese inspector')
# Is translated by the association proxy into the operation:
user.kw.append(Keyword('cheese inspector'))
But I'd like it to be translated to something more like the following (of course the query could fail):
keyword = session.query(Keyword).filter(Keyword.keyword == 'cheese inspector').one()
user.kw.append(keyword)
OR ideally
user.kw.append(Keyword('cheese inspector'))
session.merge() # retrieves identical object from the database, or keeps new one
session.commit() # success!
I suppose this may not even be a good idea, but it could be in certain use cases :)
The example shown on the documentation page you link to is a composition type of relationship (in OOP terms) and as such represents the owns type of relationship rather than the uses one, in terms of verbs. Therefore each owner would have its own copy of the same (in terms of value) keyword.
In fact, you can use exactly the suggestion from the documentation you link to in your question: create a custom creator method and hack it to reuse an existing object for a given key instead of always creating a new one. In this case the sample code for the User class and the creator function would look like this:
def _keyword_find_or_create(kw):
    keyword = Keyword.query.filter_by(keyword=kw).first()
    if not keyword:
        keyword = Keyword(keyword=kw)
        # if autoflush=False is used in the session, then uncomment below
        #session.add(keyword)
        #session.flush()
    return keyword

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String(64))
    kw = relationship("Keyword", secondary=lambda: userkeywords_table)
    keywords = association_proxy(
        'kw', 'keyword',
        creator=_keyword_find_or_create,  # note: the find-or-create creator
    )
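A usage sketch with that creator in place, assuming the Keyword model and a session as in the documentation example; the second append finds the row the first one created:

u1 = User(name='u1')
u1.keywords.append('cheese inspector')  # no such row yet, so the creator makes a new Keyword
session.add(u1)
session.commit()

u2 = User(name='u2')
u2.keywords.append('cheese inspector')  # the row exists now, so the creator returns it
session.add(u2)
session.commit()

assert u1.kw[0].id == u2.kw[0].id  # both users share a single Keyword row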
I recently ran into the same problem. Mike Bayer, creator of SQLAlchemy, referred me to the "unique object" recipe but also showed me a variant that uses an event listener. The latter approach modifies the association proxy so that UserKeyword.keyword temporarily points to a plain string and only creates a new Keyword object if the keyword doesn't already exist.
from sqlalchemy import event
from sqlalchemy.orm import Session

# Same User and Keyword classes from documentation

class UserKeyword(Base):
    __tablename__ = 'user_keywords'

    # Columns
    user_id = Column(Integer, ForeignKey(User.id), primary_key=True)
    keyword_id = Column(Integer, ForeignKey(Keyword.id), primary_key=True)
    special_key = Column(String(50))

    # Bidirectional attribute/collection of 'user'/'user_keywords'
    user = relationship(
        User,
        backref=backref(
            'user_keywords',
            cascade='all, delete-orphan'
        )
    )

    # Reference to the 'Keyword' object
    keyword = relationship(Keyword)

    def __init__(self, keyword=None, user=None, special_key=None):
        self._keyword_keyword = keyword  # temporary, will turn into a
                                         # Keyword when we attach to a
                                         # Session
        self.special_key = special_key

    @property
    def keyword_keyword(self):
        if self.keyword is not None:
            return self.keyword.keyword
        else:
            return self._keyword_keyword

@event.listens_for(Session, "after_attach")
def after_attach(session, instance):
    # when UserKeyword objects are attached to a Session, figure out what
    # Keyword in the database it should point to, or create a new one
    if isinstance(instance, UserKeyword):
        with session.no_autoflush:
            keyword = session.query(Keyword).\
                filter_by(keyword=instance._keyword_keyword).\
                first()
            if keyword is None:
                keyword = Keyword(keyword=instance._keyword_keyword)
            instance.keyword = keyword
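A usage sketch, assuming an existing session and a persistent User instance named user; the plain string is carried until attach time, when the listener resolves it to a real Keyword:

uk = UserKeyword(keyword='cheese inspector')
uk.user = user   # note: the __init__ above does not assign user itself
session.add(uk)  # "after_attach" fires here and sets uk.keyword
session.commit()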