Convert SQL query to SQLAlchemy query - sqlalchemy

I have two SQLALchemy models with a many-to-many relationship between them:
class Contact(db.Model):
id = db.Column(db.Integer, primary_key=True)
customer_id = db.Column(db.Integer)
users = db.relationship(ContactUser, cascade='all, delete-orphan')
class User(db.Model):
id = db.Column(db.Integer, primary_key=True)
class ContactUser(db.Model):
contact_id = db.Column(db.Integer, db.ForeignKey('contact.id'),
primary_key=True)
user_id = db.Column(db.Integer, db.ForeignKey('user.id'),
primary_key=True)
user = db.relationship(User)
I want to convert this SQL:
SELECT
c.*
FROM
contact c LEFT JOIN
user_contact uc ON c.id = uc.contact_id AND uc.user_id='456'
WHERE
c.customer_id = '123' AND
uc.contact_id IS NULL
Into an SQLAlchemy query, where I can specify customer_id and user_id.
I can't figure out how to tell SQLAlchemy to add the AND uc.user_id='456' to the ON clause.
Current query:
contacts = (Contact.query.join(ContactUser)
.filter(Contact.customer_id == customer.id)
.filter(ContactUser.contact_id == None)
The docs mention something about being able to use a "two argument calling of join", but it seems that only allows me to specify one thing, and I need two in my ON clause.
Also, I think I need to use outerjoin instead of join?

With the help of the guys in #sqlalchemy, we came up with:
from sqlalchemy import and_
...
Contact.query.filter(
Contact.customer_id==customer_id)
.outerjoin(ContactUser, and_(Contact.id==ContactUser.contact_id,
ContactUser.user_id!=user_id))
.filter(ContactUser.contact_id == None))

Related

joinedload and load_only but with filtering

I have two models with a simple FK relationship, Stock and Restriction (Restriction.stock_id FK to Stock).
class Restriction(Model):
__tablename__ = "restrictions"
id = db.Column(db.Integer, primary_key=True)
stock_id = FK("stocks.id", nullable=True)
name = db.Column(db.String(50), nullable=False)
class Stock(Model):
__tablename__ = "stocks"
id = db.Column(db.Integer, primary_key=True)
ticker = db.Column(db.String(50), nullable=False, index=True)
I would like to retrieve Restriction object and related Stock but only Stock's ticker (there are other fields omitted here). I can simply do this with:
from sqlalchemy.orm import *
my_query = Restriction.query.options(
joinedload(Restriction.stock).load_only(Stock.ticker)
)
r = my_query.first()
I get all columns for Restriction and only ticker for Stocks with above. I can see this in the SQL query run and also I can access r.stock.ticker and see no new queries run as it is loaded eagerly.
The problem is I cannot filter on stocks now, SQLAlchemy adds another FROM clause if I do my_query.filter(Stock.ticker == 'GME'). This means there is a cross product and it is not what I want.
On the other hand, I cannot load selected columns from relationship using join. ie.
Restriction.query.join(Restriction.stock).options(load_only(Restriction.stock))
does not seem to work with relationship property. How can I have a single filter and have it load selected columns from relationship table?
SQL I want run is:
SELECT restrictions.*, stocks.ticker
FROM restrictions LEFT OUTER JOIN stocks ON stocks.id = restrictions.stock_id
WHERE stocks.ticker = 'GME'
And I'd like to get a Restriction object back with its stock and stock's ticker. Is this possible?
joinedload basically should not be used with filter. You probably need to take contains_eager option.
from sqlalchemy.orm import *
my_query = Restriction.query.join(Restriction.stock).options(
contains_eager(Restriction.stock).load_only(Stock.ticker)
).filter(Stock.ticker == 'GME')
r = my_query.first()
Because you are joining using stock_id it will also be in the results as Stock.id beside Stock.ticker. But other fields would be omitted as you wish.
I have written short post about it recently if you are interested: https://jorzel.hashnode.dev/an-orm-can-bite-you

SQLAlchemy: Counting multiple relationships - best way?

Imagine the following (example) datamodel:
class Organization(db.Model):
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
friendly_name = db.Column(db.Text, nullable=False)
users = db.relationship('Users', back_populates='organizations')
groups = db.relationship('Groups', back_populates='organizations')
class User(db.Model):
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
organization_id = db.Column(db.Integer, db.ForeignKey('organizations.id'))
organizations = relationship("Organization", back_populates="users")
class Group(db.Model):
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
organization_id = db.Column(db.Integer, db.ForeignKey('organizations.id'))
organizations = relationship("Organization", back_populates="groups")
(so basically an Organization has User and Group relationships)
What we want is to retrieve the counts for users and groups. Result should be similar to the following:
id
friendly_name
users_count
groups_count
1
o1
33
3
2
o2
12
2
3
o3
1
0
This can be achieved with a query similar to
query = db.session.query(
Organization.friendly_name,
func.count(User.id.distinct()).label('users_count'),
func.count(Group.id.distinct()).label('groups_count'),
) \
.outerjoin(User, Organization.users) \
.outerjoin(Group, Organization.groups) \
.group_by(Organization.id)
which seems quite overkill. The first intuitive approach would be something like
query = db.session.query(
Organization.friendly_name,
func.count(distinct(Organization.users)).label('users_count'),
func.count(distinct(Organization.groups).label('groups_count'),
)# with or without outerjoins
which is not working (Note: With one relationship it would work).
a) Whats the difference between User.id.distinct() and distinct(Organization.users) in this case?
b) What would be the best/most performant/recommended way in SQLAlchemy to get a count for each relationship an Object has?
Bonus): If instead of Organization.friendly_name the whole Model would be selected (...query(Organization, func....)) SQLAlchemy returns a tuple with the format t(Organization, users_count, groups_count) as result. Is there a way to just return the Organization with the two counts as additional fields? (as SQL would)
b:
You can try a window function to count users and groups with good performance:
query = db.session.query(
Organization.friendly_name,
func.count().over(partition_by=(User.id, Organization.id)).label('users_count')
func.count().over(partition_by=(Group.id, Organization.id)).label('groups_count')
)
.outerjoin(User, Organization.users)
.outerjoin(Group, Organization.groups)
bonus:
To return count as a field of Organization, you can use hybrid_property, but you would not be happy with the performance.

SQLAlchemy : how to apply distinct before a filter in a join query

Is there a way to apply distinct before filtering in a join query ?
I have 2 tables :
class User(db.Model):
id = db.Column(db.Integer(), primary_key=True)
class Event(db.Model):
id = db.Column(db.Integer(), primary_key=True)
date = db.Column(db.DateTime, nullable=False, default=datetime.utcnow)
description = db.Column(db.String(80))
user_id = db.Column(db.Integer, db.ForeignKey('user.id'), nullable=False)
user = db.relationship('User', backref=db.backref('events', lazy=True))
I want to select all users where the most recent event description is empty.
I thought this would work:
User.query.join('events').order_by(Event.date).distinct(User.id).filter(Event.description==None)
However, it seems that distinct is called after filter. Is there a way to force to apply distinct before filtering ?
DISTINCT is used to return different values.
In this case you need a subquery to create a table using GROUP BY where you group Event by user_id and calculating the most recent event :
last_events = db.session.query(
Event.user_id,
db.func.max(Event.date).label('last_event_date'),
Event.description)\
.group_by(Event.user_id)
.subquery()
Then you can join this table to User and filter on the last_event_description :
User.query.join(last_events,User.id==last_events.c.bac_id)
.filter(last_events.c.description==None)
Et voilĂ  !
The .c comes from the way you access the columns of the subquery in SQLAlchemy.

Optimizing hybrid_properties in SQLAlchemy

I have a piece of working code but it is very inefficient, instead of a single query with a join. I get one initial query, followed by one query per row in the response.
I have to following scenario:
class Job(Base, SerializeMixin, JobInterface):
__tablename__ = 'job_subjobs'
id = Column(Integer, primary_key=True, autoincrement=True)
group_id = Column(Integer, ForeignKey("job_groups.id"), nullable=False)
class Crash(Base, SerializeMixin):
__tablename__ = 'crashes'
id = Column(Integer, primary_key=True, autoincrement=True)
job_id = Column(Integer, ForeignKey("job_subjobs.id", ondelete='CASCADE'), nullable=False)
job = relationship('Job', backref='Crash')
#hybrid_property
def job_identifier(self):
return "{}:{}".format(self.job.group_id, self.job.id)
So given the above and I perform a query for all Crashes, It will perform one SELECT for all crashes. When I iterate and ask for job_identifier it will then do one separate SELECT for each crash.
self.session.query(Crash).all()
Is there someway i can create a #hybrid_property referencing a different table and have it JOIN from the beginning and preload the expression?
I've experimented with #xxx.expression without success. If all else fails I can add another foreign key in Crash table, but I would like to avoid changing current data structure if possible.
ended up using:
jobs = relationship('Job', backref='Crash', lazy='joined')

SQLAlchemy recursive many-to-many relation

I've a case where I'm using one table to store user and group related datas. This column is called profile. So, basically this table is many-to-many table for the cases where one user is belonging in to many groups or there are many users in one group.
I'm a bit confused how it should be described...
Here's a simplified presentation of the class.
Entity relationship model
user_group_table = Table('user_group', metadata,
Column('user_id', Integer,ForeignKey('profiles.id',
onupdate="CASCADE", ondelete="CASCADE")),
Column('group_id', Integer, ForeignKey('profiles.id',
onupdate="CASCADE", ondelete="CASCADE"))
)
class Profile(Base)
__tablename__ = 'profiles'
id = Column(Integer, autoincrement=True, primary_key=True)
name = Column(Unicode(16), unique=True) # This can be either user- / groupname
groups = relationship('Profile', secondary=user_group_table, backref = 'users')
users = relationship('Profile', secondary=user_group_table, backref = 'groups')
#Example of the usage:
user = Profile()
user.name = 'Peter'
salesGroup = Profile()
salesGroup.name = 'Sales'
user.groups.append(salesGroup)
salesGroup.users
>[peter]
First of all, I agree with Raven's comment that you should use separate tables for Users and Groups. The reason being that you might get some inconsistent data where a User might have other Users as its users relations, as well as you might have cycles in the relationship tree.
Having said that, to make the relationship work declare it as following:
...
class Profile(Base):
__tablename__ = 'profiles'
id = Column(Integer, primary_key=True, autoincrement=True)
name = Column(Unicode(16), unique=True) # This can be either user- / groupname
groups = relationship('Profile',
secondary=user_group_table,
primaryjoin=user_group_table.c.user_id==id,
secondaryjoin=user_group_table.c.group_id==id,
backref='users')
...
Also see Specifying Alternate Join Conditions to relationship() documentation section.