SQLAlchemy query based on a previous date?

Sorry if this is a strange question. I've been going through the docs and tutorial on the SQLAlchemy site, but I can't figure out how to write this specific query.
I have a series of dated activity records on my site; each one stays in effect until the next change. I know I can query specific dates or ranges of dates, but if I query a date that doesn't exist, can I get the previous match?
For example, say I have June 25 and June 30 as two dates, and I run a query for June 29. Is it possible to get the June 25 data with only one query? I just want the previous match for the date I enter.

Below is probably a simplified version of your model, but hopefully the example will help you build your own query.
Assuming the model is defined as below, and that [Activity.person_id, Activity.date] is unique (that is, only one activity per person per day is allowed), the following query uses a subquery and returns tuples of (Person, _last_ Activity):
import datetime
from sqlalchemy import Column, Date, ForeignKey, Integer, String, and_, func, select
from sqlalchemy.orm import relationship
# Base (the declarative base) and session are assumed to be set up elsewhere.
# MODEL:
class Person(Base):
    __tablename__ = 'person'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String)
    activities = relationship('Activity', backref="person")
class Activity(Base):
    __tablename__ = 'activity'
    id = Column(Integer, primary_key=True, autoincrement=True)
    person_id = Column(Integer, ForeignKey('person.id'))
    name = Column(String)
    date = Column(Date)
# BUILDING THE QUERY
def get_latest_activity_before_or_at(last_date):
    AT = Activity.__table__
    # Subquery: per person, the latest activity date on or before last_date
    q = (select([AT.c.person_id, func.max(AT.c.date).label("max_date")],
                (AT.c.date <= last_date)
                ).
         group_by(AT.c.person_id)).alias("subq")
    #print(q)
    #qry = session.query(Person, q).outerjoin(q, q.c.person_id == Person.id)
    # Outer joins keep persons that have no activity on or before last_date
    qry = (session.query(Person).outerjoin(q, q.c.person_id == Person.id).
           outerjoin(Activity, and_(Activity.person_id == Person.id, Activity.date == q.c.max_date)))
    qry = qry.add_entity(Activity)
    return qry.all()
# TESTING the query:
last_date = datetime.date(2012, 7, 3)
res = get_latest_activity_before_or_at(last_date)
for x in res:
    print(x)
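If only a single series of dates is involved, as in the question's June 25 / June 30 example, the "previous match" can also be fetched by filtering on date <= target, ordering by date descending, and taking the first row. The sketch below uses Python's standard sqlite3 module rather than SQLAlchemy, and the table layout, the previous_match helper, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical single-table layout; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (id INTEGER PRIMARY KEY, date TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO activity (date, name) VALUES (?, ?)",
    [("2012-06-25", "first"), ("2012-06-30", "second")],
)

def previous_match(conn, target_date):
    """Return the latest row dated on or before target_date, or None."""
    return conn.execute(
        "SELECT date, name FROM activity "
        "WHERE date <= ? ORDER BY date DESC LIMIT 1",
        (target_date,),
    ).fetchone()

# ISO-formatted date strings compare correctly as text.
row = previous_match(conn, "2012-06-29")
print(row)  # ('2012-06-25', 'first')
```

With the ORM model above, the same idea would presumably be a filter on Activity.date <= last_date, an order_by on Activity.date.desc(), and first().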

Related

Sqlalchemy: Resolve cartesian product warning for many-to-many relationship with custom primaryjoin

Let's say I have the following database schema:
class A(Base):
    __tablename__ = "a_table"
    id = Column(Integer, primary_key=True, autoincrement=True)
    version = Column(Integer, primary_key=True, default=1)
    # More columns...
    bs = relationship(
        "B", secondary="a2b_table", back_populates="as"
    )
class B(Base):
    __tablename__ = "b_table"
    id = Column(Integer, primary_key=True)
    # NOTE: "as" is a reserved word in Python, so real code needs a different attribute name
    as = relationship(
        A, secondary="a2b_table", back_populates="bs"
    )
class A2B(Base):
    __tablename__ = "a2b_table"
    a_id = Column(
        Integer,
        primary_key=True,
    )
    a_version = Column(
        Integer,
        primary_key=True,
    )
    b_id = Column(
        Integer,
        ForeignKey("b_table.id", name="b_fk"),
        primary_key=True,
    )
    __table_args__ = (
        ForeignKeyConstraint(
            [a_id, a_version],
            [A.id, A.version],
            name="a_fk",
        ),
        {},
    )
Each A is identified by an id and can have multiple versions. If something changes in the columns of A (the ones not shown), I produce a new A with the same id and version+1. The relationship bs gives me all instances of B that are associated with a specific version of an A.
The problem is, that the relationship as gives me all versions of each A that is associated with a specific B. Instead, I want the relationship to contain only the latest (highest) version of each A. Following the docs, I tried to solve this with a custom primaryjoin and a window function:
partition = select(
    A,
    row_number()
    .over(order_by=A.version.desc(), partition_by=A.id)
    .label("index"),
).alias()
partitioned_as = aliased(A, partition)
B.latest_as = relationship(
    partitioned_as,
    primaryjoin=and_(
        partition.c.index == 1,
        and_(
            partitioned_as.id == A2B.a_id,
            partitioned_as.version == A2B.a_version,
        ),
    ),
    secondary="a2b_table",
    viewonly=True,
)
Unfortunately, it doesn't work and I get the warning:
SELECT statement has a cartesian product between FROM element(s) "anon_1", "a2b_table" and FROM element "a_table". Apply join condition(s) between each element to resolve.
I checked the SQL statement sqlalchemy generates and it has anon_1, i.e. the query of partition, and a_table in its FROM clause. As far as I understand it, a_table shouldn't be in the FROM clause of this statement because it is already in the FROM clause of partition. I don't know how to get rid of it.
Could anyone point me in the right direction? Thanks in advance.
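For reference, the result the relationship is meant to produce can be written in plain SQL, with the ranked subquery as the only source of A rows. The sketch below uses Python's sqlite3 module instead of SQLAlchemy, assumes an SQLite build with window-function support (3.25 or later), and uses made-up sample rows. It shows the target query only; it is not a fix for the relationship() definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a_table (id INTEGER, version INTEGER, PRIMARY KEY (id, version));
CREATE TABLE b_table (id INTEGER PRIMARY KEY);
CREATE TABLE a2b_table (a_id INTEGER, a_version INTEGER, b_id INTEGER);
-- Illustrative data: A(1) has versions 1 and 2, A(2) has version 1.
INSERT INTO a_table VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO b_table VALUES (10);
INSERT INTO a2b_table VALUES (1, 1, 10), (1, 2, 10), (2, 1, 10);
""")

# a_table appears only inside the ranked subquery, so the outer FROM
# clause has nothing left to form a cartesian product with.
latest_as = conn.execute("""
SELECT ranked.id, ranked.version
FROM a2b_table
JOIN (
    SELECT id, version,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY version DESC) AS idx
    FROM a_table
) AS ranked
  ON ranked.id = a2b_table.a_id AND ranked.version = a2b_table.a_version
WHERE a2b_table.b_id = ? AND ranked.idx = 1
ORDER BY ranked.id
""", (10,)).fetchall()
print(latest_as)  # [(1, 2), (2, 1)]
```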

How do I add an aggregate as "virtual column" in a result?

I've got a query that normally looks like
def get_models_with_children(ids):
    query = (
        MyModel.query.filter(MyModel.id.in_(ids))
        .join(Child, Child.parent_id == MyModel.id)
        .group_by(MyModel.id)
        .having(func.count(Child.id) > 0)
    )
    return query.all()
Sometimes, I want to actually retrieve the count, as well. I can make that happen easily enough:
def get_models_with_children(ids, return_count):
    query = MyModel.query
    if return_count:
        query = query.add_columns(func.count(Child.id).label("child_count"))
    query = (
        query.filter(MyModel.id.in_(ids))
        .join(Child, Child.parent_id == MyModel.id)
        .group_by(MyModel.id)
        .having(func.count(Child.id) > 0)
    )
    return query.all()
This works fine, but now, instead of a List[MyModel] coming back, I've got a differently shaped result with MyModel and child_count keys. If I want the MyModel's id, I do result[0].id if I didn't add the count, and result[0].MyModel.id if I did.
Is there any way I can flatten the result, so that the thing that's returned looks like a MyModel with an extra child_count column?
def do_stuff_with_models():
    result = get_models_with_children([1, 2, 3], True)
    for r in result:
        # can't do this, but I want to:
        print(r.id)
        print(r.child_count)
        # instead I have to do this:
        print(r.MyModel.id)
        print(r.child_count)
The type* of that differently shaped result with MyModel and child_count keys is sqlalchemy.util.KeyedTuple:
Result rows returned by Query that contain multiple
ORM entities and/or column expressions make use of this
class to return rows.
You can effectively flatten them by explicitly specifying the columns for your query. Here follows a complete example (tested on SQLAlchemy==1.3.12).
Plain table column attribute
Models:
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class User(Base):
    __tablename__ = 'user'
    user_id = sa.Column(sa.Integer, sa.Sequence('user_id_seq'), primary_key=True)
    username = sa.Column(sa.String(80), unique=True, nullable=False)
    def __repr__(self):
        return f'User({self.user_id!r}, {self.username!r})'
class Token(Base):
    __tablename__ = 'token'
    token_id = sa.Column(sa.Integer, sa.Sequence('token_id_seq'), primary_key=True)
    user_id = sa.Column(sa.Integer, sa.ForeignKey('user.user_id'), nullable=False)
    user = sa.orm.relationship('User')
    value = sa.Column(sa.String(120), nullable=False)
    def __repr__(self):
        return f'Token({self.user.username!r}, {self.value!r})'
Connect and fill some data:
engine = sa.create_engine('sqlite://')
Base.metadata.create_all(engine)
Session = sa.orm.sessionmaker(bind=engine)
session = Session()
user1 = User(username='joe')
user2 = User(username='john')
token1 = Token(user=user1, value='q1w2e3r4t56')
session.add_all([user1, user2, token1])
session.commit()
Now, let's define the "virtual" column as whether user has a token:
query = session.query(User)
exists = (
    sa.exists()
    .where(User.user_id == Token.user_id)
    .correlate(User)
    .label("has_token")
)
query = query.add_columns(exists)
query.all() # [(User(1, 'joe'), True), (User(2, 'john'), False)]
That's the undesired shape. Here's how to flatten it:
query = session.query(*[getattr(User, n) for n in User.__table__.columns.keys()])
query = query.add_columns(exists)
query.all() # [(1, 'joe', True), (2, 'john', False)]
It's also possible to define the columns for an existing query, given that you know the model:
query = session.query(User)
# later down the line
query = query.with_entities(*[
    getattr(User, n) for n in User.__table__.columns.keys()])
query = query.add_columns(exists)
query.all() # [(1, 'joe', True), (2, 'john', False)]
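At the SQL level, the flattened query is just the user columns followed by a correlated EXISTS expression. Here is a minimal sketch with Python's sqlite3 module, using the same sample data. The table layout is simplified, and the SQL is written by hand, not captured from SQLAlchemy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL);
CREATE TABLE token (
    token_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user(user_id),
    value TEXT NOT NULL
);
INSERT INTO user (username) VALUES ('joe'), ('john');
INSERT INTO token (user_id, value) VALUES (1, 'q1w2e3r4t56');
""")

# One flat row per user: the table columns plus the "virtual" has_token column.
rows = conn.execute("""
SELECT user.user_id, user.username,
       EXISTS (SELECT 1 FROM token WHERE token.user_id = user.user_id) AS has_token
FROM user
ORDER BY user.user_id
""").fetchall()
print(rows)  # [(1, 'joe', 1), (2, 'john', 0)]
```

SQLite has no boolean type, so the flag comes back as 1 or 0 here rather than True or False.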
Column bundle
The same can be achieved with sqlalchemy.orm.Bundle by passing single_entity=True to it.
bundle = sa.orm.Bundle(
    'UserBundle', User.user_id, User.username, exists, single_entity=True)
query = session.query(bundle)
query.all() # [(1, 'joe', True), (2, 'john', False)]
Issue with relationship attribute
With complex models it gets complicated. It's possible to inspect the model (mapped class) attributes with sqlalchemy.orm.mapper.Mapper.attrs and take class_attribute:
# replace
[getattr(User, n) for n in User.__table__.columns.keys()]
# with
[mp.class_attribute for mp in sa.inspect(User).attrs]
But in this case, relationship attributes turn into their target tables in the FROM clause of the query without an ON clause, effectively producing a cartesian product. The "joins" then have to be defined manually, so it's not a good solution. See this answer and a SQLAlchemy user group discussion.
Query expression attribute
I ended up using query expressions myself, because of the issues with relationships in existing code. Query-time SQL expressions as mapped attributes make it possible to get away with minimal modification of the model.
User.has_tokens = sa.orm.query_expression()
...
query = query.options(sa.orm.with_expression(User.has_tokens, exists))
query.all() # [User(1, 'joe'), User(2, 'john')]
[u.has_tokens for u in query.all()] # [True, False]
* Actually, it's a class generated on the fly, sqlalchemy.util._collections.result, with an MRO of sqlalchemy.util._collections.result, sqlalchemy.util._collections._LW, sqlalchemy.util._collections.AbstractKeyedTuple, tuple, object, but that's a detail. More on how the class is created with lightweight_named_tuple is available in this answer.
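A further option, not covered by the approaches above, is to leave the query alone and flatten in Python by copying the extra value onto each entity. The sketch below uses plain stand-in classes instead of real SQLAlchemy rows; MyModel, Row, and attach_extra are hypothetical names for illustration.

```python
from collections import namedtuple

class MyModel:
    """Stand-in for a mapped class; a real one would come from SQLAlchemy."""
    def __init__(self, id):
        self.id = id

# Stand-in for a (MyModel, child_count) result row.
Row = namedtuple("Row", ["MyModel", "child_count"])

def attach_extra(rows, entity_key, extra_keys):
    """Copy each extra column onto the row's entity and return the entities."""
    flattened = []
    for row in rows:
        entity = getattr(row, entity_key)
        for key in extra_keys:
            setattr(entity, key, getattr(row, key))
        flattened.append(entity)
    return flattened

rows = [Row(MyModel(1), 3), Row(MyModel(2), 1)]
models = attach_extra(rows, "MyModel", ["child_count"])
for m in models:
    print(m.id, m.child_count)
```

The copied value is an ordinary Python attribute on the instance, so it is only a convenience for reading results and is not something the ORM tracks or persists.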

How to make this query in sqlalchemy?

SELECT
    maintener.*,
    (SELECT COUNT(*)
     FROM device d
     WHERE d.in_stock_maintener_id = maintener.id) AS in_stock_devices
FROM maintener;
I'm creating a report that shows all maintainers, and I need to show the number of devices each maintainer has, based on the Device model's in_stock_maintener_id reference.
I have these models in my persist module (SQLAlchemy):
class Maintener(persist.Base):
    __tablename__ = 'maintener'
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    document_number = Column(String(30))
    phone_1 = Column(String(12))
    phone_2 = Column(String(12))
    email = Column(String(255))
class Device(persist.Base):
    __tablename__ = 'device'
    id = Column(Integer, primary_key=True)
    serial = Column(String(45))
    in_stock = Column(SmallInteger)
    in_stock_maintener_id = Column(ForeignKey(u'maintener.id'), nullable=True, index=True)
    in_stock_maintener = relationship(u'Maintener', lazy='noload', \
        primaryjoin='Device.in_stock_maintener_id == Maintener.id')
If anyone could help me, I'll be grateful =)
sq = (
    session
    .query(func.count())
    .select_from(Device)
    .filter(Device.in_stock_maintener_id == Maintener.id)
).as_scalar()
q = session.query(Maintener, sq.label('in_stock_devices'))
The query above returns an iterable of (Maintener, Integer) tuples.
If you would like to have columns instead (as per your comment), then you can either specify the columns you want in the query explicitly:
q = session.query(Maintener.id, Maintener.name, sq.label('in_stock_devices'))
or if you would like all columns (as in SELECT *), then you could query the Table instead of the mapped entity:
q = session.query(Maintener.__table__, sq.label('in_stock_devices'))
Above, I assumed that you use the declarative extension.
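To see what the correlated scalar subquery returns, the original SQL can be run directly. This is a minimal sketch using Python's sqlite3 module with a trimmed-down schema; the sample rows ('ann', 'bob', and the serials) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maintener (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE device (
    id INTEGER PRIMARY KEY,
    serial TEXT,
    in_stock_maintener_id INTEGER REFERENCES maintener(id)
);
-- Illustrative rows: 'ann' holds two devices, 'bob' holds none.
INSERT INTO maintener (id, name) VALUES (1, 'ann'), (2, 'bob');
INSERT INTO device (serial, in_stock_maintener_id)
VALUES ('s1', 1), ('s2', 1), ('s3', NULL);
""")

# The subquery is evaluated once per maintener row, so rows without
# devices still appear, with a count of 0.
report = conn.execute("""
SELECT maintener.id, maintener.name,
       (SELECT COUNT(*) FROM device d
        WHERE d.in_stock_maintener_id = maintener.id) AS in_stock_devices
FROM maintener
ORDER BY maintener.id
""").fetchall()
print(report)  # [(1, 'ann', 2), (2, 'bob', 0)]
```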

Count related items in a sqlalchemy model

I'm trying to count the number of items in their respective categories and end up with a collection that I can iterate through in a jinja template. My final output is something like:
category1, 5
category2, 10
category3, 0
The zero items case is important.
My model is:
class Category(Base):
    __tablename__ = 'category'
    id = Column(Integer, primary_key=True)
    name = Column(String(80), unique=True)
    user_id = Column(Integer, ForeignKey('user.id'))
    user = relationship(User)
class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    name = Column(String(80))
    description = Column(String(500))
    category_id = Column(Integer, ForeignKey('category.id'))
    category = relationship(Category)
    user_id = Column(Integer, ForeignKey('user.id'))
    user = relationship(User)
    date_added = Column(DateTime, default=datetime.datetime.now)
I have been kindly pointed in the direction of Stackoverflow: Counting relationships in SQLAlchemy, which led me to the query
count_categories = db_session.query(Category.name, func.count(Item.id)).join(Item.category).group_by(Category.id).all()
Which is almost correct, but it does not handle the zero case. When a category has zero items, I still need the category returned by the query.
Any help, much appreciated.
Actually, I've figured it out:
count_categories = db_session.query(
    Category.name, func.count(Item.id)).outerjoin(
    Item).group_by(Category.id).all()
See SQLAlchemy documentation on Joins
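The difference between the two queries comes down to INNER JOIN versus LEFT OUTER JOIN. The sketch below shows both side by side with Python's sqlite3 module; the schema is reduced to the columns involved and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES category(id)
);
-- Illustrative rows: category1 has two items, category2 has none.
INSERT INTO category (id, name) VALUES (1, 'category1'), (2, 'category2');
INSERT INTO item (name, category_id) VALUES ('a', 1), ('b', 1);
""")

QUERY = """
SELECT category.name, COUNT(item.id)
FROM category {join} item ON item.category_id = category.id
GROUP BY category.id
ORDER BY category.id
"""

inner = conn.execute(QUERY.format(join="JOIN")).fetchall()
outer = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()
print(inner)  # [('category1', 2)]
print(outer)  # [('category1', 2), ('category2', 0)]
```

Counting item.id rather than using COUNT(*) matters here: the outer join yields one row with a NULL item.id for an empty category, and COUNT skips NULLs, which gives 0 instead of 1.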

Usage of a COUNT(DISTINCT field) with a GROUP BY clause in Django

Problem
I want to use a COUNT(DISTINCT field) with a GROUP BY clause in Django. As I understand it, COUNT(DISTINCT ...) can only be achieved by using extra() on the queryset.
My simplified model is:
class Site(models.Model):
    name = models.CharField(max_length=128, unique=True)
class Application(models.Model):
    name = models.CharField(max_length=64)
    version = models.CharField(max_length=13, db_index=True)
class User(models.Model):
    name = models.CharField(max_length=64)
    site = models.ForeignKey(Site, db_index=True)
class Device(models.Model):
    imei = models.CharField(max_length=16, unique=True)
    applications = models.ManyToManyField(Application, null=True, db_index=True, through='ApplicationUsage')
    user = models.ForeignKey(User, null=True, db_index=True)
class ApplicationUsage(models.Model):
    activity = models.DateField(db_index=True)
    application = models.ForeignKey(Application)
    device = models.ForeignKey(Device)
My goal is to have a list of Site objects with a count of distinct devices for each site, given application activity within a time period, something like:
stats_site.name    deviceCount
ALBI               32
AMPLEPUIS          42
...
I tried this code:
qs = models.Site.objects.filter(user__device__applicationusage__activity__range=[startDay, endDay])\
    .extra(select={'deviceCount' : 'COUNT(DISTINCT `stats_device`.`id`)'})\
    .values('name', 'deviceCount')
The generated SQL is:
SELECT (COUNT(DISTINCT stats_device.id)) AS deviceCount, stats_site.name
FROM stats_site
INNER JOIN stats_user ON (stats_site.id = stats_user.site_id)
INNER JOIN stats_device ON (stats_user.id = stats_device.user_id)
INNER JOIN stats_applicationusage ON (stats_device.id = stats_applicationusage.device_id)
WHERE stats_applicationusage.activity BETWEEN '2013-07-01' AND '2013-07-03'
And the result is obviously wrong since it lacks the GROUP BY clause, which should be GROUP BY stats_site.name
The problem is: I don't know how to add the correct GROUP BY using the annotate function or other.
Solution
Using distinct=True on the Count function with annotate:
qs = models.Site.objects.filter(habileouser__device__applicationusage__activity__range=[startDay, endDay])\
    .annotate(deviceCount=Count('habileouser__device', distinct=True))\
    .values('name', 'deviceCount')
The annotate method of a queryset calculates an aggregate value for each element of the queryset; when used after a values() call, it aggregates over the groups defined by those values instead. I think this should work:
qs = models.Site.objects.filter(
    user__device__applicationusage__activity__range=[startDay, endDay]
).values('name').annotate(Count('user__device', distinct=True))
If you have an ordering specified you may need to remove it as discussed here:
https://docs.djangoproject.com/en/dev/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
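For comparison, this is the SQL the question was aiming for, with the GROUP BY clause in place. The sketch runs it through Python's sqlite3 module instead of Django; the table names follow the generated SQL shown earlier, while the trimmed schema and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stats_site (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE stats_user (id INTEGER PRIMARY KEY, site_id INTEGER);
CREATE TABLE stats_device (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE stats_applicationusage (
    id INTEGER PRIMARY KEY, activity TEXT, device_id INTEGER
);
-- Illustrative rows: in the period, device 1 is used twice and device 2 once.
INSERT INTO stats_site VALUES (1, 'ALBI');
INSERT INTO stats_user VALUES (1, 1);
INSERT INTO stats_device VALUES (1, 1), (2, 1);
INSERT INTO stats_applicationusage (activity, device_id)
VALUES ('2013-07-01', 1), ('2013-07-02', 1), ('2013-07-03', 2), ('2013-08-01', 2);
""")

# Three usage rows fall in the range, but only two distinct devices.
rows = conn.execute("""
SELECT stats_site.name, COUNT(DISTINCT stats_device.id) AS deviceCount
FROM stats_site
INNER JOIN stats_user ON stats_site.id = stats_user.site_id
INNER JOIN stats_device ON stats_user.id = stats_device.user_id
INNER JOIN stats_applicationusage
    ON stats_device.id = stats_applicationusage.device_id
WHERE stats_applicationusage.activity BETWEEN ? AND ?
GROUP BY stats_site.name
""", ("2013-07-01", "2013-07-03")).fetchall()
print(rows)  # [('ALBI', 2)]
```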