I have two models, a Project and an Action:
class Project(models.Model):
    name = models.CharField("Project Name", max_length=200, unique=True)

class Action(models.Model):
    name = models.CharField("Action Name", max_length=200)
    project = models.ForeignKey(Project, blank=True, null=True, verbose_name="Project")
    notes = models.TextField("Notes", blank=True)
    complete = models.BooleanField(default=False, verbose_name="Complete?")
    status = models.IntegerField("Action Status", choices=STATUS, default=0)
I need a query that returns all the Projects for which there are no actions with status < 2.
I tried:
Project.objects.filter(action__status__gt = 1)
But this returns all the Projects because in each Project, there are some actions with status 2 and some actions with status less than 2. Also, it repeated Projects in the resulting query. My current solution is below:
Project.objects.filter(action__status__gt =1).exclude(action__status__lt =2).annotate()
This collapses the repeated results and returns only Projects whose actions all have a status greater than 1. But is this the correct way to construct such a query? What if I wanted to return Projects whose action statuses are all greater than 1 OR Projects with no actions?
I might have misunderstood your requirement, but I think you can do that using annotations.
from django.db.models import Min, Q
Project.objects.annotate(m=Min('action__status')).filter(Q(m=None) | Q(m__gt=1))
The SQL generated is:
SELECT
    "testapp_project"."id", "testapp_project"."name",
    MIN("testapp_action"."status") AS "m"
FROM "testapp_project"
LEFT OUTER JOIN "testapp_action" ON ("testapp_project"."id" = "testapp_action"."project_id")
GROUP BY "testapp_project"."id", "testapp_project"."name"
HAVING (
    MIN("testapp_action"."status") IS NULL
    OR MIN("testapp_action"."status") > 1
)
Which is pretty self-explanatory.
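To see the HAVING clause at work outside Django, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database using the standard-library sqlite3 module. The table and column names mirror the generated SQL above; the three sample projects (alpha, beta, gamma) and their actions are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testapp_project (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE testapp_action (
        id INTEGER PRIMARY KEY,
        project_id INTEGER REFERENCES testapp_project(id),
        status INTEGER NOT NULL DEFAULT 0
    );
""")

# Hypothetical sample data: alpha has one action with status 0,
# beta has only actions with status >= 2, gamma has no actions at all.
conn.executemany(
    "INSERT INTO testapp_project (id, name) VALUES (?, ?)",
    [(1, "alpha"), (2, "beta"), (3, "gamma")],
)
conn.executemany(
    "INSERT INTO testapp_action (project_id, status) VALUES (?, ?)",
    [(1, 0), (1, 2), (2, 2), (2, 3)],
)

# The LEFT OUTER JOIN keeps projects without actions (their MIN is NULL);
# GROUP BY collapses the joined rows back to one row per project.
rows = conn.execute("""
    SELECT p.id, p.name, MIN(a.status) AS m
    FROM testapp_project AS p
    LEFT OUTER JOIN testapp_action AS a ON p.id = a.project_id
    GROUP BY p.id, p.name
    HAVING MIN(a.status) IS NULL OR MIN(a.status) > 1
    ORDER BY p.id
""").fetchall()

project_names = [name for _id, name, _m in rows]
print(project_names)
```

With this sample data, alpha is dropped because its lowest status is 0, while beta (lowest status 2) and gamma (no actions, so the minimum is NULL) are kept, each exactly once.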
Django's ORM is not capable of expressing this. You will need to use a raw query in order to perform this.
I have two models with a simple FK relationship, Stock and Restriction (Restriction.stock_id FK to Stock).
class Restriction(Model):
    __tablename__ = "restrictions"
    id = db.Column(db.Integer, primary_key=True)
    stock_id = FK("stocks.id", nullable=True)
    name = db.Column(db.String(50), nullable=False)

class Stock(Model):
    __tablename__ = "stocks"
    id = db.Column(db.Integer, primary_key=True)
    ticker = db.Column(db.String(50), nullable=False, index=True)
I would like to retrieve a Restriction object and its related Stock, but only the Stock's ticker (there are other fields omitted here). I can simply do this with:
from sqlalchemy.orm import *
my_query = Restriction.query.options(
    joinedload(Restriction.stock).load_only(Stock.ticker)
)
r = my_query.first()
With the above I get all columns for Restriction and only ticker for Stock. I can see this in the SQL that is run, and I can access r.stock.ticker without any new query being issued, since it is loaded eagerly.
The problem is that I cannot filter on stocks now: SQLAlchemy adds another FROM clause if I do my_query.filter(Stock.ticker == 'GME'). This produces a cross product, which is not what I want.
On the other hand, I cannot load selected columns from the relationship using join, i.e.
Restriction.query.join(Restriction.stock).options(load_only(Restriction.stock))
does not seem to work with relationship property. How can I have a single filter and have it load selected columns from relationship table?
SQL I want run is:
SELECT restrictions.*, stocks.ticker
FROM restrictions LEFT OUTER JOIN stocks ON stocks.id = restrictions.stock_id
WHERE stocks.ticker = 'GME'
And I'd like to get a Restriction object back with its stock and stock's ticker. Is this possible?
joinedload should generally not be used together with filter. You probably need the contains_eager option instead.
from sqlalchemy.orm import *
my_query = Restriction.query.join(Restriction.stock).options(
    contains_eager(Restriction.stock).load_only(Stock.ticker)
).filter(Stock.ticker == 'GME')
r = my_query.first()
Because you are joining on stock_id, Stock.id will also be in the results alongside Stock.ticker. The other fields are omitted, as you wish.
I recently wrote a short post about this, if you are interested: https://jorzel.hashnode.dev/an-orm-can-bite-you
Imagine the following (example) datamodel:
class Organization(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    friendly_name = db.Column(db.Text, nullable=False)
    users = db.relationship('User', back_populates='organizations')
    groups = db.relationship('Group', back_populates='organizations')

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    organization_id = db.Column(db.Integer, db.ForeignKey('organization.id'))
    organizations = db.relationship("Organization", back_populates="users")

class Group(db.Model):
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    organization_id = db.Column(db.Integer, db.ForeignKey('organization.id'))
    organizations = db.relationship("Organization", back_populates="groups")
(so basically an Organization has User and Group relationships)
What we want is to retrieve the counts for users and groups. Result should be similar to the following:
id  friendly_name  users_count  groups_count
1   o1             33           3
2   o2             12           2
3   o3             1            0
This can be achieved with a query similar to
query = db.session.query(
        Organization.friendly_name,
        func.count(User.id.distinct()).label('users_count'),
        func.count(Group.id.distinct()).label('groups_count'),
    ) \
    .outerjoin(User, Organization.users) \
    .outerjoin(Group, Organization.groups) \
    .group_by(Organization.id)
which seems quite overkill. The first intuitive approach would be something like
query = db.session.query(
    Organization.friendly_name,
    func.count(distinct(Organization.users)).label('users_count'),
    func.count(distinct(Organization.groups)).label('groups_count'),
)  # with or without outerjoins
which is not working (Note: With one relationship it would work).
a) What's the difference between User.id.distinct() and distinct(Organization.users) in this case?
b) What would be the best/most performant/recommended way in SQLAlchemy to get a count for each relationship an object has?
Bonus) If the whole model were selected instead of Organization.friendly_name (...query(Organization, func....)), SQLAlchemy returns a tuple of the form (Organization, users_count, groups_count) as the result. Is there a way to return just the Organization with the two counts as additional fields (as SQL would)?
b:
You can try a window function to count users and groups with good performance:
query = (
    db.session.query(
        Organization.friendly_name,
        func.count().over(partition_by=(User.id, Organization.id)).label('users_count'),
        func.count().over(partition_by=(Group.id, Organization.id)).label('groups_count'),
    )
    .outerjoin(User, Organization.users)
    .outerjoin(Group, Organization.groups)
)
bonus:
To return count as a field of Organization, you can use hybrid_property, but you would not be happy with the performance.
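To make the role of DISTINCT in the question's query concrete, here is a small sketch in plain SQL, run through the standard-library sqlite3 module rather than SQLAlchemy. The table names org_user and org_group and the sample rows are assumptions made for this illustration. It compares three ways of counting: a naive COUNT over two outer joins, COUNT(DISTINCT ...) as in the question, and one correlated subquery per relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organization (id INTEGER PRIMARY KEY, friendly_name TEXT NOT NULL);
    CREATE TABLE org_user (id INTEGER PRIMARY KEY, organization_id INTEGER);
    CREATE TABLE org_group (id INTEGER PRIMARY KEY, organization_id INTEGER);
""")

# Hypothetical data: o1 has 3 users and 2 groups, o2 has 1 user and no groups.
conn.executemany("INSERT INTO organization VALUES (?, ?)", [(1, "o1"), (2, "o2")])
conn.executemany("INSERT INTO org_user (organization_id) VALUES (?)", [(1,), (1,), (1,), (2,)])
conn.executemany("INSERT INTO org_group (organization_id) VALUES (?)", [(1,), (1,)])

JOINS = """
    FROM organization AS o
    LEFT OUTER JOIN org_user AS u ON u.organization_id = o.id
    LEFT OUTER JOIN org_group AS g ON g.organization_id = o.id
    GROUP BY o.id
    ORDER BY o.id
"""

# Two outer joins pair every user with every group of the same organization,
# so a plain COUNT sees 3 * 2 = 6 rows for o1.
naive = conn.execute(
    "SELECT o.friendly_name, COUNT(u.id), COUNT(g.id)" + JOINS
).fetchall()

# COUNT(DISTINCT ...) undoes that fan-out, which is why the question's
# query needs User.id.distinct() and Group.id.distinct().
distinct_counts = conn.execute(
    "SELECT o.friendly_name, COUNT(DISTINCT u.id), COUNT(DISTINCT g.id)" + JOINS
).fetchall()

# Alternative: one correlated subquery per relationship, no joins and no fan-out.
subquery_counts = conn.execute("""
    SELECT o.friendly_name,
           (SELECT COUNT(*) FROM org_user AS u WHERE u.organization_id = o.id),
           (SELECT COUNT(*) FROM org_group AS g WHERE g.organization_id = o.id)
    FROM organization AS o
    ORDER BY o.id
""").fetchall()

print(naive)
print(distinct_counts)
print(subquery_counts)
```

The sketch only shows that the last two forms agree on the counts. Which one performs better depends on the database, the indexes and the data sizes, so it is worth checking the query plan on the real tables.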
I have a table with ~5M rows that stores firmware downloads from each user. I'm trying to build a graph of the number of downloads of a specific firmware file over the last 30 days.
Although I can populate the data, it's slow to query from the database as I'm calling into it once for each result. I'm also guessing a subquery would be faster to "pre-filter" the firmware_id before doing the COUNT on each matching day.
The model.py definition is:
class Client(db.Model):
    __tablename__ = 'clients'
    id = Column(Integer, primary_key=True, nullable=False, unique=True)
    datestr = Column(Integer, default=0, index=True)
    firmware_id = Column(Integer, ForeignKey('firmware.firmware_id'), nullable=False, index=True)
datestr is an integer with the current yyyymmdd, e.g. 20190101.
The way I'm currently doing this is:
data = []
now = datetime.date.today()
for _ in range(30):
    datestr = _get_datestr_from_datetime(now)
    total = _execute_count_star(db.session.query(Client).\
        filter(Client.firmware_id == fw.firmware_id).\
        filter(Client.datestr == datestr))
    data.append(int(total))
    now -= datetime.timedelta(days=1)
where _execute_count_star is defined (seemingly much faster than .count()) as:
def _execute_count_star(q):
    count_query = q.statement.with_only_columns([func.count()]).order_by(None)
    return q.session.execute(count_query).scalar()
Ideally I'd like to return all 30 daily results in one query (using a subquery to filter on firmware_id?). At the moment the data takes about 3s to populate, which is much too long to show to users by default. The table is also expected to grow by 0.5M rows per month, so the problem is only going to get worse. Any help or advice is very welcome, thank you all so much.
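One possible shape for the single-query version is a GROUP BY on datestr restricted to the 30-day range, with the missing days filled in as zero on the Python side. The sketch below uses the standard-library sqlite3 module and plain SQL so that it runs on its own. The helper names datestr_from_date and downloads_per_day and the sample rows are invented for this illustration, and it assumes datestr is always a valid yyyymmdd integer so that a numeric range comparison matches the date range.

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients ("
    "id INTEGER PRIMARY KEY, datestr INTEGER DEFAULT 0, firmware_id INTEGER NOT NULL)"
)
# Hypothetical sample rows: (datestr, firmware_id).
conn.executemany(
    "INSERT INTO clients (datestr, firmware_id) VALUES (?, ?)",
    [(20190101, 1), (20190101, 1), (20190102, 1), (20190102, 2), (20181130, 1)],
)


def datestr_from_date(day):
    """Convert a date to the integer yyyymmdd form used by the table."""
    return int(day.strftime("%Y%m%d"))


def downloads_per_day(conn, firmware_id, today, days=30):
    """Return one count per day, newest first, using a single query."""
    day_keys = [
        datestr_from_date(today - datetime.timedelta(days=offset))
        for offset in range(days)
    ]
    rows = conn.execute(
        "SELECT datestr, COUNT(*) FROM clients "
        "WHERE firmware_id = ? AND datestr >= ? AND datestr <= ? "
        "GROUP BY datestr",
        (firmware_id, day_keys[-1], day_keys[0]),
    ).fetchall()
    counts = dict(rows)
    # Days without any download do not appear in the result set, so default to 0.
    return [counts.get(key, 0) for key in day_keys]


data = downloads_per_day(conn, firmware_id=1, today=datetime.date(2019, 1, 2))
print(data[:3])
```

In SQLAlchemy the same query could presumably be written as db.session.query(Client.datestr, func.count()).filter(...).group_by(Client.datestr). A composite index on (firmware_id, datestr) may also help at this table size, though that would need to be measured.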
I am new to the Django ORM.
I have tables that look like this:
class Product(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    name = models.CharField(max_length=60)

class ProductOption(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    product_id = models.ForeignKey(Product, on_delete=models.CASCADE, null=True, blank=True)
I would like to query the ProductOption ids related to a Product. I wrote a query like this to do an inner join:
Query = Product.objects.select_related('product_id').filter(name='a')
It gave me an error message saying:
django.core.exceptions.FieldError: Invalid field name(s) given in select_related: 'product_id'. Choices are: (none)
I want to know if there is something wrong in models or query.
Use prefetch_related
Product.objects.filter(name='a').prefetch_related('productoption_set')
This is not how you query a related object. Since you used a foreign key and if I understand correctly, you probably want to use something like this:
# productoption_set is available on a Product instance, not on a QuerySet
Product.objects.get(name='a').productoption_set.all()
Say I have a simple blog entry model in Django:
class Entry(models.Model):
    author = models.ForeignKey(Author)
    topic = models.ForeignKey(Topic)
    entry = models.CharField(max_length=50, default='')
Now say I want to query for an author or a topic, but exclude a particular topic altogether.
entry_list = Entry.objects.filter(Q(author=12)|Q(topic=123)).exclude(topic=666)
Simple enough, but I've found that the resulting raw SQL contains a join on the topic table, even though it isn't needed:
SELECT `blog_entry`.`id`
FROM `blog_entry`
LEFT OUTER JOIN `blog_topic`
    ON (`blog_entry`.`topic_id` = `blog_topic`.`id`)
WHERE ((`blog_entry`.`author_id` = 12
        OR `blog_entry`.`topic_id` = 123
       )
       AND NOT ((`blog_topic`.`id` = 666
                 AND NOT (`blog_topic`.`id` IS NULL)
                 AND `blog_topic`.`id` IS NOT NULL
               ))
      )
Why is that? How can I get Django to query only the column ids and not join tables? I've tried the following, but it gives a FieldError:
entry_list = Entry.objects.filter(Q(author_id=12)|Q(topic_id=123)).exclude(topic_id=666)
I wonder whether this is a bug.
Trying a similar example, I get no join when I put the exclude before the filter (but I do get it using your order).