Django ORM - Grouped aggregates with different select clauses - mysql

Imagine we have the Django ORM model Meetup with the following definition:
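from django.db import models
class Meetup(models.Model):
    language = models.CharField(max_length=50)  # max_length is required by CharField; 50 is illustrative
    speaker = models.CharField(max_length=50)
    date = models.DateField(auto_now=True)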
I'd like to use a single query to fetch the language, speaker and date for the
latest event for each language.
>>> Meetup.objects.create(language='python', speaker='mike')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='python', speaker='ryan')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='noah')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='shawn')
<Meetup: Meetup object>
>>> Meetup.objects.values("language").annotate(latest_date=models.Max("date")).values("language", "speaker", "latest_date")
[
{'speaker': u'mike', 'language': u'python', 'latest_date': ...},
{'speaker': u'ryan', 'language': u'python', 'latest_date': ...},
{'speaker': u'noah', 'language': u'node', 'latest_date': ...},
{'speaker': u'shawn', 'language': u'node', 'latest_date': ...},
]
D'oh! We're getting the latest event, but for the wrong grouping!
It seems like I need a way to GROUP BY the language but SELECT on a different
set of fields?
Update - this sort of query seems fairly easy to express in SQL:
SELECT language, speaker, MAX(date)
FROM app_meetup
GROUP BY language;
I'd love a way to do this without using Django's raw() - is it possible?
Update 2 - after much searching, it seems there are similar questions on SO:
Django Query that gets the most recent objects
How can I do a greatest n per group query in Django
MySQL calls this sort of query a group-wise maximum of a certain column.
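In portable SQL, the usual group-wise maximum pattern computes MAX(date) per language in a derived table and joins back to get the whole row, speaker included. The sketch below uses an in-memory SQLite table named after the question's `app_meetup`. The sample rows and dates are made up, since `auto_now=True` in the question would give every row the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
# Hypothetical sample rows; ISO-8601 date strings sort chronologically.
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2024-01-10"),
        ("python", "ryan", "2024-02-14"),
        ("node", "noah", "2024-01-20"),
        ("node", "shawn", "2024-03-05"),
    ],
)

# Derived table g holds MAX(date) per language; joining on both columns
# brings back the full row (including speaker) that has that date.
GROUPWISE_MAX_SQL = """
SELECT m.language, m.speaker, m.date
FROM app_meetup AS m
JOIN (
    SELECT language, MAX(date) AS max_date
    FROM app_meetup
    GROUP BY language
) AS g ON m.language = g.language AND m.date = g.max_date
ORDER BY m.language
"""

latest = conn.execute(GROUPWISE_MAX_SQL).fetchall()
print(latest)  # [('node', 'shawn', '2024-03-05'), ('python', 'ryan', '2024-02-14')]
```

If two meetups in the same language share the maximum date, this join returns both of them.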
Update 3 - in the end, with @danihp's help, it seems the best you can do
is two queries. I've used the following approach:
from django.db.models import Max
# Abuse the fact that the latest Meetup always has a higher PK to build
# a ValuesList of the latest Meetups grouped by "language".
latest_meetup_pks = (Meetup.objects.values("language")
                     .annotate(latest_pk=Max("pk"))
                     .values_list("latest_pk", flat=True))
# Use a second query to grab those latest Meetups!
Meetup.objects.filter(pk__in=latest_meetup_pks)
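The two steps above map onto plain SQL roughly as follows. This is a sketch against a hypothetical in-memory SQLite table with made-up rows, and it relies on the same assumption as the Django code: rows are inserted in chronological order, so a higher id means a later meetup. As far as I know, when a QuerySet (rather than a list) is passed to `pk__in`, Django compiles it into a subselect, so the two steps can reach the database as one statement. The sketch runs them separately to make each step visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_meetup (id INTEGER PRIMARY KEY, language TEXT, speaker TEXT)")
# Inserted in chronological order, so a higher id means a later meetup.
conn.executemany(
    "INSERT INTO app_meetup (language, speaker) VALUES (?, ?)",
    [("python", "mike"), ("python", "ryan"), ("node", "noah"), ("node", "shawn")],
)

# Step 1: highest id per language
# (mirrors .values("language").annotate(latest_pk=Max("pk"))).
latest_pks = [row[0] for row in conn.execute(
    "SELECT MAX(id) FROM app_meetup GROUP BY language")]

# Step 2: fetch the full rows for those ids (mirrors .filter(pk__in=...)).
placeholders = ", ".join("?" for _ in latest_pks)
latest = conn.execute(
    f"SELECT language, speaker FROM app_meetup WHERE id IN ({placeholders}) ORDER BY language",
    latest_pks,
).fetchall()
print(latest)  # [('node', 'shawn'), ('python', 'ryan')]
```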
This question is a follow up to my previous question:
Django ORM - Get latest record for group

This is the kind of query that is easy to explain but hard to write. If this were plain SQL, I would suggest a CTE that numbers (or ranks) the rows over a partition by language, ordered by date descending, and then keeps only the first row of each partition.
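A minimal sketch of that CTE in SQLite, assuming SQLite 3.25 or later (the first release with window functions; recent Python builds bundle a newer one). The table name follows the question and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2024-01-10"),
        ("python", "ryan", "2024-02-14"),
        ("node", "noah", "2024-01-20"),
        ("node", "shawn", "2024-03-05"),
    ],
)

# Number the rows inside each language partition, newest first,
# then keep only row number 1 of every partition.
RANKED_SQL = """
WITH ranked AS (
    SELECT language, speaker, date,
           ROW_NUMBER() OVER (PARTITION BY language ORDER BY date DESC) AS rn
    FROM app_meetup
)
SELECT language, speaker, date
FROM ranked
WHERE rn = 1
ORDER BY language
"""

latest = conn.execute(RANKED_SQL).fetchall()
print(latest)  # [('node', 'shawn', '2024-03-05'), ('python', 'ryan', '2024-02-14')]
```

ROW_NUMBER() keeps exactly one row per language even when dates tie (which one wins is arbitrary); RANK() would keep all tied rows instead.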
But this is not SQL; this is the Django query API. The easy way is to run one query per language:
languages = Meetup.objects.values_list("language", flat=True).distinct().order_by()
last_by_language = [
    Meetup.objects.filter(language=lang).latest("date")
    for lang in languages
]
This crashes (latest() raises Meetup.DoesNotExist) if some language has no meetups.
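The per-language loop can be sketched in SQL as one `ORDER BY date DESC LIMIT 1` query per language. In this hypothetical SQLite version, `fetchone()` returns None for an empty result instead of raising, which sidesteps the crash. The helper name `latest_for` and the sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2024-01-10"),
        ("python", "ryan", "2024-02-14"),
        ("node", "noah", "2024-01-20"),
        ("node", "shawn", "2024-03-05"),
    ],
)

def latest_for(conn, language):
    """Return (speaker, date) of the latest meetup for `language`, or None."""
    # fetchone() yields None on an empty result rather than raising.
    return conn.execute(
        "SELECT speaker, date FROM app_meetup "
        "WHERE language = ? ORDER BY date DESC LIMIT 1",
        (language,),
    ).fetchone()

languages = [row[0] for row in conn.execute(
    "SELECT DISTINCT language FROM app_meetup ORDER BY language")]
# One query for the languages plus one per language (N+1 round trips).
last_by_language = {lang: latest_for(conn, lang) for lang in languages}
print(last_by_language)  # {'node': ('shawn', '2024-03-05'), 'python': ('ryan', '2024-02-14')}
print(latest_for(conn, "rust"))  # None
```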
The other approach is to get the max date for each language and then filter on those (language, date) pairs:
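from functools import reduce
from django.db import models
from django.db.models import Q
last_dates = (Meetup
              .objects
              .values("language")
              .annotate(ldate=models.Max("date"))
              .order_by())
# OR together one (language AND date) condition per group.
q = reduce(lambda q, meetup:
           q | (Q(language=meetup["language"]) & Q(date=meetup["ldate"])),
           last_dates, Q())
your_query = Meetup.objects.filter(q)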
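The reduce-over-Q approach amounts to two SQL statements: one for MAX(date) per language, and one whose WHERE clause ORs together a `(language = ? AND date = ?)` term per group. Below is a sketch of that SQL against a hypothetical SQLite table with made-up rows. Like the Django version, it returns every row that ties for the latest date in a language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2024-01-10"),
        ("python", "ryan", "2024-02-14"),
        ("node", "noah", "2024-01-20"),
        ("node", "shawn", "2024-03-05"),
    ],
)

# Query 1: MAX(date) per language.
last_dates = conn.execute(
    "SELECT language, MAX(date) FROM app_meetup GROUP BY language").fetchall()

latest = []
if last_dates:  # an empty OR list would produce invalid SQL
    # Query 2: one (language = ? AND date = ?) term per group, joined with OR.
    conditions = " OR ".join("(language = ? AND date = ?)" for _ in last_dates)
    params = [value for pair in last_dates for value in pair]
    latest = conn.execute(
        f"SELECT language, speaker, date FROM app_meetup "
        f"WHERE {conditions} ORDER BY language",
        params,
    ).fetchall()
print(latest)  # [('node', 'shawn', '2024-03-05'), ('python', 'ryan', '2024-02-14')]
```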
Perhaps someone can explain how to do it in a single query without raw sql.
Edited due to OP's comment
You are looking for:
"SELECT language, speaker, MAX(date) FROM app_meetup GROUP BY language"
Not all RDBMSs support this expression, because every field in the SELECT clause that is not wrapped in an aggregate function should appear in the GROUP BY clause. In your case, speaker is in the SELECT clause (without an aggregate function) but does not appear in the GROUP BY.
In MySQL there is no guarantee that the speaker shown comes from the row with the max date. Because of this, this is not an easy query.
Quoting MySQL docs:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause...However, this is useful primarily when all values
in each nonaggregated column not named in the GROUP BY are the same
for each group.
The closest query to your requirements is:
results = (Meetup
           .objects
           .values("language", "speaker")
           .annotate(ldate=models.Max("date"))
           .order_by())
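To see what that "closest query" gives back, here is a sketch of the SQL it roughly corresponds to, run against a hypothetical SQLite table with made-up rows. Grouping by both language and speaker yields one row per (language, speaker) pair, not one row per language, so it still does not pick the single latest meetup for each language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2024-01-10"),
        ("python", "ryan", "2024-02-14"),
        ("node", "noah", "2024-01-20"),
        ("node", "shawn", "2024-03-05"),
    ],
)

# Every non-aggregated column is in GROUP BY, so this is valid standard SQL,
# but the groups are (language, speaker) pairs rather than languages.
rows = conn.execute(
    "SELECT language, speaker, MAX(date) AS ldate FROM app_meetup "
    "GROUP BY language, speaker ORDER BY language, speaker"
).fetchall()
print(len(rows))  # 4: one row per (language, speaker) pair
```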

Related

How to use the distinct method in Rails with Arel Table?

I am looking to run the following query in Rails (I have used the scuttle.io site to convert my SQL to rails-friendly syntax):
Here is the original query:
SELECT pools.name AS "Pool Name", COUNT(DISTINCT stakings.user_id) AS "Total Number of Users Per Pool" from stakings
INNER JOIN pools ON stakings.pool_id = pools.id
INNER JOIN users ON users.id = stakings.user_id
INNER JOIN countries ON countries.code = users.country
WHERE countries.kyc_flow = 1
GROUP BY (pools.name);
And here is the scuttle.io query:
<%Staking.select(
[
Pool.arel_table[:name].as('Pool_Name'), Staking.arel_table[:user_id].count.as('Total_Number_of_Users_Per_Pool')
]
).where(Country.arel_table[:kyc_flow].eq(1)).joins(
Staking.arel_table.join(Pool.arel_table).on(
Staking.arel_table[:pool_id].eq(Pool.arel_table[:id])
).join_sources
).joins(
Staking.arel_table.join(User.arel_table).on(
User.arel_table[:id].eq(Staking.arel_table[:user_id])
).join_sources
).joins(
Staking.arel_table.join(Country.arel_table).on(
Country.arel_table[:code].eq(User.arel_table[:country])
).join_sources
).group(Pool.arel_table[:name]).each do |x|%>
<p><%=x.Pool_Name%></p>
<p><%=x.Total_Number_of_Users_Per_Pool%></p>
<%end%>
Now, as you may notice, scuttle.io does not include the DISTINCT that I need. How can I use distinct here without getting errors such as "method distinct does not exist for Arel Node" or plain syntax errors?
Is there any way to write the above query using rails ActiveRecord? I am sure there is, but I am really not sure how.
Answer
The Arel::Nodes::Count class (an Arel::Nodes::Function) accepts a boolean value for distinctness.
def initialize expr, distinct = false, aliaz = nil
super(expr, aliaz)
  @distinct = distinct
end
The #count method is a shortcut for the same node and accepts the distinct flag as its single argument:
def count distinct = false
Nodes::Count.new [self], distinct
end
So in your case you could use either of these options:
Arel::Nodes::Count.new([Staking.arel_table[:user_id]],true,'Total_Number_of_Users_Per_Pool')
# OR
Staking.arel_table[:user_id].count(true).as('Total_Number_of_Users_Per_Pool')
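Passing `true` should make Arel emit `COUNT(DISTINCT ...)` instead of `COUNT(...)`. The SQLite sketch below shows why that matters. It is simplified to the pools and stakings tables only (the users/countries joins and the kyc_flow filter are omitted), and the rows are made up: one user stakes twice in the same pool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pools (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stakings (id INTEGER PRIMARY KEY, pool_id INTEGER, user_id INTEGER);
INSERT INTO pools (id, name) VALUES (1, 'alpha'), (2, 'beta');
-- user 10 stakes twice in pool 1, so a plain COUNT counts them twice
INSERT INTO stakings (pool_id, user_id) VALUES (1, 10), (1, 10), (1, 11), (2, 12);
""")

SQL = """
SELECT pools.name,
       COUNT(stakings.user_id) AS all_rows,
       COUNT(DISTINCT stakings.user_id) AS distinct_users
FROM stakings
INNER JOIN pools ON stakings.pool_id = pools.id
GROUP BY pools.name
ORDER BY pools.name
"""
rows = conn.execute(SQL).fetchall()
print(rows)  # [('alpha', 3, 2), ('beta', 1, 1)]
```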
Suggestion 1:
The Arel you have seems a bit overkill. Given the natural relationships, you should be able to simplify this a bit, e.g.
country_table = Country.arel_table
Staking
  .joins(:pools,:users)
  .joins(Arel::Nodes::InnerJoin.new(
    country_table,
    country_table.create_on(country_table[:code].eq(User.arel_table[:country]))))
  .select(
    Pool.arel_table[:name],
    Staking.arel_table[:user_id].count(true).as('Total_Number_of_Users_Per_Pool')
  )
  .where(countries: {kyc_flow: 1})
  .group(Pool.arel_table[:name])
Suggestion 2: Move this query to your controller. The view has no business making database calls.

Laravel - Grouping eloquent query by date and user

I have the following database table 'observations'.
I am trying to build a table that groups the observations by three criteria (date, user_id, Type_Name_ID).
I can't think of how to write a Laravel query that gets the required result.
Usually you can start from the known SQL query statement to get these results and use the methods provided in Query Builder.
>>> $observationsQuery = DB::table('observations')
->selectRaw('date, count(observation_id), user_id, Type_Name_ID')
->groupBy('date', 'user_id', 'Type_Name_ID');
>>> $observationsQuery->toSql();
=> "select date, count(observation_id), user_id, Type_Name_ID from "observations"
group by "date", "user_id", "Type_Name_ID""
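The SQL printed by `toSql()` can be checked directly. Below is a sketch that runs it against a hypothetical SQLite `observations` table. The column names come from the query above, while the rows are invented; an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    "observation_id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER, Type_Name_ID INTEGER)"
)
# Hypothetical rows: user 1 made two type-1 observations on 2024-05-01.
conn.executemany(
    "INSERT INTO observations (observation_id, date, user_id, Type_Name_ID) VALUES (?, ?, ?, ?)",
    [
        (1, "2024-05-01", 1, 1),
        (2, "2024-05-01", 1, 1),
        (3, "2024-05-01", 1, 2),
        (4, "2024-05-02", 2, 1),
    ],
)

# One output row per (date, user_id, Type_Name_ID) combination, with its count.
rows = conn.execute(
    "SELECT date, COUNT(observation_id), user_id, Type_Name_ID FROM observations "
    "GROUP BY date, user_id, Type_Name_ID "
    "ORDER BY date, user_id, Type_Name_ID"
).fetchall()
print(rows)
# [('2024-05-01', 2, 1, 1), ('2024-05-01', 1, 1, 2), ('2024-05-02', 1, 2, 1)]
```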

Using Django Count() with conditional lookup types

So I'm trying to aggregate exam data, and because the database lives on another server I'm trying to reduce this to as few database calls as possible.
I have this model (whose corresponding table is in a MySQL database, if that matters):
class Exam(models.Model):
submitted = models.BooleanField(default=False)
score = models.DecimalField(default=Decimal(0))
And this query:
>>> exam_models.Exam.objects\
... .using("exam_datebase")\
... .aggregate(average=Avg("score"),
... total=Count("submitted"))
{'average': 22.251082, 'total': 231}
What I'm looking for is a way to also retrieve the number of passed exams, something along the lines of:
>>> exam_models.Exam.objects\
... .using("exam_datebase")\
... .aggregate(average=Avg("score"),
...             total=Count("submitted"),
...             passed=Count("score__gte=80"))
{'average': 22.251082, 'total': 231, 'passed': 42}
I know I can just send another query using .filter(score__gte=80).count(), but I was really hoping to get both the total count and the passing count on the same aggregate. Any ideas?
You are either going to need two queries, or do the aggregation manually.
To see why, let's consider the underlying SQL that Django generates and uses to query the database.
Exam.objects.aggregate(average=Avg("score"), total=Count("submitted"))
roughly translates to
SELECT AVG(score), COUNT(submitted)
FROM exam
The "Count" part of the aggregate applies to the SELECT clause of the underlying SQL query. But if we want to include only scores of 80 or more, the SQL query would need to look something like this:
SELECT AVG(score), COUNT(submitted)
FROM exam
WHERE score >= 80
Filtering Exams by a particular "score" applies to the WHERE or HAVING clause of the underlying SQL statement.
Unfortunately, there is not really a way to combine these two things. So, you are stuck doing two queries.
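The two-query route looks roughly like this in SQL: one statement for the unfiltered aggregates and a second with the WHERE filter. This is a sketch against a hypothetical SQLite `exam` table with made-up scores, not the SQL Django emits verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exam (id INTEGER PRIMARY KEY, submitted INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO exam (submitted, score) VALUES (?, ?)",
    [(1, 95.0), (1, 60.0), (0, 0.0), (1, 80.0)],
)

# Query 1: unfiltered aggregates
# (like .aggregate(average=Avg("score"), total=Count("submitted"))).
average, total = conn.execute(
    "SELECT AVG(score), COUNT(submitted) FROM exam").fetchone()

# Query 2: the filter lives in WHERE, so it is a separate statement
# (like .filter(score__gte=80).count()).
(passed,) = conn.execute("SELECT COUNT(*) FROM exam WHERE score >= 80").fetchone()

print({"average": average, "total": total, "passed": passed})
# {'average': 58.75, 'total': 4, 'passed': 2}
```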
Having said all that, if you REALLY want to do a single query, one option is to just do the aggregation in your python code:
exams = Exam.objects.all()
total_score = 0
total_submitted = 0
passed = 0
# One query fetches every row; the aggregation itself happens in Python.
for exam in exams:
    total_score += exam.score
    if exam.submitted:
        total_submitted += 1
    if exam.score >= 80:
        passed += 1
exam_aggregates = {
    # Guard against dividing by zero when there are no exams.
    'average': total_score / len(exams) if exams else 0,
    'submitted': total_submitted,
    'passed': passed,
}

Django ORM: using extra to order models by max(datetime field, max(datetime field of related items))

Given the following models:
class BaseModel(models.Model):
    modified_date = models.DateTimeField(auto_now=True)
    class Meta:
        abstract = True
class Map(BaseModel):
    ...
class MapItem(BaseModel):
    map = models.ForeignKey(Map, on_delete=models.CASCADE)
    ...
How do I structure my ORM call to sort Maps by the last time either the Map or one of its MapItems was modified?
In other words, how do I generate a value for each Map that represents the maximum of the Map's own modified_date and the latest modified_date of its related MapItems and sort by it without resorting to raw SQL or Python?
I tried the following query but the last_updated values are blank when my QuerySet is evaluated and I'm not quite sure why:
Map.objects.extra(select={
    "last_updated": "select greatest(max(maps_mapitem.modified_date), maps_map.modified_date) "
                    "from maps_map join maps_mapitem on maps_map.id = maps_mapitem.id"})
Thanks in advance.
Edit 0: as Peter DeGlopper points out, my join was incorrect. I've fixed the join and the last_updated values are now all equal instead of being blank:
Map.objects.extra(select={
    "last_updated": "select greatest(max(maps_mapitem.modified_date), maps_map.modified_date) "
                    "from maps_map join maps_mapitem on maps_map.id = maps_mapitem.map_id"})
Your join is wrong. It should be:
maps_map join maps_mapitem on maps_map.id = maps_mapitem.map_id
As it stands you're forcing the PKs to be equal, not the map's PK to match the items' FKs.
edit
I suspect your subquery isn't joining against the main maps_map part of the query. I am sure there are other ways to do this, but this should work:
Map.objects.extra(select={
"last_updated": "greatest(modified_date, (select max(maps_mapitem.modified_date) from maps_mapitem where maps_mapitem.map_id = maps_map.id))"})
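The answer's correlated subquery can be tried out in SQLite with two substitutions, named here so nothing is mistaken for MySQL syntax. SQLite has no `greatest()`, so its multi-argument scalar `max(a, b)` stands in. That function returns NULL if any argument is NULL, so a COALESCE is added to keep maps that have no items. Tables and dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maps_map (id INTEGER PRIMARY KEY, modified_date TEXT);
CREATE TABLE maps_mapitem (id INTEGER PRIMARY KEY, map_id INTEGER, modified_date TEXT);
INSERT INTO maps_map (id, modified_date) VALUES
    (1, '2024-01-01'), (2, '2024-03-01'), (3, '2024-02-01');
INSERT INTO maps_mapitem (map_id, modified_date) VALUES
    (1, '2024-04-01'),  -- map 1's item is newer than map 1 itself
    (2, '2024-01-15');  -- map 2 is newer than its item; map 3 has no items
""")

# The subquery is correlated: it only sees items whose map_id matches
# the outer maps_map row, so each map gets its own value.
SQL = """
SELECT id,
       max(modified_date,
           COALESCE((SELECT max(maps_mapitem.modified_date)
                     FROM maps_mapitem
                     WHERE maps_mapitem.map_id = maps_map.id),
                    modified_date)) AS last_updated
FROM maps_map
ORDER BY last_updated DESC
"""
rows = conn.execute(SQL).fetchall()
print(rows)  # [(1, '2024-04-01'), (2, '2024-03-01'), (3, '2024-02-01')]
```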

How to pass data dynamically to mysql query

I have following query,
SELECT t_subject.subject, SUM( t_skilllist.skill_level ) AS total_skill, t_users.first_name,
t_skilllist.skill_level
FROM `t_skilllist`
JOIN t_subject ON t_subject.id = t_skilllist.subject_id
JOIN t_users ON t_users.id = t_skilllist.user_id
WHERE t_subject.subject = 'html'
GROUP BY t_users.first_name
ORDER BY total_skill DESC
LIMIT 0 , 30
I want to display the subject and skill level for each student. The query above works for a single subject, for example html. However, I want to pass more than one subject to the query dynamically. I tried combining the subjects with the AND operator, but it returns an empty result set.
How do I solve this? How can I pass two or more subjects to the query? I am using PHP as the server-side scripting language.
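You can use the IN() clause. Combining the conditions with AND returns nothing because a single row's subject can never equal two different values at once.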
WHERE t_subject.subject IN ('html', 'php', 'and', 'a', 'lot', 'more')
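To pass the subject list dynamically without concatenating user input into SQL, one common approach is to generate one placeholder per value and bind the values as parameters. Below is a sketch in Python with SQLite; the helper `subjects_filter` and the sample rows are hypothetical. In PHP, PDO's positional `?` placeholders can be bound the same way by passing an array to `execute()`.

```python
import sqlite3

def subjects_filter(subjects):
    """Build 't_subject.subject IN (?, ?, ...)' plus its parameter list."""
    if not subjects:
        raise ValueError("need at least one subject")  # IN () is invalid SQL
    placeholders = ", ".join("?" for _ in subjects)
    return f"t_subject.subject IN ({placeholders})", list(subjects)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_subject (id INTEGER PRIMARY KEY, subject TEXT);
INSERT INTO t_subject (id, subject) VALUES (1, 'html'), (2, 'php'), (3, 'css');
""")

clause, params = subjects_filter(["html", "php"])
rows = conn.execute(
    f"SELECT subject FROM t_subject WHERE {clause} ORDER BY id", params
).fetchall()
print(rows)  # [('html',), ('php',)]
```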