Flask-SQLAlchemy Group By Multiple Aggregates


I want to write a query with two levels of group by in Flask-SQLAlchemy that is equivalent to the following SQL:
select right_team_id team_id
,sum(score)-sum(deductions) score from (
select left_team_id, right_team_id
,1.0*sum(case when right_win then 1 else 0 end)/count(*) score
,1.0*sum(right_deductions)/2 deductions
from races
group by left_team_id, right_team_id ) A
group by right_team_id
I started with the following for the first group by:
from sqlalchemy import case, func
query = (Races.query.group_by(Races.left_team_id, Races.right_team_id)
         .add_columns(func.sum(Races.left_deductions).label('deductions'),
                      func.sum(case((Races.left_win, 1), else_=0)).label('wins'),
                      func.count().label('races')))
But each record in the query result looks like this: (<flaskapp.races.models.Races object at 0x107c5f358>, 0, 0, 1). How can I run another group by on top of this query, including on the aggregate columns at the end?
Thanks

Without the database or its underlying schema this is difficult to verify, and the original SQL query and the SQLAlchemy query are rather different. But I think approaching this with subqueries will work well.
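from sqlalchemy import case, func
# Inner group by: one row per (left_team_id, right_team_id) pair; 1.0 * forces float division
subq = (db.session.query(Races.right_team_id.label('right_team_id'),
            (1.0 * func.sum(case((Races.right_win, 1), else_=0)) / func.count(Races.id)).label('score'),
            (1.0 * func.sum(Races.right_deductions) / 2).label('deductions'))
        .group_by(Races.left_team_id, Races.right_team_id).subquery())
# Outer group by over the subquery itself; joining it back to Races would repeat rows and inflate the sums
q = (db.session.query(subq.c.right_team_id.label('team_id'),
                      (func.sum(subq.c.score) - func.sum(subq.c.deductions)).label('score'))
     .group_by(subq.c.right_team_id))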
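To sanity-check the nested aggregation without Flask, here is a minimal sketch using Python's built-in sqlite3 module. The races table and its rows are hypothetical and contain only the columns the SQL in the question uses. It runs the question's two-level GROUP BY and, for comparison, a variant that joins the subquery back to races on right_team_id before the outer GROUP BY. That join repeats each subquery row once per matching race, so the outer sums come out inflated.

```python
import sqlite3

def team_scores(conn, join_back=False):
    """Run the question's two-level GROUP BY; optionally join the subquery back to races."""
    inner = """
        SELECT left_team_id, right_team_id,
               1.0 * SUM(CASE WHEN right_win THEN 1 ELSE 0 END) / COUNT(*) AS score,
               1.0 * SUM(right_deductions) / 2 AS deductions
        FROM races
        GROUP BY left_team_id, right_team_id
    """
    if join_back:
        # Each subquery row is repeated once per races row with the same right_team_id
        sql = f"""SELECT r.right_team_id, SUM(a.score) - SUM(a.deductions)
                  FROM races r JOIN ({inner}) a ON a.right_team_id = r.right_team_id
                  GROUP BY r.right_team_id ORDER BY r.right_team_id"""
    else:
        # Outer GROUP BY runs directly over the derived table, as in the original SQL
        sql = f"""SELECT right_team_id AS team_id, SUM(score) - SUM(deductions) AS team_score
                  FROM ({inner}) a GROUP BY right_team_id ORDER BY right_team_id"""
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE races (id INTEGER PRIMARY KEY, left_team_id INTEGER,
                right_team_id INTEGER, right_win INTEGER, right_deductions INTEGER)""")
# Hypothetical data: team 10 raced team 1 twice and team 2 once
conn.executemany(
    "INSERT INTO races (left_team_id, right_team_id, right_win, right_deductions) VALUES (?, ?, ?, ?)",
    [(1, 10, 1, 0), (1, 10, 0, 1), (2, 10, 1, 0)],
)
print(team_scores(conn))                  # [(10, 1.0)]
print(team_scores(conn, join_back=True))  # [(10, 3.0)], inflated by the extra join
```

The pair (1, 10) scores 0.5 with 0.5 deductions and the pair (2, 10) scores 1.0 with none, so team 10 should end up at 1.0; the join-back variant triples that because three races rows match.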

Related

How to optimize a query that is a join of 2 very similar queries but with different aggregation

I have the following query:
SELECT OBJ_DESC_ERRORS.description, OBJ_DESC_ERRORS.object, OBJ_DESC_ERRORS.count_errors, OBJ_ERRORS.count_total FROM
(SELECT `metrics_event`.`description`, `metrics_event`.`object`, COUNT(`metrics_event`.`id`) AS `count_errors` FROM `metrics_event`
INNER JOIN `metrics_session` ON (`metrics_event`.`session_id` = `metrics_session`.`id`)
WHERE (`metrics_session`.`training_id` = 4 AND NOT (`metrics_session`.`completed_at` IS NULL) )
GROUP BY `metrics_event`.`description`, `metrics_event`.`object` ORDER BY `count_errors` DESC ) as OBJ_DESC_ERRORS
JOIN
(SELECT `metrics_event`.`object`, COUNT(`metrics_event`.`id`) AS `count_total` FROM `metrics_event`
INNER JOIN `metrics_session` ON (`metrics_event`.`session_id` = `metrics_session`.`id`)
WHERE (`metrics_session`.`training_id` = 4 AND NOT (`metrics_session`.`completed_at` IS NULL) )
GROUP BY `metrics_event`.`object` ORDER BY `count_total` DESC ) as OBJ_ERRORS
ON OBJ_DESC_ERRORS.object = OBJ_ERRORS.object
which produces the following result:
As you can see, I'm basically running the same query twice. The reason is that I need count_errors broken down by each object + description combination, but I also need count_total aggregated by object only. This was the only way I could think of. Now I'd like to know whether this is the best I can do or whether it can be optimized further.
If it can, I have no clue how. Searching for similar topics is difficult because the optimization depends on the query itself, so keywords didn't help me much.

Get rid of the inner ORDER BYs; they do nothing useful, because the row order of a derived table is not guaranteed to carry through to the outer query.
Rewrite the query something like this:
SELECT
me.description,
me.object,
SUM(...) AS count_errors,
SUM(...) AS count_total
FROM `metrics_event` AS me
INNER JOIN `metrics_session` AS ms ON (me.`session_id` = ms.`id`)
WHERE ms.`training_id` = 4
AND ms.`completed_at` IS NOT NULL
GROUP BY me.`description`, me.`object`
ORDER BY `count_total` DESC
Since a boolean expression evaluates to 1 when TRUE and 0 when FALSE, make the argument to SUM() a boolean expression that matches exactly the rows you want to count.
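A minimal sqlite3 sketch of that trick, assuming a cut-down version of the two tables from the question filled with made-up rows. The WHERE conditions are moved inside SUM(), where the comparison yields 1 or 0 per row (SQLite behaves like MySQL here), and the result is checked against an ordinary filtered COUNT(). One difference worth knowing: the SUM() form keeps groups whose count is 0, while the WHERE form drops them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metrics_session (id INTEGER PRIMARY KEY, training_id INTEGER, completed_at TEXT);
    CREATE TABLE metrics_event (id INTEGER PRIMARY KEY, session_id INTEGER,
                                description TEXT, object TEXT);
    -- Hypothetical data: only session 1 is training 4 AND completed
    INSERT INTO metrics_session VALUES (1, 4, '2020-01-01'), (2, 4, NULL), (3, 5, '2020-01-02');
    INSERT INTO metrics_event (session_id, description, object) VALUES
        (1, 'wrong', 'valve'), (1, 'wrong', 'valve'), (1, 'late', 'pump'),
        (2, 'wrong', 'valve'), (3, 'late', 'pump');
""")

# Conditional aggregation: the boolean is 1 for matching rows and 0 otherwise
conditional = conn.execute("""
    SELECT me.description, me.object,
           SUM(ms.training_id = 4 AND ms.completed_at IS NOT NULL) AS count_errors
    FROM metrics_event AS me
    JOIN metrics_session AS ms ON me.session_id = ms.id
    GROUP BY me.description, me.object
    ORDER BY me.description, me.object
""").fetchall()

# The usual approach: filter with WHERE, then COUNT
filtered = conn.execute("""
    SELECT me.description, me.object, COUNT(me.id) AS count_errors
    FROM metrics_event AS me
    JOIN metrics_session AS ms ON me.session_id = ms.id
    WHERE ms.training_id = 4 AND ms.completed_at IS NOT NULL
    GROUP BY me.description, me.object
    ORDER BY me.description, me.object
""").fetchall()

print(conditional)  # [('late', 'pump', 1), ('wrong', 'valve', 2)]
print(filtered)     # same rows
```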

Don't know how to create this MySQL statement in sqlalchemy

I have the following MySQL statement which does what I want:
SELECT scores.score, registrations.parade, AVG(scores.score) as result
FROM scores
JOIN registrations ON scores.registrationId=registrations.id
where registrations.parade=1
GROUP BY scores.registrationId
ORDER BY result DESC
Basically, with SQLAlchemy I think I would start with:
db.session.query(Scores, func.avg(Scores.score).label('result'))
This is because I do not need the information from registrations (the two are linked to each other in the model). I only join registrations in the MySQL statement because I need to filter on its parade id.
Below is what I have tried so far, but it does not work:
scores = db.session.query(Scores,func.avg(Scores.score).label('result'))\
.filter(Registrations.parade == 1)\
.group_by(Scores.registrationId)\
.order_by(desc('result'))

You are missing the join:
from sqlalchemy import desc, func
scores = db.session.query(Scores.score, func.avg(Scores.score).label('result'))\
.join(Registrations)\
.filter(Registrations.parade == 1)\
.group_by(Scores.registrationId)\
.order_by(desc('result'))
Another issue is that you should either wrap registrations.parade in an aggregate function or include it in the group_by statement.
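To see why the missing join matters, here is a small sqlite3 sketch with hypothetical scores and registrations tables (column names follow the SQL in the question; the data is made up). Without a join condition both tables end up in the FROM clause, so every score is paired with every registration that passes the filter, and the parade filter no longer restricts which scores get grouped. SQLAlchemy 1.4 and later warn about such a cartesian product. The sketch leaves out the non-aggregated scores.score column to stay portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrations (id INTEGER PRIMARY KEY, parade INTEGER);
    CREATE TABLE scores (id INTEGER PRIMARY KEY, registrationId INTEGER, score REAL);
    -- Hypothetical data: registration 1 is on parade 1, registration 2 is on parade 2
    INSERT INTO registrations VALUES (1, 1), (2, 2);
    INSERT INTO scores (registrationId, score) VALUES (1, 8), (1, 10), (2, 4);
""")

# With the join: only scores belonging to parade-1 registrations are averaged
joined = conn.execute("""
    SELECT scores.registrationId, AVG(scores.score) AS result
    FROM scores JOIN registrations ON scores.registrationId = registrations.id
    WHERE registrations.parade = 1
    GROUP BY scores.registrationId ORDER BY result DESC
""").fetchall()

# Without a join condition: an implicit cross join, so registration 2 still shows up
cross = conn.execute("""
    SELECT scores.registrationId, AVG(scores.score) AS result
    FROM scores, registrations
    WHERE registrations.parade = 1
    GROUP BY scores.registrationId ORDER BY result DESC
""").fetchall()

print(joined)  # [(1, 9.0)]
print(cross)   # [(1, 9.0), (2, 4.0)]
```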

mysql find only unique records in a subquery and show the count

I have two tables I am trying to get information from:
login table - which has the list of employees
projects table - which has the projects
In short, I am trying to write a query that selects the copywriters and runs a subquery for each one that returns a field called 'open_projects'. I can get this to work with the SQL below:
select web_login_id,
(select count(project_web_id) from project
where copywriter = web_login_id
and (`status` = 'open' or `status` = 'qual')) as open_projects from login
where roles like '%copywriter%'
and tierLevel like '%c1%'
order by open_projects asc
This returns something like:
1982983 3
1982690 22
2987398 5
The problem with this is that sometimes 5 or 6 of the projects belong to the same client and are not actually being worked on, since they are handled in a queue-ish fashion.
My question is how to modify the SQL above so that the subquery groups the projects by the client_login_id field, counting each client only once.
This SQL gives me the error "subquery returns more than 1 row":
select web_login_id,
(select count(project_web_id) from project
where copywriter = web_login_id
and (`status` = 'open' or `status` = 'qual') group by client_login_id) as open_projects from login
where roles like '%copywriter%'
and tierLevel like '%c1%'
order by open_projects asc

You need to rephrase this as an explicit join. I think the following does the trick:
select l.web_login_id, coalesce(cw.open_projects, 0) as open_projects -- 0 instead of NULL for copywriters with no open projects
from login l left outer join
(select copywriter, count(project_web_id) as open_projects
from project
where `status` in ('open', 'qual')
group by copywriter
) cw
on l.web_login_id = cw.copywriter
where l.roles like '%copywriter%' and l.tierLevel like '%c1%'
order by open_projects asc
I'm not sure what the "group by client_login_id" is doing. It doesn't seem necessary.
Once you've done this, you can return as many columns as you like from the subquery.
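A sqlite3 sketch of the join-to-derived-table version, with hypothetical login and project rows. It also illustrates the last point: the derived table returns two counts, the number of open projects and, as one possible reading of the "only unique records" requirement, the number of distinct clients via COUNT(DISTINCT client_login_id). COALESCE turns the NULL produced by the outer join into 0 for copywriters with no open projects, matching what the correlated subquery returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE login (web_login_id INTEGER PRIMARY KEY, roles TEXT, tierLevel TEXT);
    CREATE TABLE project (project_web_id INTEGER PRIMARY KEY, copywriter INTEGER,
                          client_login_id INTEGER, status TEXT);
    -- Hypothetical data: copywriter 1 has three open projects from two clients,
    -- copywriter 2 has none, login 3 is not a copywriter
    INSERT INTO login VALUES (1, 'copywriter', 'c1'), (2, 'copywriter', 'c1'), (3, 'editor', 'c1');
    INSERT INTO project (copywriter, client_login_id, status) VALUES
        (1, 100, 'open'), (1, 100, 'qual'), (1, 200, 'open'), (1, 300, 'closed');
""")

rows = conn.execute("""
    SELECT l.web_login_id,
           COALESCE(cw.open_projects, 0) AS open_projects,
           COALESCE(cw.open_clients, 0) AS open_clients
    FROM login l
    LEFT OUTER JOIN (
        SELECT copywriter,
               COUNT(project_web_id) AS open_projects,
               COUNT(DISTINCT client_login_id) AS open_clients  -- each client counted once
        FROM project
        WHERE status IN ('open', 'qual')
        GROUP BY copywriter
    ) cw ON l.web_login_id = cw.copywriter
    WHERE l.roles LIKE '%copywriter%' AND l.tierLevel LIKE '%c1%'
    ORDER BY open_projects ASC
""").fetchall()

print(rows)  # [(2, 0, 0), (1, 3, 2)]
```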

Converting a SQL query with group by into LINQ query

I'm struggling to replicate a SQL query in LINQ.
Can anyone help?
SQL:
SELECT tblInvoice.lngID AS InvoiceID,
tblInvoice.dtTimeStamp AS InvoiceDate,
tblInvoice.strReference,
tblInvoice.fltTotalValue,
max(Project.ProjectID) AS ProjectID,
max(Project.ProjectName) AS ProjectName,
max(Project.Location) AS ProjectLocation
FROM tblInvoice INNER JOIN
tblInvoiceLine ON tblInvoice.lngID = tblInvoiceLine.lngInvoiceID
WHERE (tblInvoice.intStatus != 0)
AND (tblInvoice.lngPersonID = @PersonID)
GROUP BY tblInvoice.lngID, tblInvoice.dtTimeStamp, strReference, fltTotalValue
ORDER BY tblInvoice.lngID DESC
LINQ so far:
var invoices = from inv in db.TblInvoices
join invLine in db.TblInvoiceLines on inv.LngID equals invLine.LngInvoiceID
where inv.IntStatus != 0
where inv.LngPersonID == personID
group inv by new {inv.LngID,inv.DtTimeStamp,inv.StrReference,inv.FltTotalValue} into newInv
Part of the problem is that I want to do a
select new Invoice(){
}
and build up my custom Invoice object, but I can't see any of the properties in newInv.
Can anyone advise?

I don't have time for a full answer now, but:
To get at properties of the key, use newInv.Key.StrReference etc
To get at aggregates (e.g. max values) use newInv.Max(x => x.ProjectId) etc
Hopefully that'll be enough to get you going. Basically, newInv will be a group of entries, with an associated key (which is what you grouped by).
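The key/group structure described above is not specific to LINQ. As a rough Python analogue (not C#), here is a sketch with hypothetical invoice-line rows: each group has a composite key whose fields are read by name, much like newInv.Key.StrReference, and aggregates such as a maximum are computed over the group's entries, much like newInv.Max(...). All class and field names are made up for illustration.

```python
from itertools import groupby
from typing import NamedTuple

class InvoiceKey(NamedTuple):
    """Hypothetical composite key, mirroring new {inv.LngID, inv.StrReference, ...}."""
    lng_id: int
    str_reference: str

class InvoiceLineRow(NamedTuple):
    """Hypothetical joined invoice/invoice-line row."""
    lng_id: int
    str_reference: str
    project_id: int

rows = [
    InvoiceLineRow(2, "INV-2", 7),
    InvoiceLineRow(1, "INV-1", 3),
    InvoiceLineRow(1, "INV-1", 5),
]

def key_of(row: InvoiceLineRow) -> InvoiceKey:
    return InvoiceKey(row.lng_id, row.str_reference)

invoices = []
# itertools.groupby only groups adjacent items, so sort by the key first
for key, group in groupby(sorted(rows, key=key_of), key=key_of):
    entries = list(group)
    invoices.append({
        "invoice_id": key.lng_id,                            # like newInv.Key.LngID
        "reference": key.str_reference,                      # like newInv.Key.StrReference
        "project_id": max(e.project_id for e in entries),   # like newInv.Max(...)
    })

# Newest invoice first, like ORDER BY tblInvoice.lngID DESC
invoices.sort(key=lambda inv: inv["invoice_id"], reverse=True)
print(invoices)
```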

Doctrine issue - Different queries, same results but not with Doctrine

I'm having a little issue with Doctrine using Symfony 1.4 (I think it uses Doctrine 1.2). I have 2 queries that produce the same result set when run as raw SQL in the MySQL console. The queries can be generated with this code:
$dates = Doctrine::getTable('Picture')
->createQuery('a')
->select('substr(a.created_at,1,10) as date')
->leftjoin('a.PictureTag pt ON a.id = pt.picture_id')
->leftjoin('pt.Tag t ON t.id = pt.tag_id')
->where('a.created_at <= ?', date('Y-m-d 23:59:59'))
->orderBy('date DESC')
->groupby('date')
->limit(ITEMS_PER_PAGE)
->offset(ITEMS_PER_PAGE * $this->page)
->execute();
If I remove the two joins, the query changes, but the result set is the same.
But when run through Doctrine's execute(), one of them produces only one row.
Does anybody have an idea what's going on here?
PS: The Picture table has id, title, file and created_at (format 'Y-m-d h:i:s'), the Tag table has id and name, and PictureTag is a relationship table with id and the two foreign keys.
PS 2: Here are the two SQL queries produced (the first one without the joins):
SELECT substr(l.created_at, 1, 10) AS l__0 FROM lupa_picture l WHERE (l.created_at <= '2010-03-19 23:59:59') GROUP BY l__0 ORDER BY l__0 DESC LIMIT 4
SELECT substr(l.created_at, 1, 10) AS l__0 FROM lupa_picture l LEFT JOIN lupa_picture_tag l2 ON (l.id = l2.picture_id) LEFT JOIN lupa_tag l3 ON (l3.id = l2.tag_id) WHERE (l.created_at <= '2010-03-19 23:59:59') GROUP BY l__0 ORDER BY l__0 DESC LIMIT 4

I had something similar this week. Doctrine's generated SQL (from the Symfony debug toolbar) worked fine in phpMyAdmin, but failed when running the query as in your question. Try adding the following to your query:
->setHydrationMode(Doctrine::HYDRATE_SCALAR)
and see if it gives you the expected result. If so, the cause is that Doctrine_Collection uses the Picture primary key as the index in the collection. If more than one result has the same index, Doctrine will refuse to add it to the collection, so you end up with only one result. I ended up running the query against a different table rather than the one I wanted, which gave each result a unique primary key, and then the results I wanted appeared.
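The effect described above can be mimicked in a few lines of Python. This is only an analogy, not Doctrine's actual hydrator: result rows are stored in a dict keyed by an identifier, and a row whose key is already present is skipped. If the query selects only the computed date, the hypothetical identifier is missing (None) for every row, so everything collapses to a single entry; a plain list, roughly what scalar hydration gives you, keeps all rows.

```python
from typing import Any, Dict, List, Optional

def hydrate_indexed(rows: List[Dict[str, Any]]) -> Dict[Optional[int], Dict[str, Any]]:
    """Index rows by their 'id'; a row whose key is already present is skipped."""
    collection: Dict[Optional[int], Dict[str, Any]] = {}
    for row in rows:
        key = row.get("id")  # None when the query did not select the primary key
        if key not in collection:
            collection[key] = row
    return collection

# Hypothetical result rows: only the computed date was selected, no primary key
date_only = [{"date": "2010-03-19"}, {"date": "2010-03-18"}, {"date": "2010-03-17"}]
# The same rows with a (hypothetical) picture id selected alongside the date
with_id = [{"id": 9, "date": "2010-03-19"}, {"id": 5, "date": "2010-03-18"},
           {"id": 2, "date": "2010-03-17"}]

print(len(hydrate_indexed(date_only)))  # 1: every row shares the key None
print(len(hydrate_indexed(with_id)))    # 3: unique keys, nothing dropped
print(len(date_only))                   # 3: a plain list keeps all rows
```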

Well, the solution was that, besides substr(), the select needs another column of the table. Using select(substr(), a.created_at) made it work.
