Top N Per Group with Multiple Table Joins

Based on my research, this is a very common problem which generally has a fairly simple solution. My task is to alter several queries from "get all results" to "get the top 3 per group". At first this went well, and I used several recommendations and answers from this site to achieve it (Most Viewed Products). However, I'm running into difficulty with my last one, "Best Selling Products", because of the multiple joins.
Basically, I need to get all products ordered by the highest number of sales per product, with a maximum of 3 products per vendor. Multiple tables are joined to create the original query, and each time I attempt to use variables to generate rankings it produces invalid results. The following should help explain the issue (I've removed unnecessary fields for brevity):
Product Table
productid | vendorid | approved | active | deleted
Vendor Table
vendorid | approved | active | deleted
Order Table
orderid | `status` | deleted
Order Items Table
orderitemid | orderid | productid | price
Now, my original query to get all results is as follows:
SELECT COUNT(oi.price) AS `NumSales`,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY oi.productid
ORDER BY COUNT(oi.price) DESC
LIMIT 100;
Finally (and here's where I'm stumped), I'm trying to alter the above statement so that I receive only the top 3 products (by number sold) per vendor. I'd add what I have so far, but I'm embarrassed to do so and this question is already a wall of text. I've tried variables but keep getting invalid results. Any help would be greatly appreciated.

Even though you specify LIMIT 100, this type of query will require a full scan and table to be built up, then every record inspected and row numbered before finally filtering for the 100 that you want to display.
select
vendorid, productid, NumSales
from
(
select
vendorid, productid, NumSales,
@r := IF(@g=vendorid,@r+1,1) RowNum,
@g := vendorid
from (select @g:=null) initvars
CROSS JOIN
(
SELECT COUNT(oi.price) AS NumSales,
p.productid,
p.vendorid
FROM products p
INNER JOIN vendors v ON (p.vendorid = v.vendorid)
INNER JOIN orders_items oi ON (p.productid = oi.productid)
INNER JOIN orders o ON (oi.orderid = o.orderid)
WHERE (p.Approved = 1 AND p.Active = 1 AND p.Deleted = 0)
AND (v.Approved = 1 AND v.Active = 1 AND v.Deleted = 0)
AND o.`Status` = 'SETTLED'
AND o.Deleted = 0
GROUP BY p.vendorid, p.productid
ORDER BY p.vendorid, NumSales DESC
) T
) U
WHERE RowNum <= 3
ORDER BY NumSales DESC
LIMIT 100;
The approach here is
Group by to get NumSales
Use variables to row number the sales per vendor/product
Filter the numbered dataset to allow for a max of 3 per vendor
Order the remaining by NumSales DESC and return only 100
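For comparison, the same four steps can be written with a window function instead of user variables. MySQL 8.0 and later support ROW_NUMBER() OVER (PARTITION BY ...); the sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or newer) purely as a runnable stand-in, and the `sales` table is a hypothetical replacement for the already-aggregated subquery T (vendorid, productid, NumSales). Treat it as an illustration of the idea, not as the answer's own method.

```python
import sqlite3


def top_n_per_vendor(rows, n=3):
    """Return at most n best-selling products per vendor.

    rows: iterable of (vendorid, productid, num_sales) tuples, standing in
    for the output of the aggregated subquery T (an assumption for this demo).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (vendorid INTEGER, productid INTEGER, NumSales INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT vendorid, productid, NumSales
        FROM (
            SELECT vendorid, productid, NumSales,
                   -- number the products inside each vendor, best seller first
                   ROW_NUMBER() OVER (
                       PARTITION BY vendorid
                       ORDER BY NumSales DESC, productid
                   ) AS RowNum
            FROM sales
        ) AS ranked
        WHERE RowNum <= ?                      -- keep a maximum of n per vendor
        ORDER BY NumSales DESC, productid      -- overall ordering by sales
        LIMIT 100
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [(1, 10, 5), (1, 11, 9), (1, 12, 7), (1, 13, 3), (2, 20, 8), (2, 21, 1)]
    for row in top_n_per_vendor(sample):
        print(row)
```

The extra `productid` in both ORDER BY clauses is only there to make ties deterministic in the demo.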

I like this elegant solution; however, when I run an adapted but similar query on my dev machine, I get a non-deterministic result set. I believe this is due to the way the MySQL optimiser deals with assigning and reading user variables within the same statement.
From the docs:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server.
Just adding this note here in case someone else comes across this weird behaviour.

The answer given by @RichardTheKiwi worked great and got me 99% of the way there! I am using MySQL and was only getting the first row of each group marked with a row number, while the rest of the rows remained NULL. This resulted in the query returning only the top hit for each group rather than the first three rows. To fix this, I had to initialize @r in the initvars subquery. I changed
from (select @g:=null) initvars
to
from (select @g:=null, @r:=null) initvars
You could also initialize @r to 0 and it would work the same. For those less familiar with this type of syntax: the additional section reads through each sorted group, and if a row has the same vendorid as the previous row (tracked with the @g variable), it increments the row number stored in the variable @r. When this process reaches the next group with a new vendorid, the IF condition no longer evaluates as true and the @r variable (and thereby the RowNum) is reset to 1.
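To make that walk-through concrete, here is a small Python sketch of the same bookkeeping. It is not MySQL and does not reproduce MySQL's evaluation-order caveats; the function name and the tuple layout are made up for illustration. The local names `g` and `r` play the roles of the two user variables, and the input is assumed to be already sorted by vendor and then by sales descending, as the inner ORDER BY does.

```python
def number_rows_per_group(sorted_rows):
    """Attach a per-vendor row number to rows sorted by (vendorid, sales DESC).

    sorted_rows: iterable of (vendorid, productid, num_sales) tuples.
    Returns a list of (vendorid, productid, num_sales, row_num) tuples.
    """
    g = None  # vendorid of the previous row (the group tracker)
    r = None  # running row number inside the current vendor
    numbered = []
    for vendorid, productid, num_sales in sorted_rows:
        # Same vendor as the previous row: increment. New vendor: restart at 1.
        r = r + 1 if g == vendorid else 1
        g = vendorid
        numbered.append((vendorid, productid, num_sales, r))
    return numbered


def keep_top(numbered, n=3):
    """Filter step: keep rows whose row number is at most n."""
    return [row for row in numbered if row[3] <= n]


if __name__ == "__main__":
    rows = [(1, 11, 9), (1, 12, 7), (1, 10, 5), (1, 13, 3), (2, 20, 8)]
    for row in keep_top(number_rows_per_group(rows)):
        print(row)
```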

Related

MYSQL: Using math in SELECT with alias

I have an extremely complex SQL query that I am needing help with. Essentially, this query will see how many total assignments a student is assigned (total) and how many they have completed (completed) for the course. I need one final column that would give me the percentage of completed assignments, because I want to run a query to select all users who have completed less than 50% of their assignments.
What am I doing wrong? I am getting an error "Unknown column 'completed' in 'field list'"
Is there a better way to execute this? I am open to changing my query.
Query:
SELECT students.usid AS ID, students.firstName, students.lastName,
(
SELECT COUNT(workID) FROM assignGrades
INNER JOIN students ON students.usid = assignGrades.usid
INNER JOIN assignments ON assignments.assID = assignGrades.assID
WHERE
assignGrades.usid = ID AND
assignments.subID = 4 AND
(
assignGrades.submitted IS NOT NULL OR
(assignGrades.score IS NOT NULL AND CASE WHEN assignments.points > 0 THEN assignGrades.score ELSE 1 END > 0)
)
) AS completed,
(
SELECT COUNT(workID) FROM assignGrades
INNER JOIN students ON students.usid = assignGrades.usid
INNER JOIN assignments ON assignments.assID = assignGrades.assID
WHERE
assignGrades.usid = ID AND
assignments.subID = 4 AND
(NOW() - INTERVAL 5 HOUR) > assignments.assigned
) AS total,
(completed/total)*100 AS percentage
FROM students
INNER JOIN profiles ON profiles.usid = students.usid
INNER JOIN classes ON classes.ucid = profiles.ucid
WHERE classes.utid=2 AND percentage < 50
If I cut the (percentage) part in the SELECT statement, the query runs as expected. See below for results.
Information about the tables involved in this query:
assignGrades: Lists the student's score for each assignment.
assignments: List the assignments for each course.
students: Lists student information
classes: Lists class information
profiles: Links a student to a class

If you only need to filter on the percentage (for example, above 50%) and don't need to display it, you could take a different approach and use a HAVING clause:
SELECT (now) AS completed, (totalassignments) AS total
FROM db
HAVING (completed/total)*100 > 50;
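Another common way around the "Unknown column" error is to compute the counts in a derived table and do the arithmetic and filtering one level up, since MySQL does not let a SELECT-list alias be referenced in the same SELECT list or in WHERE. The sketch below only illustrates that shape: it runs on SQLite through Python's sqlite3 module, and the `progress` table (usid, completed, total) is a hypothetical stand-in for the two correlated subqueries in the question.

```python
import sqlite3


def students_below(rows, threshold=50):
    """Return students whose completion percentage is below the threshold.

    rows: iterable of (usid, completed, total) tuples; a stand-in for the
    per-student counts the original subqueries would produce (an assumption).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE progress (usid INTEGER, completed INTEGER, total INTEGER)")
    conn.executemany("INSERT INTO progress VALUES (?, ?, ?)", rows)
    query = """
        SELECT usid, completed, total, percentage
        FROM (
            -- inner level: define the alias
            SELECT usid, completed, total,
                   completed * 100.0 / total AS percentage
            FROM progress
            WHERE total > 0          -- skip students with nothing assigned
        ) AS t
        WHERE percentage < ?         -- outer level: the alias is now a real column
        ORDER BY usid
    """
    result = conn.execute(query, (threshold,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(students_below([(1, 1, 4), (2, 3, 4), (3, 0, 0)]))
```

Whether to exclude students with zero assigned work, as the `total > 0` filter does here, is a design choice that depends on how those students should be reported.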

MYSQL LEFT JOIN returning all data as NULL

My mysql version is 5.7.32.
I realize this has been asked many times, and I've tried the answers from many posts without success. Thank you in advance.
This is my query at the moment, which returns all from LEFT JOIN as NULL.
SELECT %playlists%.*, tracks.*
FROM %playlists%
LEFT JOIN (
SELECT *
FROM %tracks%
ORDER BY timestamp DESC
LIMIT 1
) AS tracks ON tracks.id_playlist=%playlists%.id
WHERE %playlists%.owner='.$id_owner.'
ORDER BY %playlists%.name ASC
My tables look like this, for example:
%playlists%
name           | id | owner
relaxing music | 1  | 3
%tracks%
id_playlist | timestamp  | tracks
1           | 1234958574 | 200
1           | 1293646887 | 300
I want to include the latest timestamp from %tracks%

I want to include the latest timestamp from %tracks%
In MySQL 5.7, I would recommend filtering the left join with a correlated subquery that brings the latest timestamp for the current playlist:
select p.*, t.timestamp, t.tracks
from playlists p
left join tracks t
on t.id_playlist = p.id
and t.timestamp = (select max(t1.timestamp) from tracks t1 where t1.id_playlist = p.id)
where p.owner = ?
order by p.name
Note that I removed the percent signs around the table names (that's not valid SQL), and that I used table aliases (p and t), which make the query easier to write and read. I also used a placeholder (?) to represent the query parameter; concatenating variables into the query string is bad practice, and prepared statements should be preferred.
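As a quick way to try the correlated-subquery join and the placeholder together, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL 5.7. The two helper function names and the second, empty playlist are invented for the demo; the table and column names follow the question.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory database with the sample rows from the question,
    plus one playlist without tracks (an added row, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE playlists (name TEXT, id INTEGER, owner INTEGER)")
    conn.execute("CREATE TABLE tracks (id_playlist INTEGER, timestamp INTEGER, tracks INTEGER)")
    conn.executemany(
        "INSERT INTO playlists VALUES (?, ?, ?)",
        [("relaxing music", 1, 3), ("empty list", 2, 3)],
    )
    conn.executemany(
        "INSERT INTO tracks VALUES (?, ?, ?)",
        [(1, 1234958574, 200), (1, 1293646887, 300)],
    )
    return conn


def playlists_with_latest_track(conn, owner):
    """Return each playlist of the owner with its most recent tracks row.

    The owner id is bound through the ? placeholder, never concatenated.
    """
    query = """
        SELECT p.name, p.id, p.owner, t.timestamp, t.tracks
        FROM playlists p
        LEFT JOIN tracks t
               ON t.id_playlist = p.id
              AND t.timestamp = (SELECT MAX(t1.timestamp)
                                 FROM tracks t1
                                 WHERE t1.id_playlist = p.id)
        WHERE p.owner = ?
        ORDER BY p.name
    """
    return conn.execute(query, (owner,)).fetchall()


if __name__ == "__main__":
    for row in playlists_with_latest_track(make_sample_db(), 3):
        print(row)
```

Because the latest-timestamp condition sits in the ON clause rather than in WHERE, a playlist with no tracks still comes back, with NULLs in the track columns.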

Performance: Double nested subquery unknown column

If I do COUNT()/AVG() in the first subquery, MySQL processes all rows in the table, so it seems necessary to filter the rows first with another subquery.
For example, if I have 3 rows but only 1 row has the id that should be counted, MySQL processes all 3 rows (according to EXPLAIN) and applies the WHERE clause afterwards.
If I could select this single row in a double-nested subquery and call the count outside it, performance would be a lot better.
The problem is that MySQL does not allow outer values to be used in a second-level subquery.
Simple example of my code:
SELECT
pr.id, pr.catid, ...
(
SELECT COUNT(pra.id)
FROM (
SELECT id
FROM productsrating
WHERE pr.id = productid
) pra
) as ratingcount,
...
FROM
(
SELECT id, ...
FROM products
WHERE active = 1
) pr
-> Unknown column pr.id
I also tried to use COUNT in the main select, but a subquery there is not allowed to return multiple values.
Edit: I have an index on productid.
EDIT2 SOLUTION:
Sorry, everyone: it works fine with the first single subquery; server problems caused the bad behavior.

It seems you want the count of ratings occurring for an active product. Is this correct?
So, why is a simple left join not working? The count of PRA should be based only on those products which are active, so index usage should work here.
I'd need to see sample data / expected results to figure out the overall goal here.
SELECT PR.*, count(PRA.ID)
FROM products PR
LEFT JOIN productsRating PRA
on PR.ID = PRA.ProductID
WHERE PR.Active = 1
GROUP BY PR.*
Substitute all fields needed for PR.*
Maybe this... seems like an odd thing to have to do to get the products rating to be filtered before the average/count is done though.
SELECT PR.*, count(PRA.ID)
FROM products PR
LEFT JOIN (SELECT * FROM productsRating PRI
WHERE EXISTS (SELECT 1
FROM Products P
WHERE active = 1 and PRI.ProductID = P.ID)) PRA
on PR.ID = PRA.ProductID
WHERE PR.Active = 1
GROUP BY PR.*

Try count with distinct
SELECT
pr.id, pr.catid, ...
(
SELECT COUNT(Distinct productsrating.id)
FROM productsrating
WHERE pr.id = productid
) as ratingcount,
...
FROM
(
SELECT id, ...
FROM products
WHERE active = 1
) pr

MySQL Inner join naming error?

http://sqlfiddle.com/#!9/e6effb/1
I'm trying to get a top 10 by revenue per brand for France in December.
There are 2 tables (first table has date, second table has brand and I'm trying to join them)
I get this error "FUNCTION db_9_d870e5.SUM does not exist. Check the 'Function Name Parsing and Resolution' section in the Reference Manual"
Is my use of Inner join there correct?

It's because you had an extra space after SUM. Please change it from
SUM (o1.total_net_revenue) to SUM(o1.total_net_revenue).
See more about it here.
Also, after correcting that, your query still had another error: you were not selecting order_id in your intermediate table i2. I edited it as follows:
SELECT o1.order_id, o1.country, i2.brand,
SUM(o1.total_net_revenue)
FROM orders o1
INNER JOIN (
SELECT i1.brand, SUM(i1.net_revenue) AS total_net_revenue,order_id
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY 1,2,3
ORDER BY 4
LIMIT 10

--EDIT stack Fan is correct that the o2.total_net_revenue exists. My confusion was because the data structure duplicated three columns between the tables, including one that was being looked for.
There were a couple of errors with your SQL statement:
1. You were referencing an invalid column in your outer SELECT's SUM function. I believe you're actually after i2.total_net_revenue.
2. The table structure is terrible: the "important" columns (country, revenue, order_id) are duplicated between the two tables. I would also expect the revenue columns to share the same name if they always hold the same values. In the example, there's no difference between i1.net_revenue and o1.total_net_revenue.
3. In your inner join, you didn't reference i1.order_id, which meant that your "on" clause couldn't execute correctly.
PROTIP:
When you run into an issue like this, take all the complicated bits out of your query and get the base query working correctly first. THEN add your functions.
PROTIP:
In your GROUP BY clause, reference the actual columns, NOT the column numbers. It makes your query more robust.
This is the query I ended up with:
SELECT o1.order_id, o1.country, i2.brand,
SUM(i2.total_net_revenue) AS total_rev
FROM orders o1
INNER JOIN (
SELECT i1.order_id, i1.brand, SUM(i1.net_revenue) AS total_net_revenue
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY o1.order_id, o1.country, i2.brand
ORDER BY total_rev
LIMIT 10

COUNT evaluate to zero if no matching records

Take the following:
SELECT
Count(a.record_id) AS newrecruits
,a.studyrecord_id
FROM
visits AS a
INNER JOIN
(
SELECT
record_id
, MAX(modtime) AS latest
FROM
visits
GROUP BY
record_id
) AS b
ON (a.record_id = b.record_id) AND (a.modtime = b.latest)
WHERE (((a.visit_type_id)=1))
GROUP BY a.studyrecord_id;
I want to amend the COUNT part to display a zero if there are no records since I assume COUNT will evaluate to Null.
I have tried the following but still get no results:
IIF(ISNULL(COUNT(a.record_id)),0,COUNT(a.record_id)) AS newrecruits
Is this an issue because the join is on record_id? I tried changing the INNER to LEFT but also received no results.
Q
How do I get the above to evaluate to zero if there are no records matching the criteria?
Edit:
To give a little detail to the reasoning.
The studies table contains a field called 'original_recruits' based on activity before use of the database.
The visits table tracks new_recruits (count of records for each study).
I combine these in another query (original_recruits + new_recruits). If there have been no new recruits I still need to display the original_recruits, so if there are no records I need it to evaluate to zero instead of null so that the final sum still works.

It seems like you want to count records per study record.
If you need a count of zero when there are no matching records, you need to join to a table named StudyRecords.
Do you have one? Otherwise it makes no sense to ask for rows when there are none to return!
Supposing the StudyRecords table exists, the query should look something like this:
SELECT
Count(a.record_id) AS newrecruits -- a.record_id will be null if there is zero count for a studyrecord, else will contain the id
, sr.Id
FROM
visits AS a
INNER JOIN
(
SELECT
record_id
, MAX(modtime) AS latest
FROM
visits
GROUP BY
record_id
) AS b
ON (a.record_id = b.record_id) AND (a.modtime = b.latest)
LEFT OUTER JOIN studyrecord sr
ON sr.Id = a.studyrecord_id
WHERE a.visit_type_id = 1
GROUP BY sr.Id
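One thing to keep in mind with this kind of query is that COUNT itself never returns NULL; a study is simply missing from the output when no row survives the joins and the WHERE clause. A zero only shows up if the study table sits on the preserved side of the outer join and the visit conditions live in the ON clause. The sketch below shows that arrangement on SQLite via Python's sqlite3 module; the function names and sample rows are invented for the demo, the schema is guessed from the question, and the Access dialect may need different syntax.

```python
import sqlite3


def make_sample_db():
    """Two studies; only study 1 has visits (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE studyrecord (id INTEGER)")
    conn.execute(
        "CREATE TABLE visits (record_id INTEGER, studyrecord_id INTEGER, "
        "visit_type_id INTEGER, modtime INTEGER)"
    )
    conn.executemany("INSERT INTO studyrecord VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?, ?, ?)",
        [(100, 1, 1, 5), (100, 1, 1, 9), (101, 1, 1, 3)],
    )
    return conn


def new_recruits_per_study(conn):
    """Count latest-version visits of type 1 per study, including zeros."""
    query = """
        SELECT sr.id, COUNT(v.record_id) AS newrecruits
        FROM studyrecord sr                  -- preserved side: every study appears
        LEFT JOIN visits v
               ON v.studyrecord_id = sr.id
              AND v.visit_type_id = 1
              -- keep only the latest version of each visit record
              AND v.modtime = (SELECT MAX(v2.modtime)
                               FROM visits v2
                               WHERE v2.record_id = v.record_id)
        GROUP BY sr.id
        ORDER BY sr.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in new_recruits_per_study(make_sample_db()):
        print(row)
```

COUNT(v.record_id) ignores the NULLs produced for studies without matching visits, which is what turns "no rows" into 0.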

I solved the problem by amending the final query where I display the result of combining the original and new recruits to include the IIF there.
SELECT
a.*
, IIF(IsNull([totalrecruits]),consents,totalrecruits)/a.target AS prog
, IIf(IsNull([totalrecruits]),consents,totalrecruits) AS trecruits
FROM
q_latest_studies AS a
LEFT JOIN q_totalrecruitment AS b
ON a.studyrecord_id=b.studyrecord_id
;
