MySQL SUM top n values for several columns and group - mysql

I have a MySQL table containing player points for several categories (p1, p2, etc.) and a player id (pid).
I have a query that computes the SUM of points for each category, exposes each sum under an alias, and groups the rows by player id (pid).
SELECT *,
SUM(p1) as p1,
SUM(p2) as p2,
SUM(p3) as p3,
SUM(p4) as p4,
SUM(p6) as p6,
SUM(p13) as p13,
SUM(p14) as p14,
SUM(p15) as p15,
SUM(p16) as p16,
SUM(p17) as p17,
SUM(p18) as p18,
SUM(p19) as p19,
SUM(p20) as p20,
SUM(p21) as p21
FROM results GROUP BY pid
Further on, I run a while loop and update another table with these alias values.
Now I need to sum only the top 5 or top 12 values (depending on the category) for each group. I don't know where to start. I found similar questions, but none of them addresses putting the value in an alias, so that I don't have to change the downstream code.
Can someone help me and write an example query for at least two categories, so I can understand the principle of doing this right?
Thank you in advance!

Since we need the sum of the top n records per player, we can use something like this:
SELECT pid, sum(p1)
FROM (SELECT p.*,
(@pn := if(@p = pid, @pn + 1,
if(@p := pid, 1, 1)
)
) as seqnum
FROM player p CROSS JOIN
(SELECT @p := 0, @pn := 0) as p1
ORDER BY pid, p1 DESC
) p
WHERE seqnum <= 1
GROUP BY pid;
Here, the seqnum <= 1 condition can be changed to match the number of records needed. E.g. if we want the top 5 records, we write seqnum <= 5.
Please note that this only calculates the top-n sum for one particular field. If we want multiple fields, we need to repeat the query for each of them.
Here is the SQL Fiddle example to play around with.

Building on the answer by @DarshanMehta, you can repeat the subquery once per category. Note that the variable names in each subquery need to be different.
Something like this, assuming you have a table of players:
SELECT players.pid,
suba1.p1sum,
suba2.p2sum
FROM players
LEFT OUTER JOIN
(
SELECT pid, SUM(p1) AS p1sum
FROM (SELECT r.pid,
r.p1,
@p1n := if(@p1 = pid, @p1n + 1, 1) AS seqnum,
@p1 := pid
FROM results r
CROSS JOIN (SELECT @p1 := 0, @p1n := 0) as p1
ORDER BY r.pid, r.p1 DESC
) sub1
WHERE seqnum <= 5
GROUP BY pid
) suba1
ON players.pid = suba1.pid
LEFT OUTER JOIN
(
SELECT pid, SUM(p2) AS p2sum
FROM (SELECT r.pid,
r.p2,
@p2n := if(@p2 = pid, @p2n + 1, 1) AS seqnum,
@p2 := pid
FROM results r
CROSS JOIN (SELECT @p2 := 0, @p2n := 0) as p2
ORDER BY r.pid, r.p2 DESC
) sub1
WHERE seqnum <= 5
GROUP BY pid
) suba2
ON players.pid = suba2.pid
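
On MySQL 8.0 or higher, the same per-category top-n sums can probably be written without user variables. The following is only a sketch: it assumes the results(pid, p1, p2, ...) table from the question, and the limits of 5 for p1 and 12 for p2 are example values, not taken from the original data.

```sql
-- Sketch, assuming MySQL 8.0+ (window functions) and the question's results table.
-- Each category gets its own row number per player, ordered by that category's points.
SELECT pid,
       SUM(CASE WHEN rn_p1 <= 5  THEN p1 END) AS p1,  -- sum of the 5 highest p1 values
       SUM(CASE WHEN rn_p2 <= 12 THEN p2 END) AS p2   -- sum of the 12 highest p2 values
FROM (
    SELECT pid, p1, p2,
           ROW_NUMBER() OVER (PARTITION BY pid ORDER BY p1 DESC) AS rn_p1,
           ROW_NUMBER() OVER (PARTITION BY pid ORDER BY p2 DESC) AS rn_p2
    FROM results
) AS ranked
GROUP BY pid;
```

Because the sums keep the aliases p1 and p2, code that reads those aliases should not need to change. Further categories would follow the same pattern: one more ROW_NUMBER() column and one more conditional SUM.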

You can build a table with all that SUM information, and then query it like this:
SELECT * from newTable ORDER BY p1 DESC LIMIT 5;
You can retrieve whatever you need by changing the column p1 and the LIMIT 5.

Related

Double Aggregate Function Mysql

I want to take the maximum value from a series of returned values, but I can't figure out a simple way to do it. My query returns all rows, so I'm halfway there. I could filter it down with PHP, but I'd like to do it all in SQL. I tried a max subquery, but that still returned all results.
DDL:
create table matrix(
count int(4),
date date,
product int(4)
);
create table products(
id int(4),
section int(4)
);
DML:
select max(magic_count), section, id
from (
select sum(count) as magic_count, p.section, p.id
from matrix as m
join products as p on m.product = p.id
group by m.product
) as faketable
group by id, section
Demo with my current try.
Only ids 1 and 3 should be returned from the sample data because they have the highest cumulative count for each of the sections.
Here's a second SQL fiddle that demonstrates the same issue.
Here you go:
select a.id,
a.section,
a.magic_count
from (
select p.id,
p.section,
magic_count
from (
select m.product, sum(count) as magic_count
from matrix m
group by m.product
) sm
join products p on sm.product = p.id
) a
left join (
select p.id,
p.section,
magic_count
from (
select m.product, sum(count) as magic_count
from matrix m
group by m.product
) sm
join products p on sm.product = p.id
) b on a.section = b.section and a.magic_count < b.magic_count
where b.id is null
See a simplified example (and other methods) in the manual entry for The Rows Holding the Group-wise Maximum of a Certain Column.
See it working live here.
Here is a solution that avoids the self-join on the aggregated result. It performs better than the other answer, which uses a lot of JOINs:
select @rn := 1, @sectionLag := 0;
select id, section, count from (
select id,
case when @sectionLag = section then @rn := @rn + 1 else @rn := 1 end rn,
@sectionLag := section,
section,
count
from (
select id, section, sum(count) count
from matrix m
join products p on m.product = p.id
group by id, section
) a order by section, count desc
) a where rn = 1
The variables at the beginning imitate window functions (LAG and ROW_NUMBER), which are available in MySQL 8.0 or higher (if you are on such a version, let me know and I will also give you a solution with window functions).
DEMO
Another demo, where you can compare the performance of my query and the other one. It contains ~20K rows, and my query tends to be almost 2 times faster.
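
For readers on MySQL 8.0 or higher, a window-function variant could look roughly like the sketch below. It assumes the matrix and products tables from the question. RANK() is used here rather than ROW_NUMBER(), which is a choice made for this sketch: it returns every product that ties for the highest total in its section.

```sql
-- Sketch, assuming MySQL 8.0+ and the matrix/products tables defined in the question.
SELECT id, section, magic_count
FROM (
    SELECT t.id,
           t.section,
           t.magic_count,
           -- rank products inside each section by their summed count, highest first
           RANK() OVER (PARTITION BY t.section ORDER BY t.magic_count DESC) AS rnk
    FROM (
        SELECT p.id, p.section, SUM(m.`count`) AS magic_count
        FROM matrix AS m
        JOIN products AS p ON m.product = p.id
        GROUP BY p.id, p.section
    ) AS t
) AS ranked
WHERE rnk = 1;
```

Replacing RANK() with ROW_NUMBER() would return exactly one row per section, picking arbitrarily among ties.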

insert / update records from one table to another table, no clear join

I have a list of SKUs in one table that I need to assign to product ids in another table, the same way one would in Excel: by copying a column of SKUs and pasting it next to a column of product ids, starting at the first row. I'd like to do this with an update query or some other way.
table1: tmp_pid
fields: pid, sku
This is where I have a random number of pid records. The sku field is empty. I'm trying to fill it with data from the next table.
table2: tmp_sku
fields: sku, used
This is where I keep a very long list of unique sku's and whether they have been used.
I tried this query but it does not work ([Err] 1054 - Unknown column 'tmp_sku.sku' in 'IN/ALL/ANY subquery')
UPDATE tmp_pid
SET tmp_pid.sku = tmp_sku.sku
WHERE tmp_sku.sku IN (SELECT sku FROM tmp_sku WHERE used = NO )
Table1 can have 20 or 1000 pid records, Table2 has 10000 unused sku's. I only need to copy the needed sku's next to the 20-1000 pid records in Table1. I know there is no connecting key between the two, but I am limited to this structure.
If I understand correctly, you want to get this result:
select p.*, s.sku
from (select p.*, (@rnp := @rnp + 1) as n
from tmp_pid p cross join (select @rnp := 0) params
order by pid
) p join
(select s.*, (@rns := @rns + 1) as n
from tmp_sku s cross join (select @rns := 0) params
where used = 'NO'
order by sku
) s
on p.n = s.n;
If so, you can adapt this to an update:
update tmp_pid p join
(select p.*, (@rnp := @rnp + 1) as n
from tmp_pid p cross join (select @rnp := 0) params
order by pid
) pp
on p.pid = pp.pid join
(select s.*, (@rns := @rns + 1) as n
from tmp_sku s cross join (select @rns := 0) params
where used = 'NO'
order by sku
) s
on pp.n = s.n
set p.sku = s.sku;
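
On MySQL 8.0 or higher, the row numbering could instead be done with ROW_NUMBER(), which avoids relying on the evaluation order of user variables. This is a sketch under assumptions: tmp_pid.pid is unique, and unused rows in tmp_sku are marked with used = 'NO', as in the select above.

```sql
-- Sketch, assuming MySQL 8.0+, unique tmp_pid.pid, and used = 'NO' marking free SKUs.
UPDATE tmp_pid AS p
JOIN (
    -- number the product ids 1..N in pid order
    SELECT pid, ROW_NUMBER() OVER (ORDER BY pid) AS n
    FROM tmp_pid
) AS pp ON pp.pid = p.pid
JOIN (
    -- number the unused SKUs 1..M in sku order
    SELECT sku, ROW_NUMBER() OVER (ORDER BY sku) AS n
    FROM tmp_sku
    WHERE used = 'NO'
) AS s ON s.n = pp.n
SET p.sku = s.sku;
```

The n-th pid receives the n-th unused SKU. Marking those SKUs as used afterwards would need a separate update, which is not shown here.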

How to limit results from a SQL subquery or join

Let's imagine I have 2 tables in MySQL, one called Vehicle and the other called Passenger.
If I want a complete list of all Vehicles and their passengers then I can do something like this:
SELECT *
FROM Vehicle v
LEFT
JOIN Passenger p
ON p.VehicleID = v.VehicleID
LIMIT 0,100
The problem is this: let's imagine that my vehicles are buses, and the first has 50 passengers, the 2nd has 40 and the 3rd has 30. The LIMIT 100 on the above query would give me only a partial list of passengers on the 3rd bus.
Is there a way to create such a query that won't split the results from the joined table?
Or alternatively can you apply LIMITS separately to the different tables? So I could say I want a limit of 10 vehicles and a limit of 50 passengers per vehicle?
Logically something like this:
SELECT * FROM Vehicle (LEFT JOIN Passenger ON Passenger.VehicleID = Vehicle.VehicleID LIMIT 0,50) LIMIT 0, 10
I was wondering if this could be achieved using some kind of subquery? Maybe something like:
SELECT *, (SELECT * FROM Passenger WHERE Passenger.VehicleID = Vehicle.VehicleID LIMIT 0,50) FROM Vehicle LIMIT 0, 10
But this doesn't work (The subquery is only allowed to return a single row).
Thanks in advance.
In MySQL, the easiest way to do what you want is using variables to enumerate the rows:
SELECT *
FROM (SELECT v.*, (@rnv := @rnv + 1) as seqnum_v
FROM Vehicle v CROSS JOIN
(SELECT @rnv := 0) params
) v LEFT JOIN
(SELECT p.*,
(@rnp := if(@v = VehicleId, @rnp + 1,
if(@v := VehicleId, 1, 1)
)
) as seqnum_p
FROM Passenger p CROSS JOIN
(SELECT @v := -1, @rnp := 0) params
) p
ON p.VehicleID = v.VehicleID
WHERE seqnum_v <= 10 and seqnum_p <= 50;
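
On MySQL 8.0 or higher, the same enumeration can likely be expressed with ROW_NUMBER(). The sketch below assumes a PassengerID column on Passenger, which is a hypothetical name used only to give the passengers a stable order. The passenger limit sits in the ON clause, so vehicles with no passengers are still returned by the LEFT JOIN.

```sql
-- Sketch, assuming MySQL 8.0+; PassengerID is a hypothetical ordering column.
SELECT v.*, p.*
FROM (
    -- number the vehicles so the outer query can keep the first 10
    SELECT v.*, ROW_NUMBER() OVER (ORDER BY v.VehicleID) AS seqnum_v
    FROM Vehicle v
) v
LEFT JOIN (
    -- number passengers separately inside each vehicle
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY p.VehicleID ORDER BY p.PassengerID) AS seqnum_p
    FROM Passenger p
) p
  ON p.VehicleID = v.VehicleID
 AND p.seqnum_p <= 50
WHERE v.seqnum_v <= 10;
```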

@rownum not incrementing as expected (selecting n per group)

What I'm expecting is for tmp.rank to increment from 1 to 10 for each userid before moving on to the next userid. However, it stays at 1 for every record, so the result is not limited to 10 items per userid.
Any ideas what I'm doing wrong? It is most probably something simple and obvious, or more likely I'm using SQL in a way it's not intended.
SELECT DISTINCT
tmp.title,
tmp.content,
tmp.postid,
tmp.userid,
tmp.screenname,
tmp.email
FROM
(
SELECT
qp.title,
qp.content,
qp.postid,
ut.userid,
ut.screenname,
ut.email,
qp.created,
@rownum := IF( @prev = ut.userid, @rownum+1, 1 ) AS rank,
@prev := ut.userid
FROM
user_table AS ut JOIN (SELECT @rownum := NULL, @prev := 0) AS r ,
qa_posts AS qp,
qa_categories AS qc,
expatsblog_country AS cc
WHERE
LOWER(ut.country_of_expat) = LOWER(qc.title)
AND ut.setting_notifications IN (3)
AND ut.valid=1
AND ut.confirm_email = 1
AND qc.categoryid = qp.categoryid
AND qp.type='Q'
AND DATE(qp.created)>=DATE_SUB(NOW(), INTERVAL 24 HOUR)
ORDER BY ut.userid,qp.created ASC
) AS tmp
WHERE tmp.rank < 10
ORDER BY tmp.userid, tmp.created ASC
I've done many queries in the past, especially responding to posts out here regarding MySQL and @ variables. One of the problems I've encountered is the order in which rows are returned when the variables are applied to them.
Your original query DID have an ORDER BY in the inner-most query, but I've encountered times where the @variable assignment does not truly respect it. The counter resets because the engine moves on to some other (in this case) user, then later encounters more records for the first user and resets the counter back to one, even though those records occur later on.
Next, you are applying DISTINCT to the end. I would try pulling DISTINCT to the inner-most so you are not getting 10 records for some key, same user and ending up with only 2 records returned.
That said, I would adjust the query to what I have below. The inner-most just grabs DISTINCT on the columns you want, then apply the #variable assignments.
SELECT DISTINCT
tmp.title,
tmp.content,
tmp.postid,
tmp.userid,
tmp.screenname,
tmp.email,
@rownum := IF( @prev = tmp.userid, @rownum+1, 1 ) AS rank,
@prev := tmp.userid
FROM
( SELECT DISTINCT
qp.title,
qp.content,
qp.postid,
ut.userid,
ut.screenname,
ut.email
FROM
user_table AS ut
JOIN qa_categories AS qc
ON LOWER( ut.country_of_expat ) = LOWER( qc.title )
JOIN qa_posts AS qp
ON qc.categoryid = qp.categoryid
AND qp.type='Q'
AND DATE(qp.created)>=DATE_SUB(NOW(), INTERVAL 24 HOUR)
WHERE
ut.setting_notifications = 3
AND ut.valid = 1
AND ut.confirm_email = 1
ORDER BY
ut.userid,
qp.created ASC ) AS tmp,
( SELECT @rownum := NULL,
@prev := 0) AS r
HAVING
rank < 10
ORDER BY
tmp.userid
I did not see any reference to "cc" being joined anywhere, which would have caused a Cartesian result giving a record for EACH entry in expatsblog_country, so I removed it. If it IS needed, put it where applicable and add a JOIN condition too.
( removed expatsblog_country AS cc )
Also, instead of a WHERE clause, I changed to a HAVING clause, so that all returned records are CONSIDERED for the final result set. This ensures that @rownum keeps incrementing when it encounters a POSSIBLE entry, while HAVING throws out all rows whose rank is 10 or greater.
Finally, since the inner table was already pre-ordered by user and created date, you should not need the explicit re-ordering AGAIN in the outer... maybe just the UserID as I have it.
I am going to take a guess here at what the problem is.
You are combining two types of join syntax and that could be causing your issue.
You are using both a JOIN and then commas between your tables. You are using a JOIN between the user_table and the user variables and then a comma between the remaining tables.
FROM user_table AS ut JOIN (SELECT @rownum := NULL, @prev := 0) AS r ,
qa_posts AS qp,
qa_categories AS qc,
expatsblog_country AS cc
Your WHERE clause includes the join columns for most tables, but it looks like you have no join condition for the table expatsblog_country. When you are joining tables, you should use one type of syntax and not mix them.
I would suggest something similar to this. I did not see any join condition for the expatsblog_country table, so I used a CROSS JOIN to join that table to a subquery of the others. If you have a column that joins expatsblog_country to any of the others, then move that table into the subquery:
SELECT DISTINCT tmp.title,
tmp.content,
tmp.postid,
tmp.userid,
tmp.screenname,
tmp.email
FROM
(
SELECT src.title,
src.content,
src.postid,
src.userid,
src.screenname,
src.email,
src.created,
@rownum := IF( @prev = src.userid, @rownum+1, 1 ) AS rank,
@prev := src.userid
FROM
(
select ut.userid,
ut.screenname,
ut.email,
qp.title,
qp.content,
qp.postid,
qp.created
from user_table AS ut
join qa_categories AS qc
on LOWER(ut.country_of_expat) = LOWER(qc.title)
join qa_posts AS qp
on qc.categoryid = qp.categoryid
where ut.setting_notifications IN (3)
and ut.valid=1
and ut.confirm_email = 1
and qp.type='Q'
and DATE(qp.created)>=DATE_SUB(NOW(), INTERVAL 24 HOUR)
) src
CROSS JOIN
(
SELECT @rownum := 0, @prev := 0
) AS r
CROSS JOIN expatsblog_country AS cc
ORDER BY src.userid, src.created ASC
) AS tmp
WHERE tmp.rank < 10
ORDER BY tmp.userid, tmp.created ASC
Try changing
( SELECT @rownum := NULL,
@prev := 0) AS r
so that you initialize @rownum to 0, and @prev to NULL instead.
Figured out what this was:
expats_country was not supposed to be part of the query,
and more importantly:
the "DATE()" around qp.created was messing this up, along with the actual "last 24 hours" part of the query. I removed it and everything was fine.
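
For comparison, on MySQL 8.0 or higher the per-user limit could be sketched with ROW_NUMBER() instead of user variables. The tables and filters are taken from the question, expatsblog_country is left out, and DATE() is dropped as described above. The alias rn and the cutoff rn <= 10 are choices made for this sketch: RANK is a reserved word from MySQL 8.0.2 onward, and the question asks for ranks 1 to 10, whereas rank < 10 keeps only nine rows.

```sql
-- Sketch, assuming MySQL 8.0+ and the tables named in the question.
SELECT title, content, postid, userid, screenname, email
FROM (
    SELECT qp.title, qp.content, qp.postid,
           ut.userid, ut.screenname, ut.email, qp.created,
           -- restart the numbering for every user, oldest post first
           ROW_NUMBER() OVER (PARTITION BY ut.userid ORDER BY qp.created) AS rn
    FROM user_table AS ut
    JOIN qa_categories AS qc ON LOWER(ut.country_of_expat) = LOWER(qc.title)
    JOIN qa_posts AS qp ON qc.categoryid = qp.categoryid
    WHERE ut.setting_notifications = 3
      AND ut.valid = 1
      AND ut.confirm_email = 1
      AND qp.type = 'Q'
      AND qp.created >= NOW() - INTERVAL 24 HOUR
) AS ranked
WHERE rn <= 10
ORDER BY userid, created;
```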

Auto-increment with Group BY

I have two tables as follows:
Contract
|
Contractuser
My job was to fetch latest invoice date for each contract number from Contractuser table and display results. The resultant table was as follows:
Result Table
Now I want to add an auto-increment column as the first column in my result set.
I used the following query for it:
SELECT @i:=@i+1 AS Sno,a.ContractNo,a.SoftwareName,a.CompanyName,b.InvoiceNo,b.InvoiceDate,
b.InvAmount,b.InvoicePF,max(b.InvoicePT) AS InvoicePeriodTo,b.InvoiceRD,b.ISD
FROM contract as a,contractuser as b,(SELECT @i:=0) AS i
WHERE a.ContractNo=b.ContractNo
GROUP BY b.ContractNo
ORDER BY a.SoftwareName ASC;
But it seems that the auto-increment is evaluated before the GROUP BY is applied, so the serial numbers are displayed in a non-contiguous manner.
GROUP BY and variables don't necessarily work as expected. Just use a subquery:
SELECT (@i := @i + 1) AS Sno, c.*
FROM (SELECT c.ContractNo, c.SoftwareName, c.CompanyName, cu.InvoiceNo, cu.InvoiceDate,
cu.InvAmount, cu.InvoicePF, max(cu.InvoicePT) AS InvoicePeriodTo, cu.InvoiceRD, cu.ISD
FROM contract c JOIN
contractuser cu
ON c.ContractNo = cu.ContractNo
GROUP BY cu.ContractNo
ORDER BY c.SoftwareName ASC
) c CROSS JOIN
(SELECT @i := 0) params;
Notes:
I also fixed the JOIN syntax. Never use commas in the FROM clause.
I also added reasonable table aliases -- abbreviations for the tables. a and b don't mean anything, so they make the query harder to follow.
I left the GROUP BY with only one key. It should really have all the unaggregated keys but this is allowed under some circumstances.
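
On MySQL 8.0 or higher, the serial number could also come from ROW_NUMBER() applied to the grouped result. The sketch below uses the contract and contractuser tables from the question but selects only a reduced column list, so that every non-aggregated column appears in the GROUP BY. The ContractNo tiebreaker in the ordering is an addition made for this sketch.

```sql
-- Sketch, assuming MySQL 8.0+; column list trimmed to grouped and aggregated columns.
SELECT ROW_NUMBER() OVER (ORDER BY t.SoftwareName, t.ContractNo) AS Sno,
       t.*
FROM (
    SELECT c.ContractNo, c.SoftwareName, c.CompanyName,
           MAX(cu.InvoicePT) AS InvoicePeriodTo   -- latest invoice period per contract
    FROM contract c
    JOIN contractuser cu ON c.ContractNo = cu.ContractNo
    GROUP BY c.ContractNo, c.SoftwareName, c.CompanyName
) AS t
ORDER BY t.SoftwareName, t.ContractNo;
```

The numbering happens after the grouping, so Sno runs contiguously from 1 in the displayed order.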
SELECT @row_no := IF(@prev_val = lpsm.lit_no, @row_no + 1, 1) AS row_num,@prev_val := get_pad_value(lpsm.lit_no,26) LAWSUIT_NO,lpsm.cust_rm_no
FROM lit_person_sue_map lpsm,(SELECT @row_no := 0) x,(SELECT @prev_val := '') y
ORDER BY lpsm.lit_no ASC;
This returns a sequence number that restarts for each lit_no group.