Thank you, Stack Overflow community! I have learned SO much from you over the years and I've recently created an account. I hope the answer to this question isn't already posted somewhere obvious, but I am going to go cross-eyed if I read another post. Here's my problem:
I recently used a nested SELECT to get the highest score for each of my students from a table. I did so by a little trick another post taught me. I can't find the exact post I learned it from, but here is a snippet that's essentially the same trick. I imagine, if you are well versed in sql, it's nothing new to you:
SELECT id, authorId, answer, votes
FROM ( SELECT id, authorId, answer, votes
FROM answers
ORDER BY votes DESC) AS h
GROUP BY authorId
The ORDER BY ____ DESC in the subquery sorts the rows so that the GROUP BY ends up keeping only the highest value for each group...if I understand correctly. So, that was great and I tailored it to my needs. The only problem is that I'd now like to add one more feature to it, and I'm racking my brain over it. I'm hoping some generous person will just straighten me out. I want to get a complete list of students from my "rosters" table, and if there is no score for a given student in my "holder" table, I'd like it to display a "0". Here is what I have, and I don't know exactly how to tweak it to do just that:
SELECT *
FROM (
SELECT
holder.id,
#IFNULL(holder.score, 0) AS score,
holder.score AS score,
holder.total,
holder.student_id AS stu_id,
holder.date AS date,
users.firstname AS first,
users.lastname AS last,
users.stu_number AS stuno,
assignments.name AS test,
rosters.p_id,
preps.period AS period,
preps.user
FROM holder
JOIN rosters
ON rosters.stu_id = holder.student_id
JOIN users
ON users.id = holder.student_id
JOIN assignments
ON assignments.id = holder.t_id
JOIN preps
ON preps.id = rosters.p_id
WHERE holder.visible = 0
AND preps.user = 1
AND assignments.user = 1
AND holder.t_id = 1
AND preps.period = 2
ORDER BY score DESC
) x
GROUP BY stuno
ORDER BY last
You can see that line I commented out is one of my feeble attempts to get it to display a "0" if NULL, but it's not working. I get a complete list, but if the score isn't found for a student, that student isn't showing up in my list. Anyone have a solution/idea for me to try?
Am I overusing JOINs and making my life harder than it needs to be? I'm mostly self-taught, so I know I have some holes in the fundamentals. It hasn't stopped me from creating some crazy cool projects though...but every now and then I'm sure I'm causing myself some unnecessary grief.
///////////////////////////////////////////////
Here is what I have done with the answer below, so that it grabs info from my tables:
SELECT au.stu_id,
COALESCE(t.id, 0) as id,
COALESCE(t.score , 0) as score
FROM rosters au
LEFT JOIN (
SELECT *
FROM (
SELECT a.*,
@rownum := if(@prev_value = student_id,
@rownum + 1,
1) rn,
@prev_value := student_id as prev
FROM holder a,
(SELECT @rownum := 0, @prev_value := '') r
ORDER BY student_id, score DESC
) T
WHERE T.rn = 1) T
ON au.stu_id = T.student_id
So, this is working great, except it doesn't show students who don't have scores for a given test. If their score isn't found in the "holder" table, I'd like it to show up as a "0".
/////////////////
Wait a minute! I may have misspoken...I think it is working correctly. I'll need to tweak a few things and get back to you. By the way, thanks SO much for taking the time to help me!
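Your first approach only works because of a MySQL quirk: it lets you select non-aggregated columns alongside GROUP BY, and which row's values you get back is not guaranteed.
The right approach should be: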
SQL Fiddle Demo
SELECT a.id, a.authorId, a.answer, a.votes
FROM ( SELECT authorId,
MAX(votes) as votes
FROM answers
GROUP BY authorId ) AS h
JOIN answers a
ON a.authorId = h.authorId
AND a.votes = h.votes;
OUTPUT
| id | authorId | answer | votes |
|----|----------|--------|-------|
| 2 | a | x2 | 21 |
| 4 | b | x1 | 23 | ==>
| 5 | b | x2 | 23 | ==> duplicates max value are possible
But this has an issue if several answers have the same score. You need to include some logic to decide which one to show.
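One possible tie-break rule, sketched here under the assumption that id is unique in answers and that the lowest id should win among equally voted answers (neither is stated in the question):

```sql
-- Sketch only: keep exactly one answer per author.
-- Assumption: answers.id is unique; ties on votes go to the lowest id.
SELECT a.id, a.authorId, a.answer, a.votes
FROM answers a
WHERE a.id = (
    SELECT a2.id
    FROM answers a2
    WHERE a2.authorId = a.authorId
    ORDER BY a2.votes DESC, a2.id ASC   -- highest votes first, then lowest id
    LIMIT 1
);
```

Any other deterministic ordering (for example, the most recent answer) can be substituted in the inner ORDER BY.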
You can also use variables to get the highest score.
SELECT *
FROM (
SELECT a.*,
@rownum := if(@prev_value = authorId,
@rownum + 1,
1) rn,
@prev_value := authorId as prev
FROM answers a,
(SELECT @rownum := 0, @prev_value := '') r
ORDER BY authorId, votes DESC
) T
WHERE T.rn = 1;
OUTPUT
| id | authorId | answer | votes | rn | prev |
|----|----------|--------|-------|----|------|
| 4 | b | x1 | 23 | 1 | b | => only one is shown, but which one is
| 2 | a | x2 | 21 | 1 | a | arbitrary unless you specify some rule.
Now for your question you also need to use LEFT JOIN instead of JOIN to get the students without scores.
Something like this
SELECT au.authorId,
COALESCE(t.id, 0) as id,
COALESCE(t.answer , 0) as answer ,
COALESCE(t.votes , 0) as votes
FROM authors au
LEFT JOIN (
SELECT *
FROM (
SELECT a.*,
@rownum := if(@prev_value = authorId,
@rownum + 1,
1) rn,
@prev_value := authorId as prev
FROM answers a,
(SELECT @rownum := 0, @prev_value := '') r
ORDER BY authorId, votes DESC
) T
WHERE T.rn = 1) T
ON au.authorId = T.authorId
OUTPUT
| authorId | id | answer | votes |
|----------|----|--------|-------|
| a | 2 | x2 | 21 |
| b | 4 | x1 | 23 |
| c | 0 | 0 | 0 |
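On MySQL 8.0 or later, the same result can be sketched with a window function instead of user variables. This is only a sketch: it assumes the authors and answers tables from the example above, and it breaks ties on id, which is an assumption rather than something the question specifies.

```sql
-- Sketch only, MySQL 8.0+: top answer per author, 0 for authors without answers.
-- Assumption: ties on votes are broken by the lowest id.
SELECT au.authorId,
       COALESCE(r.id, 0)     AS id,
       COALESCE(r.answer, 0) AS answer,
       COALESCE(r.votes, 0)  AS votes
FROM authors au
LEFT JOIN (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY a.authorId
                              ORDER BY a.votes DESC, a.id) AS rn
    FROM answers a
) r
  ON r.authorId = au.authorId
 AND r.rn = 1;   -- filter in the ON clause so unmatched authors are kept
```

Keeping the rn = 1 condition in the ON clause rather than in a WHERE clause is what preserves the authors who have no rows in answers.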
Related
Simple table userpoints:
userid | points
1 | 456
2 | 3
3 | 1778
... | ...
I used this function for years in MySQL 5 to receive the userrank:
SELECT userid, userrank FROM
(SELECT @row_number:=@row_number+1 AS userrank, userid
FROM `userpoints`, (SELECT @row_number := 0) r
ORDER BY points DESC) t
WHERE userid = 123
And it returned the userrank for userid 123, e.g. 3456.
With MySQL 8 I only get 1 as value for userrank with each userid I try.
What is the problem and how to fix this?
I tried the inner SELECT alone, and this gives me the list of all userids with the correct userranks.
In MySQL 8, setting user variables as side-effects in expressions is deprecated. You should use window functions instead.
SELECT t.userid, t.userrank
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY points DESC) AS userrank, userid
FROM `userpoints`
) t
WHERE t.userid = 123;
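If several users can have the same number of points, the choice of window function matters. The sketch below shows both options side by side; the column aliases row_pos and tied_rank are made up for illustration, and breaking ties on userid is an assumption.

```sql
-- Sketch only, MySQL 8.0+.
-- row_pos:   unique position; ties on points are broken by userid (assumption).
-- tied_rank: users with equal points share the same rank.
SELECT t.userid, t.row_pos, t.tied_rank
FROM (
    SELECT userid,
           ROW_NUMBER() OVER (ORDER BY points DESC, userid) AS row_pos,
           RANK()       OVER (ORDER BY points DESC)         AS tied_rank
    FROM `userpoints`
) t
WHERE t.userid = 123;
```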
I have two tables: game and tasks
Game looks like this:
| step | manualTaskCounter | autoTaskCounter | (and other)
----------------------------------------------------------
| 1 | 3 | 1 | ...
----------------------------------------------------------
Tasks looks like this:
| id | taskType | taskContent |
-------------------------------
| 1 | M | abc |
| 2 | M | cde |
| 3 | A | efg |
| 4 | M | jpq |
Since tasks holds both manual (taskType M) and automatic (taskType A) tasks, I want to select a task by its position within its type. My API holds two variables: mTaskCounter and aTaskCounter. For example, if mTaskCounter = 3, I want to select the 3rd row of type M from tasks. Since that is in fact the row with id = 4, I cannot use id in the WHERE clause.
What I already achieved is:
SELECT
id,
taskType,
taskContent,
(@row:=@row + 1) as rowNumber,
g.manualTaskCounter as mTaskCounter
FROM
tasks t,
(SELECT @ROW:=0) AS r,
(SELECT manualTaskCounter FROM game) AS g
WHERE
g.manualTaskCounter = rowNumber
This says "unknown column 'rowNumber' in where clause".
I also tried to use LEFT JOIN:
SELECT
id,
taskType,
taskContent,
(@row:=@row + 1) as rowNumber,
g.manualTaskCounter as mTaskCounter
FROM
tasks t,
(SELECT @ROW:=0) AS r
LEFT JOIN
`game` g ON g.manualTaskCounter = rowNumber
Same result. It's been a while since I used MySQL every day, and I don't know how to fix it. I'm also considering splitting tasks into two tables, manualTasks and autoTasks, which would solve the problem with a simple select taskContent from autoTasks a LEFT JOIN game ON a.id = game.autoTaskCounter
To reach your goal, you first need to build derived tables for the manual and the automatic tasks. The following queries build those tables and add a row number to each row:
Table With Manual Tasks
SELECT
t.id,
t.taskType,
t.taskContent,
(@row_num := @row_num + 1) AS rowNum
FROM
tasks AS t
CROSS JOIN
(SELECT @row_num := 0) AS r
WHERE
taskType = 'M'
Table With Automatic Tasks
SELECT
t.id,
t.taskType,
t.taskContent,
(@row_num := @row_num + 1) AS rowNum
FROM
tasks AS t
CROSS JOIN
(SELECT @row_num := 0) AS r
WHERE
taskType = 'A'
Now, all you need to do is join those derived tables with the game table on the adequate columns:
Select manual task number X using the manualTaskCounter field
SELECT
mTasks.*
FROM
game AS g
INNER JOIN
( SELECT
t.id,
t.taskType,
t.taskContent,
(@row_num := @row_num + 1) AS rowNum
FROM
tasks AS t
CROSS JOIN
(SELECT @row_num := 0) AS r
WHERE
taskType = 'M' ) AS mTasks ON mTasks.rowNum = g.manualTaskCounter
Select automatic task number X using the autoTaskCounter field
SELECT
aTasks.*
FROM
game AS g
INNER JOIN
( SELECT
t.id,
t.taskType,
t.taskContent,
(@row_num := @row_num + 1) AS rowNum
FROM
tasks AS t
CROSS JOIN
(SELECT @row_num := 0) AS r
WHERE
taskType = 'A' ) AS aTasks ON aTasks.rowNum = g.autoTaskCounter
Check the next online example:
DB Fiddle Example
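On MySQL 8.0 or later, the row numbering can also be sketched with ROW_NUMBER() instead of a user variable. The sketch assumes the game and tasks tables from the question, and it assumes that "the 3rd manual task" means the third one in id order, which the question implies but does not state.

```sql
-- Sketch only, MySQL 8.0+: number tasks per type, then pick by counter.
-- Assumption: position within a type is defined by ascending id.
SELECT m.id, m.taskType, m.taskContent
FROM game AS g
INNER JOIN (
    SELECT t.id, t.taskType, t.taskContent,
           ROW_NUMBER() OVER (PARTITION BY t.taskType
                              ORDER BY t.id) AS rowNum
    FROM tasks AS t
) AS m
   ON m.taskType = 'M'
  AND m.rowNum = g.manualTaskCounter;
```

With the sample data (manualTaskCounter = 3), this selects the row with id = 4. For automatic tasks, swap 'M' for 'A' and manualTaskCounter for autoTaskCounter.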
I have 3 tables:
matchdays:
matchday_id | season_id | userid | points | matchday
----------------------------------------------------
1 | 1 | 1 | 33 | 1
2 | 1 | 2 | 45 | 1
etc
players
userid | username
-----------------
1 | user1
2 | user2
etc.
seasons
seasons_id | title | userid
----------------------------
1 | 2011 | 3
2 | 2012 | 10
3 | 2013 | 5
My query:
SELECT s.title, p.username, SUM(points) FROM matchdays m
INNER JOIN players p ON p.userid = m.userid
INNER JOIN seasons s ON m.userid = s.userid
group by s.season_id
This results in (example!):
title | username | SUM(points)
------------------------------
2011 | user3 | 3744
2012 | user10 | 3457
2013 | user5 | 3888
What it should look like is a table with the winner (max points) of every season. Right now, the title and username is correct, but the sum of the points is way too high. I couldn't figure out what sum is calculated. Ideally, the sum is the addition of every matchday of a season for every user.
Your main issue is that you group by seasons only. Thus your SUM is running on all points over a season, regardless of the player.
The whole approach is wrong anyway. The "flaw" with userid in the season table is your biggest issue, and you seem to know it.
I will explain how to calculate your rankings in the database once and for all, so that they are at your disposal at all times. That will save you a lot of headaches, and some CPU and loading time as well.
Start by creating a new table "Rankings":
CREATE table rankings (season_id INT, userid INT, points INT, rank INT)
If you have a lot of players, index all columns but points.
Then, populate the table for each season:
This is a one-shot operation to run each time a season has ended.
So for the time being, you will have to run it once for each past season.
The key here is to compute the rank of each player for the season, which will be very handy later. Because MySQL versions before 8.0 don't have a window function for that, we have to use an old trick: incrementing a counter.
Let me break it down.
This will compute the points of a season, and provide the ranking for that season:
SELECT season_id, userid, SUM(points) as points
FROM matchdays
WHERE season_id = 1
GROUP BY season_id, userid
ORDER BY points DESC
Now we adapt this query to add a rank column:
SELECT
season_id, userid, points,
@curRank := @curRank + 1 AS rank
FROM
(
SELECT season_id, userid, SUM(points) as points
FROM matchdays
WHERE season_id = 1
GROUP BY season_id, userid
) T,
(
SELECT @curRank := 0
) R
ORDER BY T.points DESC
That's it.
Now we can INSERT the results of this computation into our rankings table, to store them once and for all:
INSERT INTO rankings
SELECT
season_id, userid, points,
@curRank := @curRank + 1 AS rank
FROM
(
SELECT season_id, userid, SUM(points) as points
FROM matchdays
WHERE season_id = 1
GROUP BY season_id, userid
) T,
(
SELECT @curRank := 0
) R
ORDER BY T.points DESC
Change the season_id = 1 and repeat for each season.
Save this query somewhere, and in the future, run it once each time a season has ended.
Now you have a proper database-computed ranking and a nice ranking table that you can query whenever you want.
You want the winner for each season? It's as simple as this:
SELECT S.title, P.username, R.points
FROM rankings R
INNER JOIN seasons S ON R.season_id=S.season_id
INNER JOIN players P ON R.userid=P.userid
WHERE R.rank = 1
You will discover over time that you can do a lot of different things very simply with your rankings table.
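On MySQL 8.0 or later, a window function can rank every season in a single query, without a counter variable. The following is a sketch under assumptions: it uses the matchdays, players and seasons tables as described in the question, joins seasons on season_id as the query above does, and calls the rank column season_rank because RANK is a reserved word from MySQL 8.0.2 on.

```sql
-- Sketch only, MySQL 8.0+: winner(s) of every season in one pass.
-- Assumption: seasons has a season_id column, as used in the queries above.
SELECT S.title, P.username, T.total_points AS points
FROM (
    SELECT season_id,
           userid,
           SUM(points) AS total_points,
           RANK() OVER (PARTITION BY season_id
                        ORDER BY SUM(points) DESC) AS season_rank
    FROM matchdays
    GROUP BY season_id, userid
) T
INNER JOIN seasons S ON S.season_id = T.season_id
INNER JOIN players P ON P.userid = T.userid
WHERE T.season_rank = 1;
```

Because RANK() gives tied players the same rank, a season with a tie at the top returns more than one winner. Use ROW_NUMBER() with an extra ordering column if exactly one row per season is required.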
Your join is wrong; try something like:
SELECT s.title, p.username, SUM(m.points) as points FROM matchdays m
JOIN players p ON p.userid = m.userid
JOIN seasons s ON m.season_id = s.season_id
group by s.season_id, p.userid
ORDER by points DESC;
As pointed out, userid doesn't belong (and isn't needed) in the 'seasons' table.
This seems like such a simple question, and I'm terrified that I might be bashed with the duplicate-question hammer, but here's what I have:
ID Date
1 1/11/01
1 3/3/03
1 2/22/02
2 1/11/01
2 2/22/02
All I need to do is enumerate the records, based on the date, and grouped by ID! As such:
ID Date Num
1 1/11/01 1
1 3/3/03 3
1 2/22/02 2
2 1/11/01 1
2 2/22/02 2
This is very similar to this question, but it's not working for me. This would be great but it's not MySQL.
I've tried to use group by but it doesn't work, as in
SELECT ta.*, count(*) as Num
FROM temp_a ta
GROUP BY `ID` ORDER BY `ID`;
which clearly doesn't work, since the GROUP BY collapses each ID to a single row.
Any advice greatly appreciated.
Let's assume the table to be as follows:
CREATE TABLE q43381823(id INT, dt DATE);
INSERT INTO q43381823 VALUES
(1, '2001-01-11'),
(1, '2003-03-03'),
(1, '2002-02-22'),
(2, '2001-01-11'),
(2, '2002-02-22');
Then, one of the ways in which the query to get the desired output could be written is:
SELECT q.*,
CASE WHEN (
IF(@id != q.id, @rank := 0, @rank := @rank + 1)
) >=1 THEN @rank
ELSE @rank := 1
END as rank,
@id := q.id AS buffer_id
FROM q43381823 q
CROSS JOIN (
SELECT @rank:= 0,
@id := (SELECT q2.id FROM q43381823 AS q2 ORDER BY q2.id LIMIT 1)
) x
ORDER BY q.id, q.dt
Output:
id | dt | rank | buffer_id
-------------------------------------------------
1 | 2001-01-11 | 1 | 1
1 | 2002-02-22 | 2 | 1
1 | 2003-03-03 | 3 | 1
2 | 2001-01-11 | 1 | 2
2 | 2002-02-22 | 2 | 2
You can ignore the buffer_id column in the output. It is irrelevant to the result, but it is required for resetting the rank.
SQL Fiddle Demo
Explanation:
The @id variable keeps track of every id in the row, based on the sorted order of the output. In the initial iteration, we set it to the id of the first record that may be obtained in the final result. See the sub-query SELECT q2.id FROM q43381823 AS q2 ORDER BY q2.id LIMIT 1
@rank is set to 0 initially and is by default incremented for every subsequent row in the result set. However, when the id changes, we reset it back to 1. Please see the CASE - WHEN - ELSE construct in the query for this.
The final output is sorted first by id and then by dt. This ensures that @rank is set incrementally for every subsequent dt field within the same id, but gets reset to 1 whenever a new id group begins to show up in the result set.
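On MySQL 8.0 or later, the same numbering can be sketched with a window function, which removes the need for the variables and the buffer column. The sketch assumes the q43381823 table defined above; the num alias follows the column name in the question.

```sql
-- Sketch only, MySQL 8.0+: number rows by date within each id.
SELECT q.id,
       q.dt,
       ROW_NUMBER() OVER (PARTITION BY q.id ORDER BY q.dt) AS num
FROM q43381823 q
ORDER BY q.id, q.dt;
```

PARTITION BY q.id restarts the count for each id, and the ORDER BY inside OVER decides the numbering order independently of the final sort.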
I'd like to count how many occurrences of a value happen before a specific value
Below is my starting table
+-----------------+--------------+------------+
| Id | Activity | Time |
+-----------------+--------------+------------+
| 1 | Click | 1392263852 |
| 2 | Error | 1392263853 |
| 3 | Finish | 1392263862 |
| 4 | Click | 1392263883 |
| 5 | Click | 1392263888 |
| 6 | Finish | 1392263952 |
+-----------------+--------------+------------+
I'd like to count how many clicks happen before a finish happens.
I've got a very roundabout way of doing it where I write a function to find the last
finished activity and query the clicks between the finishes.
Also repeat this for Error.
What I'd like to achieve is the below table
+-----------------+--------------+------------+--------------+------------+
| Id | Activity | Time | Clicks | Error |
+-----------------+--------------+------------+--------------+------------+
| 3 | Finish | 1392263862 | 1 | 1 |
| 6 | Finish | 1392263952 | 2 | 0 |
+-----------------+--------------+------------+--------------+------------+
This table is very long so I'm looking for an efficient solution.
If anyone has any ideas.
Thanks heaps!
This is a complicated problem. Here is an approach to solving it. The rows between the "finish" records need to be identified as belonging to the same group, by assigning a group identifier to them. This identifier can be calculated by counting the number of "finish" records whose id is greater than or equal to the row's own id.
Once this is assigned, your results can be calculated using an aggregation.
The group identifier can be calculated using a correlated subquery:
select max(id) as id, 'Finish' as Activity, max(time) as Time,
sum(Activity = 'Click') as Clicks, sum(activity = 'Error') as Error
from (select s.*,
(select sum(s2.activity = 'Finish')
from `starting` s2
where s2.id >= s.id
) as FinishCount
from `starting` s
) s
group by FinishCount;
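On MySQL 8.0 or later, the group identifier can also be sketched as a running sum in a window function, which avoids running the correlated subquery once per row. The sketch assumes the table is named starting (quoted with backticks because STARTING is a reserved word in MySQL) and that ids increase with time, as in the sample data.

```sql
-- Sketch only, MySQL 8.0+: count Finish rows with id >= the current id.
-- Assumption: id order matches time order, as in the sample data.
SELECT MAX(g.id)                 AS Id,
       'Finish'                  AS Activity,
       MAX(g.time)               AS Time,
       SUM(g.activity = 'Click') AS Clicks,
       SUM(g.activity = 'Error') AS Errors
FROM (
    SELECT s.*,
           SUM(s.activity = 'Finish') OVER (ORDER BY s.id DESC) AS FinishCount
    FROM `starting` s
) g
GROUP BY g.FinishCount
HAVING SUM(g.activity = 'Finish') > 0;  -- drop trailing rows with no Finish yet
```

With the sample data, this yields the two rows from the desired table: id 3 with 1 click and 1 error, and id 6 with 2 clicks and 0 errors.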
A version that leverages user(session) variables
SELECT MAX(id) id,
MAX(activity) activity,
MAX(time) time,
SUM(activity = 'Click') clicks,
SUM(activity = 'Error') error
FROM
(
SELECT t.*, @g := IF(activity <> 'Finish' AND @a = 'Finish', @g + 1, @g) g, @a := activity
FROM table1 t CROSS JOIN (SELECT @g := 0, @a := NULL) i
ORDER BY time
) q
GROUP BY g
Output:
| ID | ACTIVITY | TIME | CLICKS | ERROR |
|----|----------|------------|--------|-------|
| 3 | Finish | 1392263862 | 1 | 1 |
| 6 | Finish | 1392263952 | 2 | 0 |
Here is SQLFiddle demo
Try:
select x.id
, x.activity
, x.time
, sum(case when y.activity = 'Click' then 1 else 0 end) as clicks
, sum(case when y.activity = 'Error' then 1 else 0 end) as errors
from tbl x, tbl y
where x.activity = 'Finish'
and y.time < x.time
and (y.time > (select max(z.time) from tbl z where z.activity = 'Finish' and z.time < x.time)
or x.time = (select min(z.time) from tbl z where z.activity = 'Finish'))
group by x.id
, x.activity
, x.time
order by x.id
Here's another method of using variables, which is somewhat different to @peterm's:
SELECT
Id,
Activity,
Time,
Clicks,
Errors
FROM (
SELECT
t.*,
@clicks := @clicks + (activity = 'Click') AS Clicks,
@errors := @errors + (activity = 'Error') AS Errors,
@clicks := @clicks * (activity <> 'Finish'),
@errors := @errors * (activity <> 'Finish')
FROM
`starting` t
CROSS JOIN
(SELECT @clicks := 0, @errors := 0) i
ORDER BY
time
) AS s
WHERE Activity = 'Finish'
;
What's similar to Peter's query is that this one uses a subquery that's returning all the rows, setting some variables along the way and returning the variables' values as columns. That may be common to most methods that use variables, though, and that's where the similarity between these two queries ends.
The difference is in how the accumulated results are calculated. Here all the accumulation is done in the subquery, and the main query merely filters the derived dataset on Activity = 'Finish' to return the final result set. In contrast, the other query uses grouping and aggregation at the outer level to get the accumulated results, which may make it slower than mine in comparison.
At the same time Peter's suggestion is more easily scalable in terms of coding. If you happen to have to extend the number of activities to account for, his query would only need expansion in the form of adding one SUM(activity = '...') AS ... per new activity to the outer SELECT, whereas in my query you would need to add a variable and several expressions, as well as a column in the outer SELECT, per every new activity, which would bloat the resulting code much more quickly.