MySQL top 2 records per group - mysql

Basically I need to get only the last 2 records for each user, considering the last created_datetime:
id | user_id | created_datetime
1 | 34 | '2015-09-10'
2 | 34 | '2015-10-11'
3 | 34 | '2015-05-23'
4 | 34 | '2015-09-13'
5 | 159 | '2015-10-01'
6 | 159 | '2015-10-02'
7 | 159 | '2015-10-03'
8 | 159 | '2015-10-06'
Returns (expected output):
2 | 34 | '2015-10-11'
1 | 34 | '2015-09-10'
7 | 159 | '2015-10-03'
8 | 159 | '2015-10-06'
I was trying with this idea:
select user_id, created_datetime,
$num := if($user_id = user_id, $num + 1, 1) as row_number,
$id := user_id as dummy
from logs group by user_id
having row_number <= 2
The idea is keep only these top 2 rows and remove all the others.
Any ideas?

Your idea is close. I think this will work better:
select u.*
from (select user_id, created_datetime,
$num := if(#user_id = user_id, #num + 1,
if(#user_id := id, 1, 1)
) as row_number
from logs cross join
(select #user_id := 0, #num := 0) params
order by user_id
) u
where row_number <= 2 ;
Here are the changes:
The variables are set in only one expression. MySQL does not guarantee the order of evaluation of expressions, so this is important.
The work is done in a subquery, which is then processed in the outer query.
The subquery uses order by, not group by.
The outer query uses where instead of having (actually, in MySQL having would work, but where is more appropriate).

Related

Rank MySQL results after joining with another table, also filtering them by a condition

I found how to rank results, but have yet to find a solution for how to rank them after they've been filtered by a condition. What I need to do is:
1) Sort scores in descending order
2) Join the results with the account table
3) Filter the results by age and gender (from the account table)
4) Rank them
5) Grab the results within a range of ranks (IE: 25-75, but I put 0-10 for simplicity below)
Each solution I've found either doesn't properly rank them, or ranks them and then filters the results. However, this leaves me with ranks like 1, 3, 4, 6, 10, etc when I want 1, 2, 3, 4, 5, etc. or 25, 26, 27, 28, 29, etc.
Here is what I have right now:
SELECT a.id, a.username, a.country, score, user_id, duration, rank
FROM (
SELECT s.*, a.age, a.gender, #rownum := #rownum + 1 AS rank
FROM scores s,
(SELECT #rownum := 0) r
LEFT JOIN accounts a ON 1
WHERE a.age>=18 && a.age<=35 && a.gender='m'
ORDER BY score DESC, duration DESC, time ) `selection`
LEFT JOIN accounts a ON a.id=user_id
WHERE rank >= 0 && rank <= 10 && a.age>=18 && a.age<=35 && a.gender='m'
The problem with the above query is that it gives me correct rankings without any gaps, but the scores are completely out of order. IE. 5, 1, 10, 7, 125, 50, etc. IE. They come out in the same order they are in the database, unsorted.
This is what my previous query looked like:
SELECT a.id, a.username, a.country, score, duration, rank
FROM (
SELECT s.*, #rownum := #rownum + 1 AS rank
FROM scores s,
(SELECT #rownum := 0) r
ORDER BY score DESC, duration DESC, time ) `selection`
LEFT JOIN accounts a ON a.id=user_id
WHERE rank >= 0 && rank <= 10 && a.age>=18 && a.age<=35 && a.gender='m'
This query gives me properly sorted scores, but with ranks like 3, 5, 8, 9, 11 instead of 1, 2, 3, 4, 5.
Here is some real-world sample data...
For the sake of time, id, username, country, are all omitted because they are irrelevant data and just repeat all the way down the table. The age all the way down is '34' just as the condition shows, the country and username are the same all the way down since this particular user is the only one in the table with the age of 34.
The first query from above with proper rankings, but incorrectly sorted scores (IE. the same order as they were added to the database):
age | score | rank
--------------------
34 | 5 | 1
34 | 1 | 2
34 | 22 | 3
34 | 13 | 4
34 | 23 | 5
34 | 23 | 6
34 | 34 | 7
34 | 32 | 8
34 | 58 | 9
34 | 76 | 10
The second query from above, properly sorted, but ranks still take other users into consideration when ranking, then get excluded in the final result:
age | score | rank
---------------------
34 | 76 | 3 --- This should have started at 1
34 | 62 | 4
34 | 58 | 5
34 | 42 | 7 --- Notice 6 was skipped
34 | 34 | 8
34 | 32 | 9
34 | 29 | 10
What I expect:
age | score | rank
--------------------
34 | 76 | 1
34 | 62 | 2
34 | 58 | 3
34 | 42 | 4
34 | 34 | 5
34 | 32 | 6
34 | 29 | 7
I can offer the following query based on your sample data set:
SET #rank = 0;
SET #score = NULL;
SELECT
age,
#rank:=CASE WHEN #score = score THEN #rank ELSE #rank + 1 END AS rank,
#score:=score AS score
FROM scores
ORDER BY rank;
The computes the rank of the scores, in terms of the definition for which "rank" is usually used. If two scores be the same, then they have the same rank. If you really want the row number, then the above query can be easily modified.
Output:
Demo here:
Rextester
SELECT a.id, a.username, a.country, score, user_id, duration, rank
FROM (
SELECT s.*, a.age, a.gender, #rownum := #rownum + 1 AS rank
FROM scores s,
(SELECT #rownum := 0) r
LEFT JOIN accounts a ON 1
WHERE a.age>=18 && a.age<=35 && a.gender='m'
ORDER BY score DESC, duration DESC, time ) `selection`
LEFT JOIN accounts a ON a.id=user_id
WHERE rank >= 0 && rank <= 10 && a.age>=18

Optimizing SQL Query for max value with various conditions from a single MySQL table

I have the following SQL query
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
AND (`id` =
(
SELECT `id`
FROM `sensor_data` AS `sd2`
WHERE sd1.mid = sd2.mid
AND sd1.sid = sd2.sid
ORDER BY `value` DESC, `id` DESC
LIMIT 1)
)
Background:
I've checked the validity of the query by changing LIMIT 1 to LIMIT 0, and the query works without any problem. However with LIMIT 1 the query doesn't complete, it just states loading until I shutdown and restart.
Breaking the Query down:
I have broken down the query with the date boundary as follows:
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
This takes about 0.24 seconds to return the query with 8200 rows each having 5 columns.
Question:
I suspect the second half of my Query, is not correct or well optimized.
The tables are as follows:
Current Table:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 52 | 10 | 2 | 39 | 2015-05-13 11:56:25 |
| 53 | 10 | 2 | 40 | 2015-05-13 11:56:42 |
| 54 | 10 | 2 | 40 | 2015-05-13 11:56:45 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 57 | 11 | 2 | 18 | 2015-05-13 11:58:41 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 59 | 11 | 3 | 58 | 2015-05-13 11:59:01 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Q: How would I get the MAX(v)for each sid for each mid?
NB#1: In the example above ROW 53, 54, 55 have all the same value (40), but I would like to retrieve the row with the most recent timestamp, which is ROW 55.
Expected Output:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Structure of the table:
NB#2:
Since this table has over 110 million entries, it is critical to have have date boundaries, which limits to ~8000 entries over a 24 hour period.
The query can be written as follows:
SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
FROM yourtable t1
INNER JOIN (
SELECT mid, sid, MAX(v) as v
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid
) t2
ON t1.mid = t2.mid
AND t1.sid = t2.sid
AND t1.v = t2.v
INNER JOIN (
SELECT mid, sid, v, MAX(ts) as ts
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid, v
) t3
ON t1.mid = t3.mid
AND t1.sid = t3.sid
AND t1.v = t3.v
AND t1.ts = t3.ts;
Edit and Explanation:
The first sub-query (first INNER JOIN) fetches MAX(v) per (mid, sid) combination. The second sub-query is to identify MAX(ts) for every (mid, sid, v). At this point, the two queries do not influence each others' results. It is also important to note that ts date range selection is done in the two sub-queries independently such that the final query has fewer rows to examine and no additional WHERE filters to apply.
Effectively, this translates into getting MAX(v) per (mid, sid) combination initially (first sub-query); and if there is more than one record with the same value MAX(v) for a given (mid, sid) combo, then the excess records get eliminated by the selection of MAX(ts) for every (mid, sid, v) combination obtained by the second sub-query. We then simply associate the output of the two queries by the two INNER JOIN conditions to get to the id of the desired records.
Demo
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.mid)
union
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.sid);
IN ( SELECT ... ) does not optimize well. It is even worse because of being correlated.
What you are looking for is a groupwise-max .
Please provide SHOW CREATE TABLE; we need to know at least what the PRIMARY KEY is.
Suggested code
You will need:
With the WHERE: INDEX(timestamp, mid, sid, v, id)
Without the WHERE: INDEX(mid, sid, v, timestamp, id)
Code:
SELECT id, mid, sid, v, timestamp
FROM ( SELECT #prev_mid := 99999, -- some value not in table
#prev_sid := 99999,
#n := 0 ) AS init
JOIN (
SELECT #n := if(mid != #prev_mid OR
sid != #prev_sid,
1, #n + 1) AS n,
#prev_mid := mid,
#prev_sid := sid,
id, mid, sid, v, timestamp
FROM sensor_data
WHERE timestamp >= '2017-05-13'
timestamp < '2017-05-13' + INTERVAL 1 DAY
ORDER BY mid DESC, sid DESC, v DESC, timestamp DESC
) AS x
WHERE n = 1
ORDER BY mid, sid; -- optional
Notes:
The index is 'composite' and 'covering'.
This should make one pass over the index, thereby providing 'good' performance.
The final ORDER BY is optional; the results may be in reverse order.
All the DESC in the inner ORDER BY must be in place to work correctly (unless you are using MySQL 8.0).
Note how the WHERE avoids including both midnights? And avoids manually computing leap-days, year-ends, etc?
With the WHERE (and associated INDEX), there will be filtering, but a 'sort'.
Without the WHERE (and the other INDEX), sort will not be needed.
You can test the performance of any competing formulations via this trick, even if you do not have enough rows (yet) to get reliable timings:
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
This can also be used to compare different versions of MySQL and MariaDB -- I have seen 3 significantly different performance characteristics in a related groupwise-max test.

SQL Find Position in table

I have a table in mySql which has the users ID and scores.
What I would like to do is organise the table by scores (simple) but then find where a certain user ID sits in the table.
So far I would have:
SELECT * FROM table_score
ORDER BY Score DESC
How would I find where userID = '1234' is (i.e entry 10 of 12)
The following query will give you a new column UserRank, which specify the user rank:
SELECT
UserID,
Score,
(#rownum := #rownum + 1) UserRank
FROM table_score, (SELECT #rownum := 0) t
ORDER BY Score DESC;
SQL Fiddle Demo
This will give you something like:
| USERID | SCORE | USERRANK |
-----------------------------
| 4 | 100 | 1 |
| 10 | 70 | 2 |
| 2 | 55 | 3 |
| 1234 | 50 | 4 |
| 1 | 36 | 5 |
| 20 | 33 | 6 |
| 8 | 25 | 7 |
Then you can put this query inside a subquery and filter with a userId to get that user rank. Something like:
SELECT
t.UserRank
FROM
(
SELECT *, (#rownum := #rownum + 1) UserRank
FROM table_score, (SELECT #rownum := 0) t
ORDER BY Score DESC
) t
WHERE userID = '1234';
SQL Fiddle Demo
For a given user id, you can do this with a simple query:
select sum(case when ts.score >= thescore.score then 1 else 0 end) as NumAbove,
count(*) as Total
from table_score ts cross join
(select ts.score from table_score ts where userId = '1234') thescore
If you have indexes on score and userid, this will be quite efficient.

Top x rows and group by (again)

I know it's a frequent question but I just can't figure it out and the examples I found didn't helped. What I learned, the best strategy is to try to find the top and bottom values of the top range and then select the rest, but implementing is a bit tricky.
Example table:
id | title | group_id | votes
I'd like to get the top 3 voted rows from the table, for each group.
I'm expecting this result:
91 | hello1 | 1 | 10
28 | hello2 | 1 | 9
73 | hello3 | 1 | 8
84 | hello4 | 2 | 456
58 | hello5 | 2 | 11
56 | hello6 | 2 | 0
17 | hello7 | 3 | 50
78 | hello8 | 3 | 9
99 | hello9 | 3 | 1
I've fond complex queries and examples, but they didn't really helped.
You can do it using variables:
SELECT
id,
title,
group_id,
votes
FROM (
SELECT
id,
title,
group_id,
votes,
#rn := CASE WHEN #prev = group_id THEN #rn + 1 ELSE 1 END AS rn,
#prev := group_id
FROM table1, (SELECT #prev := -1, #rn := 0) AS vars
ORDER BY group_id DESC, votes DESC
) T1
WHERE rn <= 3
ORDER BY group_id, votes DESC
This is basically just the same as the following query in databases that support ROW_NUMBER:
SELECT
id,
title,
group_id,
votes
FROM (
SELECT
id,
title,
group_id,
votes,
ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY votes DESC) AS rn
FROM student
) T1
WHERE rn <= 3
ORDER BY group_id, votes DESC
But since MySQL doesn't support ROW_NUMBER yet you have to simulate it, and that's what the variables are for. The two queries are otherwise identical. Try to understand the second query first, and hopefully the first should make more sense.

Select a row and rows around it

Ok, let's say I have a table with photos.
What I want to do is on a page display the photo based on the id in the URI. Bellow the photo I want to have 10 thumbnails of nearby photos and the current photo should be in the middle of the thumbnails.
Here's my query so far (this is just an example, I used 7 as id):
SELECT
A.*
FROM
(SELECT
*
FROM media
WHERE id < 7
ORDER BY id DESC
LIMIT 0, 4
UNION
SELECT
*
FROM media
WHERE id >= 7
ORDER BY id ASC
LIMIT 0, 6
) as A
ORDER BY A.id
But I get this error:
#1221 - Incorrect usage of UNION and ORDER BY
Only one ORDER BY clause can be defined for a UNION'd query. It doesn't matter if you use UNION or UNION ALL. MySQL does support the LIMIT clause on portions of a UNION'd query, but it's relatively useless without the ability to define the order.
MySQL also lacks ranking functions, which you need to deal with gaps in the data (missing due to entries being deleted). The only alternative is to use an incrementing variable in the SELECT statement:
SELECT t.id,
#rownum := #rownum+1 as rownum
FROM MEDIA t, (SELECT #rownum := 0) r
Now we can get a consecutively numbered list of the rows, so we can use:
WHERE rownum BETWEEN #midpoint - ROUND(#midpoint/2)
AND #midpoint - ROUND(#midpoint/2) +#upperlimit
Using 7 as the value for #midpoint, #midpoint - ROUND(#midpoint/2) returns a value of 4. To get 10 rows in total, set the #upperlimit value to 10. Here's the full query:
SELECT x.*
FROM (SELECT t.id,
#rownum := #rownum+1 as rownum
FROM MEDIA t,
(SELECT #rownum := 0) r) x
WHERE x.rownum BETWEEN #midpoint - ROUND(#midpoint/2) AND #midpoint - ROUND(#midpoint/2) + #upperlimit
But if you still want to use LIMIT, you can use:
SELECT x.*
FROM (SELECT t.id,
#rownum := #rownum+1 as rownum
FROM MEDIA t,
(SELECT #rownum := 0) r) x
WHERE x.rownum >= #midpoint - ROUND(#midpoint/2)
ORDER BY x.id ASC
LIMIT 10
I resolve this by using the below code:
SELECT A.* FROM (
(
SELECT * FROM gossips
WHERE id < 7
ORDER BY id DESC
LIMIT 2
)
UNION
(
SELECT * FROM gossips
WHERE id > 7
ORDER BY id ASC
LIMIT 2
)
) as A
ORDER BY A.id
I don't believe that you can have an "order by" in different sections of a UNION. Could you just do something like this:
SELECT * FROM media where id >= 7 - 4 and id <= 7 + 4 ORDER BY id
I'm agree with the answer suggested by malonso(+1), but if you try it with id= 1, you will get only 5 thumbnails. I don't know if you want this behaviour. If you want always 10 thumbs, you can try:
select top 10 * from media where id > 7 - 4
The problem is that select top is database dependent (in this case is a sql server clause). Other database has similar clauses:
Oracle:
SELECT * media
FROM media
WHERE ROWNUM < 10
AND id > 7 - 4
MySQL:
SELECT *
FROM media
WHERE id > 7 - 4
LIMIT 10
So maybe you can use the last one.
If we do it, we will have the same problem if you want the last 10 thumbs. By example, If we have 90 thumbs and we give an id=88 ... You can solve it adding an OR condition. In MySQL will be something like:
SELECT *
FROM media
WHERE id > 7 - 4
OR (Id+5) > (select COUNT(1) from media)
LIMIT 10
If you're happy to use temp tables, your original query could be broken down to use them.
SELECT
*
FROM media
WHERE id < 7
ORDER BY id DESC
LIMIT 0, 4
INTO TEMP t1;
INSERT INTO t1
SELECT
*
FROM media
WHERE id >= 7
ORDER BY id ASC
LIMIT 0, 6;
select * from t1 order by id;
drop table t1;
Try union all instead. Union requires the server to ensure that the results are unique and this conflicts with your ordering.
I had to solve a similar problem, but needed to account situations where we always got the same number of rows, even if the desired row was near the top or bottom of the result set (i.e. not exactly in the middle).
This solution is a tweak from OMG Ponies' response, but where the rownum maxes out at the desired row:
set #id = 7;
SELECT natSorted.id
FROM (
SELECT gravitySorted.* FROM (
SELECT Media.id, IF(id <= #id, #gravity := #gravity + 1, #gravity := #gravity - 1) AS gravity
FROM Media, (SELECT #gravity := 0) g
) AS gravitySorted ORDER BY gravity DESC LIMIT 10
) natSorted ORDER BY id;
Here's a break down of what's happening:
NOTE: In the example below I made a table with 20 rows and removed ids 6 and 9 to ensure a gap in ids do not affect the results
First we assign every row a gravity value that's centered around the particular row you're looking for (in this case where id is 7). The closer the row is to the desired row, the higher the value will be:
SET #id = 7;
SELECT Media.id, IF(id <= #id, #gravity := #gravity + 1, #gravity := #gravity - 1) AS gravity
FROM Media, (SELECT #gravity := 0) g
returns:
+----+---------+
| id | gravity |
+----+---------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 7 | 6 |
| 8 | 5 |
| 10 | 4 |
| 11 | 3 |
| 12 | 2 |
| 13 | 1 |
| 14 | 0 |
| 15 | -1 |
| 16 | -2 |
| 17 | -3 |
| 18 | -4 |
| 19 | -5 |
| 20 | -6 |
| 21 | -7 |
+----+---------+
Next we order all the results by the gravity value and limit on the desired number of rows:
SET #id = 7;
SELECT gravitySorted.* FROM (
SELECT Media.id, IF(id <= #id, #gravity := #gravity + 1, #gravity := #gravity - 1) AS gravity
FROM Media, (SELECT #gravity := 0) g
) AS gravitySorted ORDER BY gravity DESC LIMIT 10
returns:
+----+---------+
| id | gravity |
+----+---------+
| 7 | 6 |
| 5 | 5 |
| 8 | 5 |
| 4 | 4 |
| 10 | 4 |
| 3 | 3 |
| 11 | 3 |
| 2 | 2 |
| 12 | 2 |
| 1 | 1 |
+----+---------+
At this point we have all the desired ids, we just need to sort them back to their original order:
set #id = 7;
SELECT natSorted.id
FROM (
SELECT gravitySorted.* FROM (
SELECT Media.id, IF(id <= #id, #gravity := #gravity + 1, #gravity := #gravity - 1) AS gravity
FROM Media, (SELECT #gravity := 0) g
) AS gravitySorted ORDER BY gravity DESC LIMIT 10
) natSorted ORDER BY id;
returns:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 7 |
| 8 |
| 10 |
| 11 |
| 12 |
+----+