Per group, find first N users with SUM(x) >= M (MySQL)

Problem: For each category, find the first 2 users whose total items in that category reach at least 10.
Table structure:
CREATE TABLE items(
id INT AUTO_INCREMENT PRIMARY KEY,
datetime datetime,
category INT,
user INT,
items_count INT
);
Sample data:
INSERT INTO items (datetime, category, user, items_count) VALUES
('2013-01-01 00:00:00', 1, 1, 10),
('2013-01-01 00:00:01', 1, 2, 1),
('2013-01-01 00:00:02', 1, 3, 10),
('2013-01-01 00:00:03', 1, 2, 9),
('2013-01-01 00:00:00', 2, 4, 10),
('2013-01-01 00:00:01', 2, 1, 10),
('2013-01-01 00:00:01', 2, 5, 10);
Desired result:
category user
1 1
1 3
2 4
2 5
Note: as shown in the result, when multiple users meet the requirement at the same time, I need to be able to give preference to a particular user (in category 2, users 1 and 5 both reach 10 at 00:00:01, and user 5 is preferred).
SQL Fiddle:
http://sqlfiddle.com/#!2/58e60
This is what I have tried:
SELECT
Derived.*,
IF (@category != Derived.category, @rank := 1, @rank := @rank + 1) AS rank,
@category := category
FROM(
SELECT
category,
user,
SUM(items_count) AS items_count,
MAX(datetime) AS datetime
FROM items
GROUP BY
category,
user
HAVING
SUM(items_count) >= 10
) AS Derived
JOIN(SELECT @rank := 0, @category := 0) AS r
HAVING
rank <= 2
ORDER BY
Derived.category,
Derived.datetime
But it is faulty. Not only does it ignore user precedence, it also orders users by their last entry (MAX(datetime)) rather than by the moment their total reached 10, so it produces the wrong result with data such as this:
('2013-01-01 00:00:00', 1, 1, 10),
('2013-01-01 00:00:01', 1, 2, 1),
('2013-01-01 00:00:02', 1, 3, 10),
('2013-01-01 00:00:03', 1, 2, 9),
('2013-01-01 00:00:10', 1, 3, 1);
Additional information: I do not know whether stored procedures would make a difference here, but they are not an option anyway: the user running this query only has the SELECT privilege.

To find the users who meet your criteria, you need the cumulative sum of the counts. The following query finds the moment when a user first reaches 10 units in a category. If the counts are never negative, there is only one such moment per user and category:
select i.*
from (select i.*,
(select sum(items_count)
from items i2
where i2.user = i.user and
i2.category = i.category and
i2.datetime <= i.datetime
) as cumsum
from items i
) i
where cumsum - items_count < 10 and cumsum >= 10
order by datetime;
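To check the cumulative-sum logic on the sample data, here is a minimal sketch that runs the same correlated subquery through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption that holds because the query uses no MySQL-specific syntax), and `threshold_crossings` is a hypothetical helper name.

```python
import sqlite3

# Sample rows from the question: (datetime, category, user, items_count).
ROWS = [
    ("2013-01-01 00:00:00", 1, 1, 10),
    ("2013-01-01 00:00:01", 1, 2, 1),
    ("2013-01-01 00:00:02", 1, 3, 10),
    ("2013-01-01 00:00:03", 1, 2, 9),
    ("2013-01-01 00:00:00", 2, 4, 10),
    ("2013-01-01 00:00:01", 2, 1, 10),
    ("2013-01-01 00:00:01", 2, 5, 10),
]

# The answer's query: running total per (user, category) at each row,
# keeping only the row where the total first reaches 10.
CROSSING_SQL = """
SELECT category, user, datetime
FROM (SELECT i.*,
             (SELECT SUM(items_count)
              FROM items i2
              WHERE i2.user = i.user
                AND i2.category = i.category
                AND i2.datetime <= i.datetime) AS cumsum
      FROM items i) AS i
WHERE cumsum - items_count < 10 AND cumsum >= 10
ORDER BY datetime
"""


def threshold_crossings(rows):
    """Return (category, user, datetime) for each user's first crossing of 10."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE items (id INTEGER PRIMARY KEY, datetime TEXT,"
            " category INTEGER, user INTEGER, items_count INTEGER)"
        )
        conn.executemany(
            "INSERT INTO items (datetime, category, user, items_count)"
            " VALUES (?, ?, ?, ?)",
            rows,
        )
        return conn.execute(CROSSING_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in threshold_crossings(ROWS):
        print(row)
```

Adding the question's extra row for user 3 (00:00:10, 1 item) does not create a second crossing, because that row's running total was already at 10 before it.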
To get the first two per category, you need a MySQL trick for counting within a group. Here is an example that generally works:
select i.*
from (select i.*, if(@prevc = category, @rn := @rn + 1, @rn := 1) as rn, @prevc := category
from (select i.*,
(select sum(items_count)
from items i2
where i2.user = i.user and
i2.category = i.category and
i2.datetime <= i.datetime
) as cumsum
from items i
) i
cross join
(select @rn := 0, @prevc := NULL) const
where cumsum - items_count < 10 and cumsum >= 10
order by category, datetime
) i
where rn <= 2
order by category, datetime;
I have a reservation about this approach: nothing in MySQL guarantees that @prevc := category is evaluated after the expression for rn (the MySQL manual states that the order of evaluation for expressions involving user variables is undefined). However, it seems to be evaluated in that order, and the query seems to work in practice.
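If relying on the evaluation order of user variables is a concern, the numbering can be done without variables by counting, for each crossing, the earlier crossings in the same category. The sketch below does that in SQLite through Python's sqlite3 module; it is a swapped-in technique, not the answer's query. The tie-break (preferred user first, then lower user id), the helper name `first_n_per_category`, and the `:threshold`, `:pref` and `:n` parameters are assumptions for illustration. Like the cumulative sum itself, the correlated COUNT is quadratic per category, so expect it to be slow on large tables.

```python
import sqlite3

# Sample rows from the question: (datetime, category, user, items_count).
ROWS = [
    ("2013-01-01 00:00:00", 1, 1, 10),
    ("2013-01-01 00:00:01", 1, 2, 1),
    ("2013-01-01 00:00:02", 1, 3, 10),
    ("2013-01-01 00:00:03", 1, 2, 9),
    ("2013-01-01 00:00:00", 2, 4, 10),
    ("2013-01-01 00:00:01", 2, 1, 10),
    ("2013-01-01 00:00:01", 2, 5, 10),
]

FIRST_N_SQL = """
WITH crossing AS (
    SELECT category, user, datetime
    FROM (SELECT i.*,
                 (SELECT SUM(items_count)
                  FROM items i2
                  WHERE i2.user = i.user
                    AND i2.category = i.category
                    AND i2.datetime <= i.datetime) AS cumsum
          FROM items i) AS i
    WHERE cumsum - items_count < :threshold AND cumsum >= :threshold
)
SELECT c.category, c.user
FROM crossing c
-- keep a crossing if fewer than n crossings in its category come before it
WHERE (SELECT COUNT(*)
       FROM crossing c2
       WHERE c2.category = c.category
         AND (c2.datetime < c.datetime
              OR (c2.datetime = c.datetime
                  -- same moment: the preferred user wins, then the lower id
                  AND ((c2.user <> :pref) < (c.user <> :pref)
                       OR ((c2.user <> :pref) = (c.user <> :pref)
                           AND c2.user < c.user))))) < :n
ORDER BY c.category, c.datetime, c.user <> :pref, c.user
"""


def first_n_per_category(rows, n=2, threshold=10, preferred_user=5):
    """First n users per category to reach threshold, preferring one user on ties."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE items (id INTEGER PRIMARY KEY, datetime TEXT,"
            " category INTEGER, user INTEGER, items_count INTEGER)"
        )
        conn.executemany(
            "INSERT INTO items (datetime, category, user, items_count)"
            " VALUES (?, ?, ?, ?)",
            rows,
        )
        params = {"n": n, "threshold": threshold, "pref": preferred_user}
        return conn.execute(FIRST_N_SQL, params).fetchall()
    finally:
        conn.close()


print(first_n_per_category(ROWS))
```

The SQL part should also run on MySQL 8.0, which supports WITH, once the named parameters are replaced by literals or `?` placeholders.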

I tried Gordon's query, but unfortunately it does not seem to scale to large tables; after waiting 15 minutes for the result I killed it.
The following query, however, worked very well for me: it chewed its way through a table of ~6M rows in about 8 seconds.
-- Parameters
SET @min_items = 10,
@max_users = 2,
@preferred_user = 5,
-- Running state
@category = 0,
@prev_category = 0,
@user = 0,
@items = 0,
@row_num = 1;
SELECT
category,
user,
datetime
FROM(
SELECT
category,
user,
datetime,
IF (@category = category, @row_num := @row_num + 1, @row_num := 1) AS row_num,
@category := category
FROM(
SELECT
category,
user,
datetime,
-- restart the running total for each new (category, user) pair
IF (@user != user OR @prev_category != category, @items := 0, NULL),
IF (@items < @min_items, @items := @items + items_count, NULL) AS items_cumulative,
@user := user,
@prev_category := category
FROM items
ORDER BY
category,
user,
datetime
) AS Derived
WHERE items_cumulative >= @min_items
ORDER BY
category,
datetime,
FIELD(user, @preferred_user, user)
) AS Derived
WHERE row_num <= @max_users;
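The two levels of the variable-based query can be mirrored in plain Python to make each pass explicit. This is a sketch, not the answer's code: `first_users_single_pass` is a hypothetical name, and the running total is grouped on (category, user) so that it restarts whenever the category changes. Ties among non-preferred users keep their (category, user) order because Python's sort is stable; the SQL's FIELD() ordering leaves that case unspecified.

```python
from itertools import groupby

# Sample rows from the question: (datetime, category, user, items_count).
ROWS = [
    ("2013-01-01 00:00:00", 1, 1, 10),
    ("2013-01-01 00:00:01", 1, 2, 1),
    ("2013-01-01 00:00:02", 1, 3, 10),
    ("2013-01-01 00:00:03", 1, 2, 9),
    ("2013-01-01 00:00:00", 2, 4, 10),
    ("2013-01-01 00:00:01", 2, 1, 10),
    ("2013-01-01 00:00:01", 2, 5, 10),
]


def first_users_single_pass(rows, min_items=10, max_users=2, preferred_user=5):
    """Plain-Python mirror of the two-level user-variable query."""
    # Inner query: walk rows ordered by (category, user, datetime) and keep a
    # running total per (category, user). Only the row where the total first
    # reaches min_items is kept; later rows are skipped (the NULL branch).
    crossings = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    for (category, user), group in groupby(ordered, key=lambda r: (r[1], r[2])):
        total = 0
        for dt, _, _, count in group:
            total += count
            if total >= min_items:
                crossings.append((category, user, dt))
                break

    # Outer query: order by category, datetime, preferred user first, then
    # number the rows within each category and keep the first max_users.
    crossings.sort(key=lambda c: (c[0], c[2], c[1] != preferred_user))
    result = []
    for _, group in groupby(crossings, key=lambda c: c[0]):
        for row_num, (category, user, _) in enumerate(group, start=1):
            if row_num <= max_users:
                result.append((category, user))
    return result


print(first_users_single_pass(ROWS))
```

Both the sample data and the question's extra row for user 3 give the desired result, because only the first crossing per user is ever recorded.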

Related

get distinct values as array by user_id

I want to get a list of distinct values for each user, limited to 3 values per user:
id, user_id, value
1, 1, a
2, 1, b
3, 2, c
4, 1, b
5, 1, d
6, 1, e
expected result:
user_id, values
1, [a,b,d]
2, [c]
Is there a way to do this with GROUP BY user_id and DISTINCT?
Edit (based on comments):
We can use user-defined variables to assign a row number to each value within a partition of user_id, and then filter the result set to keep at most 3 rows per user_id.
SELECT
dt2.user_id,
dt2.value
FROM
(
SELECT
@rn := CASE WHEN @ui = dt.user_id THEN @rn + 1
ELSE 1
END AS row_no,
@ui := dt.user_id AS user_id,
dt.value
FROM
(
SELECT DISTINCT
user_id,
value
FROM your_table
ORDER BY user_id, value
) AS dt
CROSS JOIN (SELECT @rn := 0, @ui := NULL) AS user_init_vars
) AS dt2
WHERE dt2.row_no <= 3
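The numbering done by @rn and @ui can be sketched in Python as a single loop over the sorted distinct (user_id, value) pairs. Ordering the values alphabetically is an assumption that matches the expected output, and `first_distinct_values` is a hypothetical name.

```python
# Rows from the question: (id, user_id, value).
ROWS = [(1, 1, "a"), (2, 1, "b"), (3, 2, "c"), (4, 1, "b"), (5, 1, "d"), (6, 1, "e")]


def first_distinct_values(rows, limit=3):
    """Keep at most `limit` distinct values per user_id, numbering them like @rn."""
    # SELECT DISTINCT user_id, value ... ORDER BY user_id, value
    distinct_pairs = sorted({(user_id, value) for _, user_id, value in rows})
    result = {}
    rn, ui = 0, None  # stand-ins for @rn and @ui
    for user_id, value in distinct_pairs:
        rn = rn + 1 if ui == user_id else 1  # restart numbering per user_id
        ui = user_id
        if rn <= limit:  # WHERE dt2.row_no <= 3
            result.setdefault(user_id, []).append(value)
    return result


print(first_distinct_values(ROWS))
```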
Previous question's answer:
Use GROUP_CONCAT(DISTINCT ...) to collect all the unique values for a user_id.
Then use the SUBSTRING_INDEX() function to keep the string up to the 3rd comma, so that at most 3 values remain.
Finally, use the CONCAT() function to enclose the resulting string in square brackets.
VALUES is a reserved keyword in MySQL, so consider giving the result column a different name.
Try the following:
SELECT user_id,
CONCAT('[',
SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT value ORDER BY value), ',', 3),
']') AS user_values
FROM your_table
GROUP BY user_id
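To see what SUBSTRING_INDEX does with a positive count, here is a small Python stand-in combined with a join over the sorted distinct values (what GROUP_CONCAT(DISTINCT value ORDER BY value) produces). The helpers `substring_index` and `user_values` are hypothetical and only cover positive counts; MySQL also accepts negative counts, which count from the right.

```python
# Rows from the question: (id, user_id, value).
ROWS = [(1, 1, "a"), (2, 1, "b"), (3, 2, "c"), (4, 1, "b"), (5, 1, "d"), (6, 1, "e")]


def substring_index(s, delim, count):
    """Mimic MySQL SUBSTRING_INDEX(s, delim, count) for a positive count only."""
    if count < 1:
        raise ValueError("this sketch only handles positive counts")
    # Everything to the left of the count-th delimiter (the whole string if
    # there are fewer delimiters than that).
    return delim.join(s.split(delim)[:count])


def user_values(rows, limit=3):
    """CONCAT('[', SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT ...), ',', limit), ']')."""
    by_user = {}
    for _, user_id, value in rows:
        by_user.setdefault(user_id, set()).add(value)
    return {
        user_id: "[" + substring_index(",".join(sorted(by_user[user_id])), ",", limit) + "]"
        for user_id in sorted(by_user)
    }


print(user_values(ROWS))
```

One caveat this makes visible: if a value itself contains a comma, cutting at the 3rd comma no longer means keeping 3 values.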

How to display RANK in mysql table?

My rank depends on two columns, say score and time:
I want to rank by highest score first, then by lowest time.
CREATE TABLE yourTable (id int, userid int, questions int, `date` varchar(10),
rightquestions int, examid int, `time` int);
INSERT INTO yourTable (id, userid, questions, `date`, rightquestions, examid, `time`)
VALUES
(1, 10, 5, '02/09/2017', 5, 2, 11),
(2, 12, 5, '02/09/2017', 5, 2, 11),
(9, 16, 5, '02/09/2017', 4, 2, 18),
(8, 15, 5, '02/09/2017', 3, 2, 18);
As you can see above, in my table the score is rightquestions and the time is time.
If I want to rank by highest rightquestions and then lowest time, what should the query be?
I tried this, but I get unexpected results:
SELECT id, rightquestions, leagueid,time,
CASE
WHEN @prevRank = rightquestions AND @prevTime != time THEN @curRank
WHEN @prevRank != rightquestions AND @prevTime != time THEN @curRank
WHEN @prevRank := rightquestions AND @prevTime := time AND @curRank = 0 THEN @curRank := @curRank + 1
END AS rank
FROM results p,
(SELECT @curRank :=0, @prevRank := NULL, @prevTime := NULL) r
ORDER BY rightquestions DESC,time ASC
You need a session variable that increments, and restarts every time the score changes. You can initialize it in a "fake" join within the query itself, as follows:
select timer, if(@score = score,@rank:=@rank+1,@rank:= 1) as rank ,@score:=score as score
from (
select 1 score, now() + interval rand()*10 hour as timer
union all
select 1 score, now() + interval rand()*10 hour as timer
union all
select 2 score, now() + interval rand()*10 hour as timer
union all
select 3 score, now() + interval rand()*10 hour as timer
union all
select 3 score, now() + interval rand()*10 hour as timer
)
t
join (select @rank:=0, @score := 0)r on (1=1)
order by score, timer asc
You can simulate a dense rank in MySQL by using a row number session variable along with a subquery that identifies unique rightquestions/time pairs. Try the following query, which is based on the table snapshot you shared with us:
SET @row_number = 0;
SELECT t1.rightquestions, t1.`time`, t2.rank
FROM yourTable t1
INNER JOIN
(
SELECT (@row_number:=@row_number + 1) AS rank, t.rightquestions, t.`time`
FROM
(
SELECT rightquestions, `time`
FROM yourTable
GROUP BY rightquestions, `time`
) t
ORDER BY t.rightquestions DESC, t.`time`
) t2
ON t1.rightquestions = t2.rightquestions AND
t1.`time` = t2.`time`
Here is the output I got in Workbench while testing locally:
rightquestions | time | rank
5 | 11 | 1
5 | 11 | 1
4 | 18 | 2
3 | 18 | 3
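A dense rank gives tied rightquestions/time pairs the same rank and leaves no gap after a tie. As a sketch of what the query computes (in Python, with the hypothetical helper `dense_rank`), number the distinct pairs in order of highest rightquestions, then lowest time, and look each row up:

```python
# (id, rightquestions, time) from the question's sample rows.
ROWS = [(1, 5, 11), (2, 5, 11), (9, 4, 18), (8, 3, 18)]


def dense_rank(rows):
    """Rank by highest rightquestions, then lowest time; ties share a rank, no gaps."""
    # Distinct (rightquestions, time) pairs, like the GROUP BY subquery,
    # numbered in ORDER BY rightquestions DESC, time order.
    pairs = sorted({(rq, t) for _, rq, t in rows}, key=lambda p: (-p[0], p[1]))
    rank_of = {pair: n for n, pair in enumerate(pairs, start=1)}
    # Join every row back to the rank of its pair.
    return [(rq, t, rank_of[(rq, t)]) for _, rq, t in rows]


for row in dense_rank(ROWS):
    print(row)
```

This reproduces the Workbench output shown above.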

How can I group by field value?

How can I group by one field, where each group starts at the value 0?
For example:
select * from t;
id, check_id, user_name
1, 0, user_a
2, 1, user_a
3, 2, user_a
1, 0, user_a
2, 1, user_a
3, 3, user_a
1, 0, user_b
2, 1, user_b
3, 3, user_b
Desired result, grouping check_id so that each group starts at value 0:
user_name, check_info
user_a, 0-1-2
user_a, 0-1-3
user_b, 0-1-3
How can I write this GROUP BY?
Reading the question as "group by one field, starting each group at value 0", you can try this:
select user_name,group_concat(distinct check_id order by check_id asc separator '-') check_info
from (
select id,check_id,user_name,
case when check_id = 0 then
@rn := @rn+1
else
@rn := @rn
end as unique_id
from t
inner join (select @rn := 0) as tmp
order by user_name
) as tbl
group by user_name,unique_id
This starts a new group at every record with check_id = 0 and orders by user_name.
This will give you what you want... maybe. It works, but it relies on the records coming back from the table in the appropriate order, and that is NOT guaranteed.
SELECT user_name, GROUP_CONCAT(check_id ORDER BY grouping, check_id SEPARATOR '-')
FROM
(
SELECT id, check_id, user_name, @grouping:=if(id > @prev_id, @grouping, @grouping + 1) AS grouping, @prev_id:=id
FROM t
CROSS JOIN
(
SELECT @grouping:=0, @prev_id:=0
) sub0
) sub1
GROUP BY user_name, grouping
It works by returning the rows and using variables to assign a grouping to them (so when the id gets smaller it adds one to the grouping value), then does a GROUP BY on the user name and the grouping value.
But really you need to have the grouping value somehow stored with your data in advance.
Provided that id is an auto-increment field, you can use:
SELECT user_name,
GROUP_CONCAT(check_id ORDER BY check_id SEPARATOR '-') AS check_info
FROM (
SELECT id, check_id, user_name,
@grp := IF (@uname = user_name,
IF (check_id = 0, @grp + 1, @grp),
IF (@uname := user_name, @grp + 1, @grp + 1)) AS grp
FROM mytable
CROSS JOIN (SELECT @grp := 0, @uname := '') AS vars
ORDER BY id) AS t
GROUP BY user_name, grp
Variables are used to identify slices of consecutive records within each user_name partition, each slice starting at 0.
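The @grp logic is a gaps-and-islands pass: a new slice starts whenever the user changes or check_id is 0. Here is a Python sketch of the same idea, assuming the rows arrive in the order listed in the question (the sample ids repeat, so list position stands in for the auto-increment order); `check_info` is a hypothetical name.

```python
# (id, check_id, user_name) in the order listed in the question.
ROWS = [
    (1, 0, "user_a"), (2, 1, "user_a"), (3, 2, "user_a"),
    (1, 0, "user_a"), (2, 1, "user_a"), (3, 3, "user_a"),
    (1, 0, "user_b"), (2, 1, "user_b"), (3, 3, "user_b"),
]


def check_info(rows):
    """Split rows into slices that start at check_id 0 (or a new user)."""
    groups = []  # list of (user_name, [check_id, ...])
    prev_user = None  # stand-in for @uname
    for _, check_id, user_name in rows:
        # Same condition as the @grp increment: new user, or check_id = 0.
        if user_name != prev_user or check_id == 0:
            groups.append((user_name, []))
        groups[-1][1].append(check_id)
        prev_user = user_name
    # GROUP_CONCAT(check_id ORDER BY check_id SEPARATOR '-') per slice.
    return [(user, "-".join(str(c) for c in sorted(ids))) for user, ids in groups]


for row in check_info(ROWS):
    print(row)
```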

Sorting Leaderboard in MySQL

I use the following SQL to order scores by rank in my leaderboard table:
SELECT score, 1+(SELECT COUNT(*) FROM leaderboard a WHERE a.score > b.score) AS rank
FROM leaderboard b
WHERE stage=1
GROUP BY id
where my table schema is like this:
CREATE TABLE `leaderboard` (
`auto_id` int(11) NOT NULL AUTO_INCREMENT,
`score` int(11) NOT NULL DEFAULT '0',
`id` int(11) NOT NULL,
`created_on` datetime NOT NULL,
PRIMARY KEY (`auto_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Some sample data rows are as follows:
auto_id score id created_on
====================================================
1, 72023456, 1, '2014-12-30 11:49:59'
2, 1420234, 1, '2014-12-30 12:00:21'
3, 420234, 1, '2014-12-30 12:00:38'
4, 16382, 1, '2014-12-30 16:31:12'
5, 16382, 1, '2014-12-30 16:34:18'
6, 16382, 1, '2014-12-30 16:37:43'
7, 17713, 1, '2014-12-30 16:38:35'
8, 17257, 1, '2014-12-30 18:53:45'
9, 10625, 1, '2014-12-30 18:58:10'
10, 17272, 1, '2014-12-30 18:58:59'
11, 17328, 1, '2014-12-30 18:59:44'
12, 17267, 37, '2015-01-02 17:11:59'
13, 16267, 37, '2015-01-02 17:12:30'
14, 16267, 37, '2015-01-02 17:13:02'
15, 35509, 37, '2015-01-02 17:17:46'
16, 18286, 37, '2015-01-02 18:20:09'
17, 16279, 37, '2015-01-02 18:20:43'
18, 16264, 37, '2015-01-02 18:21:15'
19, 16265, 37, '2015-01-02 18:40:04'
Since id is the player's ID, I have to GROUP BY id. This gives the following result:
id score rank
=========================
1 72023456 1
37 17267 11
How can I obtain the following expected results?
id score rank
=========================
1 72023456 1
37 35509 2
The problem is that the returned score is not the player's MAX score.
Bonus: my ultimate goal is to get the entries one rank higher and one rank lower than a specific id.
As MySQL (before 8.0) does not have window functions, the query you need has to mimic their behavior, so you have to use variables.
select id, score, @rank :=@rank+1 as rank
from (
SELECT b.id, max(b.score) as score
FROM leaderboard b
GROUP BY id
order by score desc
) tab
,(select @rank := 0) r
The output will be:
id score rank
=========================
1 72023456 1
37 35509 2
Basically, I iterate over the query result and increment the variable for every row. Because the inner query has an ORDER BY, the rank follows that order. The ranking has to happen outside the grouped query, because putting the ORDER BY alongside the rank assignment will mess things up when there are more than two IDs.
EDIT: for the bonus, one rank higher and one rank lower than a specific id (not pretty, though):
select id, score, rank
from (
select tab.id, tab.score, @rank :=@rank+1 as rank
from (select @rank := 0) r,
(SELECT b.id, max(b.score) as score
FROM leaderboard b
GROUP BY id
order by score desc) tab
) spec
where spec.id=2
UNION
select id, score, rank
from (
select tab.id, tab.score, @rank :=@rank+1 as rank
from (select @rank := 0) r,
(SELECT b.id, max(b.score) as score
FROM leaderboard b
GROUP BY id
order by score desc) tab
) spec
where spec.rank=
(select rank-1
from (
select tab.id, tab.score, @rank :=@rank+1 as rank
from (select @rank := 0) r,
(SELECT b.id, max(b.score) as score
FROM leaderboard b
GROUP BY id
order by score desc) tab
) spec
where spec.id=2)
UNION
select id, score, rank
from (
select tab.id, tab.score, @rank :=@rank+1 as rank
from (select @rank := 0) r,
(SELECT b.id, max(b.score) as score
FROM leaderboard b
GROUP BY id
order by score desc) tab
) spec
where spec.rank=
(select rank+1
from (
select tab.id, tab.score, @rank :=@rank+1 as rank
from (select @rank := 0) r,
(SELECT b.id, max(b.score) as score
FROM leaderboard b
GROUP BY id
order by score desc) tab
) spec
where spec.id=2)
order by rank;
Note that you put the specific ID in the spec.id=2 clauses (I used 2 because I had to change the values in my environment to test it).
Here is the SQL Fiddle with my test, with both queries working: http://sqlfiddle.com/#!2/75047/2
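The idea behind the bonus UNION query is: rank players by their best score, find the target player's position, and take the rows on either side. Here is a plain-Python sketch of that logic using the question's sample scores; `ranked_best_scores` and `neighbours` are hypothetical names, and ties are numbered consecutively, like the @rank counter.

```python
# (score, id) pairs from the question's sample leaderboard rows.
SCORES = [
    (72023456, 1), (1420234, 1), (420234, 1), (16382, 1), (16382, 1),
    (16382, 1), (17713, 1), (17257, 1), (10625, 1), (17272, 1), (17328, 1),
    (17267, 37), (16267, 37), (16267, 37), (35509, 37), (18286, 37),
    (16279, 37), (16264, 37), (16265, 37),
]


def ranked_best_scores(scores):
    """(id, best score, rank) ordered by best score, highest first."""
    best = {}
    for score, player in scores:  # MAX(score) ... GROUP BY id
        best[player] = max(score, best.get(player, score))
    ordered = sorted(best.items(), key=lambda kv: -kv[1])
    # Consecutive numbering, like incrementing @rank once per row.
    return [(player, score, rank) for rank, (player, score) in enumerate(ordered, start=1)]


def neighbours(scores, player_id):
    """The player's row plus the rows one rank above and one rank below."""
    ranked = ranked_best_scores(scores)
    pos = next(i for i, row in enumerate(ranked) if row[0] == player_id)
    return ranked[max(pos - 1, 0):pos + 2]


print(neighbours(SCORES, 37))
```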
The reason the score isn't the max is that score is neither in the GROUP BY clause nor aggregated, so MySQL picks an indeterminate value from each group as a representative (in practice often the first row). Technically, this isn't valid SQL, and with the ONLY_FULL_GROUP_BY SQL mode (enabled by default since MySQL 5.7.5) the query is rejected. You probably want to use MAX(score) AS score.
As for the rank, since MySQL (before 8.0) doesn't support window functions, you'll have to hack something yourself. You can look at this SO post for more info. The standard ways are to use mutable variables to count the rows, or to join the query to itself using an inequality in the ON clause. Neither is very elegant.
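The self-join alternative can be sketched as follows, using Python's sqlite3 module as a stand-in for MySQL (the SQL uses a WITH clause, which MySQL supports from 8.0). Each player's rank is one plus the number of players with a strictly higher best score, so tied players share a rank. `competition_rank` is a hypothetical name.

```python
import sqlite3

# (score, id) pairs from the question's sample leaderboard rows.
SCORES = [
    (72023456, 1), (1420234, 1), (420234, 1), (16382, 1), (16382, 1),
    (16382, 1), (17713, 1), (17257, 1), (10625, 1), (17272, 1), (17328, 1),
    (17267, 37), (16267, 37), (16267, 37), (35509, 37), (18286, 37),
    (16279, 37), (16264, 37), (16265, 37),
]

RANK_SQL = """
WITH best AS (
    SELECT id, MAX(score) AS score
    FROM leaderboard
    GROUP BY id
)
SELECT b.id, b.score, COUNT(b2.id) + 1 AS rnk
FROM best b
LEFT JOIN best b2 ON b2.score > b.score  -- players strictly ahead of b
GROUP BY b.id, b.score
ORDER BY rnk, b.id
"""


def competition_rank(scores):
    """(id, best score, rank) where rank = 1 + number of better best scores."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE leaderboard (auto_id INTEGER PRIMARY KEY,"
            " score INTEGER NOT NULL, id INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO leaderboard (score, id) VALUES (?, ?)", scores)
        return conn.execute(RANK_SQL).fetchall()
    finally:
        conn.close()


print(competition_rank(SCORES))
```

Unlike the @rank counter, this does not depend on row order, but the inequality join grows quadratically with the number of players.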

MySQL alternative to LIMIT inside a subquery in MySQL 5.1.49

SELECT student_id FROM `students` AS s1
WHERE student_id IN
(SELECT s2.student_id FROM `students` AS s2
WHERE s1.year_of_birth = s2.year_of_birth
LIMIT 10)
My server can't process this query. It raises an error (ERROR 1235) saying that this version of MySQL doesn't support LIMIT inside subqueries.
Is there any solution for my version, MySQL 5.1.49?
SELECT
id,
region
FROM (
SELECT
region,
id,
@rn := CASE WHEN @prev_region = region
THEN @rn + 1
ELSE 1
END AS rn,
@prev_region := region
FROM (SELECT @prev_region := NULL) vars, ads T1
ORDER BY region, id DESC
) T2
WHERE rn <= 4
ORDER BY region, id
Thanks to Mark Byers
I think you want any ten students for each year of birth. This is a greatest-n-per-group query, and you can search Stack Overflow to see how it can be done in MySQL.
It would be easy if MySQL supported the ROW_NUMBER function (it does from MySQL 8.0 on), but since your version does not, you can emulate it using variables. For example, to get 3 students for each year of birth you could do this:
SELECT
student_id,
year_of_birth
FROM (
SELECT
year_of_birth,
student_id,
@rn := CASE WHEN @prev_year_of_birth = year_of_birth
THEN @rn + 1
ELSE 1
END AS rn,
@prev_year_of_birth := year_of_birth
FROM (SELECT @prev_year_of_birth := NULL) vars, students T1
ORDER BY year_of_birth, student_id DESC
) T2
WHERE rn <= 3
ORDER BY year_of_birth, student_id
Result:
1, 1990
2, 1990
5, 1990
4, 1991
7, 1991
8, 1991
6, 1992
Test data:
CREATE TABLE students (student_id INT NOT NULL, year_of_birth INT NOT NULL);
INSERT INTO students (student_id, year_of_birth) VALUES
(1, 1990),
(2, 1990),
(3, 1991),
(4, 1991),
(5, 1990),
(6, 1992),
(7, 1991),
(8, 1991);
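The variable-based ROW_NUMBER emulation can be traced in Python on the test data above: sort by year_of_birth and student_id descending, restart the counter when the year changes, and keep rows whose counter is at most 3. `top_n_per_year` is a hypothetical name; the final sort mirrors the outer ORDER BY.

```python
# Test data from the answer: (student_id, year_of_birth).
STUDENTS = [(1, 1990), (2, 1990), (3, 1991), (4, 1991),
            (5, 1990), (6, 1992), (7, 1991), (8, 1991)]


def top_n_per_year(students, n=3):
    """Emulate the @rn / @prev_year_of_birth numbering and keep rn <= n."""
    kept = []
    rn, prev_year = 0, None
    # Inner ORDER BY year_of_birth, student_id DESC
    for student_id, year in sorted(students, key=lambda s: (s[1], -s[0])):
        rn = rn + 1 if year == prev_year else 1  # restart per year
        prev_year = year
        if rn <= n:
            kept.append((student_id, year))
    # Outer ORDER BY year_of_birth, student_id
    return sorted(kept, key=lambda s: (s[1], s[0]))


print(top_n_per_year(STUDENTS))
```

With n=3 this yields the same rows as the Result listing; student 3 is dropped because, in descending id order, it is the fourth 1991 student.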