MySQL: Effectively group into equally-sized buckets by data

Suppose I have a table of Players, where each player has a score. I want to divide all players into levels of equal size based on their score: if I have n players, level 1 will have the n/10 players with the highest scores, level 2 will have the next n/10, and so on.
I have come up with a query:
UPDATE Players SET Level=? WHERE PlayerID IN (
SELECT * FROM (
SELECT PlayerID FROM Players ORDER BY Score DESC, PlayerID ASC LIMIT ?,?
) AS T1
);
Where I run this 10 times, with the first parameter running from 1-10, the second is 0, n/10, 2*n/10, ... and the third is always n/10.
This works, but it takes quite a long time. Trying to get a better result, I have created a temporary table:
CREATE TEMPORARY TABLE TempTable (
IDX INT UNSIGNED NOT NULL AUTO_INCREMENT,
ID INT UNSIGNED NOT NULL,
PRIMARY KEY (IDX)
) ENGINE=MEMORY;
INSERT INTO TempTable (ID) SELECT PlayerID FROM Players ORDER BY Score DESC, PlayerID ASC;
Then I run ten times:
UPDATE Players SET Level=? WHERE PlayerID IN (
SELECT * FROM TempTable WHERE IDX BETWEEN ? AND ?
);
With the appropriate parameters, and finally:
DROP TABLE TempTable;
However, this runs even slower. So is there a more efficient way to do this in MySQL? I've found this answer, but it appears NTILE is not available in MySQL.
Note: Players have an index on PlayerID (Primary key) and on Score, although running without index on Score doesn't seem to make much of a difference. The reason I sort also by PlayerID is so I have well-defined (consistent) behavior in case of ties.
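On MySQL 8.0 and later, NTILE is available as a window function, so the levels described above can be computed in one ordered pass. A minimal sketch follows. It uses Python's built-in sqlite3 module purely as a stand-in engine that also implements NTILE (this assumes the bundled SQLite is 3.25 or newer), and the 20 sample rows are made up. The table and column names are the ones from the question.

```python
import sqlite3

# In-memory stand-in for the Players table from the question (20 made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Players (PlayerID INTEGER PRIMARY KEY, Score INTEGER, Level INTEGER)"
)
conn.executemany(
    "INSERT INTO Players (PlayerID, Score) VALUES (?, ?)",
    [(pid, (pid * 37) % 101) for pid in range(1, 21)],
)

# NTILE(10) splits the ordered rows into 10 buckets of (near-)equal size.
rows = conn.execute(
    """
    SELECT PlayerID,
           NTILE(10) OVER (ORDER BY Score DESC, PlayerID ASC) AS Level
    FROM Players
    """
).fetchall()

# Write the computed levels back in one batch instead of ten separate passes.
conn.executemany(
    "UPDATE Players SET Level = ? WHERE PlayerID = ?",
    [(level, pid) for pid, level in rows],
)

# Count how many players ended up in each level.
level_sizes = dict(
    conn.execute("SELECT Level, COUNT(*) FROM Players GROUP BY Level").fetchall()
)
top_level = conn.execute(
    "SELECT Level FROM Players ORDER BY Score DESC, PlayerID ASC LIMIT 1"
).fetchone()[0]
```

With 20 rows and 10 buckets, every level receives two players, and the highest-scoring player lands in level 1.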

You could try using a ranking function. This is what I'd use:
SELECT PlayerID,
score,
@levelLimit,
@counter := @counter + 1 AS counter,
@level := IF(@counter % @levelLimit = 0, @level := @level + 1, @level) AS level
FROM Players,
(SELECT @counter := 0) a,
(SELECT @levelLimit := round(count(*)/4 -- number of groups you want to end with
, 0)
FROM Players) b,
(SELECT @level := 1) c
ORDER BY Score DESC,
PlayerID ASC
;
To update the table:
UPDATE Players join (
SELECT PlayerID,
score,
@levelLimit, @counter := @counter + 1 AS counter,
@level := IF(@counter % @levelLimit = 0, @level := @level + 1, @level) AS level
FROM Players,
(SELECT @counter := 0) a,
(SELECT @levelLimit := round(count(*)/4 -- number of clusters
, 0)
FROM Players) b,
(SELECT @level := 1) c
ORDER BY Score DESC,
PlayerID ASC
) as a on a.PlayerID = Players.PlayerID
SET Players.level = a.level
http://sqlfiddle.com/#!9/7f55f9/3
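The counter-and-level bookkeeping in the query above can be sketched outside the database to check bucket sizes. The version below is an illustration only, not the answer's exact logic: it derives the level from a zero-based position, so level sizes differ by at most one row. The function name and the sample data are hypothetical. It may be worth comparing against the modulo test in the query, where the level is bumped on the very row whose counter reaches a multiple of the limit, which can leave the first level one row short.

```python
def assign_levels(players, levels):
    """Return {player_id: level}; level 1 holds the highest scores."""
    # Same ordering as the query: Score DESC, then PlayerID ASC for ties.
    ordered = sorted(players, key=lambda p: (-p[1], p[0]))
    n = len(ordered)
    # Zero-based position i maps to level i * levels // n + 1, so level
    # sizes differ by at most one row even when n is not a multiple of levels.
    return {pid: i * levels // n + 1 for i, (pid, _score) in enumerate(ordered)}


# Hypothetical sample data: (PlayerID, Score) pairs.
sample = [(1, 50), (2, 90), (3, 90), (4, 10), (5, 70), (6, 30), (7, 60), (8, 20)]
result = assign_levels(sample, 4)
```

Here eight players split into four levels of two; players 2 and 3 tie on score and are ordered by PlayerID, as in the question.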

The reason that your query is slow is because of this limit bit at the end:
SELECT PlayerID FROM Players ORDER BY Score DESC, PlayerID ASC LIMIT ?,?
Without an offset, you would be doing one table scan in ten steps. With an offset and a limit, you are doing it several times over! To reach the offset, the whole data set has to be sorted, and only then can MySQL move to the rows of interest. My suggestion is to avoid the LIMIT clause entirely by dividing the players into levels based on their scores.
For example if you have 10 levels, you could do a simple query to get
SELECT max(score), min(score) from ...
and then split the score range into 10 equal intervals based on the difference between the highest and lowest score. If, like Stack Overflow, you have millions of users with a score of one, you can choose an arbitrary lower bound instead of the minimum.
Then:
UPDATE Players SET Level=? WHERE PlayerID IN (
SELECT * FROM (
SELECT PlayerID FROM Players WHERE Score > level_lower_bound AND Score <= level_upper_bound ) AS T1
);
You would still be working through the table in 10 steps, but the steps now add up to a single scan rather than ten.
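One thing to keep in mind with the score-range approach above is that equal score intervals do not give equal head counts. The sketch below illustrates this with made-up, heavily skewed scores. The function name and the clamping of the lowest score into the last level are assumptions for the example, not part of the answer.

```python
def level_by_score_range(score, lowest, highest, levels=10):
    """Map a score to a level using equal-width score ranges (level 1 = top range)."""
    if highest == lowest:
        return 1
    width = (highest - lowest) / levels
    # Distance from the top score decides the level; clamp so the lowest
    # score lands in the last level instead of one past it.
    return min(int((highest - score) / width) + 1, levels)


# Hypothetical scores, skewed like the "many users with a score of one" case.
scores = [1, 1, 1, 1, 1, 1, 40, 75, 100]
lowest, highest = min(scores), max(scores)
levels_assigned = [level_by_score_range(s, lowest, highest) for s in scores]
```

Six of the nine players end up in level 10 while most levels stay empty, so this trades the equal-size requirement for speed.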

Related

SQL query to count rows grouped by an ID, but limit count on each group

So I have a bit of an unusual request. I'm working with a table with billions of rows.
The table has a column 'id' which is not unique, and has a column 'data'
What I want to do is run a count on the number of rows grouped by the 'id', but limit the counting to only 150 entries. I only need to know if there are 150 rows by any given id.
This is in an effort to optimize the query and performance.
It doesn't have to be a count. I only need to know if a given id has 150 entries, without having MySQL continue counting entries during the query. If that makes sense.
I know how to count, and I know how to group, and I know how to do both, but the count will come back with a number in the millions, which is wasted processing time, and the query needs to run on hundreds of thousands of ids.

You can't really optimize performance for this -- I don't think.
select id, (count(*) >= 150)
from t
group by id;
If you happen to have a separate table with one row per id and an index on t(id), then this might be faster:
select ids.id,
((select count(*)
from t
where t.id = ids.id
) >= 150
)
from ids;
Unfortunately, MySQL does not support double nesting for correlated subqueries, so this is not possible:
select ids.id,
((select count(*)
from (select 1
from t
where t.id = ids.id
limit 150
) t
) >= 150
)
from ids;
If it were possible, this might be faster.
EDIT:
If you have an index on id and only want ids that have 150 or more, then variables might be faster:
select id,
(@rn := if(@id = id, @rn + 1,
if(@id := id, 1, 1)
)
) as rn
from (select id
from t
order by id
) t cross join
(select @id := 0, @rn := 0) params
having rn = 150;
The thinking here is that using the index to order the table, materializing, and scanning again is probably faster than group by. I don't think row_number() would have the same performance characteristics.
EDIT II:
A slight variation on the above can be used to get all ids with a flag:
select id, (max(rn) = 150)
from (select id,
(@rn := if(@id = id, @rn + 1,
if(@id := id, 1, 1)
)
) as rn
from (select id
from t
order by id
) t cross join
(select @id := 0, @rn := 0) params
having rn in (1, 150)
) t
group by id;
EDIT III:
Ahh! If you have a separate table of ids, then this might be the best approach:
select ids.id,
(select id
from t
where t.id = ids.id
limit 1 offset 149
) is not null
from ids;
This will fetch the 150th row from the index. If it is not there, then no row is returned.
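The idea behind `limit 1 offset 149` is to stop reading as soon as the 150th row is found instead of counting everything. A small sketch of the same early-exit idea in plain Python follows; `has_at_least` and `counting` are hypothetical helper names, and `counting` exists only to show how many rows were actually read.

```python
from itertools import islice

_MISSING = object()  # sentinel so that None values in the data are handled correctly


def has_at_least(rows, k):
    """True if the iterable yields at least k items; stops reading after the k-th."""
    # islice skips the first k-1 items and then tries to fetch exactly one more,
    # mirroring LIMIT 1 OFFSET k-1.
    return next(islice(rows, k - 1, k), _MISSING) is not _MISSING


def counting(n, seen):
    """Yield n rows, recording each one read in `seen`."""
    for i in range(n):
        seen.append(i)
        yield i


seen = []
found = has_at_least(counting(1_000_000, seen), 150)
rows_read = len(seen)
```

Only the first 150 of the million rows are consumed, which is the saving the offset query is after.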

I don't think that this is possible. You will have to scan the entire table to know which ids have at least 150 entries.
So:
select id
from mytable
group by id
having count(*) >= 150
With an index on id, this should be as efficient as it can be.

Limit result amount for each ID to x

I am trying to do something like this in MySQL, but without running the query multiple times (50 times, in my case) through a PHP foreach.
foreach($this->map_ids as $key => $val) {
$this->db->query("SELECT scores.profile_number, scores.score FROM scores
LEFT JOIN players ON scores.profile_number = players.profile_number
WHERE scores.map_id = {'$val'}
AND scores.profile_number IN (SELECT profile_number FROM players WHERE banned = 0) LIMIT 10");
}
This is how it looks approximately when I retrieve all scores without LIMIT.
profile score map_id
76561198026851335 2478 47455
76561198043770492 2480 47455
... ... ...
76561198043899549 1340 47452
76561198048179892 1345 47452
... ... ...
I want only 10 entries (scores) from each unique map_id.

This is surprisingly difficult to do but I've ended up using user variables to do the job, check out the following demo. Obviously my data structure is much simplified but it should be enough to get you going:
SQL Fiddle example
Here is the SQL for anyone who may be interested in skipping the demo (hideous, I know)
SELECT *
FROM (
SELECT profile_number, score, map_id
FROM (
SELECT
profile_number, score, map_id,
IF( @prev <> map_id, @rownum := 1, @rownum := @rownum+1 ) AS rank,
@prev := map_id
FROM scores
JOIN (SELECT @rownum := NULL, @prev := 0) AS r
ORDER BY map_id
) AS tmp
WHERE tmp.rank <= 10
) s
JOIN players p
ON s.profile_number = p.profile_number
Basically, what is happening is this:
ORDER BY map_id
Orders your table by map_id so that all the same ones are together.
Next we assign a rownumber to each row by using the following logic:
IF( @prev <> map_id, @rownum := 1, @rownum := @rownum+1 )
If the previous row's map_id is not equal to the current row's ID, set the row number = 1, otherwise increase the rownumber by 1.
Finally, only return the rows who have a rownumber less than or equal to 10
WHERE tmp.rank <= 10
Hope that makes it a little clearer for you.
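The row-numbering logic described above can be traced in a few lines of Python. This is only a sketch of the same idea: `prev_map` and `rownum` stand in for the two user variables, the function name and sample rows are made up, and the input is assumed to be already sorted by map_id. As in the query, which rows survive within a map depends on their order inside that map.

```python
def top_n_per_group(rows, n):
    """rows: iterable of (profile_number, score, map_id), sorted by map_id."""
    kept = []
    prev_map = None  # plays the role of the previous-map variable
    rownum = 0       # plays the role of the row-number variable
    for profile, score, map_id in rows:
        # Restart numbering when the map changes, otherwise keep counting.
        rownum = 1 if map_id != prev_map else rownum + 1
        prev_map = map_id
        if rownum <= n:
            kept.append((profile, score, map_id))
    return kept


# Hypothetical sample rows, grouped by map_id.
rows = [
    (101, 2478, 47452),
    (102, 2480, 47452),
    (103, 2500, 47452),
    (201, 1340, 47455),
]
top = top_n_per_group(rows, 2)
```

The third row of map 47452 is dropped, while map 47455 keeps its single row.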

You can use the limit directive.
SELECT * FROM `your_table` LIMIT 0, 10
This will display the first 10 results from the database.
SELECT * FROM `your_table` LIMIT 5, 5
This will show records 6, 7, 8, 9, and 10

Getting latest rows in MySQL based on date (grouped by another column)

This type of question is asked every now and then. The queries provided work, but they hurt performance.
I have tried the JOIN method:
SELECT *
FROM nbk_tabl
INNER JOIN (
SELECT ITEM_NO, MAX(REF_DATE) as LDATE
FROM nbk_tabl
GROUP BY ITEM_NO) nbk2
ON nbk_tabl.REF_DATE = nbk2.LDATE
AND nbk_tabl.ITEM_NO = nbk2.ITEM_NO
And the tuple one (way slower):
SELECT *
FROM nbk_tabl
WHERE REF_DATE IN (
SELECT MAX(REF_DATE)
FROM nbk_tabl
GROUP BY ITEM_NO
)
Is there any other performance friendly way of doing this?
EDIT: To be clear, I'm applying this to a table with thousands of rows.

Yes, there is a faster way.
select *
from nbk_tabl
order by ref_date desc
limit <n>
Where <n> is the number of rows that you want to return.
Hold on. I see you are trying to do this for a particular item. You might try this:
select *
from nbk_tabl n
where ref_date = (select max(ref_date) from nbk_tabl n2 where n.item_no = n2.item_no)
It might optimize better than the "in" version.

Also in MySQL you can use user variables (Suppose nbk_tabl.Item_no<>0):
select *
from (
select nbk_tabl.*,
@i := if(@ITEM_NO = ITEM_NO, @i + 1, 1) as row_num,
@ITEM_NO := ITEM_NO as t_itemNo
from nbk_tabl,(select @i := 0, @ITEM_NO := 0) t
order by Item_no, REF_DATE DESC
) as x where x.row_num = 1;
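What the queries above compute, the newest row per ITEM_NO, can be sketched as a single pass that remembers the latest row seen for each item. The function name and sample rows below are hypothetical. Unlike the JOIN on MAX(REF_DATE), this sketch keeps only one row per item when several share the latest date.

```python
from datetime import date


def latest_per_item(rows):
    """rows: iterable of (item_no, ref_date, payload); returns {item_no: row}."""
    latest = {}
    for row in rows:
        item_no, ref_date = row[0], row[1]
        # Keep the row only if it is the first, or strictly newer than the stored one.
        if item_no not in latest or ref_date > latest[item_no][1]:
            latest[item_no] = row
    return latest


# Hypothetical sample rows: (ITEM_NO, REF_DATE, some other column).
rows = [
    ("A", date(2020, 1, 1), "old"),
    ("A", date(2020, 3, 1), "new"),
    ("B", date(2019, 5, 5), "only"),
]
latest = latest_per_item(rows)
```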

getting the ranking of the rows in mysql ORDER BY statements

suppose I have
SELECT * FROM t ORDER BY j
is there a way to specify the query to also return an autoincremented column that go along with the results that specifies the rank of that row in terms of the ordering?
also this column should also work when using ranged LIMITs, eg
SELECT * FROM t ORDER BY j LIMIT 10,20
should have the autoincremented column return 11,12,13,14 etc....

Oracle, MSSQL etc support ranking functions that do exactly what you want, unfortunately, MySQL has some catching up to do in this regard.
The closest I've ever been able to get to approximating ROW_NUMBER() OVER() in MySQL is like this:
SELECT t.*,
@rank := @rank + 1 AS rank
FROM t, (SELECT @rank := 0) r
ORDER BY j
I don't know how that would rank using ranged LIMIT unless you used that in a subquery perhaps (although performance may suffer with large datasets)
SELECT t2.*
FROM (
SELECT t.*,
@rank := @rank + 1 AS rank
FROM t, (SELECT @rank := 0) r
ORDER BY j
) t2
LIMIT 10,20
The other option would be to create a temporary table,
CREATE TEMPORARY TABLE myRank
(
`rank` INT(11) NOT NULL AUTO_INCREMENT,
`id` INT(11) NOT NULL,
PRIMARY KEY (`rank`, `id`)
);
INSERT INTO myRank (id)
SELECT T.id
FROM T
ORDER BY j;
SELECT T.*, R.`rank`
FROM T
INNER JOIN myRank R
ON T.id = R.id
ORDER BY R.`rank`
LIMIT 10,20;
Of course, the temporary table would need to be persisted between calls.
I wish there was a better way, but without ROW_NUMBER() you must resort to some hackery to get the behavior you want.
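The reason the subquery form is needed is that the rank has to be assigned over the full ordering before the page is cut out. A short sketch of that order of operations follows, with hypothetical names and data. Note that `LIMIT 10,20` means skip 10 rows and return 20, so the ranks on that page run from 11 to 30.

```python
def ranked_page(rows, key, offset, count):
    """Return (rank, row) pairs for one page of the ordered rows."""
    ordered = sorted(rows, key=key)
    # Number the full ordering first, starting at 1 ...
    ranked = list(enumerate(ordered, start=1))
    # ... then cut out the page, like LIMIT offset, count on the outer query.
    return ranked[offset:offset + count]


# Hypothetical data: the values 1..100 in reverse order, ranked ascending.
page = ranked_page(list(range(100, 0, -1)), key=lambda v: v, offset=10, count=20)
```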

MySQL update statement to store ranking positions

I'm trying to get my head around a query and I just can't figure it out. I would appreciate if someone give me a pointer. As a simple example of what I'm trying to achieve, I have these records in the database
Score|Ranking
-------------
100 |0
200 |0
300 |0
And I would like the Ranking field to contain 1,2,3 based on who's got the highest score so the result should be:
Score|Ranking
-------------
100 |3
200 |2
300 |1
At the moment, I'm doing a for next loop for all these records but given that in reality that could be a few thousand - that could take forever! Does anyone have an idea on a magic query which would do this in one go?

Here's a way to do it:
SET @r=0;
UPDATE table SET Ranking= @r:= (@r+1) ORDER BY Score DESC;
/* use this if you just want to pull it from the db, but don't update anything */
SET @r=0;
SELECT *, @r:= (@r+1) as Ranking FROM table ORDER BY Score DESC;

In MySQL, you can emulate row_number with a user variable.
Here's an example of using it in a SELECT:
select @rownum:=@rownum+1 `rank`, p.*
from player p, (SELECT @rownum:=0) r
order by score desc;
If you INSERT INTO using a SELECT like this, you will get your rankings.

This creates an inline update statement that ranks your players by incrementing the variable @rc. I've used it many times in very similar cases; it works well and keeps it all on the DB side.
SET @rc = 0;
UPDATE players JOIN (SELECT @rc := @rc + 1 AS rank, id FROM players ORDER BY score DESC)
AS `order` USING(id) SET players.rank = `order`.rank;
id is assumed to be the primary key for your players table.

If you are using MySQL 8, you can use the new RANK() function:
SELECT
score,
RANK() OVER (
ORDER BY score DESC
) ranking
FROM
table;
Depending on how you want to rank tied scores, you can also check out DENSE_RANK().
And as an UPDATE:
WITH
ranking AS(
SELECT
score,
RANK() OVER (
ORDER BY score DESC
) ranking
FROM
table
)
UPDATE
table,
ranking r
SET
table.ranking = r.ranking
WHERE
table.score = r.score
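To make the difference between RANK() and DENSE_RANK() on tied scores concrete, here is a small sketch that computes both for a list of scores. The function name and sample scores are made up. It only mirrors the ranking rules, not how MySQL evaluates them.

```python
def rank_and_dense_rank(scores):
    """Return (score, rank, dense_rank) tuples for scores sorted descending."""
    out = []
    rank = dense = 0
    prev = None
    for position, score in enumerate(sorted(scores, reverse=True), start=1):
        if score != prev:
            rank = position  # RANK jumps to the row position, leaving gaps after ties
            dense += 1       # DENSE_RANK increases by exactly one per distinct score
            prev = score
        out.append((score, rank, dense))
    return out


ranked = rank_and_dense_rank([100, 200, 200, 300])
```

The two players on 200 share rank 2 either way; the player on 100 is then ranked 4 by RANK and 3 by DENSE_RANK.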

SET @r = 0;
UPDATE players JOIN (SELECT @r := @r + 1 AS rank, id FROM players ORDER BY score DESC)
AS sorted USING(id) SET players.rank = sorted.rank;

i'm showing you my way of doing it [for interval sql update functions]
select:
set @currentRank = 0,
@lastRating = null,
@rowNumber = 1;
select
*,
@currentRank := if(@lastRating = `score`, @currentRank, @rowNumber) `rank`,
@rowNumber := @rowNumber + if(@lastRating = `score`, 0, 1) `rowNumber`,
@lastRating := `score`
from `table`
order by `score` desc
update:
set @currentRank = 0,
@lastRating = null,
@rowNumber = 1;
update
`table` r
inner join (
select
`primaryID`,
@currentRank := if(@lastRating = `score`, @currentRank, @rowNumber) `rank`,
@rowNumber := @rowNumber + if(@lastRating = `score`, 0, 1) `rowNumber`,
@lastRating := `score`
from `table`
order by `score` desc
) var on
var.`primaryID` = r.`primaryID`
set
r.`rank` = var.`rank`
i did not make any performance checks on this one except for testing that it works
