I have a Leaderboard that looks like this:
|--------------------------------------|
| userId | allTimePoints | allTimeRank |
|--------------------------------------|
| .. | ... | ... |
| xx | 5555555 | ? |
| .. | ... | ... |
----------------------------------------
Let's assume the table has a million records, and that allTimePoints is updated constantly. When a user asks to see the Leaderboard, I'd like to show them their rank and score, as well as their closest competitors. I'd like to achieve the following:
figure out the rank of each user (sort table by allTimePoints DESC)
figure out paging offset so that leaderboard viewer is in the middle of reduced resultset
do it within an acceptable runtime (e.g. create the perception of an instant response even if hundreds of thousands of other users are hitting the Leaderboard screen at the same time)
I've started with the query below, which takes about 0.4 s on my machine when the table has 1 million rows.
SET @rowIndex := 0;
SET @rank := 0;
SET @prev := NULL;
SET @userIdPosition := 0;
SELECT
@rowIndex := @rowIndex+1 AS rowIndex,
userId,
@rank := IF(@prev=allTimePoints, @rank, @rank+1) AS rank,
@prev := allTimePoints AS allTimePoints,
@userIdPosition := IF(userId=1860, @rowIndex, @userIdPosition) AS requestedOffset
FROM Leaderboard
ORDER BY allTimePoints DESC;
By the way, the runtime benefit of this method over a self-join is described here (it's much faster): http://code.openark.org/blog/mysql/sql-ranking-without-self-join
I keep rowIndex and rank as separate variables, so that I can calculate the requesting user's paging offset more accurately if there are rank ties (i.e., n users have same score).
So far so good, although I fear that if this doesn't come down to a runtime of milliseconds, it won't be viable when hundreds of thousands of users run the query simultaneously.
To make matters worse, if I expand the query to handle paging as described above, the runtime increases to 1.5 s:
SET @rowIndex := 0;
SET @rank := 0;
SET @prev := NULL;
SET @userIdPosition := 0;
SELECT sortedL.userId, sortedL.rank, sortedL.allTimePoints
FROM
(SELECT
@rowIndex := @rowIndex+1 AS rowIndex,
userId,
@rank := IF(@prev=allTimePoints, @rank, @rank+1) AS rank,
@prev := allTimePoints AS allTimePoints,
@userIdPosition := IF(userId=1860, @rowIndex, @userIdPosition) AS requestedOffset
FROM Leaderboard
ORDER BY allTimePoints DESC) AS sortedL
-- simulate paging, as LIMIT doesn't seem to accept variables
WHERE sortedL.rowIndex > sortedL.requestedOffset -15 AND sortedL.rowIndex < sortedL.requestedOffset + 15;
This returns 29 users and the requesting user is in the middle, as desired.
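On MySQL 8.0 or later, window functions can produce the same row index, rank and ±14-row window without depending on the evaluation order of user variables. The sketch below runs the SQL through Python's built-in sqlite3 module (SQLite 3.25 or later also supports window functions) as a stand-in for MySQL; the function name neighbours and the sample data are hypothetical. The rank column is named rnk because RANK is a reserved word in MySQL 8.0. This does not by itself solve the scaling concern, since the whole table is still ranked on every call.

```python
import sqlite3


def neighbours(points, user_id, half_window=14):
    """Return (userId, rank, allTimePoints) rows centred on user_id.

    points maps userId -> allTimePoints (hypothetical sample data).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Leaderboard (userId INTEGER PRIMARY KEY, allTimePoints INTEGER)"
    )
    con.execute("CREATE INDEX idx_points ON Leaderboard (allTimePoints)")
    con.executemany("INSERT INTO Leaderboard VALUES (?, ?)", points.items())
    sql = """
        WITH sortedL AS (
            SELECT userId,
                   allTimePoints,
                   -- unique position; ties broken by userId so the window is stable
                   ROW_NUMBER() OVER (ORDER BY allTimePoints DESC, userId) AS rowIndex,
                   -- competition rank; tied scores share a rank
                   RANK() OVER (ORDER BY allTimePoints DESC) AS rnk
            FROM Leaderboard
        )
        SELECT s.userId, s.rnk, s.allTimePoints
        FROM sortedL AS s
        JOIN sortedL AS me ON me.userId = ?
        WHERE s.rowIndex BETWEEN me.rowIndex - ? AND me.rowIndex + ?
        ORDER BY s.rowIndex
    """
    return con.execute(sql, (user_id, half_window, half_window)).fetchall()


if __name__ == "__main__":
    board = {uid: 1000 - uid for uid in range(1, 101)}
    rows = neighbours(board, 50)
    print(len(rows), rows[14])  # the requesting user sits in the middle of 29 rows
```

Near the top of the board the window is simply truncated, so fewer than 29 rows come back.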
If I run this with EXPLAIN, I can see that the subquery is using a FILESORT, but the results are not indexed, and hence the outer SELECT is forced to do yet another full scan of the resultset using WHERE (slower than FILESORT).
Question (1): how can I optimize this?
Another idea was to store the ranking in an indexed column, allTimeRank. I thought I'd experiment with re-ranking the table in a procedure on a schedule (say, every 10 minutes) and then offer very quick access through a simpler SELECT that uses the index. I haven't managed to get this to work properly: it doesn't seem to honour the condition in my WHERE clause (the ranking stored in allTimeRank is incorrect), and MySQL complains, so I have to turn off safe updates to get it to run at all.
SET SQL_SAFE_UPDATES=0;
SET @rowIndex := 0;
SET @rank := 0;
SET @prev := NULL;
SET @userIdPosition := 0;
UPDATE Leaderboard L,
(SELECT
@rowIndex := @rowIndex+1 AS rowIndex,
userId,
@rank := IF(@prev=allTimePoints, @rank, @rank+1) AS rank,
@prev := allTimePoints AS allTimePoints,
@userIdPosition := IF(userId=1860, @rowIndex, @userIdPosition) AS requestedOffset
FROM Leaderboard
ORDER BY allTimePoints DESC) AS sortedL
SET L.allTimeRank = sortedL.rank
WHERE sortedL.userId = L.userId;
SET SQL_SAFE_UPDATES=1;
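For the scheduled re-rank, one way to avoid the evaluation-order problem is to compute every rank in a single SELECT with RANK() and then write the results back by primary key. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; recompute_ranks is a hypothetical helper, and the single-statement MySQL 8.0 form quoted in its docstring is an untested assumption.

```python
import sqlite3


def recompute_ranks(con):
    """Recompute allTimeRank for every row (hypothetical scheduled job).

    A possible MySQL 8.0 single-statement form (untested assumption):
      UPDATE Leaderboard L
      JOIN (SELECT userId,
                   RANK() OVER (ORDER BY allTimePoints DESC) AS r
            FROM Leaderboard) s ON s.userId = L.userId
      SET L.allTimeRank = s.r;
    """
    # All ranks are computed in one pass before any row is modified.
    new_ranks = con.execute(
        "SELECT RANK() OVER (ORDER BY allTimePoints DESC), userId FROM Leaderboard"
    ).fetchall()
    # Write each rank back by primary key.
    con.executemany("UPDATE Leaderboard SET allTimeRank = ? WHERE userId = ?", new_ranks)
    con.commit()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Leaderboard (userId INTEGER PRIMARY KEY, "
        "allTimePoints INTEGER, allTimeRank INTEGER)"
    )
    con.executemany(
        "INSERT INTO Leaderboard (userId, allTimePoints) VALUES (?, ?)",
        [(1, 50), (2, 80), (3, 80), (4, 20), (5, 90)],
    )
    recompute_ranks(con)
    print(con.execute("SELECT userId, allTimeRank FROM Leaderboard ORDER BY userId").fetchall())
```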
Question (2): how do I make the WHERE condition work?
This has taken anywhere from 12 seconds to 2 minutes to run, and I'm not sure why it is so inconsistent. In any case, it will block UPDATEs from users who are winning points, giving the impression that the app has hung. Question (3): is there a workaround?
First thing, you are not computing Rank correctly. If there are three players: Britney(100 pts), Rachel(100 pts), and Susan(75 pts), then Britney and Rachel each have a rank of 1, and Susan should have a rank of 3. Your routine would give Susan a rank of 2.
Second, when players have the same score (and rank), they should be displayed in a consistent order. The order within tied scores/ranks should be the order in which each player attained that score.
I would add two columns to the table, allTimeRank and allTimeRankOrder, and update them in real time every time the points change. Note that if my score goes from 100 to 125, the only users who need to be re-ranked are those with scores from 100 to 124: just the people I jumped over.
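The two ranking schemes correspond to SQL's RANK() (tied players share a rank and a gap follows) and DENSE_RANK() (no gap), which is what the variable-based query computes. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL 8.0; attainedAt is a hypothetical column recording when each player reached their score, used only to order tied players.

```python
import sqlite3


def leaderboard(players):
    """players: list of (name, points, attainedAt) tuples (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Leaderboard (name TEXT, allTimePoints INT, attainedAt INT)")
    con.executemany("INSERT INTO Leaderboard VALUES (?, ?, ?)", players)
    return con.execute(
        """
        SELECT name,
               -- ties share a rank, then a gap follows
               RANK()       OVER (ORDER BY allTimePoints DESC) AS rnk,
               -- ties share a rank, no gap (what the @rank variable produces)
               DENSE_RANK() OVER (ORDER BY allTimePoints DESC) AS dense_rnk
        FROM Leaderboard
        -- among ties, whoever reached the score first is listed first
        ORDER BY allTimePoints DESC, attainedAt
        """
    ).fetchall()


if __name__ == "__main__":
    print(leaderboard([("Britney", 100, 2), ("Rachel", 100, 1), ("Susan", 75, 3)]))
    # [('Rachel', 1, 1), ('Britney', 1, 1), ('Susan', 3, 2)]
```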
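The claim that only the jumped-over players need re-ranking can be checked on a small board. This sketch (Python's sqlite3 as a stand-in, hypothetical names and scores) ranks everyone from scratch with RANK() before and after one score increase and reports whose rank changed; it verifies the idea rather than implementing the incremental procedure.

```python
import sqlite3


def ranks(con):
    """Map userId -> competition rank, computed from scratch with RANK()."""
    return dict(con.execute(
        "SELECT userId, RANK() OVER (ORDER BY allTimePoints DESC) FROM Leaderboard"
    ))


def changed_after_raise(points, user, new_points):
    """Return the set of users whose rank changes when `user` moves up to new_points."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Leaderboard (userId TEXT PRIMARY KEY, allTimePoints INT)")
    con.executemany("INSERT INTO Leaderboard VALUES (?, ?)", points.items())
    before = ranks(con)
    con.execute("UPDATE Leaderboard SET allTimePoints = ? WHERE userId = ?",
                (new_points, user))
    after = ranks(con)
    return {u for u in before if before[u] != after[u]}


if __name__ == "__main__":
    board = {"ann": 150, "bob": 125, "cat": 110, "dan": 100, "eve": 100, "fay": 90}
    # eve goes from 100 to 125, jumping cat (110) and dan (tied with her at 100)
    print(sorted(changed_after_raise(board, "eve", 125)))  # ['cat', 'dan', 'eve']
```

Only the mover and the players with old score <= points < new score change rank; bob, already at 125, simply becomes tied with eve.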
Here is a routine to do it. It assumes points always go up, never down. I don't have a million row table to test with, but if you have the right indexes set up I hope it will run pretty fast.
DELIMITER $$
CREATE PROCEDURE `updateUserPoints`(IN `puserid` VARCHAR(10), IN `pnewPoints` INT)
BEGIN
/* current position of the player whose score is changing */
SET @currPoints = 0;
SET @currRank = 0;
SET @currRankOrder = 0;
SELECT allTimePoints, allTimeRank, allTimeRankOrder INTO @currPoints, @currRank, @currRankOrder FROM Leaderboard WHERE userid = puserid;
SET @newRank = 0;
SET @newRankOrder = 0;
/* if someone already has my new score, I join their rank at the end of its order */
SELECT max(allTimeRank), max(allTimeRankOrder)+1 INTO @newRank, @newRankOrder FROM Leaderboard WHERE allTimePoints = pnewPoints;
IF (@newRank IS NULL) THEN
/* otherwise I take the rank of the highest-ranked player below my new score */
SET @newRank = (SELECT min(allTimeRank) FROM Leaderboard WHERE allTimePoints < pnewPoints);
SET @newRankOrder = 0;
END IF;
UPDATE Leaderboard
SET allTimePoints = pnewPoints,
allTimeRank = @newRank,
allTimeRankOrder = @newRankOrder
WHERE userid = puserid;
/* all the people that I was tied with, but ahead in order,
slide up one in the order */
UPDATE Leaderboard
SET allTimeRankOrder = allTimeRankOrder - 1
WHERE allTimeRank = @currRank
AND allTimeRankOrder > @currRankOrder;
/* did I jump anyone? Their rank goes down. */
UPDATE Leaderboard
SET allTimeRank = allTimeRank + 1
WHERE userid <> puserid
AND allTimePoints >= @currPoints
AND allTimePoints < pnewPoints;
END$$
DELIMITER ;
Question:
I looked at various other examples of incrementing over rows, but all of them produced the same wrong output. The problem I encountered was that my code did not correctly increment over the rows to build a per-episode row index for each new row in the result set (highlighted in red below).
My first try was:
SET @ep_1 = "Peaky Blinders";
SET @curRow_1 = 0;
SELECT
DATE_FORMAT(created_at, "%Y%m%d") AS year_month_day,
@curRow_1 := @curRow_1 + 1 AS row_number,
@ep_1 AS episode_title,
COUNT(id) AS episode_plays
FROM netflix.episode_plays
WHERE
episode_id = "xyz"
AND created_at >= "2019-07-01" AND created_at <= "2019-07-07"
GROUP BY 1;
Besides the rows not incrementing correctly, I also got the following error when I tried setting some variables at the beginning of my code:
Error running query: Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_general_ci,IMPLICIT) for operation '='
(Note: I have no affiliation with Netflix, I just used Netflix dummy data to answer my question)
I broke down my question in various sections and got to the final answer below.
The most important part was to wrap the initial result sets in subqueries and then select the data from the derived tables x1, x2, etc., so that the row counter is applied to the already-grouped rows.
The second part of the question was how to combine multiple datasets (in my case: how to do this not only for one specific Netflix episode but for multiple episodes). I settled on the UNION ALL clause.
In the first iteration I hard-coded the dates; afterwards I found DATE_ADD with an INTERVAL very helpful.
Finally, I fixed the collation error by adding COLLATE utf8_unicode_ci to the string literals assigned to my variables.
If you find mistakes in my code or have any other suggestions, please feel free to suggest them.
-- SET DATA
-- variables for table x1
SET @ep_1 = "Peaky Blinders" COLLATE utf8_unicode_ci;
SET @id_1 = (SELECT id FROM netflix.episodes WHERE episode_title = @ep_1);
SET @date_1 = (SELECT created_at FROM netflix.episodes WHERE episode_title = @ep_1);
SET @curRow_1 = 0;
-- variables for table x2
SET @ep_2 = "Brooklyn Nine-Nine" COLLATE utf8_unicode_ci;
SET @id_2 = (SELECT id FROM netflix.episodes WHERE episode_title = @ep_2);
SET @date_2 = (SELECT created_at FROM netflix.episodes WHERE episode_title = @ep_2);
SET @curRow_2 = 0;
-- QUERY DATA
SELECT
x1.year_month_day,
@curRow_1 := @curRow_1 + 1 AS row_number,
x1.episode_title,
x1.episode_plays
FROM (
SELECT
DATE_FORMAT(created_at, "%Y%m%d") AS year_month_day,
@ep_1 AS episode_title,
COUNT(id) AS episode_plays
FROM netflix.episode_plays
WHERE
episode_id = @id_1
AND created_at >= @date_1 AND created_at <= DATE_ADD(@date_1, INTERVAL 7 DAY)
GROUP BY 1) x1
UNION ALL
SELECT
x2.year_month_day,
@curRow_2 := @curRow_2 + 1 AS row_number,
x2.episode_title,
x2.episode_plays
FROM (
SELECT
DATE_FORMAT(created_at, "%Y%m%d") AS year_month_day,
@ep_2 AS episode_title,
COUNT(id) AS episode_plays
FROM netflix.episode_plays
WHERE
episode_id = @id_2
AND created_at >= @date_2 AND created_at <= DATE_ADD(@date_2, INTERVAL 7 DAY)
GROUP BY 1) x2;
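On MySQL 8.0 or later, ROW_NUMBER() with PARTITION BY restarts the counter for each episode, which would remove the need for one variable and one UNION ALL branch per episode. A minimal sketch using Python's built-in sqlite3 module as a stand-in: the single episode_plays table with an episode_title column is a hypothetical simplification of the netflix schema above, strftime plays the role of MySQL's DATE_FORMAT, and the per-episode seven-day window is left out.

```python
import sqlite3


def daily_plays(plays):
    """plays: list of (episode_title, created_at) rows (simplified, hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE episode_plays "
        "(id INTEGER PRIMARY KEY, episode_title TEXT, created_at TEXT)"
    )
    con.executemany(
        "INSERT INTO episode_plays (episode_title, created_at) VALUES (?, ?)", plays
    )
    sql = """
        SELECT episode_title,
               year_month_day,
               -- numbering restarts at 1 for every episode
               ROW_NUMBER() OVER (PARTITION BY episode_title
                                  ORDER BY year_month_day) AS row_num,
               plays
        FROM (
            -- aggregate first, then number the grouped rows
            SELECT episode_title,
                   strftime('%Y%m%d', created_at) AS year_month_day,
                   COUNT(id) AS plays
            FROM episode_plays
            GROUP BY episode_title, strftime('%Y%m%d', created_at)
        ) AS daily
        ORDER BY episode_title, year_month_day
    """
    return con.execute(sql).fetchall()


if __name__ == "__main__":
    print(daily_plays([
        ("Peaky Blinders", "2019-07-01 10:00:00"),
        ("Peaky Blinders", "2019-07-01 12:00:00"),
        ("Peaky Blinders", "2019-07-02 09:00:00"),
        ("Brooklyn Nine-Nine", "2019-07-03 08:00:00"),
    ]))
```

The alias row_num is used instead of row_number because ROW_NUMBER is a reserved word in MySQL 8.0.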
I'm looking for a "clean" way to do this (not 3..n cross JOINs). I just want to know whether it's possible in SQL; if not, I'll go for another solution.
I'll use numbers instead of dates for simplicity.
I have n rows with n tasks and n items:
task item start end
1 1 1 5
1 2 2 6
1 3 0 4
1 4 8 10
In this case I want to merge the overlapping ranges, taking min(start) and max(end) of each group of overlapping rows, so the result would be:
task item start end
1 1,2,3 0 6
1 4 8 10
Any ideas on how to solve this in SQL? It's something of a challenge; if it can't be done this way, I'll switch to Python.
Thank you
This is similar to the problem I answered here, and to similar data "island" problems. However, it is more complicated in your case, as identifying the "islands" requires more than just the immediately preceding record.
It will end up looking something like this:
SET @iEnd = -1; /* init value should be something you don't expect to see */
SET @task = -1; /* init value should be something you don't expect to see */
SET @isNewIsland = 0; /* init value doesn't actually matter */
SET @i = 0;
SELECT islandNum
, GROUP_CONCAT(item ORDER BY item) AS items
, MIN(start) AS iStart
, MAX(end) AS iEnd
FROM (
SELECT @isNewIsland := IF(@task <> task OR start > @iEnd, 1, 0)
, @task := task, item, start, end
, @i := IF(@isNewIsland = 1, @i + 1, @i) AS islandNum
/* track the furthest end seen so far in the current island */
, @iEnd := IF(@isNewIsland = 1, end, GREATEST(end, @iEnd))
FROM ( /* Session(@) variables evaluation can be a bit unpredictable
the subquery helps guarantee ordering before evaluation */
SELECT task, item, start, end
FROM theTable
ORDER BY task, start, end
) AS subQ
) AS subQ2
GROUP BY islandNum;
Some people are not fond of needing the separate, preceding SET statements; to avoid them, replace the inner ) AS subQ with
) AS subQ, (SELECT @iEnd := -1, @task := -1, @isNewIsland := 0, @i := 0) AS sInit
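On MySQL 8.0 or later, the same islands can be found with window functions instead of session variables: a running MAX(end) over the preceding rows decides whether a row starts a new island, and a running SUM of those flags numbers the islands. This sketch uses Python's built-in sqlite3 module as a stand-in and does the final item concatenation in Python rather than relying on an ordered GROUP_CONCAT; merge_overlaps is a hypothetical name. As in the variable-based query, ranges that merely touch (start equal to the previous end) are merged.

```python
import sqlite3


def merge_overlaps(rows):
    """rows: (task, item, start, end) tuples; returns (task, items, start, end) per island."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE theTable (task INT, item INT, start INT, "end" INT)')
    con.executemany("INSERT INTO theTable VALUES (?, ?, ?, ?)", rows)
    numbered = con.execute(
        """
        WITH ordered AS (
            SELECT task, item, start, "end",
                   -- furthest end seen so far in this task, excluding the current row
                   MAX("end") OVER (PARTITION BY task ORDER BY start, "end", item
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                       AS prev_max_end
            FROM theTable
        ),
        flagged AS (
            SELECT task, item, start, "end",
                   -- a row starts a new island if it begins after everything before it ended
                   CASE WHEN prev_max_end IS NULL OR start > prev_max_end
                        THEN 1 ELSE 0 END AS is_new
            FROM ordered
        )
        SELECT task,
               SUM(is_new) OVER (PARTITION BY task ORDER BY start, "end", item
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                   AS island,
               item, start, "end"
        FROM flagged
        """
    ).fetchall()
    # Collapse each (task, island) into its items, min(start) and max(end).
    islands = {}
    for task, island, item, start, end in numbered:
        items, lo, hi = islands.get((task, island), ([], start, end))
        islands[(task, island)] = (items + [item], min(lo, start), max(hi, end))
    return [
        (task, ",".join(str(i) for i in sorted(items)), lo, hi)
        for (task, _), (items, lo, hi) in sorted(islands.items())
    ]


if __name__ == "__main__":
    print(merge_overlaps([(1, 1, 1, 5), (1, 2, 2, 6), (1, 3, 0, 4), (1, 4, 8, 10)]))
    # [(1, '1,2,3', 0, 6), (1, '4', 8, 10)]
```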
I have a table with columns like this:
id (int auto inc primary), title (varchar), batch (int)
Each batch normally has 3 items, but the batch numbers are not in sequence.
Here is the sample data:
1,a,5
2,b,5
3,c,5
4,d,7
5,e,7
6,f,7
7,g,10
8,h,10
Is there a query to renumber those batches like an auto-increment (ordered by batch)? So batch 5 becomes 1, batch 7 becomes 2, batch 10 becomes 3, and so on.
TL;DR: I want to update sequence per batch, not per row
Thanks for any help.
You can do it like this:
UPDATE t
JOIN (
SELECT
t.*,
@newbatch := IF (@rc % 3 = 0, @newbatch + 1, @newbatch) AS newbatch,
@rc := @rc + 1
FROM t
, (SELECT @rc := 0, @newbatch := 0) var_init_subquery
) sq ON t.id = sq.id
SET t.batch = newbatch;
See it working live in an SQL Fiddle.
If you have questions about how this works, please read this manual entry about user defined variables first.
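The counter above assumes exactly three rows per batch (it advances on every third row) and relies on the physical row order. On MySQL 8.0 or later, DENSE_RANK() over batch numbers each distinct batch value consecutively regardless of its size. A sketch using Python's built-in sqlite3 module as a stand-in; renumber_batches is a hypothetical helper. In MySQL the two steps could probably be combined into UPDATE t JOIN (SELECT id, DENSE_RANK() OVER (ORDER BY batch) AS newbatch FROM t) sq ON t.id = sq.id SET t.batch = sq.newbatch, though that form is untested here.

```python
import sqlite3


def renumber_batches(rows):
    """rows: (id, title, batch) tuples; returns rows with batches renumbered 1, 2, 3, ..."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, title TEXT, batch INT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # DENSE_RANK gives every distinct batch value the next consecutive number,
    # however many rows that batch contains.
    mapping = con.execute("SELECT DENSE_RANK() OVER (ORDER BY batch), id FROM t").fetchall()
    con.executemany("UPDATE t SET batch = ? WHERE id = ?", mapping)
    return con.execute("SELECT id, title, batch FROM t ORDER BY id").fetchall()


if __name__ == "__main__":
    sample = [(1, "a", 5), (2, "b", 5), (3, "c", 5), (4, "d", 7),
              (5, "e", 7), (6, "f", 7), (7, "g", 10), (8, "h", 10)]
    print(renumber_batches(sample))
```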
table
create table tst(locationId int,
scheduleCount tinyint(1) DEFAULT 0,
displayFlag tinyint(1) DEFAULT 0);
INSERT INTO tst(locationId,scheduleCount)
values(5,0),(2,0),(5,1),(5,2),(2,1),(2,2);
I update multiple rows and multiple columns with one query, but I want to change one of the columns only for the first row and keep that column unchanged for the other rows.
I want to update all the rows with a given locationId, setting displayFlag to 1, and increment scheduleCount by 1 for only the top entry; the rest should remain the same.
Query:
update tst,(select @rownum:=0) r,
set tst.displayFlag =1,
scheduleCount = (CASE WHEN @rownum=0
then scheduleCount+1
ELSE scheduleCount
END),
@rownum:=1 where locationId = 5
But it gives an error and does not set the user-defined variable @rownum. I am able to join the tables in a SELECT and change the value of @rownum; is there any other way to update the values?
I'm not sure this is the correct way of doing such a thing, but it is possible to include the user variable logic in the CASE condition:
UPDATE tst
JOIN (SELECT @first_row := 1) r
SET tst.displayFlag = 1,
scheduleCount = CASE
WHEN @first_row = 1 AND ((@first_row := 0) OR TRUE) THEN scheduleCount+1
ELSE scheduleCount
END
WHERE locationId = 5;
I have used a @first_row flag as this is more in line with your initial attempt.
The CASE works as follows:
On the first row @first_row = 1, so the second part of the WHEN after AND is processed, setting @first_row := 0. Unfortunately for us, the assignment returns 0, hence the OR TRUE to ensure the condition as a whole is TRUE. Thus scheduleCount + 1 is used.
On the second row @first_row != 1, so the condition is FALSE, the second part of the WHEN after AND is not processed, and ELSE scheduleCount is used.
You can see it working in this SQL Fiddle. Note: I had to set the column types to TINYINT(3) to get the correct results.
N.B. Without an ORDER BY there is no guarantee as to what the '1st' row will be; not even that it will be the 1st as returned by a SELECT * FROM tst.
UPDATE
Unfortunately, MySQL does not allow ORDER BY in a multiple-table UPDATE, so you have a choice:
Initialise @first_row outside the query and remove the JOIN, so that an ORDER BY can be added.
Otherwise you are probably better off rewriting the query to something similar to:
UPDATE tst
JOIN (
SELECT locationId,
scheduleCount,
displayFlag,
@row_number := @row_number + 1 AS row_number
FROM tst
JOIN (SELECT @row_number := 0) init
WHERE locationId = 5
ORDER BY scheduleCount DESC
) tst2
ON tst2.locationId = tst.locationId
AND tst2.scheduleCount = tst.scheduleCount
AND tst2.displayFlag = tst.displayFlag
SET tst.displayFlag = 1,
tst.scheduleCount = CASE
WHEN tst2.row_number = 1 THEN tst.scheduleCount+1
ELSE tst.scheduleCount
END;
Or write two queries:
UPDATE tst
SET displayFlag = 1
WHERE locationId = 5;
UPDATE tst
SET scheduleCount = scheduleCount + 1
WHERE locationId = 5
ORDER BY scheduleCount DESC
LIMIT 1;
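The two-query version is easy to check against the sample data. This sketch runs it through Python's built-in sqlite3 module as a stand-in for MySQL; flag_and_bump is a hypothetical helper. With ORDER BY scheduleCount DESC, the location-5 row whose scheduleCount is 2 is the one incremented.

```python
import sqlite3


def flag_and_bump(con, location_id):
    """Set displayFlag = 1 on every row for location_id and add 1 to scheduleCount
    on the row with the highest scheduleCount only (hypothetical helper)."""
    # Query 1: flag every matching row.
    con.execute("UPDATE tst SET displayFlag = 1 WHERE locationId = ?", (location_id,))
    # Query 2: MySQL accepts UPDATE ... ORDER BY ... LIMIT 1 directly; SQLite only
    # does when built with SQLITE_ENABLE_UPDATE_DELETE_LIMIT, so find the rowid first.
    row = con.execute(
        "SELECT rowid FROM tst WHERE locationId = ? ORDER BY scheduleCount DESC LIMIT 1",
        (location_id,),
    ).fetchone()
    if row is not None:
        con.execute("UPDATE tst SET scheduleCount = scheduleCount + 1 WHERE rowid = ?", row)
    con.commit()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tst (locationId INT, scheduleCount TINYINT(1) DEFAULT 0, "
        "displayFlag TINYINT(1) DEFAULT 0)"
    )
    con.execute(
        "INSERT INTO tst (locationId, scheduleCount) "
        "VALUES (5,0), (2,0), (5,1), (5,2), (2,1), (2,2)"
    )
    flag_and_bump(con, 5)
    print(con.execute("SELECT locationId, scheduleCount, displayFlag FROM tst").fetchall())
```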
I have a MySQL query where I need to calculate values like ROUND(SUM(temp.total_pq),2) multiple times, so I defined variables to avoid repeating them.
But line 5 of the query returns the wrong value. The value for @diff_client_partner_qtty := ROUND((@partner_qtty_all_runs - @client_qtty_all_runs), 2) AS diff_client_partner_qtty is always NULL the first time I run it and always 84 thereafter.
I asked the in-house DBA, and he says I should not use variables in my query like this because the order in which MySQL sets values for the variables is not predictable, and hence I may get a NULL value.
But why? Also, can someone propose another way to avoid rewriting ROUND(SUM(temp.total_pq),2) multiple times, other than a subquery? I would prefer to avoid a subquery because I think the query is not very readable even in its current form.
SELECT temp.dtaccounted AS accounting_period,
@partner_qtty_all_runs := ROUND(SUM(temp.total_pq),2) AS partner_qtty_all_runs,
ROUND(temp.mmq,2) AS mopay_qtty,
@client_qtty_all_runs := ROUND(SUM(temp.total_cq),2) AS client_qtty_all_runs,
@diff_client_partner_qtty := ROUND((@partner_qtty_all_runs - @client_qtty_all_runs), 2) AS diff_client_partner_qtty,
@partner_gtv := ROUND(temp.total_pq_gtv, 2) AS partner_gtv,
@client_gtv := ROUND(temp.total_cq_gtv,2) AS client_gtv,
@diff_client_partner_gtv := ROUND((@partner_gtv - @client_gtv), 2) AS diff_client_partner_gtv,
temp.stariffcurrency AS tariffcurrency,
ROUND(@diff_client_partner_gtv * ffactor, 2) AS diff_client_partner_gtv_eur,
temp.scountry AS country,
temp.spartnererpid AS partner_erp_id,
c.name AS partner_name,
temp.nproducttype AS product,
temp.capping
FROM
(SELECT SUM(npartnerquantity) AS total_pq,
SUM(nmindmaticsquantity) AS mmq,
SUM(nclientquantity) AS total_cq,
SUM(dgrosstariff * npartnerquantity) AS total_pq_gtv,
SUM(dgrosstariff * nclientquantity) AS total_cq_gtv,
nrun,
vb.scountry,
vb.spartnererpid,
dtaccounted,
stariffcurrency,
vb.nproducttype,
cq.bisenabled AS capping,
vb.partnerid,
vb.accperiod
FROM report_table vb,
client_table cq
WHERE vb.accperiod > '2013-12-01'
AND vb.partnerid = cq.partnerid
AND vb.scountry = cq.scountry
AND vb.nproducttype = cq.nproducttype
AND (cq.dtvalidto IS NULL
OR cq.dtvalidto > vb.accperiod)
GROUP BY scountry,
nproducttype,
partnerid,
nrun,
accperiod
) temp,
customer c,
currency_conversion cc
WHERE temp.partnerid = c.erp_id
AND temp.total_pq <> temp.total_cq
AND cc.scurrencyfrom = temp.stariffcurrency
AND cc.scurrencyto = 'EUR'
AND cc.dtrefdate = temp.accperiod
GROUP BY temp.scountry,
temp.partnerid,
c.name,
temp.nproducttype,
temp.accperiod
ORDER BY temp.accperiod,
temp.scountry,
temp.partnerid,
temp.nproducttype,
temp.capping \G
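The DBA's point matches the MySQL manual, which states that the order of evaluation for expressions involving user variables is undefined, so reading a variable that was assigned earlier in the same SELECT list is unreliable. On MySQL 8.0 or later, a common table expression lets each aggregate be written once and then referenced by name, which may read better than a nested subquery. A reduced sketch using Python's built-in sqlite3 module as a stand-in; the runs table and its pq/cq columns are hypothetical simplifications of report_table.

```python
import sqlite3


def quantity_diffs(rows):
    """rows: (period, partner_qty, client_qty) tuples from a hypothetical runs table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE runs (period TEXT, pq REAL, cq REAL)")
    con.executemany("INSERT INTO runs VALUES (?, ?, ?)", rows)
    sql = """
        WITH agg AS (
            -- each aggregate is written exactly once ...
            SELECT period,
                   ROUND(SUM(pq), 2) AS partner_qtty_all_runs,
                   ROUND(SUM(cq), 2) AS client_qtty_all_runs
            FROM runs
            GROUP BY period
        )
        -- ... and reused by name, with no user variables involved
        SELECT period,
               partner_qtty_all_runs,
               client_qtty_all_runs,
               ROUND(partner_qtty_all_runs - client_qtty_all_runs, 2)
                   AS diff_client_partner_qtty
        FROM agg
        ORDER BY period
    """
    return con.execute(sql).fetchall()


if __name__ == "__main__":
    print(quantity_diffs([("2014-01", 10.5, 10.0), ("2014-01", 20.25, 20.0),
                          ("2014-02", 5.0, 5.0)]))
```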