MySQL changing numbers per batch - mysql

I have a table with columns like this:
id (int auto inc primary), title (varchar), batch (int)
Each batch has 3 items, but the batch numbers are not sequential.
Here is the sample data:
1,a,5
2,b,5
3,c,5
4,d,7
5,e,7
6,f,7
7,g,10
8,h,10
Is there a query to update those batches so they behave like an auto increment (ordered by batch)? So batch 5 becomes 1, batch 7 becomes 2, batch 10 becomes 3, and so on.
TL;DR: I want to update the sequence per batch, not per row.
Thanks for any help.

You can do it like this:
UPDATE t
JOIN (
SELECT
t.*,
@newbatch := IF(@rc % 3 = 0, @newbatch + 1, @newbatch) AS newbatch,
@rc := @rc + 1
FROM t
, (SELECT @rc := 0, @newbatch := 0) var_init_subquery
) sq ON t.id = sq.id
SET t.batch = newbatch;
see it working live in an sqlfiddle
If you have questions about how this works, please read this manual entry about user defined variables first.
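The query above counts rows in groups of three, so it relies on the batch size. As a way to check the intended result, here is a minimal Python sketch of the mapping itself: each old batch number is replaced by its 1-based position among the distinct, sorted batch values. The function name `renumber_batches` and the tuple layout are assumptions made for illustration; they are not part of the original answer.

```python
def renumber_batches(rows):
    """Map each distinct batch value to its 1-based position in sorted order.

    rows: list of (id, title, batch) tuples (hypothetical layout).
    Returns a new list with the batch column replaced.
    """
    # Distinct batch values in ascending order: [5, 7, 10] for the sample data.
    distinct = sorted({batch for _, _, batch in rows})
    # Old batch -> new sequential number, starting at 1.
    new_number = {old: i for i, old in enumerate(distinct, start=1)}
    return [(row_id, title, new_number[batch]) for row_id, title, batch in rows]


sample = [
    (1, "a", 5), (2, "b", 5), (3, "c", 5),
    (4, "d", 7), (5, "e", 7), (6, "f", 7),
    (7, "g", 10), (8, "h", 10),
]
renumbered = renumber_batches(sample)
```

Because the mapping is keyed on the batch value rather than on a row counter, it does not depend on every batch having exactly three rows.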

Related

Determine if date range is contained in another range in MySQL

I'm looking for a way to do this cleanly (not with 3..n cross JOINs). I just want to know whether it's possible in SQL; if not, I'll go for another solution.
I'll use numbers instead of dates to keep things simple.
I have n rows with n tasks and n items
task item start end
1 1 1 5
1 2 2 6
1 3 0 4
1 4 8 10
In this case I want to take the min(start) and max(end) of the overlapping ranges, so the result will be:
task item start end
1 1,2,3 0 6
1 4 8 10
Any ideas on how to solve this in SQL? It's something of a challenge; if it can't be done this way, I'll do it in Python.
Thank you
This is similar to the problem I answered here, and to other data "island" problems. However, it is more complicated in your case, as identifying the "islands" requires more than just the immediately preceding record.
It will end up looking something like this:
SET @iEnd = -1; /* init value should be something you don't expect to see */
SET @task = -1; /* init value should be something you don't expect to see */
SET @isNewIsland = 0 /* init value doesn't actually matter */;
SET @i = 0;
SELECT islandNum
, GROUP_CONCAT(item ORDER BY item) AS items
, MIN(start) AS iStart
, MAX(end) AS iEnd
FROM (
SELECT @isNewIsland := IF(@task <> task OR start > @iEnd, 1, 0)
, @task := task, item, start, end
, @i := IF(@isNewIsland = 1, @i + 1, @i) AS islandNum
/* keep the running end of the current island in @iEnd, the variable tested above */
, @iEnd := IF(@isNewIsland = 1, end, GREATEST(end, @iEnd))
FROM ( /* Session (@) variable evaluation can be a bit unpredictable;
the subquery helps guarantee ordering before evaluation */
SELECT task, item, start, end
FROM theTable
ORDER BY task, start, end
) AS subQ
) AS subQ2
GROUP BY islandNum;
Some are not fond of needing the separate, preceding SET statements; to avoid them, replace ) AS subQ with
) AS subQ, (SELECT @iEnd := -1, @task := -1, @isNewIsland := 0, @i := 0) AS sInit
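For comparison, since the question mentions falling back to Python, the same island logic can be sketched outside the database. This is only an illustration under assumptions: the function name `merge_islands` and the tuple layout are made up here, and rows are sorted by task, start, end exactly as in the inner subquery.

```python
def merge_islands(rows):
    """Merge overlapping (start, end) ranges per task.

    rows: list of (task, item, start, end) tuples (hypothetical layout).
    Returns a list of (task, items, island_start, island_end) tuples.
    """
    islands = []
    # Same ordering as the inner subquery: task, then start, then end.
    for task, item, start, end in sorted(rows, key=lambda r: (r[0], r[2], r[3])):
        last = islands[-1] if islands else None
        # A new island begins when the task changes or the range starts
        # after the running end of the current island.
        if last is None or last["task"] != task or start > last["end"]:
            islands.append({"task": task, "items": [item], "start": start, "end": end})
        else:
            last["items"].append(item)
            last["end"] = max(last["end"], end)
    return [(i["task"], sorted(i["items"]), i["start"], i["end"]) for i in islands]


sample = [(1, 1, 1, 5), (1, 2, 2, 6), (1, 3, 0, 4), (1, 4, 8, 10)]
result = merge_islands(sample)
```

As in the query, a range that starts exactly at the running end is treated as part of the same island, because only `start > end` opens a new one.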

Leaderboard Ranking in MySQL with viewer in middle of resultset

I have a Leaderboard that looks like this:
|--------------------------------------|
| userId | allTimePoints | allTimeRank |
|--------------------------------------|
| .. | ... | ... |
| xx | 5555555 | ? |
| .. | ... | ... |
----------------------------------------
Let's assume the table has a million records, and that allTimePoints is updated constantly. When a user asks to see the Leaderboard, I'd like to be able to show them their rank, score, as well as their closest competitors. I'd like to achieve the following:
figure out the rank of each user (sort table by allTimePoints DESC)
figure out paging offset so that leaderboard viewer is in the middle of reduced resultset
do it within an acceptable runtime (e.g. create perception of instant response even if hundreds of thousands of other users are also hitting the Leaderboard screen at the same time)
I've started like this and this takes about 0.4sec on my machine when the table has 1mil rows.
SET @rowIndex := 0;
SET @rank := 0;
SET @prev := NULL;
SET @userIdPosition := 0;
SELECT
@rowIndex := @rowIndex+1 AS rowIndex,
userId,
@rank := IF(@prev=allTimePoints, @rank, @rank+1) AS rank,
@prev := allTimePoints AS allTimePoints,
@userIdPosition := IF(userId=1860, @rowIndex, @userIdPosition) AS requestedOffset
FROM Leaderboard
ORDER BY allTimePoints DESC;
Btw, the runtime benefit of this method over using a self-join, is described here (it's much faster): http://code.openark.org/blog/mysql/sql-ranking-without-self-join
I keep rowIndex and rank as separate variables, so that I can calculate the requesting user's paging offset more accurately if there are rank ties (i.e., n users have same score).
So far so good, although I fear that if this doesn't reduce to msec runtime, it won't be viable when hundreds of thousands of users run the query simultaneously.
To make matters worse, if I expand this query to work correctly with paging as described above, then runtime increases to 1.5sec
SET @rowIndex := 0;
SET @rank := 0;
SET @prev := NULL;
SET @userIdPosition := 0;
SELECT sortedL.userId, sortedL.rank, sortedL.allTimePoints
FROM
(SELECT
@rowIndex := @rowIndex+1 AS rowIndex,
userId,
@rank := IF(@prev=allTimePoints, @rank, @rank+1) AS rank,
@prev := allTimePoints AS allTimePoints,
@userIdPosition := IF(userId=1860, @rowIndex, @userIdPosition) AS requestedOffset
FROM Leaderboard
ORDER BY allTimePoints DESC) AS sortedL
-- simulate paging, as LIMIT doesn't seem to accept variables
WHERE sortedL.rowIndex > sortedL.requestedOffset -15 AND sortedL.rowIndex < sortedL.requestedOffset + 15;
This returns 29 users and the requesting user is in the middle, as desired.
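To make the window bounds concrete, here is a small Python sketch of the same selection on an already sorted list. The function name `window_around`, the `radius` parameter, and the 100-user sample board are assumptions for illustration only; they say nothing about how the query performs.

```python
def window_around(sorted_user_ids, viewer_id, radius=15):
    """Return the slice of a points-sorted leaderboard centred on the viewer.

    sorted_user_ids: user ids already ordered by allTimePoints DESC.
    radius=15 mirrors the "-15 / +15" bounds of the query, which are
    exclusive, so at most 2 * radius - 1 = 29 rows come back.
    """
    position = sorted_user_ids.index(viewer_id) + 1      # 1-based rowIndex
    low = max(position - radius + 1, 1)                  # first rowIndex kept
    high = min(position + radius - 1, len(sorted_user_ids))
    return sorted_user_ids[low - 1:high]


# Hypothetical leaderboard of 100 users whose ids equal their sort position.
board = list(range(1, 101))
page = window_around(board, viewer_id=50)
```

Near the top or bottom of the board the window is simply cut short, so the viewer is no longer centred there.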
If I run this with EXPLAIN, I can see that the subquery is using a FILESORT, but the results are not indexed, and hence the outer SELECT is forced to do yet another full scan of the resultset using WHERE (slower than FILESORT).
Question (1): how can I optimize this?
Another idea was to store the ranking in an indexed column: allTimeRank. I thought I'd experiment with sorting the table in a procedure on a schedule (say, every 10 min), and then offer very quick access with a simpler SELECT that would utilize the index. I haven't managed to get this to work properly, it doesn't seem to be using the condition in my WHERE clause (the ranking stored in allTimeRank is incorrect, and MySQL complains so I have to turn off safe updates to get it to even run)
SET SQL_SAFE_UPDATES=0;
SET @rowIndex := 0;
SET @rank := 0;
SET @prev := NULL;
SET @userIdPosition := 0;
UPDATE Leaderboard L,
(SELECT
@rowIndex := @rowIndex+1 AS rowIndex,
userId,
@rank := IF(@prev=allTimePoints, @rank, @rank+1) AS rank,
@prev := allTimePoints AS allTimePoints,
@userIdPosition := IF(userId=1860, @rowIndex, @userIdPosition) AS requestedOffset
FROM Leaderboard
ORDER BY allTimePoints DESC) AS sortedL
SET L.allTimeRank = sortedL.rank
WHERE sortedL.userId = L.userId;
SET SQL_SAFE_UPDATES=1;
Question (2): how do I make the WHERE condition work?
This has taken anywhere between 12 seconds and 2 minutes to run; I'm not sure why it is so inconsistent. In any case, this will block UPDATEs from users that are winning points, giving the sense that the app has hung. Question (3): is there a workaround?
First thing, you are not computing Rank correctly. If there are three players: Britney(100 pts), Rachel(100 pts), and Susan(75 pts), then Britney and Rachel each have a rank of 1, and Susan should have a rank of 3. Your routine would give Susan a rank of 2.
Second, when players have the same score (and rank) they should display in a consistent order. The order within tied scores/ranks should be the order in which each player attained that score.
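The two rules above (tied players share a rank, the next score skips ahead, and ties keep a stable order) can be sketched in a few lines of Python. This is an illustration only; the function name `competition_ranks` is made up, and the input order is assumed to be the order in which each player reached their score.

```python
def competition_ranks(scores):
    """Standard competition ranking ("1, 1, 3"): tied scores share a rank and
    the next distinct score skips ahead by the number of tied players.

    scores: list of (name, points) tuples; input order is assumed to be the
    order in which each player reached their score.
    """
    # sorted() is stable, so tied players keep their input (attainment) order.
    ordered = sorted(scores, key=lambda s: -s[1])
    ranked = []
    for index, (name, points) in enumerate(ordered, start=1):
        if ranked and ranked[-1][1] == points:
            rank = ranked[-1][2]      # tie: reuse the previous rank
        else:
            rank = index              # otherwise rank equals the row position
        ranked.append((name, points, rank))
    return ranked


players = [("Britney", 100), ("Rachel", 100), ("Susan", 75)]
ranked = competition_ranks(players)
```

With the three players from the example, Britney and Rachel both get rank 1 and Susan gets rank 3.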
I would add two columns to the table: allTimeRank, and allTimeRankOrder. And update in real time every time the points change. Realize that if my score goes from 100 to 125, the only users that need to be reranked are those that had scores from 100-124 -- just the people I jumped over.
Here is a routine to do it. It assumes points always go up, never down. I don't have a million row table to test with, but if you have the right indexes set up I hope it will run pretty fast.
CREATE PROCEDURE `updateUserPoints`(IN `puserid` VARCHAR(10), IN `pnewPoints` INT)
BEGIN
SET @currPoints = 0;
SET @currRank = 0;
SET @currRankOrder = 0;
SELECT allTimePoints, allTimeRank, allTimeRankOrder INTO @currPoints, @currRank, @currRankOrder from Leaderboard where userid = puserid;
SET @newRank = 0;
SET @newRankOrder = 0;
SELECT max(allTimeRank), max(allTimeRankOrder)+1 INTO @newRank, @newRankOrder FROM Leaderboard WHERE allTimePoints = pnewPoints;
IF (@newRank IS NULL) THEN
SET @newRank = (SELECT min(allTimeRank) from Leaderboard WHERE allTimePoints < pnewPoints);
SET @newRankOrder = 0;
END IF;
UPDATE Leaderboard
SET allTimePoints = pnewPoints,
allTimeRank = @newRank,
allTimeRankOrder = @newRankOrder
WHERE userid = puserid;
/* all the people that I was tied with, but ahead in order,
slide up one in the order */
UPDATE Leaderboard
SET allTimeRankOrder = allTimeRankOrder - 1
WHERE allTimeRank = @currRank
AND allTimeRankOrder > @currRankOrder;
/* did I jump anyone? Their rank goes down. */
UPDATE Leaderboard
SET allTimeRank = allTimeRank + 1
WHERE userid <> puserid
AND allTimePoints >= @currPoints
AND allTimePoints < pnewPoints;
END

Slow mySql update containing Join

I have a system that collects data from production reports (CSV files) and puts them into a MySQL DB.
I have a header table that contains the production data of sequential reports with the same settings, and a table with the individual reports, linked to the first one (trfCamRep.hdrId -> trfCamHdr.id).
I have a query that calculates the total number of reports, the doubtful and faulty counts, and the maxTs. This data is used in the visualizer.
The query is too slow: it takes 9 seconds.
Can you help me speed it up?
SET @maxId:=(SELECT MAX(id) FROM trfCamHdr WHERE srcCod='7');
UPDATE trfCamHdr AS hdr
LEFT JOIN (SELECT hdrF.id,COUNT(*) AS nTot,
SUM(IF(res=1,1,0)) AS nWrn,SUM(IF(res=2,1,0)) AS nKO,
MAX(ts) AS maxTS
FROM trfCamHdr AS hdrF
JOIN trfCamRep AS repF ON repF.hdrId=hdrF.id
WHERE clcEnd=0 AND srcCod='7'
GROUP BY hdrF.id) AS valT ON valT.id=hdr.id
SET hdr.clcEnd=IF(hdr.id<@maxId,1,0),
hdr.nTot=valT.nTot,
hdr.nWrn=valT.nWrn,
hdr.nKO=valT.nKO,
hdr.maxTS=valT.maxTS
WHERE hdr.id>=0 AND hdr.clcEnd=0 AND hdr.srcCod='7';
Note trfCamHdr has these columns:
id (primary key)
clcEnd : end-of-calculation flag (the last one remains 0 because it is still in progress)
nTot : elements with this header
nWrn : elements with res = 1
nKO : elements with res = 2
maxTs : TS of the last element
trfCamRep has these columns:
hdrId (refer to id of trfCamHdr)
res : 0 good, 1 doubtful, 2 faulty
ts : report timestamp
I'd take this out:
SET @maxId:=(SELECT MAX(id) FROM trfCamHdr WHERE srcCod='7');
along with any references to the @maxId variable; I believe it is redundant.
Everything you update will be lower than the max id, and the max will take time to calculate if it's a big table. You are already checking for srcCod = 7, so it isn't necessary.
In fact, it would miss the update on the one with the actual max id, which is not what I believe you want.
Your left join will also update all other rows in the table with NULL, is that what you want? You could switch that to an inner join, and if your rows are already null, they will just get left alone, rather than getting updated with NULL again.
Then you could just switch out this:
SET
hdr.clcEnd = IF(hdr.id < @maxId, 1, 0),
To
SET
hdr.clcEnd = 1,
Here is the rewritten query. As always, back up your data before trying it:
UPDATE trfCamHdr AS hdr
INNER JOIN
(SELECT
hdrF.id,
COUNT(*) AS nTot,
SUM(IF(res = 1, 1, 0)) AS nWrn,
SUM(IF(res = 2, 1, 0)) AS nKO,
MAX(ts) AS maxTS
FROM
trfCamHdr AS hdrF
JOIN trfCamRep AS repF ON repF.hdrId = hdrF.id
WHERE
clcEnd = 0 AND srcCod = '7'
GROUP BY hdrF.id) AS valT ON valT.id = hdr.id
SET
hdr.clcEnd = 1,
hdr.nTot = valT.nTot,
hdr.nWrn = valT.nWrn,
hdr.nKO = valT.nKO,
hdr.maxTS = valT.maxTS
WHERE
hdr.id >= 0 AND hdr.clcEnd = 0
AND hdr.srcCod = '7';
I found the solution: I created a KEY on the hdrId column, and now the query takes 0.062s.

Update multiple rows, but only first row with different value

table
create table tst(locationId int,
scheduleCount tinyint(1) DEFAULT 0,
displayFlag tinyint(1) DEFAULT 0);
INSERT INTO tst(locationId,scheduleCount)
values(5,0),(2,0),(5,1),(5,2),(2,1),(2,2);
I update multiple rows and multiple columns with one query, but I want to change one of the columns only for the first row and leave that column unchanged for the other rows.
I want to update all the rows with a given location id, setting displayFlag to 1 and incrementing scheduleCount by 1 for the top entry only; the rest should remain the same.
Query
update tst,(select @rownum:=0) r,
set tst.displayFlag =1,
scheduleCount = (CASE WHEN @rownum=0
then scheduleCount+1
ELSE scheduleCount
END),
@rownum:=1 where locationId = 5
But it gives an error and does not set the user-defined variable rownum. I am able to join the tables in a SELECT and change the value of rownum; is there another way to update the values?
I'm not sure this is the correct way of doing such a thing, but it is possible to include the user variable logic in the CASE condition:
UPDATE tst
JOIN (SELECT @first_row := 1) r
SET tst.displayFlag = 1,
scheduleCount = CASE
WHEN @first_row = 1 AND ((@first_row := 0) OR TRUE) THEN scheduleCount+1
ELSE scheduleCount
END
WHERE locationId = 5;
I have used a @first_row flag as this is more in line with your initial attempt.
The CASE works as follows:
On the first row @first_row = 1, so the second part of the WHEN after AND is processed, setting @first_row := 0. Unfortunately for us, the assignment returns 0, hence the OR TRUE to ensure the condition as a whole is TRUE. Thus scheduleCount + 1 is used.
On the second row @first_row != 1, so the condition is FALSE, the second part of the WHEN after AND is not processed, and the ELSE scheduleCount is used.
You can see it working in this SQL Fiddle. Note; I have had to set the column types to TINYINT(3) to get the correct results.
N.B. Without an ORDER BY there is no guarantee as to what the '1st' row will be; not even that it will be the 1st as returned by a SELECT * FROM tst.
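The flag logic described above can be mimicked in plain Python to see which rows change. This is only a sketch under assumptions: the function name `update_location` and the list-of-dicts table are made up, and rows are processed in list order, which MySQL does not guarantee without an ORDER BY.

```python
def update_location(rows, location_id):
    """Set displayFlag on every matching row; bump scheduleCount on the first.

    rows: list of dicts with locationId, scheduleCount, displayFlag keys,
    processed in list order (MySQL gives no such ordering guarantee
    without ORDER BY).
    """
    first_row = True                     # plays the role of @first_row = 1
    for row in rows:
        if row["locationId"] != location_id:
            continue                     # WHERE locationId = ...
        row["displayFlag"] = 1
        if first_row:
            row["scheduleCount"] += 1    # only the first matching row
            first_row = False            # the (@first_row := 0) side effect
    return rows


table = [
    {"locationId": loc, "scheduleCount": count, "displayFlag": 0}
    for loc, count in [(5, 0), (2, 0), (5, 1), (5, 2), (2, 1), (2, 2)]
]
update_location(table, 5)
```

With the sample data, all three rows for location 5 get displayFlag = 1, but only the first one encountered has its scheduleCount incremented.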
UPDATE
Unfortunately, one cannot add an ORDER BY if there is a join, so you have a choice:
Initialise #first_row outside the query and remove the JOIN.
Otherwise you are probably better off rewriting the query to something similar to:
UPDATE tst
JOIN (
SELECT locationId,
scheduleCount,
displayFlag,
@row_number := @row_number + 1 AS row_number
FROM tst
JOIN (SELECT @row_number := 0) init
WHERE locationId = 5
ORDER BY scheduleCount DESC
) tst2
ON tst2.locationId = tst.locationId
AND tst2.scheduleCount = tst.scheduleCount
AND tst2.displayFlag = tst.displayFlag
SET tst.displayFlag = 1,
tst.scheduleCount = CASE
WHEN tst2.row_number = 1 THEN tst.scheduleCount+1
ELSE tst.scheduleCount
END;
Or write two queries:
UPDATE tst
SET displayFlag = 1
WHERE locationId = 5;
UPDATE tst
SET scheduleCount = scheduleCount + 1
WHERE locationId = 5
ORDER BY scheduleCount DESC
LIMIT 1;

Coldfusion CFScript Query with MySQL Assignment Operator

I want to select currentrow as part of my query - I know I can loop over queries and get the currentrow variable, but I'm doing a QoQ before I use the rows and I want to keep the original rows, e.g.
//Original query
1, Audi
2, BMW
3, Skoda
//QoQ
1, Audi
3, Skoda
This is the code I've got:
q = new Query( datasource = application.db.comcar );
q.setSQL('
SELECT make, @rownum := @rownum +1 AS `rownumber`
FROM cars, ( SELECT @rownum :=0 )
LIMIT 10
');
r = q.execute().getResult();
But it's throwing the following error:
Parameter '=' not found in the list of parameters specified
SQL: SELECT make, @rownum := @rownum + 1 AS `rownumber` FROM cars, ( SELECT @rownum :=0 ) LIMIT 10
This will work in cfquery but I'd like to use it in CFScript. Is there an alternative to using := or some way of escaping this in the query.
It looks like this is a bug in Coldfusion. I could change my code to use cfquery but I'd rather not mix script and tags in my page.
So my workaround is as follows:
/*
* based on the existing query 'tmpFields'
*/
// build array of row numbers
arrRowNumbers = [];
cntRowNumbers = tmpFields.recordCount;
for( r = 1; r <= cntRowNumbers; r++ ) {
arrayAppend( arrRowNumbers, r );
}
// add a new column with the new row number array
queryAddColumn( tmpFields, "fieldNumber", "integer", arrRowNumbers );
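The idea behind the workaround, numbering the rows once and letting later filtering keep those numbers, can be shown with a short language-neutral sketch in Python. The helper name `add_row_numbers` and the list-of-dicts stand-in for a query object are assumptions for illustration only.

```python
def add_row_numbers(rows, column="fieldNumber"):
    """Return copies of the rows with a 1-based row number column added."""
    return [dict(row, **{column: number}) for number, row in enumerate(rows, start=1)]


original = [{"make": "Audi"}, {"make": "BMW"}, {"make": "Skoda"}]
numbered = add_row_numbers(original)
# A later filter (the QoQ step) keeps the numbers from the original query.
filtered = [row for row in numbered if row["make"] != "BMW"]
```

After filtering out BMW, Audi still carries row number 1 and Skoda row number 3, which matches the result wanted in the question.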