MySQL query: winning streak including score

I have a query-generated table that counts up the winning streak as long as the player keeps winning. When the player gets a positive score, the streak rises by 1; if he gets a negative score, the streak falls back to 0. The table looks like this:
+--------+------------------+--------+--------+
| player | timestamp | points | streak |
+--------+------------------+--------+--------+
| John | 22/11/2012 23:01 | -2 | 0 |
| John | 22/11/2012 23:02 | 3 | 1 |
| John | 22/11/2012 23:04 | 5 | 2 |
| John | 22/11/2012 23:05 | -2 | 0 |
| John | 22/11/2012 23:18 | 15 | 1 |
| John | 23/11/2012 23:20 | 5 | 2 |
| Chris | 27/11/2012 22:12 | 20 | 1 |
| Chris | 27/11/2012 22:14 | -12 | 0 |
| Chris | 27/11/2012 22:17 | 4 | 1 |
| Chris | 27/11/2012 22:18 | -4 | 0 |
| Chris | 27/11/2012 22:20 | 10 | 1 |
| Chris | 27/11/2012 22:21 | 20 | 2 |
| Chris | 27/11/2012 22:22 | 90 | 3 |
+--------+------------------+--------+--------+
I would like to get each player's maximum streak, which is easy to get of course, but I would also like to include the points that the player scored in that particular streak. So, for the above example, the result would have to look like this:
+--------+--------+-----------+
| player | points | maxstreak |
+--------+--------+-----------+
| John | 20 | 2 |
| Chris | 120 | 3 |
+--------+--------+-----------+
Any ideas on how I could achieve this? Thanks in advance!

I have not had a chance to actually try this, but it SHOULD work using MySQL variables...
At the beginning, the innermost query just queries your scores table and forces the data into order of player and timestamp. From that, I process the rows sequentially with MySQL variables. First, on each new record being processed, if I am on a different player (which should ACTUALLY be based on an ID instead of a name), I reset the streak, points, maxStreak and maxStreakPoints to zero, THEN set the last player to whoever it's about to process.
Immediately after that, I check the streak status, points, etc.
Once everything has been tabulated, I use the OUTERMOST query to get, on a per-player basis, the highest max streak and max streak points.
SELECT
    Final.Player,
    MAX( Final.MaxStreak ) MaxStreak,
    MAX( Final.MaxStreakPoints ) MaxStreakPoints
FROM
    (
    SELECT
        PreOrd.Player,
        PreOrd.TimeStamp,
        PreOrd.Points,
        @nStreak := case when PreOrd.Points < 0 then 0
                         when PreOrd.Player = @cLastPlayer then @nStreak + 1
                         else 1 end Streak,
        @nStreakPoints := case when @nStreak = 1 then PreOrd.Points
                               when @nStreak > 1 then @nStreakPoints + PreOrd.Points
                               else 0 end StreakPoints,
        @nMaxStreak := case when PreOrd.Player != @cLastPlayer then @nStreak
                            when @nStreak > @nMaxStreak then @nStreak
                            else @nMaxStreak end MaxStreak,
        @nMaxStreakPoints := case when PreOrd.Player != @cLastPlayer then @nStreakPoints
                                  when @nStreak >= @nMaxStreak and @nStreakPoints > @nMaxStreakPoints then @nStreakPoints
                                  else @nMaxStreakPoints end MaxStreakPoints,
        @cLastPlayer := PreOrd.Player PlayerChange
    FROM
        ( select
              S.Player,
              S.TimeStamp,
              S.Points
          from
              Scores2 S
          ORDER BY
              S.Player,
              S.TimeStamp,
              S.`index` ) PreOrd,
        ( select
              @nStreak := 0,
              @nStreakPoints := 0,
              @nMaxStreak := 0,
              @nMaxStreakPoints := 0,
              @cLastPlayer := '~' ) SQLVars
    ) as Final
group by
    Final.Player
Now, this could give a false max streak points value: if a player scores 90 points in a single win (a streak of 1) and later has a 3-win streak of 10 points per win (30 total), the query reports a max streak of 3 but keeps 90 as the max streak points. Still thinking on that, though... :)
Here's what I get after adding the index column you made available in the supplied data:
SQL Fiddle showing my solution...
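To check the expected numbers outside MySQL, here is a minimal Python sketch of the same per-player sequential scan. It is an illustration rather than the answer's method verbatim: the data are the question's sample rows, `max_streaks` is a hypothetical helper name, and, unlike the query, a strictly longer streak always replaces the saved points (avoiding the 90-versus-30 problem described above), while a tie on length keeps the higher-scoring streak.

```python
from datetime import datetime
from itertools import groupby

# The question's sample data: (player, timestamp as dd/mm/yyyy hh:mm, points)
ROWS = [
    ("John", "22/11/2012 23:01", -2), ("John", "22/11/2012 23:02", 3),
    ("John", "22/11/2012 23:04", 5), ("John", "22/11/2012 23:05", -2),
    ("John", "22/11/2012 23:18", 15), ("John", "23/11/2012 23:20", 5),
    ("Chris", "27/11/2012 22:12", 20), ("Chris", "27/11/2012 22:14", -12),
    ("Chris", "27/11/2012 22:17", 4), ("Chris", "27/11/2012 22:18", -4),
    ("Chris", "27/11/2012 22:20", 10), ("Chris", "27/11/2012 22:21", 20),
    ("Chris", "27/11/2012 22:22", 90),
]


def max_streaks(rows):
    """Return {player: (points_in_best_streak, max_streak)}."""
    def order_key(row):
        player, ts, _points = row
        return (player, datetime.strptime(ts, "%d/%m/%Y %H:%M"))

    result = {}
    # Same role as ORDER BY player, timestamp in the inner query
    ordered = sorted(rows, key=order_key)
    for player, player_rows in groupby(ordered, key=lambda r: r[0]):
        streak = streak_points = 0
        best_streak = best_points = 0
        for _player, _ts, points in player_rows:
            if points < 0:
                # A negative score resets the running streak
                streak = streak_points = 0
                continue
            streak += 1
            streak_points += points
            if streak > best_streak:
                # A strictly longer streak always wins, whatever its points
                best_streak, best_points = streak, streak_points
            elif streak == best_streak and streak_points > best_points:
                # Same length: keep the higher-scoring streak
                best_points = streak_points
        result[player] = (best_points, best_streak)
    return result


print(max_streaks(ROWS))
```

On the sample rows this gives (20, 2) for John and (120, 3) for Chris, matching the desired output in the question.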

My recommendation is to store additional information when you calculate the streak. For instance, you could store the timestamp when the streak began.
A less serious recommendation is to switch to a database that supports window functions. This would be much easier.
The approach is to find when the streak began and then sum up everything between that time and the row with the max streak. To do this, we'll use a correlated subquery:
select t.*,
       (select max(t2.timestamp) from t t2
        where t2.timestamp <= t.timestamp and t2.player = t.player and t2.streak = 0
       ) as StreakStartTimeStamp
from t
where t.streak = (select max(streak) from t t2 where t.player = t2.player)
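Now we embed this query as a subquery, so we can add up the points within the appropriate time range:
select x.player,
       max(x.points) as points,
       max(x.streak) as maxstreak
from (select s.player, s.timestamp, s.streak, sum(t.points) as points
      from t join
           (select t.*,
                   (select max(t2.timestamp) from t t2
                    where t2.timestamp <= t.timestamp and t2.player = t.player and t2.streak = 0
                   ) as StreakStartTimeStamp
            from t
            where t.streak = (select max(streak) from t t2 where t.player = t2.player)
           ) s
           on t.player = s.player
          -- only rows inside the streak: after the last zero-streak row, up to the max-streak row
          and t.timestamp <= s.timestamp
          and (s.StreakStartTimeStamp is null or t.timestamp > s.StreakStartTimeStamp)
      -- one sum per max-length streak; a player can have several (e.g. John)
      group by s.player, s.timestamp, s.streak
     ) x
group by x.player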
I haven't tested this query, so there are probably some syntax errors. However, the approach should work. You may want to have indexes on the table, on streak and timestamp for performance reasons.
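The same two steps (find the last zero-streak row at or before each max-streak row, then sum the points after it) can be sketched in Python against the question's table, including its streak column. This is only an illustration of the approach, assuming timestamps in the dd/mm/yyyy hh:mm form shown; `streak_points_from_start` is a hypothetical name, and when a player has several streaks of the maximum length (John has two), it keeps the highest total.

```python
from datetime import datetime

# (player, timestamp, points, streak) exactly as in the question's table
ROWS = [
    ("John", "22/11/2012 23:01", -2, 0), ("John", "22/11/2012 23:02", 3, 1),
    ("John", "22/11/2012 23:04", 5, 2), ("John", "22/11/2012 23:05", -2, 0),
    ("John", "22/11/2012 23:18", 15, 1), ("John", "23/11/2012 23:20", 5, 2),
    ("Chris", "27/11/2012 22:12", 20, 1), ("Chris", "27/11/2012 22:14", -12, 0),
    ("Chris", "27/11/2012 22:17", 4, 1), ("Chris", "27/11/2012 22:18", -4, 0),
    ("Chris", "27/11/2012 22:20", 10, 1), ("Chris", "27/11/2012 22:21", 20, 2),
    ("Chris", "27/11/2012 22:22", 90, 3),
]


def parse(ts):
    return datetime.strptime(ts, "%d/%m/%Y %H:%M")


def streak_points_from_start(rows):
    """Return {player: (points, maxstreak)} using the streak-start idea."""
    result = {}
    for player in {r[0] for r in rows}:
        mine = [(parse(ts), pts, st) for p, ts, pts, st in rows if p == player]
        max_streak = max(st for _, _, st in mine)
        best = None
        # Every row that ends a max-length streak (WHERE t.streak = max(streak))
        for end_ts, _, st in mine:
            if st != max_streak:
                continue
            # StreakStartTimeStamp: last zero-streak row at or before the end
            zeros = [ts for ts, _, s in mine if s == 0 and ts <= end_ts]
            start = max(zeros) if zeros else None
            # No earlier zero row means the streak starts at the player's first row
            total = sum(pts for ts, pts, _ in mine
                        if ts <= end_ts and (start is None or ts > start))
            best = total if best is None else max(best, total)
        result[player] = (best, max_streak)
    return result


print(streak_points_from_start(ROWS))
```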

Related

Find longest consecutive time interval per object

There are Merchants and they can submit Claims.
I need to find the longest time period during which a merchant had at least 1 claim, i.e. one time period (in fractions of a day, or whatever unit) per merchant_id.
So, for example:
+-------------+-----------+----------------------+----------------------+
| merchant_id | claim_id | from | to |
+-------------+-----------+----------------------+----------------------+
| 1 | 11 | 2016-08-15 12:00:00 | 2016-08-17 12:00:00 |
| 1 | 22 | 2016-08-16 12:00:00 | 2016-08-18 12:00:00 |
| 1 | 33 | 2016-08-19 12:00:00 | 2016-08-20 12:00:00 |
| 2 | 66 | 2016-08-15 12:00:00 | 2016-08-17 12:00:00 |
| 2 | 67 | 2016-08-18 12:00:00 | 2016-08-19 12:00:00 |
+-------------+-----------+----------------------+----------------------+
For merchant_id = 1 it would be 3 days.
For merchant_id = 2 it would be 2 days.
How do I do that?
Doing this in MySQL alone is really complex. I've tried it for a particular merchant_id. I am still not sure this is 100% right without checking it against different sets of inputs.
But you can give this a try, and I can explain the logic behind it later.
SELECT
    firstTable.merchant_id,
    MAX(TIMESTAMPDIFF(DAY, firstTable.from, secondTable.to)) AS maxConsecutiveDays
FROM
    (
    SELECT
        A.merchant_id,
        A.from,
        @rn1 := @rn1 + 1 AS row_number
    FROM merchants A
    CROSS JOIN (SELECT @rn1 := 0) var
    WHERE A.merchant_id = 2
    AND NOT EXISTS (
        SELECT 1 FROM merchants B WHERE B.merchant_id = A.merchant_id AND A.idt <> B.idt AND A.`from` BETWEEN B.from AND B.to
    )
    ORDER BY A.from
    ) AS firstTable
INNER JOIN (
    SELECT
        A.merchant_id,
        A.to,
        @rn2 := @rn2 + 1 AS row_number
    FROM merchants A
    CROSS JOIN (SELECT @rn2 := 0) var
    WHERE A.merchant_id = 2
    AND NOT EXISTS (
        SELECT 1 FROM merchants B WHERE B.merchant_id = A.merchant_id AND A.idt <> B.idt AND A.to BETWEEN B.from AND B.to
    )
    ORDER BY A.to
    ) AS secondTable
ON firstTable.row_number = secondTable.row_number;
WORKING DEMO
Algorithm:
Consider the following steps for a particular merchant_id:
1. Find all the start points that are not inside any of the other ranges. I call these independent start points. Let's say they are stored in a set S.
2. Find all the end points that are not inside any of the other ranges. These are independent end points, stored in a set E.
3. Sort both sets in ascending order of time.
4. Give every element of each set a rank, starting from 1.
5. Join the two sets on matching rank number.
6. Enumerate the two sets simultaneously, take the difference in days for each pair, and then find the maximum of these differences.
The last step can be illustrated by the following code snippet:
int maxDiff = 0;
for (int i = 0; i < E.size(); i++) {
    // E.get(i) and S.get(i) are the i-th independent end and start points
    if (E.get(i) - S.get(i) > maxDiff) {
        maxDiff = E.get(i) - S.get(i);
    }
}
maxDiff is your output.
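Here is a minimal Python sketch of the algorithm above, run per merchant on the question's sample claims. It follows the listed steps (independent start and end points, sorted and paired by rank) rather than the SQL; `longest_period_days` and `longest_per_merchant` are hypothetical names. Like the SQL's BETWEEN, a point on another claim's boundary counts as inside it, and it assumes no two claims of the same merchant are identical, since identical ranges would exclude each other's points.

```python
from collections import defaultdict
from datetime import datetime

# (merchant_id, claim_id, from, to) from the question
CLAIMS = [
    (1, 11, "2016-08-15 12:00:00", "2016-08-17 12:00:00"),
    (1, 22, "2016-08-16 12:00:00", "2016-08-18 12:00:00"),
    (1, 33, "2016-08-19 12:00:00", "2016-08-20 12:00:00"),
    (2, 66, "2016-08-15 12:00:00", "2016-08-17 12:00:00"),
    (2, 67, "2016-08-18 12:00:00", "2016-08-19 12:00:00"),
]


def longest_period_days(ranges):
    """ranges: list of (start, end) datetimes for ONE merchant."""
    def inside_other(point, own_index):
        # Inclusive check against every OTHER claim, like A.idt <> B.idt AND ... BETWEEN
        return any(s <= point <= e
                   for j, (s, e) in enumerate(ranges) if j != own_index)

    # Steps 1-3: independent start points S and end points E, each sorted
    starts = sorted(s for i, (s, _) in enumerate(ranges) if not inside_other(s, i))
    ends = sorted(e for i, (_, e) in enumerate(ranges) if not inside_other(e, i))
    # Steps 4-6: pair the k-th start with the k-th end and keep the largest gap
    best = 0.0
    for start, end in zip(starts, ends):
        best = max(best, (end - start).total_seconds() / 86400)
    return best


def longest_per_merchant(claims):
    by_merchant = defaultdict(list)
    fmt = "%Y-%m-%d %H:%M:%S"
    for merchant_id, _claim_id, start, end in claims:
        by_merchant[merchant_id].append(
            (datetime.strptime(start, fmt), datetime.strptime(end, fmt)))
    return {m: longest_period_days(r) for m, r in by_merchant.items()}


print(longest_per_merchant(CLAIMS))
```

On the sample data this gives 3.0 days for merchant 1 and 2.0 days for merchant 2, as the question expects.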
EDIT:
To get the longest consecutive period for each merchant, check this DEMO.

MySQL top 2 records per group

Basically, I need to get only the last 2 records for each user, based on the latest created_datetime:
id | user_id | created_datetime
1 | 34 | '2015-09-10'
2 | 34 | '2015-10-11'
3 | 34 | '2015-05-23'
4 | 34 | '2015-09-13'
5 | 159 | '2015-10-01'
6 | 159 | '2015-10-02'
7 | 159 | '2015-10-03'
8 | 159 | '2015-10-06'
Returns (expected output):
2 | 34 | '2015-10-11'
4 | 34 | '2015-09-13'
7 | 159 | '2015-10-03'
8 | 159 | '2015-10-06'
I was trying with this idea:
select user_id, created_datetime,
@num := if(@user_id = user_id, @num + 1, 1) as row_number,
@id := user_id as dummy
from logs group by user_id
having row_number <= 2
The idea is to keep only the top 2 rows per user and remove all the others.
Any ideas?
Your idea is close. I think this will work better:
select u.*
from (select id, user_id, created_datetime,
             @num := if(@user_id = user_id, @num + 1,
                        if(@user_id := user_id, 1, 1)
                       ) as row_number
      from logs cross join
           (select @user_id := 0, @num := 0) params
      order by user_id, created_datetime desc
     ) u
where row_number <= 2 ;
Here are the changes:
The variables are set in only one expression. MySQL does not guarantee the order of evaluation of expressions, so this is important.
The work is done in a subquery, which is then processed in the outer query.
The subquery uses order by, not group by.
The outer query uses where instead of having (actually, in MySQL having would work, but where is more appropriate).
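The same numbering idea (order by user and newest date first, count within each user, keep counts up to 2) can be sketched in Python to check the expected rows. `top_n_per_user` is a hypothetical helper, and the dates are compared as ISO strings, which sort chronologically. On the question's data it keeps ids 2 and 4 for user 34 and ids 8 and 7 for user 159.

```python
from itertools import groupby, islice

# (id, user_id, created_datetime) from the question
LOGS = [
    (1, 34, "2015-09-10"), (2, 34, "2015-10-11"), (3, 34, "2015-05-23"),
    (4, 34, "2015-09-13"), (5, 159, "2015-10-01"), (6, 159, "2015-10-02"),
    (7, 159, "2015-10-03"), (8, 159, "2015-10-06"),
]


def top_n_per_user(rows, n=2):
    """Return the n most recent rows per user, like WHERE row_number <= n."""
    # Newest first, then a stable sort by user keeps that order inside each user
    newest_first = sorted(rows, key=lambda r: r[2], reverse=True)
    ordered = sorted(newest_first, key=lambda r: r[1])
    result = []
    for _user_id, user_rows in groupby(ordered, key=lambda r: r[1]):
        # The count restarts for every user, just like @num in the query
        result.extend(islice(user_rows, n))
    return result


for row in top_n_per_user(LOGS):
    print(row)
```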

Complicated overlap in Mysql query

Here is my problem: I have a MySQL table with the following columns and example data:
id | user | starting date | ending date | activity code
1 | Andy | 2010-04-01 | 2010-05-01 | 3
2 | Andy | 1988-11-01 | 1991-03-01 | 3
3 | Andy | 2005-06-01 | 2008-08-01 | 3
4 | Andy | 2005-08-01 | 2008-11-01 | 3
5 | Andy | 2005-06-01 | 2010-05-01 | 4
6 | Ben | 2010-03-01 | 2011-06-01 | 3
7 | Ben | 2010-03-01 | 2010-05-01 | 4
8 | Ben | 2005-04-01 | 2011-05-01 | 3
As you can see in this table, users can have the same activity code and similar dates or periods. For the same user, periods may or may not overlap others. It is also possible to have several overlapping periods in the table.
What I want is a MySQL query to get the following result:
new id | user | starting date | ending date | activity code
1 | Andy | 2010-04-01 | 2010-05-01 | 3 => ok, no overlap period
2 | Andy | 1988-11-01 | 1991-03-01 | 3 => ok, no overlap period
3 | Andy | 2005-06-01 | 2008-11-01 | 3 => same user, same activity but ending date coming from row 4 as extended period
4 | Andy | 2005-06-01 | 2010-05-01 | 4 => ok other activity code
5 | Ben | 2005-04-01 | 2011-06-01 | 3 => ok other user, but since rows 6 and 8 overlap for the same user and activity, I take the widest range
6 | Ben | 2010-03-01 | 2010-05-01 | 4 => ok other activity for second user
In other words, for the same user and activity code, if there is no overlap, I need the starting and ending dates as they are. If there is an overlap for the same user and activity code, I need the earliest starting date and the latest ending date from the related rows. I need this for all users and activity codes in the table, in SQL for MySQL.
I hope this is clear enough and someone can help me, because I have tried different solutions from this site and others without success.
I have a somewhat convoluted (strictly MySQL-specific) solution:
SET @user = NULL;
SET @activity = NULL;
SET @interval_id = 0;
SELECT
    MIN(inn.`starting date`) AS start,
    MAX(inn.`ending date`) AS end,
    inn.user,
    inn.`activity code`
FROM
    (SELECT
         IF(user <> @user OR `activity code` <> @activity,
            @interval_id := @interval_id + 1, NULL),
         IF(user <> @user OR `activity code` <> @activity,
            @interval_end := STR_TO_DATE('',''), NULL),
         @user := user,
         @activity := `activity code`,
         @interval_id := IF(`starting date` > @interval_end,
                            @interval_id + 1,
                            @interval_id) AS interval_id,
         @interval_end := IF(`starting date` < @interval_end,
                             GREATEST(@interval_end, `ending date`),
                             `ending date`) AS interval_end,
         t.*
     FROM Table1 t
     ORDER BY t.user, t.`activity code`, t.`starting date`, t.`ending date`) inn
GROUP BY inn.user, inn.`activity code`, inn.interval_id;
The underlying idea was shamelessly borrowed from the 1st answer to this question.
You can use this SQL Fiddle to review the results and try different source data.
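The idea behind the variable-based query (sort by user, activity code and starting date, then open a new interval only when a starting date falls after the running end date) can be sketched in Python to check the expected output. `merge_periods` is a hypothetical name; dates are ISO strings, which compare chronologically, and, as with the query's `starting date > @interval_end` test, a period starting exactly on the running end date is merged into it.

```python
# (id, user, starting date, ending date, activity code) from the question
PERIODS = [
    (1, "Andy", "2010-04-01", "2010-05-01", 3),
    (2, "Andy", "1988-11-01", "1991-03-01", 3),
    (3, "Andy", "2005-06-01", "2008-08-01", 3),
    (4, "Andy", "2005-08-01", "2008-11-01", 3),
    (5, "Andy", "2005-06-01", "2010-05-01", 4),
    (6, "Ben", "2010-03-01", "2011-06-01", 3),
    (7, "Ben", "2010-03-01", "2010-05-01", 4),
    (8, "Ben", "2005-04-01", "2011-05-01", 3),
]


def merge_periods(rows):
    """Return (user, activity, start, end) tuples with overlapping periods merged."""
    # Same order as the query: user, activity code, starting date, ending date
    ordered = sorted(rows, key=lambda r: (r[1], r[4], r[2], r[3]))
    merged = []  # each entry is [user, activity, start, end]
    for _id, user, start, end, activity in ordered:
        current = merged[-1] if merged else None
        if (current is not None and current[0] == user and current[1] == activity
                and start <= current[3]):
            # Overlaps (or touches) the running interval: extend its end date
            current[3] = max(current[3], end)
        else:
            # New user/activity, or a gap: start a new interval
            merged.append([user, activity, start, end])
    return [tuple(m) for m in merged]


for period in merge_periods(PERIODS):
    print(period)
```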
Here is a solution (see http://sqlfiddle.com/#!2/fda3d/15):
SELECT DISTINCT summarized.`user`
, summarized.activity_code
, summarized.true_begin
, summarized.true_end
FROM (
SELECT t1.id,t1.`user`,t1.activity_code
, MIN(LEAST(t1.`starting`, COALESCE(overlap.`starting` ,t1.`starting`))) as true_begin
, MAX(GREATEST(t1.`ending`, COALESCE(overlap.`ending` ,t1.`ending`))) as true_end
FROM t1
LEFT JOIN t1 AS overlap
ON t1.`user` = overlap.`user`
AND t1.activity_code = overlap.activity_code
AND overlap.`ending` >= t1.`starting`
AND overlap.`starting` <= t1.`ending`
AND overlap.id <> t1.id
GROUP BY t1.id, t1.`user`, t1.activity_code) AS summarized;
I am not sure how it will perform with a large data set with many overlaps. You will definitely need an index on the user and activity_code fields - probably the starting and ending date fields also as part of that index.

MySQL: How to convert multiple rows to a single row?

I want to convert multiple rows into a single row, based on week. It should look like the following. Can anyone help me?
id | Weight | Created |
1 | 120 | 02-04-2012 |
2 | 110 | 09-04-2012 |
1 | 100 | 16-04-2012 |
1 | 130 | 23-04-2012 |
2 | 140 | 30-04-2012 |
3 | 150 | 07-05-2012 |
Result should look like this:
id | Weight_week1 | Weight_week2 | weight_week3 | weight_week4 |
1 | 120 | 100 | 130 | |
2 | 110 | 140 | | |
3 | 150 | | | |
Thanks in advance.
If this is a single table, then:
SELECT GROUP_CONCAT(weight) AS Weight,
WEEK(Created) AS Week
FROM your_table -- placeholder: the question does not name the table
GROUP BY WEEK(Created)
This will give you one row per week, each with the week number and the comma-separated weights.
You could do it like this:
SELECT
t.id,
SUM(CASE WHEN WeekNbr=1 THEN t.Weight ELSE 0 END) AS Weight_week1,
SUM(CASE WHEN WeekNbr=2 THEN t.Weight ELSE 0 END) AS Weight_week2,
SUM(CASE WHEN WeekNbr=3 THEN t.Weight ELSE 0 END) AS Weight_week3,
SUM(CASE WHEN WeekNbr=4 THEN t.Weight ELSE 0 END) AS Weight_week4
FROM
(
SELECT
(
WEEK(Created, 5) -
WEEK(DATE_SUB(Created, INTERVAL DAYOFMONTH(Created) - 1 DAY), 5) + 1
)as WeekNbr,
Table1.id,
Table1.Weight,
Table1.Created
FROM
Table1
) AS t
GROUP BY
t.id
I don't know whether you want AVG, SUM, MAX or MIN, but you can change the aggregate to whatever you need.
Useful references:
Function for week of the month in mysql
You cannot create fields on the fly like that, but you can group them.
Use GROUP_CONCAT to deliver the results with a delimiter that you can split on later.
You could also do this:
SELECT id, created, weight, (
SELECT MIN( created ) FROM weights WHERE w.id = weights.id
) AS `min` , round( DATEDIFF( created, (
SELECT MIN( created )
FROM weights
WHERE w.id = weights.id ) ) /7) AS diff
FROM weights AS w
ORDER BY id, diff
This code does not produce a pivot table; you would need additional code to reshape the data the way you want. You may also run into trouble with WEEK() across year boundaries.
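If "week N" simply means the N-th recorded weight for an id, which is what the sample output shows (id 1 has no entry in the week of 09-04-2012, yet its 100 lands in Weight_week2), the pivot can be sketched in Python. This is an interpretation, not something the answers above compute; `pivot_weights` is a hypothetical name, dates are parsed as dd-mm-yyyy, and missing cells become None.

```python
from collections import defaultdict
from datetime import datetime

# (id, Weight, Created as dd-mm-yyyy) from the question
WEIGHTS = [
    (1, 120, "02-04-2012"), (2, 110, "09-04-2012"), (1, 100, "16-04-2012"),
    (1, 130, "23-04-2012"), (2, 140, "30-04-2012"), (3, 150, "07-05-2012"),
]


def pivot_weights(rows, columns=4):
    """Return {id: [weight_1, ..., weight_columns]}, padded with None."""
    by_id = defaultdict(list)
    for row_id, weight, created in rows:
        by_id[row_id].append((datetime.strptime(created, "%d-%m-%Y"), weight))
    table = {}
    for row_id in sorted(by_id):
        # Chronological order per id; the N-th weight goes into column N
        weights = [w for _, w in sorted(by_id[row_id])][:columns]
        table[row_id] = weights + [None] * (columns - len(weights))
    return table


for row_id, cells in pivot_weights(WEIGHTS).items():
    print(row_id, cells)
```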

Optimize nested query to single query

I have a (MySQL) table containing dates of the last scan of hosts combined with a report ID:
+--------------+---------------------+--------+
| host | last_scan | report |
+--------------+---------------------+--------+
| 112.86.115.0 | 2012-01-03 01:39:30 | 4 |
| 112.86.115.1 | 2012-01-03 01:39:30 | 4 |
| 112.86.115.2 | 2012-01-03 02:03:40 | 4 |
| 112.86.115.2 | 2012-01-03 04:33:47 | 5 |
| 112.86.115.1 | 2012-01-03 04:20:23 | 5 |
| 112.86.115.6 | 2012-01-03 04:20:23 | 5 |
| 112.86.115.2 | 2012-01-05 04:29:46 | 8 |
| 112.86.115.6 | 2012-01-05 04:17:35 | 8 |
| 112.86.115.5 | 2012-01-05 04:29:48 | 8 |
| 112.86.115.4 | 2012-01-05 04:17:37 | 8 |
+--------------+---------------------+--------+
I want to select a list of all hosts with the date of the last scan and the corresponding report id. I have built the following nested query, but I am sure it can be done in a single query:
SELECT rh.host, rh.report, rh.last_scan
FROM report_hosts rh
WHERE rh.report = (
SELECT rh2.report
FROM report_hosts rh2
WHERE rh2.host = rh.host
ORDER BY rh2.last_scan DESC
LIMIT 1
)
GROUP BY rh.host
Is it possible to do this with a single, non-nested query?
No, but you can use a JOIN in your query:
SELECT x.*
FROM report_hosts x
INNER JOIN (
SELECT host,MAX(last_scan) AS last_scan FROM report_hosts GROUP BY host
) y ON x.host=y.host AND x.last_scan=y.last_scan
Your query is doing a filesort, which is very inefficient. My solution doesn't. It's highly advisable to create an index on this table:
ALTER TABLE `report_hosts` ADD INDEX ( `host` , `last_scan` ) ;
Otherwise, your query will do a filesort twice.
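The JOIN above is the usual groupwise-maximum pattern: compute MAX(last_scan) per host, then keep the rows that match it. A minimal Python sketch of the same two steps on the question's data follows; `latest_scan_per_host` is a hypothetical name, and timestamps are compared as 'YYYY-MM-DD HH:MM:SS' strings, which sort chronologically. As with the JOIN, a host with two rows sharing its latest timestamp would keep both.

```python
# (host, last_scan, report) from the question
SCANS = [
    ("112.86.115.0", "2012-01-03 01:39:30", 4),
    ("112.86.115.1", "2012-01-03 01:39:30", 4),
    ("112.86.115.2", "2012-01-03 02:03:40", 4),
    ("112.86.115.2", "2012-01-03 04:33:47", 5),
    ("112.86.115.1", "2012-01-03 04:20:23", 5),
    ("112.86.115.6", "2012-01-03 04:20:23", 5),
    ("112.86.115.2", "2012-01-05 04:29:46", 8),
    ("112.86.115.6", "2012-01-05 04:17:35", 8),
    ("112.86.115.5", "2012-01-05 04:29:48", 8),
    ("112.86.115.4", "2012-01-05 04:17:37", 8),
]


def latest_scan_per_host(rows):
    """Return the rows holding each host's most recent last_scan."""
    # Step 1: the derived table y (SELECT host, MAX(last_scan) ... GROUP BY host)
    latest = {}
    for host, last_scan, _report in rows:
        if host not in latest or last_scan > latest[host]:
            latest[host] = last_scan
    # Step 2: the join back (x.host = y.host AND x.last_scan = y.last_scan)
    return [row for row in rows if row[1] == latest[row[0]]]


for row in latest_scan_per_host(SCANS):
    print(row)
```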
If you want to select from the report_hosts table only once, then you could use a sort of 'RANK OVER PARTITION' method (available in Oracle but, sadly, not in MySQL before version 8.0). Something like this should work:
select h.host, h.last_scan as most_recent_scan, h.report
from
(
    select rh.*,
           case when @curHost != rh.host then @rank := 1 else @rank := @rank + 1 end as rank,
           case when @curHost != rh.host then @curHost := rh.host end
    from report_hosts rh
    cross join (select @rank := 0, @curHost := '') t
    order by host asc, last_scan desc
) h
where h.rank = 1;
Granted, it is still nested, but it does avoid the 'double select' problem. I'm not sure whether it will be more efficient; that depends on what indexes you have and on the volume of data.