MySQL: Finding a span of dates across rows - mysql

I am looking for some help with even knowing where to start. Essentially we have a table for clients that hold employment start dates and end dates. For annual reports we have to calculate "continuous employment" which is defined as earliest start date to last end date as long as there is not more than 21 days between one end date and the next start date.
here is an example
employee | Start Date | End Date
1 | 2012-10-1 | 2012-11-05
1 | 2012-11-08 | 2013-1-25
2 | 2012-10-1 | 2012-11-05
2 | 2012-11-30 | 2013-1-02
In the above, I would like to see employee 1 as continuously employed from 2012-10-1 to 2013-1-25,
but employee 2 would have 2 separate employment lines, one showing continuous employment from 2012-10-1 to 2012-11-05 and another from 2012-11-30 to 2013-1-02.
Thanks for the help!

The theory is similar to @mellamokb's answer, but somewhat more concise:
SELECT employee, MIN(start) start, end
FROM (
SELECT @end:=IF(employee<=>@emp AND @stt<=end+INTERVAL 21 DAY,@end,end) end,
@stt:=start start,
@emp:=employee AS employee
FROM my_table, (SELECT @emp:=NULL, @stt:=0, @end:=0) init
ORDER BY employee, start DESC
) t
GROUP BY employee, end
See it on sqlfiddle.

One way to find "continuous groups" among a set of records is to use variables to track the difference between each row and the previous one, and to build groupings that combine continuous ranges together. In the example below, I use three variables to track enough information for generating the groups:
@curEmployee - tracks the employee from the previous record, and is compared with the employee on the current record to know when we've switched to a different employee, which automatically starts another grouping
@curEndDate - tracks the end date from the previous record, so it can be compared with the start date of the current record to see if the current record belongs in the same "group" as the previous record - that is to say, it is part of continuous employment with the previous record
@curGroup - this is the key variable which segregates the rows into separate "groups" that represent continuous employment. The logic is that a row should be considered continuous with the previous row if and only if the following two conditions are true: the two rows have the same employee number, and the start date of the current row is no more than 21 days after the end date of the previous row.
NOTE: You may want to validate the edge conditions, i.e., whether exactly 20/21/22 days apart will be considered continuous employment or not, and tweak the logic below.
Here is the sample query that calculates those groups. A couple of things to take note of. First, the order of variable assignment matters, because the variables are assigned from top to bottom in the select list. We need to assign @curGroup first, so that it still has the values of @curEmployee and @curEndDate from the previous record to draw on. Second, the order by clause is very important to ensure that when we are comparing the previous and current record, they are the two records that are closest to each other. If we looked at the records in a random order, they would likely all end up as separate groups.
select
e.employee, e.`start date`, e.`end date`
,@curGroup :=
case when employee = @curEmployee
and @curEndDate + INTERVAL 21 DAY >= e.`start date`
then @curGroup
else @curGroup + 1
end as curGroup
,@curEmployee := employee as curEmployee
,@curEndDate := e.`end date` as curEndDate
from
employment e
JOIN (SELECT @curEmployee := 0, @curEndDate := NULL, @curGroup := 0) r
order by e.employee, e.`start date`
Sample Result (DEMO) - notice how CURGROUP stays at 1 for the first two lines, because they are within 21 days of each other and represent continuous employment, while the last two lines get identified as separate group numbers:
| EMPLOYEE | START DATE | END DATE | CURGROUP | CUREMPLOYEE | CURENDDATE |
-------------------------------------------------------------------------------------------------------------------------------
| 1 | October, 01 2012 00:00:00+0000 | November, 05 2012 00:00:00+0000 | 1 | 1 | 2012-11-05 00:00:00 |
| 1 | November, 08 2012 00:00:00+0000 | January, 25 2013 00:00:00+0000 | 1 | 1 | 2013-01-25 00:00:00 |
| 2 | October, 01 2012 00:00:00+0000 | November, 05 2012 00:00:00+0000 | 2 | 2 | 2012-11-05 00:00:00 |
| 2 | November, 30 2012 00:00:00+0000 | January, 02 2013 00:00:00+0000 | 3 | 2 | 2013-01-02 00:00:00 |
Now that we've established groups of records that are part of continuous employment, we merely need to group by those group numbers and find the minimum and maximum date range for the output:
select
employee,
min(`start date`) as `start date`,
max(`end date`) as `end date`
from (
select
e.employee, e.`start date`, e.`end date`
,@curGroup :=
case when employee = @curEmployee
and @curEndDate + INTERVAL 21 DAY >= e.`start date`
then @curGroup
else @curGroup + 1
end as curGroup
,@curEmployee := employee as curEmployee
,@curEndDate := e.`end date` as curEndDate
from
employment e
JOIN (SELECT @curEmployee := 0, @curEndDate := NULL, @curGroup := 0) r
order by e.employee, e.`start date`
) as T
group by curGroup
Sample Result (DEMO):
| EMPLOYEE | START DATE | END DATE |
--------------------------------------------------------------------------------
| 1 | October, 01 2012 00:00:00+0000 | January, 25 2013 00:00:00+0000 |
| 2 | October, 01 2012 00:00:00+0000 | November, 05 2012 00:00:00+0000 |
| 2 | November, 30 2012 00:00:00+0000 | January, 02 2013 00:00:00+0000 |
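For readers who want to check the grouping logic outside the database, here is a minimal Python sketch of the same idea. It is not part of either answer: the function name `merge_spans` and the `max_gap_days` parameter are made up for illustration, and unlike the SQL it compares each start date with the latest end date seen so far in the span rather than only with the previous row's end date.

```python
from datetime import date, timedelta

def merge_spans(rows, max_gap_days=21):
    """Merge (employee, start, end) rows into continuous-employment spans."""
    merged = []
    # Sort by employee, then start date, mirroring the ORDER BY in the query.
    for employee, start, end in sorted(rows):
        last = merged[-1] if merged else None
        if (last is not None and last[0] == employee
                and start <= last[2] + timedelta(days=max_gap_days)):
            # Same employee and the gap is small enough: extend the span.
            last[2] = max(last[2], end)
        else:
            # New employee, or the gap is too large: start a new span.
            merged.append([employee, start, end])
    return [tuple(span) for span in merged]

# Sample data from the question.
rows = [
    (1, date(2012, 10, 1), date(2012, 11, 5)),
    (1, date(2012, 11, 8), date(2013, 1, 25)),
    (2, date(2012, 10, 1), date(2012, 11, 5)),
    (2, date(2012, 11, 30), date(2013, 1, 2)),
]
spans = merge_spans(rows)
for span in spans:
    print(span)
```

With the question's data this yields one span for employee 1 and two spans for employee 2, matching the result table above.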

Related

Getting "filler" empty values for a GROUP BY

Assuming I want to get 'weekly' results from a date range, and the date range contains four separate weeks, but my database only has a record for one week, how can I get filler values for the remaining three weeks?
For example, given this date range: 6/2018 - 7/2018
I run this query:
SELECT
DATE_FORMAT(period, '%m %d') || ' - ' || DATE_FORMAT(period, '%m %d') AS period,
SUM(clicks) AS clicks
FROM tablename
WHERE period >= ? AND period <= ?
GROUP BY YEAR(period), WEEK(period)
With these table rows in the database:
| period | clicks |
| 07/01/2018T:00:00:00Z | 1000 |
And I get these query results:
| period | clicks |
| Jul 1 - Jul 5 | 1000 |
But I want to get these query results to cover default values for the empty weeks so I can use them to populate a d3 chart:
| period | clicks |
| Jul 1 - Jul 5 | 1000 |
| Jun 25 - Jun 23 | 0 |
| Jun 18 - Jun 24 | 0 |
| Jun 12 - Jun 18 | 0 |
Any ideas? This sort of "filler" behaviour should be generic enough to work with other intervals, such as daily, monthly, yearly, hourly.
rewrite your query as
ifnull( SUM(clicks),0) AS clicks
You can use this query to generate the weeks and then left join it with your current query to get what you want. Note that it is written in SQL Server (T-SQL) syntax, so it would need adapting for MySQL.
DECLARE @StartingFromDate DATETIME = '2018-12-01';
DECLARE @EndingAtDate DATETIME = '2018-12-31';
WITH CTE_DateRange (DateRange)
AS (SELECT DATEADD(WEEK, DATEDIFF(WEEK, 0, @EndingAtDate) - DATEDIFF(WEEK, @StartingFromDate, @EndingAtDate), 0)
UNION ALL
SELECT DATEADD(WEEK, 1, DateRange)
FROM CTE_DateRange
WHERE DATEADD(WEEK, 1, DateRange) < @EndingAtDate)
SELECT CTE_DateRange.DateRange
FROM CTE_DateRange
WHERE CTE_DateRange.DateRange BETWEEN @StartingFromDate AND @EndingAtDate
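The "generate every week, then left join" idea can also be sketched in application code. The Python below is only an illustration under stated assumptions: the helper names `week_start` and `weekly_totals` are invented here, and weeks are assumed to start on Monday, which may not match the week mode used by `WEEK()` in the database.

```python
from datetime import date, timedelta

def week_start(d):
    """Return the Monday of the week containing d (assumed week convention)."""
    return d - timedelta(days=d.weekday())

def weekly_totals(rows, range_start, range_end):
    """Sum clicks per week, emitting 0 for weeks that have no rows."""
    totals = {}
    # Pre-fill every week in the range with 0: this plays the role of the
    # generated week list on the left side of the LEFT JOIN.
    week = week_start(range_start)
    while week <= range_end:
        totals[week] = 0
        week += timedelta(days=7)
    # Add the real data on top of the filler values.
    for day, clicks in rows:
        if range_start <= day <= range_end:
            totals[week_start(day)] += clicks
    return sorted(totals.items())

rows = [(date(2018, 7, 1), 1000)]
filled = weekly_totals(rows, date(2018, 6, 4), date(2018, 7, 1))
for week, clicks in filled:
    print(week, clicks)
```

The same pre-fill approach carries over to daily or monthly buckets by swapping the bucketing function and the step size.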

Finding MAX and MIN values for each same start and end week

I am trying to write a query that finds the MAX and MIN for each week, but I am not having much success.
I have 2 Tables:
SYMBOL_DATA (contains open,high,low,close, and volume)
WEEKLY_LOOKUP (contains a list of weeks(no weekends) with a WEEK_START and WEEK_END)
**SYMBOL_DATA Example:**
OPEN, HIGH, LOW, CLOSE, VOLUME
23.22 26.99 21.45 22.49 34324995
**WEEKLY_LOOKUP Example:**
WEEK_START WEEK_END
2016-01-25 2016-01-29
2016-01-18 2016-01-22
2016-01-11 2016-01-15
2016-01-04 2016-01-08
I am trying to find for each WEEK_START and WEEK_END the high and low for that particular week.
For instance, if the WEEK is WEEK_START=2016-01-11 and WEEK_END=2016-01-15, I would have
5 entries for that particular symbol listed:
DATE HIGH LOW
2016-01-15 96.38 93.54
2016-01-14 98.87 92.45
2016-01-13 100.50 95.21
2016-01-12 99.96 97.55
2016-01-11 98.60 95.39
2016-01-08 100.50 97.03
2016-01-07 101.43 97.30
2016-01-06 103.77 100.90
2016-01-05 103.71 101.67
2016-01-04 102.24 99.76
For each week_ending (2016-01-15) the HIGH is 100.50 on 2016-01-13 and the LOW is 92.45 on 2016-01-14
I attempted to write a query that gives me a list of highs and lows, but when I tried adding a MAX(HIGH), I had only 1 row returned back.
I tried a few more things in which I couldn't get the query to work (some sort of infinite run type). For now, I just have this that gives me a list of highs and lows for every day instead of the roll-up for each week which I am not sure how to do.
select date, t1.high, t1.low
from SYMBOL_DATA t1, WEEKLY_LOOKUP t2
where symbol='ABCDE' and (t1.date>=t2.START_DATE and t1.date<=t2.END_DATE)
and t1.date<=CURDATE()
LIMIT 30;
How can I get for each week (Start and End) the High_Date, MAX(High), and Low_Date, MIN(LOW) found each week? I probably don't need a
full history for a symbol, so a LIMIT of like 30 or (30 week periods) would be sufficient so I can see trending.
If I wanted to know for example each week MAX(High) and MIN(LOW) start week ending 2016-01-15 the result would show
**Result:**
WEEK_ENDING 2016-01-15 100.50 2016-01-13 92.45 2016-01-14
WEEK_ENDING 2016-01-08 103.77 2016-01-06 97.03 2016-01-08
etc
etc
Thanks to all of you with the expertise and knowledge. I greatly appreciate your help very much.
Edit
Once the week-ending list containing the MAX(HIGH) and MIN(LOW) for each week is returned, is it possible to find the MAX(HIGH) and MIN(LOW) across that result set, so that only 1 entry is returned for the 30 week periods?
Thank you!
To Piotr
select part1.end_date,part1.min_l,part1.max_h, s1.date, part1.min_l,s2.date from
(
select t2.start_date, t2.end_date, max(t1.high) max_h, min(t1.low) min_l
from SYMBOL_DATA t1, WEEKLY_LOOKUP t2
where symbol='FB'
and t1.date<='2016-01-22'
and (t1.date>=t2.START_DATE and t1.date<=t2.END_DATE)
group by t2.start_date, t2.end_date order by t1.date DESC LIMIT 1;
) part1, symbol_data s1, symbol_data s2
where part1.max_h = s1.high and part1.min_l = s2.low;
You will notice that the MAX and MIN for each week is staying roughly the same and not changing as it should be different for week to week for both the High and Low.
SQL Fiddle
I have abbreviated some of your names in my example.
Getting the high and low for each week is pretty simple; you just have to use GROUP BY:
SELECT s1.symbol, w.week_end, MAX(s1.high) AS weekly_high, MIN(s1.LOW) as weekly_low
FROM weeks AS w
INNER JOIN symdata AS s1 ON s1.zdate BETWEEN w.week_start AND w.week_end
GROUP BY s1.symbol, w.week_end
Results:
| symbol | week_end | weekly_high | weekly_low |
|--------|---------------------------|-------------|------------|
| ABCD | January, 08 2016 00:00:00 | 103.77 | 97.03 |
| ABCD | January, 15 2016 00:00:00 | 100.5 | 92.45 |
Unfortunately, getting the dates of the high and low requires that you re-join to the symbol_data table, based on the symbol, week and values. And even that doesn't do the job; you have to account for the possibility that there might be two days where the same high (or low) was achieved, and decide which one to choose. I arbitrarily chose the first occurrence in the week of the high and low. So to get that second level of choice, you need another GROUP BY. The whole thing winds up looking like this:
SELECT wl.symbol, wl.week_end, wl.weekly_high, MIN(hd.zdate) as high_date, wl.weekly_low, MIN(ld.zdate) as low_date
FROM (
SELECT s1.symbol, w.week_start, w.week_end, MAX(s1.high) AS weekly_high, MIN(s1.low) as weekly_low
FROM weeks AS w
INNER JOIN symdata AS s1 ON s1.zdate BETWEEN w.week_start AND w.week_end
GROUP BY s1.symbol, w.week_start, w.week_end) AS wl
INNER JOIN symdata AS hd
ON hd.zdate BETWEEN wl.week_start AND wl.week_end
AND hd.symbol = wl.symbol
AND hd.high = wl.weekly_high
INNER JOIN symdata AS ld
ON ld.zdate BETWEEN wl.week_start AND wl.week_end
AND ld.symbol = wl.symbol
AND ld.low = wl.weekly_low
GROUP BY wl.symbol, wl.week_start, wl.week_end, wl.weekly_high, wl.weekly_low
Results:
| symbol | week_end | weekly_high | high_date | weekly_low | low_date |
|--------|---------------------------|-------------|---------------------------|------------|---------------------------|
| ABCD | January, 08 2016 00:00:00 | 103.77 | January, 06 2016 00:00:00 | 97.03 | January, 08 2016 00:00:00 |
| ABCD | January, 15 2016 00:00:00 | 100.5 | January, 13 2016 00:00:00 | 92.45 | January, 14 2016 00:00:00 |
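To sanity-check the weekly roll-up and the "first occurrence" tie-break, here is a small Python sketch using the figures from the question. The function name `weekly_extremes` and the tuple layout are assumptions made for this example, not anything from the original schema.

```python
from datetime import date

def weekly_extremes(weeks, bars):
    """For each (week_start, week_end), return
    (week_end, high, high_date, low, low_date)."""
    results = []
    for week_start, week_end in weeks:
        # Rows are (date, high, low); sorting orders them by date.
        in_week = sorted(b for b in bars if week_start <= b[0] <= week_end)
        if not in_week:
            continue  # like the INNER JOIN, weeks without data are skipped
        # max()/min() return the first extreme they meet, so with rows in
        # date order a tie resolves to the earliest date in the week.
        high_bar = max(in_week, key=lambda b: b[1])
        low_bar = min(in_week, key=lambda b: b[2])
        results.append((week_end, high_bar[1], high_bar[0],
                        low_bar[2], low_bar[0]))
    return results

weeks = [
    (date(2016, 1, 4), date(2016, 1, 8)),
    (date(2016, 1, 11), date(2016, 1, 15)),
]
bars = [
    (date(2016, 1, 15), 96.38, 93.54),
    (date(2016, 1, 14), 98.87, 92.45),
    (date(2016, 1, 13), 100.50, 95.21),
    (date(2016, 1, 12), 99.96, 97.55),
    (date(2016, 1, 11), 98.60, 95.39),
    (date(2016, 1, 8), 100.50, 97.03),
    (date(2016, 1, 7), 101.43, 97.30),
    (date(2016, 1, 6), 103.77, 100.90),
    (date(2016, 1, 5), 103.71, 101.67),
    (date(2016, 1, 4), 102.24, 99.76),
]
extremes = weekly_extremes(weeks, bars)
for row in extremes:
    print(row)
```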
To get the global highs and lows, just remove the weekly table from the original query:
SELECT wl.symbol, wl.high, MIN(hd.zdate) as high_date, wl.low, MIN(ld.zdate) as low_date
FROM (
SELECT s1.symbol, MAX(s1.high) AS high, MIN(s1.low) as low
FROM symdata AS s1
GROUP BY s1.symbol) AS wl
INNER JOIN symdata AS hd
ON hd.symbol = wl.symbol
AND hd.high = wl.high
INNER JOIN symdata AS ld
ON ld.symbol = wl.symbol
AND ld.low = wl.low
GROUP BY wl.symbol, wl.high, wl.low
Results:
| symbol | high | high_date | low | low_date |
|--------|--------|---------------------------|-------|---------------------------|
| ABCD | 103.77 | January, 06 2016 00:00:00 | 92.45 | January, 14 2016 00:00:00 |
The week table seems entirely redundant...
SELECT symbol
, WEEK(zdate)
, MIN(low) min
, MAX(high) max_high
FROM symdata
GROUP
BY symbol, WEEK(zdate);
This is a simplified example. In reality, you might use DATE_FORMAT or something like that instead.
http://sqlfiddle.com/#!9/c247f/3
Check if following query produces desired result:
select part1.end_date,part1.min_l,part1.max_h, s1.date, part1.min_l,s2.date from
(
select t2.start_date, t2.end_date, max(t1.high) max_h, min(t1.low) min_l
from SYMBOL_DATA t1, WEEKLY_LOOKUP t2
where symbol='ABCDE'
and (t1.date>=t2.START_DATE and t1.date<=t2.END_DATE)
group by t2.start_date, t2.end_date
) part1, symbol_data s1, symbol_data s2
where part1.max_h = s1.high and part1.min_l = s2.low
and (s1.date >= part1.start_date and s1.date <= part1.end_date)
and (s2.date >= part1.start_date and s2.date <= part1.end_date)

select range of dates in mysql [duplicate]

I have a data set of projects. The projects change status from beginning to end, and the date of the status change is logged in a table (table is named "events" - not my choice). Would look like this (simplified):
Date Status
2015-06-01 Start
2015-06-03 Stage 2
2015-06-07 Stage 3
In any given date range (to be determined dynamically) I want to be able to see which projects are at which status. However, using BETWEEN or other query against the data will only pull those projects whose status changed during that period, not the ones that are still at a given status.
I've currently created a very clunky solution in Excel which copies rows into new rows between status change dates, like so:
Date Status
2015-06-01 Project start
2015-06-02 Project start (copied)
2015-06-03 Stage 2
2015-06-04 Stage 2 (copied)
2015-06-05 Stage 2 (copied)
2015-06-06 Stage 2 (copied)
2015-06-07 Stage 3
This solution allows me to query the status for the project on, say, 2015-06-06 and see that it is still at Stage 2.
Is there some way I can use MySQL to pull this same data, but as the output of a query? I've heard some suggest using a calendar table, but I'm not sure how that would work. I've also seen someone recommend a cross join, but again, I couldn't understand from the description how that would work.
Thanks in advance for your help!
plan
create calendar table by cross joining digits and date_add over calendar period..
join your data to calendar source with date <= calendar date
take max of date <= calendar date
join back to original data source to get status
setup
drop table if exists calendar_t;
CREATE TABLE calendar_t (
id integer primary key auto_increment not null,
`date` date not null,
day varchar(9) not null,
month varchar(13) not null,
`year` integer not null
);
drop view if exists digits_v;
create view digits_v
as
select 0 as n
union all
select 1
union all
select 2
union all
select 3
union all
select 4
union all
select 5
union all
select 6
union all
select 7
union all
select 8
union all
select 9
;
insert into calendar_t
( `date`, day, month, `year` )
select
date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day) as `date`,
dayname(date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)) as day,
monthname(date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)) as month,
year(date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)) as `year`
from
digits_v a2
cross join digits_v a1
cross join digits_v a0
order by date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)
;
drop table if exists example;
create table example
(
`date` date not null,
status varchar(23) not null
);
insert into example
( `date`, status )
values
( '2015-06-01', 'Start' ),
( '2015-06-03', 'Stage 2' ),
( '2015-06-07', 'Stage 3' )
;
query
select cal_date, mdate, ex2.status
from
(
select cal_date, max(ex_date) as mdate
from
(
select cal.`date` as cal_date, ex.`date` as ex_date
from calendar_t cal
inner join example ex
on ex.`date` <= cal.`date`
) maxs
group by cal_date
) m2
inner join example ex2
on m2.mdate = ex2.`date`
-- pick a reasonable end date for filtering..
where cal_date <= date('2015-06-15')
order by cal_date
;
output
+------------------------+------------------------+---------+
| cal_date | mdate | status |
+------------------------+------------------------+---------+
| June, 01 2015 00:00:00 | June, 01 2015 00:00:00 | Start |
| June, 02 2015 00:00:00 | June, 01 2015 00:00:00 | Start |
| June, 03 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 04 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 05 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 06 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 07 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 08 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 09 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 10 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 11 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 12 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 13 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 14 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 15 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
+------------------------+------------------------+---------+
You don't need to create a table with all the dates. You could alter your table to store the start and end dates for each status and use a BETWEEN condition.
Or, using your existing data, with @datequery as the date whose status you want to find:
SELECT Status FROM Events
WHERE Date <= @datequery
ORDER BY Date DESC
LIMIT 1
This returns the most recent status change on or before the date you are querying.
@datequery = 2015-06-06
Status
Stage 2
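The "latest status change on or before the date" lookup can be sketched in Python as well. This is only an illustration: `status_on` is a hypothetical helper, and the events list simply repeats the simplified sample from the question for a single project.

```python
from bisect import bisect_right
from datetime import date

events = [
    (date(2015, 6, 1), "Start"),
    (date(2015, 6, 3), "Stage 2"),
    (date(2015, 6, 7), "Stage 3"),
]

def status_on(events, query_date):
    """Return the status in effect on query_date, or None before the first event."""
    ordered = sorted(events)
    dates = [d for d, _ in ordered]
    # bisect_right counts the events dated on or before query_date.
    idx = bisect_right(dates, query_date)
    return ordered[idx - 1][1] if idx else None

print(status_on(events, date(2015, 6, 6)))
```

No per-day copies of the rows are needed; the lookup works for any date because it only needs the nearest earlier status change.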

Group and sum data based on a day of the month

I have a recurring payment day of the 14th of each month and want to group a subset of data by month/year and sum the sent column. For example, for the given data:
Table `Counter`
Id Date Sent
1 10/04/2013 2
2 11/04/2013 4
3 15/04/2013 7
4 10/05/2013 3
5 14/05/2013 5
6 15/05/2013 3
7 16/05/2013 4
The output I want is something like:
From Count
14/03/2013 6
14/04/2013 10
14/05/2013 12
I am not worried about how the From column is formatted, or whether it's easier to split it into month/year, as I can recreate a date from multiple columns in the GUI. So the output could just as easily be:
FromMth FromYr Count
03 2013 6
04 2013 10
05 2013 12
or even
toMth toYr Count
04 2013 6
05 2013 10
06 2013 12
If the payment date is for example the 31st then the date comparison would need to be the last date of each month. I am also not worried about missing months in the result-set.
I will also turn this into a stored procedure so that I can pass in the payment date and other filter criteria. It is also worth mentioning that the data can span multiple years.
Try this query
select
if(day(STR_TO_DATE(date, "%Y-%d-%m")) >= 14,
concat('14/', month(STR_TO_DATE(date, "%Y-%d-%m")), '/', year(STR_TO_DATE(date, "%Y-%d-%m"))) ,
concat('14/', if ((month(STR_TO_DATE(date, "%Y-%d-%m")) - 1) = 0,
concat('12/', year(STR_TO_DATE(date, "%Y-%d-%m")) - 1),
concat(month(STR_TO_DATE(date, "%Y-%d-%m"))-1,'/',year(STR_TO_DATE(date, "%Y-%d-%m")))
)
)
) as fromDate,
sum(sent)
from tbl
group by fromDate
FIDDLE
| FROMDATE | SUM(SENT) |
--------------------------
| 14/10/2013 | 3 |
| 14/12/2012 | 1 |
| 14/3/2013 | 6 |
| 14/4/2013 | 10 |
| 14/5/2013 | 12 |
| 14/9/2013 | 1 |
Pay date could be grouped by month and year separately
select Sum(Sent) as "Count",
Extract(Month from Date - 13) as FromMth,
Extract(Year from Date - 13) as FromYr
from Counter
group by Extract(Year from Date - 13),
Extract(Month from Date - 13)
Be careful, since the column name "Date" coincides with the keyword "date" in ANSI SQL.
I think the simplest way to do what you want is to subtract 13 days from the date, so that the 14th maps to the 1st of the month, and group by the resulting month:
select date_format(date - interval 13 day, '%Y-%m'), sum(sent)
from counter
group by date_format(date - interval 13 day, '%Y-%m')
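The fixed day shift above only works for a payment day of 14. For a configurable payment day, including the "31st means last day of the month" case from the question, one possible approach is sketched below in Python. The names `period_start` and `sum_by_period` are invented for this example, and the clamping rule is my reading of the requirement rather than something the answers specify.

```python
import calendar
from collections import defaultdict
from datetime import date

def period_start(d, pay_day):
    """Return the payment date that opens the period containing d."""
    def clamp(year, month):
        # A pay day of 29-31 falls back to the last day of shorter months.
        last = calendar.monthrange(year, month)[1]
        return date(year, month, min(pay_day, last))

    this_month = clamp(d.year, d.month)
    if d >= this_month:
        return this_month
    # Before this month's payment date: the period began last month.
    if d.month == 1:
        return clamp(d.year - 1, 12)
    return clamp(d.year, d.month - 1)

def sum_by_period(rows, pay_day=14):
    """Sum the sent values per payment period, keyed by period start."""
    totals = defaultdict(int)
    for d, sent in rows:
        totals[period_start(d, pay_day)] += sent
    return sorted(totals.items())

# Sample data from the question (dd/mm/yyyy in the original table).
rows = [
    (date(2013, 4, 10), 2), (date(2013, 4, 11), 4), (date(2013, 4, 15), 7),
    (date(2013, 5, 10), 3), (date(2013, 5, 14), 5), (date(2013, 5, 15), 3),
    (date(2013, 5, 16), 4),
]
totals = sum_by_period(rows)
for start, count in totals:
    print(start, count)
```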

Select highest 3 scores in each day for every user

I have a MySQL table like this:
id | userid | score | datestamp |
-----------------------------------------------------
1 | 1 | 5 | 2012-12-06 03:55:16
2 | 2 | 0,5 | 2012-12-06 04:25:21
3 | 1 | 7 | 2012-12-06 04:35:33
4 | 3 | 12 | 2012-12-06 04:55:45
5 | 2 | 22 | 2012-12-06 05:25:11
6 | 1 | 16,5 | 2012-12-06 05:55:21
7 | 1 | 19 | 2012-12-06 13:55:16
8 | 2 | 8,5 | 2012-12-07 06:27:16
9 | 2 | 7,5 | 2012-12-07 08:33:16
10 | 1 | 10 | 2012-12-07 09:25:19
11 | 1 | 6,5 | 2012-12-07 13:33:16
12 | 3 | 6 | 2012-12-07 15:45:44
13 | 2 | 4 | 2012-12-07 16:05:16
14 | 2 | 34 | 2012-12-07 18:33:55
15 | 2 | 22 | 2012-12-07 18:42:11
I would like to display user scores like this:
if a user on a certain day has more than 3 scores it would get only highest 3, repeat that for every day for this user and then add all days together. I want to display this sum for every user.
EDIT:
So in the example above for user 1 on 06.12. I would add top 3 scores together and ignore 4th score, then add to that number top 3 from the next day and so on. I need that number for every user.
EDIT 2:
Expected output is:
userid | score
--------------------
1 | 59 //19 + 16.5 + 7 (06.12.) + 10 + 6.5 (07.12.)
2 | 87 //22 + 0.5 (06.12.) + 34 + 22 + 8.5 (07.12.)
3 | 18 //12 (06.12.) + 6 (07.12.)
I hope this is more clear :)
I would really appreciate the help because I am stuck.
Please take a look at the following code, if your answer to my comment is yes :) Since your data is all from 2012 and from a single month, I grouped by day.
SQLFIDDLE sample
Query:
select y.id, y.userid, y.score, y.datestamp
from (select id, userid, score, datestamp
from scores
group by day(datestamp)) as y
where (select count(*)
from (select id, userid, score, datestamp
from scores group by day(datestamp)) as x
where y.score >= x.score
and y.userid = x.userid
) =1 -- Top 3rd, 2nd, 1st
order by y.score desc
;
Results:
ID USERID SCORE DATESTAMP
8 2 8.5 December, 07 2012 00:00:00+0000
20 3 6 December, 08 2012 00:00:00+0000
1 1 5 December, 06 2012 00:00:00+0000
Based on your latter updates to question.
If you need a sum per user by year/month/day and then want to find the highest, you can simply add an aggregate function such as sum to the above query. To repeat myself: since your sample data covers just one year, there is no point in grouping by year or month. That's why I grouped by day.
select y.id, y.userid, y.score, y.datestamp
from (select id, userid, sum(score) as score,
datestamp
from scores
group by userid, day(datestamp)) as y
where (select count(*)
from (select id, userid, sum(score) as score
, datestamp
from scores
group by userid, day(datestamp)) as x
where y.score >= x.score
and y.userid = x.userid
) =1 -- Top 3rd, 2nd, 1st
order by y.score desc
;
Results based on sum:
ID USERID SCORE DATESTAMP
1 1 47.5 December, 06 2012 00:00:00+0000
8 2 16 December, 07 2012 00:00:00+0000
20 3 6 December, 08 2012 00:00:00+0000
UPDATED WITH NEW SOURCE DATA SAMPLE
Simon, please take a look at my own sample. As your data was changing, I used mine.
Here is the reference. I have used pure ansi style without any over partition or dense_rank.
Also note that with the data I used, the queries get the top 2, not the top 3, scores. You can change it accordingly.
Guess what, the answer is 10 times simpler than the first impression your first data gave....
SQLFIDDLE
Query 1:
-- for top 2 sum by user by each day
SELECT userid, sum(Score), datestamp
FROM scores t1
where 2 >=
(SELECT count(*)
from scores t2
where t1.score <= t2.score
and t1.userid = t2.userid
and day(t1.datestamp) = day(t2.datestamp)
order by t2.score desc)
group by userid, datestamp
;
Results for query 1:
USERID SUM(SCORE) DATESTAMP
1 70 December, 06 2012 00:00:00+0000
1 30 December, 07 2012 00:00:00+0000
2 22 December, 06 2012 00:00:00+0000
2 25 December, 07 2012 00:00:00+0000
3 30 December, 06 2012 00:00:00+0000
3 30 December, 07 2012 00:00:00+0000
Final Query:
-- for all two days top 2 sum by user
SELECT userid, sum(Score)
FROM scores t1
where 2 >=
(SELECT count(*)
from scores t2
where t1.score <= t2.score
and t1.userid = t2.userid
and day(t1.datestamp) = day(t2.datestamp)
order by t2.score desc)
group by userid
;
Final Results:
USERID SUM(SCORE)
1 100
2 47
3 60
Here goes a snapshot of direct calculations of data I used.
SELECT
*
FROM
table1
LEFT JOIN
(SELECT * FROM table1 ORDER BY score LIMIT 3) as lr on DATE(lr.datestamp) = DATE(table1.datestamp)
GROUP BY
datestamp
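As a cross-check on the expected output in the question, the "top 3 per user per day, then sum over all days" rule can be sketched in a few lines of Python. The function name `top_n_daily_total` is made up for this example, the rows are the question's sample data with decimal commas written as points, and the day is taken from the first 10 characters of the timestamp string.

```python
from collections import defaultdict

def top_n_daily_total(rows, n=3):
    """Sum each user's n highest scores per day across all days."""
    per_day = defaultdict(list)
    for userid, score, stamp in rows:
        # Bucket scores by (user, calendar day); stamp[:10] is 'YYYY-MM-DD'.
        per_day[(userid, stamp[:10])].append(score)
    totals = defaultdict(float)
    for (userid, _day), scores in per_day.items():
        # Keep only the n highest scores of that day.
        totals[userid] += sum(sorted(scores, reverse=True)[:n])
    return dict(totals)

rows = [
    (1, 5, "2012-12-06 03:55:16"), (2, 0.5, "2012-12-06 04:25:21"),
    (1, 7, "2012-12-06 04:35:33"), (3, 12, "2012-12-06 04:55:45"),
    (2, 22, "2012-12-06 05:25:11"), (1, 16.5, "2012-12-06 05:55:21"),
    (1, 19, "2012-12-06 13:55:16"), (2, 8.5, "2012-12-07 06:27:16"),
    (2, 7.5, "2012-12-07 08:33:16"), (1, 10, "2012-12-07 09:25:19"),
    (1, 6.5, "2012-12-07 13:33:16"), (3, 6, "2012-12-07 15:45:44"),
    (2, 4, "2012-12-07 16:05:16"), (2, 34, "2012-12-07 18:33:55"),
    (2, 22, "2012-12-07 18:42:11"),
]
totals = top_n_daily_total(rows)
for userid in sorted(totals):
    print(userid, totals[userid])
```

Grouping on the full calendar date rather than the day of the month avoids mixing up the same day number from different months.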