I have a data set of projects. The projects change status from beginning to end, and the date of each status change is logged in a table (the table is named "events", not my choice). It looks like this (simplified):
Date Status
2015-06-01 Start
2015-06-03 Stage 2
2015-06-07 Stage 3
For any given date range (to be determined dynamically), I want to see which projects are at which status. However, using BETWEEN or any other filter on the dates only pulls the projects whose status changed during that period, not the ones that are still at a given status.
I've currently created a very clunky solution in Excel which copies rows into new rows between status change dates, like so:
Date Status
2015-06-01 Project start
2015-06-02 Project start (copied)
2015-06-03 Stage 2
2015-06-04 Stage 2 (copied)
2015-06-05 Stage 2 (copied)
2015-06-06 Stage 2 (copied)
2015-06-07 Stage 3
This solution allows me to query the status for the project on, say, 2015-06-06 and see that it is still at Stage 2.
Is there some way I can use MySQL to produce this same data as the output of a query? Some have suggested using a calendar table, but I'm not sure how that would work. I've also seen someone recommend a cross join, but again, I couldn't understand from the description how that would work.
Thanks in advance for your help!
plan
create a calendar table by cross joining a digits view and applying date_add over the calendar period
join your data to the calendar on date <= calendar date
for each calendar date, take the max of the dates <= that calendar date
join back to the original data to get the status
setup
drop table if exists calendar_t;
CREATE TABLE calendar_t (
id integer primary key auto_increment not null,
`date` date not null,
day varchar(9) not null,
month varchar(13) not null,
`year` integer not null
);
drop view if exists digits_v;
create view digits_v
as
select 0 as n
union all
select 1
union all
select 2
union all
select 3
union all
select 4
union all
select 5
union all
select 6
union all
select 7
union all
select 8
union all
select 9
;
insert into calendar_t
( `date`, day, month, `year` )
select
date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day) as `date`,
dayname(date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)) as day,
monthname(date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)) as month,
year(date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)) as `year`
from
digits_v a2
cross join digits_v a1
cross join digits_v a0
order by date_add('2015-01-01', interval 100*a2.n + 10*a1.n + a0.n day)
;
drop table if exists example;
create table example
(
`date` date not null,
status varchar(23) not null
);
insert into example
( `date`, status )
values
( '2015-06-01', 'Start' ),
( '2015-06-03', 'Stage 2' ),
( '2015-06-07', 'Stage 3' )
;
query
select cal_date, mdate, ex2.status
from
(
select cal_date, max(ex_date) as mdate
from
(
select cal.`date` as cal_date, ex.`date` as ex_date
from calendar_t cal
inner join example ex
on ex.`date` <= cal.`date`
) maxs
group by cal_date
) m2
inner join example ex2
on m2.mdate = ex2.`date`
-- pick a reasonable end date for filtering..
where cal_date <= date('2015-06-15')
order by cal_date
;
output
+------------------------+------------------------+---------+
| cal_date | mdate | status |
+------------------------+------------------------+---------+
| June, 01 2015 00:00:00 | June, 01 2015 00:00:00 | Start |
| June, 02 2015 00:00:00 | June, 01 2015 00:00:00 | Start |
| June, 03 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 04 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 05 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 06 2015 00:00:00 | June, 03 2015 00:00:00 | Stage 2 |
| June, 07 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 08 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 09 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 10 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 11 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 12 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 13 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 14 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
| June, 15 2015 00:00:00 | June, 07 2015 00:00:00 | Stage 3 |
+------------------------+------------------------+---------+
sqlfiddle
reference
SO mysql sequence generation
sql cross join
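To try the same calendar-plus-max-date idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the answer's exact SQL: the calendar is generated on the fly with a recursive CTE (SQLite syntax; MySQL 8 also supports WITH RECURSIVE) instead of the digits view, dates are stored as ISO text, and the function name status_by_day is hypothetical.

```python
import sqlite3

def status_by_day(changes, start, end):
    """Return (calendar_date, status) for every day from start to end inclusive.

    changes: list of (ISO date string, status) rows, i.e. the `example` table.
    Days before the first status change are dropped by the inner join.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE example (date TEXT NOT NULL, status TEXT NOT NULL)")
    con.executemany("INSERT INTO example VALUES (?, ?)", changes)
    query = """
        WITH RECURSIVE calendar(cal_date) AS (
            -- one row per day, generated instead of a stored calendar table
            SELECT date(?)
            UNION ALL
            SELECT date(cal_date, '+1 day') FROM calendar WHERE cal_date < date(?)
        ),
        m2 AS (
            -- latest status-change date on or before each calendar day
            SELECT cal.cal_date, MAX(ex.date) AS mdate
            FROM calendar cal
            JOIN example ex ON ex.date <= cal.cal_date
            GROUP BY cal.cal_date
        )
        -- join back to pick up the status recorded on that date
        SELECT m2.cal_date, ex2.status
        FROM m2
        JOIN example ex2 ON ex2.date = m2.mdate
        ORDER BY m2.cal_date
    """
    rows = con.execute(query, (start, end)).fetchall()
    con.close()
    return rows

changes = [("2015-06-01", "Start"), ("2015-06-03", "Stage 2"), ("2015-06-07", "Stage 3")]
for cal_date, status in status_by_day(changes, "2015-06-01", "2015-06-15"):
    print(cal_date, status)
```

With a real multi-project table you would also carry a project key through the join and the GROUP BY; that column is not in the simplified example, so it is left out here.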
You don't need to create a table with all the dates. You could alter your table to store the start and end date of each status and use a BETWEEN condition.
Or, using your existing data,
with @datequery as the date you want to know the status for:
SELECT Status FROM events
WHERE Date <= @datequery
ORDER BY Date DESC
LIMIT 1;
This returns the most recent status change on or before the date you are querying.
@datequery = '2015-06-06'
Status
Stage 2
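The same "latest change on or before a date" lookup can be sketched with Python's sqlite3 module against an in-memory copy of the simplified events table. The helper name status_on is hypothetical, and a real table with several projects would also need a filter such as a project key, which the simplified example does not have.

```python
import sqlite3

def status_on(con, as_of):
    """Most recent status recorded on or before as_of (ISO date string), or None."""
    row = con.execute(
        "SELECT status FROM events WHERE date <= ? ORDER BY date DESC LIMIT 1",
        (as_of,),
    ).fetchone()
    return row[0] if row else None

# In-memory stand-in for the question's simplified `events` table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (date TEXT NOT NULL, status TEXT NOT NULL)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("2015-06-01", "Start"), ("2015-06-03", "Stage 2"), ("2015-06-07", "Stage 3")],
)
print(status_on(con, "2015-06-06"))  # Stage 2
```

A date before the first change returns None rather than a status, which is usually what you want for projects that had not started yet.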
Related
I have a small application built with CodeIgniter 3, and I need to produce a report that will be rendered with Chart.js. The report should be yearly, but split at a specific day of every month: each month's counts must run from the 4th to the 3rd of the following month.
For example, the January report would cover 4 January to 3 February, the February report 4 February to 3 March, and so on.
I have written a MySQL query, but I'm stuck on how to apply this date-to-date range. My query is as follows:
SELECT DATE_FORMAT(odd_date_created, '%Y') as 'year',
DATE_FORMAT(odd_date_created, '%m') as 'month',
COUNT(odd_id) as 'total', status
FROM odd_data
WHERE status = $id -- the 4th-to-3rd date condition needs to go here
GROUP BY DATE_FORMAT(odd_date_created, '%Y%m'), status
I'm new to MySQL. Could somebody help me with this? I'm stuck on where to put the date-to-date condition.
First, a caution: when you come to join your data to the ranges below, do not use BETWEEN. BETWEEN is inclusive at both ends, so a row stamped exactly at midnight on the 4th would match two adjacent periods. Use this half-open condition instead: data.date >= r.period_start_dt AND data.date < r.period_end_dt
Second, I am assuming your data has dates or timestamps that fall within the calculated ranges that follow:
SET @year := 2017;
select
*
from (
select
start_dt + INTERVAL m.n MONTH period_start_dt
, start_dt + INTERVAL m.n + 1 MONTH period_end_dt
from (
select str_to_date(concat(@year,'-01-04'),'%Y-%m-%d') start_dt ) seed
cross join (select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
## LEFT JOIN YOUR DATA
## ON data.date >= r.period_start_dt and data.date < r.period_end_dt
Example ranges (produce your own at this demo: http://rextester.com/CHTKSA95303):
NB: dates are shown in dd.mm.yyyy (German) format.
+----+---------------------+---------------------+
| | period_start_dt | period_end_dt |
+----+---------------------+---------------------+
| 1 | 04.01.2017 00:00:00 | 04.02.2017 00:00:00 |
| 2 | 04.02.2017 00:00:00 | 04.03.2017 00:00:00 |
| 3 | 04.03.2017 00:00:00 | 04.04.2017 00:00:00 |
| 4 | 04.04.2017 00:00:00 | 04.05.2017 00:00:00 |
| 5 | 04.05.2017 00:00:00 | 04.06.2017 00:00:00 |
| 6 | 04.06.2017 00:00:00 | 04.07.2017 00:00:00 |
| 7 | 04.07.2017 00:00:00 | 04.08.2017 00:00:00 |
| 8 | 04.08.2017 00:00:00 | 04.09.2017 00:00:00 |
| 9 | 04.09.2017 00:00:00 | 04.10.2017 00:00:00 |
| 10 | 04.10.2017 00:00:00 | 04.11.2017 00:00:00 |
| 11 | 04.11.2017 00:00:00 | 04.12.2017 00:00:00 |
| 12 | 04.12.2017 00:00:00 | 04.01.2018 00:00:00 |
+----+---------------------+---------------------+
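To make the half-open boundary concrete, here is a small Python sketch (an illustration only; period_ranges and matching_periods are hypothetical helper names) that builds the same twelve 4th-to-4th ranges and compares the >= / < test with BETWEEN's inclusive test for a date that lands exactly on a boundary.

```python
from datetime import date

def period_ranges(year, start_day=4):
    """Twelve half-open [start, end) periods; each ends where the next one starts.

    start_day must be valid in every month (1 to 28).
    """
    ranges = []
    for n in range(12):
        start = date(year, n + 1, start_day)
        # for n == 11 the end rolls over into January of the following year
        end = date(year + (n + 1) // 12, (n + 1) % 12 + 1, start_day)
        ranges.append((start, end))
    return ranges

def matching_periods(d, ranges, half_open=True):
    """Start dates of every period that date d falls into."""
    if half_open:
        # data.date >= period_start_dt AND data.date < period_end_dt
        return [start for start, end in ranges if start <= d < end]
    # BETWEEN semantics: inclusive at both ends
    return [start for start, end in ranges if start <= d <= end]

ranges = period_ranges(2017)
print(matching_periods(date(2017, 2, 4), ranges))                   # [datetime.date(2017, 2, 4)]
print(matching_periods(date(2017, 2, 4), ranges, half_open=False))  # [datetime.date(2017, 1, 4), datetime.date(2017, 2, 4)]
```

The inclusive version counts 4 February in both the January and the February period, which is exactly the double counting the caution above warns about.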
Given the specification, I think I would be tempted to cheat: subtract 3 days from the date. That way Jan 4 backs up to Jan 1 and Feb 3 backs up to Jan 31, so everything from Jan 4 to Feb 3 ends up as January.
SELECT DATE_FORMAT(odd_date_created + INTERVAL -3 DAY, '%Y') AS `year`
, DATE_FORMAT(odd_date_created + INTERVAL -3 DAY, '%m') AS `month`
, ...
FROM ...
GROUP
BY DATE_FORMAT(odd_date_created + INTERVAL -3 DAY, '%Y')
, DATE_FORMAT(odd_date_created + INTERVAL -3 DAY, '%m')
This falls apart if there are oddball ranges, i.e. if the boundaries are not always the 4th and the 3rd.
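The shift-by-three-days trick is easy to sanity-check outside the database. This is a minimal Python sketch under the same assumption (a fixed 4th-to-3rd window); report_month is a hypothetical name, and the Counter stands in for the GROUP BY.

```python
from collections import Counter
from datetime import date, timedelta

def report_month(d, shift_days=3):
    """With the default shift, map a date to the (year, month) of its 4th-to-3rd window."""
    shifted = d - timedelta(days=shift_days)
    return shifted.year, shifted.month

dates = [date(2017, 1, 3), date(2017, 1, 4), date(2017, 2, 3), date(2017, 2, 4)]
# Equivalent of GROUP BY year, month on the shifted date
print(Counter(report_month(d) for d in dates))
```

Note that 3 January lands in the previous year's December report, which matches the 4th-to-3rd rule.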
I am looking for some help with even knowing where to start. Essentially, we have a table for clients that holds employment start dates and end dates. For annual reports we have to calculate "continuous employment", which is defined as the earliest start date to the last end date, as long as there are no more than 21 days between one end date and the next start date.
Here is an example:
employee | Start Date | End Date
1 | 2012-10-1 | 2012-11-05
1 | 2012-11-08 | 2013-1-25
2 | 2012-10-1 | 2012-11-05
2 | 2012-11-30 | 2013-1-02
In the above, I would like to see employee 1 as continuously employed from 2012-10-1 to 2013-1-25,
but employee 2 would have 2 separate employment lines: continuous employment from 2012-10-1 to 2012-11-05, and a separate one from 2012-11-30 to 2013-1-02.
Thanks for the help!
The theory is similar to @mellamokb's answer, but somewhat more concise:
SELECT employee, MIN(start) start, end
FROM (
SELECT @end:=IF(employee<=>@emp AND @stt<=end+INTERVAL 21 DAY,@end,end) end,
@stt:=start start,
@emp:=employee AS employee
FROM my_table, (SELECT @emp:=NULL, @stt:=0, @end:=0) init
ORDER BY employee, start DESC
) t
GROUP BY employee, end
See it on sqlfiddle.
One way to find "continuous groups" among a set of records is to use variables to track the difference between consecutive rows and build groupings that combine continuous ranges together. In the example below, I use three variables to track enough information to generate the groups:
@curEmployee - tracks the employee from the previous record; it is compared with the employee on the current record to detect when we've switched to a different employee, which automatically starts another grouping
@curEndDate - tracks the end date of the previous record, so it can be compared with the start date of the current record to see whether the current record belongs in the same "group" as the previous one, that is, whether it is part of continuous employment with the previous record
@curGroup - this is the key variable, which segregates the rows into separate "groups" that represent continuous employment. A row is considered continuous with the previous row if and only if both of the following are true: the two rows have the same employee number, and the current row's start date is no more than 21 days after the previous row's end date.
NOTE: You may want to validate the edge conditions, i.e., whether exactly 20/21/22 days apart will be considered continuous employment or not, and tweak the logic below.
Here is the sample query that calculates those groups. A couple of things to note. First, the order of variable assignment matters, because the variables are assigned from top to bottom in the select list. We need to assign @curGroup first, so that it still sees the values of @curEmployee and @curEndDate from the previous record. (Strictly speaking, the MySQL manual says the evaluation order of expressions involving user variables is undefined, so this relies on observed behavior.) Second, the ORDER BY clause is very important: it ensures that when we compare the previous and current record, they are the two records closest to each other. If we processed the records in a random order, they would likely all end up as separate groups.
select
e.employee, e.`start date`, e.`end date`
,@curGroup :=
case when employee = @curEmployee
and @curEndDate + INTERVAL 21 DAY >= e.`start date`
then @curGroup
else @curGroup + 1
end as curGroup
,@curEmployee := employee as curEmployee
,@curEndDate := e.`end date` as curEndDate
from
employment e
JOIN (SELECT @curEmployee := 0, @curEndDate := NULL, @curGroup := 0) r
order by e.employee, e.`start date`
Sample result (DEMO). Notice how CURGROUP stays at 1 for the first two lines, because they are within 21 days of each other and represent continuous employment, while the last two lines are given separate group numbers:
| EMPLOYEE | START DATE | END DATE | CURGROUP | CUREMPLOYEE | CURENDDATE |
-------------------------------------------------------------------------------------------------------------------------------
| 1 | October, 01 2012 00:00:00+0000 | November, 05 2012 00:00:00+0000 | 1 | 1 | 2012-11-05 00:00:00 |
| 1 | November, 08 2012 00:00:00+0000 | January, 25 2013 00:00:00+0000 | 1 | 1 | 2013-01-25 00:00:00 |
| 2 | October, 01 2012 00:00:00+0000 | November, 05 2012 00:00:00+0000 | 2 | 2 | 2012-11-05 00:00:00 |
| 2 | November, 30 2012 00:00:00+0000 | January, 02 2013 00:00:00+0000 | 3 | 2 | 2013-01-02 00:00:00 |
Now that we've established groups of records that are part of continuous employment, we merely need to group by those group numbers and find the minimum and maximum date range for the output:
select
employee,
min(`start date`) as `start date`,
max(`end date`) as `end date`
from (
select
e.employee, e.`start date`, e.`end date`
,@curGroup :=
case when employee = @curEmployee
and @curEndDate + INTERVAL 21 DAY >= e.`start date`
then @curGroup
else @curGroup + 1
end as curGroup
,@curEmployee := employee as curEmployee
,@curEndDate := e.`end date` as curEndDate
from
employment e
JOIN (SELECT @curEmployee := 0, @curEndDate := NULL, @curGroup := 0) r
order by e.employee, e.`start date`
) as T
group by employee, curGroup
Sample Result (DEMO):
| EMPLOYEE | START DATE | END DATE |
--------------------------------------------------------------------------------
| 1 | October, 01 2012 00:00:00+0000 | January, 25 2013 00:00:00+0000 |
| 2 | October, 01 2012 00:00:00+0000 | November, 05 2012 00:00:00+0000 |
| 2 | November, 30 2012 00:00:00+0000 | January, 02 2013 00:00:00+0000 |
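The same gaps-and-islands rule can be expressed without user variables. Here is a minimal Python sketch (continuous_employment is a hypothetical name) that sorts by employee and start date, as the ORDER BY does, and extends the current group whenever the next start is no more than 21 days after the current end. Unlike the variable-based queries, it keeps the maximum end seen so far, so a spell nested inside a longer one does not shorten the group.

```python
from datetime import date, timedelta
from itertools import groupby

def continuous_employment(rows, max_gap=timedelta(days=21)):
    """Merge (employee, start, end) spells separated by at most max_gap.

    A spell continues the previous one when start <= previous end + max_gap
    for the same employee; otherwise it starts a new group.
    """
    merged = []
    # sorting by (employee, start) plays the role of ORDER BY employee, start date
    for employee, spells in groupby(sorted(rows), key=lambda r: r[0]):
        cur_start = cur_end = None
        for _, start, end in spells:
            if cur_end is not None and start <= cur_end + max_gap:
                cur_end = max(cur_end, end)  # same group: extend it
            else:
                if cur_end is not None:
                    merged.append((employee, cur_start, cur_end))
                cur_start, cur_end = start, end  # a new group starts here
        merged.append((employee, cur_start, cur_end))
    return merged

rows = [
    (1, date(2012, 10, 1), date(2012, 11, 5)),
    (1, date(2012, 11, 8), date(2013, 1, 25)),
    (2, date(2012, 10, 1), date(2012, 11, 5)),
    (2, date(2012, 11, 30), date(2013, 1, 2)),
]
for row in continuous_employment(rows):
    print(row)
```

As the NOTE above says, check the boundary: here a gap of exactly 21 days still counts as continuous.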
I want to calculate the total hours spent by an employee between 09:00 and 18:00.
My table looks like this.
How can I do this?
AttendanceId EmpId CheckTime CheckType
-------------------------------------------------------------------------
3 5 2013-01-03 09:00:15.000 1 (Login)
4 5 2013-01-03 11:00:00.000 2 (Logout)
5 5 2013-01-03 11:30:00.000 1
6 5 2013-01-03 13:00:00.000 2
7 5 2013-01-03 13:30:00.000 1
8 5 2013-01-03 16:00:00.000 2
9 5 2013-01-03 16:30:00.000 1
10 5 2013-01-03 18:00:00.000 2
Since your login and logout values are in the same column, it might be easier to PIVOT the login/logout times first and then use DATEDIFF to determine the total amount of time an employee is present.
The PIVOT portion of the query is this:
select empid, [1], [2]
from
(
select empid, checktime, checktype,
row_number() over(partition by empid, checktype order by checktime) rn
from yourtable
) src
pivot
(
max(checktime)
for checktype in ([1], [2])
) piv
See SQL Fiddle with Demo
The result of this is:
| EMPID | 1 | 2 |
---------------------------------------------------------------------------
| 5 | January, 03 2013 09:00:15+0000 | January, 03 2013 11:00:00+0000 |
| 5 | January, 03 2013 11:30:00+0000 | January, 03 2013 13:00:00+0000 |
| 5 | January, 03 2013 13:30:00+0000 | January, 03 2013 16:00:00+0000 |
| 5 | January, 03 2013 16:30:00+0000 | January, 03 2013 18:00:00+0000 |
Once the data is in this structure, you can easily get the difference in the time by applying the DateDiff() function.
The final query to generate the amount of time an employee is logged in is:
select empid, sum(SecondsDiff) / 3600 as TotalHours
from
(
select empid, datediff(ss, [1], [2]) SecondsDiff
from
(
select empid, checktime, checktype,
row_number() over(partition by empid, checktype order by checktime) rn
from yourtable
) src
pivot
(
max(checktime)
for checktype in ([1], [2])
) piv
) src
group by empid
See SQL Fiddle with Demo
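And the result is (note that sum(SecondsDiff) / 3600 is integer division in SQL Server, so partial hours are truncated: the raw total here is 26,985 seconds, about 7.5 hours; divide by 3600.0 to keep the fraction):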
| EMPID | TOTALHOURS |
----------------------
| 5 | 7 |
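The pairing that row_number() does inside the PIVOT (the n-th login of an employee matched with the n-th logout) can be sketched in plain Python to check the arithmetic. This assumes CheckType 1 means login and 2 means logout, as in the question; seconds_logged_in is a hypothetical name, and, like the T-SQL above, it does not clip sessions to the 09:00 to 18:00 window.

```python
from collections import defaultdict
from datetime import datetime

LOGIN, LOGOUT = 1, 2  # CheckType values from the question

def seconds_logged_in(rows):
    """Total seconds per EmpId, pairing the n-th login with the n-th logout.

    rows: (emp_id, check_time, check_type) with check_time like '2013-01-03 09:00:15.000'.
    """
    times = defaultdict(lambda: {LOGIN: [], LOGOUT: []})
    for emp_id, check_time, check_type in rows:
        parsed = datetime.strptime(check_time, "%Y-%m-%d %H:%M:%S.%f")
        times[emp_id][check_type].append(parsed)
    totals = {}
    for emp_id, by_type in times.items():
        # sorting each type mirrors row_number() over (partition by empid, checktype order by checktime)
        pairs = zip(sorted(by_type[LOGIN]), sorted(by_type[LOGOUT]))
        totals[emp_id] = sum((logout - login).total_seconds() for login, logout in pairs)
    return totals

rows = [
    (5, "2013-01-03 09:00:15.000", LOGIN), (5, "2013-01-03 11:00:00.000", LOGOUT),
    (5, "2013-01-03 11:30:00.000", LOGIN), (5, "2013-01-03 13:00:00.000", LOGOUT),
    (5, "2013-01-03 13:30:00.000", LOGIN), (5, "2013-01-03 16:00:00.000", LOGOUT),
    (5, "2013-01-03 16:30:00.000", LOGIN), (5, "2013-01-03 18:00:00.000", LOGOUT),
]
total = seconds_logged_in(rows)[5]
print(total)  # 26985.0
```

Dividing the total by 3600 as a float gives roughly 7.5 hours rather than the truncated 7.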
To get the difference between two dates, you use the DATEDIFF function:
http://msdn.microsoft.com/en-us/library/ms189794.aspx
I think you'd need to do this row-by-row, though. Because of the structure of your table, you can't just do a simple query.
I need to compute, for a single column, the difference between the current row's value and the previous row's value over some number of rows, and I don't have an ID column to use as a reference for the ordering.
date : box_count : total_no_of_boxes_used
1/12/12 2 2
2/12/12 8 6
3/12/12 14 6
I have the box_count column, and I am trying to compute the total_no_of_boxes_used column.
Please help me.
Thanks in advance
Given that your records are sequential by date:
SQLFIDDLE DEMO
Query:
select a.date, a.bc,
a.bc - coalesce(b.bc, 0) as tot
from tt a
left join tt b
-- b is the single row immediately before a (no row for the first date)
on b.date = (select max(c.date) from tt c where c.date < a.date)
order by a.date
;
Results:
DATE BC TOT
December, 01 2012 00:00:00+0000 2 2
December, 02 2012 00:00:00+0000 8 6
December, 03 2012 00:00:00+0000 14 6
December, 04 2012 00:00:00+0000 23 9
One way to do this is with a correlated subquery, like so:
SELECT
t1.`date`,
t1.box_count,
t1.box_count -
IFNULL((SELECT t2.box_count
FROM table1 t2
WHERE t2.`date` < t1.`date`
ORDER BY t2.`date` DESC
LIMIT 1),
0 ) AS total_no_of_boxes_used
FROM table1 t1;
SQL Fiddle Demo
This will give you:
| DATE | BOX_COUNT | TOTAL_NO_OF_BOXES_USED |
------------------------------------------------------------------------
| January, 12 2012 00:00:00+0000 | 2 | 2 |
| February, 12 2012 00:00:00+0000 | 8 | 6 |
| March, 12 2012 00:00:00+0000 | 14 | 6 |
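The correlated-subquery version is portable enough to run unchanged in SQLite, so here is a small sketch that executes it from Python's sqlite3 module. It assumes the dates are stored as ISO text ('YYYY-MM-DD') so that < and ORDER BY follow calendar order, and reads the question's 1/12/12 as 1 December 2012; boxes_used is a hypothetical name.

```python
import sqlite3

def boxes_used(rows):
    """rows: (ISO date, box_count). Returns (date, box_count, boxes used since previous row)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (date TEXT, box_count INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    query = """
        SELECT t1.date,
               t1.box_count,
               -- previous row's count, or 0 for the first row
               t1.box_count - IFNULL((SELECT t2.box_count
                                      FROM table1 t2
                                      WHERE t2.date < t1.date
                                      ORDER BY t2.date DESC
                                      LIMIT 1), 0) AS total_no_of_boxes_used
        FROM table1 t1
        ORDER BY t1.date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

print(boxes_used([("2012-12-01", 2), ("2012-12-02", 8), ("2012-12-03", 14)]))
# [('2012-12-01', 2, 2), ('2012-12-02', 8, 6), ('2012-12-03', 14, 6)]
```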
Try this:
SELECT date,
(box_count - @diff) total_no_of_boxes_used,
(@diff:=box_count) box_count
FROM table1, (SELECT @diff:=0) A; -- assumes the rows are read in date order
Check this SQL FIDDLE DEMO
OUTPUT
| DATE | TOTAL_NO_OF_BOXES_USED | BOX_COUNT |
------------------------------------------------------------------------
| January, 12 2012 00:00:00+0000 | 2 | 2 |
| February, 12 2012 00:00:00+0000 | 6 | 8 |
| March, 12 2012 00:00:00+0000 | 6 | 14 |
I have a MYSQL table like this:
id | userid | score | datestamp |
-----------------------------------------------------
1 | 1 | 5 | 2012-12-06 03:55:16
2 | 2 | 0,5 | 2012-12-06 04:25:21
3 | 1 | 7 | 2012-12-06 04:35:33
4 | 3 | 12 | 2012-12-06 04:55:45
5 | 2 | 22 | 2012-12-06 05:25:11
6 | 1 | 16,5 | 2012-12-06 05:55:21
7 | 1 | 19 | 2012-12-06 13:55:16
8 | 2 | 8,5 | 2012-12-07 06:27:16
9 | 2 | 7,5 | 2012-12-07 08:33:16
10 | 1 | 10 | 2012-12-07 09:25:19
11 | 1 | 6,5 | 2012-12-07 13:33:16
12 | 3 | 6 | 2012-12-07 15:45:44
13 | 2 | 4 | 2012-12-07 16:05:16
14 | 2 | 34 | 2012-12-07 18:33:55
15 | 2 | 22 | 2012-12-07 18:42:11
I would like to display user scores like this:
if a user has more than 3 scores on a certain day, only the highest 3 count; repeat that for every day for this user and then add all the days together. I want to display this sum for every user.
EDIT:
So in the example above, for user 1 on 06.12. I would add the top 3 scores together and ignore the 4th score, then add the top 3 from the next day to that number, and so on. I need that number for every user.
EDIT 2:
Expected output is:
userid | score
--------------------
1 | 59 //19 + 16.5 + 7 (06.12.) + 10 + 6.5 (07.12.)
2 | 87 //22 + 0.5 (06.12.) + 34 + 22 + 8.5 (07.12.)
3 | 18 //12 (06.12.) + 6 (07.12.)
I hope this is more clear :)
I would really appreciate the help because I am stuck.
Please take a look at the following code, if your answer to my comment is yes :) Since your data is all from December 2012, I grouped by day.
SQLFIDDLE sample
Query:
select y.id, y.userid, y.score, y.datestamp
from (select id, userid, score, datestamp
from scores
group by day(datestamp)) as y
where (select count(*)
from (select id, userid, score, datestamp
from scores group by day(datestamp)) as x
where y.score >= x.score
and y.userid = x.userid
) =1 -- Top 3rd, 2nd, 1st
order by y.score desc
;
Results:
ID USERID SCORE DATESTAMP
8 2 8.5 December, 07 2012 00:00:00+0000
20 3 6 December, 08 2012 00:00:00+0000
1 1 5 December, 06 2012 00:00:00+0000
Based on your later updates to the question:
If you need a sum per user by year/month/day and then the highest, you can simply add an aggregate function such as sum to the above query. To repeat myself: since your sample data covers just one month, there's no point grouping by year or month. That's why I used the day.
select y.id, y.userid, y.score, y.datestamp
from (select id, userid, sum(score) as score,
datestamp
from scores
group by userid, day(datestamp)) as y
where (select count(*)
from (select id, userid, sum(score) as score
, datestamp
from scores
group by userid, day(datestamp)) as x
where y.score >= x.score
and y.userid = x.userid
) =1 -- Top 3rd, 2nd, 1st
order by y.score desc
;
Results based on sum:
ID USERID SCORE DATESTAMP
1 1 47.5 December, 06 2012 00:00:00+0000
8 2 16 December, 07 2012 00:00:00+0000
20 3 6 December, 08 2012 00:00:00+0000
UPDATED WITH NEW SOURCE DATA SAMPLE
Simon, please take a look at my own sample. As your data was changing, I used mine.
Here is the reference. I have used pure ANSI-style SQL, without any OVER (PARTITION BY ...) or DENSE_RANK.
Also note that the queries below take the top 2, not the top 3, scores. You can change that accordingly.
Guess what: the answer is ten times simpler than your first data sample suggested.
SQLFIDDLE
Query 1:
-- for top 2 sum by user by each day
SELECT userid, sum(score), date(datestamp) AS datestamp
FROM scores t1
where 2 >=
(SELECT count(*)
from scores t2
where t1.score <= t2.score
and t1.userid = t2.userid
and date(t1.datestamp) = date(t2.datestamp))
group by userid, date(t1.datestamp)
;
Results for query 1:
USERID SUM(SCORE) DATESTAMP
1 70 December, 06 2012 00:00:00+0000
1 30 December, 07 2012 00:00:00+0000
2 22 December, 06 2012 00:00:00+0000
2 25 December, 07 2012 00:00:00+0000
3 30 December, 06 2012 00:00:00+0000
3 30 December, 07 2012 00:00:00+0000
Final Query:
-- top 2 per day, summed over all days, by user
SELECT userid, sum(score)
FROM scores t1
where 2 >=
(SELECT count(*)
from scores t2
where t1.score <= t2.score
and t1.userid = t2.userid
and date(t1.datestamp) = date(t2.datestamp))
group by userid
;
Final Results:
USERID SUM(SCORE)
1 100
2 47
3 60
Here is a snapshot of the direct calculations on the data I used.
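The counting-subquery approach can be checked against the question's own expected output (59, 87, 18) by running it in SQLite from Python, with n set to 3. This is a sketch under assumptions: the comma decimals in the question (0,5) are taken to be display formatting of numeric values, the day is compared with SQLite's date() function, and top_n_daily_total is a hypothetical name. Because the test is t1.score <= t2.score, rows tied at the cut-off are all excluded together.

```python
import sqlite3

ROWS = [  # (id, userid, score, datestamp) from the question
    (1, 1, 5, "2012-12-06 03:55:16"), (2, 2, 0.5, "2012-12-06 04:25:21"),
    (3, 1, 7, "2012-12-06 04:35:33"), (4, 3, 12, "2012-12-06 04:55:45"),
    (5, 2, 22, "2012-12-06 05:25:11"), (6, 1, 16.5, "2012-12-06 05:55:21"),
    (7, 1, 19, "2012-12-06 13:55:16"), (8, 2, 8.5, "2012-12-07 06:27:16"),
    (9, 2, 7.5, "2012-12-07 08:33:16"), (10, 1, 10, "2012-12-07 09:25:19"),
    (11, 1, 6.5, "2012-12-07 13:33:16"), (12, 3, 6, "2012-12-07 15:45:44"),
    (13, 2, 4, "2012-12-07 16:05:16"), (14, 2, 34, "2012-12-07 18:33:55"),
    (15, 2, 22, "2012-12-07 18:42:11"),
]

def top_n_daily_total(rows, n=3):
    """Per user: sum of the n highest scores of each day, added over all days."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE scores (id INTEGER, userid INTEGER, score REAL, datestamp TEXT)")
    con.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT t1.userid, SUM(t1.score)
        FROM scores t1
        -- keep a row if at most n scores (itself included) are >= it that day
        WHERE ? >= (SELECT COUNT(*)
                    FROM scores t2
                    WHERE t1.score <= t2.score
                      AND t1.userid = t2.userid
                      AND date(t1.datestamp) = date(t2.datestamp))
        GROUP BY t1.userid
        ORDER BY t1.userid
    """
    result = dict(con.execute(query, (n,)).fetchall())
    con.close()
    return result

print(top_n_daily_total(ROWS))  # {1: 59.0, 2: 87.0, 3: 18.0}
```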
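SELECT userid, SUM(score) AS score
FROM (
SELECT table1.id, table1.userid, table1.score
FROM
table1
LEFT JOIN
table1 AS lr ON lr.userid = table1.userid
AND DATE(lr.datestamp) = DATE(table1.datestamp)
AND lr.score > table1.score
GROUP BY
table1.id, table1.userid, table1.score
HAVING COUNT(lr.id) < 3 -- fewer than 3 higher scores for that user on that day
) AS top3
GROUP BY userid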