Dealing with Dates in MySQL. Total number of different projects completed - mysql

I have a table in an old version of MySQL 5.x like this:
+---------+------------+------------+
| Task_ID | Start_Date | End_Date |
+---------+------------+------------+
| 1 | 2015-10-15 | 2015-10-16 |
| 2 | 2015-10-17 | 2015-10-18 |
| 3 | 2015-10-19 | 2015-10-20 |
| 4 | 2015-10-21 | 2015-10-22 |
| 5 | 2015-11-01 | 2015-11-02 |
| 6 | 2015-11-17 | 2015-11-18 |
| 7 | 2015-10-11 | 2015-10-12 |
| 8 | 2015-10-12 | 2015-10-13 |
| 9 | 2015-11-11 | 2015-11-12 |
| 10 | 2015-11-12 | 2015-11-13 |
| 11 | 2015-10-01 | 2015-10-02 |
| 12 | 2015-10-02 | 2015-10-03 |
| 13 | 2015-10-03 | 2015-10-04 |
| 14 | 2015-10-04 | 2015-10-05 |
| 15 | 2015-11-04 | 2015-11-05 |
| 16 | 2015-11-05 | 2015-11-06 |
| 17 | 2015-11-06 | 2015-11-07 |
| 18 | 2015-11-07 | 2015-11-08 |
| 19 | 2015-10-25 | 2015-10-26 |
| 20 | 2015-10-26 | 2015-10-27 |
| 21 | 2015-10-27 | 2015-10-28 |
| 22 | 2015-10-28 | 2015-10-29 |
| 23 | 2015-10-29 | 2015-10-30 |
| 24 | 2015-10-30 | 2015-10-31 |
+---------+------------+------------+
If the End_Date values of tasks are consecutive,
then those tasks are part of the same project.
I am interested in finding the total number of different projects completed.
The projects should be listed by the number of days they took to complete, in ascending order.
If more than one project has the same number of completion days,
then order by the Start_Date of the project.
For these few sample records the expected output would be:
2015-10-15 2015-10-16
2015-10-17 2015-10-18
2015-10-19 2015-10-20
2015-10-21 2015-10-22
2015-11-01 2015-11-02
2015-11-17 2015-11-18
2015-10-11 2015-10-13
2015-11-11 2015-11-13
2015-10-01 2015-10-05
2015-11-04 2015-11-08
2015-10-25 2015-10-31
I am a bit jammed with this.
I would really appreciate any help. Thanks.

The following query should work:
select tmp.projectid, date_sub(max(tmp.ed2), interval max(tmp.projectdays) day) start_date,
max(tmp.ed2) end_date,
max(tmp.projectdays) No_Of_ProjectDays
from
(
select t1.task_id tid1, t1.start_date sd1, t1.end_date ed1,
t2.task_id tid2, t2.start_date sd2, t2.end_date ed2,
case when datediff(t2.start_date, ifnull(t1.start_date,'1000-01-01')) != 1
then (@pid := @pid + 1)
else (@pid := @pid)
end as ProjectId,
case when datediff(t2.start_date, ifnull(t1.start_date,'1000-01-01')) != 1
then (@pdays := 1)
else (@pdays := @pdays + 1)
end as ProjectDays
from tasks t1 right join tasks t2
on t2.task_id = t1.task_id + 1
cross join (select @pid :=1, @pdays := 1) vars
) tmp
group by tmp.projectid
order by max(tmp.projectdays), start_date
Please find the Demo here.
EDIT : I have made changes in the query and link according to new data sample. Please have a look.

This answers -- and answers correctly -- the original version of this question.
Hmmmm . . . I think you can use variables. The simplest way is to generate a sequential number and then subtract this value to get a constant for adjacent rows from the date:
select min(start_date), max(end_date)
from (select t.*, (@rn := @rn + 1) as rn
from (select t.* from tasks t order by end_date) t cross join
(select @rn := 0) params
) t
group by (end_date - interval rn day);
Here is a db<>fiddle.
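To see why subtracting a sequential number from the date works, here is a minimal Python sketch of the same grouping idea. It is not the MySQL query itself; the sample rows are a subset of the question's data and the function name find_projects is made up for illustration. It assumes end dates are distinct.

```python
from datetime import date, timedelta
from itertools import groupby

# Hypothetical sample rows (start_date, end_date), a subset of the question's table.
tasks = [
    (date(2015, 10, 15), date(2015, 10, 16)),
    (date(2015, 10, 11), date(2015, 10, 12)),
    (date(2015, 10, 12), date(2015, 10, 13)),
    (date(2015, 10, 1), date(2015, 10, 2)),
    (date(2015, 10, 2), date(2015, 10, 3)),
    (date(2015, 10, 3), date(2015, 10, 4)),
]


def find_projects(rows):
    """Group tasks with consecutive end dates, like GROUP BY end_date - rn."""
    ordered = sorted(rows, key=lambda r: r[1])  # order by end_date
    # end_date minus the 1-based row number stays constant within a run
    # of consecutive days, so it can serve as the group key.
    keyed = [
        (end - timedelta(days=rn), start, end)
        for rn, (start, end) in enumerate(ordered, start=1)
    ]
    projects = []
    for _, group in groupby(keyed, key=lambda k: k[0]):
        members = list(group)
        projects.append((min(m[1] for m in members), max(m[2] for m in members)))
    return projects


projects = find_projects(tasks)
for start, end in projects:
    print(start, end)
```

Each run of consecutive end dates collapses to a single key, so min(start_date) and max(end_date) per key give the project boundaries.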

It's a slightly tricky problem, but the query below works fine.
It builds two derived tables from the Projects table: one with the Start_Date values
that do not appear as any End_Date, and one with the End_Date values that do not appear as any Start_Date.
It then combines these tables, keeping rows WHERE Start_Date < End_Date and grouping by Start_Date,
and uses the aggregate function MIN on End_Date to get each complete project.
DATEDIFF(MIN(End_Date), Start_Date) calculates project_duration, which makes it possible to order by project_duration.
SELECT Start_Date, MIN(End_Date) AS End_Date, DATEDIFF(MIN(End_Date), Start_Date) AS project_duration
FROM
(SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
(SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
ORDER BY project_duration ASC, Start_Date ASC;
Expected output:
+------------+------------+---------------+
| Start_Date | End_Date | project_duration |
+------------+------------+---------------+
| 2015-10-15 | 2015-10-16 | 1 |
| 2015-10-17 | 2015-10-18 | 1 |
| 2015-10-19 | 2015-10-20 | 1 |
| 2015-10-21 | 2015-10-22 | 1 |
| 2015-11-01 | 2015-11-02 | 1 |
| 2015-11-17 | 2015-11-18 | 1 |
| 2015-10-11 | 2015-10-13 | 2 |
| 2015-11-11 | 2015-11-13 | 2 |
| 2015-10-01 | 2015-10-05 | 4 |
| 2015-11-04 | 2015-11-08 | 4 |
| 2015-10-25 | 2015-10-31 | 6 |
+------------+------------+---------------+
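The same boundary-matching logic can be sketched in Python. This is only an illustration under assumptions: the rows are a subset of the question's data, the function name projects_by_boundaries is hypothetical, and each task is assumed to start on the day the previous task in its project ended.

```python
from datetime import date

# Hypothetical sample rows (start_date, end_date), a subset of the question's table.
tasks = [
    (date(2015, 10, 15), date(2015, 10, 16)),
    (date(2015, 10, 11), date(2015, 10, 12)),
    (date(2015, 10, 12), date(2015, 10, 13)),
    (date(2015, 10, 1), date(2015, 10, 2)),
    (date(2015, 10, 2), date(2015, 10, 3)),
    (date(2015, 10, 3), date(2015, 10, 4)),
]


def projects_by_boundaries(rows):
    """Pair each project start with the nearest later project end."""
    starts = {s for s, _ in rows}
    ends = {e for _, e in rows}
    project_starts = starts - ends  # Start_Date NOT IN (all End_Date values)
    project_ends = ends - starts    # End_Date NOT IN (all Start_Date values)
    result = []
    for s in project_starts:
        e = min(e for e in project_ends if e > s)  # MIN(End_Date) WHERE Start_Date < End_Date
        result.append((s, e, (e - s).days))
    # ORDER BY project_duration ASC, Start_Date ASC
    return sorted(result, key=lambda p: (p[2], p[0]))


durations = projects_by_boundaries(tasks)
for start, end, days in durations:
    print(start, end, days)
```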

Related

MySQL Order By in Subquery

I'm trying to solve this problem on Hacker Rank:
Input (Projects table)
Output:
2015-10-28 2015-10-29
2015-10-30 2015-10-31
2015-10-13 2015-10-15
2015-10-01 2015-10-04
So what the problem is asking for is to treat consecutive end dates as part of one project and return the start and end dates of projects ordered by the date differences in ascending order. As you can see from the above example, tasks 1, 2 and 3 are in the same project, tasks 4 and 5 are in the same project, and tasks 7 and 8 are their own projects.
This is one of the solutions I found:
set @sdate = null;
set @nextdate = null;
select sd, max(ed) ed2
from (
select if(@nextdate = start_date, @sdate, @sdate := start_date) as sd,
@nextdate := end_date as ed
from Projects
order by start_date
) tmp
group by sd
order by datediff(max(ed), sd)
It is using variables to store the previous end date and compare it to the current row, but I'm confused by the order by clause in the subquery:
If I take out the 'order by start_date' in the subquery, the result it returns will be wrong -- I was under the impression that in MySQL the ordering of subqueries is ignored?
My understanding was that order by is executed after the select, so here it would be ordering the results from the select in the subquery, but it seems like it's actually ordering the source table (Projects) before the select statement -- am I correct?
Could someone help me understand why this is the case? Thanks
The ORDER BY in the subquery is necessary for the variable assignment to work. This is similar to the idea of assigning row numbers in older MySQL versions. When you run:
SELECT *
FROM Projects;
Without the ORDER BY, almost every Start_date value comes back in ascending order, except for the rows from 2015-11-04 to 2015-11-07, which come back in this order:
...
ID Start_date End_date
16 2015-11-04 2015-11-05
10 2015-11-07 2015-11-08
15 2015-11-06 2015-11-07
11 2015-11-05 2015-11-06
...
It starts at 04, but the next rows are 07, 06 and then 05. This breaks the variable assignment. If you run the query like this:
SELECT *,IF(@nextdate = start_date, @sdate, @sdate := start_date) AS sd,
@nextdate := end_date AS ed
FROM Projects;
You can see the difference in the results where the Start_date is in the correct order vs with the incorrect order:
+-----+------------+------------+------------+------------+
| ID | Start_date | End_date | sd | ed |
+-----+------------+------------+------------+------------+
| 1 | 2015-10-01 | 2015-10-02 | 2015-10-01 | 2015-10-02 |
| 24 | 2015-10-02 | 2015-10-03 | 2015-10-01 | 2015-10-03 |
| 2 | 2015-10-03 | 2015-10-04 | 2015-10-01 | 2015-10-04 |
| 23 | 2015-10-04 | 2015-10-05 | 2015-10-01 | 2015-10-05 |
.......
| 3 | 2015-10-11 | 2015-10-12 | 2015-10-11 | 2015-10-12 |
| 22 | 2015-10-12 | 2015-10-13 | 2015-10-11 | 2015-10-13 |
.......
| 16 | 2015-11-04 | 2015-11-05 | 2015-11-04 | 2015-11-05 |
| 10 | 2015-11-07 | 2015-11-08 | 2015-11-07 | 2015-11-08 |
| 15 | 2015-11-06 | 2015-11-07 | 2015-11-06 | 2015-11-07 |
| 11 | 2015-11-05 | 2015-11-06 | 2015-11-05 | 2015-11-06 |
+-----+------------+------------+------------+------------+
With the subquery returning the result above, three of the rows that should produce sd=2015-11-04 get a different value instead, so the outer query's GROUP BY returns 14 rows. By adding ORDER BY Start_date in the subquery, you'll get this result for 2015-11-04 to 2015-11-07 instead:
+----+------------+------------+------------+------------+
| ID | Start_date | End_date | sd | ed |
+----+------------+------------+------------+------------+
| 16 | 2015-11-04 | 2015-11-05 | 2015-11-04 | 2015-11-05 |
| 11 | 2015-11-05 | 2015-11-06 | 2015-11-04 | 2015-11-06 |
| 15 | 2015-11-06 | 2015-11-07 | 2015-11-04 | 2015-11-07 |
| 10 | 2015-11-07 | 2015-11-08 | 2015-11-04 | 2015-11-08 |
+----+------------+------------+------------+------------+
So it's not the ordering of the output that makes the answer wrong; it's the extra rows in the end result.
Here's a fiddle to play around with.
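The effect of row order on the carried-over variables can be reproduced outside MySQL. The Python sketch below mimics the two variables with plain local state; the function name assign_project_starts is made up for illustration, and the four rows are the 2015-11-04 to 2015-11-07 rows in the unordered sequence shown above.

```python
from datetime import date


def assign_project_starts(rows):
    """Mimic the sdate/nextdate variables: carry the previous row's end date forward."""
    sdate = None
    nextdate = None
    out = []
    for start, end in rows:
        if nextdate != start:
            sdate = start  # row does not continue the previous one: new project
        nextdate = end
        out.append((sdate, end))
    return out


# Rows in the order the table returned them without ORDER BY.
rows = [
    (date(2015, 11, 4), date(2015, 11, 5)),
    (date(2015, 11, 7), date(2015, 11, 8)),
    (date(2015, 11, 6), date(2015, 11, 7)),
    (date(2015, 11, 5), date(2015, 11, 6)),
]

unordered_groups = {sd for sd, _ in assign_project_starts(rows)}
ordered_groups = {sd for sd, _ in assign_project_starts(sorted(rows))}

print(len(unordered_groups))  # 4 distinct sd values, so 4 groups
print(len(ordered_groups))    # 1 distinct sd value, so 1 group
```

Each row only "sees" the row processed immediately before it, which is why the input order decides how many groups come out.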

Mysql: Select number of transactions from 1 week ago

I have a table like this:
transactions
+------------+------------+
| date_id | t_count |
+------------+------------+
| 2019-01-30 | 100 |
| 2019-01-29 | 99 |
| 2019-01-28 | 98 |
| 2019-01-27 | 97 |
| 2019-01-26 | 96 |
| 2019-01-25 | 95 |
| 2019-01-24 | 94 |
| 2019-01-23 | 93 |
| 2019-01-22 | 92 |
| 2019-01-21 | 91 |
| 2019-01-20 | 90 |
+------------+------------+
I would like to get t_count for the date as well as t_count for one week prior, like so:
+------------+------------+------------------+
| date_id | t_count | t_count_7d_prev |
+------------+------------+------------------+
| 2019-01-30 | 100 | 93 |
| 2019-01-29 | 99 | 92 |
| 2019-01-28 | 98 | 91 |
| 2019-01-27 | 97 | 90 |
+------------+------------+------------------+
I've tried the following query but it gives me nulls for the last column.
select
date_id,
t_count,
(select t_count
from transactions
where date(date_id) = date(date_id) - interval 7 day) as t_count_7d_prev
from
transactions
Is there another way that I should try subtracting the dates?
You can use window functions. If date_id is a date:
select date_id, t_count,
sum(t_count) over (order by date_id
range between interval 7 day preceding and interval 7 day preceding
) as t_count_7d_prev
from transactions t;
Or, if you are sure you have data every date, then use lag():
select t.*,
lag(t_count, 7) over (order by date_id) as t_count_7d_prev
from transactions t;
This is a simple self-join.
select a.date_id, a.t_count, b.t_count as t_count_7d_prev
from
transactions a left join transactions b
on a.date_id = DATE_ADD(b.date_id,INTERVAL 7 DAY)
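The join condition pairs each row with the row dated exactly seven days earlier. As a rough illustration of that lookup (not of how MySQL executes it), here is a Python sketch using the question's sample values; the function name with_previous_week is hypothetical.

```python
from datetime import date, timedelta

# t_count per date_id, matching the question's sample table
# (2019-01-20 -> 90 up to 2019-01-30 -> 100).
transactions = {date(2019, 1, 20) + timedelta(days=i): 90 + i for i in range(11)}


def with_previous_week(rows):
    """Return (date_id, t_count, t_count_7d_prev) for every row, newest first."""
    result = []
    for day in sorted(rows, reverse=True):
        # .get() yields None when no row exists seven days earlier,
        # which mirrors the NULL produced by a LEFT JOIN.
        prev = rows.get(day - timedelta(days=7))
        result.append((day, rows[day], prev))
    return result


report = with_previous_week(transactions)
for row in report:
    print(*row)
```

Rows from 2019-01-26 and earlier get None in the last column because the sample data has nothing seven days before them.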

MySQL Query to get the monthly data difference

select * from new_joiner;
+------+--------------+
| id | date_of_join |
+------+--------------+
| 1 | 2020-01-10 |
| 2 | 2020-01-02 |
| 3 | 2020-01-05 |
| 4 | 2020-02-10 |
| 5 | 2020-02-11 |
| 6 | 2020-07-11 |
| 7 | 2020-07-11 |
| 8 | 2020-07-11 |
| 9 | 2020-07-11 |
| 10 | 2020-07-11 |
| 11 | 2020-05-01 |
| 12 | 2020-05-02 |
| 13 | 2020-05-03 |
| 14 | 2020-05-04 |
| 15 | 2020-05-05 |
| 16 | 2020-05-05 |
| 17 | 2020-05-06 |
+------+--------------+
select MONTHNAME(date_of_join) as MONTHNAME,
count(id) as JOINEE
from new_joiner
where MONTH(date_of_join)>=1
group by MONTH(date_of_join);
+-----------+--------+
| MONTHNAME | JOINEE |
+-----------+--------+
| January | 3 |
| February | 2 |
| May | 7 |
| July | 5 |
+-----------+--------+
I want a query that gives me the monthly data change compare to previous month.
For example: there were 3 new joinees in Jan and 2 in Feb, so compared to Jan the change for Feb is -1, and the query should output:
+-----------+-------------+
| MONTHNAME | JOINEE_DIFF |
+-----------+-------------+
| February | -1 |
| Mar | -2 |
| April | 0 |
| May | 7 |
| June | -7 |
| July | 5 |
| Aug | -5 |
| Sep | 0 |
| Oct | 0 |
| Nov | 0 |
| Dec | 0 |
+-----------+-------------+
Ignore Jan as it doesn't have a previous month and assume we have data only for a given year say 2020. Require data for all months from Feb to Dec.
Assuming you have data for every month, you can use lag():
select MONTHNAME(date_of_join) as MONTHNAME,
count(id) as JOINEE,
(count(*) - lag(count(*)) over (order by min(date_of_join))) as diff
from new_joiner
where MONTH(date_of_join) >= 1
group by MONTH(date_of_join);
Note that using months without years is fraught with peril. Also, the month() of any well-formed date is always at least 1, so that condition filters nothing.
All this suggests a query more like:
select *
from (select MONTHNAME(date_of_join) as MONTHNAME,
count(id) as JOINEE,
(count(*) - lag(count(*)) over (order by min(date_of_join))) as diff,
min(date_of_join) as min_date_of_join
from new_joiner
where date_of_join >= '2020-01-01' and date_of_join < '2021-01-01'
group by MONTH(date_of_join)
) t
where diff is not null
order by min_date_of_join;
Use a correlated subquery to get the number of joinees of previous month and subtract it:
SELECT
t.monthname,
joinee - (SELECT COUNT(*) FROM new_joiner WHERE MONTH(date_of_join) = t.month - 1) JOINEE_DIFF
FROM (
SELECT MONTH(date_of_join) month, MONTHNAME(date_of_join) monthname,
COUNT(id) joinee
FROM new_joiner
GROUP BY month, monthname
) t
WHERE t.month > 1;
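For reference, the arithmetic behind the expected output in the question can be sketched in Python, treating months with no rows as zero so that every month from February to December gets a value. This is an illustration only, with a made-up function name monthly_diff, and it assumes all dates fall in a single year as the question states.

```python
import calendar
from collections import Counter
from datetime import date

# date_of_join values from the question's new_joiner table.
join_dates = (
    [date(2020, 1, 10), date(2020, 1, 2), date(2020, 1, 5)]
    + [date(2020, 2, 10), date(2020, 2, 11)]
    + [date(2020, 7, 11)] * 5
    + [date(2020, 5, d) for d in (1, 2, 3, 4, 5, 5, 6)]
)


def monthly_diff(dates):
    """Return (month name, joinees this month minus joinees previous month) for Feb..Dec."""
    counts = Counter(d.month for d in dates)  # missing months count as 0
    return [
        (calendar.month_name[m], counts[m] - counts[m - 1])
        for m in range(2, 13)
    ]


diffs = monthly_diff(join_dates)
for name, diff in diffs:
    print(name, diff)
```

In SQL, getting rows for the empty months would need a list of all months to join against, since a GROUP BY on the data alone only returns months that have rows.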

MySQL count / track streaks or consecutive dates - follow up

This is a follow up question to MySQL count / track streaks or consecutive dates
The solution provided by Matt to my earlier question works great, but I'm running into an issue now that I'm dealing with 1 extra column (prod_cond). A product can be used or new, and will be listed under the same prod_id. In these cases the streaks are not calculated correctly anymore.
I created an example here: http://sqlfiddle.com/#!9/3f04c3/17
I haven't been able to get them to display correctly with this additional column.
+-----+------------+------------+-------------+---------------------+
| id | seller_id | prod_id | prod_cond | date |
+-----+------------+------------+-------------+---------------------+
| 1 | 283 | 4243 | 1 | 2016-10-10 23:55:01 |
| 2 | 283 | 4243 | 2 | 2016-10-10 02:01:06 |
| 3 | 283 | 4243 | 1 | 2016-10-11 23:55:06 |
| 4 | 283 | 4243 | 2 | 2016-10-11 23:55:07 |
| 5 | 283 | 4243 | 1 | 2016-10-12 23:55:07 |
| 6 | 283 | 4243 | 2 | 2016-10-13 23:55:07 |
| 7 | 283 | 4243 | 1 | 2016-10-14 23:55:07 |
| 8 | 283 | 4243 | 2 | 2016-10-14 23:57:06 |
| 9 | 283 | 4243 | 1 | 2016-10-15 23:57:06 |
| 10 | 283 | 4243 | 2 | 2016-10-15 23:57:06 |
+-----+------------+------------+-------------+---------------------+
So basically I need to identify each block of consecutive dates for each seller (seller_id) selling products (prod_id) with product condition (prod_cond) new (1) or used (2).
This is what the result should look like:
+------------+---------+---------+----------------+---------------+
| seller_id | prod_id | cond_id | streak in days | begin streak |
+------------+---------+---------+----------------+---------------+
| 283 | 4243 | 1 | 3 | 2016-10-10 |
| 283 | 4243 | 1 | 2 | 2016-10-14 |
| 283 | 4243 | 2 | 2 | 2016-10-10 |
| 283 | 4243 | 2 | 3 | 2016-10-13 |
+------------+---------+---------+----------------+---------------+
But as you can see here: http://sqlfiddle.com/#!9/3f04c3/17
It is not working correctly.
In MySQL, you would do this using variables:
select seller_id, prod_id, cond_id, count(*) as numdays,
min(date), max(date)
from (select t.*,
(@rn := if(@grp = concat_ws(':', seller_id, prod_id, cond_id), @rn + 1,
if(@grp := concat_ws(':', seller_id, prod_id, cond_id), @rn + 1, @rn + 1)
)
) rn
from transact t cross join
(select @grp := 0, @rn := '') params
order by seller_id, prod_id, cond_id, date
) t
group by seller_id, prod_id, cond_id, date_sub(date(date), interval rn day)
The idea is that for each group -- based on seller, product, and condition -- the query enumerates the dates. Then, the date minus the enumerated value is constant for consecutive dates.
Here is a SQL Fiddle showing it working.
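The "date minus enumerated value" idea can be checked against the question's sample data with a short Python sketch. It is an illustration, not the MySQL query: the function name streaks is made up, the rows are reduced to calendar dates, and it assumes at most one row per day matters within each (seller, product, condition) group.

```python
from collections import defaultdict
from datetime import date, timedelta

# (seller_id, prod_id, prod_cond, day) from the question's sample rows,
# with the time part dropped.
rows = [
    (283, 4243, 1, date(2016, 10, 10)), (283, 4243, 2, date(2016, 10, 10)),
    (283, 4243, 1, date(2016, 10, 11)), (283, 4243, 2, date(2016, 10, 11)),
    (283, 4243, 1, date(2016, 10, 12)), (283, 4243, 2, date(2016, 10, 13)),
    (283, 4243, 1, date(2016, 10, 14)), (283, 4243, 2, date(2016, 10, 14)),
    (283, 4243, 1, date(2016, 10, 15)), (283, 4243, 2, date(2016, 10, 15)),
]


def streaks(data):
    """Return (seller, prod, cond, streak length in days, first day) per streak."""
    by_key = defaultdict(set)
    for seller, prod, cond, day in data:
        by_key[(seller, prod, cond)].add(day)
    result = []
    for key in sorted(by_key):
        islands = defaultdict(list)
        # Within one group, day minus its position is constant for consecutive days.
        for rn, day in enumerate(sorted(by_key[key])):
            islands[day - timedelta(days=rn)].append(day)
        for run in islands.values():
            result.append(key + (len(run), run[0]))
    return result


found = streaks(rows)
for row in found:
    print(*row)
```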

get all dates in the current month

I have a table:
userID | date | time
===================
1 | 2015-02-08 | 06:32
1 | 2015-02-08 | 05:36
1 | 2015-02-08 | 17:43
1 | 2015-02-08 | 18:00
1 | 2015-02-09 | 06:36
1 | 2015-02-09 | 15:43
1 | 2015-02-09 | 19:00
1 | 2015-02-10 | 05:36
1 | 2015-02-10 | 17:43
1 | 2015-02-10 | 18:00
2 | 2015-02-08 | 06:32
2 | 2015-02-08 | 05:36
2 | 2015-02-08 | 17:43
2 | 2015-02-08 | 18:00
2 | 2015-02-09 | 06:36
2 | 2015-02-09 | 15:43
2 | 2015-02-09 | 19:00
2 | 2015-02-10 | 05:36
2 | 2015-02-10 | 17:43
2 | 2015-02-10 | 18:00
But I want the number of records returned to be exactly the same as the number of days in the current month, with the min time as the in value and the max time as the out value. If the current month has 28 days and only a few days have records, it should return:
userID | date | in | out
========================
1 | 2015-02-01 | |
1 | 2015-02-02 | |
1 | 2015-02-03 | |
1 | 2015-02-04 | |
1 | 2015-02-05 | |
1 | 2015-02-06 | |
1 | 2015-02-07 | |
1 | 2015-02-08 | 06:32 | 18:00
1 | 2015-02-09 | 06:36 | 19:00
1 | 2015-02-10 | 05:36 | 18:00
1 | 2015-02-11 | |
1 | 2015-02-12 | |
1 | 2015-02-13 | |
1 | 2015-02-14 | |
1 | 2015-02-15 | |
1 | 2015-02-16 | |
1 | 2015-02-17 | |
1 | 2015-02-18 | |
1 | 2015-02-19 | |
1 | 2015-02-20 | |
1 | 2015-02-21 | |
1 | 2015-02-22 | |
1 | 2015-02-23 | |
1 | 2015-02-24 | |
1 | 2015-02-25 | |
1 | 2015-02-26 | |
1 | 2015-02-27 | |
1 | 2015-02-28 | |
How can I modify my query to achieve the above result?
This is my query:
$sql = "SELECT
colUserID,
colDate,
if(min(colJam) < '12:00:00',min(colJam), '') as in,
if(max(colJam) > '12:00:00',max(colJam), '') as out
FROM tb_kehadiran
WHERE colDate > DATE_ADD(MAKEDATE($tahun, 31),
INTERVAL($bulan-2) MONTH)
AND
colDate < DATE_ADD(MAKEDATE($tahun, 1),
INTERVAL($bulan) MONTH)
AND
colUserID = $user_id
GROUP BY colUserID,colDate";
I had to think about this one, but this is probably the simplest answer so far (note that it uses SQL Server syntax such as GETDATE() and EOMONTH(), not MySQL):
WITH AllMonthDays as (
SELECT n = 1
UNION ALL
SELECT n + 1 FROM AllMonthDays WHERE n + 1 <= DAY(EOMONTH(GETDATE()))
)
SELECT
DISTINCT datefromparts(YEAR(GETDATE()), MONTH(GETDATE()), n) As dates
, MIN(d.time) as 'In'
, MAX(d.time) as 'Out'
FROM AllMonthDays as A
LEFT OUTER JOIN
table as d on
DAY(d.date) = A.n
GROUP BY n,(d.date);
--- Tested in this environment: ---
use Example;
CREATE TABLE demo (
ID int identity(1,1)
,date date
,time time
);
INSERT INTO demo (date, time) VALUES
('2015-12-08', '06:32'),
('2015-12-08', '05:36'),
('2015-12-08', '17:43'),
('2015-12-08', '18:00'),
('2015-12-09', '06:36'),
('2015-12-09', '15:43'),
('2015-12-09', '19:00'),
('2015-12-10', '05:36'),
('2015-12-10', '17:43'),
('2015-12-10', '18:00')
;
WITH AllMonthDays as (
SELECT n = 1
UNION ALL
SELECT n + 1 FROM AllMonthDays WHERE n + 1 <= DAY(EOMONTH(GETDATE()))
)
SELECT
DISTINCT datefromparts(YEAR(GETDATE()), MONTH(GETDATE()), n) As dates
, MIN(d.time) as 'In'
, MAX(d.time) as 'Out'
FROM AllMonthDays as A
LEFT OUTER JOIN
demo as d on
DAY(d.date) = A.n
GROUP BY n,(d.date);
DROP table demo;
The way I've approached this problem in the past is to have a date table that is pre-populated for some years in the future.
You could create such a table, possibly defining columns for year, month and date, with indexes on year and month.
You can then use this table with a JOIN on your data to ensure that all dates are present in your results.
You need three things:
A list of dates.
A left join.
Aggregation.
So:
select d.dte, min(t.time), max(t.time)
from (select date('2015-02-01') as dte union all
select date('2015-02-02') union all
. .
select date('2015-02-28')
) d left join
t
on d.dte = t.date
group by d.dte
order by d.dte;
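The three ingredients (a list of dates, a left join, aggregation) can be illustrated with a small Python sketch. It is not a replacement for the SQL: the function name month_report is hypothetical, and the sample times are user 1's rows from the question.

```python
import calendar
from datetime import date, time

# (date, time) rows for one user, taken from the question's sample table.
punches = [
    (date(2015, 2, 8), time(6, 32)), (date(2015, 2, 8), time(5, 36)),
    (date(2015, 2, 8), time(17, 43)), (date(2015, 2, 8), time(18, 0)),
    (date(2015, 2, 9), time(6, 36)), (date(2015, 2, 9), time(15, 43)),
    (date(2015, 2, 9), time(19, 0)),
    (date(2015, 2, 10), time(5, 36)), (date(2015, 2, 10), time(17, 43)),
    (date(2015, 2, 10), time(18, 0)),
]


def month_report(year, month, rows):
    """Return one (day, min time, max time) tuple for every day of the month."""
    by_day = {}
    for d, t in rows:
        by_day.setdefault(d, []).append(t)
    days_in_month = calendar.monthrange(year, month)[1]
    report = []
    for n in range(1, days_in_month + 1):  # the "list of dates"
        d = date(year, month, n)
        times = by_day.get(d)              # the "left join": None when no rows match
        report.append((d,
                       min(times) if times else None,
                       max(times) if times else None))
    return report


february = month_report(2015, 2, punches)
for day, first, last in february:
    print(day, first or "", last or "")
```

Days without any rows still appear, with empty in and out values, because the loop is driven by the calendar rather than by the data.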
Try this
set @is_first_date = 0;
set @temp_start_date = date('2015-02-01');
set @temp_end_date = date('2015-02-28');
select my_dates.date,your_table_name.user_id, MIN(your_table_name.time), MAX(your_table_name.time) from
( select if(@is_first_date , @temp_start_date := DATE_ADD(@temp_start_date, interval 1 day), @temp_start_date) as date,@is_first_date:=@is_first_date+1 as start_date from information_schema.COLUMNS
where @temp_start_date < @temp_end_date limit 0, 31
) my_dates left join your_table_name on
my_dates.date = your_table_name.date
group by my_dates.date
Try this query:
SELECT `date`, MIN(`time`) as `IN`, MAX(`time`) AS `OUT`
FROM `table_name` WHERE month(current_date) = month(`date`)
GROUP BY `date`;