SQL date difference using the same column based on conditions - mysql

I'm trying to calculate the number of days since the last different order. For example, say I have the following table:
cust_id|Product_id|Order_date|
1 |a |10/02/2020|
2 |b |10/01/2020|
3 |c |09/07/2020|
4 |d |09/02/2020|
1 |a |08/29/2020|
1 |f |08/02/2020|
2 |g |07/01/2020|
3 |t |06/06/2020|
4 |j |05/08/2020|
1 |w |04/20/2020|
I want to find the difference between the most recent date and the previous date that has a product ID that doesn't match the most recent product ID.
So the output should be something like this:
cust_id|latest_Product_id|time_since_last_diff_order_days|
1 |a |30 |
2 |b |92 |
3 |c |91 |
4 |d |123 |
Here's the query I tried, but it fails with error code 1064:
SELECT a.cust_id, a.Product_ID as latest_Product_id, DATEDIFF(MAX(a.Order_date),MAX(b.Order_date)) as time_since_last_diff_order_days
FROM database_customers.cust_orders a
INNER JOIN
database_customers.cust_orders b
on
a.cust_id = b.cust_id
WHERE a.product_id =! b.prodcut_id;
Thank you for any help!

It isn't pretty, but it will do the job. Note that it compares each customer's two most recent orders, whatever their products:
CREATE TABLE tab1
(`cust_id` int, `Product_id` varchar(1), `Order_date` datetime)
;
INSERT INTO tab1
(`cust_id`, `Product_id`, `Order_date`)
VALUES
(1, 'a', '2020-10-02 02:00:00'),
(2, 'b', '2020-10-01 02:00:00'),
(3, 'c', '2020-09-07 02:00:00'),
(4, 'd', '2020-09-02 02:00:00'),
(1, 'a', '2020-08-29 02:00:00'),
(1, 'f', '2020-08-02 02:00:00'),
(2, 'g', '2020-07-01 02:00:00'),
(3, 't', '2020-06-06 02:00:00'),
(4, 'j', '2020-05-08 02:00:00'),
(1, 'w', '2020-04-20 02:00:00')
;
WITH CTE AS (SELECT `cust_id`, `Product_id`,`Order_date`,ROW_NUMBER() OVER(PARTITION BY `cust_id` ORDER BY `Order_date` DESC) rn
FROM tab1)
SELECT t1.`cust_id`, t1.`Product_id`, t2.time_since_last_diff_order_days
FROM
(SELECT
`cust_id`, `Product_id`
FROM
CTE
WHERE rn = 1 ) t1
JOIN
( SELECT `cust_id`,DATEDIFF(MAX(`Order_date`), MIN(`Order_date`)) time_since_last_diff_order_days
FROM CTE WHERE rn in (1,2) GROUP BY `cust_id`) t2 ON t1.cust_id = t2.cust_id
cust_id | Product_id | time_since_last_diff_order_days
------: | :--------- | ------------------------------:
1 | a | 34
2 | b | 92
3 | c | 93
4 | d | 117

I want to find the difference between the most recent date and the previous date that has a product ID that doesn't match the most recent product ID.
You can use first_value() to get the last product and then aggregate:
select cust_id, last_product_id, max(order_date),
datediff(max(order_date), max(case when product_id <> last_product_id then order_date end)) as diff_from_last_product
from (select co.*,
first_value(product_id) over (partition by cust_id order by order_date desc) as last_product_id
from cust_orders co
) co
group by cust_id, last_product_id;
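As a cross-check on the numbers, the same rule (latest order date minus the most recent order whose product differs from the latest product) can be sketched in plain Python. This is only an illustration over the question's sample rows, not part of either answer; the function name `days_since_last_different_order` is made up for the example.

```python
from datetime import date

# Sample rows from the question: (cust_id, product_id, order_date).
orders = [
    (1, "a", date(2020, 10, 2)),
    (2, "b", date(2020, 10, 1)),
    (3, "c", date(2020, 9, 7)),
    (4, "d", date(2020, 9, 2)),
    (1, "a", date(2020, 8, 29)),
    (1, "f", date(2020, 8, 2)),
    (2, "g", date(2020, 7, 1)),
    (3, "t", date(2020, 6, 6)),
    (4, "j", date(2020, 5, 8)),
    (1, "w", date(2020, 4, 20)),
]


def days_since_last_different_order(rows):
    """Map cust_id -> (latest product, days since the last order of a different product)."""
    by_customer = {}
    for cust_id, product_id, order_date in rows:
        by_customer.setdefault(cust_id, []).append((order_date, product_id))

    result = {}
    for cust_id, history in by_customer.items():
        history.sort(reverse=True)  # newest order first
        latest_date, latest_product = history[0]
        # Dates of all orders whose product differs from the latest product.
        other_dates = [d for d, p in history if p != latest_product]
        # None when the customer never ordered a different product.
        days = (latest_date - max(other_dates)).days if other_dates else None
        result[cust_id] = (latest_product, days)
    return result


result = days_since_last_different_order(orders)
for cust_id in sorted(result):
    print(cust_id, *result[cust_id])
```

On the sample rows this gives 61, 92, 93 and 117 days for customers 1 to 4. Customer 1 shows why the product comparison matters: the second most recent order (08/29) is for the same product `a`, so it is skipped in favour of the 08/02 order.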

Related

Divide the sum of each group by the grand total

I want to get the sum of group A and B separately, and divide each by the total sum.
I tried to use this:
select name, sum(qt)
from ntbl
group by name
order_id | name | qt
1 | A | 12
2 | A | 20
3 | B | 33
4 | B | 45
Result should be as:
name | qt | dv
A | 32 | 0.29
B | 78 | 0.70
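You can combine aggregate and window functions (window functions require MySQL 8.0 or later). Drop the * 100 if you want a fraction instead of a percentage:
select name
, sum(qt) as sum_qt
, sum(qt) / sum(sum(qt)) over () * 100 as pct_qt
from ntbl
group by name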
Alternatively, you can cross join a subquery that sums all quantities:
CREATE TABLE ntbl (
`order_id` INTEGER,
`name` VARCHAR(1),
`qt` INTEGER
);
INSERT INTO ntbl
(`order_id`, `name`, `qt`)
VALUES
(1, 'A', 12),
(2, 'A', 20),
(3, 'B', 33),
(4, 'B', 45);
SELECT name, sum_name, ROUND(sum_name/sum_qt,2) as dv
FROM
(select name,SUM(qt) sum_name from ntbl group by name) q1 CROSS JOIN (SELECT SUM(`qt`) sum_qt FROM ntbl) q2
name | sum_name | dv
:--- | -------: | ---:
A | 32 | 0.29
B | 78 | 0.71
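The arithmetic behind both answers (sum per group, divided by the grand total) can be checked with a few lines of Python. This is a minimal sketch over the question's four rows; `share_of_total` is a name invented for the example.

```python
from collections import defaultdict

# Sample rows from the question: (order_id, name, qt).
rows = [(1, "A", 12), (2, "A", 20), (3, "B", 33), (4, "B", 45)]


def share_of_total(rows):
    """Map name -> (group sum, group sum / grand total rounded to 2 places)."""
    totals = defaultdict(int)
    for _order_id, name, qt in rows:
        totals[name] += qt
    grand_total = sum(totals.values())
    return {
        name: (total, round(total / grand_total, 2))
        for name, total in totals.items()
    }


shares = share_of_total(rows)
for name, (total, dv) in sorted(shares.items()):
    print(name, total, dv)
```

With rounding, B comes out as 0.71 (78 / 110 is about 0.709), which matches the cross join output rather than the 0.70 in the question.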

Select all orders with the specified status saved in another table (PDO)

At the moment I have all the information for an order in one table, including the order status.
In the future I will have a new table "status" to keep an order history.
My tables currently look like this (simplified):
Table "orders":
id | date | name
10001 | 2021-08-24 16:47:52 | Surname Lastname
10002 | 2021-08-30 17:32:05 | Nicename Nicelastname
Table "status":
id | order_id | statusdate | status
1 | 10001 | 2021-08-24 16:47:52 | new
2 | 10002 | 2021-08-30 17:32:05 | new
3 | 10001 | 2021-08-26 13:44:11 | pending
4 | 10001 | 2021-09-02 10:01:12 | shipped
My problem: at the moment I can select all orders with status "shipped" and list them like this:
$sql = $pdo->prepare("SELECT * FROM orders WHERE status = ?");
(? is my status, e.g. "shipped")
I know I have to use a LEFT JOIN to combine the two tables (correct? or is there a better/easier way?), but I have no idea how to select all orders with status X, because the "status" table can have multiple entries per order_id. So the statement must select only the newest entry!?
If an order never has two status rows with the same timestamp, you can use the first query; the second covers the case where an order can have several rows with the same timestamp.
CREATE TABLE orders
(`id` int, `date` varchar(19), `name` varchar(21))
;
INSERT INTO orders
(`id`, `date`, `name`)
VALUES
(10001, '2021-08-24 16:47:52', 'Surname Lastname'),
(10002, '2021-08-30 17:32:05', 'Nicename Nicelastname')
;
CREATE TABLE status
(`id` int, `order_id` int, `statusdate` varchar(19), `status` varchar(7))
;
INSERT INTO status
(`id`, `order_id`, `statusdate`, `status`)
VALUES
(1, 10001, '2021-08-24 16:47:52', 'new'),
(2, 10002, '2021-08-30 17:32:05', 'new'),
(3, 10001, '2021-08-26 13:44:11', 'pending'),
(4, 10001, '2021-09-02 10:01:12', 'shipped'),
(5, 10001, '2021-09-02 10:01:13', 'shipped')
;
select o.`id` FROM orders o
| id |
| ----: |
| 10001 |
| 10002 |
select o.`id`,o.`date`, o.`name`,s.`statusdate`
from orders o join status s on o.`id` = s.order_id
where s.`status` = "shipped"
AND s.`statusdate` = (SELECT MAX(`statusdate`) FROM `status` WHERE order_id = o.`id` AND `status` = "shipped")
order by o.`id` desc
id | date | name | statusdate
----: | :------------------ | :--------------- | :------------------
10001 | 2021-08-24 16:47:52 | Surname Lastname | 2021-09-02 10:01:13
SELECT
id, `date`, `name`,`statusdate`
FROM
(SELECT `date`, `name`,`statusdate`,IF( `id` = @id,@rownum := @rownum +1,@rownum :=1) rn, @id := `id` as id
FROM (select o.`id`,o.`date`, o.`name`,s.`statusdate`
from orders o join status s on o.`id` = s.order_id
where s.`status` = "shipped"
order by o.`id` desc,s.`statusdate` DESC) t2 ,(SELECT @id := 0, @rownum:=0) t1) t2
WHERE rn = 1
id | date | name | statusdate
----: | :------------------ | :--------------- | :------------------
10001 | 2021-08-24 16:47:52 | Surname Lastname | 2021-09-02 10:01:13
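If the goal is "orders whose newest status row is X", a variation on the first query compares against the newest `statusdate` over all statuses rather than only the requested one. The sketch below is not the answer's exact query; it runs that variation against an in-memory SQLite database from Python (standing in for MySQL and PDO) with the question's four status rows, and binds the status through a `?` placeholder as the question's prepared statement does. The function name `orders_with_current_status` is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, date TEXT, name TEXT);
    CREATE TABLE status (id INTEGER, order_id INTEGER, statusdate TEXT, status TEXT);
    INSERT INTO orders VALUES
        (10001, '2021-08-24 16:47:52', 'Surname Lastname'),
        (10002, '2021-08-30 17:32:05', 'Nicename Nicelastname');
    INSERT INTO status VALUES
        (1, 10001, '2021-08-24 16:47:52', 'new'),
        (2, 10002, '2021-08-30 17:32:05', 'new'),
        (3, 10001, '2021-08-26 13:44:11', 'pending'),
        (4, 10001, '2021-09-02 10:01:12', 'shipped');
""")

# The subquery picks the newest status row per order across ALL statuses,
# so an order only matches when its current status equals the parameter.
QUERY = """
    SELECT o.id, o.name, s.statusdate
    FROM orders o
    JOIN status s ON s.order_id = o.id
    WHERE s.status = ?
      AND s.statusdate = (SELECT MAX(s2.statusdate)
                          FROM status s2
                          WHERE s2.order_id = o.id)
    ORDER BY o.id
"""


def orders_with_current_status(status):
    """Return (id, name, statusdate) for orders whose newest status is `status`."""
    return conn.execute(QUERY, (status,)).fetchall()


print(orders_with_current_status("shipped"))
print(orders_with_current_status("pending"))
```

Here order 10001 is returned for "shipped" but not for "pending", because its "pending" row is no longer the newest one. The comparison relies on the timestamps being stored in a sortable `YYYY-MM-DD HH:MM:SS` form.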

MySQL 5.5 - count open items per day

I have the below table that is just a snapshot and all I want to do is to calculate the number of open items per date.
I used to do it in excel with simple formula =COUNTIFS($A$2:$A$30000,"<="&E2,$B$2:$B$30000,">="&E2) where column A was the Open_Date dates and column B the Close_Date dates. I want to use SQL to get the same results.
This is my excel snapshot. Formula above.
In mysql I have replicated it with T1 table:
CREATE TABLE T1
(
ID int (10),
Open_Date date,
Close_Date date);
insert into T1 values (1, '2018-12-17', '2018-12-18');
insert into T1 values (2, '2018-12-18', '2018-12-18');
insert into T1 values (3, '2018-12-18', '2018-12-18');
insert into T1 values (4, '2018-12-19', '2018-12-20');
insert into T1 values (5, '2018-12-19', '2018-12-21');
insert into T1 values (6, '2018-12-20', '2018-12-22');
insert into T1 values (7, '2018-12-20', '2018-12-22');
insert into T1 values (8, '2018-12-21', '2018-12-25');
insert into T1 values (9, '2018-12-22', '2018-12-26');
insert into T1 values (10, '2018-12-23', '2018-12-27');
The first step was to create a table of dates in case there are any gaps in Open_Date. So my code at the moment is:
SELECT
d.dt, Temp_T1.*
FROM
(
SELECT '2018-12-17' AS dt UNION ALL
SELECT '2018-12-18' UNION ALL
SELECT '2018-12-19' UNION ALL
SELECT '2018-12-20' UNION ALL
SELECT '2018-12-21' UNION ALL
SELECT '2018-12-22' UNION ALL
SELECT '2018-12-23' UNION ALL
SELECT '2018-12-24'
) d
LEFT JOIN
(SELECT * FROM T1) AS Temp_T1
ON Temp_T1.Open_Date = d.dt
I am lost on how to calculate the same values as I do in Excel.
You want to use GROUP BY to make one row for each date in your d derived table.
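Then join d to the T1 table where d.dt is between the open and close dates. Counting T1.ID rather than * keeps a date with no open items at 0 instead of 1, since the LEFT JOIN still produces one row for it.
SELECT
d.dt, COUNT(T1.ID) AS open_items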
FROM
(
SELECT '2018-12-17' AS dt UNION ALL
SELECT '2018-12-18' UNION ALL
SELECT '2018-12-19' UNION ALL
SELECT '2018-12-20' UNION ALL
SELECT '2018-12-21' UNION ALL
SELECT '2018-12-22' UNION ALL
SELECT '2018-12-23' UNION ALL
SELECT '2018-12-24'
) d
LEFT JOIN T1 ON d.dt BETWEEN T1.Open_Date and T1.Close_Date
GROUP BY d.dt;
Output:
+------------+------------+
| dt | open_items |
+------------+------------+
| 2018-12-17 | 1 |
| 2018-12-18 | 3 |
| 2018-12-19 | 2 |
| 2018-12-20 | 4 |
| 2018-12-21 | 4 |
| 2018-12-22 | 4 |
| 2018-12-23 | 3 |
| 2018-12-24 | 3 |
+------------+------------+
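The BETWEEN join is the SQL counterpart of the Excel COUNTIFS: for each calendar day, count the items with Open_Date <= day <= Close_Date. A small Python sketch over the same ten rows reproduces the counts above; `open_items_per_day` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Rows of T1 from the question: (ID, Open_Date, Close_Date).
items = [
    (1, date(2018, 12, 17), date(2018, 12, 18)),
    (2, date(2018, 12, 18), date(2018, 12, 18)),
    (3, date(2018, 12, 18), date(2018, 12, 18)),
    (4, date(2018, 12, 19), date(2018, 12, 20)),
    (5, date(2018, 12, 19), date(2018, 12, 21)),
    (6, date(2018, 12, 20), date(2018, 12, 22)),
    (7, date(2018, 12, 20), date(2018, 12, 22)),
    (8, date(2018, 12, 21), date(2018, 12, 25)),
    (9, date(2018, 12, 22), date(2018, 12, 26)),
    (10, date(2018, 12, 23), date(2018, 12, 27)),
]


def open_items_per_day(items, first_day, last_day):
    """Map each day in [first_day, last_day] to the number of items open on it."""
    counts = {}
    day = first_day
    while day <= last_day:
        # An item is open on `day` when the day lies inside its date range.
        counts[day] = sum(
            1 for _id, opened, closed in items if opened <= day <= closed
        )
        day += timedelta(days=1)
    return counts


counts = open_items_per_day(items, date(2018, 12, 17), date(2018, 12, 24))
for day, n in counts.items():
    print(day.isoformat(), n)
```

A day on which nothing is open gets 0 here, which is the behaviour a LEFT JOIN from the calendar table is meant to preserve.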

mysql running difference with group by

The dataset I am experimenting with has the structure given in this SQLFiddle.
create table readings_tab (id int, site varchar(15), logged_at datetime, reading smallint);
insert into readings_tab values (1, 'A', '2017-08-21 13:22:00', 2500);
insert into readings_tab values (2, 'B', '2017-08-21 13:22:00', 1210);
insert into readings_tab values (3, 'C', '2017-08-21 13:22:00', 3500);
insert into readings_tab values (4, 'A', '2017-08-22 13:22:00', 2630);
insert into readings_tab values (5, 'B', '2017-08-22 13:22:00', 1400);
insert into readings_tab values (6, 'C', '2017-08-22 13:22:00', 3800);
insert into readings_tab values (7, 'A', '2017-08-23 13:22:00', 2700);
insert into readings_tab values (8, 'B', '2017-08-23 13:22:00', 1630);
insert into readings_tab values (9, 'C', '2017-08-23 13:22:00', 3950);
insert into readings_tab values (10, 'A', '2017-08-24 13:22:00', 2850);
insert into readings_tab values (11, 'B', '2017-08-24 13:22:00', 1700);
insert into readings_tab values (12, 'C', '2017-08-24 13:22:00', 4200);
insert into readings_tab values (13, 'A', '2017-08-25 13:22:00', 3500);
insert into readings_tab values (14, 'B', '2017-08-25 13:22:00', 2300);
insert into readings_tab values (15, 'C', '2017-08-25 13:22:00', 4700);
Current Query:
select t.rownum, t.logged_on, t.tot_reading, coalesce(t.tot_reading - t3.tot_reading, 0) AS daily_generation
from
(
select @rn:=@rn+1 AS rownum, date(t.logged_at) AS logged_on, sum(t.reading) AS tot_reading
from readings_tab t, (SELECT @rn:=0) t2
group by date(t.logged_at)
order by date(t.logged_at) desc
) t
left join
(
select @rn:=@rn+1 AS rownum, date(t.logged_at) AS logged_on, sum(t.reading) AS tot_reading
from readings_tab t, (SELECT @rn:=0) t2
group by date(t.logged_at)
order by date(t.logged_at) desc
) t3 on t.rownum = t3.rownum + 1
order by t.logged_on desc;
I am expecting below output. I don't need the formula (3500+2300+4700, etc...) in the result set. Just included it to make it understandable.
-----------------------------------------------------------------
| logged_on | tot_reading | daily_generation |
-----------------------------------------------------------------
| 2017-08-25 | (3500+2300+4700) = 10500 | (10500 - 8750) = 1750 |
| 2017-08-24 | (2850+1700+4200) = 8750 | (8750-8280) = 470 |
| 2017-08-23 | (2700+1630+3950) = 8280 | (8280-7830) = 450 |
| 2017-08-22 | (2630+1400+3800) = 7830 | (7830-7210) = 620 |
| 2017-08-21 | (2500+1210+3500) = 7210 | 0 |
-----------------------------------------------------------------
I cannot figure out why it doesn't produce expected output. Can someone please help?
If you use variables, make sure each subquery has its own, otherwise you can get incorrect results. I suggest the following adjusted query (which has some added columns to help follow what is happening):
select
t.rownum, t.logged_on, t.tot_reading
, coalesce(t.tot_reading - t3.tot_reading, 0) AS daily_generation
, t3.rownum t3_rownum
, t3.tot_reading t3_to_read
, t.tot_reading t_tot_read
from
(
select @rn:=@rn+1 AS rownum, date(t.logged_at) AS logged_on, sum(t.reading) AS tot_reading
from readings_tab t
cross join (SELECT @rn:=0) t2
group by date(t.logged_at)
order by date(t.logged_at) desc
) t
left join
(
select @rn2:=@rn2+1 AS rownum, date(t.logged_at) AS logged_on, sum(t.reading) AS tot_reading
from readings_tab t
cross join (SELECT @rn2:=0) t2
group by date(t.logged_at)
order by date(t.logged_at) desc
) t3 on t.rownum = t3.rownum + 1
order by t.logged_on desc
;
Note I also recommend using explicit CROSS JOIN syntax as it leads to easier comprehension for anyone who needs to maintain this query.
Here is the result (& also see http://sqlfiddle.com/#!9/dcb5e2/1 )
| rownum | logged_on | tot_reading | daily_generation | t3_rownum | t3_to_read | t_tot_read |
|--------|------------|-------------|------------------|-----------|------------|------------|
| 5 | 2017-08-25 | 10500 | 1750 | 4 | 8750 | 10500 |
| 4 | 2017-08-24 | 8750 | 470 | 3 | 8280 | 8750 |
| 3 | 2017-08-23 | 8280 | 450 | 2 | 7830 | 8280 |
| 2 | 2017-08-22 | 7830 | 620 | 1 | 7210 | 7830 |
| 1 | 2017-08-21 | 7210 | 0 | (null) | (null) | 7210 |
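The self-join on rownum = rownum + 1 is a way of pairing each day's total with the previous day's. The same pairing, written as a plain Python sketch over the question's readings (the helper name `daily_generation` is invented here), gives the expected figures:

```python
from collections import defaultdict

# Rows of readings_tab from the question: (site, logged-on date, reading).
readings = [
    ("A", "2017-08-21", 2500), ("B", "2017-08-21", 1210), ("C", "2017-08-21", 3500),
    ("A", "2017-08-22", 2630), ("B", "2017-08-22", 1400), ("C", "2017-08-22", 3800),
    ("A", "2017-08-23", 2700), ("B", "2017-08-23", 1630), ("C", "2017-08-23", 3950),
    ("A", "2017-08-24", 2850), ("B", "2017-08-24", 1700), ("C", "2017-08-24", 4200),
    ("A", "2017-08-25", 3500), ("B", "2017-08-25", 2300), ("C", "2017-08-25", 4700),
]


def daily_generation(readings):
    """Return (date, total reading, difference to previous day), newest day first."""
    totals = defaultdict(int)
    for _site, logged_on, reading in readings:
        totals[logged_on] += reading

    rows = []
    previous_total = None
    for logged_on in sorted(totals):  # ISO dates sort chronologically as strings
        total = totals[logged_on]
        # The first day has no predecessor, so its difference is reported as 0.
        diff = 0 if previous_total is None else total - previous_total
        rows.append((logged_on, total, diff))
        previous_total = total
    rows.reverse()  # match the question's descending order
    return rows


rows = daily_generation(readings)
for row in rows:
    print(*row)
```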

Filter out rows with date range gaps in mysql

In mysql I have a table similar to the following one:
--------------------------------------------
| id | parent_id | date_start | date_end |
--------------------------------------------
| 1 | | 2017-05-01 | 2017-05-10 |
| 2 | 1 | 2017-05-01 | 2017-05-10 |
| 3 | | 2017-06-01 | 2017-06-10 |
| 4 | 3 | 2017-06-01 | 2017-06-03 |
| 5 | 3 | 2017-06-04 | 2017-06-06 |
| 6 | 3 | 2017-06-07 | 2017-06-10 |
| 7 | | 2017-07-01 | 2017-07-10 |
| 8 | 7 | 2017-07-01 | 2017-07-03 |
| 9 | 7 | 2017-07-04 | 2017-07-06 |
| 10 | 7 | 2017-07-08 | 2017-07-10 |
Rows without a parent id are "pricelists", while rows with a parent are pricelist periods.
I'd like to filter out pricelist ids with periods that have time gaps, so ideally my query should return 1 and 3.
So far I've written a simple query which correctly returns 3:
SELECT distinct period1.parent_id
FROM pricelist period1
INNER JOIN pricelist period2
ON period1.parent_id = period2.parent_id
AND period2.date_start = DATE_ADD(period1.date_end,INTERVAL 1 DAY);
but unfortunately it doesn't take into account pricelists with a single period, which have no gaps by definition!
So I was wondering if it could be possible to modify such a query to return pricelists with either single periods or multiple periods without time gaps, possibly without a UNION.
I had difficulty finding a MySQL workspace, so initially I presented a T-SQL solution, but I have since learned how to use MySQL at rextester, which has helped a lot. The argument order of DATEDIFF() in MySQL is the reverse of T-SQL's, which was hard to get right without a way to test it. Hopefully this is now resolved.
The basic logic of this approach is to calculate the overall duration of the parents. Then to calculate the sum of duration of all the children. Then compare these durations (in a join) and if they are the same you have no gaps.
Note that this logic isn't tested for overlaps between children, but I would expect these to also fail at the join where the durations are compared.
Data:
#drop table if exists `PRICELIST`;
create table `PRICELIST`
(`id` int, `parent_id` int, `date_start` datetime, `date_end` datetime)
;
INSERT INTO PRICELIST
(`id`, `parent_id`, `date_start`, `date_end`)
VALUES
(1, NULL, '2017-05-01 00:00:00', '2017-05-10 00:00:00'),
(2, 1, '2017-05-01 00:00:00', '2017-05-10 00:00:00'),
(3, NULL, '2017-06-01 00:00:00', '2017-06-10 00:00:00'),
(4, 3, '2017-06-01 00:00:00', '2017-06-03 00:00:00'),
(5, 3, '2017-06-04 00:00:00', '2017-06-06 00:00:00'),
(6, 3, '2017-06-07 00:00:00', '2017-06-10 00:00:00'),
(7, NULL, '2017-07-01 00:00:00', '2017-07-10 00:00:00'),
(8, 7, '2017-07-01 00:00:00', '2017-07-03 00:00:00'),
(9, 7, '2017-07-04 00:00:00', '2017-07-06 00:00:00'),
(10, 7, '2017-07-08 00:00:00', '2017-07-10 00:00:00')
;
MySQL:
select p.id, datediff(p.date_end,p.date_start) pdu, x.du
from pricelist p
inner join (
select
p1.parent_id, sum(datediff(p1.date_end,p1.date_start) + coalesce(datediff(p2.date_start,p1.date_end),0)) du
from (
select parent_id,date_start,date_end
from pricelist
where parent_id IS NOT NULL
) p1
left join (
select parent_id,date_start,date_end
from pricelist
where parent_id IS NOT NULL
) p2 on p1.parent_id = p2.parent_id and p1.date_end < p2.date_start
where datediff(p2.date_start,p1.date_end) = 1 or p2.parent_id is null
group by p1.parent_id
) x on p.id = x.parent_id and datediff(p.date_end,p.date_start) = x.du
where p.parent_id IS NULL
;
see it working at: http://rextester.com/JRD47056
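To make the duration comparison easier to follow, here is a rough Python rendering of what the MySQL query computes on the sample data. It is a sketch that mirrors the join and WHERE conditions as I read them, not a replacement for the query; `pricelists_without_gaps` is a made-up name.

```python
from datetime import date

# Rows of PRICELIST from the answer: (id, parent_id, date_start, date_end).
pricelist = [
    (1, None, date(2017, 5, 1), date(2017, 5, 10)),
    (2, 1, date(2017, 5, 1), date(2017, 5, 10)),
    (3, None, date(2017, 6, 1), date(2017, 6, 10)),
    (4, 3, date(2017, 6, 1), date(2017, 6, 3)),
    (5, 3, date(2017, 6, 4), date(2017, 6, 6)),
    (6, 3, date(2017, 6, 7), date(2017, 6, 10)),
    (7, None, date(2017, 7, 1), date(2017, 7, 10)),
    (8, 7, date(2017, 7, 1), date(2017, 7, 3)),
    (9, 7, date(2017, 7, 4), date(2017, 7, 6)),
    (10, 7, date(2017, 7, 8), date(2017, 7, 10)),
]


def pricelists_without_gaps(rows):
    """Return ids of parents whose children cover the parent's duration without gaps."""
    parents = {rid: (start, end) for rid, pid, start, end in rows if pid is None}
    children = {}
    for _rid, pid, start, end in rows:
        if pid is not None:
            children.setdefault(pid, []).append((start, end))

    result = []
    for parent_id, (p_start, p_end) in sorted(parents.items()):
        periods = children.get(parent_id, [])
        covered = 0
        for start, end in periods:
            # Sibling periods that start after this one ends (the LEFT JOIN condition).
            later_starts = [s for s, _e in periods if s > end]
            if not later_starts:
                # Last period: nothing follows, so only its own duration counts.
                covered += (end - start).days
            else:
                # Keep only followers starting exactly one day later (the WHERE clause);
                # each adds this period's duration plus the one-day step.
                covered += sum(
                    (end - start).days + 1
                    for s in later_starts
                    if (s - end).days == 1
                )
        # Parents without children are excluded, as with the INNER JOIN.
        if periods and covered == (p_end - p_start).days:
            result.append(parent_id)
    return result


print(pricelists_without_gaps(pricelist))
```

For pricelist 7 the period ending 2017-07-06 has a follower, but not one starting the next day, so it contributes nothing and the covered total (5) falls short of the parent's 9 days.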
T-SQL:
Whilst I could not find a MySQL environment that worked, I used MSSQL but avoided any analytic functions etc that can't be used in MySQL. However I have relied on datediff() which is slightly different to the same function in MySQL.
TSQL query
select p.id, datediff(day,p.date_start,p.date_end) du
from #pricelist p
inner join (
select
p1.parent_id, sum(datediff(day,p1.date_start,p1.date_end) + coalesce(datediff(day,p1.date_end,p2.date_start),0)) du
from (
select parent_id,date_start,date_end
from #pricelist
where parent_id IS NOT NULL
) p1
left join (
select parent_id,date_start,date_end
from #pricelist
where parent_id IS NOT NULL
) p2 on p1.parent_id = p2.parent_id and p1.date_end < p2.date_start
where datediff(day,p1.date_end,p2.date_start) = 1 or p2.parent_id is null
group by p1.parent_id
) x on p.id = x.parent_id and datediff(day,p.date_start,p.date_end) = x.du
where p.parent_id IS NULL
see it working at: http://rextester.com/KUK2410
In MySQL, datediff() expects just two parameters, and you need to swap the field references (i.e. the later date first).
Perhaps there are easier ways. Best I could come up with for now.