INSERT SELECT ON DUPLICATE not updating - mysql

Short
I want to SUM a column in TABLE_A based on CRITERIA X and insert into TABLE_B.total_x
I want to SUM a column in TABLE_A based on CRITERIA Y and insert into TABLE_B.total_y
Problem: Step 2 does not update TABLE_B.total_y
LONG
TABLE_A: Data
| year | month | type | total |
---------------------------------------
| 2013 | 11 | down | 100 |
| 2013 | 11 | down | 50 |
| 2013 | 11 | up | 60 |
| 2013 | 10 | down | 200 |
| 2013 | 10 | up | 15 |
| 2013 | 10 | up | 9 |
TABLE_B: structure
CREATE TABLE `TABLE_B` (
`year` INT(4) NULL DEFAULT NULL,
`month` INT(2) UNSIGNED ZEROFILL NULL DEFAULT NULL,
`total_x` INT(10) NULL DEFAULT NULL,
`total_y` INT(10) NULL DEFAULT NULL,
UNIQUE INDEX `unique` (`year`, `month`)
)
SQL: CRITERIA_X
INSERT INTO TABLE_B (
`year`, `month`, `total_x`
)
SELECT
t.`year`, t.`month`,
SUM(t.`total`) as total_x
FROM TABLE_A t
WHERE
t.`type` = 'down'
GROUP BY
t.`year`, t.`month`
ON DUPLICATE KEY UPDATE
`total_x` = total_x
;
SQL: CRITERIA_Y
INSERT INTO TABLE_B (
`year`, `month`, `total_y`
)
SELECT
t.`year`, t.`month`,
SUM(t.`total`) as total_y
FROM TABLE_A t
WHERE
t.`type` = 'up'
GROUP BY
t.`year`, t.`month`
ON DUPLICATE KEY UPDATE
`total_y` = total_y
;
The second SQL (CRITERIA_Y) does not update total_y as expected. WHY?

I would do it another way, computing both totals in a single pass:
insert into TABLE_B (year, month, total_x, total_y)
select year, month
, sum(case type when 'down' then total else 0 end) as total_x
, sum(case type when 'up' then total else 0 end) as total_y
from TABLE_A
group by year, month
Or, on a database that supports FULL OUTER JOIN (MySQL does not), with two subqueries:
insert into TABLE_B (year, month, total_x, total_y)
select coalesce(t1.year, t2.year) as year
, coalesce(t1.month, t2.month) as month
, t1.total_x
, t2.total_y
from (select year, month, sum(total) as total_x
from TABLE_A where type = 'down'
group by year, month) t1
full outer join
(select year, month, sum(total) as total_y
from TABLE_A where type = 'up'
group by year, month) t2
on t1.year = t2.year and t1.month = t2.month
Or using UNION ALL:
insert into TABLE_B (year, month, total_x, total_y)
select year, month, sum(total_x), sum(total_y)
from (
select year, month, sum(total) as total_x, 0 as total_y
from TABLE_A where type = 'down'
group by year, month
union all
select year, month, 0 as total_x, sum(total) as total_y
from TABLE_A where type = 'up'
group by year, month) t
group by year, month
Reading specs on INSERT...ON DUPLICATE KEY UPDATE, I noticed this:
If ... matches several rows, only one row is updated. In general, you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes.
That caveat is about tables with several unique indexes, and TABLE_B has only one (composite) index, but I still find the syntax cumbersome and personally would avoid it.
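As for why the second statement in the question changes nothing: in ON DUPLICATE KEY UPDATE `total_y` = total_y, both sides name the column of TABLE_B, so each existing row is assigned its own current value. The first statement presumably only appeared to work because the rows did not exist yet and it took the plain insert path. A minimal sketch of the usual fix, assuming the tables exactly as shown in the question, uses VALUES() to refer to the value the INSERT would have written (MySQL 8.0.20 and later deprecate VALUES() in this position, though it still works):

```sql
-- Sketch only: same statement as CRITERIA_Y in the question,
-- with the UPDATE clause reading the would-be-inserted value.
INSERT INTO TABLE_B (`year`, `month`, `total_y`)
SELECT t.`year`, t.`month`, SUM(t.`total`)
FROM TABLE_A t
WHERE t.`type` = 'up'
GROUP BY t.`year`, t.`month`
ON DUPLICATE KEY UPDATE
    `total_y` = VALUES(`total_y`);  -- not `total_y` = total_y
```

The same change applies to the CRITERIA_X statement, with total_x in place of total_y.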

Related

Query average with nested subquery

I cannot figure out how to calculate the running average per customer up until each month.
I tried to write it in one big query using subqueries, and also joins with no luck
Here is the query I tried with a subquery:
SELECT
date_format(z1.ServiceDate, '%y-%b') as months,
(
SELECT
AVG(cc.total) + 1 AS 'avg'
FROM
(
SELECT
z.Customer_ID,
COUNT(z.BookingId) 'total'
from
Orders z
where
YEAR(z.ServiceDate) <= YEAR(z1.months) AND
MONTH(z.ServiceDate) <= MONTH(z1.months)
GROUP BY
z.Customer_ID
) cc
)
from
Orders z1
GROUP BY
YEAR(z1.ServiceDate),
MONTH(z1.ServiceDate)
I also tried to join these two queries with no luck:
SELECT date_format(Orders.ServiceDate, '%y-%b') from Orders
GROUP BY YEAR(Orders.ServiceDate), month(Orders.ServiceDate)
Could not join it with this one:
(
SELECT AVG(cc.total) + 1 AS 'avg' FROM (
SELECT Orders.Customer_ID as 'c',
COUNT(BookingId) 'total' from Orders
where year(Orders.ServiceDate) <= '2019' and month(Orders.ServiceDate)
<= '01'
GROUP BY Orders.Customer_ID
) cc
)
where '2019' and '01' would be taken from the first query.
Here is my test schema:
CREATE TABLE IF NOT EXISTS `orders` (
`BookingId` INT(6) NOT NULL,
`ServiceDate` DATETIME NOT NULL,
`Customer_ID` varchar(1) NOT NULL,
PRIMARY KEY (`BookingId`)
) DEFAULT CHARSET=utf8;
INSERT INTO `orders` (`BookingId`, `ServiceDate`, `Customer_ID`) VALUES
('1', '2019-01-03T12:00:00', '1'),
('2', '2019-01-04T12:00:00', '2'),
('3', '2019-01-12T12:00:00', '2'),
('4', '2019-02-03T12:00:00', '1'),
('5', '2019-02-04T12:00:00', '2'),
('6', '2019-02-12T12:00:00', '3');
I was expecting something like this for all months
month AVG
19-Jan 1.5
19-Feb 2
...
...
The dots are only there to show that there are many more months in my original dataset.
For January, there were 3 bookings and 2 distinct Customer_IDs, so the average number of bookings per customer up until that month was 1.5. Up until February, there have been 6 bookings and 3 Customer_IDs, so the new average is 2.
Join a subquery that returns the distinct months to the table and aggregate:
SELECT d.month,
COUNT(o.bookingid) / COUNT(DISTINCT o.customer_id) avg
FROM (
SELECT DISTINCT
EXTRACT(YEAR_MONTH FROM servicedate) yearmonth,
DATE_FORMAT(servicedate, '%y-%b') month
FROM orders
) d INNER JOIN orders o
ON EXTRACT(YEAR_MONTH FROM o.servicedate) <= d.yearmonth
GROUP BY d.yearmonth, d.month
See the demo.
Results:
| month | avg |
| ------ | --- |
| 19-Jan | 1.5 |
| 19-Feb | 2 |
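On MySQL 8.0 or later, the same running average could also be sketched with window functions instead of a self-join. This is only an illustration under that version assumption: the CTE names monthly and first_seen and the output column names are made up for the example. It counts bookings per month, counts each customer once in the month of their first booking, and then divides the two running totals.

```sql
WITH monthly AS (
    -- bookings per calendar month
    SELECT EXTRACT(YEAR_MONTH FROM servicedate) AS yearmonth,
           DATE_FORMAT(MIN(servicedate), '%y-%b') AS month_label,
           COUNT(*) AS bookings
    FROM orders
    GROUP BY yearmonth
),
first_seen AS (
    -- each customer is counted once, in the month of their first booking
    SELECT yearmonth, COUNT(*) AS new_customers
    FROM (
        SELECT customer_id,
               MIN(EXTRACT(YEAR_MONTH FROM servicedate)) AS yearmonth
        FROM orders
        GROUP BY customer_id
    ) AS f
    GROUP BY yearmonth
)
SELECT m.month_label,
       -- running bookings divided by running distinct customers
       SUM(m.bookings) OVER (ORDER BY m.yearmonth)
       / SUM(COALESCE(fs.new_customers, 0)) OVER (ORDER BY m.yearmonth) AS avg_bookings
FROM monthly m
LEFT JOIN first_seen fs ON fs.yearmonth = m.yearmonth
ORDER BY m.yearmonth;
```

With the test data from the question, January has 3 bookings and 2 new customers, and February adds 3 bookings and 1 new customer, which reproduces the 3/2 and 6/3 arithmetic described in the question.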

Ordering within a MySQL group

I have two tables which are joined - one holds schedules and the other holds actual worked times.
This works fine if a given user only has a single schedule on a day but when they have more than one schedule I cannot get the query to match up the "right" slot to the right time.
I am beginning to think the only way to do this is to allocate the time to the schedule when the clock event happens but that is going to be a big rewrite so I am hoping there is a way in MySQL.
As this is inside a third party application, I am limited in what I can do to the query - I can modify the basics like from, group, joins etc and I can add aggregates to the fields (I have toyed with using min/max on the times). However, if the only way is to write a hugely complex query especially within the field selections then this system simply doesn't give me that option.
Schedule table:
CREATE TABLE `schedule` (
`id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`date` date NOT NULL,
`start_time` time NOT NULL,
`end_time` time NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `schedule`
--
INSERT INTO `schedule` (`id`, `user_id`, `date`, `start_time`, `end_time`) VALUES
(1, 1, '2019-07-07', '08:00:00', '12:00:00'),
(2, 1, '2019-07-07', '16:00:00', '22:00:00'),
(3, 1, '2019-07-06', '10:00:00', '18:00:00');
Time table
CREATE TABLE `time` (
`id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`date` date NOT NULL,
`start_time` time NOT NULL,
`end_time` time NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `time`
--
INSERT INTO `time` (`id`, `user_id`, `date`, `start_time`, `end_time`) VALUES
(1, 1, '2019-07-07', '08:00:00', '12:00:00'),
(2, 1, '2019-07-07', '16:00:00', '22:00:00'),
(3, 1, '2019-07-06', '10:00:00', '18:00:00');
Current query
select
t.date as date, t.user_id,
s.start_time as schedule_start,
s.end_time as schedule_end,
t.start_time as actual_start,
t.end_time as actual_end
from time t
left join schedule s on
t.user_id=s.user_id and t.date=s.date
group by t.date, t.start_time
Current output
== Dumping data for table s
|2019-07-06|1|10:00:00|18:00:00|10:00:00|18:00:00
|2019-07-07|1|08:00:00|12:00:00|08:00:00|12:00:00
|2019-07-07|1|08:00:00|12:00:00|16:00:00|22:00:00
Desired output
== Dumping data for table s
|2019-07-06|1|10:00:00|18:00:00|10:00:00|18:00:00
|2019-07-07|1|08:00:00|12:00:00|08:00:00|12:00:00
|2019-07-07|1|16:00:00|22:00:00|16:00:00|22:00:00
Is this possible to achieve?
I would try something like this.
I allowed a 15-minute tolerance either side of the actual start time when matching a time entry to a scheduled shift:
select
t.date as date, t.user_id,
s.start_time as schedule_start,
s.end_time as schedule_end,
t.start_time as actual_start,
t.end_time as actual_end
from time t
left join schedule s on
t.user_id=s.user_id and t.date=s.date
and s.start_time BETWEEN t.start_time - INTERVAL 15 MINUTE
AND t.start_time + INTERVAL 15 MINUTE
order by date,schedule_start;
You would only need GROUP BY if you wanted to add up the time per day and user.
To distinguish the two shifts you need a more complicated query.
Run one query per shift (the earliest via MIN, the latest via MAX) and combine them with UNION. Note that this handles at most two shifts per user per day:
select
s.date, s.user_id,
s.schedule_start,
s.schedule_end,
t.actual_start,
t.actual_end
from (
select s.date, s.user_id,
min(s.start_time) as schedule_start,
min(s.end_time) as schedule_end
from schedule s
group by s.date, s.user_id
) s left join (
select t.date, t.user_id,
min(t.start_time) as actual_start,
min(t.end_time) as actual_end
from time t
group by t.date, t.user_id
) t on t.user_id=s.user_id and t.date=s.date
union
select
s.date, s.user_id,
s.schedule_start,
s.schedule_end,
t.actual_start,
t.actual_end
from (
select s.date, s.user_id,
max(s.start_time) as schedule_start,
max(s.end_time) as schedule_end
from schedule s
group by s.date, s.user_id
) s left join (
select t.date, t.user_id,
max(t.start_time) as actual_start,
max(t.end_time) as actual_end
from time t
group by t.date, t.user_id
) t on t.user_id=s.user_id and t.date=s.date
See the demo.
Results:
> date | user_id | schedule_start | schedule_end | actual_start | actual_end
> :--------- | ------: | :------------- | :----------- | :----------- | :---------
> 2019-07-06 | 1 | 10:00:00 | 18:00:00 | 10:00:00 | 18:00:00
> 2019-07-07 | 1 | 08:00:00 | 12:00:00 | 08:00:00 | 12:00:00
> 2019-07-07 | 1 | 16:00:00 | 22:00:00 | 16:00:00 | 22:00:00
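If the server is MySQL 8.0 or later and the application lets you join against derived tables, another option is to number the schedules and the time entries within each user and day and join on that number. This is a sketch under those assumptions, not something the question confirms is possible in the third-party tool; the aliases sc, tm and rn are illustrative.

```sql
SELECT t.`date`, t.user_id,
       s.start_time AS schedule_start,
       s.end_time   AS schedule_end,
       t.start_time AS actual_start,
       t.end_time   AS actual_end
FROM (
    -- n-th clocked entry of the day for each user
    SELECT tm.*,
           ROW_NUMBER() OVER (PARTITION BY tm.user_id, tm.`date`
                              ORDER BY tm.start_time) AS rn
    FROM `time` tm
) AS t
LEFT JOIN (
    -- n-th scheduled shift of the day for each user
    SELECT sc.*,
           ROW_NUMBER() OVER (PARTITION BY sc.user_id, sc.`date`
                              ORDER BY sc.start_time) AS rn
    FROM schedule sc
) AS s
  ON  s.user_id = t.user_id
  AND s.`date`  = t.`date`
  AND s.rn      = t.rn
ORDER BY t.`date`, t.start_time;
```

Unlike the MIN/MAX approach, this pairs the first entry with the first shift, the second with the second, and so on, so it is not limited to two shifts per day. It does assume that entries and shifts occur in the same order.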

Mysql query to filter out data based on two logic

This is a sample table:
sample_id | timestamp | p_id
============================================
62054 | 2018-09-25 10:18:15 | 2652
62054 | 2018-09-27 16:44:57 | 966
62046 | null | 1809
62046 | 2018-09-25 10:18:15 | 2097
We need one row per unique sample_id, chosen by this logic:
If a row for that sample_id has a null timestamp, return that row:
62046 | null | 1809
If no timestamp is null, return the row with the latest timestamp:
62054 | 2018-09-27 16:44:57 | 966
It would be great if anyone could provide the SQL query.
We need something like this:
WHERE
IF(
NOT NULL = all row group by sample_id,
row where cancelled_at is maximum,
null column
)
This query should give you the results you want. It looks for a row with a NULL timestamp, or a row which has a non-NULL timestamp which is the maximum timestamp for that sample_id, but only if there isn't a row for that sample_id which has a NULL timestamp:
SELECT *
FROM table1 t1
WHERE timestamp IS NULL OR
timestamp = (SELECT MAX(timestamp)
FROM table1 t2
WHERE t2.sample_id = t1.sample_id) AND
NOT EXISTS (SELECT *
FROM table1 t3
WHERE t3.sample_id = t1.sample_id AND
t3.timestamp IS NULL)
Output:
sample_id timestamp p_id
62054 2018-09-27T16:44:57Z 966
62046 (null) 1809
Using variables:
SELECT sample_id, timestamp, p_id
FROM (
SELECT sample_id, timestamp, p_id,
@seq := IF(@s_id = sample_id, @seq + 1,
IF(@s_id := sample_id, 1, 1)) AS seq
FROM mytable
CROSS JOIN (SELECT @s_id := 0, @seq := 0) AS vars
ORDER BY
sample_id,
CASE
WHEN timestamp IS NULL THEN 1
ELSE 2
END,
timestamp DESC
) AS t
WHERE t.seq = 1;
Demo
Explanation:
To understand how this works you need to execute the subquery and examine the output it produces:
SELECT sample_id, timestamp, p_id,
@seq := IF(@s_id = sample_id, @seq + 1,
IF(@s_id := sample_id, 1, 1)) AS seq
FROM mytable
CROSS JOIN (SELECT @s_id := 0, @seq := 0) AS vars
ORDER BY
sample_id,
CASE
WHEN timestamp IS NULL THEN 1
ELSE 2
END,
timestamp DESC
Output:
sample_id timestamp p_id seq
-------------------------------------------
62046 NULL 1809 1
62046 25.09.2018 10:18:15 2097 2
62054 27.09.2018 16:44:57 966 1
62054 25.09.2018 10:18:15 2652 2
You can see here that calculated field seq is used to prioritize records inside each sample_id slice.
Note: If you're on MySQL 8.0 you can use window functions to implement the same logic.
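A minimal sketch of that window-function variant, assuming MySQL 8.0 or later and the same mytable as above. ROW_NUMBER() plays the role of the seq variable: within each sample_id, rows with a NULL timestamp sort first, then the remaining rows from newest to oldest.

```sql
SELECT sample_id, `timestamp`, p_id
FROM (
    SELECT sample_id, `timestamp`, p_id,
           ROW_NUMBER() OVER (
               PARTITION BY sample_id
               -- (`timestamp` IS NULL) is 1 for NULL rows, so DESC puts them first
               ORDER BY (`timestamp` IS NULL) DESC, `timestamp` DESC
           ) AS seq
    FROM mytable
) AS t
WHERE t.seq = 1;
```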
Find the latest row for each sample_id that has no null timestamp, and use UNION to add the rows whose timestamp is null:
select * from t1 where (t1.sample_id,t1.timestamp)
in (
SELECT t.sample_id,max(t.timestamp) AS time
FROM t1 t
WHERE t.sample_id NOT IN (select sample_id from t1 where t1.timestamp is null)
GROUP BY t.sample_id
)
UNION
SELECT *
FROM t1 t
WHERE t.timestamp IS NULL
output
sample_id timestamp p_id
62054 2018-09-27 16:44:57 966
62046 null 1809
Group by sample_id.
MIN() and MAX() ignore NULLs, so compare COUNT(*) with COUNT(timestamp) to check whether the group contains a null timestamp. If it does, return NULL; otherwise return the MAX() value.
Try the following query:
SELECT sample_id,
IF(COUNT(*) > COUNT(timestamp),
NULL,
MAX(timestamp)) AS timestamp
FROM your_table
GROUP BY sample_id

MySQL Query to return total hours for state information

I have a MySQL table that captures state information for a signal every minute, as follows:
ID | state | timestamp |
--------------------------------------
'sig1'| 'red' | '2017-07-10 15:30:21'
'sig1'| 'green' | '2017-07-10 15:31:26'
'sig1'| 'green' | '2017-07-10 15:32:24'
'sig1'| 'red' | '2017-07-10 15:33:29'
'sig1'| 'red' | '2017-07-10 15:34:30'
'sig1'| 'red' | '2017-07-10 15:35:15'
I need a query that returns the most recent period during which 'sig1' was in the 'red' state for more than 5 consecutive minutes. The output of the query should be
ID | state| duration | start_time | end_time
So if you guys can help me with the query, that would be great!
cheers!
You can try something like this:
SELECT t.id,t.consecutive,t.state
,COUNT(*) consecutive_count
,MIN(timestamp) start_time
,MAX(timestamp) end_time
,TIMEDIFF(MAX(timestamp), MIN(timestamp)) AS diff /* for check */
FROM (SELECT a.* ,
@r:= CASE WHEN @g = a.state AND @h=a.id THEN @r ELSE @r + 1 END consecutive,
@g:= a.state g,
@h:= a.id h
FROM yourtable a
CROSS JOIN (SELECT @g:='', @r:=0, @h:='') t1
ORDER BY id, timestamp
) t
GROUP BY t.id,t.consecutive,t.state
HAVING (UNIX_TIMESTAMP(end_time)-UNIX_TIMESTAMP(start_time))/60>5
;
Sample data:
CREATE TABLE yourtable (
id VARCHAR(10) NOT NULL ,
state VARCHAR(10) NOT NULL,
timestamp datetime
);
INSERT INTO yourtable VALUES ('sig1','red','2017-07-10 15:30:21');
INSERT INTO yourtable VALUES ('sig1','green','2017-07-10 15:31:26');
INSERT INTO yourtable VALUES ('sig1','green','2017-07-10 15:32:24');
INSERT INTO yourtable VALUES ('sig1','red','2017-07-10 15:33:29');
INSERT INTO yourtable VALUES ('sig1','red','2017-07-10 15:34:30');
INSERT INTO yourtable VALUES ('sig1','red','2017-07-10 15:39:15');
INSERT INTO yourtable VALUES ('sig2','red','2017-07-10 15:15:15');
Output:
id consecutive state consecutive_count start_time end_time diff
sig1 3 red 3 10.07.2017 15:33:29 10.07.2017 15:39:15 00:05:46
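On MySQL 8.0 or later, the consecutive runs can also be identified without user variables. The following is a sketch under that assumption, using the yourtable sample data above; the alias names runs and grp are illustrative. The difference between the two row numbers stays constant within an unbroken run of the same state, so it can serve as a group key.

```sql
SELECT id, state,
       TIMEDIFF(MAX(`timestamp`), MIN(`timestamp`)) AS duration,
       MIN(`timestamp`) AS start_time,
       MAX(`timestamp`) AS end_time
FROM (
    SELECT y.*,
           -- position among all rows of the signal
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY `timestamp`)
           -- minus position among rows of the signal with the same state
         - ROW_NUMBER() OVER (PARTITION BY id, state ORDER BY `timestamp`) AS grp
    FROM yourtable y
) AS runs
WHERE id = 'sig1' AND state = 'red'
GROUP BY id, state, grp
HAVING TIMESTAMPDIFF(SECOND, MIN(`timestamp`), MAX(`timestamp`)) > 5 * 60
ORDER BY end_time DESC   -- most recent qualifying run first
LIMIT 1;
```

The ORDER BY ... LIMIT 1 at the end keeps only the most recent run, which is what the question asks for; drop it to list every run longer than 5 minutes.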
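SELECT TIMESTAMPDIFF(HOUR, MINTIME, MAXTIME), ID, state FROM
(
SELECT ID, state, MIN(timestamp) MINTIME, MAX(timestamp) MAXTIME FROM yourtable GROUP BY ID, state
) Z
Try the above query. Note that it measures the span between the first and last row of each state per ID, not consecutive runs.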

Difficult MySQL Query - Getting Max difference between dates

I have a MySQL table of the following form
account_id | call_date
1 2013-06-07
1 2013-06-09
1 2013-06-21
2 2012-05-01
2 2012-05-02
2 2012-05-06
I want to write a MySQL query that will get the maximum difference (in days) between successive dates in call_date for each account_id. So for the above example, the result of this query would be
account_id | max_diff
1 12
2 4
I'm not sure how to do this. Is this even possible to do in a MySQL query?
I can do datediff(max(call_date),min(call_date)) but this would ignore dates in between the first and last call dates. I need some way of getting the datediff() between each successive call_date for each account_id, then finding the maximum of those.
I'm sure fp's answer will be faster, but just for fun...
SELECT account_id
, MAX(diff) max_diff
FROM
( SELECT x.account_id
, DATEDIFF(MIN(y.call_date),x.call_date) diff
FROM my_table x
JOIN my_table y
ON y.account_id = x.account_id
AND y.call_date > x.call_date
GROUP
BY x.account_id
, x.call_date
) z
GROUP
BY account_id;
CREATE TABLE t
(`account_id` int, `call_date` date)
;
INSERT INTO t
(`account_id`, `call_date`)
VALUES
(1, '2013-06-07'),
(1, '2013-06-09'),
(1, '2013-06-21'),
(2, '2012-05-01'),
(2, '2012-05-02'),
(2, '2012-05-06')
;
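select account_id, max(diff) from (
select
account_id,
-- only compare with the previous row when it belongs to the same account
if(@acct = account_id, timestampdiff(day, @prev, call_date), 0) diff,
@prev := call_date,
@acct := account_id
from
t
, (select @prev:=null, @acct:=null) v
order by account_id, call_date
) sq
group by account_id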
| ACCOUNT_ID | MAX(DIFF) |
|------------|-----------|
| 1 | 12 |
| 2 | 4 |
see it working live in an sqlfiddle
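If you have an index on account_id, call_date, then you can do this rather efficiently without variables (shown here against the table t from the previous answer; DATEDIFF is used because subtracting DATE values directly in MySQL compares them as numbers and breaks across month boundaries):
select account_id, max(datediff(call_date, prev_call_date)) as diff
from (select t1.*,
(select t2.call_date
from t t2
where t2.account_id = t1.account_id and t2.call_date < t1.call_date
order by t2.call_date desc
limit 1
) as prev_call_date
from t t1
) x
group by account_id;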
Just for educational purposes, doing it with JOIN:
SELECT t1.account_id,
MAX(DATEDIFF(t2.call_date, t1.call_date)) AS max_diff
FROM t t1
LEFT JOIN t t2
ON t2.account_id = t1.account_id
AND t2.call_date > t1.call_date
LEFT JOIN t t3
ON t3.account_id = t1.account_id
AND t3.call_date > t1.call_date
AND t3.call_date < t2.call_date
WHERE t3.account_id IS NULL
GROUP BY t1.account_id
Since you didn't specify, this shows max_diff of NULL for accounts with only 1 call.
SELECT a1.account_id, MAX(DATEDIFF(a1.call_date, a2.call_date)) AS max_diff
FROM account a2, account a1
WHERE a1.account_id = a2.account_id
AND a1.call_date > a2.call_date
AND NOT EXISTS
(SELECT 1 FROM account a3 WHERE a3.account_id = a1.account_id AND a1.call_date > a3.call_date AND a2.call_date < a3.call_date)
GROUP BY a1.account_id
Which gives:
ACCOUNT_ID MAX_DIFF
1 12
2 4
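For completeness, on MySQL 8.0 or later the "difference between each successive call_date" step maps directly onto LAG(). A minimal sketch, assuming that version and the table t created above; the alias d is illustrative.

```sql
SELECT account_id,
       MAX(DATEDIFF(call_date, prev_call_date)) AS max_diff
FROM (
    SELECT account_id,
           call_date,
           -- previous call of the same account, NULL for the first call
           LAG(call_date) OVER (PARTITION BY account_id
                                ORDER BY call_date) AS prev_call_date
    FROM t
) AS d
GROUP BY account_id;
```

DATEDIFF returns NULL for each account's first call, and MAX skips NULLs, so an account with a single call gets a max_diff of NULL, as in the JOIN version.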