Query average with nested subquery - mysql

I cannot figure out how to calculate the running average of bookings per customer up to and including each month.
I tried to write it as one big query using subqueries, and also with joins, with no luck.
Here is the query I tried with a subquery:
SELECT
date_format(z1.ServiceDate, '%y-%b') as months,
(
SELECT
AVG(cc.total) + 1 AS 'avg'
FROM
(
SELECT
z.Customer_ID,
COUNT(z.BookingId) 'total'
from
Orders z
where
YEAR(z.ServiceDate) <= YEAR(z1.months) AND
MONTH(z.ServiceDate) <= MONTH(z1.months)
GROUP BY
z.Customer_ID
) cc
)
from
Orders z1
GROUP BY
YEAR(z1.ServiceDate),
MONTH(z1.ServiceDate)
I also tried to join these two queries with no luck:
SELECT date_format(Orders.ServiceDate, '%y-%b') from Orders
GROUP BY YEAR(Orders.ServiceDate), month(Orders.ServiceDate)
Could not join it with this one:
(
SELECT AVG(cc.total) + 1 AS 'avg' FROM (
SELECT Orders.Customer_ID as 'c',
COUNT(BookingId) 'total' from Orders
where year(Orders.ServiceDate) <= '2019' and month(Orders.ServiceDate)
<= '01'
GROUP BY Orders.Customer_ID
) cc
)
where '2019' and '01' would be taken from the first query.
Here is my test schema:
CREATE TABLE IF NOT EXISTS `orders` (
`BookingId` INT(6) NOT NULL,
`ServiceDate` DATETIME NOT NULL,
`Customer_ID` varchar(1) NOT NULL,
PRIMARY KEY (`BookingId`)
) DEFAULT CHARSET=utf8;
INSERT INTO `orders` (`BookingId`, `ServiceDate`, `Customer_ID`) VALUES
('1', '2019-01-03T12:00:00', '1'),
('2', '2019-01-04T12:00:00', '2'),
('3', '2019-01-12T12:00:00', '2'),
('4', '2019-02-03T12:00:00', '1'),
('5', '2019-02-04T12:00:00', '2'),
('6', '2019-02-12T12:00:00', '3');
I was expecting something like this for all months:
month AVG
19-Jan 1.5
19-Feb 2
...
...
The dots are there only to show that there are many more months in my original dataset.
For January, there were 3 bookings and 2 Customer_IDs. Therefore the average number of bookings up until that month was 1.5. Up until February, there have been 6 bookings and 3 Customer_IDs. Therefore the new average is 2.

Join a subquery that returns the distinct months to the table and aggregate:
SELECT d.month,
COUNT(o.bookingid) / COUNT(DISTINCT o.customer_id) avg
FROM (
SELECT DISTINCT
EXTRACT(YEAR_MONTH FROM servicedate) yearmonth,
DATE_FORMAT(servicedate, '%y-%b') month
FROM orders
) d INNER JOIN orders o
ON EXTRACT(YEAR_MONTH FROM o.servicedate) <= d.yearmonth
GROUP BY d.yearmonth, d.month
See the demo.
Results:
| month | avg |
| ------ | --- |
| 19-Jan | 1.5 |
| 19-Feb | 2 |
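The logic of this join can be checked against the sample data without a MySQL server. The following is only a sketch using Python's built-in sqlite3 module, not the MySQL query itself: SQLite's strftime('%Y%m', ...) stands in for EXTRACT(YEAR_MONTH FROM ...), the function name running_average is made up for illustration, and the last sample date is assumed to be 2019-02-12.

```python
import sqlite3

# Sample rows from the question (last date assumed to be 2019-02-12).
ORDERS = [
    (1, "2019-01-03T12:00:00", "1"),
    (2, "2019-01-04T12:00:00", "2"),
    (3, "2019-01-12T12:00:00", "2"),
    (4, "2019-02-03T12:00:00", "1"),
    (5, "2019-02-04T12:00:00", "2"),
    (6, "2019-02-12T12:00:00", "3"),
]


def running_average():
    """Return (yearmonth, bookings per customer up to that month) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders ("
        "BookingId INTEGER PRIMARY KEY, "
        "ServiceDate TEXT NOT NULL, "
        "Customer_ID TEXT NOT NULL)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    rows = con.execute(
        """
        SELECT d.yearmonth,
               COUNT(o.BookingId) * 1.0 / COUNT(DISTINCT o.Customer_ID) AS avg
        FROM (
            -- one row per month that has at least one order
            SELECT DISTINCT strftime('%Y%m', ServiceDate) AS yearmonth
            FROM orders
        ) d
        -- every order placed in that month or earlier joins to the month
        JOIN orders o
          ON strftime('%Y%m', o.ServiceDate) <= d.yearmonth
        GROUP BY d.yearmonth
        ORDER BY d.yearmonth
        """
    ).fetchall()
    con.close()
    return rows


for yearmonth, avg in running_average():
    print(yearmonth, avg)
```

Each month row is joined to all orders up to and including that month, so the counts are cumulative: 3 bookings over 2 customers for January, 6 bookings over 3 customers for February.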

Related

How to get percentage of result set for each day?

I am trying to retrieve the percentage of available products at specific merchants over the last 30 days.
Desired result example:
20210504 merchant1 20%
20210504 merchant2 30%
20210505 merchant1 25%
20210505 merchant2 35%
There are 3 tables:
availability (containing availability info for each product and merchant and day)
products (where the manufacturer_id is, that we want to filter for)
merchants (merchant info)
Minimal example: https://www.db-fiddle.com/f/wtnK5R4DWi7Dy6LwLaP4mX/0
This returns the percentage for only one merchant and one day:
-- get percentage of available products per merchant over time
SELECT
m.name AS metric,
t.s AS AMOUNT_AVAILABLE,
count(*) AS AMOUNT_TOTAL,
t.s / count(*) AS percentage
FROM availability p
CROSS JOIN (
SELECT count(*) AS s FROM availability p2
INNER JOIN products mp on p2.SKU = mp.SKU
WHERE
availability = 'sofort lieferbar'
AND date = curdate() - interval 1 day -- testing for one day, but we want a time series
AND mp.MANUFACTURER_ID = 1
-- AND p2.merchant_id = p.merchant_id -- does not work
-- AND merchant_id = 2
-- GROUP BY merchant_id
) t
INNER JOIN products mp on p.SKU = mp.SKU
INNER JOIN merchants m ON m.id = p.MERCHANT_ID
WHERE
p.date = curdate() - interval 1 day
and mp.MANUFACTURER_ID = 1
-- and merchant_id = 2
GROUP BY
merchant_id
Now I am trying to somehow merge the cross join with the from table so I get the info for each merchant and day. How can a cross join be joined with the from table?
Data & Schema:
create table merchants
(
id tinyint unsigned not null
primary key,
name varchar(255) null
);
INSERT INTO merchants (id, name) VALUES (1, 'Amazon');
INSERT INTO merchants (id, name) VALUES (2, 'eBay');
create table availability
(
DATE date not null,
SKU char(10) not null,
merchant_id tinyint unsigned not null,
availability enum ('sofort lieferbar', 'verzögert lieferbar', 'nicht lieferbar', 'außer Handel') null,
constraint DATE
unique (DATE, SKU, merchant_id)
);
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-11', '1', 1, 'sofort lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-11', '1', 2, 'nicht lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-12', '1', 1, 'sofort lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-12', '1', 2, 'nicht lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-13', '1', 1, 'nicht lieferbar');
INSERT INTO test.availability (DATE, SKU, merchant_id, availability) VALUES ('2021-05-13', '1', 2, 'sofort lieferbar');
create table products
(
SKU char(8) not null
primary key,
NAME varchar(255) null,
MANUFACTURER_ID mediumint unsigned null,
updated datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP
);
INSERT INTO test.products (SKU, NAME, MANUFACTURER_ID, updated) VALUES ('1', 'Sneaker', 1, '2021-05-12 02:27:46');
INSERT INTO test.products (SKU, NAME, MANUFACTURER_ID, updated) VALUES ('2', 'Ball', 1, '2021-05-12 02:27:46');
INSERT INTO test.products (SKU, NAME, MANUFACTURER_ID, updated) VALUES ('3', 'Pen', 2, '2021-05-12 02:27:46');
INSERT INTO test.products (SKU, NAME, MANUFACTURER_ID, updated) VALUES ('4', 'Paper', 2, '2021-05-12 02:27:46');
I have written a query which seems to work for the data you have provided. Let me know if there's any issue and I'll see what I can do.
SELECT CONCAT('merchant', t.ID) as merchant,
t.Date,
g.prod_available / t.all_prod_from_merch AS percentage_available
# t: total number of products per merchant and day in the time range
FROM (SELECT ID,
Date,
COUNT(merchant_ID) AS all_prod_from_merch
FROM merchants m
JOIN availability a
ON m.ID = a.merchant_ID
WHERE Date < CURDATE()
AND Date >= curdate() - INTERVAL 10 DAY
GROUP BY merchant_ID,
Date ) t
LEFT JOIN (SELECT merchant_ID,
Date,
COUNT(merchant_ID) AS prod_available
FROM availability
WHERE AVAILABILITY = 'sofort lieferbar'
AND date IN (SELECT Date
FROM availability
WHERE date < CURDATE()
AND date >= CURDATE() - INTERVAL 10 DAY
GROUP BY Date )
GROUP BY merchant_ID,
Date ) g
ON g.merchant_ID = t.ID
AND g.Date = t.Date
ORDER BY t.date;
The first select in the join gets the total number of products in the time range for each merchant. The second one gets those available from each merchant. So the select at the beginning just does the fraction.
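For comparison, the same per-merchant, per-day fraction can also be computed in one pass with conditional aggregation. This is a different technique from the two-subquery join above, shown only as a sketch: it uses Python's built-in sqlite3 module instead of MySQL, leaves out the CURDATE() range filter and the manufacturer filter because the sample dates are fixed, and the function name availability_share is made up for illustration.

```python
import sqlite3

# Sample rows from the question's availability table.
AVAILABILITY = [
    ("2021-05-11", "1", 1, "sofort lieferbar"),
    ("2021-05-11", "1", 2, "nicht lieferbar"),
    ("2021-05-12", "1", 1, "sofort lieferbar"),
    ("2021-05-12", "1", 2, "nicht lieferbar"),
    ("2021-05-13", "1", 1, "nicht lieferbar"),
    ("2021-05-13", "1", 2, "sofort lieferbar"),
]


def availability_share():
    """Return (date, merchant_id, fraction of rows that are available)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE availability ("
        "date TEXT, sku TEXT, merchant_id INTEGER, availability TEXT)"
    )
    con.executemany("INSERT INTO availability VALUES (?, ?, ?, ?)", AVAILABILITY)
    rows = con.execute(
        """
        SELECT date,
               merchant_id,
               -- available rows divided by all rows of the same group
               SUM(CASE WHEN availability = 'sofort lieferbar' THEN 1 ELSE 0 END)
                   * 1.0 / COUNT(*) AS share
        FROM availability
        GROUP BY date, merchant_id
        ORDER BY date, merchant_id
        """
    ).fetchall()
    con.close()
    return rows


for row in availability_share():
    print(row)
```

Because numerator and denominator come from the same grouped rows, a merchant with no available products on a day gets 0 rather than a missing value.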

Finding the entry with the most occurrences per group

I have the following (simplified) Schema.
CREATE TABLE TEST_Appointment(
Appointment_id INT AUTO_INCREMENT PRIMARY KEY,
Property_No INT NOT NULL,
Property_Type varchar(10) NOT NULL
);
INSERT INTO TEST_Appointment(Property_No, Property_Type) VALUES
(1, 'House'),
(1, 'House'),
(1, 'House'),
(2, 'Flat'),
(2, 'Flat'),
(3, 'Flat'),
(4, 'House'),
(5, 'House'),
(6, 'Studio');
I am trying to write a query to get the properties that have the most appointments in each property type group. An example output would be:
Property_No | Property_Type | Number of Appointments
-----------------------------------------------------
1 | House | 3
2 | Flat | 2
6 | Studio | 1
I have the following query to get the number of appointments per property, but I am not sure how to go from there:
SELECT Property_No, Property_Type, COUNT(*)
from TEST_Appointment
GROUP BY Property_Type, Property_No;
If you are running MySQL 8.0, you can use aggregation and window functions:
select *
from (
select property_no, property_type, count(*) no_appointments,
rank() over(partition by property_type order by count(*) desc) rn
from test_appointment
group by property_no, property_type
) t
where rn = 1
In earlier versions, one option uses a having clause and a row-limiting correlated subquery:
select property_no, property_type, count(*) no_appointments
from test_appointment t
group by property_no, property_type
having count(*) = (
select count(*)
from test_appointment t1
where t1.property_type = t.property_type
group by t1.property_no
order by count(*) desc
limit 1
)
Note that both queries allow ties, if any.
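To see what "allow ties" means in practice, here is a small sketch of the ranking approach run with Python's built-in sqlite3 module (which supports window functions in reasonably recent versions) rather than MySQL 8.0. The extra appointment for property 3 is hypothetical and added only to create a tie in the Flat group; the function name top_properties is likewise made up.

```python
import sqlite3

# Rows from the question, plus one hypothetical extra appointment for
# property 3 so that properties 2 and 3 tie within the 'Flat' group.
APPOINTMENTS = [
    (1, "House"), (1, "House"), (1, "House"),
    (2, "Flat"), (2, "Flat"),
    (3, "Flat"),
    (4, "House"),
    (5, "House"),
    (6, "Studio"),
    (3, "Flat"),  # hypothetical row, not in the original data
]


def top_properties():
    """Return the most-booked property (or properties) per property type."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE test_appointment ("
        "appointment_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "property_no INTEGER NOT NULL, "
        "property_type TEXT NOT NULL)"
    )
    con.executemany(
        "INSERT INTO test_appointment (property_no, property_type) VALUES (?, ?)",
        APPOINTMENTS,
    )
    rows = con.execute(
        """
        SELECT property_no, property_type, no_appointments
        FROM (
            SELECT property_no, property_type, no_appointments,
                   RANK() OVER (PARTITION BY property_type
                                ORDER BY no_appointments DESC) AS rn
            FROM (
                -- appointments per property
                SELECT property_no, property_type, COUNT(*) AS no_appointments
                FROM test_appointment
                GROUP BY property_no, property_type
            ) AS counts
        ) AS ranked
        WHERE rn = 1
        ORDER BY property_type, property_no
        """
    ).fetchall()
    con.close()
    return rows


for row in top_properties():
    print(row)
```

With the extra row, both Flat properties are returned because RANK() gives tied rows the same rank. If exactly one row per group is required, a tie-breaking column would have to be added to the ordering.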

How can I pivot an average in mysql

I am trying to display all months where I have bookings, along with a running average of bookings per customer for each month, but I can't work out how this is achieved and I don't understand why my query is erroring.
I tried several approaches: one combines two queries, the other writes it all in one query.
The first query returns all months where we have orders:
SELECT date_format(Orders.ServiceDate, '%y-%b') from Orders
GROUP BY YEAR(Orders.ServiceDate), month(Orders.ServiceDate)
The second query calculates the average number of bookings per customer up until a given month:
(
SELECT AVG(cc.total) + 1 AS 'avg' FROM (
SELECT Orders.Customer_ID as 'c',
COUNT(BookingId) 'total' from Orders
where year(Orders.ServiceDate) <= '2019' and month(Orders.ServiceDate)
<= '01'
GROUP BY Orders.Customer_ID
) cc
)
The last query gives me a single number, which is the average number of bookings per customer up until January 2019, but I need the averages for all the months from the first query.
I need the year and month to be taken from the first query so that I get the average for each month, ending up with something like:
19-Jan 1.5
19-Feb 2
...
...
I tried joining them without luck, so I hope there is a kind soul who can help me further.
The second thing I tried was to do it without joining two queries, like this:
SELECT
date_format(z1.ServiceDate, '%y-%b') as months,
(
SELECT
AVG(cc.total) + 1 AS 'avg'
FROM
(
SELECT
z.Customer_ID,
COUNT(z.BookingId) 'total'
from
Orders z
where
YEAR(z.ServiceDate) <= YEAR(z1.months) AND
MONTH(z.ServiceDate) <= MONTH(z1.months)
GROUP BY
z.Customer_ID
) cc
)
from
Orders z1
GROUP BY
YEAR(z1.ServiceDate),
MONTH(z1.ServiceDate)
Here is my schema:
CREATE TABLE IF NOT EXISTS `orders` (
`BookingId` INT(6) NOT NULL,
`ServiceDate` DATETIME NOT NULL,
`Customer_ID` varchar(1) NOT NULL,
PRIMARY KEY (`BookingId`)
) DEFAULT CHARSET=utf8;
INSERT INTO `orders` (`BookingId`, `ServiceDate`, `Customer_ID`) VALUES
('1', '2019-01-03T12:00:00', '1'),
('2', '2019-01-04T12:00:00', '2'),
('3', '2019-01-12T12:00:00', '2'),
('4', '2019-02-03T12:00:00', '1'),
('5', '2019-02-04T12:00:00', '2'),
('6', '2019-02-12T12:00:00', '3');
I was expecting to see two averages, one where we only include up until Jan, and one where feb is included, but I keep getting the error:
"Unknown column 'z1.months' in 'where clause".
How can I make this query work?

Ordering within a MySQL group

I have two tables which are joined - one holds schedules and the other holds actual worked times.
This works fine if a given user only has a single schedule on a day but when they have more than one schedule I cannot get the query to match up the "right" slot to the right time.
I am beginning to think the only way to do this is to allocate the time to the schedule when the clock event happens but that is going to be a big rewrite so I am hoping there is a way in MySQL.
As this is inside a third party application, I am limited in what I can do to the query - I can modify the basics like from, group, joins etc and I can add aggregates to the fields (I have toyed with using min/max on the times). However, if the only way is to write a hugely complex query especially within the field selections then this system simply doesn't give me that option.
Schedule table:
CREATE TABLE `schedule` (
`id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`date` date NOT NULL,
`start_time` time NOT NULL,
`end_time` time NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `schedule`
--
INSERT INTO `schedule` (`id`, `user_id`, `date`, `start_time`, `end_time`) VALUES
(1, 1, '2019-07-07', '08:00:00', '12:00:00'),
(2, 1, '2019-07-07', '16:00:00', '22:00:00'),
(3, 1, '2019-07-06', '10:00:00', '18:00:00');
Time table
CREATE TABLE `time` (
`id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`date` date NOT NULL,
`start_time` time NOT NULL,
`end_time` time NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `time`
--
INSERT INTO `time` (`id`, `user_id`, `date`, `start_time`, `end_time`) VALUES
(1, 1, '2019-07-07', '08:00:00', '12:00:00'),
(2, 1, '2019-07-07', '16:00:00', '22:00:00'),
(3, 1, '2019-07-06', '10:00:00', '18:00:00');
Current query
select
t.date as date, t.user_id,
s.start_time as schedule_start,
s.end_time as schedule_end,
t.start_time as actual_start,
t.end_time as actual_end
from time t
left join schedule s on
t.user_id=s.user_id and t.date=s.date
group by t.date, t.start_time
Current output
== Dumping data for table s
|2019-07-06|1|10:00:00|18:00:00|10:00:00|18:00:00
|2019-07-07|1|08:00:00|12:00:00|08:00:00|12:00:00
|2019-07-07|1|08:00:00|12:00:00|16:00:00|22:00:00
Desired output
== Dumping data for table s
|2019-07-06|1|10:00:00|18:00:00|10:00:00|18:00:00
|2019-07-07|1|08:00:00|12:00:00|08:00:00|12:00:00
|2019-07-07|1|16:00:00|22:00:00|16:00:00|22:00:00
Is this possible to achieve?
I would try something like this.
I chose a 15-minute window around the actual start time within which the scheduled shift must begin:
select
t.date as date, t.user_id,
s.start_time as schedule_start,
s.end_time as schedule_end,
t.start_time as actual_start,
t.end_time as actual_end
from time t
left join schedule s on
t.user_id=s.user_id and t.date=s.date
and s.start_time BETWEEN t.start_time - INTERVAL 15 MINUTE
AND t.start_time + INTERVAL 15 MINUTE
order by date,schedule_start;
You would only need grouping if you wanted to add up the time for every day and user.
You need a much more complicated query to distinguish the 2 shifts.
So you must run 2 separate queries, one for each shift, and combine them with UNION:
select
s.date, s.user_id,
s.schedule_start,
s.schedule_end,
t.actual_start,
t.actual_end
from (
select s.date, s.user_id,
min(s.start_time) as schedule_start,
min(s.end_time) as schedule_end
from schedule s
group by s.date, s.user_id
) s left join (
select t.date, t.user_id,
min(t.start_time) as actual_start,
min(t.end_time) as actual_end
from time t
group by t.date, t.user_id
) t on t.user_id=s.user_id and t.date=s.date
union
select
s.date, s.user_id,
s.schedule_start,
s.schedule_end,
t.actual_start,
t.actual_end
from (
select s.date, s.user_id,
max(s.start_time) as schedule_start,
max(s.end_time) as schedule_end
from schedule s
group by s.date, s.user_id
) s left join (
select t.date, t.user_id,
max(t.start_time) as actual_start,
max(t.end_time) as actual_end
from time t
group by t.date, t.user_id
) t on t.user_id=s.user_id and t.date=s.date
See the demo.
Results:
> date | user_id | schedule_start | schedule_end | actual_start | actual_end
> :--------- | ------: | :------------- | :----------- | :----------- | :---------
> 2019-07-06 | 1 | 10:00:00 | 18:00:00 | 10:00:00 | 18:00:00
> 2019-07-07 | 1 | 08:00:00 | 12:00:00 | 08:00:00 | 12:00:00
> 2019-07-07 | 1 | 16:00:00 | 22:00:00 | 16:00:00 | 22:00:00
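The MIN/MAX approach above can be reproduced on the sample data with a short sketch. It uses Python's built-in sqlite3 module rather than MySQL, renames the time table to worked_time (a made-up name, chosen to sidestep any identifier clash), and the helper names half and shift_rows are also hypothetical. It additionally runs the query with UNION ALL to show why plain UNION is needed.

```python
import sqlite3

# The question's sample rows; schedule and time hold identical data.
ROWS = [
    (1, 1, "2019-07-07", "08:00:00", "12:00:00"),
    (2, 1, "2019-07-07", "16:00:00", "22:00:00"),
    (3, 1, "2019-07-06", "10:00:00", "18:00:00"),
]


def half(agg):
    """Build one half of the query: MIN picks the first shift, MAX the last."""
    return f"""
        SELECT s.date, s.user_id, s.schedule_start, s.schedule_end,
               t.actual_start, t.actual_end
        FROM (SELECT date, user_id,
                     {agg}(start_time) AS schedule_start,
                     {agg}(end_time) AS schedule_end
              FROM schedule GROUP BY date, user_id) s
        LEFT JOIN (SELECT date, user_id,
                          {agg}(start_time) AS actual_start,
                          {agg}(end_time) AS actual_end
                   FROM worked_time GROUP BY date, user_id) t
          ON t.user_id = s.user_id AND t.date = s.date
    """


def shift_rows(set_op="UNION"):
    """Combine the first-shift and last-shift halves with the given operator."""
    con = sqlite3.connect(":memory:")
    for table in ("schedule", "worked_time"):
        con.execute(
            f"CREATE TABLE {table} ("
            "id INTEGER, user_id INTEGER, date TEXT, "
            "start_time TEXT, end_time TEXT)"
        )
        con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", ROWS)
    # Order by date (column 1), then scheduled start (column 3).
    sql = half("MIN") + set_op + half("MAX") + " ORDER BY 1, 3"
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


for row in shift_rows("UNION"):
    print(row)
```

On a day with a single shift, both halves produce the same row, and UNION removes the duplicate; UNION ALL would list 2019-07-06 twice. Since only the minimum and maximum times are used, this approach can distinguish at most two shifts per user and day.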

INSERT SELECT ON DUPLICATE not updating

Short
1. I want to SUM a column in TABLE_A based on CRITERIA X and insert into TABLE_B.total_x
2. I want to SUM a column in TABLE_A based on CRITERIA Y and insert into TABLE_B.total_y
Problem: Step 2 does not update TABLE_B.total_y
LONG
TABLE_A: Data
| year | month | type | total |
---------------------------------------
| 2013 | 11 | down | 100 |
| 2013 | 11 | down | 50 |
| 2013 | 11 | up | 60 |
| 2013 | 10 | down | 200 |
| 2013 | 10 | up | 15 |
| 2013 | 10 | up | 9 |
TABLE_B: structure
CREATE TABLE `TABLE_B` (
`year` INT(4) NULL DEFAULT NULL,
`month` INT(2) UNSIGNED ZEROFILL NULL DEFAULT NULL,
`total_x` INT(10) NULL DEFAULT NULL,
`total_y` INT(10) NULL DEFAULT NULL,
UNIQUE INDEX `unique` (`year`, `month`)
)
SQL: CRITERIA_X
INSERT INTO TABLE_B (
`year`, `month`, `total_x`
)
SELECT
t.`year`, t.`month`,
SUM(t.`total`) as total_x
FROM TABLE_A t
WHERE
t.`type` = 'down'
GROUP BY
t.`year`, t.`month`
ON DUPLICATE KEY UPDATE
`total_x` = total_x
;
SQL: CRITERIA_Y
INSERT INTO TABLE_B (
`year`, `month`, `total_y`
)
SELECT
t.`year`, t.`month`,
SUM(t.`total`) as total_y
FROM TABLE_A t
WHERE
t.`type` = 'up'
GROUP BY
t.`year`, t.`month`
ON DUPLICATE KEY UPDATE
`total_y` = total_y
;
The second SQL (CRITERIA_Y) does not update total_y as expected. WHY?
I would do it another way:
insert into TABLE_B (year, month, total_x, total_y)
select year, month
, sum(case type when 'down' then total else 0 end) total_x
, sum(case type when 'up' then total else 0 end) total_y
from TABLE_A
group by year, month
Or, using two subqueries (note that MySQL does not support FULL OUTER JOIN, so this form only runs on a database that does):
insert into TABLE_B (year, month, total_x, total_y)
select coalesce(t1.year, t2.year) year
, coalesce(t1.month, t2.month) month
, t1.total_x total_x
, t2.total_y total_y
from (select year, month, sum(total) total_x
from TABLE_A where type='down'
group by year, month) t1
full outer join
(select year, month, sum(total) total_y
from TABLE_A where type='up'
group by year, month) t2
on t1.year = t2.year and t1.month = t2.month
Or, using union:
insert into TABLE_B (year, month, total_x, total_y)
select year, month, sum(total_x), sum(total_y)
from (
select year, month, sum(total) total_x, 0 total_y
from TABLE_A where type='down'
group by year, month
union
select year, month, 0 total_x, sum(total) total_y
from TABLE_A where type='up'
group by year, month) t
group by year, month
Reading specs on INSERT...ON DUPLICATE KEY UPDATE, I noticed this:
If ... matches several rows, only one row is updated. In general, you should try to avoid using an ON DUPLICATE KEY UPDATE clause on tables with multiple unique indexes.
So syntax with composite key is kind of cumbersome, and I personally would avoid using it.
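As a quick check of the first suggestion (conditional aggregation in a single INSERT ... SELECT), here is a small sketch run against the question's sample data. It uses Python's built-in sqlite3 module rather than MySQL, with lower-cased table names, and the function name fill_table_b is made up for illustration.

```python
import sqlite3

# TABLE_A sample rows from the question: (year, month, type, total).
TABLE_A = [
    (2013, 11, "down", 100),
    (2013, 11, "down", 50),
    (2013, 11, "up", 60),
    (2013, 10, "down", 200),
    (2013, 10, "up", 15),
    (2013, 10, "up", 9),
]


def fill_table_b():
    """Fill total_x and total_y in one statement and return table_b's rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_a (year INTEGER, month INTEGER, type TEXT, total INTEGER)"
    )
    con.execute(
        "CREATE TABLE table_b ("
        "year INTEGER, month INTEGER, total_x INTEGER, total_y INTEGER, "
        "UNIQUE (year, month))"
    )
    con.executemany("INSERT INTO table_a VALUES (?, ?, ?, ?)", TABLE_A)
    con.execute(
        """
        INSERT INTO table_b (year, month, total_x, total_y)
        SELECT year, month,
               -- each SUM only counts the rows of its own type
               SUM(CASE type WHEN 'down' THEN total ELSE 0 END),
               SUM(CASE type WHEN 'up' THEN total ELSE 0 END)
        FROM table_a
        GROUP BY year, month
        """
    )
    rows = con.execute(
        "SELECT year, month, total_x, total_y FROM table_b ORDER BY year, month"
    ).fetchall()
    con.close()
    return rows


for row in fill_table_b():
    print(row)
```

Because both totals are written in the same statement, each (year, month) pair is inserted exactly once and no second pass or duplicate-key handling is needed.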