Find users with activities in all the last 6 months - mysql

I'm looking for the best way to retrieve the list of user IDs that have activity in every one of the last 6 months.
The simplified table structure and data are as follows:
CREATE TABLE activities (
id int,
client_id int,
created_at timestamp
);
insert into activities values
(1, 1, '2019-06-01 00:00:00'),
(2, 2, '2019-06-01 00:00:00'),
(3, 1, '2019-07-01 00:00:00'),
(4, 1, '2019-08-01 00:00:00'),
(5, 1, '2019-09-01 00:00:00'),
(6, 1, '2019-10-01 00:00:00'),
(7, 1, '2019-11-01 00:00:00'),
(8, 2, '2019-11-01 00:00:00'),
(9, 3, '2019-11-01 00:00:00');
I need to retrieve the list of users that have at least one activity in each of the last 6 months. In the example above, that is just client_id 1.
I thought about doing a join, but it seems too expensive. I won't suggest possible solutions myself, so as not to bias the answers; I'm open to whatever you have in mind.
Please consider that I have to manage a really big data source (more than 50 million rows).
Any quick ideas?

I make no claims for the supremacy of this solution, partly because I find such requests disingenuous, but it should work, at least...
CREATE TABLE activities (
id int,
client_id int,
created_at timestamp
);
insert into activities values
(1, 1, '2019-06-01 00:00:00'),
(2, 2, '2019-06-01 00:00:00'),
(3, 1, '2019-07-01 00:00:00'),
(4, 1, '2019-08-01 00:00:00'),
(5, 1, '2019-09-01 00:00:00'),
(6, 1, '2019-10-01 00:00:00'),
(7, 1, '2019-11-01 00:00:00'),
(8, 2, '2019-11-01 00:00:00'),
(9, 3, '2019-11-01 00:00:00');
SELECT a.client_id
FROM activities a
WHERE a.created_at >= LAST_DAY(CURDATE() - INTERVAL 7 MONTH)+INTERVAL 1 DAY
GROUP
BY a.client_id
HAVING COUNT(DISTINCT(DATE_FORMAT(a.created_at,'%Y-%m'))) >= 6;
+-----------+
| client_id |
+-----------+
| 1 |
+-----------+
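The query above depends on CURDATE(), so its result changes with the day it is run. As a rough, reproducible check, here is a sketch in Python using an in-memory SQLite database. It is an illustration under stated assumptions, not the answerer's method: the function names and the `today` parameter are hypothetical, the SQL uses SQLite's date() and strftime() rather than MySQL's LAST_DAY() and DATE_FORMAT(), and "last 6 months" is taken to mean the month containing `today` plus the 5 months before it.

```python
import sqlite3

def sample_activities():
    """Load the question's sample rows into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activities (id INT, client_id INT, created_at TEXT)")
    conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", [
        (1, 1, "2019-06-01 00:00:00"), (2, 2, "2019-06-01 00:00:00"),
        (3, 1, "2019-07-01 00:00:00"), (4, 1, "2019-08-01 00:00:00"),
        (5, 1, "2019-09-01 00:00:00"), (6, 1, "2019-10-01 00:00:00"),
        (7, 1, "2019-11-01 00:00:00"), (8, 2, "2019-11-01 00:00:00"),
        (9, 3, "2019-11-01 00:00:00"),
    ])
    return conn

def clients_active_every_month(conn, today, months=6):
    """Clients with activity in each of the last `months` calendar months,
    counting the month that contains `today` (an assumed definition)."""
    sql = """
        SELECT client_id
        FROM activities
        WHERE created_at >= date(:today, 'start of month', :back)
          AND created_at <  date(:today, 'start of month', '+1 month')
        GROUP BY client_id
        HAVING COUNT(DISTINCT strftime('%Y-%m', created_at)) = :months
        ORDER BY client_id
    """
    params = {"today": today, "back": f"-{months - 1} months", "months": months}
    return [row[0] for row in conn.execute(sql, params)]

if __name__ == "__main__":
    print(clients_active_every_month(sample_activities(), "2019-11-15"))
```

Passing the reference date in as a parameter keeps the window fixed, which makes the query easier to test against known data.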

Related

How do I build a query to get the latest row per user where a third criteria is in a separate table?

I have three tables
CREATE TABLE `LineItems` (
`LineItemID` int NOT NULL,
`OrderID` int NOT NULL,
`ProductID` int NOT NULL
);
INSERT INTO `LineItems` (`LineItemID`, `OrderID`, `ProductID`) VALUES
(1, 1, 2),
(2, 1, 1),
(3, 2, 3),
(4, 2, 4),
(5, 3, 1),
(6, 4, 2),
(7, 5, 4),
(8, 5, 2),
(9, 5, 3),
(10, 6, 1),
(11, 6, 4),
(12, 7, 4),
(13, 7, 1),
(14, 7, 2),
(15, 8, 1),
(16, 9, 3),
(17, 9, 4),
(18, 10, 3);
CREATE TABLE `Orders` (
`OrderID` int NOT NULL,
`UserID` int NOT NULL,
`OrderDate` datetime NOT NULL
);
INSERT INTO `Orders` (`OrderID`, `UserID`, `OrderDate`) VALUES
(1, 21, '2021-05-01 00:00:00'),
(2, 21, '2021-05-03 00:00:00'),
(3, 24, '2021-05-06 00:00:00'),
(4, 23, '2021-05-12 00:00:00'),
(5, 21, '2021-05-14 00:00:00'),
(6, 22, '2021-05-16 00:00:00'),
(7, 23, '2021-05-20 00:00:00'),
(8, 21, '2021-05-22 00:00:00'),
(9, 24, '2021-05-23 00:00:00'),
(10, 23, '2021-05-26 00:00:00');
CREATE TABLE `Products` (
`ProductID` int NOT NULL,
`ProductTitle` VARCHAR(250) NOT NULL,
`ProductType` enum('doors','windows','flooring') NOT NULL
);
INSERT INTO `Products` (`ProductID`, `ProductTitle`, `ProductType`) VALUES
(1, 'French Doors','doors'),
(2, 'Sash Windows','windows'),
(3, 'Sliding Doors','doors'),
(4, 'Parquet Floor','flooring');
SQL Fiddle:
Orders - contains an order date and a user id
LineItems - Foreign key to the orders table, contains product ids that are in the order
Products - Contains details of the products (including if they are a door, window, or flooring)
I have figured out how to get the latest order per user with
SELECT O.* FROM Orders O LEFT JOIN Orders O2
ON O2.UserID=O.UserID AND O.OrderDate < O2.OrderDate
WHERE O2.OrderDate IS NULL;
This works fine and is included in the SQL fiddle, along with a query that returns a complete picture for reference.
I am trying to figure out how to get the latest order with flooring per user, but I'm not having any luck.
In the SQL fiddle linked above, the intended output for what I am after would be
OrderID | UserID | OrderDate
6 | 22 | 2021-05-16T00:00:00Z
5 | 21 | 2021-05-14T00:00:00Z
9 | 24 | 2021-05-23T00:00:00Z
7 | 23 | 2021-05-20T00:00:00Z
EDIT: To clarify, in the intended result, two rows (for users 21 and 23) are different than in the query that gets just latest order per user. This is because order IDs 8 and 10 (from the latest order per user query) do not include flooring. The intended query has to find the latest order with flooring from each user to return in the result set.
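To find the latest order that includes flooring, join LineItems and Products, and apply the flooring filter to the "later order" check as well. Filtering only the outer query would drop users whose most recent order has no flooring (users 21 and 23 here):
SELECT DISTINCT O.*
FROM Orders O
INNER JOIN LineItems i
ON i.OrderID = O.OrderID
INNER JOIN Products p
ON p.ProductID = i.ProductID
WHERE p.ProductType = 'flooring' AND
NOT EXISTS (
-- a later order by the same user that also contains flooring
SELECT 1
FROM Orders O2
INNER JOIN LineItems i2
ON i2.OrderID = O2.OrderID
INNER JOIN Products p2
ON p2.ProductID = i2.ProductID
WHERE O2.UserID = O.UserID AND
O2.OrderDate > O.OrderDate AND
p2.ProductType = 'flooring'
)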
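One way to check a candidate query against the intended output in the question is to load the sample data into a throwaway database and compare. The sketch below does that with Python and an in-memory SQLite database. The helper names are hypothetical, the enum column is stored as plain text, and the query shown is one possible formulation in which both the outer query and the "later order" check are limited to orders containing flooring.

```python
import sqlite3

LATEST_FLOORING_ORDER = """
    SELECT DISTINCT o.OrderID, o.UserID, o.OrderDate
    FROM Orders o
    JOIN LineItems i ON i.OrderID = o.OrderID
    JOIN Products p ON p.ProductID = i.ProductID
    WHERE p.ProductType = 'flooring'
      AND NOT EXISTS (          -- no later flooring order by the same user
          SELECT 1
          FROM Orders o2
          JOIN LineItems i2 ON i2.OrderID = o2.OrderID
          JOIN Products p2 ON p2.ProductID = i2.ProductID
          WHERE o2.UserID = o.UserID
            AND o2.OrderDate > o.OrderDate
            AND p2.ProductType = 'flooring'
      )
    ORDER BY o.UserID
"""

def build_sample_db():
    """Recreate the question's three tables with its sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE LineItems (LineItemID INT, OrderID INT, ProductID INT);
        CREATE TABLE Orders (OrderID INT, UserID INT, OrderDate TEXT);
        CREATE TABLE Products (ProductID INT, ProductTitle TEXT, ProductType TEXT);
    """)
    items = [(1, 2), (1, 1), (2, 3), (2, 4), (3, 1), (4, 2), (5, 4), (5, 2), (5, 3),
             (6, 1), (6, 4), (7, 4), (7, 1), (7, 2), (8, 1), (9, 3), (9, 4), (10, 3)]
    conn.executemany("INSERT INTO LineItems VALUES (?, ?, ?)",
                     [(n, o, p) for n, (o, p) in enumerate(items, start=1)])
    orders = [(1, 21, 1), (2, 21, 3), (3, 24, 6), (4, 23, 12), (5, 21, 14),
              (6, 22, 16), (7, 23, 20), (8, 21, 22), (9, 24, 23), (10, 23, 26)]
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                     [(o, u, f"2021-05-{d:02d} 00:00:00") for o, u, d in orders])
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
        (1, "French Doors", "doors"), (2, "Sash Windows", "windows"),
        (3, "Sliding Doors", "doors"), (4, "Parquet Floor", "flooring")])
    return conn

def latest_flooring_orders(conn):
    return conn.execute(LATEST_FLOORING_ORDER).fetchall()

if __name__ == "__main__":
    for row in latest_flooring_orders(build_sample_db()):
        print(row)
```

On this data the query returns orders 5, 6, 7 and 9 for users 21, 22, 23 and 24, which matches the intended output listed in the question.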

MYSQL Average temperature over the last 6 months

I want to calculate the average temperature over the last 6 months.
For now, I have something like this:
SELECT AVG(`temp`) FROM `temperature` WHERE YEAR(date) = 2020 AND MONTH(date) = 1 AND `sensor_id` = "00000b858c95"
Returns the average temperature for me in the selected month ... Is this correct?
My table:
CREATE TABLE `temperature` (
`id` int(11) NOT NULL,
`sensor_id` varchar(32) NOT NULL,
`temp` varchar(32) NOT NULL,
`date` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
--
-- Dumping data for table `temperature`
--
INSERT INTO `temperature` (`id`, `sensor_id`, `temp`, `date`) VALUES
(1, '00000b845b2b', '38.3', '2019-12-05 20:42:06'),
(2, '00000b858c95', '-1.3', '2019-12-05 20:42:06'),
(3, '00000a035951', '24.7', '2019-12-05 20:42:06'),
(4, '00000b845b2b', '38.4', '2019-12-05 20:43:06'),
(5, '00000b858c95', '-1.2', '2019-12-05 20:43:06'),
(6, '00000a035951', '24.7', '2019-12-05 20:43:06'),
(7, '00000b845b2b', '38.4', '2019-12-05 20:44:06'),
(8, '00000b858c95', '-1.2', '2019-12-05 20:44:06'),
(9, '00000a035951', '24.7', '2019-12-05 20:44:06'),
(10, '00000b845b2b', '38.4', '2019-12-05 20:45:05'),
(11, '00000b858c95', '-1.2', '2019-12-05 20:45:05'),
(12, '00000a035951', '24.7', '2019-12-05 20:45:05'),
(13, '00000b845b2b', '38.5', '2019-12-05 20:46:06'),
(14, '00000b858c95', '-1.3', '2019-12-05 20:46:06'),
(15, '00000a035951', '24.7', '2019-12-05 20:46:06'),
(16, '00000b845b2b', '38.6', '2019-12-05 20:47:06'),
(17, '00000b858c95', '-1.3', '2019-12-05 20:47:06'),
(18, '00000a035951', '24.8', '2019-12-05 20:47:06'),
(19, '00000b845b2b', '38.7', '2019-12-05 20:48:06'),
(20, '00000b858c95', '-1.3', '2019-12-05 20:48:06'),
(21, '00000a035951', '24.9', '2019-12-05 20:48:06'),
(22, '00000b845b2b', '39.1', '2019-12-05 21:00:05'),
(23, '00000b858c95', '-1.4', '2019-12-05 21:00:05'),
(24, '00000a035951', '25.9', '2019-12-05 21:00:05'),
(25, '00000b845b2b', '37.9', '2019-12-05 22:00:06'),
(26, '00000b858c95', '-1.4', '2019-12-05 22:00:06'),
....
I want it to return 6 results for the last 6 months, one for each month.
I want to calculate the average temperature over the last 6 months.
You seem to want simple date arithmetic:
SELECT AVG(temp) FROM temperature WHERE date >= current_date - interval 6 month
For today, 2020-01-18, this would select records from 2019-07-18 onward.
Or, if you want the current month plus the 6 months before it (i.e., for today, starting on 2019-07-01):
WHERE date >= date_format(current_date, '%Y-%m-01') - interval 6 month
If you want one record per month over the last 6 months, then you need to group by month, like so:
SELECT DATE_FORMAT(date, '%Y-%m-01') date_month, AVG(temp) avg_temp
FROM temperature
WHERE date >= date_format(current_date, '%Y-%m-01') - interval 6 month
GROUP BY DATE_FORMAT(date, '%Y-%m-01')
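The grouped query can be tried out on a small data set. The sketch below uses Python with an in-memory SQLite database, so strftime() and date() stand in for MySQL's DATE_FORMAT() and interval arithmetic. The function names, the sensor id "s1" and the sample rows are made up for illustration. Because temp is stored as text in the question's table, the sketch casts it explicitly before averaging.

```python
import sqlite3

def build_sample_db():
    """Tiny hypothetical data set; the rows are not from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE temperature '
                 '(id INTEGER, sensor_id TEXT, "temp" TEXT, "date" TEXT)')
    conn.executemany("INSERT INTO temperature VALUES (?, ?, ?, ?)", [
        (1, "s1", "10.0", "2019-05-01 00:00:00"),  # older than the window
        (2, "s1", "-1.0", "2019-12-05 20:42:06"),
        (3, "s1", "-2.0", "2019-12-05 20:43:06"),
        (4, "s2", "38.3", "2019-12-05 20:42:06"),  # different sensor
        (5, "s1", "3.0", "2020-01-10 08:00:00"),
    ])
    return conn

def monthly_averages(conn, sensor_id, today, months_back=6):
    """One (month_start, average) row per month for one sensor, starting
    `months_back` months before the first day of the current month."""
    sql = """
        SELECT strftime('%Y-%m-01', "date") AS date_month,
               AVG(CAST("temp" AS REAL))    AS avg_temp
        FROM temperature
        WHERE sensor_id = :sensor
          AND "date" >= date(:today, 'start of month', :back)
        GROUP BY date_month
        ORDER BY date_month
    """
    params = {"sensor": sensor_id, "today": today, "back": f"-{months_back} months"}
    return conn.execute(sql, params).fetchall()

if __name__ == "__main__":
    for month, avg in monthly_averages(build_sample_db(), "s1", "2020-01-18"):
        print(month, avg)
```

Months with no readings simply produce no row, so fewer than six rows can come back if the sensor was silent for part of the period.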

mysql - how to delete records with condition

Here is my table structure:
CREATE TABLE `order`
(`order_id` int, `order_status_id` int, `ip` varchar(11), `date_added` datetime)
;
INSERT INTO `order`
(`order_id`, `order_status_id`, `ip`, `date_added`)
VALUES
(1, 0, '192.168.1.1', '2016-12-07 00:00:00'),
(2, 0, '192.168.1.1', '2016-12-07 00:00:00'),
(3, 0, '192.168.1.1', '2016-12-07 00:00:00'),
(4, 0, '192.168.1.1', '2016-12-07 00:00:00'),
(5, 1, '192.168.1.1', '2016-12-07 00:00:00'),
(6, 0, '192.168.1.2', '2016-12-08 00:00:00'),
(7, 0, '192.168.1.2', '2016-12-08 00:00:00'),
(8, 0, '192.168.1.2', '2016-12-08 00:00:00'),
(9, 0, '192.168.1.2', '2016-12-08 00:00:00'),
(10, 1, '192.168.1.2', '2016-12-08 00:00:00'),
(11, 0, '192.168.1.3', '2016-12-09 00:00:00'),
(12, 0, '192.168.1.3', '2016-12-09 00:00:00'),
(13, 0, '192.168.1.3', '2016-12-09 00:00:00'),
(14, 0, '192.168.1.3', '2016-12-09 00:00:00'),
(15, 0, '192.168.1.3', '2016-12-09 00:00:00');
http://sqlfiddle.com/#!9/20c0f
I expect the SQL to delete all records except those with order_id 5, 10 and 15.
Explanation: I would like to delete records that have order_status_id = 0, under these conditions:
a) If any record from the same ip/date_added group has order_status_id = 1, delete all records in that group that have order_status_id = 0. In my example, records 1-4 and 6-9 should be deleted.
b) If no record from the same ip/date_added group has order_status_id = 1 (all records have order_status_id = 0), keep only the record with the highest order_id and delete the rest. In my example, records 11-14 should be deleted.
First you need to separate your conditions. Each {ip, date_added} group has a maximum order_status_id of either 1 or 0:
-- first filter
SELECT `ip`, `date_added`
FROM `order`
GROUP BY `ip`, `date_added`
HAVING MAX(`order_status_id`) = 1;
-- second filter
SELECT `ip`, `date_added`
FROM `order`
GROUP BY `ip`, `date_added`
HAVING MAX(`order_status_id`) = 0;
First delete
Delete the rows in every group matching the first filter, but only those with order_status_id = 0, which leaves the rows with order_status_id = 1.
DELETE o
FROM `order` o
INNER JOIN (
SELECT `ip`, `date_added`
FROM `order`
GROUP BY `ip`, `date_added`
HAVING MAX(`order_status_id`) = 1
) filter
ON o.`ip` = filter.`ip`
AND o.`date_added` = filter.`date_added`
WHERE o.`order_status_id` = 0;
Second delete
Delete the rows in every group matching the second filter, except the row that has no higher order_id within its own group.
DELETE o1
FROM `order` o1
INNER JOIN (
SELECT `ip`, `date_added`
FROM `order`
GROUP BY `ip`, `date_added`
HAVING MAX(`order_status_id`) = 0
) filter
ON o1.`ip` = filter.`ip`
AND o1.`date_added` = filter.`date_added`
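LEFT JOIN `order` o2
-- compare only against rows of the same ip/date_added group
ON o1.`order_id` < o2.`order_id`
AND o1.`ip` = o2.`ip`
AND o1.`date_added` = o2.`date_added`
WHERE o2.`ip` IS NOT NULL;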
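The two rules can also be folded into a single condition: a row with order_status_id = 0 is removed when its ip/date_added group contains either a row with status 1 or a row with a higher order_id. The sketch below checks that rule against the sample data using Python and an in-memory SQLite database. It is an illustration only; the names are hypothetical, and the correlated-subquery form is used because SQLite has no DELETE ... JOIN. MySQL generally rejects a subquery that reads the table being deleted from, which is one reason to use joins on a derived table there.

```python
import sqlite3

CLEANUP = """
    DELETE FROM "order"
    WHERE order_status_id = 0
      AND EXISTS (
          SELECT 1
          FROM "order" AS o2
          WHERE o2.ip = "order".ip
            AND o2.date_added = "order".date_added
            -- rule a: the group has a status-1 row; rule b: a higher id exists
            AND (o2.order_status_id = 1 OR o2.order_id > "order".order_id)
      )
"""

def build_sample_db():
    """Recreate the question's 15 rows: three ip/date groups of five orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "order" '
                 "(order_id INT, order_status_id INT, ip TEXT, date_added TEXT)")
    rows = []
    for order_id in range(1, 16):
        group = (order_id - 1) // 5                  # 0, 1 or 2
        status = 1 if order_id in (5, 10) else 0     # as in the sample data
        rows.append((order_id, status, f"192.168.1.{group + 1}",
                     f"2016-12-0{7 + group} 00:00:00"))
    conn.executemany('INSERT INTO "order" VALUES (?, ?, ?, ?)', rows)
    return conn

def remaining_order_ids(conn):
    conn.execute(CLEANUP)
    return [r[0] for r in conn.execute('SELECT order_id FROM "order" ORDER BY order_id')]

if __name__ == "__main__":
    print(remaining_order_ids(build_sample_db()))
```

Only orders 5, 10 and 15 survive on the sample data, which is the result the question asks for.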

SQL getting shifts outside of availability

I'm trying to put together an sql query to get employee shifts that are outside of their availability for a scheduling app. Availability entries will be contiguous and will never have availability entries that are back-to-back for the same employee, nor will there be availability entries that overlap for the same employee.
Basically, I need to get the shift rows where (availabilities.start <= shifts.start AND availabilities.end >= shifts.end) does NOT hold true. Phrased another way, I need to get the rows from the shifts table that are not fully contained by an availability entry.
It needs to account for these possibilities:
Shifts that start before availability
Shifts that end after availability
Shifts that do not have any availability during the shift
I'm ok with using a stored procedure instead of a query if this would be more efficient.
Here's what the tables look like:
CREATE TABLE availabilities (`id` int primary key auto_increment, `employee_id` int, `start` datetime, `end` datetime);
CREATE TABLE shifts (`id` int primary key auto_increment, `employee_id` int, `start` datetime, `end` datetime);
Here is some sample data:
INSERT INTO availabilities
(`employee_id`, `start`, `end`)
VALUES
(1, '2015-01-01 08:00:00', '2015-01-01 09:00:00'),
(1, '2015-01-02 08:00:00', '2015-01-02 10:00:00'),
(2, '2015-01-03 08:00:00', '2015-01-03 14:00:00'),
(2, '2015-01-04 08:00:00', '2015-01-04 18:00:00')
;
INSERT INTO shifts
(`employee_id`, `start`, `end`)
VALUES
(1, '2015-01-01 08:00:00', '2015-01-01 09:00:00'),
(1, '2015-01-02 08:30:00', '2015-01-02 10:00:00'),
(1, '2015-01-02 10:30:00', '2015-01-02 12:00:00'),
(2, '2015-01-03 08:00:00', '2015-01-03 09:00:00'),
(2, '2015-01-03 09:00:00', '2015-01-03 14:30:00'),
(2, '2015-01-04 09:30:00', '2015-01-04 17:30:00'),
(2, '2015-01-05 08:00:00', '2015-01-05 10:00:00')
;
I would expect the 3rd, 5th and 7th shifts to be output as they are outside of availability.
I've tried something like the following (as well as many others) however all of them either give false positives or leave out shifts.
SELECT s.* FROM `shifts` AS `s`
LEFT JOIN `availabilities` AS `a` ON `s`.`employee_id` = `a`.`employee_id`
WHERE (NOT(a.start <= s.start AND a.end >= s.end) OR a.id IS NULL);
Does this help?
select *
from shifts as s
where not exists (
select 1
from availabilities as a
where a.start <= s.start AND a.end >= s.end and a.employee_id = s.employee_id
)
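The NOT EXISTS form keeps a shift only when no single availability entry of the same employee fully contains it, which covers all three cases listed in the question. The sketch below checks this against the question's sample data with Python and an in-memory SQLite database. The helper names are hypothetical, and ids are assumed to be assigned 1 to n in insertion order.

```python
import sqlite3

UNCOVERED_SHIFTS = """
    SELECT s.id
    FROM shifts AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM availabilities AS a
        WHERE a.employee_id = s.employee_id
          AND a."start" <= s."start"
          AND a."end"   >= s."end"
    )
    ORDER BY s.id
"""

def build_sample_db():
    """Load the question's availabilities and shifts; ids auto-assign 1..n."""
    conn = sqlite3.connect(":memory:")
    for table in ("availabilities", "shifts"):
        conn.execute(f'CREATE TABLE {table} (id INTEGER PRIMARY KEY, '
                     'employee_id INT, "start" TEXT, "end" TEXT)')
    conn.executemany(
        'INSERT INTO availabilities (employee_id, "start", "end") VALUES (?, ?, ?)', [
            (1, "2015-01-01 08:00:00", "2015-01-01 09:00:00"),
            (1, "2015-01-02 08:00:00", "2015-01-02 10:00:00"),
            (2, "2015-01-03 08:00:00", "2015-01-03 14:00:00"),
            (2, "2015-01-04 08:00:00", "2015-01-04 18:00:00"),
        ])
    conn.executemany(
        'INSERT INTO shifts (employee_id, "start", "end") VALUES (?, ?, ?)', [
            (1, "2015-01-01 08:00:00", "2015-01-01 09:00:00"),
            (1, "2015-01-02 08:30:00", "2015-01-02 10:00:00"),
            (1, "2015-01-02 10:30:00", "2015-01-02 12:00:00"),
            (2, "2015-01-03 08:00:00", "2015-01-03 09:00:00"),
            (2, "2015-01-03 09:00:00", "2015-01-03 14:30:00"),
            (2, "2015-01-04 09:30:00", "2015-01-04 17:30:00"),
            (2, "2015-01-05 08:00:00", "2015-01-05 10:00:00"),
        ])
    return conn

def uncovered_shift_ids(conn):
    return [row[0] for row in conn.execute(UNCOVERED_SHIFTS)]

if __name__ == "__main__":
    print(uncovered_shift_ids(build_sample_db()))
```

The LEFT JOIN attempt in the question gives false positives because a shift is compared with every availability row of the employee, and any row that does not contain it produces a match; NOT EXISTS asks instead whether at least one containing row exists.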

How to group employees by ranges of hours?

I have this table to save the time the employees spend doing a routine task.
CREATE TABLE tasks (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(100),
date_task date,
time_ini time,
time_end time
);
I'm trying to group the employees who have at least two time_ini's with a difference < 15 minutes with any employee's time_ini or time_end.
If none of their time_inis meet this condition, then this employee would be grouped alone.
The groups will be numbered from 1 to n.
And then the groups will be ordered by date ascending, and time_ini ascending.
This is an example of data:
(1, "oscar", '2012-01-01', '01:30', '01:32'),
(2, "oscar", '2012-01-01', '02:30', '02:32'),
(3, "oscar", '2012-01-01', '05:30', '05:32'),
(4, "oscar", '2012-01-01', '06:30', '06:32'),
(5, "mario", '2012-01-01', '02:43', '02:43'),
(6, "mario", '2012-01-01', '02:53', '02:53'),
(7, "mario", '2012-01-01', '05:30', '05:30'),
(8, "martah", '2012-01-01', '01:25', '01:28'),
(9, "martah", '2012-01-01', '02:29', '02:41'),
(10, "jesus", '2012-01-01', '01:25', '01:28'),
(11, "jesus", '2012-01-01', '01:25', '02:28'),
(12, "jesus", '2012-01-01', '07:33', '08:32'),
(13, "jesus", '2012-01-01', '07:35', '07:36'),
(14, "jesus", '2012-01-01', '08:36', '08:39'),
(15, "rober", '2012-01-01', '02:43', '02:46'),
(16, "rober", '2012-01-01', '02:56', '03:00'),
(17, "rober", '2012-01-01', '02:29', '11:32'),
(18, "pedro", '2012-01-01', '11:36', '12:46'),
(19, "pedro", '2012-01-01', '12:36', '16:46');
This would be the result:
GROUP NAME
1 oscar
1 martah
1 jesus
2 mario
2 rober
3 pedro
I came up with something like this:
select distinct a.name
from tasks a
where
(select count(id)
from tasks b
where (
MINUTE(TIMEDIFF(a.time_ini, b.time_ini)) < 15 OR
MINUTE(TIMEDIFF(a.time_end, b.time_ini)) < 15
) and
b.name <> a.name) >= 2;
I'm afraid I can't group them this way, but I don't think I'm too far from the solution, am I?
Any idea, tip or advice will be appreciated, and if you need more info, let me know and I'll edit the post. It's a little bit hard to explain...
You can try this (although it's not in the format you need, it should do the job):
SELECT
a.id as groupId,
a.name as first,
b.name as second,
COUNT(*) as occ
FROM
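tasks a, tasks b
WHERE
b.name <> a.name
AND a.id > b.id
AND (
-- compare the whole difference in seconds; MINUTE() only reads the minutes field
ABS(TIME_TO_SEC(TIMEDIFF(a.time_ini, b.time_ini))) < 15 * 60 OR
ABS(TIME_TO_SEC(TIMEDIFF(a.time_end, b.time_ini))) < 15 * 60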
)
GROUP BY
groupId,
first,
second
BTW, jesus should be in the group with oscar and martah due to records 10 and 11.
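One pitfall with a "difference under 15 minutes" test is that MINUTE() returns only the minutes field of a time value, so a gap of 1 hour and 2 minutes would read as 2. The Python sketch below illustrates the difference using plain arithmetic. The function names are hypothetical, and minute_component() is only an approximation of what MINUTE(TIMEDIFF(...)) yields, not a call into MySQL.

```python
from datetime import datetime

def diff_seconds(t1, t2):
    """Absolute difference between two 'HH:MM' times, in seconds."""
    fmt = "%H:%M"
    delta = datetime.strptime(t1, fmt) - datetime.strptime(t2, fmt)
    return abs(int(delta.total_seconds()))

def minute_component(t1, t2):
    """Minutes field of the difference only (hours are ignored)."""
    return (diff_seconds(t1, t2) % 3600) // 60

def within_minutes(t1, t2, limit=15):
    """True when the full difference is under `limit` minutes."""
    return diff_seconds(t1, t2) < limit * 60

if __name__ == "__main__":
    # oscar's 01:30 start vs. his 02:32 end: 62 minutes apart
    print(minute_component("01:30", "02:32"))   # minutes field alone
    print(within_minutes("01:30", "02:32"))     # full-difference test
    # mario's 02:43 and 02:53 starts: 10 minutes apart
    print(within_minutes("02:43", "02:53"))
```

Comparing total seconds (or total minutes) avoids treating times that are hours apart as near matches.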