MySQL: Average temperature over the last 6 months

I want to calculate the average temperature over the last 6 months.
For now, I have something like this:
SELECT AVG(`temp`) FROM `temperature` WHERE YEAR(date) = 2020 AND MONTH(date) = 1 AND `sensor_id` = "00000b858c95"
This returns the average temperature for the selected month. Is this correct?
My table:
CREATE TABLE `temperature` (
`id` int(11) NOT NULL,
`sensor_id` varchar(32) NOT NULL,
`temp` varchar(32) NOT NULL,
`date` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
--
-- Dumping data for table `temperature`
--
INSERT INTO `temperature` (`id`, `sensor_id`, `temp`, `date`) VALUES
(1, '00000b845b2b', '38.3', '2019-12-05 20:42:06'),
(2, '00000b858c95', '-1.3', '2019-12-05 20:42:06'),
(3, '00000a035951', '24.7', '2019-12-05 20:42:06'),
(4, '00000b845b2b', '38.4', '2019-12-05 20:43:06'),
(5, '00000b858c95', '-1.2', '2019-12-05 20:43:06'),
(6, '00000a035951', '24.7', '2019-12-05 20:43:06'),
(7, '00000b845b2b', '38.4', '2019-12-05 20:44:06'),
(8, '00000b858c95', '-1.2', '2019-12-05 20:44:06'),
(9, '00000a035951', '24.7', '2019-12-05 20:44:06'),
(10, '00000b845b2b', '38.4', '2019-12-05 20:45:05'),
(11, '00000b858c95', '-1.2', '2019-12-05 20:45:05'),
(12, '00000a035951', '24.7', '2019-12-05 20:45:05'),
(13, '00000b845b2b', '38.5', '2019-12-05 20:46:06'),
(14, '00000b858c95', '-1.3', '2019-12-05 20:46:06'),
(15, '00000a035951', '24.7', '2019-12-05 20:46:06'),
(16, '00000b845b2b', '38.6', '2019-12-05 20:47:06'),
(17, '00000b858c95', '-1.3', '2019-12-05 20:47:06'),
(18, '00000a035951', '24.8', '2019-12-05 20:47:06'),
(19, '00000b845b2b', '38.7', '2019-12-05 20:48:06'),
(20, '00000b858c95', '-1.3', '2019-12-05 20:48:06'),
(21, '00000a035951', '24.9', '2019-12-05 20:48:06'),
(22, '00000b845b2b', '39.1', '2019-12-05 21:00:05'),
(23, '00000b858c95', '-1.4', '2019-12-05 21:00:05'),
(24, '00000a035951', '25.9', '2019-12-05 21:00:05'),
(25, '00000b845b2b', '37.9', '2019-12-05 22:00:06'),
(26, '00000b858c95', '-1.4', '2019-12-05 22:00:06'),
....
I want it to return 6 results for the last 6 months, one for each month.

I want to calculate the average temperature over the last 6 months.
You seem to want simple date arithmetic:
SELECT AVG(temp) FROM temperature WHERE date >= current_date - interval 6 month
For today, 2020-01-18, this would select records from 2019-07-18 onward.
Or, if you want the current month and the 6 months before it (i.e., for today, starting on 2019-07-01):
WHERE date >= date_format(current_date, '%Y-%m-01') - interval 6 month
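To make the two cutoffs concrete, here is a small Python sketch (not MySQL itself) that mimics `date - INTERVAL n MONTH`. The helper name `minus_months` is hypothetical, and the clamping of the day to the end of a shorter month is included on the assumption that it should behave the way MySQL's month arithmetic does.

```python
import calendar
from datetime import date


def minus_months(d: date, months: int) -> date:
    """Subtract whole months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


today = date(2020, 1, 18)  # the "today" used in the answer

# Rolling window: current_date - interval 6 month
rolling_cutoff = minus_months(today, 6)

# Month-aligned window: first day of the current month - interval 6 month
month_cutoff = minus_months(today.replace(day=1), 6)

print(rolling_cutoff)  # 2019-07-18
print(month_cutoff)    # 2019-07-01
```

The rolling cutoff starts mid-month, so the oldest month would only be partially covered; the month-aligned cutoff avoids that when grouping by month.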
If you want one record per month over the last 6 months, then you need to group by month, like so:
SELECT DATE_FORMAT(date, '%Y-%m-01') date_month, AVG(temp) avg_temp
FROM temperature
WHERE date >= date_format(current_date, '%Y-%m-01') - interval 6 month
GROUP BY DATE_FORMAT(date, '%Y-%m-01')
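As a rough, runnable check of the per-month grouping, the sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL. Several things here are assumptions, not part of the original answer: SQLite's `strftime` replaces `DATE_FORMAT`, the cutoff is passed in as a fixed string instead of being derived from `current_date`, a `sensor_id` filter is added because the question asks about one sensor, and the sample readings are made up for illustration. The explicit `CAST` is there because `temp` is stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature (id INTEGER, sensor_id TEXT, temp TEXT, date TEXT)"
)

# Hypothetical readings, chosen so the averages are easy to verify by hand.
rows = [
    (1, "00000b858c95", "-1.3", "2019-06-30 23:59:00"),  # before the cutoff
    (2, "00000b858c95", "10.0", "2019-12-05 20:42:06"),
    (3, "00000b858c95", "12.0", "2019-12-05 20:43:06"),
    (4, "00000b858c95", "-2.0", "2020-01-10 08:00:00"),
    (5, "00000b845b2b", "38.3", "2019-12-05 20:42:06"),  # a different sensor
]
conn.executemany("INSERT INTO temperature VALUES (?, ?, ?, ?)", rows)

# First day of the month six months before January 2020 (assumed "today").
cutoff = "2019-07-01 00:00:00"

monthly = conn.execute(
    """
    SELECT strftime('%Y-%m-01', date) AS date_month,
           AVG(CAST(temp AS REAL))    AS avg_temp
    FROM temperature
    WHERE date >= ? AND sensor_id = ?
    GROUP BY date_month
    ORDER BY date_month
    """,
    (cutoff, "00000b858c95"),
).fetchall()

print(monthly)  # [('2019-12-01', 11.0), ('2020-01-01', -2.0)]
```

Months without any readings simply do not appear in the result, so fewer rows than months are returned when data is missing.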

Related

How do I build a query to get the latest row per user where a third criteria is in a separate table?

I have three tables
CREATE TABLE `LineItems` (
`LineItemID` int NOT NULL,
`OrderID` int NOT NULL,
`ProductID` int NOT NULL
);
INSERT INTO `LineItems` (`LineItemID`, `OrderID`, `ProductID`) VALUES
(1, 1, 2),
(2, 1, 1),
(3, 2, 3),
(4, 2, 4),
(5, 3, 1),
(6, 4, 2),
(7, 5, 4),
(8, 5, 2),
(9, 5, 3),
(10, 6, 1),
(11, 6, 4),
(12, 7, 4),
(13, 7, 1),
(14, 7, 2),
(15, 8, 1),
(16, 9, 3),
(17, 9, 4),
(18, 10, 3);
CREATE TABLE `Orders` (
`OrderID` int NOT NULL,
`UserID` int NOT NULL,
`OrderDate` datetime NOT NULL
);
INSERT INTO `Orders` (`OrderID`, `UserID`, `OrderDate`) VALUES
(1, 21, '2021-05-01 00:00:00'),
(2, 21, '2021-05-03 00:00:00'),
(3, 24, '2021-05-06 00:00:00'),
(4, 23, '2021-05-12 00:00:00'),
(5, 21, '2021-05-14 00:00:00'),
(6, 22, '2021-05-16 00:00:00'),
(7, 23, '2021-05-20 00:00:00'),
(8, 21, '2021-05-22 00:00:00'),
(9, 24, '2021-05-23 00:00:00'),
(10, 23, '2021-05-26 00:00:00');
CREATE TABLE `Products` (
`ProductID` int NOT NULL,
`ProductTitle` VARCHAR(250) NOT NULL,
`ProductType` enum('doors','windows','flooring') NOT NULL
);
INSERT INTO `Products` (`ProductID`, `ProductTitle`, `ProductType`) VALUES
(1, 'French Doors','doors'),
(2, 'Sash Windows','windows'),
(3, 'Sliding Doors','doors'),
(4, 'Parquet Floor','flooring');
SQL Fiddle:
Orders - contains an order date and a user id
LineItems - Foreign key to the orders table, contains product ids that are in the order
Products - Contains details of the products (including if they are a door, window, or flooring)
I have figured out how to get the latest order per user with
SELECT O.* FROM Orders O LEFT JOIN Orders O2
ON O2.UserID=O.UserID AND O.OrderDate < O2.OrderDate
WHERE O2.OrderDate IS NULL;
This works fine and is included in the SQL fiddle, along with a query that returns a complete picture for reference.
I am trying to figure out how to get the latest order with flooring per user, but I'm not having any luck.
In the SQL fiddle linked above, the intended output for what I am after would be
OrderID | UserID | OrderDate
6 | 22 | 2021-05-16T00:00:00Z
5 | 21 | 2021-05-14T00:00:00Z
9 | 24 | 2021-05-23T00:00:00Z
7 | 23 | 2021-05-20T00:00:00Z
EDIT: To clarify, in the intended result, two rows (for users 21 and 23) are different than in the query that gets just latest order per user. This is because order IDs 8 and 10 (from the latest order per user query) do not include flooring. The intended query has to find the latest order with flooring from each user to return in the result set.
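You need to add the LineItems and Products tables to your query to find orders where flooring was purchased. The same restriction has to apply to the self-joined side as well; otherwise a later order without flooring (such as orders 8 and 10) hides the user's latest flooring order:
SELECT DISTINCT O.*
FROM Orders O
INNER JOIN LineItems i
ON i.OrderID = O.OrderID
INNER JOIN Products p
ON p.ProductID = i.ProductID AND
p.ProductType = 'flooring'
LEFT JOIN (
SELECT O2.UserID, O2.OrderDate
FROM Orders O2
INNER JOIN LineItems i2
ON i2.OrderID = O2.OrderID
INNER JOIN Products p2
ON p2.ProductID = i2.ProductID
WHERE p2.ProductType = 'flooring'
) F
ON F.UserID = O.UserID AND
O.OrderDate < F.OrderDate
WHERE F.OrderDate IS NULL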
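One way to check the intended result is to first reduce the orders to those containing flooring and only then apply the "no later order for the same user" anti-join. The sketch below runs that idea on the question's sample data using Python's `sqlite3` as a stand-in for MySQL. The CTE name `FlooringOrders` is hypothetical, and the `WITH` syntax assumes an engine that supports common table expressions (for MySQL that means 8.0 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INT, UserID INT, OrderDate TEXT);
CREATE TABLE LineItems (LineItemID INT, OrderID INT, ProductID INT);
CREATE TABLE Products (ProductID INT, ProductTitle TEXT, ProductType TEXT);
INSERT INTO Orders VALUES
 (1,21,'2021-05-01 00:00:00'),(2,21,'2021-05-03 00:00:00'),
 (3,24,'2021-05-06 00:00:00'),(4,23,'2021-05-12 00:00:00'),
 (5,21,'2021-05-14 00:00:00'),(6,22,'2021-05-16 00:00:00'),
 (7,23,'2021-05-20 00:00:00'),(8,21,'2021-05-22 00:00:00'),
 (9,24,'2021-05-23 00:00:00'),(10,23,'2021-05-26 00:00:00');
INSERT INTO LineItems VALUES
 (1,1,2),(2,1,1),(3,2,3),(4,2,4),(5,3,1),(6,4,2),(7,5,4),(8,5,2),(9,5,3),
 (10,6,1),(11,6,4),(12,7,4),(13,7,1),(14,7,2),(15,8,1),(16,9,3),(17,9,4),
 (18,10,3);
INSERT INTO Products VALUES
 (1,'French Doors','doors'),(2,'Sash Windows','windows'),
 (3,'Sliding Doors','doors'),(4,'Parquet Floor','flooring');
""")

query = """
WITH FlooringOrders AS (
    -- Orders that contain at least one flooring product
    SELECT DISTINCT O.OrderID, O.UserID, O.OrderDate
    FROM Orders O
    JOIN LineItems i ON i.OrderID = O.OrderID
    JOIN Products p ON p.ProductID = i.ProductID
    WHERE p.ProductType = 'flooring'
)
SELECT F.OrderID, F.UserID, F.OrderDate
FROM FlooringOrders F
LEFT JOIN FlooringOrders F2
       ON F2.UserID = F.UserID AND F.OrderDate < F2.OrderDate
WHERE F2.OrderID IS NULL   -- no later flooring order for this user
ORDER BY F.UserID
"""
latest_flooring = conn.execute(query).fetchall()
for row in latest_flooring:
    print(row)
```

On the sample data this yields orders 5, 6, 7 and 9, matching the four rows in the intended output.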

Find users with activities in all the last 6 months

I'm looking for the best way to retrieve the list of user IDs with activities in each of the last 6 months.
Table structure and data, simplified, is the following:
CREATE TABLE activities (
id int,
client_id int,
created_at timestamp
);
insert into activities values
(1, 1, '2019-06-01 00:00:00'),
(2, 2, '2019-06-01 00:00:00'),
(3, 1, '2019-07-01 00:00:00'),
(4, 1, '2019-08-01 00:00:00'),
(5, 1, '2019-09-01 00:00:00'),
(6, 1, '2019-10-01 00:00:00'),
(7, 1, '2019-11-01 00:00:00'),
(8, 2, '2019-11-01 00:00:00'),
(9, 3, '2019-11-01 00:00:00');
I need to retrieve the list of users that have at least one activity in each of the last 6 months. In the example above, that is just client_id 1.
I thought of doing a join, but it seems too expensive. I won't suggest possible solutions, so as not to steer you; I'm open to whatever you have in mind.
Please consider that I have to handle a really big data source (more than 50 million rows).
Any quick idea?
I make no claims for the supremacy of this solution, partly because I find such requests disingenuous, but it should work, at least...
SELECT a.client_id
FROM activities a
WHERE a.created_at >= LAST_DAY(CURDATE() - INTERVAL 7 MONTH)+INTERVAL 1 DAY
GROUP
BY a.client_id
HAVING COUNT(DISTINCT(DATE_FORMAT(a.created_at,'%Y-%m'))) >= 6;
+-----------+
| client_id |
+-----------+
| 1 |
+-----------+
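For a reproducible check that does not depend on the current date, the sketch below pins the window to exactly six calendar months (2019-06 through 2019-11) and requires activity in all six. This is an assumption about what "the last 6 months" means; the variable names are hypothetical, and Python's `sqlite3` with `strftime` stands in for MySQL and `DATE_FORMAT`. The range filter on the raw `created_at` column is kept free of function calls so that an index on that column could be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (id INT, client_id INT, created_at TEXT)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?)",
    [
        (1, 1, "2019-06-01 00:00:00"),
        (2, 2, "2019-06-01 00:00:00"),
        (3, 1, "2019-07-01 00:00:00"),
        (4, 1, "2019-08-01 00:00:00"),
        (5, 1, "2019-09-01 00:00:00"),
        (6, 1, "2019-10-01 00:00:00"),
        (7, 1, "2019-11-01 00:00:00"),
        (8, 2, "2019-11-01 00:00:00"),
        (9, 3, "2019-11-01 00:00:00"),
    ],
)

# Assumed window: six full calendar months, start inclusive, end exclusive.
window_start = "2019-06-01 00:00:00"
window_end = "2019-12-01 00:00:00"

clients = [
    row[0]
    for row in conn.execute(
        """
        SELECT client_id
        FROM activities
        WHERE created_at >= ? AND created_at < ?
        GROUP BY client_id
        HAVING COUNT(DISTINCT strftime('%Y-%m', created_at)) = 6
        ORDER BY client_id
        """,
        (window_start, window_end),
    )
]

print(clients)  # [1]
```

Because the window spans exactly six months, a distinct-month count of 6 means every month is covered. A wider window combined with `>= 6` would also accept a client who skipped one month.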

Select rows grouped by a column having max aggregate

Given the following data set, how would I find the email addresses that were references for the most ApplicationIDs that have an "Accepted" decision?
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10#test.org'), (1, 'ref11#test.org'), (1, 'ref12#test.org'),
(2, 'ref20#test.org'), (2, 'ref21#test.org'), (2, 'ref22#test.org'),
(3, 'ref11#test.org'), (3, 'ref31#test.org'), (3, 'ref32#test.org'),
(4, 'ref40#test.org'), (4, 'ref41#test.org'), (4, 'ref42#test.org'),
(5, 'ref50#test.org'), (5, 'ref51#test.org'), (5, 'ref52#test.org'),
(6, 'ref60#test.org'), (6, 'ref11#test.org'), (6, 'ref62#test.org'),
(7, 'ref70#test.org'), (7, 'ref71#test.org'), (7, 'ref72#test.org'),
(8, 'ref10#test.org'), (8, 'ref81#test.org'), (8, 'ref82#test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
Fiddle of same: http://sqlfiddle.com/#!9/03bcf2/1
Initially, I was using LIMIT 1 and ORDER BY CountDecision DESC, like so:
SELECT er.email, COUNT(fd.Decision) AS CountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
ORDER BY CountDecision DESC
LIMIT 1
;
However, it occurred to me that I could have multiple email addresses that referred different "most accepted" decisions (i.e., a tie, so to speak), and those would be filtered out (is that the right phrasing?) with the LIMIT keyword.
I then tried a variation on the above query, replacing the ORDER BY and LIMIT lines with:
HAVING MAX(CountDecision)
But I realized that that's only half a statement: MAX(CountDecision) needs to be compared to something. I just don't know what.
Any pointers would be much appreciated. Thanks!
Note: this is for a homework assignment.
Update: To be clear, I'm trying to find the value and count of Emails from EmailReferences. However, I only want rows that have FinalDecision.Decision = 'Accepted' (on matching ApplicationIDs). Based on my data, the result should be:
Email | CountDecision
---------------+--------------
ref10#test.org | 2
ref11#test.org | 2
For example...
SELECT a.*
FROM
( SELECT x.email
, COUNT(*) total
FROM emailreferences x
JOIN finaldecision y
ON y.applicationid = x.applicationid
WHERE y.decision = 'accepted'
GROUP
BY x.email
) a
JOIN
( SELECT COUNT(*) total
FROM emailreferences x
JOIN finaldecision y
ON y.applicationid = x.applicationid
WHERE y.decision = 'accepted'
GROUP
BY x.email
ORDER
BY total DESC
LIMIT 1
) b
ON b.total = a.total;
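The question's unfinished `HAVING MAX(CountDecision)` idea can also be completed directly: compare each group's count with the maximum count taken from a subquery over the same grouping. The sketch below is one possible formulation, run on the question's data with Python's `sqlite3` as a stand-in for MySQL; the aliases `counts` and `total` are hypothetical, and the email values are copied exactly as they appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmailReferences (
  ApplicationID INT NOT NULL, Email VARCHAR(45) NOT NULL,
  PRIMARY KEY (ApplicationID, Email));
CREATE TABLE FinalDecision (
  ApplicationID INT NOT NULL PRIMARY KEY, Decision TEXT NOT NULL);
INSERT INTO EmailReferences (ApplicationID, Email) VALUES
 (1,'ref10#test.org'),(1,'ref11#test.org'),(1,'ref12#test.org'),
 (2,'ref20#test.org'),(2,'ref21#test.org'),(2,'ref22#test.org'),
 (3,'ref11#test.org'),(3,'ref31#test.org'),(3,'ref32#test.org'),
 (4,'ref40#test.org'),(4,'ref41#test.org'),(4,'ref42#test.org'),
 (5,'ref50#test.org'),(5,'ref51#test.org'),(5,'ref52#test.org'),
 (6,'ref60#test.org'),(6,'ref11#test.org'),(6,'ref62#test.org'),
 (7,'ref70#test.org'),(7,'ref71#test.org'),(7,'ref72#test.org'),
 (8,'ref10#test.org'),(8,'ref81#test.org'),(8,'ref82#test.org');
INSERT INTO FinalDecision (ApplicationID, Decision) VALUES
 (1,'Accepted'),(2,'Denied'),(3,'Accepted'),(4,'Denied'),
 (5,'Denied'),(6,'Denied'),(7,'Denied'),(8,'Accepted');
""")

query = """
SELECT er.Email, COUNT(*) AS CountDecision
FROM EmailReferences er
JOIN FinalDecision fd ON fd.ApplicationID = er.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.Email
HAVING COUNT(*) = (
    -- Highest per-email count of accepted applications
    SELECT MAX(total)
    FROM (
        SELECT COUNT(*) AS total
        FROM EmailReferences er2
        JOIN FinalDecision fd2 ON fd2.ApplicationID = er2.ApplicationID
        WHERE fd2.Decision = 'Accepted'
        GROUP BY er2.Email
    ) AS counts
)
ORDER BY er.Email
"""
tied = conn.execute(query).fetchall()
print(tied)  # [('ref10#test.org', 2), ('ref11#test.org', 2)]
```

Unlike `ORDER BY ... LIMIT 1`, this keeps every email that shares the top count, so ties are not dropped.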
MySQL versions before 8.0 lack window functions; with 8.0 and later this becomes easier. So for future reference, or for databases such as MariaDB that already have window functions:
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10#test.org'), (1, 'ref11#test.org'), (1, 'ref12#test.org'),
(2, 'ref20#test.org'), (2, 'ref21#test.org'), (2, 'ref22#test.org'),
(3, 'ref30#test.org'), (3, 'ref31#test.org'), (3, 'ref32#test.org'),
(4, 'ref40#test.org'), (4, 'ref41#test.org'), (4, 'ref42#test.org'),
(5, 'ref50#test.org'), (5, 'ref51#test.org'), (5, 'ref52#test.org'),
(6, 'ref60#test.org'), (6, 'ref11#test.org'), (6, 'ref62#test.org'),
(7, 'ref70#test.org'), (7, 'ref71#test.org'), (7, 'ref72#test.org'),
(8, 'ref10#test.org'), (8, 'ref81#test.org'), (8, 'ref82#test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
select email, CountDecision
from (
SELECT er.email, COUNT(fd.Decision) AS CountDecision
, max(COUNT(fd.Decision)) over() maxCountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
) d
where CountDecision = maxCountDecision
email | CountDecision
:------------- | ------------:
ref10#test.org | 2

Count number of rows in each day grouped by another field

For the purpose of drawing an activity chart, how can we count number of rows for each type (distinct field value) in each day?
Consider a table with a date field and a field for each type:
CREATE TABLE TableName
(`PK` int, `type` varchar(1), `timestamp` datetime)
;
INSERT INTO TableName
(`PK`, `type`, `timestamp`)
VALUES
(11, 'Q', '2013-01-04 22:23:56'),
(7, 'A', '2013-01-03 22:23:41'),
(8, 'C', '2013-01-04 22:23:42'),
(10, 'Q', '2013-01-05 22:23:56'),
(5, 'C', '2013-01-03 22:23:25'),
(12, 'Q', '2013-01-05 22:23:57'),
(6, 'Q', '2013-01-07 22:23:40'),
(4, 'Q', '2013-01-02 22:23:23'),
(9, 'A', '2013-01-05 22:23:55'),
(1, 'A', '2013-01-08 21:29:38'),
(2, 'Q', '2013-01-02 21:31:59'),
(3, 'C', '2013-01-04 21:32:22')
;
For example output can be (last field is the count of rows with that type and in that day):
'Q', 2013-01-04, 1
'C', 2013-01-04, 2
'A', 2013-01-03, 1
'C', 2013-01-03, 1
and so on...
You just need a group by.
select `type`, date(`timestamp`), count(*)
from TableName
group by `type`, date(`timestamp`)
The same query with column aliases:
select `type`, date(`timestamp`) as the_date, count(*) as counter
from TableName
group by `type`, date(`timestamp`)
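To see what the grouped query returns for the sample rows, the sketch below runs it with Python's `sqlite3` as a stand-in for MySQL (SQLite's `date()` also truncates a datetime string to its day). The `ORDER BY` clause is an addition, included only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (PK INT, type TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO TableName (PK, type, timestamp) VALUES (?, ?, ?)",
    [
        (11, "Q", "2013-01-04 22:23:56"),
        (7, "A", "2013-01-03 22:23:41"),
        (8, "C", "2013-01-04 22:23:42"),
        (10, "Q", "2013-01-05 22:23:56"),
        (5, "C", "2013-01-03 22:23:25"),
        (12, "Q", "2013-01-05 22:23:57"),
        (6, "Q", "2013-01-07 22:23:40"),
        (4, "Q", "2013-01-02 22:23:23"),
        (9, "A", "2013-01-05 22:23:55"),
        (1, "A", "2013-01-08 21:29:38"),
        (2, "Q", "2013-01-02 21:31:59"),
        (3, "C", "2013-01-04 21:32:22"),
    ],
)

# One row per (type, day) pair that actually occurs in the data.
counts = conn.execute(
    """
    SELECT type, date(timestamp) AS the_date, COUNT(*) AS counter
    FROM TableName
    GROUP BY type, date(timestamp)
    ORDER BY the_date, type
    """
).fetchall()

for row in counts:
    print(row)
```

Days on which a type has no rows are absent from the result, so a chart that needs explicit zeros has to fill those gaps separately.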

SQL getting shifts outside of availability

I'm trying to put together an SQL query to get employee shifts that are outside of their availability for a scheduling app. Each availability entry is contiguous, and the same employee never has back-to-back or overlapping availability entries.
Basically, I need to get the shift rows where (availabilities.start <= shifts.start AND availabilities.end >= shifts.end) does NOT hold true. Phrased another way, I need to get the rows from the shifts table that are not fully contained by an availability entry.
It needs to account for these possibilities:
Shifts that start before availability
Shifts that end after availability
Shifts that do not have any availability during the shift
I'm ok with using a stored procedure instead of a query if this would be more efficient.
Here's what the tables look like:
CREATE TABLE availabilities (`id` int AUTO_INCREMENT PRIMARY KEY, `employee_id` int, `start` datetime, `end` datetime);
CREATE TABLE shifts (`id` int AUTO_INCREMENT PRIMARY KEY, `employee_id` int, `start` datetime, `end` datetime);
Here is some sample data:
INSERT INTO availabilities
(`employee_id`, `start`, `end`)
VALUES
(1, '2015-01-01 08:00:00', '2015-01-01 09:00:00'),
(1, '2015-01-02 08:00:00', '2015-01-02 10:00:00'),
(2, '2015-01-03 08:00:00', '2015-01-03 14:00:00'),
(2, '2015-01-04 08:00:00', '2015-01-04 18:00:00')
;
INSERT INTO shifts
(`employee_id`, `start`, `end`)
VALUES
(1, '2015-01-01 08:00:00', '2015-01-01 09:00:00'),
(1, '2015-01-02 08:30:00', '2015-01-02 10:00:00'),
(1, '2015-01-02 10:30:00', '2015-01-02 12:00:00'),
(2, '2015-01-03 08:00:00', '2015-01-03 09:00:00'),
(2, '2015-01-03 09:00:00', '2015-01-03 14:30:00'),
(2, '2015-01-04 09:30:00', '2015-01-04 17:30:00'),
(2, '2015-01-05 08:00:00', '2015-01-05 10:00:00')
;
I would expect the 3rd, 5th and 7th shifts to be output as they are outside of availability.
I've tried something like the following (as well as many others) however all of them either give false positives or leave out shifts.
SELECT s.* FROM `shifts` AS `s`
LEFT JOIN `availabilities` AS `a` ON `s`.`employee_id` = `a`.`employee_id`
WHERE (NOT(a.start <= s.start AND a.end >= s.end) OR a.id IS NULL);
Does this help?
select *
from shifts as s
where not exists (
select 1
from availabilities as a
where a.start <= s.start AND a.end >= s.end and a.employee_id = s.employee_id
)
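The difference between the two approaches shows up when both are run on the sample data. With the LEFT JOIN, a shift is paired with every availability row of the employee, so any non-containing row makes it look uncovered; NOT EXISTS asks whether a containing row exists at all. The sketch below uses Python's `sqlite3` as a stand-in for MySQL, with `INTEGER PRIMARY KEY` assumed to assign ids 1 to 7 in insertion order; the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE availabilities (
  id INTEGER PRIMARY KEY, employee_id INT, `start` TEXT, `end` TEXT);
CREATE TABLE shifts (
  id INTEGER PRIMARY KEY, employee_id INT, `start` TEXT, `end` TEXT);
INSERT INTO availabilities (employee_id, `start`, `end`) VALUES
 (1, '2015-01-01 08:00:00', '2015-01-01 09:00:00'),
 (1, '2015-01-02 08:00:00', '2015-01-02 10:00:00'),
 (2, '2015-01-03 08:00:00', '2015-01-03 14:00:00'),
 (2, '2015-01-04 08:00:00', '2015-01-04 18:00:00');
INSERT INTO shifts (employee_id, `start`, `end`) VALUES
 (1, '2015-01-01 08:00:00', '2015-01-01 09:00:00'),
 (1, '2015-01-02 08:30:00', '2015-01-02 10:00:00'),
 (1, '2015-01-02 10:30:00', '2015-01-02 12:00:00'),
 (2, '2015-01-03 08:00:00', '2015-01-03 09:00:00'),
 (2, '2015-01-03 09:00:00', '2015-01-03 14:30:00'),
 (2, '2015-01-04 09:30:00', '2015-01-04 17:30:00'),
 (2, '2015-01-05 08:00:00', '2015-01-05 10:00:00');
""")

# Shifts for which no single availability entry fully contains the shift.
outside = [row[0] for row in conn.execute("""
    SELECT s.id
    FROM shifts AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM availabilities AS a
        WHERE a.employee_id = s.employee_id
          AND a.`start` <= s.`start`
          AND a.`end` >= s.`end`
    )
    ORDER BY s.id
""")]

# The LEFT JOIN attempt from the question, for comparison.
naive = [row[0] for row in conn.execute("""
    SELECT DISTINCT s.id
    FROM shifts AS s
    LEFT JOIN availabilities AS a ON s.employee_id = a.employee_id
    WHERE (NOT (a.`start` <= s.`start` AND a.`end` >= s.`end`)
           OR a.id IS NULL)
    ORDER BY s.id
""")]

print(outside)  # [3, 5, 7]
print(naive)    # [1, 2, 3, 4, 5, 6, 7]
```

On this data the LEFT JOIN version flags every shift, because each employee has at least one availability entry on a different day, while NOT EXISTS returns only the 3rd, 5th and 7th shifts.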