I'm trying to solve this challenge: https://www.hackerrank.com/challenges/sql-projects/problem.
I tried the following:
SELECT
(SELECT start_date
FROM projects
WHERE
(SELECT DATE_ADD(start_date, INTERVAL -1 DAY)) NOT IN (SELECT start_date FROM projects)
ORDER BY start_date ASC) AS start_date,
(SELECT end_date
FROM projects
WHERE
(SELECT DATE_ADD(end_date, INTERVAL 1 DAY)) NOT IN (SELECT end_date FROM projects)
ORDER BY end_date ASC) AS end_date
FROM projects p
ORDER BY DATEDIFF(end_date, start_date) ASC, start_date ASC
Nonetheless, I got the following error: 'Subquery returns more than 1 row', despite using the NOT IN operator.
However, when I tried executing only this part of the code:
SELECT start_date
FROM projects p
WHERE (SELECT DATE_ADD(start_date, INTERVAL -1 DAY)) NOT IN (SELECT start_date FROM projects)
ORDER BY start_date ASC
It worked fine.
What could be the problem?
The two subqueries for start_date and end_date each return several rows (possibly a different number of rows each), and the database engine does not allow a so-called "parallel query" that lines them up side by side: a subquery in the SELECT list must return at most one row.
In this case you should first collect all the dates involved and then LEFT JOIN the subqueries to them:
select t1.start_date, t2.end_date
from (
SELECT start_date
FROM projects
WHERE DATE_ADD(start_date, INTERVAL -1 DAY) NOT IN (SELECT start_date FROM projects)
UNION
SELECT end_date
FROM projects
WHERE DATE_ADD(end_date, INTERVAL 1 DAY) NOT IN (SELECT end_date FROM projects)
) t
left join (
SELECT start_date
FROM projects
WHERE DATE_ADD(start_date, INTERVAL -1 DAY) NOT IN (SELECT start_date FROM projects)
) t1 on t.start_date = t1.start_date
left join (
SELECT end_date
FROM projects
WHERE DATE_ADD(end_date, INTERVAL 1 DAY) NOT IN (SELECT end_date FROM projects)
) t2 on t.start_date = t2.end_date
order by t.start_date
You select project rows. Per project row you select a start date. The query for this start date looks like this:
(SELECT start_date ... ORDER BY start_date ASC)
Do you really think it is one start_date you are selecting here? Why then the ORDER BY clause? This subquery returns multiple rows and this is why you are getting the error.
This query does not select one start date, but all start dates for which the previous date does not exist in the table. It isn't even related to the project row in the main query.
It seems you want to find all start dates that have no predecessor and all end dates that have no follower. These are two data sets you can select from. So the subqueries don't belong in the SELECT clause where you say which columns to select, but in the FROM clause where you say from which data sets to select.
You would then have to join the two sets. The join criteria would be the rows' positions in the ordered data sets (first start date belongs to first end date, second start date belongs to second end date, ...). For this you need a way to number these data rows.
Such a task is easy to solve with ROW_NUMBER, which is only available since MySQL 8.0.
SELECT s.start_date, e.end_date
FROM
(
SELECT start_date, ROW_NUMBER() OVER (ORDER BY start_date) AS rn
FROM projects
WHERE start_date - INTERVAL 1 DAY NOT IN (SELECT start_date FROM projects)
) s
JOIN
(
SELECT end_date, ROW_NUMBER() OVER (ORDER BY end_date) AS rn
FROM projects
WHERE end_date + INTERVAL 1 DAY NOT IN (SELECT end_date FROM projects)
) e USING (rn)
ORDER BY DATEDIFF(e.end_date, s.start_date), s.start_date;
This kind of problem is called gaps & islands. There are other ways to solve it, but I think the above query builds plainly on yours and is thus easy to understand.
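To experiment with the ROW_NUMBER query outside HackerRank, the projects data can be supplied inline. This is a minimal sketch for MySQL 8.0, not a tested submission: the six tasks are made up and, as in the challenge, each task is assumed to last exactly one day (end_date = start_date + 1 day). The CTE named projects simply stands in for the real table.

```sql
-- Hypothetical sample data: three projects (Oct 1-4, Oct 13-15, Oct 28-29)
WITH projects (task_id, start_date, end_date) AS (
  SELECT 1, DATE '2015-10-01', DATE '2015-10-02' UNION ALL
  SELECT 2, DATE '2015-10-02', DATE '2015-10-03' UNION ALL
  SELECT 3, DATE '2015-10-03', DATE '2015-10-04' UNION ALL
  SELECT 4, DATE '2015-10-13', DATE '2015-10-14' UNION ALL
  SELECT 5, DATE '2015-10-14', DATE '2015-10-15' UNION ALL
  SELECT 6, DATE '2015-10-28', DATE '2015-10-29'
)
SELECT s.start_date, e.end_date
FROM (
  -- island starts: no task starts on the previous day
  SELECT start_date, ROW_NUMBER() OVER (ORDER BY start_date) AS rn
  FROM projects
  WHERE start_date - INTERVAL 1 DAY NOT IN (SELECT start_date FROM projects)
) s
JOIN (
  -- island ends: no task ends on the following day
  SELECT end_date, ROW_NUMBER() OVER (ORDER BY end_date) AS rn
  FROM projects
  WHERE end_date + INTERVAL 1 DAY NOT IN (SELECT end_date FROM projects)
) e USING (rn)  -- the n-th start pairs with the n-th end
ORDER BY DATEDIFF(e.end_date, s.start_date), s.start_date;
-- Expected rows: 2015-10-28 | 2015-10-29, 2015-10-13 | 2015-10-15, 2015-10-01 | 2015-10-04
```

Pairing by position works because islands never overlap: when starts and ends are sorted separately, the n-th start and the n-th end always belong to the same island.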
Here is another answer that may better explain what you are doing.
You can:
select
start_date,
end_date,
start_date - interval 1 day as prev_day,
1 as one
from projects;
The SELECT clause contains what you want to select from a projects row. For the first row you get its start date, its end date, its start date minus one day, and a 1 we call "one" here. For the second row you get its start date (probably a different one from the first row's), its end date, its start date minus one day, and again a 1 called "one".
You can
select
(select start_date) as start_date,
(select end_date) as end_date,
(select start_date - interval 1 day) as prev_day,
(select 1) as one
from projects;
which doesn't change anything and only obfuscates things. This is what you do in (SELECT DATE_ADD(end_date, INTERVAL 1 DAY)).
You cannot
select
(select start_date from projects) as start_date,
(select end_date from projects) as end_date,
(select start_date - interval 1 day from projects) as prev_day,
(select 1 from projects) as one
from projects;
because here you are not selecting one value for the first project row's start date, but all start dates from the table. The same goes for its end date, and of course for the second row and every other row. This is what you are doing here:
SELECT
(SELECT start_date FROM projects ...) AS start_date,
(SELECT end_date FROM projects ...) AS end_date
FROM projects p
and this is why you are getting the error "Subquery returns more than 1 row".
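For contrast, a subquery in the SELECT list is fine as long as it yields at most one value per outer row, which you get by correlating it with the outer row and aggregating it. A minimal sketch for MySQL 8.0, again with made-up one-day tasks in a CTE so it runs on its own: for each project start, the matching end is taken to be the earliest project end date on or after it.

```sql
-- Hypothetical sample data standing in for the projects table
WITH projects (task_id, start_date, end_date) AS (
  SELECT 1, DATE '2015-10-01', DATE '2015-10-02' UNION ALL
  SELECT 2, DATE '2015-10-02', DATE '2015-10-03' UNION ALL
  SELECT 3, DATE '2015-10-03', DATE '2015-10-04' UNION ALL
  SELECT 4, DATE '2015-10-13', DATE '2015-10-14' UNION ALL
  SELECT 5, DATE '2015-10-14', DATE '2015-10-15' UNION ALL
  SELECT 6, DATE '2015-10-28', DATE '2015-10-29'
)
SELECT start_date, end_date
FROM (
  SELECT p.start_date,
         -- correlated with p and wrapped in MIN(): exactly one value per outer row
         (SELECT MIN(q.end_date)
          FROM projects q
          WHERE q.end_date >= p.start_date
            AND q.end_date + INTERVAL 1 DAY NOT IN (SELECT end_date FROM projects)
         ) AS end_date
  FROM projects p
  -- outer rows: only the tasks that start a project
  WHERE p.start_date - INTERVAL 1 DAY NOT IN (SELECT start_date FROM projects)
) x
ORDER BY DATEDIFF(end_date, start_date), start_date;
```

The derived table x lets the final ORDER BY use the computed end_date without any ambiguity with the end_date column of projects.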
I have a question about a MySQL query that has been logging errors since updating to MySQL 5.7.
The error is the "only_full_group_by" one, which is well discussed on Stack Overflow.
Many answers say not to disable this option but to improve your SQL query instead.
The query that I'm using is returning the minimum and maximum values of a counter per hour.
SELECT MAX( counter ) AS max,
MIN( counter ) AS min,
DATE_FORMAT(date_time, '%H:%i') AS dt
FROM table1
WHERE date_time >= NOW() - INTERVAL 1 DAY
GROUP BY YEAR(date_time), MONTH(date_time), DAY(date_time), HOUR(date_time)
As I understand from the error message, I'm missing one of the items from the SELECT clause in the GROUP BY clause. But however I reorder/remove/add items, I'm not getting the result I got before the upgrade to MySQL 5.7.
I tried wrapping the main query in a subquery to improve it, but somehow I can't recreate the results.
What is it I'm missing?
MySQL isn't able to determine the functional dependence ... between the expressions in the GROUP BY clause, and the expressions in the SELECT list.
The non-aggregate expression in the SELECT list (DATE_FORMAT(date_time, '%H:%i')) includes a minutes component. The GROUP BY clause collapses the rows into groups by hour only, so the value of the minutes is indeterminate: we know it will come from some row in the group, but there's no guarantee which one.
(The question's reference to ONLY_FULL_GROUP_BY suggests we already have some understanding of indeterminate values.)
The fix with the fewest changes would be to wrap that expression in a MIN or MAX function:
SELECT MAX(t.counter) AS `max`
, MIN(t.counter) AS `min`
, MIN(DATE_FORMAT(t.date_time,'%H:%i')) AS `dt`
FROM table1 t
WHERE t.date_time >= NOW() - INTERVAL 1 DAY
GROUP
BY YEAR(t.date_time)
, MONTH(t.date_time)
, DAY(t.date_time)
, HOUR(t.date_time)
ORDER
BY YEAR(t.date_time)
, MONTH(t.date_time)
, DAY(t.date_time)
, HOUR(t.date_time)
If we want rows returned in a particular order, we should include an ORDER BY clause rather than rely on the MySQL-specific implicit sorting of GROUP BY (which was removed in MySQL 8.0).
It's a bit odd to be doing a GROUP BY year, month, day and not including those values in the SELECT list. (It's not invalid to do that, just kind of strange.) The condition in the WHERE clause guarantees that date_time doesn't span more than 24 hours.
My preference would be to do the GROUP BY on the same expression as the non-aggregate in the SELECT list. If I ever needed more than 24 hours, I'd include the date component:
SELECT MAX(t.counter) AS `max`
, MIN(t.counter) AS `min`
, DATE_FORMAT(t.date_time,'%Y-%m-%d %H:00') + INTERVAL 0 DAY AS `dt`
FROM table1 t
WHERE t.date_time >= NOW() - INTERVAL 1 DAY
GROUP
BY DATE_FORMAT(t.date_time,'%Y-%m-%d %H:00') + INTERVAL 0 DAY
ORDER
BY DATE_FORMAT(t.date_time,'%Y-%m-%d %H:00') + INTERVAL 0 DAY
--or--
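If we always know it's just one day's worth of date_time, and we only want to return the hour, then we can group by the hour, using the same expression as in the SELECT list. (A range that starts at NOW() - INTERVAL 1 DAY can touch the current hour of the day twice, yesterday and today, which is why the date-and-hour expression is kept in the GROUP BY and ORDER BY as well.)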
SELECT MAX(t.counter) AS `max`
, MIN(t.counter) AS `min`
, DATE_FORMAT(t.date_time,'%H:00') AS `dt`
FROM table1 t
WHERE t.date_time >= NOW() - INTERVAL 1 DAY
GROUP
BY DATE_FORMAT(t.date_time,'%H:00')
, DATE_FORMAT(t.date_time,'%Y-%m-%d %H')
ORDER
BY DATE_FORMAT(t.date_time,'%Y-%m-%d %H')
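Writing the hour expression in three clauses makes it easy for them to drift apart. One variant (a sketch, not the only way) computes it once in a derived table and groups by that column, so the only non-aggregate in the SELECT list is literally the GROUP BY column. The CTE table1 below is made-up sample data and needs MySQL 8.0; on 5.7, drop the WITH part and query the real table. A fixed timestamp replaces NOW() only so the sample result is reproducible.

```sql
-- Hypothetical sample rows standing in for table1(counter, date_time)
WITH table1 (counter, date_time) AS (
  SELECT 10, TIMESTAMP '2020-01-02 09:05:00' UNION ALL
  SELECT 14, TIMESTAMP '2020-01-02 09:40:00' UNION ALL
  SELECT 15, TIMESTAMP '2020-01-02 10:15:00' UNION ALL
  SELECT 21, TIMESTAMP '2020-01-02 10:55:00'
)
SELECT MAX(h.counter) AS `max`,
       MIN(h.counter) AS `min`,
       h.hr           AS `dt`
FROM (
  -- truncate each timestamp to the start of its hour, e.g. '2020-01-02 09:00'
  SELECT t.counter,
         DATE_FORMAT(t.date_time, '%Y-%m-%d %H:00') AS hr
  FROM table1 t
  -- fixed reference time instead of NOW() for a reproducible example
  WHERE t.date_time >= TIMESTAMP '2020-01-02 12:00:00' - INTERVAL 1 DAY
) h
GROUP BY h.hr
ORDER BY h.hr;
```

With this data the result has two rows: 2020-01-02 09:00 with max 14 and min 10, and 2020-01-02 10:00 with max 21 and min 15.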
Alternatively, you can select the grouping expressions themselves and group by their aliases:
SELECT MAX( counter ) AS max,
MIN( counter ) AS min,
YEAR(date_time) AS g_year,
MONTH(date_time) AS g_month,
DAY(date_time) AS g_day,
HOUR(date_time) AS g_hour
FROM table1
WHERE date_time >= NOW() - INTERVAL 1 DAY
GROUP BY g_year, g_month, g_day, g_hour
Or you can get rid of redundant data if you always do it for 1 day:
SELECT MAX( counter ) AS max,
MIN( counter ) AS min,
DAY(date_time) AS g_day,
HOUR(date_time) AS g_hour
FROM table1
WHERE date_time >= NOW() - INTERVAL 1 DAY
GROUP BY g_day, g_hour
Good Day,
I am using the following code to calculate the 9-day moving average.
SELECT SUM(close)
FROM tbl
WHERE date <= '2002-07-05'
AND name_id = 2
ORDER BY date DESC
LIMIT 9
But it does not work, because the SUM is calculated over all matching rows before LIMIT is applied. In other words, it sums all the closes on or before that date, not just the last 9.
So I need to calculate the SUM over the rows returned by a SELECT, rather than directly.
I.e., select the SUM from the SELECT...
Now how would I go about doing this, and is it very costly, or is there a better way?
If you want the moving average for each date, then try this:
SELECT date, SUM(close),
(select avg(close) from tbl t2 where t2.name_id = t.name_id and datediff(t.date, t2.date) between 0 and 8
) as mvgAvg
FROM tbl t
WHERE date <= '2002-07-05' and
name_id = 2
GROUP BY date
ORDER BY date DESC
It uses a correlated subquery to average the closes from the 9 calendar days ending at each date (that is 9 values only if there is exactly one row per day).
Starting from MySQL 8, you should use window functions for this. Using the window RANGE clause, you can create a logical window over an interval, which is very powerful. Something like this:
SELECT
date,
close,
AVG (close) OVER (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
FROM tbl
WHERE date <= DATE '2002-07-05'
AND name_id = 2
ORDER BY date DESC
For example:
WITH t (date, `close`) AS (
SELECT DATE '2020-01-01', 50 UNION ALL
SELECT DATE '2020-01-03', 54 UNION ALL
SELECT DATE '2020-01-05', 51 UNION ALL
SELECT DATE '2020-01-12', 49 UNION ALL
SELECT DATE '2020-01-13', 59 UNION ALL
SELECT DATE '2020-01-15', 30 UNION ALL
SELECT DATE '2020-01-17', 35 UNION ALL
SELECT DATE '2020-01-18', 39 UNION ALL
SELECT DATE '2020-01-19', 47 UNION ALL
SELECT DATE '2020-01-26', 50
)
SELECT
date,
`close`,
COUNT(*) OVER w AS c,
SUM(`close`) OVER w AS s,
AVG(`close`) OVER w AS a
FROM t
WINDOW w AS (ORDER BY date DESC RANGE INTERVAL 9 DAY PRECEDING)
ORDER BY date DESC
Leading to:
date |close|c|s |a |
----------|-----|-|---|-------|
2020-01-26| 50|1| 50|50.0000|
2020-01-19| 47|2| 97|48.5000|
2020-01-18| 39|3|136|45.3333|
2020-01-17| 35|4|171|42.7500|
2020-01-15| 30|4|151|37.7500|
2020-01-13| 59|5|210|42.0000|
2020-01-12| 49|6|259|43.1667|
2020-01-05| 51|3|159|53.0000|
2020-01-03| 54|3|154|51.3333|
2020-01-01| 50|3|155|51.6667|
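One caveat about the query above: because its window is ordered by date DESC, PRECEDING refers to the rows with later dates, so each window covers a date and the 9 days after it (which is why 2020-01-26 has c = 1). A sketch of trailing windows on the same sample data, for MySQL 8.0: RANGE over a calendar interval, and ROWS 8 PRECEDING, which averages at most the current row and the 8 rows before it, i.e. the question's "last 9" closes.

```sql
-- Same hypothetical sample data as the example above
WITH t (date, `close`) AS (
  SELECT DATE '2020-01-01', 50 UNION ALL
  SELECT DATE '2020-01-03', 54 UNION ALL
  SELECT DATE '2020-01-05', 51 UNION ALL
  SELECT DATE '2020-01-12', 49 UNION ALL
  SELECT DATE '2020-01-13', 59 UNION ALL
  SELECT DATE '2020-01-15', 30 UNION ALL
  SELECT DATE '2020-01-17', 35 UNION ALL
  SELECT DATE '2020-01-18', 39 UNION ALL
  SELECT DATE '2020-01-19', 47 UNION ALL
  SELECT DATE '2020-01-26', 50
)
SELECT
  date,
  `close`,
  -- calendar window: this date and any row up to 9 days earlier
  AVG(`close`) OVER (ORDER BY date RANGE INTERVAL 9 DAY PRECEDING) AS avg_9_days,
  -- row window: this row and at most the 8 rows before it (9 values)
  AVG(`close`) OVER (ORDER BY date ROWS 8 PRECEDING) AS avg_9_rows
FROM t
ORDER BY date DESC;
```

For 2020-01-26, for example, avg_9_days averages 35, 39, 47 and 50 (42.7500), while avg_9_rows averages the nine closes from 2020-01-03 to 2020-01-26 (46.0000).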
Use something like this:
SELECT
sum(close) as sum,
avg(close) as average
FROM (
SELECT
(close)
FROM
tbl
WHERE
date <= '2002-07-05'
AND name_id = 2
ORDER BY
date DESC
LIMIT 9 ) temp
The inner query returns only the 9 most recent filtered rows (in descending date order), and the outer query then sums and averages just those rows.
The reason the query you gave doesn't work is that the sum is calculated first and the LIMIT clause is applied afterwards, to the single result row, so you get the sum of all matching rows.
Another technique is to use a helper table of integers:
CREATE TABLE `tinyint_asc` (
`value` tinyint(3) unsigned NOT NULL default '0',
PRIMARY KEY (value)
) ;
INSERT INTO `tinyint_asc` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41),(42),(43),(44),(45),(46),(47),(48),(49),(50),(51),(52),(53),(54),(55),(56),(57),(58),(59),(60),(61),(62),(63),(64),(65),(66),(67),(68),(69),(70),(71),(72),(73),(74),(75),(76),(77),(78),(79),(80),(81),(82),(83),(84),(85),(86),(87),(88),(89),(90),(91),(92),(93),(94),(95),(96),(97),(98),(99),(100),(101),(102),(103),(104),(105),(106),(107),(108),(109),(110),(111),(112),(113),(114),(115),(116),(117),(118),(119),(120),(121),(122),(123),(124),(125),(126),(127),(128),(129),(130),(131),(132),(133),(134),(135),(136),(137),(138),(139),(140),(141),(142),(143),(144),(145),(146),(147),(148),(149),(150),(151),(152),(153),(154),(155),(156),(157),(158),(159),(160),(161),(162),(163),(164),(165),(166),(167),(168),(169),(170),(171),(172),(173),(174),(175),(176),(177),(178),(179),(180),(181),(182),(183),(184),(185),(186),(187),(188),(189),(190),(191),(192),(193),(194),(195),(196),(197),(198),(199),(200),(201),(202),(203),(204),(205),(206),(207),(208),(209),(210),(211),(212),(213),(214),(215),(216),(217),(218),(219),(220),(221),(222),(223),(224),(225),(226),(227),(228),(229),(230),(231),(232),(233),(234),(235),(236),(237),(238),(239),(240),(241),(242),(243),(244),(245),(246),(247),(248),(249),(250),(251),(252),(253),(254),(255);
Afterwards you can use it like this:
select
date_add(tbl.date, interval tinyint_asc.value day) as mydate,
count(*),
sum(tbl.close)
from tbl
inner join tinyint_asc on tinyint_asc.value < 30 -- offsets 0..29 for a 30-day moving average
where date_add(tbl.date, interval tinyint_asc.value day) between '2016-01-01' and current_date()
group by mydate
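On MySQL 8.0 the offsets do not have to live in a stored table; a recursive CTE can generate them on the fly. A minimal sketch with made-up data and a 3-day window (offsets 0..2) so the result is easy to check by hand: each row is copied forward onto its own date and the following days, and grouping by the shifted date gives a trailing sum and average.

```sql
WITH RECURSIVE
  -- offsets 0..2: each row counts toward its own date and the 2 following dates
  offsets (value) AS (
    SELECT 0
    UNION ALL
    SELECT value + 1 FROM offsets WHERE value < 2
  ),
  -- hypothetical sample data standing in for tbl(date, close)
  tbl (date, `close`) AS (
    SELECT DATE '2020-01-01', 10 UNION ALL
    SELECT DATE '2020-01-02', 20 UNION ALL
    SELECT DATE '2020-01-03', 30 UNION ALL
    SELECT DATE '2020-01-04', 40
  )
SELECT
  DATE_ADD(tbl.date, INTERVAL offsets.value DAY) AS mydate,
  COUNT(*)         AS n,
  SUM(tbl.`close`) AS s,
  AVG(tbl.`close`) AS a
FROM tbl
CROSS JOIN offsets
WHERE DATE_ADD(tbl.date, INTERVAL offsets.value DAY)
      BETWEEN DATE '2020-01-01' AND DATE '2020-01-04'
GROUP BY mydate
ORDER BY mydate;
```

With this data the 3-day averages come out as 10, 15, 20 and 30 for 2020-01-01 through 2020-01-04.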
This query is fast:
select date, name_id,
case @i when name_id then @i:=name_id else (@i:=name_id)
+ (@n:=0) -- "+" instead of "and": AND stops at the first 0, which would skip the resets below
+ (@a0:=0) + (@a1:=0) + (@a2:=0) + (@a3:=0) + (@a4:=0) + (@a5:=0) + (@a6:=0) + (@a7:=0) + (@a8:=0)
end as a,
case @n when 9 then @n:=9 else @n:=@n+1 end as n,
@a0:=@a1,@a1:=@a2,@a2:=@a3,@a3:=@a4,@a4:=@a5,@a5:=@a6,@a6:=@a7,@a7:=@a8,@a8:=close,
(@a0+@a1+@a2+@a3+@a4+@a5+@a6+@a7+@a8)/@n as av
from tbl,
(select @i:=0, @n:=0,
@a0:=0, @a1:=0, @a2:=0, @a3:=0, @a4:=0, @a5:=0, @a6:=0, @a7:=0, @a8:=0) a
where name_id=2
order by name_id, date
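If you need an average over 50 or 100 values, it's tedious to write, but worth the effort. The speed is close to that of the plain ordered select. Note that setting user variables inside expressions like this is deprecated as of MySQL 8.0.13, where window functions are the better option.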