How to group same table differently in outer query and subquery - mysql

Table looks like this:
id | number | provider| datetime | keyword|country|
1 | 1 | Mobitel |2012-11-05| JAM | RS |
2 | 2 | Telekom |2013-04-25| ASTRO| RS |
3 | 1 | Si.Mobil|2013-04-27| DOMACE| BA |
4 | 4 | Telenor |2013-04-21| BIP | HR |
5 | 7 | VIP |2013-04-18| WIN | CZ |
6 | 13 | VIP |2014-05-21| DOMACE| RS |
7 | 5 | VIP |2014-06-04| WIN | HU |
I need to sum all numbers grouped by keyword and country and to sum all numbers again grouped by keyword, country and provider all in one query.
Here is how I tried to do it:
SELECT (SELECT SUM(number),country, keyword
FROM daily_subscriptions
WHERE datetime >= '2016-02-01 23:59:59'
GROUP BY country, keyword )
num_of_all_subs,
SUM(number) as num_of_subs,
country,
keyword,
provider
FROM daily_subscriptions
WHERE datetime >= '2016-02-01 23:59:59'
GROUP BY country, keyword, provider
But this query throws an error:
#1241 - Operand should contain 1 column(s)
Here is what I expect to get:
id | num_of_all_subs|num_of_subs | provider| datetime | keyword|country|
1 | 19 | 4 | Mobitel |2012-11-05| JAM | RS |
2 | 12 | 5 |Telekom |2013-04-25| ASTRO| RS |
3 | 18 | 1 |Si.Mobil |2013-04-27| DOMACE| BA |
4 | 42 | 21 |Telenor |2013-04-21| BIP | HR |
5 | 76 | 23 |VIP |2013-04-18| WIN | CZ |
6 | 13 | 3 |VIP |2014-05-21| DOMACE| RS |
7 | 53 | 11 |VIP |2014-06-04| WIN | HU |
The num_of_all_subs field is the sum of all numbers for a given keyword and country, e.g. 19 for JAM (keyword) and RS (country), while num_of_subs is the part of that sum belonging to one provider, e.g. 4 of the 19 for Mobitel, since there are other providers for that country and keyword (even though they are not shown in the sample table above).
Please help me to extract this data, since I'm stuck.

Your subquery for num_of_all_subs (which is a single number) must only return one column and, next problem, one row. Also, this subquery will be evaluated before you group, while you actually want to first group and get the columns num_of_subs, country, keyword and provider, and, afterwards, add another column num_of_all_subs to that first resultset.
You can do this exactly as just described: first get the grouped subquery (here called details), then use a dependent subquery to get, for each row in that subquery, the value for num_of_all_subs by looking at the table (again) and summing over all rows that have the same keyword and country:
SELECT
(SELECT SUM(number)
FROM daily_subscriptions ds
WHERE datetime >= '2016-02-01 23:59:59'
and ds.country = details.country
and ds.keyword = details.keyword
) as num_of_all_subs,
details.*
from
(select
SUM(number) as num_of_subs,
country,
keyword,
provider
FROM daily_subscriptions
WHERE datetime >= '2016-02-01 23:59:59'
GROUP BY country, keyword, provider
) as details;
An alternative would be to calculate both groups separately, one including provider (details), and one without (all_subs). One will contain num_of_subs, the other will contain num_of_all_subs. Then you can combine (join) these two queries where they have the same country and keyword:
SELECT
all_subs.num_of_all_subs,
details.*
from
(select
SUM(number) as num_of_subs,
country,
keyword,
provider
FROM daily_subscriptions
WHERE datetime >= '2016-02-01 23:59:59'
GROUP BY country, keyword, provider
) as details
left join
(SELECT
SUM(number) as num_of_all_subs,
country,
keyword
FROM daily_subscriptions
WHERE datetime >= '2016-02-01 23:59:59'
GROUP BY country, keyword
) as all_subs
on all_subs.keyword = details.keyword and all_subs.country = details.country;
In your case, you can use a join instead of a left join, because every row in the first subquery will have a matching row in the second subquery, although keeping the left join is usually the safer choice.
While in theory, MySQL could execute these queries identically (and for less complicated queries, it will actually optimize and treat, whenever possible and useful, dependent subqueries like joins), in current MySQL versions this will most likely not be the case and the 2nd option is probably faster. Anyway, for both versions, a composite index on (country, keyword, provider) will do wonders.
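To see the second (join) variant produce both sums side by side, here is a minimal sketch. It runs the same query shape through Python's built-in sqlite3 module rather than MySQL, purely so it can be executed without a server; the three sample rows are made up for illustration (only the JAM/RS total of 19 and Mobitel's 4 come from the question), and the date filter is left out to keep it short.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_subscriptions (
    number INTEGER, provider TEXT, keyword TEXT, country TEXT
);
-- Made-up rows: two providers share the JAM/RS combination.
INSERT INTO daily_subscriptions VALUES
    (4,  'Mobitel', 'JAM', 'RS'),
    (15, 'Telekom', 'JAM', 'RS'),
    (5,  'VIP',     'WIN', 'HU');
""")

rows = conn.execute("""
SELECT all_subs.num_of_all_subs, details.num_of_subs,
       details.country, details.keyword, details.provider
FROM (SELECT SUM(number) AS num_of_subs, country, keyword, provider
      FROM daily_subscriptions
      GROUP BY country, keyword, provider) AS details
JOIN (SELECT SUM(number) AS num_of_all_subs, country, keyword
      FROM daily_subscriptions
      GROUP BY country, keyword) AS all_subs
  ON all_subs.country = details.country
 AND all_subs.keyword = details.keyword
ORDER BY details.country, details.keyword, details.provider
""").fetchall()

for row in rows:
    print(row)
```

Each provider row carries its own sum plus the total for its country and keyword, which is the shape asked for in the question.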

Related

Where is the syntax error in this WITH clause?

I'm trying out this problem on LeetCode. The interface is throwing a syntax error, but for the life of me I can't figure it out. Here's the question:
Table: Stadium
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| visit_date | date |
| people | int |
+---------------+---------+
visit_date is the primary key for this table. Each row of this table
contains the visit date and visit id to the stadium with the number of
people during the visit. No two rows will have the same visit_date,
and as the id increases, the dates increase as well.
Write an SQL query to display the records with three or more rows with
consecutive id's, and the number of people is greater than or equal to
100 for each.
Return the result table ordered by visit_date in ascending order.
The query result format is in the following example.
Stadium table:
+------+------------+-----------+
| id | visit_date | people |
+------+------------+-----------+
| 1 | 2017-01-01 | 10 |
| 2 | 2017-01-02 | 109 |
| 3 | 2017-01-03 | 150 |
| 4 | 2017-01-04 | 99 |
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-09 | 188 |
+------+------------+-----------+
Result table:
+------+------------+-----------+
| id | visit_date | people |
+------+------------+-----------+
| 5 | 2017-01-05 | 145 |
| 6 | 2017-01-06 | 1455 |
| 7 | 2017-01-07 | 199 |
| 8 | 2017-01-09 | 188 |
+------+------------+-----------+
The four rows with ids 5, 6, 7, and 8 have consecutive ids and each of
them has >= 100 people attended. Note that row 8 was included even
though the visit_date was not the next day after row 7. The rows with
ids 2 and 3 are not included because we need at least three
consecutive ids.
Here's a fiddle with the data (NOTE: I'm still trying to figure out how to insert dates, so I saved the dates as strings instead).
Can anyone spot the syntax error in the query below?
# Write your MySQL query statement below
SET @rowIndex := 0;
WITH s1 as (
SELECT @rowIndex := @rowIndex + 1 as rowIndex, s.*
FROM Stadium as s
WHERE s.people >= 100
GROUP BY s.id
)
SELECT s2.id, s2.visit_date, s2.people
FROM s1 as s2
GROUP BY s2.rowIndex - s2.id, s2.id, s2.visit_date, s2.people
ORDER BY s2.visit_date
Error message:
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near 'WITH s1 (rowIndex, id, visit_date, people) as (
SELECT @rowIndex := @rowInde' at line 4
Also, the LeetCode interface uses MySQL v8.0, so I don't think that's the problem.
I was using the query below as a reference. (Original.)
SET @rowIndex := -1;
SELECT ROUND(AVG(t.LAT_N), 4) FROM
(
SELECT @rowIndex := @rowIndex+1 AS rowIndex, s.LAT_N FROM STATION AS s ORDER BY s.LAT_N
) AS t
WHERE t.rowIndex IN (FLOOR(@rowIndex / 2), CEIL(@rowIndex / 2));
Thanks.
Edit:
For future reference, here's the final query I came up with:
# Write your MySQL query statement below
WITH s1 as (
SELECT ROW_NUMBER() OVER (ORDER BY s.id) as rowIndex, s.*
FROM Stadium as s
WHERE s.people >= 100
GROUP BY s.id, s.visit_date, s.people
), s2 as (
SELECT COUNT(s.id) OVER (PARTITION BY s.id-s.rowIndex) as groupSize, s.*
FROM s1 as s
)
SELECT s3.id, s3.visit_date, s3.people
FROM s2 as s3
GROUP BY s3.groupSize, s3.id, s3.visit_date, s3.people
HAVING s3.groupSize >= 3
ORDER BY s3.visit_date
You confirmed that you are using MySQL 8.0.21 server so the only other suggestion I have is that you're trying to run two SQL statements in one call:
SET @rowIndex := 0;
WITH s1 as (
SELECT...
Most MySQL connectors do not support multi-query by default. In other words, you can only do one statement per call. As soon as MySQL sees any syntax following your first ; it treats this as a syntax error.
There's no reason you need to use multi-query. Just run the two statements separately. As long as you use the same session, your value of @rowIndex will be available to subsequent statements.
The former Director of Engineering for MySQL once told me, "there's no reason multi-query should exist."
Not sure what problem you might be facing, but this would pass on LeetCode (though I'm not sure it's the right way to do it):
SELECT DISTINCT S1.id,
S1.visit_date,
S1.people
FROM stadium AS S1,
stadium AS S2,
stadium AS S3
WHERE S1.people > 99
AND S2.people > 99
AND S3.people > 99
AND ( (S2.id = S1.id + 1
AND S3.id = S1.id + 2)
OR (S2.id = S1.id - 1
AND S3.id = S1.id + 1)
OR (S2.id = S1.id - 1
AND S3.id = S1.id - 2) )
ORDER BY id ASC;
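The "id minus row number" idea behind the question's final query can be sketched as follows. This is a simplified variant, not the asker's exact query: it runs on Python's built-in sqlite3 module instead of MySQL 8.0 (an assumption made so it runs without a server; window functions need SQLite 3.25 or newer), and the column names grp and group_size are made up for the example. Rows that survive the people >= 100 filter and have consecutive ids share the same value of id - ROW_NUMBER(), so counting per value gives the run length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stadium (id INTEGER, visit_date TEXT, people INTEGER);
INSERT INTO Stadium VALUES
    (1, '2017-01-01', 10),  (2, '2017-01-02', 109),
    (3, '2017-01-03', 150), (4, '2017-01-04', 99),
    (5, '2017-01-05', 145), (6, '2017-01-06', 1455),
    (7, '2017-01-07', 199), (8, '2017-01-09', 188);
""")

rows = conn.execute("""
WITH s1 AS (
    -- consecutive ids yield a constant difference id - row number
    SELECT s.*, s.id - ROW_NUMBER() OVER (ORDER BY s.id) AS grp
    FROM Stadium AS s
    WHERE s.people >= 100
), s2 AS (
    -- size of each run of consecutive ids
    SELECT s1.*, COUNT(*) OVER (PARTITION BY grp) AS group_size
    FROM s1
)
SELECT id, visit_date, people
FROM s2
WHERE group_size >= 3
ORDER BY visit_date
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this keeps ids 5 to 8 and drops ids 2 and 3, whose run is only two rows long.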

MySQL select last row each day

Trying to select last row each day.
This is my (simplified, more records in actual table) table:
+-----+-----------------------+------+
| id | datetime | temp |
+-----+-----------------------+------+
| 9 | 2017-06-05 23:55:00 | 9.5 |
| 8 | 2017-06-05 23:50:00 | 9.6 |
| 7 | 2017-06-05 23:45:00 | 9.3 |
| 6 | 2017-06-04 23:55:00 | 9.4 |
| 5 | 2017-06-04 23:50:00 | 9.2 |
| 4 | 2017-06-05 23:45:00 | 9.1 |
| 3 | 2017-06-03 23:55:00 | 9.8 |
| 2 | 2017-06-03 23:50:00 | 9.7 |
| 1 | 2017-06-03 23:45:00 | 9.6 |
+-----+-----------------------+------+
I want to select row with id = 9, id = 6 and id = 3.
I have tried this query:
SELECT MAX(datetime) Stamp
, temp
FROM weatherdata
GROUP
BY YEAR(DateTime)
, MONTH(DateTime)
, DAY(DateTime)
order
by datetime desc
limit 10;
But the datetime and temp values do not match.
Kind Regards
Here's one way, which gets the MAX datetime per day in a subquery and then uses it in the outer query to fetch the other fields:
SELECT *
FROM test
WHERE `datetime` IN (
SELECT MAX(`datetime`)
FROM test
GROUP BY DATE(`datetime`)
);
Here's the SQL Fiddle.
If your rows are always inserted and never updated, and if id is an autoincrementing primary key, then
SELECT w.*
FROM weatherdata w
JOIN ( SELECT MAX(id) id
FROM weatherdata
GROUP BY DATE(datetime)
) last ON w.id = last.id
will get you what you want. Why? The inner query returns the largest (meaning most recent) id value for each date in weatherdata. This can be very fast indeed, especially if you put an index on the datetime column.
But it's possible the conditions for this to work don't hold. If your datetime column sometimes gets updated to change the date, it's possible that larger id values don't always imply larger datetime values.
In that case you need something like this.
SELECT w.*
FROM weatherdata w
JOIN ( SELECT MAX(datetime) datetime
FROM weatherdata
GROUP BY DATE(datetime)
) last ON w.datetime = last.datetime
Your query doesn't work because it misuses the nasty nonstandard extension to MySQL GROUP BY. Read this: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
It should, properly, use the ANY_VALUE() function to highlight the unpredictability of the results. It should read ....
SELECT MAX(datetime) Stamp, ANY_VALUE(temp) temp
which means you aren't guaranteed the right row's temp value. Rather, it can return the temp value from any row in each day's grouping.
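As a quick check of the MAX(datetime) join approach against the question's sample rows, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL (an assumption so it runs standalone; SQLite's DATE() plays the same role here), and the aliases latest and last_dt are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE weatherdata (id INTEGER PRIMARY KEY, datetime TEXT, temp REAL);
-- Sample rows copied from the question.
INSERT INTO weatherdata VALUES
    (9, '2017-06-05 23:55:00', 9.5),
    (8, '2017-06-05 23:50:00', 9.6),
    (7, '2017-06-05 23:45:00', 9.3),
    (6, '2017-06-04 23:55:00', 9.4),
    (5, '2017-06-04 23:50:00', 9.2),
    (4, '2017-06-05 23:45:00', 9.1),
    (3, '2017-06-03 23:55:00', 9.8),
    (2, '2017-06-03 23:50:00', 9.7),
    (1, '2017-06-03 23:45:00', 9.6);
""")

rows = conn.execute("""
SELECT w.id, w.datetime, w.temp
FROM weatherdata AS w
JOIN (SELECT MAX(datetime) AS last_dt   -- latest reading of each day
      FROM weatherdata
      GROUP BY DATE(datetime)) AS latest
  ON w.datetime = latest.last_dt
ORDER BY w.datetime DESC
""").fetchall()

for row in rows:
    print(row)
```

Because the temp value is read from the joined row rather than picked arbitrarily by GROUP BY, each datetime comes back with its own temp.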

Optimizing SQL Query for max value with various conditions from a single MySQL table

I have the following SQL query
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
AND (`id` =
(
SELECT `id`
FROM `sensor_data` AS `sd2`
WHERE sd1.mid = sd2.mid
AND sd1.sid = sd2.sid
ORDER BY `value` DESC, `id` DESC
LIMIT 1)
)
Background:
I've checked the validity of the query by changing LIMIT 1 to LIMIT 0, and the query works without any problem. However with LIMIT 1 the query doesn't complete, it just states loading until I shutdown and restart.
Breaking the Query down:
I have broken down the query with the date boundary as follows:
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
This takes about 0.24 seconds to return the query with 8200 rows each having 5 columns.
Question:
I suspect the second half of my Query, is not correct or well optimized.
The tables are as follows:
Current Table:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 52 | 10 | 2 | 39 | 2015-05-13 11:56:25 |
| 53 | 10 | 2 | 40 | 2015-05-13 11:56:42 |
| 54 | 10 | 2 | 40 | 2015-05-13 11:56:45 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 57 | 11 | 2 | 18 | 2015-05-13 11:58:41 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 59 | 11 | 3 | 58 | 2015-05-13 11:59:01 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Q: How would I get the MAX(v) for each sid for each mid?
NB#1: In the example above ROW 53, 54, 55 have all the same value (40), but I would like to retrieve the row with the most recent timestamp, which is ROW 55.
Expected Output:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Structure of the table:
NB#2:
Since this table has over 110 million entries, it is critical to have date boundaries, which limit the query to ~8000 entries over a 24 hour period.
The query can be written as follows:
SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
FROM yourtable t1
INNER JOIN (
SELECT mid, sid, MAX(v) as v
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid
) t2
ON t1.mid = t2.mid
AND t1.sid = t2.sid
AND t1.v = t2.v
INNER JOIN (
SELECT mid, sid, v, MAX(ts) as ts
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid, v
) t3
ON t1.mid = t3.mid
AND t1.sid = t3.sid
AND t1.v = t3.v
AND t1.ts = t3.ts;
Edit and Explanation:
The first sub-query (first INNER JOIN) fetches MAX(v) per (mid, sid) combination. The second sub-query is to identify MAX(ts) for every (mid, sid, v). At this point, the two queries do not influence each others' results. It is also important to note that ts date range selection is done in the two sub-queries independently such that the final query has fewer rows to examine and no additional WHERE filters to apply.
Effectively, this translates into getting MAX(v) per (mid, sid) combination initially (first sub-query); and if there is more than one record with the same value MAX(v) for a given (mid, sid) combo, then the excess records get eliminated by the selection of MAX(ts) for every (mid, sid, v) combination obtained by the second sub-query. We then simply associate the output of the two queries by the two INNER JOIN conditions to get to the id of the desired records.
Demo
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.mid)
union
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.sid);
IN ( SELECT ... ) does not optimize well. It is even worse because of being correlated.
What you are looking for is a groupwise-max.
Please provide SHOW CREATE TABLE; we need to know at least what the PRIMARY KEY is.
Suggested code
You will need:
With the WHERE: INDEX(timestamp, mid, sid, v, id)
Without the WHERE: INDEX(mid, sid, v, timestamp, id)
Code:
SELECT id, mid, sid, v, timestamp
FROM ( SELECT @prev_mid := 99999, -- some value not in table
@prev_sid := 99999,
@n := 0 ) AS init
JOIN (
SELECT @n := if(mid != @prev_mid OR
sid != @prev_sid,
1, @n + 1) AS n,
@prev_mid := mid,
@prev_sid := sid,
id, mid, sid, v, timestamp
FROM sensor_data
WHERE timestamp >= '2017-05-13'
AND timestamp < '2017-05-13' + INTERVAL 1 DAY
ORDER BY mid DESC, sid DESC, v DESC, timestamp DESC
) AS x
WHERE n = 1
ORDER BY mid, sid; -- optional
Notes:
The index is 'composite' and 'covering'.
This should make one pass over the index, thereby providing 'good' performance.
The final ORDER BY is optional; the results may be in reverse order.
All the DESC in the inner ORDER BY must be in place to work correctly (unless you are using MySQL 8.0).
Note how the WHERE avoids including both midnights? And avoids manually computing leap-days, year-ends, etc?
With the WHERE (and associated INDEX), there will be filtering, but a 'sort' will still be needed.
Without the WHERE (and the other INDEX), sort will not be needed.
You can test the performance of any competing formulations via this trick, even if you do not have enough rows (yet) to get reliable timings:
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
This can also be used to compare different versions of MySQL and MariaDB -- I have seen 3 significantly different performance characteristics in a related groupwise-max test.
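For comparison, on MySQL 8.0 the same groupwise-max can be written with ROW_NUMBER() instead of user variables. That is a different technique from the answers above, sketched here under a few assumptions: it runs on Python's built-in sqlite3 module (SQLite 3.25 or newer) rather than MySQL, uses the question's sample rows and their 2015 dates, and the aliases ranked and n are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensor_data (
    id INTEGER PRIMARY KEY, mid INTEGER, sid INTEGER, v INTEGER, timestamp TEXT
);
-- Sample rows copied from the question.
INSERT INTO sensor_data VALUES
    (51, 10, 1, 40, '2015-05-13 11:56:01'),
    (52, 10, 2, 39, '2015-05-13 11:56:25'),
    (53, 10, 2, 40, '2015-05-13 11:56:42'),
    (54, 10, 2, 40, '2015-05-13 11:56:45'),
    (55, 10, 2, 40, '2015-05-13 11:57:01'),
    (56, 11, 1, 50, '2015-05-13 11:57:52'),
    (57, 11, 2, 18, '2015-05-13 11:58:41'),
    (58, 11, 2, 19, '2015-05-13 11:58:59'),
    (59, 11, 3, 58, '2015-05-13 11:59:01'),
    (60, 11, 3, 65, '2015-05-13 11:59:29');
""")

rows = conn.execute("""
SELECT id, mid, sid, v, timestamp
FROM (
    SELECT sd.*,
           -- rank rows per (mid, sid): highest v first, newest timestamp breaks ties
           ROW_NUMBER() OVER (PARTITION BY mid, sid
                              ORDER BY v DESC, timestamp DESC) AS n
    FROM sensor_data AS sd
    WHERE timestamp >= '2015-05-13' AND timestamp < '2015-05-14'
) AS ranked
WHERE n = 1
ORDER BY mid, sid
""").fetchall()

winning_ids = [row[0] for row in rows]
print(winning_ids)
```

The half-open date range mirrors the WHERE clause note above: it includes the first midnight and excludes the second.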

Sorting some rows by average with SQL

All right, so here's a challenge for all you SQL pros:
I have a table with two columns of interest, group and birthdate. Only some rows have a group assigned to them.
I now want to print all rows sorted by birthdate, but I also want all rows with the same group to end up next to each other. The only semi-sensible way of doing this would be to use the groups' average birthdates for all the rows in the group when sorting. The question is, can this be done with pure SQL (MySQL in this instance), or will some scripting logic be required?
To illustrate, with the given table:
id | group | birthdate
---+-------+-----------
1 | 1 | 1989-12-07
2 | NULL | 1990-03-14
3 | 1 | 1987-05-25
4 | NULL | 1985-09-29
5 | NULL | 1988-11-11
and let's say that the "average" of 1987-05-25 and 1989-12-07 is 1988-08-30 (this can be found by averaging the UNIX timestamp equivalents of the dates and then converting back to a date. This average doesn't have to be completely correct!).
The output should then be:
id | group | birthdate | [sort_by_birthdate]
---+-------+------------+--------------------
4 | NULL | 1985-09-29 | 1985-09-29
3 | 1 | 1987-05-25 | 1988-08-30
1 | 1 | 1989-12-07 | 1988-08-30
5 | NULL | 1988-11-11 | 1988-11-11
2 | NULL | 1990-03-14 | 1990-03-14
Any ideas?
Cheers,
Jon
I normally program in T-SQL, so please forgive me if I don't translate the date functions perfectly to MySQL:
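SELECT
T.id,
T.`group`,
T.birthdate
FROM
Some_Table T
LEFT OUTER JOIN (
SELECT
`group`, -- GROUP is a reserved word in MySQL, so it needs backticks
-- DATEDIFF(a, b) returns a - b in days, so birthdate goes first
'1970-01-01' +
INTERVAL AVG(DATEDIFF(birthdate, '1970-01-01')) DAY AS avg_birthdate
FROM
Some_Table T2
GROUP BY
`group`
) SQ ON SQ.`group` = T.`group`
ORDER BY
COALESCE(SQ.avg_birthdate, T.birthdate),
T.`group`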
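Here is a runnable sketch of the same "sort by the group's average birthdate" idea, checked against the sample table from the question. It uses Python's built-in sqlite3 module instead of MySQL, so the date arithmetic goes through SQLite's julianday() rather than DATEDIFF; the table name people and the column name grp (standing in for the reserved word group) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, grp INTEGER, birthdate TEXT);
-- Sample rows copied from the question.
INSERT INTO people VALUES
    (1, 1,    '1989-12-07'),
    (2, NULL, '1990-03-14'),
    (3, 1,    '1987-05-25'),
    (4, NULL, '1985-09-29'),
    (5, NULL, '1988-11-11');
""")

rows = conn.execute("""
SELECT p.id, p.grp, p.birthdate
FROM people AS p
LEFT JOIN (SELECT grp, AVG(julianday(birthdate)) AS avg_jd
           FROM people
           WHERE grp IS NOT NULL
           GROUP BY grp) AS g
  ON g.grp = p.grp
-- grouped rows sort by their group's average, ungrouped rows by their own date
ORDER BY COALESCE(g.avg_jd, julianday(p.birthdate)), p.birthdate
""").fetchall()

ordered_ids = [row[0] for row in rows]
print(ordered_ids)
```

Rows without a group never match the join (NULL = NULL is not true), so COALESCE falls back to their own birthdate.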

MySQL grouping by date range with multiple joins

I currently have quite a messy query, which joins data from multiple tables involving two subqueries. I now have a requirement to group this data by DAY(), WEEK(), MONTH(), and QUARTER().
I have three tables: days, qos and employees. An employee is self-explanatory, a day is a summary of an employee's performance on a given day, and qos is a random quality inspection, which can be performed many times a day.
At the moment, I am selecting all employees, and LEFT JOINing day and qos, which works well. However, I now need to group the data in order to break down a team's or individual's performance over a date range.
Taking this data:
Employee
id | name
------------------
1 | Bob Smith
Day
id | employee_id | day_date | calls_taken
---------------------------------------------
1 | 1 | 2011-03-01 | 41
2 | 1 | 2011-03-02 | 24
3 | 1 | 2011-04-01 | 35
Qos
id | employee_id | qos_date | score
----------------------------------------
1 | 1 | 2011-03-03 | 85
2 | 1 | 2011-03-03 | 95
3 | 1 | 2011-04-01 | 91
If I were to start by grouping by DAY(), I would need to see the following results:
Day__date | Day__Employee__id | Day__calls | Day__qos_score
------------------------------------------------------------
2011-03-01 | 1 | 41 | NULL
2011-03-02 | 1 | 24 | NULL
2011-03-03 | 1 | NULL | 90
2011-04-01 | 1 | 35 | 91
As you see, Day__calls should be SUM(calls_taken) and Day__qos_score is AVG(score). I've tried using a similar method as above, but as the date isn't known until one of the tables has been joined, it's only displaying a record where there's a day saved.
Is there any way of doing this, or am I going about things the wrong way?
Edit: As requested, here's what I've come up with so far. However, it only shows dates where there's a day.
SELECT COALESCE(`day`.day_date, qos.qos_date) AS Day__date,
employee.id AS Day__Employee__id,
`day`.calls_taken AS Day__Day__calls,
qos.score AS Day__Qos__score
FROM faults_employees `employee`
LEFT JOIN (SELECT `day`.employee_id AS employee_id,
SUM(`day`.calls_taken) AS `calls_in`,
FROM faults_days AS `day`
WHERE employee.id = 7
GROUP BY (`day`.day_date)
) AS `day`
ON `day`.employee_id = `employee`.id
LEFT JOIN (SELECT `qos`.employee_id AS employee_id,
AVG(qos.score) AS `score`
FROM faults_qos qos
WHERE employee.id = 7
GROUP BY (qos.qos_date)
) AS `qos`
ON `qos`.employee_id = `employee`.id AND `qos`.qos_date = `day`.day_date
WHERE employee.id = 7
GROUP BY Day__date
ORDER BY `day`.day_date ASC
The solution I've come up with looks like:
SELECT
`date`,
`employee_id`,
SUM(`union`.`calls_taken`) AS `calls_taken`,
AVG(`union`.`score`) AS `score`
FROM ( -- select from union table
(SELECT -- first select all calls taken, leaving qos_score null
`day`.`day_date` AS `date`,
`day`.`employee_id`,
`day`.`calls_taken`,
NULL AS `score`
FROM `employee`
LEFT JOIN
`day`
ON `day`.`employee_id` = `employee`.`id`
)
UNION ALL -- union both tables; ALL keeps duplicate rows so SUM() and AVG() see every record
(
SELECT -- now select qos score, leaving calls taken null
`qos`.`qos_date` AS `date`,
`qos`.`employee_id`,
NULL AS `calls_taken`,
`qos`.`score`
FROM `employee`
LEFT JOIN
`qos`
ON `qos`.`employee_id` = `employee`.`id`
)
) `union`
GROUP BY `union`.`date`, `union`.`employee_id` -- group union table by date and employee
For the UNION to work, we have to set the qos_score field in the day table and the calls_taken field in the qos table to null. If we don't, both calls_taken and score would be selected into the same column by the UNION statement.
After this, I selected the required fields with the aggregation functions SUM() and AVG() from the union'd table, grouping by the date field in the union table.
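To check the approach against the sample data from the question, here is a minimal sketch of the union-then-aggregate idea. It runs on Python's built-in sqlite3 module rather than MySQL (an assumption so it needs no server), uses UNION ALL so identical rows are not collapsed, skips the join to the employee table for brevity, and the aliases u and event_date are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day (id INTEGER, employee_id INTEGER, day_date TEXT, calls_taken INTEGER);
CREATE TABLE qos (id INTEGER, employee_id INTEGER, qos_date TEXT, score INTEGER);
-- Sample rows copied from the question.
INSERT INTO day VALUES
    (1, 1, '2011-03-01', 41),
    (2, 1, '2011-03-02', 24),
    (3, 1, '2011-04-01', 35);
INSERT INTO qos VALUES
    (1, 1, '2011-03-03', 85),
    (2, 1, '2011-03-03', 95),
    (3, 1, '2011-04-01', 91);
""")

rows = conn.execute("""
SELECT u.event_date, u.employee_id,
       SUM(u.calls_taken) AS calls_taken,
       AVG(u.score) AS score
FROM (
    -- calls per day, with the score column padded with NULL
    SELECT day_date AS event_date, employee_id, calls_taken, NULL AS score
    FROM day
    UNION ALL
    -- QoS scores, with the calls column padded with NULL
    SELECT qos_date, employee_id, NULL, score
    FROM qos
) AS u
GROUP BY u.event_date, u.employee_id
ORDER BY u.event_date
""").fetchall()

for row in rows:
    print(row)
```

SUM() and AVG() skip NULLs, so a date that appears in only one of the two tables still gets a row, with NULL in the other column. To group by week, month or quarter instead, the same shape should work with the date expression swapped in both the SELECT list and the GROUP BY.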