I'm trying to select data at two different resolutions based on data points per unit of time. Right now I'm just running two queries and joining them with a UNION. To get the resolutions I want, I'm using this to achieve one data point per minute:
GROUP BY UNIX_TIMESTAMP(`datetime`) DIV 60
Just wondering if there is a more efficient way to do this?
SELECT (UNIX_TIMESTAMP(`datetime`)*1000) as `dt`, `value1`, `value2`
FROM `table`
WHERE `datetime` BETWEEN '2017-01-01' AND '2018-01-01'
GROUP BY UNIX_TIMESTAMP(`datetime`) DIV 240
UNION
SELECT (UNIX_TIMESTAMP(`datetime`)*1000) as `dt`, `value1`, `value2`
FROM `table`
WHERE `datetime` BETWEEN '2017-01-01' AND '2018-01-01'
AND TIME(`datetime`) BETWEEN TIME('12:00:00') AND TIME('13:00:00')
GROUP BY UNIX_TIMESTAMP(`datetime`) DIV 60
ORDER BY `dt` ASC;
Here's an alternative I came across. This one seems a little quicker and returns the actual values for the selected times instead of MySQL just picking one from each time group.
On this table datetime is an index.
SELECT a.`datetime`, a.`value1`, a.`value2`
FROM `table` a
INNER JOIN
(
SELECT `datetime`
FROM `table`
WHERE DATE(`datetime`) BETWEEN '2017-01-01' AND '2018-01-01'
GROUP BY UNIX_TIMESTAMP(`datetime`) DIV 240
UNION
SELECT `datetime`
FROM `table`
WHERE DATE(`datetime`) BETWEEN '2017-01-01' AND '2018-01-01'
AND TIME(`datetime`) BETWEEN '12:00:00' AND '13:00:00'
GROUP BY UNIX_TIMESTAMP(`datetime`) DIV 60
ORDER BY `datetime`
) b on a.`datetime` = b.`datetime`;
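Here's a minimal runnable sketch of that join-back pattern using Python's sqlite3, with a hypothetical `readings` table storing integer Unix-style timestamps directly (SQLite has no UNIX_TIMESTAMP). Selecting MIN(ts) per bucket makes the chosen row deterministic, instead of letting the server pick an arbitrary row from each group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER PRIMARY KEY, value1 REAL, value2 REAL)")
# hypothetical data: one reading every 30 seconds for 20 minutes
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                 [(t, t * 0.1, t * 0.2) for t in range(0, 1200, 30)])

# One row per 240 s bucket overall, plus one row per 60 s bucket inside
# the 300-600 s "zoom" window; the join back to the base table returns
# the real column values for each picked timestamp.
sql = """
SELECT r.ts, r.value1, r.value2
FROM readings r
INNER JOIN (
    SELECT MIN(ts) AS ts FROM readings GROUP BY ts / 240
    UNION
    SELECT MIN(ts) FROM readings
    WHERE ts BETWEEN 300 AND 600
    GROUP BY ts / 60
) b ON r.ts = b.ts
ORDER BY r.ts
"""
picked = [row[0] for row in conn.execute(sql)]
print(picked)  # [0, 240, 300, 360, 420, 480, 540, 600, 720, 960]
```

Note that the UNION (without ALL) also deduplicates timestamps that fall on both grids, such as 480 here.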
Related
I am having performance issues with a query. I have 21 million records across the tables, and 2 of the tables I'm querying here have 8 million each; individually, they are very quick. But I've written a query that, in my opinion, isn't very good; it's the only way I know how to do it, though.
This query takes 65 seconds. I need to get it under 1 second, and I think that's possible if I don't use all the SELECT subqueries, but once again, I am not sure how else to do it with my SQL knowledge.
Database server version is MariaDB 10.6.
SELECT
pa.`slug`,
(
SELECT
SUM(`impressions`)
FROM `rh_pages_gsc_country`
WHERE `page_id` = pa.`page_id`
AND `country` = 'aus'
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as au_impressions,
(
SELECT
SUM(`clicks`)
FROM `rh_pages_gsc_country`
WHERE `page_id` = pa.`page_id`
AND `country` = 'aus'
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as au_clicks,
(
SELECT
COUNT(`keywords_id`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as keywords,
(
SELECT
AVG(`position`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as avg_pos,
(
SELECT
AVG(`ctr`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN
(
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
)
) as avg_ctr
FROM `rh_pages` pa
WHERE pa.`site_id` = 13
ORDER BY au_impressions DESC, keywords DESC, slug DESC
I don't think the table structure is needed here, as it's basically shown in the query, but here is a photo of the constraints and table types.
Anyone who can help is greatly appreciated.
Do NOT normalize any column that will be regularly used in a "range scan", such as date. The following is terribly slow:
AND `date_id` IN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH
AND NOW() )
It also consumes extra space to have a BIGINT (8 bytes) pointing to a DATE (3 bytes).
Once you move the date to the various tables, the subqueries simplify, such as
SELECT AVG(`position`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date_id` IN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH
AND NOW() )
becomes
SELECT AVG(`position`)
FROM `rh_pages_gsc_keywords`
WHERE `page_id` = pa.`page_id`
AND `date` >= NOW() - INTERVAL 12 MONTH
I'm assuming that nothing after "NOW" has yet been stored.
If there are dates in the future, then add
AND `date` < NOW()
Each table will probably need a new index, such as
INDEX(page_id, date) -- in that order
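As a rough SQLite sketch of the simplification (via Python's sqlite3, with made-up rows, a fixed reference date standing in for NOW() so the result is reproducible, and the suggested (page_id, date) index in place):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rh_pages_gsc_keywords (page_id INTEGER, date TEXT, position REAL);
CREATE INDEX idx_page_date ON rh_pages_gsc_keywords (page_id, date);
INSERT INTO rh_pages_gsc_keywords VALUES
  (1, '2024-01-10', 4.0),
  (1, '2021-01-10', 9.0),   -- outside the 12-month window
  (2, '2024-01-10', 2.0);
""")

# With the date stored in the detail table, the IN-subquery collapses to a
# plain range predicate that the (page_id, date) index can serve directly.
avg_pos = conn.execute("""
SELECT AVG(position)
FROM rh_pages_gsc_keywords
WHERE page_id = 1
  AND date >= date('2024-06-01', '-12 months')
""").fetchone()[0]
print(avg_pos)  # 4.0
```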
(Yes, the "JOIN" suggestion by others is a good one. It's essentially orthogonal to my suggestions above and below.)
After you have made those changes, if the performance is not good enough, we can discuss Summary Tables.
Your query is aggregating (summarizing) rows from two different detail tables, rh_pages_gsc_country and rh_pages_gsc_keywords, and doing so for a particular date range. And it has a lot of correlated subqueries.
The first steps in your path to better performance are
Converting your correlated subqueries to independent subqueries, then JOINing them.
Writing one subquery for each detail table, rather than one for each column you need summarized.
You mentioned you've been struggling with this. The concept I hope you learn from this answer is this: you can often refactor away your correlated subqueries if you can come up with independent subqueries that give the same results, and then join them together. If you mention subqueries in your SELECT clause -- SELECT ... (SELECT whatever) whatever ... -- you probably have an opportunity to do this refactoring.
Here goes. First you need a subquery for your date range. You have this one right, just repeated.
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
Next you need a subquery for rh_pages_gsc_country. It is a modification of what you have. We'll fetch both SUMs in one subquery.
SELECT SUM(`impressions`) impressions,
SUM(`clicks`) clicks,
page_id, date_id
FROM `rh_pages_gsc_country`
WHERE `country` = 'aus'
GROUP BY page_id, date_id
See how this goes? This subquery yields a virtual table with exactly one row for every combination of page_id and date_id, containing the number of impressions and the number of clicks.
Next, let's join the subqueries together in a main query. This yields some columns of your result set.
SELECT pa.slug, country.impressions, country.clicks
FROM rh_pages pa
JOIN (
SELECT SUM(`impressions`) impressions,
SUM(`clicks`) clicks,
page_id, date_id
FROM `rh_pages_gsc_country`
WHERE `country` = 'aus' -- constant for country code
GROUP BY page_id, date_id
) country ON country.page_id = pa.page_id
JOIN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
) dates ON dates.date_id = country.date_id
WHERE pa.site_id = 13 -- constant for page id
ORDER BY country.impressions DESC
This runs through the rows of rh_pages_gsc_dates and rh_pages_gsc_country just once to satisfy your query. So, faster.
Finally let's do the same thing for your rh_pages_gsc_keywords table's summary.
SELECT pa.slug, country.impressions, country.clicks,
keywords.keywords, keywords.avg_pos, keywords.avg_ctr
FROM rh_pages pa
JOIN (
SELECT SUM(`impressions`) impressions,
SUM(`clicks`) clicks,
page_id, date_id
FROM `rh_pages_gsc_country`
WHERE `country` = 'aus' -- constant for country code
GROUP BY page_id, date_id
) country ON country.page_id = pa.page_id
JOIN (
SELECT COUNT(`keywords_id`) keywords,
AVG(`position`) avg_pos,
AVG(`ctr`) avg_ctr,
page_id, date_id
FROM `rh_pages_gsc_keywords`
GROUP BY page_id, date_id
) keywords ON keywords.page_id = pa.page_id
JOIN (
SELECT `date_id`
FROM `rh_pages_gsc_dates`
WHERE `date` BETWEEN NOW() - INTERVAL 12 MONTH AND NOW()
) dates ON dates.date_id = country.date_id
AND dates.date_id = keywords.date_id
WHERE pa.site_id = 13 -- constant for page id
ORDER BY impressions DESC, keywords DESC, slug DESC
This will almost certainly be faster than what you have now. If it's fast enough, great. If not, please don't hesitate to ask another question for help, tagging it query-optimization. We will need to see your table definitions, your index definitions, and the output of EXPLAIN. Please read this before asking a followup question.
I did not, repeat not, debug any of this. That's up to you.
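A small, self-contained illustration of the refactoring, using Python's sqlite3 and toy stand-ins for the tables (names and data are invented): both forms return the same totals, but the joined form scans the detail table once instead of once per summarized column per page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, slug TEXT);
CREATE TABLE stats (page_id INTEGER, country TEXT, impressions INTEGER, clicks INTEGER);
INSERT INTO pages VALUES (1, 'home'), (2, 'about');
INSERT INTO stats VALUES
  (1, 'aus', 10, 1), (1, 'aus', 20, 2), (1, 'usa', 99, 9),
  (2, 'aus', 5, 1);
""")

# Correlated form: one subquery per summarized column, re-run for every page row.
correlated = conn.execute("""
SELECT p.slug,
       (SELECT SUM(impressions) FROM stats
         WHERE page_id = p.page_id AND country = 'aus'),
       (SELECT SUM(clicks) FROM stats
         WHERE page_id = p.page_id AND country = 'aus')
FROM pages p ORDER BY p.slug
""").fetchall()

# Refactored form: one independent grouped subquery, scanned once, then joined.
joined = conn.execute("""
SELECT p.slug, s.impressions, s.clicks
FROM pages p
JOIN (
    SELECT page_id, SUM(impressions) AS impressions, SUM(clicks) AS clicks
    FROM stats WHERE country = 'aus'
    GROUP BY page_id
) s ON s.page_id = p.page_id
ORDER BY p.slug
""").fetchall()

print(correlated == joined)  # True
print(joined)  # [('about', 5, 1), ('home', 30, 3)]
```

One caveat worth noting: the correlated form returns NULL for a page with no matching detail rows, while the inner join drops that page entirely; a LEFT JOIN restores those rows if you need them.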
I have a database that stores the temperature reading of each sensor, along with the sensor ID and the date the reading was taken.
SELECT DISTINCT `date` FROM `temperatureData` ORDER BY `date` ASC LIMIT 10
This allows me to select the last 10 readings that are going to be plotted in a chart.
There are up to 40 sensor readings for each date.
I tried the following:
SELECT `date`, `sensor`, `temp`
FROM `temperatureData`
WHERE `date` = (
SELECT DISTINCT `date` FROM `temperatureData` ORDER BY `date` ASC LIMIT 10
)
Can anyone assist me with how to select all the readings for the last 10 dates returned?
Thanks in advance.
Boris
You just need IN instead of =:
SELECT `date`, `sensor`, `temp`
FROM `temperatureData`
WHERE `date` IN (
SELECT DISTINCT `date` FROM `temperatureData` ORDER BY `date` DESC LIMIT 10
)
NB: if you want the readings of the last 10 dates, you probably want to ORDER BY date DESC instead of ASC. I changed that too.
In MySQL 8.0, this could also be rewritten with the window function DENSE_RANK():
SELECT `date`, `sensor`, `temp`
FROM (
SELECT
`date`,
`sensor`,
`temp`,
DENSE_RANK() OVER(ORDER BY `date` DESC) rn
FROM `temperatureData`
) t
WHERE rn <= 10
Edit
To work around the limitation of MySQL 5.7 not supporting LIMIT in subqueries used with IN, you can use a join instead:
SELECT t.`date`, t.`sensor`, t.`temp`
FROM `temperatureData` t
INNER JOIN (
SELECT DISTINCT `date` FROM `temperatureData` ORDER BY `date` DESC LIMIT 10
) d ON d.`date` = t.`date`
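Here's the window-function variant as a runnable miniature against SQLite (via Python's sqlite3; SQLite supports window functions since 3.25), with a toy table of three dates and two sensors, keeping every reading from the two most recent dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperatureData (date TEXT, sensor TEXT, temp REAL)")
# hypothetical data: three dates, two sensors per date
conn.executemany("INSERT INTO temperatureData VALUES (?, ?, ?)",
                 [(f"2023-01-0{d}", s, 20.0 + d)
                  for d in (1, 2, 3) for s in ("a", "b")])

# DENSE_RANK numbers the distinct dates from newest (1) to oldest, so
# rn <= 2 keeps all readings from the two most recent dates.
result = conn.execute("""
SELECT date, sensor, temp
FROM (
    SELECT date, sensor, temp,
           DENSE_RANK() OVER (ORDER BY date DESC) AS rn
    FROM temperatureData
) t
WHERE rn <= 2
ORDER BY date, sensor
""").fetchall()
print(result)
```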
Since my web-host has updated the MySQL server version, my old SQL Query is not working anymore:
select COUNT(ID) AS Anzahl,
DAY(STR_TO_DATE(created_at, '%Y-%m-%d')) AS Datum
from `leads`
where `created_at` >= 2018-02-01
and `created_at` <= 2018-02-15
and `shopID` = 20
group by DAY(created_at)
order by DAY(created_at) asc
That means I have to write a query that satisfies full GROUP BY mode. I have already read this article, but I don't really get it.
I should name all columns which are unique
That's what I don't get. If I want to count the ID, I cannot GROUP BY ID, because in that case my count would always be 1. Could anybody please explain to me how full GROUP BY works and what my statement would look like as a full GROUP BY statement?
Just use the same expression in the select as in the group by:
select COUNT(ID) AS Anzahl, DAY(created_at) AS Datum
from `leads` l
where `created_at` >= '2018-02-01' and
`created_at` <= '2018-02-15' and
`shopID` = 20
group by DAY(created_at)
order by DAY(created_at) asc;
You also need single quotes around the date constants.
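To illustrate the rule, here's a tiny runnable version in SQLite (via Python's sqlite3), with strftime('%d', ...) standing in for MySQL's DAY() and invented rows; every selected column is either the grouped expression itself or an aggregate, which is exactly what ONLY_FULL_GROUP_BY demands:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (ID INTEGER, created_at TEXT, shopID INTEGER)")
conn.executemany("INSERT INTO leads VALUES (?, ?, ?)", [
    (1, '2018-02-01', 20), (2, '2018-02-01', 20),
    (3, '2018-02-02', 20), (4, '2018-02-02', 21),
])

# The day expression in SELECT is the same expression as in GROUP BY;
# COUNT(ID) is an aggregate, so no non-grouped column leaks through.
rows = conn.execute("""
SELECT COUNT(ID) AS Anzahl, strftime('%d', created_at) AS Datum
FROM leads
WHERE created_at >= '2018-02-01' AND created_at <= '2018-02-15'
  AND shopID = 20
GROUP BY strftime('%d', created_at)
ORDER BY Datum
""").fetchall()
print(rows)  # [(2, '01'), (1, '02')]
```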
Your SELECT and GROUP BY columns don't match. You should make them the same. Try the query below:
select COUNT(ID) AS Anzahl,
DAY(STR_TO_DATE(created_at, '%Y-%m-%d')) AS Datum
from `leads`
where `created_at` >= '2018-02-01'
and `created_at` <= '2018-02-15'
and `shopID` = 20
group by Datum
order by Datum asc
I have this query I use to get statistics of blogs in our own tracking system.
I use a UNION SELECT over 2 tables, as we aggregate data daily into one table and keep today's data in another table.
I want to show the last 10 months of traffic. This query does that, but if there is no traffic in a specific month, that row is missing from the result.
I have previously used a calendar table in MySQL to join against to avoid that, but I'm simply not skilled enough to rewrite this query to join against that calendar table.
The calendar table has one field called "datefield", which is a date in the format YYYY-MM-DD.
This is the current query I use:
SELECT FORMAT(SUM(`count`),0) as `count`, DATE(`date`) as `date`
FROM
(
SELECT count(distinct(uniq_id)) as `count`, `timestamp` as `date`
FROM tracking
WHERE `timestamp` > now() - INTERVAL 1 DAY AND target_bid = 92
group by `datestamp`
UNION ALL
select sum(`count`),`datestamp` as `date`
from aggregate_visits
where `datestamp` > now() - interval 10 month
and target_bid = 92
group by `datestamp`
) a
GROUP BY MONTH(date)
Something like this?
select sum(COALESCE(t.`count`,0)),s.date as `date`
from DateTable s
LEFT JOIN (SELECT * FROM aggregate_visits
where `datestamp` > now() - interval 10 month
and target_bid = 92) t
ON(s.date = t.datestamp)
group by s.date
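Here's a compressed SQLite sketch of the calendar-table trick (monthly rows and invented counts; in the real query the calendar would have one row per day). The filter on target_bid has to live in the joined subquery or the ON clause, not in WHERE, or the LEFT JOIN silently turns back into an inner join and the empty months disappear again:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (datefield TEXT);
CREATE TABLE aggregate_visits (datestamp TEXT, count INTEGER, target_bid INTEGER);
INSERT INTO calendar VALUES ('2023-01-01'), ('2023-02-01'), ('2023-03-01');
-- no traffic at all in February
INSERT INTO aggregate_visits VALUES ('2023-01-01', 5, 92), ('2023-03-01', 7, 92);
""")

# LEFT JOIN keeps every calendar row; COALESCE turns the missing month's
# NULL sum into a zero.
rows = conn.execute("""
SELECT c.datefield, COALESCE(SUM(t.count), 0) AS count
FROM calendar c
LEFT JOIN aggregate_visits t
       ON t.datestamp = c.datefield AND t.target_bid = 92
GROUP BY c.datefield
ORDER BY c.datefield
""").fetchall()
print(rows)  # [('2023-01-01', 5), ('2023-02-01', 0), ('2023-03-01', 7)]
```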
I need to select the first value for every hour from my DB, but I don't know how to reverse the order in the GROUP BY statement.
How can I rewrite my query (right now it selects the last value in each hour)?
SELECT HOUR(`time`) as hour, mytable.*
FROM mytable
WHERE DATE(`time`) ="2015-09-12" GROUP BY HOUR(`time`) ORDER BY `time` ASC;
This query gave me the expected result:
SELECT HOUR(`time`) as hour, sortedTable.* FROM
(SELECT electrolysis.* FROM electrolysis
WHERE DATE(`time`)='2015-09-12' ORDER BY `time`) as sortedTable
GROUP BY HOUR(`time`);
You can just select the MIN `time` per hour in a subquery; try using this query:
SELECT * FROM mytable WHERE `time` IN (
SELECT MIN(`time`)
FROM mytable
WHERE DATE(`time`) = "2015-09-12"
GROUP BY HOUR(`time`) ) ORDER BY `time` ASC;
You can do something like this:
SELECT sub0.min_time,
mytable.*
FROM mytable
INNER JOIN
(
SELECT MIN(`time`) AS min_time
FROM mytable
GROUP BY HOUR(`time`)
) sub0
ON mytable.`time` = sub0.min_time
WHERE DATE(`time`) ="2015-09-12"
ORDER BY `time` ASC
This is using a sub query to get the smallest time in each hour. This is then joined back against your main table on this min time to get the record that has this time.
Note that there is a potential problem here if there are multiple records that share the same time as the smallest one for an hour. There are ways around this, but that will depend on your data (eg, if you have a unique id field which is always ascending with time then you could select the min id for each hour and join based on that)
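A runnable miniature of the min-time join in SQLite (via Python's sqlite3, with strftime('%H', ...) standing in for HOUR() and a handful of invented rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (time TEXT, value REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [
    ("2015-09-12 08:05:00", 1.0),
    ("2015-09-12 08:40:00", 2.0),
    ("2015-09-12 09:10:00", 3.0),
    ("2015-09-12 09:55:00", 4.0),
])

# The subquery finds the earliest timestamp in each hour; joining back on
# that timestamp fetches the full row that carries it.
rows = conn.execute("""
SELECT m.time, m.value
FROM mytable m
INNER JOIN (
    SELECT MIN(time) AS min_time
    FROM mytable
    WHERE date(time) = '2015-09-12'
    GROUP BY strftime('%H', time)
) sub ON m.time = sub.min_time
ORDER BY m.time
""").fetchall()
print(rows)  # [('2015-09-12 08:05:00', 1.0), ('2015-09-12 09:10:00', 3.0)]
```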
You can use the query below, which is more optimized; just make sure the `time` field is indexed.
SELECT HOUR(m.time), m.*
FROM mytable AS m
JOIN
(
SELECT MIN(`time`) AS tm
FROM mytable
WHERE `time` >= '2015-09-12 00:00:00' AND `time` <= '2015-09-12 23:59:59'
GROUP BY HOUR(`time`)
) AS a ON m.time=a.tm
GROUP BY HOUR(m.time)
ORDER BY m.time;