Why does my SQL query visit all rows and run so slowly? - MySQL

The table 'reading' contains readings taken every 40s for today. The query returns averages over 180s periods. 'time_stamp' is indexed. The query below returns a reasonable number of rows (a few hundred) but visits ALL rows and gets slower as the table grows. The WHERE clause does not seem to restrict it to today's rows only.
EXPLAIN SELECT
DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
AVG(temp_c)
FROM reading
WHERE DATE(time_stamp) = CURDATE()
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180)
Table schema:
CREATE TABLE reading (
id bigint(20) NOT NULL AUTO_INCREMENT,
time_stamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
temp_c float NOT NULL,
pressure_hpa float NOT NULL,
wind_speed_kt int(11) NOT NULL,
wind_dir_degree int(11) NOT NULL,
rain_mm float NOT NULL,
rain_day_mm float NOT NULL,
wind_gust_kt int(11) NOT NULL,
humidity float DEFAULT NULL,
PRIMARY KEY (id),
KEY time_stamp (time_stamp),
KEY time_stamp_idx (time_stamp)
) ENGINE=InnoDB AUTO_INCREMENT=1747097 DEFAULT CHARSET=latin1;

When the above query is executed, the MySQL optimizer is not interested in an index scan (possibly because of the cost estimate); instead, a full table scan is initiated. The issue appears to be caused by WHERE DATE(time_stamp) = CURDATE().
After changing your WHERE clause to time_stamp >= CURDATE(), I saw the index being used and far fewer rows fetched, avoiding the full scan.
Hence, your final query will be:
EXPLAIN SELECT
DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
AVG(temp_c)
FROM reading
WHERE time_stamp >= CURDATE()
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180);
I suspect DATE(time_stamp) isn't efficient with an index: wrapping an indexed column in a function makes the predicate non-sargable, so the index can't be used for filtering. A similar topic was discussed here (see ypercube's answer).
The query could be improved further by choosing an alternative to round(UNIX_TIMESTAMP(time_stamp) / 180), since an expression like UNIX_TIMESTAMP(time_stamp) in the GROUP BY can't use an index, but I won't explore that here.
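A quick way to see the difference (a sketch; the exact EXPLAIN output depends on your data and MySQL version):
-- Non-sargable: the function call hides time_stamp from the index,
-- so EXPLAIN typically reports type=ALL (a full table scan).
EXPLAIN SELECT AVG(temp_c) FROM reading WHERE DATE(time_stamp) = CURDATE();
-- Sargable: a bare indexed column compared against a constant,
-- so EXPLAIN typically reports type=range with key=time_stamp.
EXPLAIN SELECT AVG(temp_c) FROM reading WHERE time_stamp >= CURDATE();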
Hope this helps!

Related

How should I properly index the MySQL column when dealing with sorting?

I have a log table, and I find it becomes very slow when I sort it.
Here's my database table structure in short.
CREATE TABLE `webhook_logs` (
`ID` bigint(20) UNSIGNED NOT NULL,
`event_id` bigint(20) UNSIGNED DEFAULT NULL,
`object_id` bigint(20) UNSIGNED DEFAULT NULL,
`occurred_at` bigint(20) UNSIGNED DEFAULT NULL,
`payload` text COLLATE utf8mb4_unicode_520_ci,
`priority` bigint(1) UNSIGNED DEFAULT NULL,
`status` varchar(32) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;
ALTER TABLE `webhook_logs`
ADD PRIMARY KEY (`ID`),
ADD KEY `event_id` (`event_id`),
ADD KEY `object_id` (`object_id`),
ADD KEY `occurred_at` (`occurred_at`),
ADD KEY `priority` (`priority`),
ADD KEY `status` (`status`);
There are 5M + records.
When I do:
SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 ORDER BY priority ASC LIMIT 100
it takes about 5 seconds to return the records.
However, when I remove the sorting and just do:
SELECT * FROM `webhook_logs` WHERE status = 'pending' AND occurred_at < 1652838913000 LIMIT 100
it takes only 0.0022 seconds.
I've been playing around with the indexes to see if the time improves, but with no luck. I wonder if I'm doing something wrong here.
I tried creating a composite index on (occurred_at, priority), and one on all of (occurred_at, priority, status). Neither improved the speed; the query still takes around 5 seconds. If it helps, the server is running MySQL 5.7.12.
Any help will be appreciated. Thanks.
A plain index can't solve your problem on its own. In your query, the DB must first find all records where occurred_at < 1652838913000 and then sort them to get the records with the highest priority. No index can eliminate that sort.
But there are solutions, because priority only ever takes a few distinct values. You can create an index (status, priority, occurred_at) and then write a query like this:
select * from (
    (SELECT * FROM `webhook_logs`
     WHERE status = 'pending' AND priority = 1 AND occurred_at < 1652838913000
     LIMIT 100)
    UNION ALL  -- one branch per priority value; rows are already distinct, so ALL avoids UNION's dedup pass
    (SELECT * FROM `webhook_logs`
     WHERE status = 'pending' AND priority = 2 AND occurred_at < 1652838913000
     LIMIT 100)
) a ORDER BY priority ASC LIMIT 100
In this query, the DB will use the index for each sub-query of the union, and then sort only a very small number of rows. The result can be returned in under 0.1 seconds.
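For completeness, the index that answer relies on could be created like this (a sketch; the index name is illustrative):
-- The two equality columns (status, priority) come first, so each
-- UNION branch above is a single index range scan on occurred_at.
ALTER TABLE `webhook_logs`
  ADD INDEX `idx_status_priority_occurred` (`status`, `priority`, `occurred_at`);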
You don't need BIGINT for most of those columns. That datatype takes 8 bytes. There are much smaller datatypes. priority could be TINYINT UNSIGNED (1 byte, range of 0..255). status could be changed to a 1-byte ENUM. Such changes will shrink the data and index sizes, hence speed up most operations somewhat.
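A sketch of those datatype changes; the ENUM values here are assumptions, so list your real statuses before converting:
-- Shrinks priority to 1 byte and status to a 1-byte ENUM.
ALTER TABLE `webhook_logs`
  MODIFY `priority` TINYINT UNSIGNED DEFAULT NULL,
  MODIFY `status` ENUM('pending','sent','failed') DEFAULT NULL;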
Replace INDEX(status) with
INDEX(status, occurred_at, priority, id) -- in this order
Then your query will run somewhat faster, depending on the distribution of the data.
This might run even faster:
SELECT w.*
FROM (
SELECT id
FROM `webhook_logs`
WHERE status = 'pending'
AND occurred_at < 1652838913000
ORDER BY priority ASC
LIMIT 100
) AS t
JOIN webhook_logs AS w USING(id) -- alias w added so that w.* in the outer SELECT resolves
ORDER BY priority ASC -- yes, this is repeated
;
That is because it can pick the 100 ids from the recommended index much faster, since that index is "covering", and then do just 100 lookups to fetch "*".

How can I optimize this query with an index?

I have this query:
select `price`, `asset_id`
from `history_average_pairs`
where `currency_id` = 1
and date(`created_at`) >= DATE_SUB(NOW(), INTERVAL 7 DAY)
group by hour(created_at), date(created_at), asset_id
order by `created_at` asc
And this table:
CREATE TABLE IF NOT EXISTS history_average_pairs (
id bigint(20) unsigned NOT NULL,
asset_id bigint(20) unsigned NOT NULL,
currency_id bigint(20) unsigned NOT NULL,
market_cap bigint(20) NOT NULL,
price double(20,6) NOT NULL,
volume bigint(20) NOT NULL,
circulating bigint(20) NOT NULL,
change_1h double(8,2) NOT NULL,
change_24h double(8,2) NOT NULL,
change_7d double(8,2) NOT NULL,
created_at timestamp NOT NULL DEFAULT current_timestamp(),
updated_at timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
total_supply bigint(20) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE history_average_pairs
ADD PRIMARY KEY (id),
ADD KEY history_average_pairs_currency_id_asset_id_foreign (currency_id,asset_id);
ALTER TABLE history_average_pairs
MODIFY id bigint(20) unsigned NOT NULL AUTO_INCREMENT;
It contains more than 10,000,000 rows, and the query takes:
Showing rows 0 - 24 (32584 total, Query took 27.8344 seconds.)
But without currency_id = 1, it takes about 4 seconds.
UPDATE 1
Okay, I updated the key from (currency_id, asset_id) to (currency_id, asset_id, created_at), and now it takes:
Showing rows 0 - 24 (32784 total, Query took 6.4831 seconds.)
That's much faster. Any suggestions to make it even faster?
The GROUP BY is there to take only the first row for every hour.
For example:
19:01:10
19:02:14
19:23:15
I need only 19:01:10
You can rephrase the filtering predicate to avoid using expressions on columns. For example:
select max(`price`) as max_price, `asset_id`
from `history_average_pairs`
where `currency_id` = 1
and created_at >= date_add(curdate(), interval - 7 day)
group by hour(created_at), date(created_at), asset_id
order by `created_at` asc
Then, this query could be much faster if you added the index:
create index ix1 on `history_average_pairs` (`currency_id`, created_at);
You must make the test "sargable"; change
date(`created_at`) >= DATE_SUB(NOW(), INTERVAL 7 DAY)
to
created_at >= CURDATE() - INTERVAL 7 DAY
Then the optimal index is
INDEX(currency_id, -- 1st because of "=" test
created_at, -- 2nd to finish out WHERE
asset_id) -- only for "covering"
When designing an index, it is usually best to handle the WHERE first.
The GROUP BY cannot use the index. Did you really want the hour first?
"I need only 19:01:10" is unclear, so I have not factored that in. Where's the date? Where's the asset_id? See "only_full_group_by". Do you need "groupwise max"?
Making the ORDER BY have the same columns as the GROUP BY avoids a sort. (In your query, the order may be slightly different, but it probably does not matter.)
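Putting these suggestions together, the reworked query might look like the sketch below. MAX(price) is borrowed from the earlier answer as a stand-in; "first row per hour" strictly calls for a groupwise-min query instead, and the grouping is date-first per the question about hour order:
-- Uses the suggested (currency_id, created_at, asset_id) index to
-- resolve the WHERE with a range scan; the GROUP BY expressions
-- still require grouping work, but ORDER BY matching them avoids
-- an extra sort pass.
SELECT MAX(price) AS max_price, asset_id
FROM history_average_pairs
WHERE currency_id = 1
  AND created_at >= CURDATE() - INTERVAL 7 DAY
GROUP BY DATE(created_at), HOUR(created_at), asset_id
ORDER BY DATE(created_at), HOUR(created_at), asset_id;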
Datatype issues...
BIGINT takes 8 bytes; INT takes only 4 bytes and is usually big enough. Shrinking the table provides some speed.
double(8,2) takes 8 bytes -- Don't use (m,n) on FLOAT or DOUBLE; it adds an extra rounding. Perhaps you meant DECIMAL(8,2), which takes 4 bytes.
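As an illustration of the datatype advice (a sketch; verify the value ranges and any foreign keys on these columns before shrinking):
-- INT UNSIGNED halves the 8-byte BIGINTs; DECIMAL(8,2) replaces
-- double(8,2). The other change_* columns could be changed the same way.
ALTER TABLE history_average_pairs
  MODIFY asset_id INT UNSIGNED NOT NULL,
  MODIFY currency_id INT UNSIGNED NOT NULL,
  MODIFY change_24h DECIMAL(8,2) NOT NULL;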

MySQL select optimization

A table with a few million rows, something like this:
CREATE TABLE `my_table` (
`CONTVISITID` bigint(20) NOT NULL AUTO_INCREMENT,
`NODE_ID` bigint(20) DEFAULT NULL,
`CONT_ID` bigint(20) DEFAULT NULL,
`NODE_NAME` varchar(50) DEFAULT NULL,
`CONT_NAME` varchar(100) DEFAULT NULL,
`CREATE_TIME` datetime DEFAULT NULL,
`HITS` bigint(20) DEFAULT NULL,
`UPDATE_TIME` datetime DEFAULT NULL,
`CLIENT_TYPE` varchar(20) DEFAULT NULL,
`TYPE` bigint(1) DEFAULT NULL,
`PLAY_TIMES` bigint(20) DEFAULT NULL,
`FIRST_PUBLISH_TIME` bigint(20) DEFAULT NULL,
PRIMARY KEY (`CONTVISITID`),
KEY `cont_visit_contid` (`CONT_ID`),
KEY `cont_visit_createtime` (`CREATE_TIME`),
KEY `cont_visit_publishtime` (`FIRST_PUBLISH_TIME`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=57676834 DEFAULT CHARSET=utf8
I have a query that, starting from a flat select, I managed to optimize to the following:
SELECT a.cont_id, SUM(a.hits)
FROM (
SELECT cont_id,hits,type,first_publish_time
FROM my_table
where create_time > '2017-03-10 00:00:00'
AND first_publish_time>1398310263000
AND type=1) as a group by a.cont_id
order by sum(HITS) DESC LIMIT 10;
Can this be further optimized?
Edit:
I started with a flat select, as I mentioned before; by "flat select" I mean a single SELECT without a composite (derived) subquery like my current query has. A single flat select, such as the one suggested in a response, is twice as slow, so it is not viable in my case.
Edit 2: A DBA friend suggested changing the query to this:
SELECT a.cont_id, SUM(a.hits)
FROM (
SELECT cont_id,hits
FROM my_table
where create_time > '2017-03-10 00:00:00'
AND first_publish_time>1398310263000
AND type=1) as a group by a.cont_id
order by sum(HITS) DESC LIMIT 10;
As I do not need the extra fields (type, first_publish_time) and the tmp table is smaller, this makes the query faster, at about 1/4 of the total time of the fastest version I had. He also suggested adding a composite index on (create_time, cont_id, hits); he says it will give really good performance. I have not done that yet, as this is a production DB and the ALTER might affect replication. I will post results once done.
Add these two indexes:
INDEX(type, first_publish_time)
INDEX(type, create_time)
Then do
SELECT cont_id, SUM(hits) AS tot_hits
FROM my_table
where create_time > '2017-03-10 00:00:00'
AND first_publish_time > 1398310263000
AND type = 1
group by cont_id
order by tot_hits DESC
LIMIT 10;
Start the index with any "=" filters (type, in this case); after that you get one chance to use a range.
The reason for 2 indexes: the optimizer will look at the statistics and decide which looks better for the values given.
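As a concrete sketch (index names are illustrative):
-- Both candidates begin with the "=" column TYPE; each finishes
-- with one of the two range columns, and the optimizer picks per query.
ALTER TABLE my_table
  ADD INDEX idx_type_pubtime (`TYPE`, `FIRST_PUBLISH_TIME`),
  ADD INDEX idx_type_createtime (`TYPE`, `CREATE_TIME`);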
Consider shrinking the BIGINTs (8 bytes) to some smaller INT type. Saving space will help speed, especially if the table is too big to be cached.
For further discussion, please provide EXPLAIN SELECT ...;.

MySQL: best practice for querying last value before a certain date in a time series

I have the following table in MySQL:
CREATE TABLE `history` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`code` CHAR(32) NOT NULL,
`value` FLOAT NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `timestamp_code` (`timestamp`, `code`),
INDEX `code` (`code`),
INDEX `timestamp` (`timestamp`)
) COLLATE='utf8_general_ci' ENGINE=InnoDB;
I would like to know the best practice for accessing, as efficiently as possible, the last available value before a certain date for a certain set of codes.
So far I came up with the following query:
SELECT h.* FROM history h
JOIN (
SELECT code, MAX(timestamp) as 'last_ts'
FROM history WHERE
timestamp < '2015-09-04 13:50:00' AND
code IN ('119813249', '12087792', '12087797',
'127012151', '131014335', '131014378',
'132757371', '15016059', '15016062',
'150250238', '153462747', '155802712',
'156974389', '162277696', '166330444',
'166483001', '167220356', '167264923',
'167867931', '172283682', '177539478',
'177583937', '177648754', '177649011',
'187532416', '189230667', '70273253',
'70342790', '79342386', '82460282',
'98693280', '98693380')
GROUP BY code) last_price
ON last_price.last_ts = h.timestamp
AND last_price.code = h.code
The query above works, but becomes slow as the number of entries in the table grows (100'000'000 rows).
You can download sample data to populate the table.
Create an index on (code, timestamp) rather than (timestamp, code). This lets MySQL narrow down the codes first and then look for the max timestamp within each code, which should be much faster. Use EXPLAIN to verify that the index is used.
And if you create that index, you should not have to modify your query.
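A minimal sketch of that index (the name is illustrative):
-- With (code, timestamp), each code's entries sit together in timestamp
-- order, so MAX(timestamp) below the cutoff is a range-end lookup per code.
ALTER TABLE `history` ADD INDEX `code_timestamp` (`code`, `timestamp`);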

Speed up this MySQL query

I have a query which is getting slower and slower because there are more and more records in my table. So I'm trying to speed things up.
Database size:
Records: 1,200,000
Data: 22.9 MiB
Index: 46.8 MiB
Total: 69.7 MiB
The purpose of the query is counting the number of records that exist that match the conditions. The conditions are a date (current date) and a status number. See query below:
SELECT
COUNT(id) AS total
FROM
order_process
WHERE
DATE(datetime) = CURDATE() AND
status = '7';
At the moment, this query takes 800 ms, and I need to run it multiple times with different dates. They are all in the same script, so script execution currently exceeds 3 seconds. How can I speed this up?
What I have already done:
Created indexes (indexes on status and datetime; neither speeds up the query).
Tested the InnoDB engine (which was slower; this table is mostly read).
To make it complete, below the current table setup.
CREATE TABLE IF NOT EXISTS `order_process` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`order_id` int(11) NOT NULL,
`status` int(11) NOT NULL,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`remark` text NOT NULL,
PRIMARY KEY (`id`),
KEY `orderid` (`order_id`),
KEY `datetime` (`datetime`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When you use the DATE() function on a TIMESTAMP/DATETIME column, the index can't be used, even if the column is indexed.
So you need to construct the query as
where
datetime >= concat(CURDATE(),' 00:00:00')
and datetime <= concat(CURDATE(),' 23:59:59')
and status = '7'
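A slightly safer equivalent is a half-open interval, which avoids relying on the 23:59:59 endpoint; a composite index is a further option (a sketch; the index name is illustrative):
-- Half-open interval: covers the full day with no endpoint edge cases.
SELECT COUNT(*) AS total
FROM order_process
WHERE status = 7
  AND `datetime` >= CURDATE()
  AND `datetime` <  CURDATE() + INTERVAL 1 DAY;
-- Optional composite index so the count can be answered from the index alone.
ALTER TABLE order_process ADD INDEX idx_status_datetime (status, `datetime`);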