I have a query that is getting slower and slower as more and more records accumulate in my table, so I'm trying to speed things up.
Database size:
Records: 1,200,000
Data: 22.9 MiB
Index: 46.8 MiB
Total: 69.7 MiB
The purpose of the query is to count the number of records that match the conditions: a date (the current date) and a status number. See the query below:
SELECT
COUNT(id) AS total
FROM
order_process
WHERE
DATE(datetime) = CURDATE() AND
status = '7';
At the moment, this query takes 800 ms, and I need to run it multiple times with different dates. These runs are all in the same script, so total script execution time now exceeds 3 seconds. How can I speed this up?
What I have already done:
Created indexes (indexes on status and on datetime individually don't speed up the query).
Tested the InnoDB engine (which was slower; this table is mostly read).
For completeness, the current table setup is below.
CREATE TABLE IF NOT EXISTS `order_process` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`order_id` int(11) NOT NULL,
`status` int(11) NOT NULL,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`remark` text NOT NULL,
PRIMARY KEY (`id`),
KEY `orderid` (`order_id`),
KEY `datetime` (`datetime`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When you apply the DATE() function to a timestamp/datetime column, MySQL cannot use an index on that column, even though the column is indexed.
So you need to construct the query as a range:
where
datetime >= concat(CURDATE(),' 00:00:00')
and datetime <= concat(CURDATE(),' 23:59:59')
and status = '7'
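For reference, here is a sketch of the original count query rewritten with that range. The composite (status, datetime) index is an addition of my own, not part of the answer above, and its name is arbitrary.
-- Optional composite index (an assumption, not from the answer); name is arbitrary.
ALTER TABLE order_process ADD INDEX idx_status_datetime (status, datetime);
SELECT
COUNT(id) AS total
FROM
order_process
WHERE
datetime >= CONCAT(CURDATE(), ' 00:00:00') AND
datetime <= CONCAT(CURDATE(), ' 23:59:59') AND
status = '7';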
Related
I can't figure out why MySQL is so slow summing fewer than 400 rows. Both u and t have indexes, and the rows are returned quickly without the SUM.
SELECT sum(t) FROM `s_table`
WHERE `u` LIKE 'dogs%'
AND `t`> 10000
Query took 3.5299 seconds.
If I remove the SUM part of the query:
SELECT t FROM `s_table`
WHERE `u` LIKE 'dogs%'
AND `t`> 10000
Query took 0.0090 seconds and returned 397 rows.
So summing 397 rows takes over 3 seconds!
Then I tried:
SELECT SUM(t)
FROM ( SELECT t
FROM s_table
WHERE `u` LIKE 'dogs%'
AND `t`> 10000
) AS total;
Query took 3.5767 seconds, so basically the same as the first query.
I'm going insane here. Why is it taking MySQL over 3 seconds to sum only 397 numbers?
Here is the table definition:
CREATE TABLE `s_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`s` varchar(100) NOT NULL,
`v` int(12) NOT NULL,
`c` float NOT NULL,
`r` int(3) NOT NULL,
`u` varchar(350) NOT NULL,
`w` int(1) NOT NULL,
`t` int(12) NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_v` (`v`),
KEY `idx_c` (`c`),
KEY `idx_r` (`r`),
KEY `idx_u` (`u`),
KEY `idx_t` (`t`),
KEY `idx_date` (`date`),
KEY `s` (`s`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
Change these
KEY `idx_u` (`u`),
KEY `idx_t` (`t`),
to
KEY `u_t` (`u`, `t`),
KEY `t_u` (`t`, `u`),
I said "change", not "add". I have seen cases where "adding" a composite index did not change the Optimizer's choice of index; I think this is a bug. Note that these 2-column indexes are "covering", which, by itself, gives a performance boost.
Don't use ROW_FORMAT=COMPRESSED; it may be expending a lot of effort on decompression to run the query.
The UI you are using seems to stop at 25 rows -- this could explain the extra speed.
What do you get from these? (They may help in analyzing things.)
How many rows "need" to be looked at by each single-column index:
SELECT SUM(u LIKE 'dogs%'), SUM(t > 10000) FROM s_table;
More details than a plain EXPLAIN:
EXPLAIN FORMAT=JSON SELECT ... -- your query
This will definitively say whether 397 versus 272580 rows were fetched:
FLUSH STATUS;
SELECT ...; -- your query
SHOW SESSION STATUS LIKE 'Handler%';
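For example, with the first query from the question substituted in:
FLUSH STATUS;
SELECT SUM(t) FROM `s_table` WHERE `u` LIKE 'dogs%' AND `t` > 10000;
SHOW SESSION STATUS LIKE 'Handler%';
If the Handler counters add up to a few hundred, only the matching rows were read; numbers in the hundreds of thousands mean a much larger range (or the whole table) was scanned.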
The table 'reading' contains readings taken every 40 seconds. The query returns averages over 180-second periods for today. 'time_stamp' is indexed. The query below returns a reasonable number of rows (a few hundred) but visits ALL rows and gets slower as the table grows. The WHERE clause does not seem to restrict it to today's rows only.
EXPLAIN SELECT
DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
AVG(temp_c)
FROM reading
WHERE DATE(time_stamp) = CURDATE()
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180)
Table schema:
CREATE TABLE reading (
id bigint(20) NOT NULL AUTO_INCREMENT,
time_stamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
temp_c float NOT NULL,
pressure_hpa float NOT NULL,
wind_speed_kt int(11) NOT NULL,
wind_dir_degree int(11) NOT NULL,
rain_mm float NOT NULL,
rain_day_mm float NOT NULL,
wind_gust_kt int(11) NOT NULL,
humidity float DEFAULT NULL,
PRIMARY KEY (id),
KEY time_stamp (time_stamp),
KEY time_stamp_idx (time_stamp)
) ENGINE=InnoDB AUTO_INCREMENT=1747097 DEFAULT CHARSET=latin1;
When the above query is executed, the MySQL optimizer does not choose an index scan (possibly because of its cost estimates); instead a full table scan is initiated, and the issue appears to be caused by WHERE DATE(time_stamp) = CURDATE().
Having changed your WHERE clause to time_stamp >= CURDATE(), I've seen the index being used and far fewer rows fetched, avoiding the full scan.
Hence, your final query will be:
EXPLAIN SELECT
DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
AVG(temp_c)
FROM reading
WHERE time_stamp >= CURDATE()
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180);
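As an aside, the same sargable pattern works for any specific day, not just CURDATE(); a sketch with an arbitrary example date:
SELECT
DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
AVG(temp_c)
FROM reading
WHERE time_stamp >= '2015-06-01'                      -- example date, not from the question
AND time_stamp < '2015-06-01' + INTERVAL 1 DAY
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180);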
I suspect DATE(time_stamp) isn't efficient with an index. A similar topic was discussed here (see ypercube's answer).
The query above could be improved further by choosing an alternative to round(UNIX_TIMESTAMP(time_stamp) / 180), since UNIX_TIMESTAMP(time_stamp) doesn't use an index, but I'm not pursuing that further.
Hope this helps!
I have a table of bitcoin transactions:
CREATE TABLE `transactions` (
`trans_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`trans_exchange` int(10) unsigned DEFAULT NULL,
`trans_currency_base` int(10) unsigned DEFAULT NULL,
`trans_currency_counter` int(10) unsigned DEFAULT NULL,
`trans_tid` varchar(20) DEFAULT NULL,
`trans_type` tinyint(4) DEFAULT NULL,
`trans_price` decimal(15,4) DEFAULT NULL,
`trans_amount` decimal(15,8) DEFAULT NULL,
`trans_datetime` datetime DEFAULT NULL,
`trans_sid` bigint(20) DEFAULT NULL,
`trans_timestamp` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`trans_id`),
KEY `trans_tid` (`trans_tid`),
KEY `trans_datetime` (`trans_datetime`),
KEY `trans_timestmp` (`trans_timestamp`),
KEY `trans_price` (`trans_price`),
KEY `trans_amount` (`trans_amount`)
) ENGINE=MyISAM AUTO_INCREMENT=6162559 DEFAULT CHARSET=utf8;
As you can see from the AUTO_INCREMENT value, the table has over 6 million entries. There will eventually be many more.
I would like to query the table to obtain max price, min price, volume and total amount traded during arbitrary time intervals. To accomplish this, I'm using a query like this:
SELECT
DATE_FORMAT( MIN(transactions.trans_datetime),
'%Y/%m/%d %H:%i:00'
) AS trans_datetime,
SUM(transactions.trans_amount) as trans_volume,
MAX(transactions.trans_price) as trans_max_price,
MIN(transactions.trans_price) as trans_min_price,
COUNT(transactions.trans_id) AS trans_count
FROM
transactions
WHERE
transactions.trans_datetime BETWEEN '2014-09-14 00:00:00' AND '2015-09-13 23:59:00'
GROUP BY
transactions.trans_timestamp DIV 86400
That should select transactions made over a year period, grouped by day (86,400 seconds).
The idea is that the timestamp field, which holds the same value as the datetime column but as a Unix timestamp (I found this faster than UNIX_TIMESTAMP(trans_datetime)), is divided by the number of seconds I want in each time interval.
The problem: the query is slow. I'm getting 4+ seconds processing time. Here is the result of EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE transactions ALL trans_datetime,trans_timestmp NULL NULL NULL 6162558 Using where; Using temporary; Using filesort
The question: is it possible to optimize this better? Is this structure or approach flawed? I have tried several approaches, and have only succeeded in making modest millisecond-type gains.
Most of the data in the table is for the last 12 months? So you need to touch most of the table? Then there is no way to speed that query up. However, you can get the same output orders of magnitude faster...
Create a summary table. It would have a DATE as the PRIMARY KEY, and the columns would be effectively the fields mentioned in your SELECT.
Once you have initially populated the summary table, maintain it by adding a new row each night for that day's transactions. More in my blog.
Then the query to get the desired output would hit this summary table (with only a few hundred rows), not the table with millions of rows.
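A minimal sketch of what such a summary table and its nightly maintenance could look like (the table and column names are illustrative, not prescribed by the answer):
-- Illustrative summary table: one row per day.
CREATE TABLE transactions_daily (
trans_date DATE NOT NULL,
trans_volume DECIMAL(20,8) NOT NULL,
trans_max_price DECIMAL(15,4) NOT NULL,
trans_min_price DECIMAL(15,4) NOT NULL,
trans_count INT UNSIGNED NOT NULL,
PRIMARY KEY (trans_date)
);
-- Nightly job: add yesterday's row.
INSERT INTO transactions_daily
SELECT DATE(trans_datetime),
SUM(trans_amount),
MAX(trans_price),
MIN(trans_price),
COUNT(*)
FROM transactions
WHERE trans_datetime >= CURDATE() - INTERVAL 1 DAY
AND trans_datetime < CURDATE()
GROUP BY DATE(trans_datetime);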
I have the following table in MySQL:
CREATE TABLE `history` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`code` CHAR(32) NOT NULL,
`value` FLOAT NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `timestamp_code` (`timestamp`, `code`),
INDEX `code` (`code`),
INDEX `timestamp` (`timestamp`)
) COLLATE='utf8_general_ci' ENGINE=InnoDB;
I would like to know the best practice for accessing, as efficiently as possible, the last available value before a certain date for a certain set of codes.
So far I came up with the following query:
SELECT h.* FROM history h
JOIN (
SELECT code, MAX(timestamp) as 'last_ts'
FROM history WHERE
timestamp < '2015-09-04 13:50:00' AND
code IN ('119813249', '12087792', '12087797',
'127012151', '131014335', '131014378',
'132757371', '15016059', '15016062',
'150250238', '153462747', '155802712',
'156974389', '162277696', '166330444',
'166483001', '167220356', '167264923',
'167867931', '172283682', '177539478',
'177583937', '177648754', '177649011',
'187532416', '189230667', '70273253',
'70342790', '79342386', '82460282',
'98693280', '98693380')
GROUP BY code) last_price
ON last_price.last_ts = h.timestamp
AND last_price.code = h.code
The query above works, but becomes slow as the number of entries in the table grows (100,000,000 rows).
You can download sample data to populate the table.
Create an index on (code, timestamp) rather than (timestamp, code). This lets MySQL narrow down the codes before looking for the max timestamp per code, and it should be much faster. Use EXPLAIN to verify that the index is used.
If you create that index, you should not have to modify your query.
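A sketch of that index (the name `code_timestamp` is arbitrary):
ALTER TABLE `history` ADD INDEX `code_timestamp` (`code`, `timestamp`);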
I'm running a table that has built up to 600 million rows and is rapidly growing, which has been slowing down queries that need to run as quickly as possible. Current schema is:
CREATE TABLE `user_history` (
`userId` int(11) NOT NULL,
`asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY `userId` (`userId`,`asin`,`dateSent`),
KEY `dateSent` (`dateSent`,`asin`),
KEY `asin` (`asin`,`dateSent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Everything I've read about partitioning suggested that this was a prime candidate for partitioning by date range. We only tend to use the last 14 days' data, but the client doesn't want to delete old data. The new schema looks like:
CREATE TABLE `user_history_partitioned` (
`userId` int(11) NOT NULL,
`asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`dateSent`,`asin`,`userId`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
PARTITION BY RANGE ( UNIX_TIMESTAMP(dateSent) ) (
PARTITION Apr2013 VALUES LESS THAN (UNIX_TIMESTAMP('2013-05-01')),
etc...
PARTITION Mar2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-04-01')),
PARTITION Apr2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-05-01')),
PARTITION May2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-06-01')),
PARTITION Future VALUES LESS THAN MAXVALUE);
The idea of the Future partition is that a REORGANIZE PARTITION run on a populated partition was taking a long time to complete, so Future will always be empty and can be reorganized into new partitions instantly. Other queries using this table have been reordered to use the primary key only, to reduce the number of indexes on the table.
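For context, a sketch of the kind of REORGANIZE PARTITION statement meant here (the month and cut-off date are illustrative):
-- Split the empty Future partition into next month's partition plus a new Future.
ALTER TABLE user_history_partitioned
REORGANIZE PARTITION Future INTO (
PARTITION Jun2014 VALUES LESS THAN (UNIX_TIMESTAMP('2014-07-01')),
PARTITION Future VALUES LESS THAN MAXVALUE
);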
The time-critical query is along the lines of:
SELECT SQL_NO_CACHE *
FROM books B
WHERE (non-relevant stuff deleted)
AND NOT EXISTS
(
SELECT 1 FROM user_history H
WHERE
H.userId=$userId
AND H.asin=B.ASIN
AND dateSent > DATE_SUB(NOW(), INTERVAL 14 DAY)
)
AND (non-relevant stuff deleted)
LIMIT 1
So we're avoiding duplicates that have already been selected for the same user in the last 14 days. On the current schema this returns in < 0.1 seconds, which is okay but slower than it used to be.
For the new schema, the inner SELECT has been reordered to:
SELECT 1 FROM user_history_partitioned H
WHERE dateSent > DATE_SUB(NOW(), INTERVAL 14 DAY)
AND H.asin=B.ASIN
AND H.userId=$userId
And it's taking 5 minutes per query, and I can't see why. The idea is that the current partition and its index should be in memory (or maybe the previous month's too, at some times of the month), and the primary key covers the WHERE clause. But from the time it's taking, it could be performing a full scan on asin or userId, which is difficult to identify from EXPLAIN because it's an inner query.
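One way to inspect the inner query's plan in isolation is to EXPLAIN it standalone, with literal placeholder values substituted for B.ASIN and $userId (on MySQL before 5.7, EXPLAIN PARTITIONS also shows which partitions are hit); a sketch with made-up values:
-- Placeholder literals stand in for B.ASIN and $userId.
EXPLAIN PARTITIONS
SELECT 1 FROM user_history_partitioned H
WHERE H.dateSent > DATE_SUB(NOW(), INTERVAL 14 DAY)
AND H.asin = 'B00EXAMPLE'
AND H.userId = 12345;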
What am I missing? Do I need another combined index for (asin, userID)? If so, why?
Thanks,
PS: I tried wrapping the DATE_SUB(...) as UNIX_TIMESTAMP(DATE_SUB(...)) just in case it was a type-conversion issue, but it made no difference.