Slow subquery: GROUP BY a groupwise maximum - MySQL

I have two tables:
CREATE TABLE share_prices (
price_id int(10) unsigned NOT NULL AUTO_INCREMENT,
price_date date NOT NULL,
company_id int(10) NOT NULL,
high decimal(20,2) DEFAULT NULL,
low decimal(20,2) DEFAULT NULL,
close decimal(20,2) DEFAULT NULL,
PRIMARY KEY (price_id),
UNIQUE KEY price_date (price_date,company_id),
KEY company_id (company_id),
KEY price_date_2 (price_date)
) ENGINE=InnoDB AUTO_INCREMENT=368586 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
And
CREATE TABLE rating_lookup (
rating_id int(11) NOT NULL,
start_date date DEFAULT NULL,
start_price decimal(10,2) DEFAULT NULL,
broker_id int(11) DEFAULT NULL,
company_id int(11) DEFAULT NULL,
end_date date DEFAULT NULL,
PRIMARY KEY (rating_id),
KEY idx_rating_lookup_company_id (company_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This is the current query:
SELECT broker_id, count(rating_id)
FROM (
SELECT rating_lookup.*,
share_prices.company_id as correct_company,
share_prices.price_date,
max(high) as peak_gain,
( ( ( max(high) - rating_lookup.start_price ) / rating_lookup.start_price ) * 100 ) as percent_gain
FROM rating_lookup, share_prices
WHERE share_prices.price_date > rating_lookup.start_date
AND share_prices.price_date < ifnull(end_date, curdate())
AND share_prices.company_id = rating_lookup.company_id
GROUP BY rating_id
HAVING percent_gain > 5
) correct
GROUP BY broker_id
Currently this query takes 10.969 sec.
The isolated subquery takes 0.391 sec (duration) / 10.438 sec (fetch)
Query objective:
Get the total amount of correct ratings per broker_id.
A correct rating is defined as a rating that has reached +5% since its start_price.
I am looking to drastically decrease the query time, even if restructuring the database is the only way.
Appendix
Explain of above query:
+---+---------+---------------+-------+--------------------------------------+------------+---+----------------------------------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+---+---------+---------------+-------+--------------------------------------+------------+---+----------------------------------------+---------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | | | | | 3894800 | Using temporary; Using filesort |
| 2 | DERIVED | rating_lookup | index | PRIMARY,idx_rating_lookup_company_id | PRIMARY | 4 | | 18200 | Using where |
| 2 | DERIVED | share_prices | ref | price_date,company_id,price_date_2 | company_id | 4 | brokermetrics.rating_lookup.company_id | 214 | Using where |
+---+---------+---------------+-------+--------------------------------------+------------+---+----------------------------------------+---------+---------------------------------+
share_prices ~ 375,000 rows
rating_lookup ~ 18,000 rows with around 46 unique brokers

I assume that share prices are inserted once per day after the market closes (or a few times per day if you cover multiple markets).
If you don't manage to tune the query sufficiently, you can pre-calculate the result: run the query every time you finish loading a batch of new stock prices and insert the result into a new table. Reading the pre-calculated data should be fast enough.
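A minimal sketch of that approach (the broker_scores table, its columns and the REPLACE-based refresh are my assumptions, not part of the question):
-- Hypothetical summary table, rebuilt after every batch of prices is loaded
CREATE TABLE broker_scores (
  broker_id int NOT NULL,
  correct_ratings int NOT NULL,
  updated_at datetime NOT NULL,
  PRIMARY KEY (broker_id)
) ENGINE=InnoDB;

-- Refresh step: REPLACE upserts by primary key
REPLACE INTO broker_scores (broker_id, correct_ratings, updated_at)
SELECT broker_id, count(rating_id), NOW()
FROM (
  SELECT rating_id, broker_id
  FROM rating_lookup, share_prices
  WHERE share_prices.price_date > rating_lookup.start_date
    AND share_prices.price_date < ifnull(end_date, curdate())
    AND share_prices.company_id = rating_lookup.company_id
  GROUP BY rating_id
  HAVING ((max(high) - start_price) / start_price) * 100 > 5
) correct
GROUP BY broker_id;

-- Readers then hit the ~46-row table instead of the 10-second query
SELECT broker_id, correct_ratings FROM broker_scores;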

PRIMARY KEY (price_id), -- useless
UNIQUE KEY price_date (price_date,company_id), -- could/should be PK
KEY company_id (company_id),
KEY price_date_2 (price_date) -- redundant
-->
PRIMARY KEY(price_date, company_id),
KEY company_id (company_id)
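A sketch of the corresponding DDL (untested; it rebuilds the table, and anything referencing price_id must be updated first):
ALTER TABLE share_prices
  DROP PRIMARY KEY,
  DROP COLUMN price_id,
  DROP KEY price_date,      -- the unique key is promoted to PK instead
  DROP KEY price_date_2,    -- redundant: leftmost prefix of the new PK
  ADD PRIMARY KEY (price_date, company_id);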
decimal(20,2) consumes 9 bytes; no existing stock is likely to need more than 6 digits to the left of the decimal point, and the type does not handle low-priced stocks that need more than two decimals. Consider DECIMAL(8,2) (4 bytes) or DECIMAL(10,4) (5 bytes). FLOAT (4 bytes) avoids most of these issues, but is limited to about 7 significant digits.
Smaller --> more cacheable --> less I/O --> faster.
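For example, a sketch (first verify that no stored price exceeds the new precision):
ALTER TABLE share_prices
  MODIFY high  DECIMAL(8,2) DEFAULT NULL,  -- 4 bytes instead of 9
  MODIFY low   DECIMAL(8,2) DEFAULT NULL,
  MODIFY close DECIMAL(8,2) DEFAULT NULL;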
Don't SELECT stuff you don't need. All you need is
SELECT rating_id, broker_id
and move the expression to the HAVING:
HAVING ( ( ( max(high) - start_price ) / start_price ) * 100 ) > 5
Please use the JOIN..ON syntax:
FROM rating_lookup, share_prices
WHERE share_prices.company_id = rating_lookup.company_id
AND ...
-->
FROM rating_lookup AS r
JOIN share_prices AS p
ON p.company_id = r.company_id
WHERE ...
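Putting those changes together, the inner query might look like this (a sketch, untested; the MIN() around start_price keeps ONLY_FULL_GROUP_BY happy and is harmless because rating_id is the primary key):
SELECT r.rating_id, r.broker_id
FROM rating_lookup AS r
JOIN share_prices AS p
  ON p.company_id = r.company_id
WHERE p.price_date > r.start_date
  AND p.price_date < IFNULL(r.end_date, CURDATE())
GROUP BY r.rating_id, r.broker_id
HAVING ((MAX(p.high) - MIN(r.start_price)) / MIN(r.start_price)) * 100 > 5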

Extending Klas's answer, below is the schema of a "summary" table that could be populated with one pre-calculated record per broker, per company, per day.
Disclaimer: I haven't tested this on real data, but it should work.
CREATE TABLE `price_summary` (
`price_id` int(10) NOT NULL,
`broker_id` int(10) NOT NULL DEFAULT '0',
`company_id` int(10) NOT NULL DEFAULT '0',
`start_date` int(10) NOT NULL DEFAULT '0',
`end_date` int(10) NOT NULL DEFAULT '0',
`peak_gain` int(10) NOT NULL DEFAULT '0',
`max_price` int(10) NOT NULL DEFAULT '0',
`percentage_gain` decimal(10,0) NOT NULL DEFAULT '0',
`updated_on` int(10) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Indexes for table `price_summary`
--
ALTER TABLE `price_summary`
ADD PRIMARY KEY (`price_id`),
ADD UNIQUE KEY `broker_company_date` (`broker_id`,`company_id`,`start_date`) USING BTREE,
ADD KEY `broker_id` (`broker_id`),
ADD KEY `company_id` (`company_id`),
ADD KEY `start_date` (`start_date`),
ADD KEY `end_date` (`end_date`),
ADD KEY `peak_gain` (`peak_gain`),
ADD KEY `max_price` (`max_price`),
ADD KEY `percentage_gain` (`percentage_gain`);
ALTER TABLE `price_summary`
MODIFY `price_id` int(10) NOT NULL AUTO_INCREMENT;
And a sample query to retrieve the desired records:
SELECT
broker_id,
count(company_id) as company_count
FROM
price_summary
WHERE
start_date > {input_timestamp}
AND
end_date < {input_timestamp/now()}
AND
percentage_gain > {input_percentage}
GROUP BY
broker_id
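A hypothetical population step (a sketch; the column mapping is my assumption, and the start_date/end_date/updated_on columns hold Unix timestamps since they are declared int):
INSERT INTO price_summary
  (broker_id, company_id, start_date, end_date, max_price, percentage_gain, updated_on)
SELECT r.broker_id,
       r.company_id,
       UNIX_TIMESTAMP(r.start_date),
       UNIX_TIMESTAMP(IFNULL(r.end_date, CURDATE())),
       MAX(p.high),
       ((MAX(p.high) - MIN(r.start_price)) / MIN(r.start_price)) * 100,
       UNIX_TIMESTAMP()
FROM rating_lookup r
JOIN share_prices p ON p.company_id = r.company_id
WHERE p.price_date > r.start_date
  AND p.price_date < IFNULL(r.end_date, CURDATE())
GROUP BY r.broker_id, r.company_id, r.start_date
ON DUPLICATE KEY UPDATE
  end_date        = VALUES(end_date),
  max_price       = VALUES(max_price),
  percentage_gain = VALUES(percentage_gain),
  updated_on      = VALUES(updated_on);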

Related

Need help optimizing sql JOIN query and indexes on large tables

I have a query with a JOIN on three tables that is taking a very long time to run. I created an index on one of my tables for the foreign key (user_shared_url_id) plus the two columns in the WHERE clause (event_result, enabled), so it's an index of three columns total. There seems to be no difference compared to when I simply use an index on the foreign key (user_shared_url_id) alone. The other two tables use single-column indexes. My users table has about 20,000 rows, but the other two tables are quite large, with ~20 million rows. I can't get the query to finish in less than a minute or so. Can anyone think of any potential optimizations to speed this up? Are there other indexes, or improvements to my custom index, that I can work with?
The tables:
CREATE TABLE `users` (
`user_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`roles` varchar(500) DEFAULT NULL,
`first_name` varchar(200) DEFAULT NULL,
`last_name` varchar(100) DEFAULT NULL,
`org_id` int(11) unsigned NOT NULL,
`user_email` varchar(100) NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`user_id`),
KEY `org_id` (`org_id`),
KEY `org_id_user_id` (`org_id`,`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=162524 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
CREATE TABLE `user_shared_urls` (
`user_id` int(11) unsigned NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`user_shared_url_id` int(11) NOT NULL AUTO_INCREMENT,
`target_url` text,
PRIMARY KEY (`user_shared_url_id`),
KEY `user_id` (`user_id`),
KEY `user_id_usu_id` (`user_id`,`user_shared_url_id`)
) ENGINE=InnoDB AUTO_INCREMENT=62449105 DEFAULT CHARSET=utf8
CREATE TABLE `user_share_events` (
`user_share_event_id` int(11) NOT NULL AUTO_INCREMENT,
`event_result` tinyint(1) unsigned DEFAULT NULL,
`user_shared_url_id` int(11) NOT NULL,
`enabled` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`user_share_event_id`),
KEY `user_shared_url_id` (`user_shared_url_id`),
KEY `usuid_enabled_result` (`user_shared_url_id`,`enabled`,`event_result`)
) ENGINE=InnoDB AUTO_INCREMENT=35067339 DEFAULT CHARSET=utf8
My indexes:
CREATE INDEX org_id_user_id ON users(org_id, user_id);
CREATE INDEX user_id_usu_id ON user_shared_urls(user_id, user_shared_url_id);
CREATE INDEX usuid_enabled_result ON user_share_events(user_shared_url_id,enabled,event_result);
My query:
SELECT
users.user_id,
users.user_email "user_email",
users.roles "role",
CONCAT(users.first_name, ' ', users.last_name) "name",
usus.target_url
FROM
users
JOIN user_shared_urls usus ON usus.user_id = users.user_id
JOIN user_share_events uses ON usus.user_shared_url_id = uses.user_shared_url_id
WHERE
users.org_id = 1523
AND
uses.enabled = '1'
AND
uses.event_result = 1
Explain output of the above query:
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
| 1 | SIMPLE | users | ref | PRIMARY,org_id,org_id_user_id | org_id | 4 | const | 1235 | NULL |
| 1 | SIMPLE | usus | ref | PRIMARY,user_id,user_id_usu_id | user_id_usu_id | 4 | luster.users.user_id | 213 | NULL |
| 1 | SIMPLE | uses | ref | user_shared_url_id,user_and_service,result_service_occurred,usuid_enabled_result | user_shared_url_id | 4 | luster.usus.user_shared_url_id | 1 | Using where |
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
3 rows in set (0.00 sec)
(Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)
Change that index you added to
INDEX(user_shared_url_id, -- = and used for the JOIN
enabled, -- =
event_result) -- Last (not an = test)
The order of columns in an INDEX is important. Start with the columns that are tested for = (or IS NULL).
Then remove the FORCE INDEX and run the EXPLAIN again.
Are these tables in a 1:many relationship? Tell us which way.
Another comment: If event_result really has only two values (true/false) and you are using NULL for false, then change the query from
uses.event_result IS NOT NULL
to
uses.event_result = 1
The point is that the Optimizer likes to optimize =, but sees NOT NULL as being any of 256 possible values; very far from =. With this query change, your index should work. And even be picked without using FORCE.
For this query:
SELECT u.user_id, u.user_email, u.roles "role",
CONCAT(u.first_name, ' ', u.last_name) "name",
usu.target_url
FROM user_shared_urls usu JOIN
users u
ON usu.user_id = u.user_id JOIN
user_share_events usev
ON usu.user_shared_url_id = usev.user_shared_url_id
WHERE u.org_id = 1010 AND
usev.event_result IS NOT NULL AND
usev.enabled = 1;
Probably the best indexes are:
users(org_id, user_id)
user_shared_urls(user_id, user_shared_url_id)
user_share_events(user_shared_url_id, enabled, event_result)
This assumes that the filtering on org_id is more selective than the other filters.
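That selectivity assumption is easy to check directly (a sketch):
-- Compare how many rows each filter keeps; the smaller count should drive the plan
SELECT COUNT(*) FROM users WHERE org_id = 1523;
SELECT COUNT(*) FROM user_share_events WHERE enabled = 1 AND event_result = 1;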

How to improve query speed in a MySQL query

I'm trying to optimize my query speed as much as possible. A side problem is that I cannot see the exact query speed, because it is rounded to a whole second. The query does get the expected result and takes about 1 second. The final query will be extended even further, which is why I am trying to improve it now. How can this query be improved?
The database models an electricity utility company. The query should eventually calculate an invoice. I basically have 4 tables: APX price, powerdeals, powerload, eans_power.
APX price is an hourly price; powerload is a quarter-hourly volume. The first step is joining these two together for each quarter of an hour.
The second step is selecting the EAN indicated in the table eans_power.
Finally I join the powerdeals, which currently consists of only a single line and indicates from which hour until which hour, and on which weekdays, it is applicable. It consists of an hourly volume and price. Currently it is joined only on the hours, but it will be extended to weekdays as well.
MYSQL Query:
SELECT l.DATE, l.PERIOD_FROM, a.PRICE, l.POWERLOAD,
SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
FROM timeseries.powerload l
INNER JOIN timeseries.apxprice a ON l.DATE = a.DATE
INNER JOIN contracts.eans_power c ON l.ean = c.ean
LEFT OUTER JOIN timeseries.powerdeals d ON d.period_from <= l.period_from
AND d.period_until >= l.period_until
WHERE l.PERIOD_FROM >= a.PERIOD_FROM
AND l.PERIOD_FROM < a.PERIOD_UNTIL
AND l.DATE >= '2018-01-01'
AND l.DATE <= '2018-12-31'
GROUP BY l.date
Explain:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE c NULL system PRIMARY,ean NULL NULL NULL 1 100.00 Using temporary; Using filesort
1 SIMPLE l NULL ref EAN EAN 21 const 35481 11.11 Using index condition
1 SIMPLE d NULL ALL NULL NULL NULL NULL 1 100.00 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE a NULL ref DATE DATE 4 timeseries.l.date 24 11.11 Using index condition
Create table queries:
apxprice
CREATE TABLE `apxprice` (
 `apx_id` int(11) NOT NULL AUTO_INCREMENT,
 `date` date DEFAULT NULL,
 `period_from` time DEFAULT NULL,
 `period_until` time DEFAULT NULL,
 `price` decimal(10,2) DEFAULT NULL,
 PRIMARY KEY (`apx_id`),
 KEY `DATE` (`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=29664 DEFAULT CHARSET=latin1
powerdeals
CREATE TABLE `powerdeals` (
 `deal_id` int(11) NOT NULL AUTO_INCREMENT,
 `date_deal` date NOT NULL,
 `start_date` date NOT NULL,
 `end_date` date NOT NULL,
 `weekday_from` int(11) NOT NULL,
 `weekday_until` int(11) NOT NULL,
 `period_from` time NOT NULL,
 `period_until` time NOT NULL,
 `hourly_volume` int(11) NOT NULL,
 `price` int(11) NOT NULL,
 `type_deal_id` int(11) NOT NULL,
 `contract_id` int(11) NOT NULL,
 PRIMARY KEY (`deal_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1
powerload
CREATE TABLE `powerload` (
 `powerload_id` int(11) NOT NULL AUTO_INCREMENT,
 `ean` varchar(18) DEFAULT NULL,
 `date` date DEFAULT NULL,
 `period_from` time DEFAULT NULL,
 `period_until` time DEFAULT NULL,
 `powerload` int(11) DEFAULT NULL,
 PRIMARY KEY (`powerload_id`),
 KEY `EAN` (`ean`,`date`,`period_from`,`period_until`)
) ENGINE=MyISAM AUTO_INCREMENT=61039 DEFAULT CHARSET=latin1
eans_power
CREATE TABLE `eans_power` (
 `ean` char(19) NOT NULL,
 `contract_id` int(11) NOT NULL,
 `invoicing_id` int(11) NOT NULL,
 `street` varchar(255) NOT NULL,
 `number` int(11) NOT NULL,
 `affix` char(11) NOT NULL,
 `postal` char(6) NOT NULL,
 `city` varchar(255) NOT NULL,
 PRIMARY KEY (`ean`),
 KEY `ean` (`ean`,`contract_id`,`invoicing_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Sample data tables
apx_prices
apx_id,date,period_from,period_until,price
1,2016-01-01,00:00:00,01:00:00,23.86
2,2016-01-01,01:00:00,02:00:00,22.39
powerdeals
deal_id,date_deal,start_date,end_date,weekday_from,weekday_until,period_from,period_until,hourly_volume,price,type_deal_id,contract_id
1,2019-05-15,2018-01-01,2018-12-31,1,5,08:00:00,20:00:00,1000,50,3,1
powerload
powerload_id,ean,date,period_from,period_until,powerload
1,871688520000xxxxxx,2018-01-01,00:00:00,00:15:00,9
2,871688520000xxxxxx,2018-01-01,00:15:00,00:30:00,11
eans_power
ean,contract_id,invoicing_id,street,number,affix,postal,city
871688520000xxxxxx,1,1,road,14,postal,city
Result, without sum() and group by:
DATE,PERIOD_FROM,PRICE,POWERLOAD,a.PRICE*l.POWERLOAD,d.hourly_volume/4,
2018-01-01,00:00:00,27.20,9,244.80,NULL
2018-01-01,00:15:00,27.20,11,299.20,NULL
Result, with sum() and group by:
DATE, PERIOD_FROM, PRICE, POWERLOAD, SUM(a.PRICE*l.POWERLOAD), SUM(d.hourly_volume/4)
2018-01-01,08:00:00,26.33,21,46193.84,12250.0000
2018-01-02, 08:00:00,47.95,43,90623.98,12250.0000
Preliminary optimizations:
Use InnoDB, not MyISAM.
Use CHAR only for constant-length strings
Use consistent datatypes (see ean, for example)
For an alternative to timings rounded to the whole second, check out the Handler counts; see the sketch below.
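Reading those counters for a single statement looks like this (the counters are per-session):
FLUSH STATUS;                           -- reset the session counters
-- ... run the query under test here ...
SHOW SESSION STATUS LIKE 'Handler%';    -- rows actually touched; far finer-grained than a 1-second timer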
Because range tests (such as l.PERIOD_FROM >= a.PERIOD_FROM AND l.PERIOD_FROM < a.PERIOD_UNTIL) are essentially impossible to optimize, I recommend you expand the table to have one entry per hour (or one per quarter hour, if necessary). Looking up a row via a key is much faster than scanning "ALL" of the table, and 9K rows for an entire year is trivial.
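A sketch of that expansion for apxprice, using a hypothetical apxprice_quarter table with one row per quarter hour (the join then becomes an equality test on (date, period_from)):
CREATE TABLE apxprice_quarter (
  `date` date NOT NULL,
  `period_from` time NOT NULL,
  `price` decimal(10,2) NOT NULL,
  PRIMARY KEY (`date`, `period_from`)
) ENGINE=InnoDB;

INSERT INTO apxprice_quarter (`date`, `period_from`, `price`)
SELECT a.`date`,
       ADDTIME(a.`period_from`, SEC_TO_TIME(q.n * 900)),  -- 900 s = one quarter hour
       a.`price`
FROM apxprice a
JOIN (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) q;
The powerload join can then read ON l.`date` = a.`date` AND l.`period_from` = a.`period_from`.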
When you get past these recommendations (and the Comments), I will have more tips on optimizing the indexes, especially InnoDB's PRIMARY KEY.

MySql group by optimization - avoid tmp table and/or filesort

I have a slow query; without the GROUP BY it is fast (0.1-0.3 seconds), but with the (required) GROUP BY the duration is around 10-15 s.
The query joins two tables, events (near 50 million rows) and events_locations (5 million rows).
Query:
SELECT `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id`AS `entity_type_id`, el.some_id
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
AND el.other_id = '1'
AND time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;
Table events:
CREATE TABLE `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_type_id` int(11) NOT NULL,
`entity_type_id` int(11) NOT NULL,
`entity_id` varchar(64) NOT NULL,
`alias` varchar(64) NOT NULL,
`time_stamp` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `entity_id` (`entity_id`),
KEY `event_type_idx` (`event_type_id`),
KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table events_locations
CREATE TABLE `events_locations` (
`event_id` bigint(20) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`some_id` bigint(20) DEFAULT NULL,
`other_id` bigint(20) DEFAULT NULL,
`time_span` bigint(20) DEFAULT NULL,
`group_alias` varchar(64) NOT NULL,
KEY `some_id_idx` (`some_id`),
KEY `idx_events_group_alias` (`group_alias`),
KEY `idx_event_id` (`event_id`),
CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The explain:
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1 | SIMPLE | ea | ALL | 'idx_event_id' | NULL | NULL | NULL | 5152834 | 'Using where; Using temporary; Using filesort' |
| 1 | SIMPLE | e | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8' | 'name.ea.event_id' | 1 | |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)
From the doc:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
I already tried:
Creating an index on (el.some_id, el.group_alias)
Decreasing the varchar size to 20
Increasing sort_buffer_size and read_rnd_buffer_size
Any suggestions for performance tuning would be much appreciated!
In your case the events table has an index on time_stamp. So before joining the two tables, first select the required records from the events table for the specific date range, then join events_locations on the relation column.
Use MySQL's EXPLAIN to check how your query approaches the table records. It will tell you how many rows are scanned before the required records are selected.
The number of rows scanned also contributes to query execution time. Use the logic below to reduce the number of rows that are scanned.
SELECT
`e`.`event_id` AS `event_id`,
`e`.`time_stamp` AS `time_stamp`,
`el`.`latitude` AS `latitude`,
`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,
`e`.`asset_name` AS `asset_name`,
`el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,
`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`,
`el`.`some_id` as `some_id`
FROM
(select
`id` AS `event_id`,
`time_stamp` AS `time_stamp`,
`entity_id` AS `asset_name`,
`event_type_id` AS `event_type_id`,
`entity_type_id` AS `entity_type_id`
from
`events`
WHERE
time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
) AS `e`
JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE
`el`.`other_id` = '1'
GROUP BY
`e`.`event_type_id` ,
`el`.`some_id` ,
`el`.`group_alias`;
The relationship between these tables is 1:1, so I asked myself why a GROUP BY was required, and I found some duplicated rows: 200 in 50,000 rows. Somehow my system is inserting duplicates, and someone added that GROUP BY years ago instead of hunting down the bug.
So I will mark this as solved, more or less...
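For reference, a quick way to locate such duplicates, assuming event_id should be unique in events_locations:
SELECT event_id, COUNT(*) AS copies
FROM events_locations
GROUP BY event_id
HAVING COUNT(*) > 1;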

Mysql sub query with MAX

I'm trying to speed up a query which currently takes 1.2 seconds to run. The query is:
select
process.exchange,
process.market,
process.volume,
process.bid,
process.ask,
(select MAX(last) FROM a where a.exchange = process.exchange
AND a.market = process.market
AND a.created_date > NOW() - INTERVAL 5 MINUTE LIMIT 1) as Ask1,
Ask2,
((Ask2 / (select MAX(last) FROM a where a.exchange = process.exchange
AND a.market = process.market
AND a.created_date > NOW() - INTERVAL 5 MINUTE LIMIT 1 ))
* 100) - 100 as percentage
FROM process
WHERE process.exchange IN('BLAH','BLAH2')
ORDER BY percentage ASC
Table structures:
CREATE TABLE `a` (
`id` int(10) UNSIGNED NOT NULL,
`exchange` varchar(15) NOT NULL,
`market` varchar(15) NOT NULL,
`volume` double UNSIGNED NOT NULL,
`bid` double NOT NULL,
`ask` double UNSIGNED NOT NULL,
`last` double NOT NULL,
`created_date` datetime NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `a`
ADD PRIMARY KEY (`id`),
ADD KEY `market` (`market`),
ADD KEY `exchange` (`exchange`),
ADD KEY `created_date` (`created_date`);
CREATE TABLE `process` (
`id` int(11) NOT NULL,
`exchange` varchar(20) NOT NULL,
`market` varchar(10) NOT NULL,
`volume` double NOT NULL,
`bid` double NOT NULL,
`ask` double NOT NULL,
`Ask2` double NOT NULL,
`created_date` datetime NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
ALTER TABLE `process`
ADD PRIMARY KEY (`id`),
ADD KEY `created_date` (`created_date`),
ADD KEY `exchange` (`exchange`),
ADD KEY `market` (`market`);
ALTER TABLE `process`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
I need to work out the percentage change between Ask1 and Ask2. The Ask1 information is in another table called a, which has many rows of prices for each market and exchange.
This is what the data in table "a" looks like:
Exchange Market Volume Bid Ask Last Created Date
BLAH APL 3000 1.2 1.3 1.3 2017-07-26 16:31:00
BLAH APL 3000 1.4 1.5 1.45 2017-07-26 16:30:00
I need Ask1 to get information from here getting the highest value in the last 5 minutes. So the sub query I got is:
select MAX(last)
FROM a
where a.exchange = process.exchange
AND a.market = process.market
AND a.created_date > NOW() - INTERVAL 5 MINUTE
LIMIT 1
I need to select it and also do a calculation with it, so at the moment I have the sub query there twice.
The MAX seems to slow down the sub query a lot.
This is what EXPLAIN says:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY process NULL ALL exchange NULL NULL NULL 272 99.63 Using where; Using temporary; Using filesort
2 DEPENDENT SUBQUERY a NULL ref market,exchange,created_date market 47 cms.process.market 1603 0.17 Using index condition; Using where
3 DEPENDENT SUBQUERY a NULL ref market,exchange,created_date market 47 cms.process.market 1603 0.17 Using index condition; Using where
The "a" table has about a million rows and "process" table has about 630 rows.
Is it possible to speed it up?
Thanks in advance for your help.
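One possibility (a sketch, not from the thread; untested): compute the 5-minute maxima once in a derived table instead of running the dependent subquery twice per process row. Note that the INNER JOIN drops markets with no trades in the last 5 minutes, where the original scalar subquery returned NULL:
SELECT p.exchange, p.market, p.volume, p.bid, p.ask,
       m.ask1, p.Ask2,
       (p.Ask2 / m.ask1) * 100 - 100 AS percentage
FROM process p
JOIN (
    SELECT exchange, market, MAX(last) AS ask1
    FROM a
    WHERE created_date > NOW() - INTERVAL 5 MINUTE
    GROUP BY exchange, market
) m ON m.exchange = p.exchange AND m.market = p.market
WHERE p.exchange IN ('BLAH','BLAH2')
ORDER BY percentage ASC;
A composite index on a (exchange, market, created_date) might also help here, since all three filter/group columns would then sit in one index.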

Slow MySQL query with ORDER BY ... LIMIT with index

I have a query generated by Entity Framework, that looks like this:
SELECT
`Extent1`.`Id`,
`Extent1`.`Name`,
`Extent1`.`ExpireAfterUTC`,
`Extent1`.`FileId`,
`Extent1`.`FileHash`,
`Extent1`.`PasswordHash`,
`Extent1`.`Size`,
`Extent1`.`TimeStamp`,
`Extent1`.`TimeStampOffset`
FROM `files` AS `Extent1` INNER JOIN `containers` AS `Extent2` ON `Extent1`.`ContainerId` = `Extent2`.`Id`
ORDER BY
`Extent1`.`Id` ASC LIMIT 0,10
It runs painfully slowly.
I have indexes on files.Id (PK), files.ContainerId (FK) and containers.Id (PK), and I don't understand why MySQL seems to be doing a full sort before returning the required records, even though there is already an index on the Id column.
Furthermore, this data is displayed in a grid which supports filters, sorts and pagination, so good use of the indexes is essential.
Here are the table definitions:
CREATE TABLE `files` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`FileId` varchar(100) NOT NULL,
`ContainerId` int(11) NOT NULL,
`ContainerGuid` binary(16) NOT NULL,
`Guid` binary(16) NOT NULL,
`Name` varchar(1000) NOT NULL,
`ExpireAfterUTC` datetime DEFAULT NULL,
`PasswordHash` binary(32) DEFAULT NULL,
`FileHash` tinyblob NOT NULL,
`Size` bigint(20) NOT NULL,
`TimeStamp` double NOT NULL,
`TimeStampOffset` double NOT NULL,
`FilePostId` int(11) NOT NULL,
`FilePostGuid` binary(16) NOT NULL,
`AttributeId` int(11) NOT NULL,
PRIMARY KEY (`Id`),
UNIQUE KEY `FileId_UNIQUE` (`FileId`),
KEY `Files_ContainerId_FK` (`ContainerId`),
KEY `Files_AttributeId_FK` (`AttributeId`),
KEY `Files_FileId_index` (`FileId`),
KEY `Files_FilePostId_index` (`FilePostId`),
KEY `Files_Guid_index` (`Guid`),
CONSTRAINT `Files_AttributeId_FK` FOREIGN KEY (`AttributeId`) REFERENCES `attributes` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `Files_ContainerId_FK` FOREIGN KEY (`ContainerId`) REFERENCES `containers` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `Files_FilePostsId_FK` FOREIGN KEY (`FilePostId`) REFERENCES `fileposts` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=977942 DEFAULT CHARSET=utf8;
CREATE TABLE `containers` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`Name` varchar(255) NOT NULL,
`Guid` binary(16) NOT NULL,
`AesKey` binary(32) NOT NULL,
`FileCount` int(10) unsigned NOT NULL DEFAULT '0',
`Size` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`Id`),
KEY `Containers_Guid_index` (`Guid`),
KEY `Containers_Name_index` (`Name`)
) ENGINE=InnoDB AUTO_INCREMENT=76 DEFAULT CHARSET=utf8;
You will notice there are some other relationships in the files table, which I have left out just to simplify the query without affecting the observed behavior.
Here is also an output from EXPLAIN EXTENDED:
+----+-------------+---------+-------+----------------------+-----------------------+---------+----------------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+-------+----------------------+-----------------------+---------+----------------------------------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | Extent2 | index | PRIMARY | Containers_Guid_index | 16 | NULL | 9 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | Extent1 | ref | Files_ContainerId_FK | Files_ContainerId_FK | 4 | netachmentgeneraltest.Extent2.Id | 73850 | 100.00 | |
+----+-------------+---------+-------+----------------------+-----------------------+---------+----------------------------------+-------+----------+----------------------------------------------+
Files table has ~900000 records (and counting) and containers has 9.
This issue only occurs when ORDER BY is present.
Also, I can't do much in terms of modifying the query because it is generated by Entity Framework. I did as much as I could with the LINQ query in order to simplify it (at first it had some horrible sub queries which executed even slower).
Query hints (as in force index) are not a solution here either, because EF does not support such features.
I am mostly hoping to find some database level optimizations to do.
For those who didn't spot the tags, the database in question is MySQL.
MySQL generally uses only one index per table in a query. Right now it prefers the foreign key index so that the join is efficient, but that means the sort cannot use an index.
Try creating a compound index on (ContainerId, FileId).
This is essentially your query:
SELECT e1.*
FROM `files` e1 INNER JOIN
`containers` e2
ON e1.`ContainerId` = e2.`Id`
ORDER BY e1.`Id` ASC
LIMIT 0, 10;
You can try an index on files(id, ContainerId). This might inspire MySQL to use the composite index, focused on the order by.
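In DDL form, that suggestion would be something like (a sketch):
CREATE INDEX Files_Id_ContainerId ON files (Id, ContainerId);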
It would probably be more likely if the query were phrased as:
SELECT e1.*
FROM `files` e1
WHERE EXISTS (SELECT 1 FROM containers e2 WHERE e1.`ContainerId` = e2.`Id`)
ORDER BY e1.`Id` ASC
LIMIT 0, 10;
There is one way that does work to use the indexes. However, it depends on something in MySQL that is not documented to work (although it does in practice). The following will read the data in order, but it incurs the overhead of materializing the subquery -- but not for a sort:
SELECT e1.*
FROM (SELECT e1.*
FROM files e1
ORDER BY e1.id ASC
) e1
WHERE EXISTS (SELECT 1 FROM containers e2 WHERE e1.`ContainerId` = e2.`Id`)
LIMIT 0, 10;