I have the following query and plan:
SELECT data.*
FROM data
WHERE channel_id = 1
AND timestamp >= IFNULL((
SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp) / 1000, "%Y-%m-%d"), INTERVAL 1 day)) * 1000
FROM aggregate
WHERE type = '3' AND aggregate.channel_id = data.channel_id
), 0)
AND timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
'1' 'PRIMARY' 'data' NULL 'ref' 'data_unique IDX_ADF3F36372F5A1AA' 'IDX_ADF3F36372F5A1AA' '5' 'const' '860512' '11.11' 'Using where'
'2' 'DEPENDENT SUBQUERY' 'aggregate' NULL 'ref' 'aggregate_unique IDX_B77949FF72F5A1AA' 'aggregate_unique' '7' 'volkszaehler.data.channel_id const' '1473' '100.00' 'Using index'
The data table has a couple of million rows, all tables are indexed:
data: `channel_id`, `timestamp`
aggregate: `type`, `channel_id`, `timestamp`
The query becomes fast when aggregate.channel_id = data.channel_id is replaced with the literal channel_id value from the outer query; the dependent subquery then becomes a simple (uncorrelated) subquery.
However, I'd rather not do this to allow the query to operate on >1 channel_ids at a time.
Why doesn't MySQL (5.7 homebrew) recognize that this subquery really is not dependent (or is it?) and how could it be optimized?
I've already verified that removing the IFNULL function or pushing it inwards does not cure the problem. I was also unsuccessful in pushing the subquery down another level, as suggested in "Can i force mysql to perform subquery first?", since the data.channel_id reference is then no longer known.
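One rewrite that might be worth trying is to compute the per-channel start timestamp once in a derived table and join it, instead of correlating the subquery to every outer row. This is only a sketch, not tested against this schema: the alias next_start and the channel list (1, 2) are made up for illustration, and whether the optimizer picks a better plan would have to be checked with EXPLAIN.

```sql
-- Sketch: one aggregate row per channel, joined to data instead of a
-- dependent subquery. "next_start" and the IN list are illustrative only.
SELECT d.*
FROM data AS d
LEFT JOIN (
    SELECT channel_id,
           UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp) / 1000, '%Y-%m-%d'),
                                   INTERVAL 1 DAY)) * 1000 AS next_start
    FROM aggregate
    WHERE type = '3'
    GROUP BY channel_id
) AS a ON a.channel_id = d.channel_id
WHERE d.channel_id IN (1, 2)
  -- channels without any aggregate row get NULL from the LEFT JOIN, hence IFNULL
  AND d.timestamp >= IFNULL(a.next_start, 0)
  AND d.timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), '%Y-%m-%d')) * 1000;
```

The LEFT JOIN plus IFNULL keeps the original behaviour for channels that have no aggregate rows yet.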
I am using MySQL 5.1 on a Windows Server 2008 (with 4GB RAM) and have the following configuration:
I have 2 MyISAM tables. One is in 1 database (DB1) and has 14 columns, which are mostly varchar. This table has about 5,000,000 rows and is the DB1.games table below. It has a primary key on GameNumber (int(10)).
The other table is DB2.gameposition, which consists of the columns GameNumber (links to DB1.games) and PositionCode (int(10)). This table has about 400,000,000 rows and there is an index IX_PositionCode on PositionCode.
These 2 databases are on the same server.
I want to run a query on DB2.gameposition to find a particular PositionCode, and have the results sorted by the linking DB1.games.Yr field (smallint(6) - this represents a Year). This sorting of results I only introduced recently. There is an index on this Yr field in DB1.games.
In my stored procedure, I perform the following:
CREATE TEMPORARY TABLE tblGameNumbers(GameNumber INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO tblGameNumbers(GameNumber)
SELECT DISTINCT gp.GameNumber
FROM DB2.gameposition gp
WHERE PositionCode = var_PositionCode LIMIT 1000;
I only fetch 1000 rows to make it quicker.
And then join it to the DB1.games table.
In order to generate an EXPLAIN from that, I took out the temporary table (I use in the stored procedure) and refactored it as seen in the inner subquery below:
EXPLAIN
SELECT *
FROM DB1.games g
INNER JOIN (SELECT DISTINCT gp.GameNumber
FROM DB2.gameposition gp
WHERE PositionCode = 669312116 LIMIT 1000
) B ON g.GameNumber = B.GameNumber
ORDER BY g.Yr DESC
LIMIT 0,28
Running the EXPLAIN above, I see the following:
1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 1000, 'Using temporary; Using filesort'
1, 'PRIMARY', 'g', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'B.GameNumber', 1, ''
2, 'DERIVED', 'gp', 'ref', 'IX_PositionCode', 'IX_PositionCode', '4', '', 1889846, 'Using temporary'
The query was always virtually instantaneous before I introduced the ORDER BY clause. Now it is sometimes quick (depending on the PositionCode), but at other times it can take up to 10 seconds to return the rows. Unfortunately, I am not very proficient at interpreting the EXPLAIN output, or at working out how to make the query faster.
Any help would be greatly appreciated!
Thanks in advance,
Tim
Without the order by, your limit means the first 28 results are returned and then the query stops. With order by, all results need to be retrieved so they can be sorted and the first 28 returned.
The explain shows what MySQL is doing:
build the derived table B
read gameposition by position code via IX_PositionCode (estimated 1889846 rows)
de-duplicate the game numbers in a temporary table
when 1000 distinct game numbers are found, stop reading
end build
for each of the 1000 rows in B
get the games record by primary key (to get all the columns)
end for
sort the joined rows by yr (temporary table and filesort), then return the first 28
Try this instead:
select distinct ... from gameposition gp
inner join games g on g.gamenumber = gp.gamenumber
where gp.positioncode = ...
order by g.yr limit ...
Suppose I do
EXPLAIN SELECT * FROM xyz e
JOIN abc cs ON e.rss = 'text' AND e.rdd = cs.xid
JOIN def c ON cs.cid = c.xid
JOIN jkl s ON c.sid = s.nid
WHERE s.flag = 0;
This would reveal:
1, 'SIMPLE', 's', 'ref', 'PRIMARY,Index_8', 'x1', '1', 'const', 1586, 'Using index; Using temporary'
1, 'SIMPLE', 'c', 'ref', 'PRIMARY,sid', 'x2', '4', 's.nid', 40, 'Using index'
1, 'SIMPLE', 'cs', 'ref', 'PRIMARY,cid', 'x3', '4', 'c.nid', 1, 'Using index'
1, 'SIMPLE', 'e', 'ref', 'rss,rdd', 'x4', '141', 'const,cs.nid', 12, 'Using where; Using index; Distinct'
However, suppose I do
EXPLAIN SELECT * FROM xyz e
JOIN abc cs ON e.rss = 'text' AND e.rdd = cs.xid
JOIN def c ON cs.cid = c.xid
JOIN jkl s ON c.sid = s.nid
WHERE s.flag = 0 AND c.range_field <= 10;
This would reveal
1, 'SIMPLE', 'c', 'ALL', 'PRIMARY,school_nid,Index_5', '', '', '', 56074, 'Using where; Using temporary'
1, 'SIMPLE', 's', 'eq_ref', 'PRIMARY,Index_8', 'PRIMARY', '4', 'c.school_nid', 1, 'Using where'
1, 'SIMPLE', 'cs', 'ref', 'PRIMARY,cid', 'x3', '4', 'c.nid', 1, 'Using index'
1, 'SIMPLE', 'e', 'ref', 'rss,rdd', 'x4', '141', 'const,cs.nid', 12, 'Using where; Using index; Distinct'
i.e. while the first query scans only 1586 rows, this one scans over 56074 rows.
This is despite the fact that the second query is supposed to return a SUBSET of the first query's results,
i.e. out of the 1586 results of the first query, return those that have c.range_field <= 10.
Is there a way to modify this query so that the number of rows scanned is <= 1586, since the result of the second query is just a subset of the result of the first?
The fact that the 2nd query is a subset of the 1st one does not matter from the performance perspective.
In the first query, there's no filter involved for the c table, while in the 2nd one there's one on c.range_field.
As you can see in the 1st explain plan (Using index), the first query can compute the result set using ONLY the indexes, which is fast: MySQL reads just the matching index entries, which explains the lower number of rows examined. In the 2nd explain plan, MySQL has to read the c table itself with a full table scan (type ALL): the rows are read one by one and each is evaluated against the filter, which is slow.
The solution for you is to evaluate adding the c.range_field column to one of the indexes listed under possible_keys for the c table in the 2nd explain plan.
As you are filtering by c.range_field, and def c is the third table in your FROM clause, the filtering happens on the result set of joining the three tables, as there is no index on that column. I would suggest you go with Sebas' answer and create an index on c.range_field.
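The index suggestion could look roughly like this. It is a sketch only: the index names are invented, and the column names def.sid and def.range_field are taken from the query text (the EXPLAIN output shows a different name, school_nid, so adjust to the real schema).

```sql
-- Option 1: plain index on the filtered column, useful when def is read first.
ALTER TABLE def ADD INDEX idx_def_range_field (range_field);

-- Option 2: composite index, useful when jkl is read first and def is
-- looked up by sid; the range condition can then be applied inside the index.
ALTER TABLE def ADD INDEX idx_def_sid_range (sid, range_field);

-- Re-check the plan afterwards.
EXPLAIN SELECT * FROM xyz e
JOIN abc cs ON e.rss = 'text' AND e.rdd = cs.xid
JOIN def c ON cs.cid = c.xid
JOIN jkl s ON c.sid = s.nid
WHERE s.flag = 0 AND c.range_field <= 10;
```

Only one of the two indexes is likely to be needed; which one helps depends on the join order the optimizer ends up choosing.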
An alternative to this, which I would use myself, is to set def as the driving table. This means, start your FROM clause with def table, preferably followed by jkl. This would filter the rows on the first and second tables before joining them with the third and the fourth.
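Note that for inner joins MySQL is free to reorder the tables, so changing the order in the FROM clause alone does not necessarily change the plan. A sketch of forcing the order with the STRAIGHT_JOIN modifier, under the assumption that reading def first and jkl second really is the better order for this data:

```sql
-- STRAIGHT_JOIN makes MySQL join the tables in the order they are listed.
EXPLAIN SELECT STRAIGHT_JOIN *
FROM def c
JOIN jkl s  ON c.sid = s.nid
JOIN abc cs ON cs.cid = c.xid
JOIN xyz e  ON e.rss = 'text' AND e.rdd = cs.xid
WHERE s.flag = 0 AND c.range_field <= 10;
```

Comparing this EXPLAIN with the unforced one shows whether the forced order actually reduces the estimated rows.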
I have a report that pulls information from a summary table and ideally will pull from two periods at once, the current period and the previous period. My table is structured thusly:
report_table
item_id INT(11)
amount Decimal(8,2)
day DATE
The primary key is (item_id, day). This table currently holds 37k records with 92 different items and 1200 different days. I am using MySQL 5.1.
Here is my select statement:
SELECT r.day, sum(r.amount)/(count(distinct r.item_id)*count(r.day)) AS `current_avg_day`,
sum(r2.amount)/(count(distinct r2.item_id)*count(r2.day)) AS `previous_avg_day`
FROM `client_location_item` AS `cla`
INNER JOIN `client_location` AS `cl`
INNER JOIN `report_item_day` AS `r`
INNER JOIN `report_item_day` AS `r2`
WHERE (r.item_id = cla.item_id)
AND (cla.location_id = cl.location_id)
AND (r.day between from_unixtime(1293840000) and from_unixtime(1296518399))
AND (r2.day between from_unixtime(1291161600) and from_unixtime(1293839999))
AND (cl.location_code = 'LOCATION')
group by month(r.day);
At present this query takes 2.2 seconds in my environment. The explain plan is:
'1', 'SIMPLE', 'cl', 'ALL', 'PRIMARY', NULL, NULL, NULL, '33', 'Using where; Using temporary; Using filesort'
'1', 'SIMPLE', 'cla', 'ref', 'PRIMARY,location_id,location_id_idxfk', 'location_id', '4', 'cl.location_id', '1', 'Using index'
'1', 'SIMPLE', 'r', 'ref', 'PRIMARY', 'PRIMARY', '4', 'cla.asset_id', '211', 'Using where'
'1', 'SIMPLE', 'r2', 'ALL', NULL, NULL, NULL, NULL, '37602', 'Using where; Using join buffer'
If I add an index to the "day" column, instead of my query running faster, it runs in 2.4 seconds. The explain plan for the query at that time is:
'1', 'SIMPLE', 'r2', 'range', 'report_day_day_idx', 'report_day_day_idx', '3', NULL, '1092', 'Using where; Using temporary; Using filesort'
'1', 'SIMPLE', 'r', 'range', 'PRIMARY,report_day_day_idx', 'report_day_day_idx', '3', NULL, '1180', 'Using where; Using join buffer'
'1', 'SIMPLE', 'cla', 'eq_ref', 'PRIMARY,location_id,location_id_idxfk', 'PRIMARY', '4', 'r.asset_id', '1', 'Using where'
'1', 'SIMPLE', 'cl', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'cla.location_id', '1', 'Using where'
According to the MySQL documentation, the most efficient GROUP BY execution is when there is an index to retrieve the grouping columns. But it also states that the only aggregate functions that can really make use of the indexes are min() and max(). Does anyone have any ideas what I can do to further optimize my query? Or why my 'indexed' version runs more slowly despite examining fewer rows overall than the non-indexed version?
Create table:
CREATE TABLE `report_item_day` (
`item_id` int(11) NOT NULL,
`amount` decimal(8,2) DEFAULT NULL,
`day` date NOT NULL,
PRIMARY KEY (`item_id`,`day`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Of course the other option I have is to make 2 db calls, one for each time period. If I do that, straight away the query for each drops to 0.031s. Still I feel like there should be a way to optimize this query to achieve comparable results.
Three things:
1) I don't see in the WHERE clause something for r2.item_id. Without it, r2 is factored in via a Cartesian Product and will sum up other item_ids as well.
Change your original query to look like this:
SELECT r.day
,sum(r.amount)/(count(distinct r.item_id)*count(r.day)) AS `current_avg_day`
,sum(r2.amount)/(count(distinct r2.item_id)*count(r2.day)) AS `previous_avg_day`
FROM `client_location_item` AS `cla`
INNER JOIN `client_location` AS `cl`
INNER JOIN `report_item_day` AS `r`
INNER JOIN `report_item_day` AS `r2`
WHERE (r.item_id = cla.item_id) AND (r2.item_id = cla.item_id) AND (cla.location_id = cl.location_id)
AND (r.day between from_unixtime(1293840000) and from_unixtime(1296518399))
AND (r2.day between from_unixtime(1291161600) and from_unixtime(1293839999))
AND (cl.location_code = 'LOCATION')
group by month(r.day);
See if the EXPLAIN PLAN changes after this.
2) Do this: ALTER TABLE report_item_day ADD INDEX (day, item_id);
This lets MySQL scan the index by day instead of by item_id.
See if the EXPLAIN PLAN changes after this.
3) Last resort : Refactor the query
SELECT r.day, sum(r.amount)/(count(distinct r.item_id)*count(r.day)) AS `current_avg_day`, sum(r2.amount)/(count(distinct r2.item_id)*count(r2.day)) AS `previous_avg_day` FROM
(SELECT CLA.item_id FROM client_location CL,client_location_item CLA WHERE CL.location_code = 'LOCATION' AND CLA.location_id=CL.location_id) A,
report_item_day r,
report_item_day r2
WHERE (r.item_id = A.item_id)
AND (r2.item_id = A.item_id)
AND (r.day between from_unixtime(1293840000) and from_unixtime(1296518399))
AND (r2.day between from_unixtime(1291161600) and from_unixtime(1293839999))
group by month(r.day);
This can definitely be refactored further. I just refactored it a little.
Give it a try!
Why are you selecting day when you are grouping on month? I don't entirely understand what you would like the output of your query to look like.
I hate MySQL for allowing that!
I will show you two approaches to query for 2 periods in one go. The first one is a union all query. It should do what your 2-query approach already does. It will return 2 rows, one for each period.
select sum(r.amount) / (count(distinct r.item_id) * count(r.day) ) as curr_avg
from report_item_day r
join client_location_item cla using(item_id)
join client_location cl using(location_id)
where cl.location_code = 'LOCATION'
and r.day between from_unixtime(1293840000) and from_unixtime(1296518399)
union all
select sum(r.amount) / (count(distinct r.item_id) * count(r.day) ) as prev_avg
from report_item_day r
join client_location_item cla using(item_id)
join client_location cl using(location_id)
where cl.location_code = 'LOCATION'
and r.day between from_unixtime(1291161600) and from_unixtime(1293839999)
The following approach is potentially faster than the above, but it is much uglier and harder to read.
select period
,sum(amount) / (count(distinct item_id) * count(day) ) as avg_day
from (select case when r.day between from_unixtime(1293840000) and from_unixtime(1296518399) then 'Current'
when r.day between from_unixtime(1291161600) and from_unixtime(1293839999) then 'Previous'
end as period
,r.amount
,r.item_id
,r.day
from report_item_day r
join client_location_item cla using(item_id)
join client_location cl using(location_id)
where cl.location_code = 'LOCATION'
and ( r.day between from_unixtime(1293840000) and from_unixtime(1296518399)
or r.day between from_unixtime(1291161600) and from_unixtime(1293839999)
)
) v
group
by period;
Note 1: You didn't give us DDL, so I can't test if the syntax is correct
Note 2: Consider creating a calendar table, keyed by DATE. Add appropriate columns such as MONTH, WEEK, FINANCIAL_YEAR etcetera, to be able to support the reporting you are doing. The queries will be much much easier to write and understand.
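A minimal sketch of what such a calendar table and a report against it could look like. The table name calendar and its columns (cal_year, cal_month, cal_week, financial_year) are hypothetical, and the periods assume the unix timestamps above are meant as December 2010 and January 2011 in UTC.

```sql
-- Hypothetical calendar table: one row per day, filled once in advance.
CREATE TABLE calendar (
    day            DATE     NOT NULL,
    cal_year       SMALLINT NOT NULL,
    cal_month      TINYINT  NOT NULL,
    cal_week       TINYINT  NOT NULL,
    financial_year SMALLINT NOT NULL,
    PRIMARY KEY (day),
    KEY idx_calendar_year_month (cal_year, cal_month)
) ENGINE=InnoDB;

-- One row per period, grouped on plain columns instead of MONTH(r.day).
SELECT c.cal_year,
       c.cal_month,
       SUM(r.amount) / (COUNT(DISTINCT r.item_id) * COUNT(r.day)) AS avg_day
FROM report_item_day r
JOIN calendar c               ON c.day = r.day
JOIN client_location_item cla ON cla.item_id = r.item_id
JOIN client_location cl       ON cl.location_id = cla.location_id
WHERE cl.location_code = 'LOCATION'
  AND (   (c.cal_year = 2010 AND c.cal_month = 12)
       OR (c.cal_year = 2011 AND c.cal_month = 1))
GROUP BY c.cal_year, c.cal_month;
```

Like the UNION ALL version, this returns one row per period rather than both averages side by side.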
First of all (and this might be just aesthetics), why aren't you using ON / USING clauses in your INNER JOIN ? Why make the JOIN on the WHERE clause instead of the actual part, in the FROM?
Second, my guess on the indexed vs. non-indexed issue is that MySQL now has to check the index first for the records that match the range, whereas the non-indexed version works on data already in memory, which is faster than going to disk. But I can't be too sure.
Now, for the query. Here's part of the doc. on JOINs:
The `conditional_expr` used with ON is any conditional expression of the form
that can be used in a WHERE clause. Generally, you should use the ON clause for
conditions that specify how to join tables, and the WHERE clause to restrict
which rows you want in the result set.
So yeah, move the join conditions to the FROM clause. Also, you might be interested in the Index hint syntax: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html
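For illustration, the original query with the join conditions moved into ON clauses might look like the sketch below. It is meant to be equivalent to the original, not faster, and it deliberately keeps the original behaviour of joining r2 without any condition on item_id.

```sql
SELECT r.day,
       SUM(r.amount)  / (COUNT(DISTINCT r.item_id)  * COUNT(r.day))  AS current_avg_day,
       SUM(r2.amount) / (COUNT(DISTINCT r2.item_id) * COUNT(r2.day)) AS previous_avg_day
FROM client_location AS cl
INNER JOIN client_location_item AS cla
        ON cla.location_id = cl.location_id
INNER JOIN report_item_day AS r
        ON r.item_id = cla.item_id
-- as in the original query, r2 has no join condition of its own
INNER JOIN report_item_day AS r2
WHERE cl.location_code = 'LOCATION'
  AND r.day  BETWEEN FROM_UNIXTIME(1293840000) AND FROM_UNIXTIME(1296518399)
  AND r2.day BETWEEN FROM_UNIXTIME(1291161600) AND FROM_UNIXTIME(1293839999)
GROUP BY MONTH(r.day);
```

Written this way, the missing join condition for r2 is much easier to spot than in the WHERE-only version.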
And lastly, you could try using a view, but be wary of performance issues: http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker/
Good luck.
I have a slow MySQL query that takes about 15 seconds to run. So I did some investigation and discovered I can use the EXPLAIN statement to see where the bottleneck is. So I did that, but I really can't decipher the results.
If I had to take a stab, I would say the first line is a problem as there are null values for the keys. However, if that is so, I can't understand why as the classType1 table IS indexed on the appropriate columns.
Could someone please offer some explanation as to where the problems might lie?
Thanks so much.
EDIT: Ok, I have added the query as well hoping that it might offer some more light to the issues. Unfortunately I just won't be able to explain to you what it's doing, so if any help could be offered based on what's provided, that would be great.
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, 'PRIMARY', 'classType1', 'system', 'PRIMARY', '', '', '', 1, 'Using temporary; Using filesort'
1, 'PRIMARY', 'user', 'const', 'PRIMARY', 'PRIMARY', '4', 'const', 1, 'Using index'
1, 'PRIMARY', 'class1', 'ref', 'IX_classificationType,IX_classificationValue,IX_classificationObjectType,IX_classificationObjectId', 'IX_classificationObjectId', '8', 'const', 3, 'Using where'
1, 'PRIMARY', 'classVal1', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'ccms.class1.classificationValue', 1, 'Using where; Using index'
1, 'PRIMARY', 'class2', 'ref', 'IX_classificationType,IX_classificationValue,IX_classificationObjectType,IX_classificationObjectId', 'IX_classificationValue', '4', 'ccms.class1.classificationValue', 368, 'Using where'
1, 'PRIMARY', 'album', 'eq_ref', 'PRIMARY,IX_albumType,IX_albumIsDisabled,IX_albumIsActive,IX_albumCSI,IX_albumOwner,IX_albumPublishDate', 'PRIMARY', '4', 'ccms.class2.classificationObjectId', 1, 'Using where'
1, 'PRIMARY', 'profile', 'eq_ref', 'PRIMARY,IX_profileUserId', 'PRIMARY', '4', 'ccms.album.albumOwnerId', 1, 'Using where'
1, 'PRIMARY', 'albumUser', 'eq_ref', 'PRIMARY,IX_userIsAccountPublic', 'PRIMARY', '4', 'ccms.profile.profileUserId', 1, 'Using where'
1, 'PRIMARY', 'photo', 'eq_ref', 'PRIMARY,FK_photoAlbumId', 'PRIMARY', '8', 'ccms.album.albumCoverPhotoId', 1, 'Using where'
2, 'DEPENDENT SUBQUERY', 'class3', 'ref', 'IX_classificationObjectType,IX_classificationObjectId', 'IX_classificationObjectId', '8', 'ccms.class2.classificationObjectId', 1, 'Using where'
3, 'DEPENDENT SUBQUERY', 'class4', 'ref', 'IX_classificationType,IX_classificationValue,IX_classificationObjectType,IX_classificationObjectId', 'IX_classificationObjectId', '8', 'const', 3, 'Using where'
Query is...
SELECT profileDisplayName,albumPublishDate,profileId,albumId,albumPath,albumName,albumCoverPhotoId,photoFilename,fnAlbumGetNudityClassification(albumId) AS albumNudityClassification,fnAlbumGetNumberOfPhotos(albumId,1,0) AS albumNumberOfPhotos,albumDescription,albumCSD,albumUSD,photoId,fnGetAlbumPhotoViewCount(albumId) AS albumNumberOfPhotoViews
FROM user
-- Join User Classifications
INNER JOIN classification class1
ON class1.classificationObjectId = user.userId AND class1.classificationObjectType = 1
INNER JOIN classificationType classType1
ON class1.classificationType = classType1.classificationTypeId
INNER JOIN classificationTypeValue classVal1
ON class1.classificationValue = classVal1.classificationTypeValueId
-- Join Album Classifications
INNER JOIN classification class2
ON class2.classificationObjectType = 3
AND class1.classificationType = class2.classificationType AND class1.classificationValue = class2.classificationValue
INNER JOIN album
ON album.albumId = class2.classificationObjectId
AND albumIsActive = 1
AND albumIsDisabled = 0
LEFT JOIN profile
ON albumOwnerId = profileId AND albumOwnerType = 0
LEFT JOIN user albumUser
ON albumUser.userId = profileUserId
AND albumUser.userIsAccountPublic = 1
LEFT JOIN photo
ON album.albumId = photo.photoAlbumId AND photo.photoId = album.albumCoverPhotoId
WHERE 0 =
(
SELECT COUNT(*)
FROM classification class3
WHERE class3.classificationObjectType = 3
AND class3.classificationObjectId = class2.classificationObjectId
AND NOT EXISTS
(
SELECT 1
FROM classification class4
WHERE class4.classificationObjectType = 1
AND class4.classificationObjectId = user.userId
AND class4.classificationType = class3.classificationType AND class4.classificationValue = class3.classificationValue
)
)
AND class1.classificationObjectId = 8
AND (albumPublishDate <= {ts '2011-01-28 20:48:39'} || albumCSI = 8)
AND album.albumType NOT IN (1)
AND fnAlbumGetNumberOfPhotos(albumId,1,0) > 0
AND albumUser.userIsAccountPublic IS NOT NULL
ORDER BY albumPublishDate DESC
LIMIT 0, 15
without seeing the actual structure or query, I would look for 2 things...
I know you said they are... but... make sure all the appropriate fields are indexed
example: you have an index on field "active" (to filter out only active records) and another one (let's say primary key index) on id_classType1... unless you do a unique index on "id_classType1, active", a query similar to this:
SELECT * FROM classType1 WHERE id_classType1 IN (1,2,3) AND active = 1
... would need to either combine those indexes or look them up separately. However, if you have an index on both, id_classType1 AND active (and that index is a type UNIQUE), SQL will use it and find the combinations much quicker.
secondly, you seem to have dependent subqueries in your EXPLAIN statement, which can alone slow your query down quite a lot... have a look here for a possible workaround: http://forums.mysql.com/read.php?115,128477,128477
my first try would be to replace those subqueries by JOINs and then perhaps try to optimize it further by removing them altogether (if possible) or making a separate queries for those subqueries
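As a rough idea of what replacing the nested subqueries with a JOIN could look like, here is a sketch that isolates just the "every album classification must also be a classification of user 8" check. It is untested and assumes the check can be pulled out on its own; the result would still have to be joined back to album and the other tables.

```sql
-- Albums (classificationObjectType = 3) whose classifications are all
-- shared with user 8 (classificationObjectType = 1).
SELECT class3.classificationObjectId AS albumId
FROM classification class3
LEFT JOIN classification class4
       ON class4.classificationObjectType = 1
      AND class4.classificationObjectId   = 8
      AND class4.classificationType       = class3.classificationType
      AND class4.classificationValue      = class3.classificationValue
WHERE class3.classificationObjectType = 3
GROUP BY class3.classificationObjectId
-- an unmatched album classification leaves class4 NULL; allow none of those
HAVING SUM(class4.classificationObjectId IS NULL) = 0;
```

Running EXPLAIN on this and on the original form would show whether the DEPENDENT SUBQUERY rows disappear.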
EDIT
this query is more complex than any other I've ever seen, so take these as somewhat limited tips:
try removing subqueries (just put anything you know will work and give results there temporarily)
I see a lot of INNER JOINs in the query, which can be quite slow when they have to match many rows from both tables (source: http://dev.mysql.com/doc/refman/5.0/en/join.html) - maybe there's a way to replace them somehow?
also - and this is something I remember from the past (might not be true or relevant) - shouldn't WHERE-like statements be in the WHERE clause, not the JOIN clause? for example, I would put the following from the 1st JOIN into a WHERE section: class1.classificationObjectType = 1
that's just about all - and one question: how many rows do those table have? no need for exact numbers, just trying to see on approx. how many records the query runs, as it takes so long
OK, so through a process of elimination I managed to find the issue.
In my column list, I had a call: fnGetAlbumPhotoViewCount(albumId) AS albumNumberOfPhotoViews
One of the tables joined in that call had a column that was not indexed. Simple enough.
Now my question is, EXPLAIN could not show me that. If you look, there is in fact no reference to the pageview table or columns anywhere in the EXPLAIN output.
So what tool could I have used to weed out this issue??
Thanks
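One approach that may help here: EXPLAIN only covers the statement it is given, not the statements run inside stored functions that the statement calls, so the function body has to be examined separately. A sketch of doing that by hand; the inner SELECT, the column pageviewPhotoId and the album id 123 are hypothetical, since the real function body is not shown.

```sql
-- 1. Look at what the function actually runs.
SHOW CREATE FUNCTION fnGetAlbumPhotoViewCount;

-- 2. Copy the SELECT from the function body, replace its parameter with a
--    sample value, and EXPLAIN it on its own (hypothetical body shown here).
EXPLAIN
SELECT COUNT(*)
FROM pageview pv
JOIN photo p ON p.photoId = pv.pageviewPhotoId
WHERE p.photoAlbumId = 123;
```

Repeating this for each function called in the select list and WHERE clause (fnAlbumGetNudityClassification, fnAlbumGetNumberOfPhotos) would cover the parts of the work that the outer EXPLAIN cannot show.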