I have a slow MySQL query that takes about 15 seconds to run. After some investigation I discovered I can use the EXPLAIN statement to see where the bottleneck is. So I ran it, but I really can't decipher the results.
If I had to take a stab, I would say the first line is the problem, since the key columns are null. But if that is so, I can't understand why, as the classType1 table IS indexed on the appropriate columns.
Could someone please offer some explanation as to where the problems might lie?
Thanks so much.
EDIT: OK, I have added the query as well, hoping it sheds some more light on the issue. Unfortunately I just won't be able to explain to you what it's doing, so if any help can be offered based on what's provided, that would be great.
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, 'PRIMARY', 'classType1', 'system', 'PRIMARY', '', '', '', 1, 'Using temporary; Using filesort'
1, 'PRIMARY', 'user', 'const', 'PRIMARY', 'PRIMARY', '4', 'const', 1, 'Using index'
1, 'PRIMARY', 'class1', 'ref', 'IX_classificationType,IX_classificationValue,IX_classificationObjectType,IX_classificationObjectId', 'IX_classificationObjectId', '8', 'const', 3, 'Using where'
1, 'PRIMARY', 'classVal1', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'ccms.class1.classificationValue', 1, 'Using where; Using index'
1, 'PRIMARY', 'class2', 'ref', 'IX_classificationType,IX_classificationValue,IX_classificationObjectType,IX_classificationObjectId', 'IX_classificationValue', '4', 'ccms.class1.classificationValue', 368, 'Using where'
1, 'PRIMARY', 'album', 'eq_ref', 'PRIMARY,IX_albumType,IX_albumIsDisabled,IX_albumIsActive,IX_albumCSI,IX_albumOwner,IX_albumPublishDate', 'PRIMARY', '4', 'ccms.class2.classificationObjectId', 1, 'Using where'
1, 'PRIMARY', 'profile', 'eq_ref', 'PRIMARY,IX_profileUserId', 'PRIMARY', '4', 'ccms.album.albumOwnerId', 1, 'Using where'
1, 'PRIMARY', 'albumUser', 'eq_ref', 'PRIMARY,IX_userIsAccountPublic', 'PRIMARY', '4', 'ccms.profile.profileUserId', 1, 'Using where'
1, 'PRIMARY', 'photo', 'eq_ref', 'PRIMARY,FK_photoAlbumId', 'PRIMARY', '8', 'ccms.album.albumCoverPhotoId', 1, 'Using where'
2, 'DEPENDENT SUBQUERY', 'class3', 'ref', 'IX_classificationObjectType,IX_classificationObjectId', 'IX_classificationObjectId', '8', 'ccms.class2.classificationObjectId', 1, 'Using where'
3, 'DEPENDENT SUBQUERY', 'class4', 'ref', 'IX_classificationType,IX_classificationValue,IX_classificationObjectType,IX_classificationObjectId', 'IX_classificationObjectId', '8', 'const', 3, 'Using where'
Query is...
SELECT profileDisplayName,albumPublishDate,profileId,albumId,albumPath,albumName,albumCoverPhotoId,photoFilename,fnAlbumGetNudityClassification(albumId) AS albumNudityClassification,fnAlbumGetNumberOfPhotos(albumId,1,0) AS albumNumberOfPhotos,albumDescription,albumCSD,albumUSD,photoId,fnGetAlbumPhotoViewCount(albumId) AS albumNumberOfPhotoViews
FROM user
-- Join User Classifications
INNER JOIN classification class1
ON class1.classificationObjectId = user.userId AND class1.classificationObjectType = 1
INNER JOIN classificationType classType1
ON class1.classificationType = classType1.classificationTypeId
INNER JOIN classificationTypeValue classVal1
ON class1.classificationValue = classVal1.classificationTypeValueId
-- Join Album Classifications
INNER JOIN classification class2
ON class2.classificationObjectType = 3
AND class1.classificationType = class2.classificationType AND class1.classificationValue = class2.classificationValue
INNER JOIN album
ON album.albumId = class2.classificationObjectId
AND albumIsActive = 1
AND albumIsDisabled = 0
LEFT JOIN profile
ON albumOwnerId = profileId AND albumOwnerType = 0
LEFT JOIN user albumUser
ON albumUser.userId = profileUserId
AND albumUser.userIsAccountPublic = 1
LEFT JOIN photo
ON album.albumId = photo.photoAlbumId AND photo.photoId = album.albumCoverPhotoId
WHERE 0 =
(
SELECT COUNT(*)
FROM classification class3
WHERE class3.classificationObjectType = 3
AND class3.classificationObjectId = class2.classificationObjectId
AND NOT EXISTS
(
SELECT 1
FROM classification class4
WHERE class4.classificationObjectType = 1
AND class4.classificationObjectId = user.userId
AND class4.classificationType = class3.classificationType AND class4.classificationValue = class3.classificationValue
)
)
AND class1.classificationObjectId = 8
AND (albumPublishDate <= {ts '2011-01-28 20:48:39'} || albumCSI = 8)
AND album.albumType NOT IN (1)
AND fnAlbumGetNumberOfPhotos(albumId,1,0) > 0
AND albumUser.userIsAccountPublic IS NOT NULL
ORDER BY albumPublishDate DESC
LIMIT 0, 15
without seeing the actual structure or query, I would look for two things...
I know you said they are... but... make sure all the appropriate fields are indexed.
Example: you have an index on the field "active" (to filter out only active records) and another one (say, the primary key) on id_classType1... unless you have a composite index on (id_classType1, active), a query similar to this:
SELECT * FROM classType1 WHERE id_classType1 IN (1,2,3) AND active = 1
... would need to either merge those indexes or look them up separately. However, if you have one index covering both id_classType1 AND active (UNIQUE if the combination is unique), MySQL can use it and find the combinations much more quickly.
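For completeness, such a composite index could be created as follows (the table and column names follow the hypothetical example above, not the asker's real schema):

```sql
-- Hypothetical names from the example above; adjust to your schema.
-- One composite index lets MySQL satisfy both the IN (...) lookup
-- and the active = 1 filter in a single index traversal.
ALTER TABLE classType1
  ADD UNIQUE INDEX ux_classType1_active (id_classType1, active);
```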
secondly, you have DEPENDENT SUBQUERY rows in your EXPLAIN output, which alone can slow your query down quite a lot... have a look here for a possible workaround: http://forums.mysql.com/read.php?115,128477,128477
my first try would be to replace those subqueries with JOINs, and then perhaps optimize further by removing them altogether (if possible) or running them as separate queries.
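As a hedged sketch of that idea: the inner NOT EXISTS in the posted query could be folded into the outer subquery as a LEFT JOIN ... IS NULL anti-join, and the COUNT(*) = 0 test replaced by NOT EXISTS (which can stop at the first row instead of counting). Table and column names are taken from the question; verify the semantics still match before using it:

```sql
-- Replaces: WHERE 0 = (SELECT COUNT(*) FROM classification class3 ... NOT EXISTS (...))
-- Intended meaning kept: every album classification must be matched
-- by a corresponding user classification.
WHERE NOT EXISTS (
    SELECT 1
    FROM classification class3
    LEFT JOIN classification class4
           ON class4.classificationObjectType = 1
          AND class4.classificationObjectId  = user.userId
          AND class4.classificationType      = class3.classificationType
          AND class4.classificationValue     = class3.classificationValue
    WHERE class3.classificationObjectType = 3
      AND class3.classificationObjectId   = class2.classificationObjectId
      AND class4.classificationObjectId IS NULL  -- album classification with no user match
)
```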
EDIT
this query is more complex than any I've ever seen, so take these as somewhat limited tips:
try removing the subqueries (temporarily substitute something you know will work and return results)
I see a lot of INNER JOINs in the query, which can be slow, as they have to match rows from both tables (source: http://dev.mysql.com/doc/refman/5.0/en/join.html) - maybe there's a way to avoid some of them?
also - and this is something I remember from the past (it might not be true or relevant) - shouldn't WHERE-like conditions live in the WHERE clause, not the JOIN clause? For example, I would move class1.classificationObjectType = 1 from the 1st JOIN into the WHERE section.
that's just about all - and one question: how many rows do those tables have? No need for exact numbers; I'm just trying to see approximately how many records the query runs over, since it takes so long.
OK, so through a process of elimination I managed to find the issue.
In my column list, I had a call: fnGetAlbumPhotoViewCount(albumId) AS albumNumberOfPhotoViews
One of the tables joined inside that function had a column that was not indexed. Simple enough.
Now my question is: EXPLAIN could not show me that. If you look, there is in fact no reference to the pageview table or its columns anywhere in the EXPLAIN output.
So what tool could I have used to weed out this issue?
Thanks
Related
I have an SQL query (MySQL) that I use for gathering the details of new cases (jobs) generated from clients who are referred by a particular referring company. Importantly, we need to select only those where it is the client's first case; otherwise repeat clients would register as being referred over and over, which is not what we're trying to get. In our system we have clients and cases tables connected by an m:n table (in practice just 1:n), which relates cases to their corresponding clients.
The requirement of only returning values where it is the client's first case is giving me trouble. To satisfy it, I have a subquery in the WHERE clause that checks whether a particular case is the client's first by looking for any earlier cases by that client. This gives correct output, but makes the query run quite slowly, and I'm not sure what to do about it, which is why I turn to StackOverflow for a better way. If I remove that subquery, it runs instantly.

I have tried altering the subquery to check COUNT(*) = 0 instead of NOT EXISTS. I have also altered it to check for lesser case_ids instead of earlier case-created dates. I've tweaked other things too, and in each case got similarly slow results (~45 seconds vs. instant). I don't know how to rework things so the subquery isn't dependent. One alternative I've thought of is to add a simple flag to the cases table marking whether it's the client's first case, but that brings up other problems and isn't what I want to do if avoidable.
Note: I can't rule out clients if they have more than one case, since I need the first one.
I was going to simplify the query for you, but then I realized I would also have to figure out how that would change the EXPLAIN results and modify those too, so I didn't. We have a clients table and a contacts table; contacts are children of clients, and the contacts are the ones with cases and the saved referred-by value, but we go by clients for purposes of determining whether they had a previous case.
Try 1:
SELECT c2.case_id AS Case_ID, [other stuff]
FROM client_contact_cases c1 LEFT JOIN cases c2 ON (c1.case_id = c2.case_id)
LEFT JOIN client_contact c3 ON (c1.client_contact_id = c3.client_contact_id)
WHERE c2.case_created_date > '2013-05-01 00:00:00' AND c2.case_created_date < '2013-10-31 23:59:59'
AND c3.refer_by = 'Referring Partner #1'
AND NOT EXISTS (
SELECT c2_a.case_id FROM client_contact_cases c1_a LEFT JOIN cases c2_a ON (c1_a.case_id = c2_a.case_id)
WHERE c1_a.client_id = c1.client_id AND c2_a.case_created_date < c2.case_created_date
)
ORDER BY Case_ID ASC
EXPLAIN Result:
'1', 'PRIMARY', 'c3', 'ALL', 'PRIMARY', NULL, NULL, NULL, '29340', 'Using where; Using temporary; Using filesort'
'1', 'PRIMARY', 'c1', 'ref', 'client_has_cases_FKIndex1,client_contact_has_cases_FKIndex2', 'client_has_cases_FKIndex1', '4', 'prod1_cases_clients.c3.client_contact_id', '1', 'Using index'
'1', 'PRIMARY', 'c2', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'prod1_cases_clients.c1.case_id', '1', 'Using where'
'2', 'DEPENDENT SUBQUERY', 'c1_a', 'index', 'client_contact_has_cases_FKIndex2', 'client_contact_has_cases_FKIndex2', '4', NULL, '33682', 'Using where; Using index'
'2', 'DEPENDENT SUBQUERY', 'c2_a', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'prod1_cases_clients.c1_a.case_id', '1', 'Using where'
Here is the EXPLAIN result if I change the subquery to:
...SELECT c1_a.case_id FROM client_contact_cases c1_a
WHERE c1_a.client_id = c1.client_id AND c1_a.case_id < c2.case_id
EXPLAIN:
'1', 'PRIMARY', 'c3', 'ALL', 'PRIMARY', NULL, NULL, NULL, '29340', 'Using where; Using temporary; Using filesort'
'1', 'PRIMARY', 'c1', 'ref', 'client_contact_has_cases_FKIndex1,client_contact_has_cases_FKIndex2', 'client_contact_has_cases_FKIndex1', '4', 'prod1_cases_clients.c3.client_contact_id', '1', 'Using index'
'1', 'PRIMARY', 'c2', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'prod1_cases_clients.c1.case_id', '1', 'Using where'
'2', 'DEPENDENT SUBQUERY', 'c1_a', 'ALL', 'client_contact_has_cases_FKIndex2', NULL, NULL, NULL, '33682', 'Range checked for each record (index map: 0x4)'
What is up with the 'Range checked for each record (index map: 0x4)'? There should be an index on everything. Any help is greatly appreciated!
Aha, I figured out a subquery to use that's not dependent! Instead, I check whether the case_id is in the list of each client's first case computed by the subquery. Now it runs in less than half a second. I would love to put something in the subquery's WHERE clause to whittle it down further. I can't put in the date range, since that would prevent the query from seeing earlier cases and give me slightly more results than it should, but I did later add in a c3_a.refer_by = 'Referring Partner #1'.
The subquery is now:
AND c2.case_id IN (
SELECT MIN(c2_a.case_id)
FROM client_contact_cases c1_a LEFT JOIN cases c2_a ON (c1_a.case_id = c2_a.case_id)
GROUP BY c1_a.client_id
)
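Assembled into the earlier query (names taken from Try 1; the joins are written here as INNER JOINs, since the original WHERE filters discard the NULL-extended rows anyway), the non-dependent version might look like this:

```sql
SELECT c2.case_id AS Case_ID  -- [other columns as in the original]
FROM client_contact_cases c1
JOIN cases c2          ON c1.case_id = c2.case_id
JOIN client_contact c3 ON c1.client_contact_id = c3.client_contact_id
WHERE c2.case_created_date > '2013-05-01 00:00:00'
  AND c2.case_created_date < '2013-10-31 23:59:59'
  AND c3.refer_by = 'Referring Partner #1'
  AND c2.case_id IN (
        SELECT MIN(c2_a.case_id)        -- each client's first case
        FROM client_contact_cases c1_a
        JOIN cases c2_a ON c1_a.case_id = c2_a.case_id
        GROUP BY c1_a.client_id
      )
ORDER BY Case_ID ASC;
```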
Suppose I do
EXPLAIN SELECT * FROM xyz e
JOIN abc cs ON e.rss = 'text' AND e.rdd = cs.xid
JOIN def c ON cs.cid = c.xid
JOIN jkl s ON c.sid = s.nid
WHERE s.flag = 0;
This would reveal:
1, 'SIMPLE', 's', 'ref', 'PRIMARY,Index_8', 'x1', '1', 'const', 1586, 'Using index; Using temporary'
1, 'SIMPLE', 'c', 'ref', 'PRIMARY,sid', 'x2', '4', 's.nid', 40, 'Using index'
1, 'SIMPLE', 'cs', 'ref', 'PRIMARY,cid', 'x3', '4', 'c.nid', 1, 'Using index'
1, 'SIMPLE', 'e', 'ref', 'rss,rdd', 'x4', '141', 'const,cs.nid', 12, 'Using where; Using index; Distinct'
However, suppose I do
EXPLAIN SELECT * FROM xyz e
JOIN abc cs ON e.rss = 'text' AND e.rdd = cs.xid
JOIN def c ON cs.cid = c.xid
JOIN jkl s ON c.sid = s.nid
WHERE s.flag = 0 AND c.range_field <= 10;
This would reveal:
1, 'SIMPLE', 'c', 'ALL', 'PRIMARY,school_nid,Index_5', '', '', '', 56074, 'Using where; Using temporary'
1, 'SIMPLE', 's', 'eq_ref', 'PRIMARY,Index_8', 'PRIMARY', '4', 'c.school_nid', 1, 'Using where'
1, 'SIMPLE', 'cs', 'ref', 'PRIMARY,cid', 'x3', '4', 'c.nid', 1, 'Using index'
1, 'SIMPLE', 'e', 'ref', 'rss,rdd', 'x4', '141', 'const,cs.nid', 12, 'Using where; Using index; Distinct'
i.e. while the first query scans only 1586 rows, this one scans over 56074 rows.
This is despite the fact that the second query should return a SUBSET of the first query's results - i.e., out of the 1586 results of the first query, return those with c.range_field <= 10.
Is there a way to modify this query so that the number of rows scanned is <= 1586, given that the result of the second query is just a subset of the first's?
The fact that the 2nd query returns a subset of the 1st one does not matter from a performance perspective.
In the first query there is no filter on the c table, while in the 2nd one there is a filter on c.range_field.
As you can see in the 1st explain plan (Using index), the first query can compute the result set using ONLY the index, which is a fast operation: from the index, MySQL can deduce the location of the wanted rows and read just those, which explains the lower number of scanned rows. In the 2nd explain plan, MySQL has to build the result set by reading ordinary data blocks from disk, which is slow (a full table scan: the rows are read one by one and evaluated that way).
The solution for you is to evaluate adding the c.range_field column to one of the indexes listed in the possible_keys column for table c in the 2nd explain plan.
Because you are filtering on c.range_field and def c is the third table in your FROM clause, the filter is applied to the result of joining the first three tables, since there is no usable index. I would suggest you go with Sebas' answer and create an index on c.range_field.
An alternative, which I would use myself, is to make def the driving table: start your FROM clause with the def table, preferably followed by jkl. This filters the rows of the first and second tables before joining them with the third and the fourth.
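Note that MySQL's optimizer is free to reorder tables, so merely listing def first is only a hint; to force the written order you would use STRAIGHT_JOIN. A hedged sketch against the question's placeholder tables (column names as in the question):

```sql
-- STRAIGHT_JOIN forces MySQL to join in the order written:
-- filter def on range_field first, then jkl, then the rest.
SELECT e.*
FROM def c
STRAIGHT_JOIN jkl s ON c.sid = s.nid
STRAIGHT_JOIN abc cs ON cs.cid = c.xid
STRAIGHT_JOIN xyz e ON e.rdd = cs.xid AND e.rss = 'text'
WHERE c.range_field <= 10
  AND s.flag = 0;
```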
The following query executes in 17 seconds inside a view. There are 450,000 rows. I have an index on the two columns being joined, which are foreign keys and BIGINTs. Is there any way to speed this up?
SELECT c.id, sce.user_id
FROM sims_classroom c
JOIN sims_class_enrollment sce ON c.id = sce.classroom_id
EXPLAIN
'1', 'SIMPLE', 'c', 'index', 'PRIMARY', 'PRIMARY', '8', NULL, '211213', 'Using index'
'1', 'SIMPLE', 'sce', 'ref', 'fk_class_enrollment_classroom_id', 'fk_class_enrollment_classroom_id', '9', 'ngsp.c.id', '1', 'Using where'
ROWS
sims_classroom = 200100
sims_class_enrollment = 476396
It will slow down writes a little, but since you're only one column short of having everything you need in your index, I would create a two-column index for sce:
classroom_id, user_id
This would mean MySQL doesn't even need to touch the actual table (both rows would show 'Using index' in the EXPLAIN).
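Concretely, the suggested two-column (covering) index could be created like this; the index name is made up:

```sql
-- Covering index: the join column plus the selected column.
-- With this, MySQL can answer the join from the index alone,
-- without reading the table rows.
ALTER TABLE sims_class_enrollment
  ADD INDEX ix_classroom_user (classroom_id, user_id);
```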
I have a report that pulls information from a summary table and should ideally pull two periods at once: the current period and the previous period. My table is structured as follows:
report_table
item_id INT(11)
amount Decimal(8,2)
day DATE
The primary key is (item_id, day). This table currently holds 37k records with 92 different items and 1200 different days. I am using MySQL 5.1.
Here is my select statement:
SELECT r.day, sum(r.amount)/(count(distinct r.item_id)*count(r.day)) AS `current_avg_day`,
sum(r2.amount)/(count(distinct r2.item_id)*count(r2.day)) AS `previous_avg_day`
FROM `client_location_item` AS `cla`
INNER JOIN `client_location` AS `cl`
INNER JOIN `report_item_day` AS `r`
INNER JOIN `report_item_day` AS `r2`
WHERE (r.item_id = cla.item_id)
AND (cla.location_id = cl.location_id)
AND (r.day between from_unixtime(1293840000) and from_unixtime(1296518399))
AND (r2.day between from_unixtime(1291161600) and from_unixtime(1293839999))
AND (cl.location_code = 'LOCATION')
group by month(r.day);
At present this query takes 2.2 seconds in my environment. The explain plan is:
'1', 'SIMPLE', 'cl', 'ALL', 'PRIMARY', NULL, NULL, NULL, '33', 'Using where; Using temporary; Using filesort'
'1', 'SIMPLE', 'cla', 'ref', 'PRIMARY,location_id,location_id_idxfk', 'location_id', '4', 'cl.location_id', '1', 'Using index'
'1', 'SIMPLE', 'r', 'ref', 'PRIMARY', 'PRIMARY', '4', 'cla.asset_id', '211', 'Using where'
'1', 'SIMPLE', 'r2', 'ALL', NULL, NULL, NULL, NULL, '37602', 'Using where; Using join buffer'
If I add an index to the "day" column, instead of my query running faster, it runs in 2.4 seconds. The explain plan for the query at that time is:
'1', 'SIMPLE', 'r2', 'range', 'report_day_day_idx', 'report_day_day_idx', '3', NULL, '1092', 'Using where; Using temporary; Using filesort'
'1', 'SIMPLE', 'r', 'range', 'PRIMARY,report_day_day_idx', 'report_day_day_idx', '3', NULL, '1180', 'Using where; Using join buffer'
'1', 'SIMPLE', 'cla', 'eq_ref', 'PRIMARY,location_id,location_id_idxfk', 'PRIMARY', '4', 'r.asset_id', '1', 'Using where'
'1', 'SIMPLE', 'cl', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'cla.location_id', '1', 'Using where'
According to the MySQL documentation, the most efficient GROUP BY execution occurs when there is an index to retrieve the grouping columns. But it also states that the only aggregate functions that can really exploit the index are MIN() and MAX(). Does anyone have any ideas how to optimize my query further? Or why the 'indexed' version runs more slowly despite examining fewer rows than the non-indexed version?
Create table:
CREATE TABLE `report_item_day` (
`item_id` int(11) NOT NULL,
`amount` decimal(8,2) DEFAULT NULL,
`day` date NOT NULL,
PRIMARY KEY (`item_id`,`day`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Of course, the other option I have is to make two DB calls, one for each time period. If I do that, each query immediately drops to 0.031s. Still, I feel there should be a way to optimize this query to achieve comparable results.
Three things:
1) I don't see anything in the WHERE clause tying down r2.item_id. Without it, r2 is joined as a Cartesian product and will sum up other item_ids as well.
Change your original query to look like this:
SELECT r.day
,sum(r.amount)/(count(distinct r.item_id)*count(r.day)) AS `current_avg_day`
,sum(r2.amount)/(count(distinct r2.item_id)*count(r2.day)) AS `previous_avg_day`
FROM `client_location_item` AS `cla`
INNER JOIN `client_location` AS `cl`
INNER JOIN `report_item_day` AS `r`
INNER JOIN `report_item_day` AS `r2`
WHERE (r.item_id = cla.item_id) AND (r2.item_id = cla.item_id) AND (cla.location_id = cl.location_id)
AND (r.day between from_unixtime(1293840000) and from_unixtime(1296518399))
AND (r2.day between from_unixtime(1291161600) and from_unixtime(1293839999))
AND (cl.location_code = 'LOCATION')
group by month(r.day);
See if the EXPLAIN PLAN changes after this.
2) Do this: ALTER TABLE report_item_day ADD INDEX (day, item_id);
This gives the optimizer an index whose leading column is the day rather than the item id.
See if the EXPLAIN PLAN changes after this.
3) Last resort : Refactor the query
SELECT r.day, sum(r.amount)/(count(distinct r.item_id)*count(r.day)) AS `current_avg_day`, sum(r2.amount)/(count(distinct r2.item_id)*count(r2.day)) AS `previous_avg_day` FROM
(SELECT CLA.item_id FROM client_location CL, client_location_item CLA WHERE CL.location_code = 'LOCATION' AND CLA.location_id = CL.location_id) A,
report_item_day r,
report_item_day r2
WHERE (r.item_id = A.item_id)
AND (r2.item_id = A.item_id)
AND (r.day between from_unixtime(1293840000) and from_unixtime(1296518399))
AND (r2.day between from_unixtime(1291161600) and from_unixtime(1293839999))
group by month(r.day);
This can definitely be refactored further; I've only refactored it a little.
Give it a try!
Why are you selecting day when you are grouping on month? I don't entirely understand what you would like the output of your query to look like.
I hate MySQL for allowing that!
I will show you two approaches to querying two periods in one go. The first one is a UNION ALL query. It should do what your two-query approach already does: it will return 2 rows, one for each period.
select sum(r.amount) / (count(distinct r.item_id) * count(r.day) ) as curr_avg
from report_item_day r
join client_location_item cla using(item_id)
join client_location cl using(location_id)
where cl.location_code = 'LOCATION'
and r.day between from_unixtime(1293840000) and from_unixtime(1296518399)
union all
select sum(r.amount) / (count(distinct r.item_id) * count(r.day) ) as prev_avg
from report_item_day r
join client_location_item cla using(item_id)
join client_location cl using(location_id)
where cl.location_code = 'LOCATION'
and r.day between from_unixtime(1291161600) and from_unixtime(1293839999)
The following approach is potentially faster than the above, but it is much uglier and harder to read.
select period
,sum(amount) / (count(distinct item_id) * count(day) ) as avg_day
from (select case when r.day between from_unixtime(1293840000) and from_unixtime(1296518399) then 'Current'
when r.day between from_unixtime(1291161600) and from_unixtime(1293839999) then 'Previous'
end as period
,r.amount
,r.item_id
,r.day
from report_item_day r
join client_location_item cla using(item_id)
join client_location cl using(location_id)
where cl.location_code = 'LOCATION'
and ( r.day between from_unixtime(1293840000) and from_unixtime(1296518399)
or r.day between from_unixtime(1291161600) and from_unixtime(1293839999)
)
) v
group
by period;
Note 1: You didn't give us DDL, so I can't test if the syntax is correct
Note 2: Consider creating a calendar table, keyed by DATE. Add appropriate columns such as MONTH, WEEK, FINANCIAL_YEAR etcetera, to be able to support the reporting you are doing. The queries will be much much easier to write and understand.
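A minimal sketch of such a calendar table (the column set and names are illustrative, not prescriptive):

```sql
CREATE TABLE calendar (
  day            DATE     NOT NULL,  -- one row per date
  month          TINYINT  NOT NULL,
  week           TINYINT  NOT NULL,
  financial_year SMALLINT NOT NULL,
  PRIMARY KEY (day)
) ENGINE=InnoDB;

-- Reporting queries can then join on day and group by any calendar column, e.g.:
-- SELECT c.month, ... FROM report_item_day r JOIN calendar c USING (day) GROUP BY c.month;
```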
First of all (and this might be just aesthetics): why aren't you using ON / USING clauses in your INNER JOINs? Why put the join conditions in the WHERE clause instead of in the FROM clause, where the joins actually are?
Second, my guess on the indexed vs. non-indexed issue is that the server now has to check the index first for the records matching the range and then fetch those rows from disk, whereas the non-indexed scan stays in memory, and memory is faster than disk. But I can't be too sure.
Now, for the query. Here's part of the doc. on JOINs:
The `conditional_expr` used with ON is any conditional expression of the form
that can be used in a WHERE clause. Generally, you should use the ON clause for
conditions that specify how to join tables, and the WHERE clause to restrict
which rows you want in the result set.
So yes, move the join conditions into the FROM clause. Also, you might be interested in the index hint syntax: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html
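For instance, the original query rewritten with the join conditions in ON clauses; note this sketch also adds the r2.item_id condition flagged in the other answer (without it, r2 is a Cartesian product), so verify it matches your intended semantics:

```sql
SELECT r.day,
       SUM(r.amount)  / (COUNT(DISTINCT r.item_id)  * COUNT(r.day))  AS current_avg_day,
       SUM(r2.amount) / (COUNT(DISTINCT r2.item_id) * COUNT(r2.day)) AS previous_avg_day
FROM client_location_item AS cla
INNER JOIN client_location AS cl ON cla.location_id = cl.location_id
INNER JOIN report_item_day AS r  ON r.item_id  = cla.item_id
INNER JOIN report_item_day AS r2 ON r2.item_id = cla.item_id
WHERE r.day  BETWEEN FROM_UNIXTIME(1293840000) AND FROM_UNIXTIME(1296518399)
  AND r2.day BETWEEN FROM_UNIXTIME(1291161600) AND FROM_UNIXTIME(1293839999)
  AND cl.location_code = 'LOCATION'
GROUP BY MONTH(r.day);
```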
And lastly, you could try using a view, but be wary of performance issues: http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker/
Good luck.
I'm joining two tables.
Table unique_nucleosome_re has about 600,000 records.
Another table has 20,000 records.
The strange thing is that the performance and the output from EXPLAIN differ depending on the condition in the WHERE clause.
When it was
WHERE n.chromosome = 'X'
it took about 3 minutes.
When it was
WHERE n.chromosome = '2L'
it took more than 30 minutes and the connection dropped.
SELECT n.name , t.transcript_start , n.start
FROM unique_nucleosome_re AS n
INNER JOIN tss_tata_range AS t
ON t.chromosome = n.chromosome
WHERE (t.transcript_start > n.end AND t.ts_minus_250 < n.start )
AND n.chromosome = 'X'
ORDER BY t.transcript_start
;
This is the answer from EXPLAIN.
when the WHERE is n.chromosome = 'X'
'1', 'SIMPLE', 'n', 'ALL', 'start_idx,end_idx,chromo_start', NULL, NULL, NULL, '606096', '48.42', 'Using where; Using join buffer'
when the WHERE is n.chromosome = '2L'
'1', 'SIMPLE', 'n', 'ref', 'start_idx,end_idx,chromo_start', 'chromo_start', '17', 'const', '68109', '100.00', 'Using where'
The number of records for X or 2L are almost the same.
I spent the last couple of days on this but couldn't figure it out. It may be a simple mistake I can't see, or it might be a bug.
Could you help me?
First, without seeing the existing index information: I would put an index on your tss_tata_range table on (chromosome, transcript_start) (at a minimum, on chromosome). I would also make sure there is an index on chromosome in your unique_nucleosome_re table. Then, since tss_tata_range appears to be your SHORT table, I would move it into the FIRST position of the query and force that order with STRAIGHT_JOIN...
SELECT STRAIGHT_JOIN
n.name,
t.transcript_start,
n.start
FROM
tss_tata_range t,
unique_nucleosome_re n
where
t.chromosome = 'X'
and t.chromosome = n.chromosome
and t.transcript_start > n.end
and t.ts_minus_250 < n.start
order by
t.transcript_start
I'd be interested in the performance too if it works well for you...
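The indexes suggested above could be declared as follows (the index names are made up; the table names follow the question):

```sql
-- Composite index so the join on chromosome and the range test
-- on transcript_start can both use the same index.
ALTER TABLE tss_tata_range
  ADD INDEX ix_chromo_tstart (chromosome, transcript_start);

-- Make sure the large table is also indexed on the join column.
ALTER TABLE unique_nucleosome_re
  ADD INDEX ix_chromosome (chromosome);
```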