I have the following query and plan:
SELECT data.*
FROM data
WHERE channel_id = 1
AND timestamp >= IFNULL((
SELECT UNIX_TIMESTAMP(DATE_ADD(FROM_UNIXTIME(MAX(timestamp) / 1000, "%Y-%m-%d"), INTERVAL 1 day)) * 1000
FROM aggregate
WHERE type = '3' AND aggregate.channel_id = data.channel_id
), 0)
AND timestamp < UNIX_TIMESTAMP(DATE_FORMAT(NOW(), "%Y-%m-%d")) * 1000
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
'1' 'PRIMARY' 'data' NULL 'ref' 'data_unique IDX_ADF3F36372F5A1AA' 'IDX_ADF3F36372F5A1AA' '5' 'const' '860512' '11.11' 'Using where'
'2' 'DEPENDENT SUBQUERY' 'aggregate' NULL 'ref' 'aggregate_unique IDX_B77949FF72F5A1AA' 'aggregate_unique' '7' 'volkszaehler.data.channel_id const' '1473' '100.00' 'Using index'
The data table has a couple of million rows; all tables are indexed:
data: `channel_id`, `timestamp`
aggregate: `type`, `channel_id`, `timestamp`
The query becomes fast when aggregate.channel_id = data.channel_id is replaced with the literal channel_id value from the outer query; the dependent subquery then becomes a simple subquery.
However, I'd rather not do this, so that the query can operate on more than one channel_id at a time.
Why doesn't MySQL (5.7 homebrew) recognize that this subquery really is not dependent (or is it?) and how could it be optimized?
I've already verified that removing the IFNULL function or pushing it inwards does not cure the problem. I was also unsuccessful in pushing the subquery down another level, as suggested in "Can I force MySQL to perform subquery first?", since the data.channel_id reference is then no longer known.
The following SQL throws a timeout exception in the production (PRD) environment.
SELECT
COUNT(*) totalCount,
SUM(IF(t.RESULT_FLAG = 'success', 1, 0)) successCount,
SUM(IF(b.ERROR_CODE = 'Y140', 1, 0)) unrecognizedCount,
SUM(IF(b.ERROR_CODE LIKE 'Y%' OR b.ERROR_CODE = 'E008', 1, 0)) connectCall,
SUM(IF(b.ERROR_CODE = 'N004', 1, 0)) hangupUnconnect,
SUM(IF(b.ERROR_CODE = 'Y001', 1, 0)) hangupConnect
FROM
lbl_his b LEFT JOIN lbl_error_code t ON b.TASK_ID = t.TASK_ID AND t.CODE = b.ERROR_CODE
WHERE
b.TASK_ID = "5f460e4ffa99f51697ad4ae3"
AND b.CREATE_TIME BETWEEN "2020-07-01 00:00:00" AND "2020-10-28 00:00:00"
The table lbl_his is very large: about 20,000,000 rows, occupying 20 GB on disk.
The table lbl_error_code is small: only 305 rows.
The single-column indexes of table lbl_his:
TASK_ID
UPDATE_TIME
CREATE_TIME
RECORD_ID
The composite indexes of table lbl_his:
TASK_ID, ERROR_CODE, UPDATE_TIME
TASK_ID, CREATE_TIME
No indexes have been created for table lbl_error_code.
I ran EXPLAIN SELECT and found that the query uses the index on lbl_his.TASK_ID and the primary key of lbl_error_code.
How can I avoid the timeout?
For an index solution on lbl_his, try putting a non-clustered index on
firstly the things you filter on by exact match
then the things you filter on as ranges (or inexact matches)
e.g., the initial part of the index should be TASK_ID then CREATE_TIME. Putting these first is very important as it means the engine can do one seek to get the data.
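Then add any other columns the query uses, in this case ERROR_CODE, as trailing key columns (some engines, such as SQL Server, also allow them as included columns; either way works). This makes your index a covering index, so the query can be answered from the index without reading the lbl_his rows themselves.
Therefore your final new non-clustered index on lbl_his should be (TASK_ID, CREATE_TIME, ERROR_CODE).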
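As a minimal sketch of that suggestion in MySQL syntax (the index name idx_task_create_error is hypothetical; the table and column names come from the question):

```sql
-- Hypothetical index name. Equality column first, range column second,
-- then the remaining column the query reads from lbl_his.
ALTER TABLE lbl_his
  ADD INDEX idx_task_create_error (TASK_ID, CREATE_TIME, ERROR_CODE);
```

The question already lists an index on (TASK_ID, CREATE_TIME); the only difference here is the trailing ERROR_CODE, which is what lets the index cover every lbl_his column the query touches. Building an index on a table of this size can take a while, so it is probably best tried outside peak hours.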
My problem is this query:
(SELECT
ID1,ID2,ID3,ID4,ID5,ID10,ID11,ID13,ID14,ID454,ID453,
TIME,TEMP_ID,'ID_AUTO',PREDAJCA,VYTVORIL,MAIL,TEMP_ID_HASH,ID_SEND
FROM `load_send_calc`
WHERE `TEMP_ID` LIKE '$find%'
AND ACTIVE = 1 AND TEMP_ID > 0)
UNION ALL
(SELECT
ID1,ID2,ID3,ID4,ID5,ID10,ID11,ID13,ID14,ID454,ID453,TIME,'',
ID_AUTO,'','','','',''
FROM `temp`
WHERE `ID_AUTO` LIKE '$find%'
AND `ID_AUTO` NOT IN (SELECT TEMP_ID
FROM `load_send_calc`
WHERE `load_send_calc`.ACTIVE = 1)
)
ORDER BY TIME DESC LIMIT $limitFrom,$limitTo;
There are 18000 records in table load_send_calc and 3000 in table temp. The query itself takes more than 2 minutes to execute. Is there any way to reduce this time?
I already tried putting an ORDER BY into each subquery, but it didn't help significantly. I am really desperate, so I would appreciate any kind of help.
EDIT:
Here is EXPLAIN result :
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY load_send_calc ALL NULL NULL NULL NULL 18394 Using where
2 UNION temp ALL NULL NULL NULL NULL 1918 Using where
3 DEPENDENT SUBQUERY load_send_calc ALL NULL NULL NULL NULL 18394 Using where
NULL UNION RESULT <union1,2> ALL NULL NULL NULL NULL NULL Using filesort
Thanks for adding your EXPLAIN output - it tells us a lot. The query isn't using any index at all, which is very bad for performance. A very simple optimisation would be to add indexes on the fields that are used in the join and in the WHERE clauses. In your case those fields would be:
load_send_calc.temp_id
load_send_calc.active
temp.id_auto
In addition to these, you have an unnecessary AND TEMP_ID > 0, since you are already limiting on the same field with WHERE TEMP_ID LIKE '$find%'
3 things to speed it up:
INDEX(active, temp_id) -- significantly better than two separate indexes
IN ( SELECT ... ) performs poorly, especially in old versions of MySQL. Turn it into a JOIN.
Add a LIMIT to each SELECT. For example:
( SELECT ... ORDER BY ... LIMIT 80 )
UNION ALL
( SELECT ... ORDER BY ... LIMIT 80 )
ORDER BY ... LIMIT 70, 10;
Each inner LIMIT is the maximum number of rows the outer query could need: the outer offset plus the outer limit (70 + 10 = 80 in this example).
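The first two points could look roughly like the sketch below. The index name idx_active_temp_id and the aliases t and l are hypothetical; everything else is taken from the question. It also assumes load_send_calc.TEMP_ID is never NULL, because NOT IN and an anti-join behave differently when the subquery can return NULL.

```sql
-- Hypothetical index name; one composite index instead of two separate ones.
ALTER TABLE load_send_calc
  ADD INDEX idx_active_temp_id (ACTIVE, TEMP_ID);

-- Second branch of the UNION rewritten as an anti-join:
-- keep only temp rows that have no active match in load_send_calc.
SELECT t.ID1, t.ID2, t.ID3, t.ID4, t.ID5, t.ID10, t.ID11, t.ID13, t.ID14,
       t.ID454, t.ID453, t.TIME, '', t.ID_AUTO, '', '', '', '', ''
FROM `temp` AS t
LEFT JOIN `load_send_calc` AS l
       ON l.TEMP_ID = t.ID_AUTO
      AND l.ACTIVE = 1
WHERE t.ID_AUTO LIKE '$find%'
  AND l.TEMP_ID IS NULL;
```

Because only unmatched rows survive the IS NULL filter, each temp row appears at most once even if load_send_calc holds several active rows with the same TEMP_ID.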
This is very simple but quite confusing to me:
SELECT `start`, `stop` FROM loadtime
WHERE utilisateur_id = '202931999'
AND `type` = 'stat'
AND `stop` !=0
ORDER BY id DESC
LIMIT 0,1
12 sec
SELECT `start`, `stop` FROM loadtime
WHERE utilisateur_id = '202931999'
AND `type` = 'stat'
AND `stop` !=0
ORDER BY id DESC
LIMIT 0,2 -- or any larger limit
0.07 sec
EXPLAIN says, for LIMIT 0,1:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE loadtime index utilisateur_id_2,utilisateur_id,type PRIMARY 4 NULL 10089 Using where
and for LIMIT 0,x:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE loadtime ref utilisateur_id_2,utilisateur_id,type utilisateur_id 62 const 12103 Using index condition; Using where; Using filesort
So the first query does not use the utilisateur_id index; it scans the PRIMARY key instead.
The server is on MySQL 5.5.32.
The problem is simple: for some reason (and there can be many) the MySQL server decides that in the LIMIT 0,1 case it is a better idea (i.e. it expects less time and computation) to walk the PRIMARY key in ORDER BY id order, which is the full index scan shown as type index in the EXPLAIN, and stop at the first matching row, instead of using the utilisateur_id index and sorting the result.
Since this is a prediction, it can be wrong, and using the other index can work better (as in the LIMIT 0,x case). Since you noticed the performance difference, you can force the query to use the index you think is best, as follows:
SELECT `start`, `stop` FROM loadtime FORCE INDEX (utilisateur_id)
WHERE utilisateur_id = '202931999'
AND `type` = 'stat'
AND `stop` !=0
ORDER BY id DESC
LIMIT 0,1
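An alternative to an index hint, sketched here under the assumption that adding an index to loadtime is acceptable, is a composite index that serves both the equality filters and the ORDER BY. The index name idx_user_type_id is hypothetical; the column names come from the question.

```sql
-- Hypothetical index name. Equality columns first, then the ORDER BY column,
-- so rows for one user and type can be read already sorted by id.
ALTER TABLE loadtime
  ADD INDEX idx_user_type_id (utilisateur_id, `type`, id);

SELECT `start`, `stop`
FROM loadtime
WHERE utilisateur_id = '202931999'
  AND `type` = 'stat'
  AND `stop` != 0
ORDER BY id DESC
LIMIT 0, 1;
```

With such an index the server can read the matching entries backwards by id and stop at the first row whose stop is non-zero, so neither a filesort nor a scan of the whole primary key should be needed. Whether the optimizer actually picks it is worth confirming with EXPLAIN.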
I am using MySQL 5.1 on a Windows Server 2008 (with 4GB RAM) and have the following configuration:
I have 2 MyISAM tables. One is in 1 database (DB1) and has 14 columns, which are mostly varchar. This table has about 5,000,000 rows and is the DB1.games table below. It has a primary key on GameNumber (int(10)).
The other table is DB2.gameposition and consists of the columns GameNumber (links to DB1.games) and PositionCode (int(10)). This table has about 400,000,000 rows and there is an index IX_PositionCode on PositionCode.
These 2 databases are on the same server.
I want to run a query on DB2.gameposition to find a particular PositionCode, and have the results sorted by the linked DB1.games.Yr field (smallint(6), representing a year). I only introduced this sorting recently. There is an index on the Yr field in DB1.games.
In my stored procedure, I perform the following:
CREATE TEMPORARY TABLE tblGameNumbers(GameNumber INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO tblGameNumbers(GameNumber)
SELECT DISTINCT gp.GameNumber
FROM DB2.gameposition gp
WHERE PositionCode = var_PositionCode LIMIT 1000;
I fetch only 1000 rows to keep it quick, and then join the temporary table to the DB1.games table.
To generate an EXPLAIN from that, I took out the temporary table (which I use in the stored procedure) and rewrote it as the inner subquery below:
EXPLAIN
SELECT *
FROM DB1.games g
INNER JOIN (SELECT DISTINCT gp.GameNumber
FROM DB2.gameposition gp
WHERE PositionCode = 669312116 LIMIT 1000
) B ON g.GameNumber = B.GameNumber
ORDER BY g.Yr DESC
LIMIT 0,28
Running the EXPLAIN above, I see the following:
1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 1000, 'Using temporary; Using filesort'
1, 'PRIMARY', 'g', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'B.GameNumber', 1, ''
2, 'DERIVED', 'gp', 'ref', 'IX_PositionCode', 'IX_PositionCode', '4', '', 1889846, 'Using temporary'
The query used to be almost instant before I introduced the ORDER BY clause. Now it is sometimes quick (depending on the PositionCode), but at other times it can take up to 10 seconds to return the rows. Unfortunately, I am not very proficient at interpreting the EXPLAIN output or at making the query faster.
Any help would be greatly appreciated!
Thanks in advance,
Tim
Without the ORDER BY, your LIMIT means the first 28 results are returned and then the query stops. With the ORDER BY, all results need to be retrieved so that they can be sorted before the first 28 are returned.
The EXPLAIN shows what MySQL is doing:
sort 5000000 games records by yr
for each games record from sorted list
get the games record by primary key (to get all the columns)
read gamepositions by position code
if it does not match gamenumber, discard it
when 1000 matches found, stop reading
end read
end for
Try this instead:
select distinct ... from gameposition gp
inner join games g on g.gamenumber = gp.gamenumber
where gp.positioncode = ...
order by g.yr limit ...
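Filled in with the literal values from the question's EXPLAIN statement, the suggestion might look like the sketch below. The aliases and the choice of g.* as the select list are assumptions; the answer leaves the column list open.

```sql
-- Sketch only: joins directly instead of going through a derived table.
SELECT DISTINCT g.*
FROM DB2.gameposition AS gp
INNER JOIN DB1.games AS g
        ON g.GameNumber = gp.GameNumber
WHERE gp.PositionCode = 669312116
ORDER BY g.Yr DESC
LIMIT 0, 28;
```

Note that this is not strictly equivalent to the original: there is no longer a cap of 1000 game numbers before the sort, so every game matching the PositionCode takes part in the ordering. Running EXPLAIN on it would show whether that trade-off pays off for the common position codes.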
I'm joining two tables.
Table unique_nucleosome_re has about 600,000 records.
Another table has 20,000 records.
The strange thing is that the performance and the EXPLAIN output differ depending on the condition in the WHERE clause.
When it was
WHERE n.chromosome = 'X'
it took about 3 minutes.
When it was
WHERE n.chromosome = '2L'
it took more than 30 minutes and the connection was lost.
SELECT n.name , t.transcript_start , n.start
FROM unique_nucleosome_re AS n
INNER JOIN tss_tata_range AS t
ON t.chromosome = n.chromosome
WHERE (t.transcript_start > n.end AND t.ts_minus_250 < n.start )
AND n.chromosome = 'X'
ORDER BY t.transcript_start
;
This is the output from EXPLAIN.
when the WHERE is n.chromosome = 'X'
'1', 'SIMPLE', 'n', 'ALL', 'start_idx,end_idx,chromo_start', NULL, NULL, NULL, '606096', '48.42', 'Using where; Using join buffer'
when the WHERE is n.chromosome = '2L'
'1', 'SIMPLE', 'n', 'ref', 'start_idx,end_idx,chromo_start', 'chromo_start', '17', 'const', '68109', '100.00', 'Using where'
The number of records for X and for 2L is almost the same.
I have spent the last couple of days on this but couldn't figure it out. It may be a simple mistake I can't see, or it might be a bug.
Could you help me?
First, without seeing any index information, I would put an index on tss_tata_range covering chromosome and transcript_start (at a minimum, on chromosome). I would also assume there is an index on chromosome in your unique_nucleosome_re table. Then, since tss_tata_range appears to be your short table, I would move it into the first position of the query and use the STRAIGHT_JOIN keyword:
SELECT STRAIGHT_JOIN
n.name,
t.transcript_start,
n.start
FROM
tss_tata_range t,
unique_nucleosome_re n
where
t.chromosome = 'X'
and t.chromosome = n.chromosome
and t.transcript_start > n.end
and t.ts_minus_250 < n.start
order by
t.transcript_start
I'd be interested in the performance too if it works well for you...
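The index mentioned at the start of this answer could be created roughly as follows. The index name idx_chromosome_tstart is hypothetical; the table and column names are those used in the question.

```sql
-- Hypothetical index name. chromosome serves the equality filter and the join;
-- transcript_start follows it so rows per chromosome are stored in sort order.
ALTER TABLE tss_tata_range
  ADD INDEX idx_chromosome_tstart (chromosome, transcript_start);
```

The question's EXPLAIN output already lists an index named chromo_start among the possible keys on unique_nucleosome_re, so that side may not need a new index. Re-running EXPLAIN for both 'X' and '2L' after the change should show whether the plan is now the same for both values.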