I need some help optimizing this query:
select * from transaction where id < 7500001 order by id desc limit 16
When I run EXPLAIN on this, the type is "range" and rows is "7500000".
According to some online references, this means the query had to scan 7,500,000 rows to get the data.
Is there any way I can optimize this so it scans fewer rows to get the data? Also, id is the primary key column.
According to some online references, this means the query had to scan 7,500,000 rows to get the data
Not exactly. It is the approximate number of rows that could potentially be scanned (the optimizer cannot determine the exact number in many cases). But since you specified LIMIT, only the first 16 rows are actually read while the query executes.
PS: I assume the key used in EXPLAIN is the primary key on id?
I ran EXPLAIN with your query on an 8-million-row table:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE transaction range PRIMARY PRIMARY 8 NULL 4079100 Using where
The actual execution was fast (Execution Time: 00:00:00:044).
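One way to verify how many rows a query actually touched is to check the session handler counters around it; this is a rough sketch (counter names can vary slightly across MySQL versions):

FLUSH STATUS;  -- reset the session status counters
select * from transaction where id < 7500001 order by id desc limit 16;
SHOW SESSION STATUS LIKE 'Handler_read%';
-- Handler_read_last plus Handler_read_prev should total about 16,
-- confirming the LIMIT stopped the backward index scan early.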
I have a table with around 60,000 rows. I have these two queries that are drastically different in speed. Can you explain why?
SELECT COUNT(id) FROM table;
300ms - 58936 rows
Explain:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE table NULL index NULL table_id_index 8 NULL 29325 100.00 Using index
SELECT COUNT(id) FROM table WHERE dummy = 1;
50ms - 58936 rows
Explain:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE table NULL index NULL dummy_index 5 const 14662 100.00 Using index
It depends.
COUNT(id) may be slower than COUNT(*). The former checks id for being NOT NULL; the latter simply counts the rows. (If id is the PRIMARY KEY, then this is unlikely to make any measurable difference.)
The Optimizer may decide to scan the entire table rather than use an index.
The Optimizer may pick an irrelevant index if not forced to by the WHERE clause. In your example, any index with id can be used for the first, and any index with both dummy and id for the second.
If you run the same query twice, it may run much faster the second time due to caching. This can happen even for a 'similar' query. I suspect this is the "answer". I often see a speedup of 10x if the first run was from disk and the second found everything needed in cache (the buffer_pool).
To get more insight, do EXPLAIN SELECT ...
The optimal index for your second query is INDEX(dummy, id).
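A sketch of adding that index (the index name here is made up, and the table name needs backquotes because TABLE is a reserved word):

-- Composite index: the WHERE dummy = 1 filter and COUNT(id) can then
-- both be satisfied from the index alone ("Using index").
ALTER TABLE `table` ADD INDEX idx_dummy_id (dummy, id);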
I have a pretty long insert query that inserts data from a select query into a table. The problem is that the select query takes too long to execute. The table is MyISAM and the select locks the table, which affects other users who also use it. I have found that the problem with the query is a join.
When I remove this part of the query, it takes less than a second to execute, but when I leave it in, the query takes more than 15 minutes:
LEFT JOIN enq_217 Pex_217
ON e.survey_panelId = Pex_217.survey_panelId
AND e.survey_respondentId = Pex_217.survey_respondentId
AND Pex_217.survey_respondentId != 0
db.table_1 contains 590,145 rows and e contains 4,703 rows.
Explain Output:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY e ALL survey_endTime,survey_type NULL NULL NULL 4703 Using where
1 PRIMARY Pex_217 ref survey_respondentId,idx_table_1 idx_table_1 8 e.survey_panelId,e.survey_respondentId 2 Using index
2 DEPENDENT SUBQUERY enq_11525_timing eq_ref code code 80 e.code 1
How can I edit this part of the query to be faster?
I suggest creating an index on the table db.table_1 for the fields panelId and respondentId.
You want an index on the table. The best index for this logic is:
CREATE INDEX idx_table_1 ON table_1 (panelId, respondentId);
The order of these two columns in the index should not matter.
You might want to include other columns in the index, depending on what the rest of the query is doing.
Note: a single index with both columns is different from two indexes with each column.
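To illustrate that note (the index names here are made up):

-- One composite index: a single B-tree ordered by (panelId, respondentId);
-- the join can seek on both columns at once.
CREATE INDEX idx_panel_respondent ON table_1 (panelId, respondentId);

-- Two separate indexes: two B-trees; MySQL normally picks just one of them
-- (or attempts an index_merge), which is weaker for this two-column join.
CREATE INDEX idx_panel ON table_1 (panelId);
CREATE INDEX idx_respondent ON table_1 (respondentId);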
Why is it a LEFT join?
How many rows in Pex_217?
Run ANALYZE TABLE on each table used. (This sometimes helps MyISAM; rarely is needed for InnoDB.)
Since the 'real problem' seems to be that the query "holds up other users", switch to InnoDB.
Tips on conversion
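The conversion itself is a single statement; this is a sketch only (test it on a copy first, and review the conversion tips for engine-specific gotchas):

-- Rebuilds the table as InnoDB, which uses row-level locking
-- instead of MyISAM's table-level locks.
ALTER TABLE table_1 ENGINE=InnoDB;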
The JOIN is not that bad (with the new index -- note Using index): 4703 rows scanned, then reach into the other table's index about 2 times each.
Perhaps the "Dependent subquery" is the costly part. Let's see that.
I know there are similar questions on this but I've got a specific query / question around why this query
EXPLAIN SELECT DISTINCT RSubdomain FROM R_Subdomains WHERE EmploymentState IN (0,1) AND RPhone='7853932120'
gives me this EXPLAIN output
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE RSubdomains index NULL RSubdomain 767 NULL 3278 Using where
with an index on RSubdomain,
but if I add in a composite index on EmploymentState/RPhone
I get this output from explain
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE RSubdomains range EmploymentState EmploymentState 67 NULL 2 Using where; Using temporary
If I take away the DISTINCT on RSubdomain, the "Using temporary" drops out of the EXPLAIN output. But what I don't get is why, when I add in the composite key (while keeping the key on RSubdomain), the DISTINCT ends up using a temp table, and which index schema is better here? I see that the number of rows scanned with the combined key is far less, but the query is of type range and it's also slower.
Q: why ... does the distinct end up using a temp table?
MySQL is doing a range scan on the index (i.e. reading index blocks) to locate the rows that satisfy the predicates (WHERE clause). Then MySQL has to look up the value of the RSubdomain column from the underlying table (it's not available in the index). To eliminate duplicates, MySQL needs to scan the values of RSubdomain that were retrieved. The "Using temporary" indicates that MySQL is materializing a resultset, which is processed in a subsequent step. (Likely, that's the set of RSubdomain values that was retrieved; given the DISTINCT, it's likely that MySQL is actually creating a temporary table with RSubdomain as a primary or unique key, and only inserting non-duplicate values.)
In the first case, it looks like the rows are being retrieved in order by RSubdomain (likely, that's the first column in the cluster key). That means MySQL needn't compare all the RSubdomain values against each other; it only needs to check whether the last retrieved value matches the currently retrieved value to determine whether the value can be "skipped."
Q: which index schema is better here?
The optimum index for your query is likely a covering index:
... ON R_Subdomains (RPhone, EmploymentState, RSubdomain)
But with only 3278 rows, you aren't likely to see any performance difference.
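Spelled out as a statement (the index name is made up):

CREATE INDEX ix_rphone_empstate_rsubdomain
    ON R_Subdomains (RPhone, EmploymentState, RSubdomain);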
FOLLOWUP
Unfortunately, MySQL does not provide the type of instrumentation provided in other RDBMS (like the Oracle event 10046 sql trace, which gives actual timings for resources and waits.)
Since MySQL is choosing to use the index when it is available, that is probably the most efficient plan. For the best efficiency, I'd perform an OPTIMIZE TABLE operation (for InnoDB tables and MyISAM tables with dynamic format, if there have been a significant number of DML changes, especially DELETEs and UPDATEs that modify the length of the row...) At the very least, it would ensure that the index statistics are up to date.
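For example:

-- Defragments the table data and rebuilds index statistics;
-- note that it locks the table while it runs.
OPTIMIZE TABLE R_Subdomains;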
You might want to compare the plan of an equivalent statement that does a GROUP BY instead of a DISTINCT, i.e.
SELECT r.RSubdomain
FROM R_Subdomains r
WHERE r.EmploymentState IN (0,1)
AND r.RPhone='7853932120'
GROUP BY r.RSubdomain
For optimum performance, I'd go with a covering index with RPhone as the leading column; that's based on an assumption about the cardinality of the RPhone column (close to unique values), as opposed to only a few distinct values in the EmploymentState column. That covering index will give the best performance, i.e. the quickest elimination of rows that need to be examined.
But again, with only a couple thousand rows, it's going to be hard to see any performance difference. If the query was examining millions of rows, that's when you'd likely see a difference, and the key to good performance will be limiting the number of rows that need to be inspected.
Using MySQL (5.1.66) explain says it will scan just 72 rows while the "slow log" reports the whole table was scanned (Rows_examined: 5476845)
How is this possible? I can't figure out what's wrong with the query
`name` is a string column with a unique index, and
`date` is just a regular int column with an index.
This is the EXPLAIN
EXPLAIN SELECT *
FROM table
WHERE name LIKE 'The%Query%'
ORDER BY date DESC
LIMIT 3;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table index name date 4 NULL 72 Using where
Output from Slow Log
# Query_time: 5.545731 Lock_time: 0.000083 Rows_sent: 1 Rows_examined: 5476845
SET timestamp=1360007079;
SELECT * FROM table WHERE name LIKE 'The%Query%' ORDER BY date DESC LIMIT 3;
The rows value that is returned from an EXPLAIN is an estimate of the number of rows that have to be examined to find results that match your query.
If you look, you will see that the key being chosen for the query execution is date, which is probably being picked because of your ORDER BY clause. Because the key being used in the query is unrelated to your WHERE clause, that's probably why the estimate is getting messed up. Even though your WHERE clause is doing a LIKE on the name column, the optimizer may decide not to use an index at all:
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) (source)
In short, the optimizer is choosing not to use the name key, even though it is the one that would most effectively limit the rows to be returned. You can try forcing the index to see if that improves the performance.
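For example, a sketch with an index hint (this assumes the index on name is literally named name; check SHOW INDEX FROM `table` for the real name):

SELECT *
FROM `table` FORCE INDEX (name)  -- bypass the optimizer's choice of the date index
WHERE name LIKE 'The%Query%'
ORDER BY date DESC
LIMIT 3;

With the name index forced, MySQL needs a filesort for the ORDER BY, but if the LIKE matches only a handful of rows that is far cheaper than scanning millions of rows in date order.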
I have a relatively large table (5,208,387 rows, 400 MB data / 670 MB index),
and all the columns I use to search with are indexed.
name and type are VARCHAR(255) columns with BTREE indexes,
and sdate is an INTEGER column containing timestamps.
I fail to understand a couple of things.
First, this query is very slow (5 sec):
SELECT *
FROM `mytable`
WHERE `name` LIKE 'hello%my%big%text%thing%'
AND `type` LIKE '%'
ORDER BY `sdate` DESC LIMIT 3
EXPLAIN for the above:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable range name name 257 NULL 5191 Using where
while this one is very fast (5msec):
SELECT *
FROM `mytable`
WHERE `name` LIKE 'hello.my%big%text%thing%'
AND `type` LIKE '%'
ORDER BY `sdate` DESC LIMIT 3
EXPLAIN for the above:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable range name name 257 NULL 204 Using where
The difference in the number of rows scanned makes sense because of the indexes,
but having 5k indexed rows take 5 seconds seems way too much.
Also, ordering by name instead of sdate makes the queries very fast, but I need to order by the timestamp.
The second thing I do not understand: before adding the last column to the index, the db had an index of 1.4 GB; now, after running an OPTIMIZE/REPAIR, the size is just 670 MB.
The problem is that only the portion before the first % can take advantage of the index; the rest of the LIKE pattern has to be checked against every row that matches hello% or hello.my%, without the help of an index. Also, ordering by a column other than the one in the index used probably requires a second pass, or at least a scan rather than reading an already-sorted index. Options for better performance (which can be implemented independently of each other) are:
Using a full-text index on the name column and a MATCH() AGAINST() search rather than LIKE with %'s.
Adding sdate to a combined index (name, sdate) could very well speed up sorting.
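A sketch of both suggestions (the index names are made up; FULLTEXT requires MyISAM or InnoDB on MySQL 5.6+):

-- Option 1: full-text search instead of LIKE with wildcards.
ALTER TABLE `mytable` ADD FULLTEXT INDEX ft_name (`name`);

SELECT *
FROM `mytable`
WHERE MATCH(`name`) AGAINST('hello my big text thing')
ORDER BY `sdate` DESC
LIMIT 3;

-- Option 2: combined index, per the second suggestion above.
ALTER TABLE `mytable` ADD INDEX name_sdate (`name`, `sdate`);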