I'm trying to learn the explain statement in MySQL but ran into a wall.
For my experiment, I created two tables (each having 10 rows) and ran explain over a simple join. Naturally, no indexes were used and 10*10 = 100 rows were scanned (I've added the output in images because the very long output of EXPLAIN was being wrapped on itself. The code is also in this pastebin):
I then added primary keys and indexes and reissued the explain command:
But as you can see, the users table is still being fully scanned by MySQL, as if there was no primary key. What is going wrong?
This is a bit long for a comment.
Basically, your tables are too small. You cannot get reasonable performance indications on such small data -- the whole query only needs to load two data pages into memory. A nested loop join requires 100 comparisons. By comparison, loading the indexes and doing the binary searches is probably about the same amount of effort, if not more.
If you want to get a feel for explain, then use tables with a few tens of thousands of rows.
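For example (a minimal sketch, assuming your users table has an auto-increment id and a name column), you can double a 10-row table repeatedly until the optimizer's choices become visible:

INSERT INTO users (name) SELECT name FROM users;
-- run the statement above about 12 times: 10 rows doubled 12 times is ~40,000
ANALYZE TABLE users;  -- refresh the statistics the optimizer uses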
You seem to be asking about EXPLAIN, INDEXing, and optimizing particular SELECTs.
For this:
select u.name
from users as u
join accounts as a on u.id = a.user_id
where a.amount > 1000;
the optimizer will pick between users and accounts for which table to look at first. Then it will repeatedly reach into the other table.
Since you say a.amount > ... but nothing about u, the optimizer is very likely to pick a first.
If a.amount > 1000 is selective enough (less than, say, 20% of the rows) and there is INDEX(amount), it will use that index. Else it will do a table scan of a.
To reach into u, it needs some index starting with id. Keep in mind that a PRIMARY KEY is an index.
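As a concrete sketch (the index name is hypothetical; column names are taken from the query above):

ALTER TABLE accounts ADD INDEX idx_amount (amount);  -- used if the filter is selective
-- users needs an index starting with id; the PRIMARY KEY already provides that:
ALTER TABLE users ADD PRIMARY KEY (id);              -- only if it is not already defined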
This, and many more basics, are covered in my index cookbook.
See also myxlpain for a discussion of EXPLAIN.
Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.
EXPLAIN FORMAT=JSON SELECT... is also somewhat cryptic, but it does have more details than a regular EXPLAIN.
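For example (both statements are standard MySQL; \G is the mysql client's vertical-output terminator, which helps with wide EXPLAIN rows):

SHOW CREATE TABLE accounts\G
EXPLAIN FORMAT=JSON
SELECT u.name
FROM users AS u
JOIN accounts AS a ON u.id = a.user_id
WHERE a.amount > 1000\G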
Well,
as your main filter uses the '>' comparison operator, MySQL does a full table scan, because the condition may or may not match most of the rows.
As you join the 'accounts' table on the 'user_id' column, the 'user_id' index shows up in possible_keys, but it isn't used, because of the full table scan.
Overview:
I have a system that builds query statements, some of which must join certain tables to others based on parameters passed into the system. When running performance tests on the generated queries, I noticed that some of them were doing FULL TABLE SCANS, which, from what I've read, is not good for large tables.
What I'm trying to do:
1 - Remove the full table scans
2 - Speed up the Query
3 - Find out if there is a more efficient query I can have the system build instead
The Query:
SELECT a.p_id_one, b.p_id_two, b.fk_id_one, c.fk_id_two, d.fk_id_two,
d.id_three, d.fk_id_one
FROM ATable a
LEFT JOIN BTable b ON a.p_id_one = b.fk_id_one
LEFT JOIN CTable c ON b.p_id_two = c.fk_id_two
LEFT JOIN DTable d ON b.p_id_two = d.fk_id_two
WHERE a.p_id_one = 1234567890
The Explain
Query Time
Showing rows 0 - 10 (11 total, Query took 0.0016 seconds.)
Current issues:
1 - Query time on my system (phpMyAdmin) is between 0.0013 and 0.0017 seconds.
What have I done to fix?
The full table scans, or 'ALL' type queries, are being run on the tables 'BTable' and 'DTable', so I've tried using FORCE INDEX on the appropriate ids.
Using FORCE INDEX removes the full table scans, but it doesn't speed up performance.
I double checked my fk_constraints and index relationships to ensure I'm not missing anything. So far everything checks out.
2 - Advisor shows multiple warnings, a few of which relate back to the full table scans and the indexes.
Question(s):
Assume all indexes are available and created
1 - Is there a better way to perform this query?
2 - How many joins are too many joins?
3 - Could the joins be the problem?
4 - Does the issue rest within the WHERE clause?
5 - What optimize technique/tool could I have missed?
6 - How can I get this query to run in between 0.0001 and 0.0008 seconds?
If images and visuals are needed to help clarify my situation please do ask in a comment below. I appreciate any and all assistance.
Thank you =)
"p_id_one" does not tell us much. Is this an auto_increment? Real column names sometimes gives important clues of cardinality and intent. As Willem said, "there must be more to this issue" and "what is the overall problem".
LEFT -- do you need it? It prevents certain forms of optimizations; remove it if the 'right' table row is not optional.
WHERE a.p_id_one = 1234567890 needs INDEX(p_id_one). Is that the PRIMARY KEY already? In that case, an extra INDEX is not needed. (Please provide SHOW CREATE TABLE.)
Are those really the columns/expressions you are SELECTing? It can make a difference -- especially when suggesting a "covering index" as an optimization.
Please provide the output from EXPLAIN SELECT ... (That is not the discussion you did provide.) That output would help with clues of 1:many, cardinality, etc.
If these are FOREIGN KEYs, you already have indexes on b.fk_id_one, c.fk_id_two, d.fk_id_two; so there is nothing more to do there.
1.6ms is an excellent time for a query involving 4 tables. Don't plan on speeding it up significantly. At that rate you could handle hundreds of connections running thousands of similar queries per second. Do you need more than that?
Are you using InnoDB? That is better at concurrent access.
Your example does not seem to have any full table scans; please provide an example that does.
ALL on a 10-row table is nothing to worry about; on a million-row table it is a big deal. Will your tables grow significantly? Keep this in mind when worrying about ALL: a full table scan is sometimes faster than using the 'perfect' index. The optimizer opts for the scan when the estimated number of matching rows is more than about 20% of the table. A table scan is efficient because it reads straight through the table, even if it skips 80% of the rows. Using an index is more complex: the index is scanned, but for each row found in the index, a lookup into the data is needed to find the actual row. If you see ALL when you don't think you should, the index is probably not selective enough to beat the scan. Don't worry.
Don't use FORCE INDEX -- although it may help the query with today's values, it may hurt tomorrow's query.
How is it possible to have a good plan in EXPLAIN like the one below and still have a slow query? Few rows, using index, no filesort.
The query is running in 9s. The main table has around 500k rows.
When I had 250k rows in that table, the query was running in < 1s.
Suggestions plz?
Query notes: (1) the commented-out fields can be enabled according to user choice; (2) without FORCE INDEX I got 14s; (3) I use SQL_NO_CACHE to prevent cached results:
SELECT SQL_NO_CACHE
p.property_id
, lct.loc_city_name_pt
, lc.loc_community_name_pt
, lc.loc_community_image_num_default
, lc.loc_community_gmap_longitude
, lc.loc_community_gmap_latitude
FROM property as p FORCE INDEX (ix_order_by_perf)
INNER JOIN loc_community lc
ON lc.loc_community_id = p.property_loc_community_id
INNER JOIN loc_city lct FORCE INDEX (loc_city_id)
ON lct.loc_city_id = lc.loc_community_loc_city_id
INNER JOIN property_attribute pa
ON pa.property_attribute_property_id = p.property_id
WHERE p.property_published = 1
AND (p.property_property_type_id = '1' AND p.property_property_type_sale_id = '1')
AND p.property_property_housing_id = '1'
-- AND p.property_loc_community_id = '36'
-- AND p.property_bedroom_id = '2'
-- AND p.property_price >= '50000' AND p.property_price <= '150000'
-- AND lct.loc_city_id = '1'
-- AND p.property_loc_subcommunity_id IN(7,8,12)
ORDER BY
p.property_featured DESC
, p.property_ranking_date DESC
, p.property_ranking_total DESC
LIMIT 0, 15
Query Profile
The resultset always outputs 15 rows, but the property and property_attribute tables each have around 500k rows.
Thanks all,
Armando Miani
This really seems to be an oddity in EXPLAIN in this case. It doesn't occur on MySQL 4.x, but it does on MySQL 5.x.
What MySQL is really showing you is that MySQL is trying to use the forced index ix_order_by_perf for sorting the rows, and it's showing you 15 rows because you have LIMIT 15.
However, the WHERE clause is still scanning all 500K rows since it can't utilize an index for the criteria in your WHERE clause. If it were able to use the index for finding the required rows, you would see the forced index listed in the 'possible_keys' field.
You can prove this by keeping the FORCE INDEX clause and removing the ORDER BY clause. You'll see that MySQL now won't use any indexes, even the one you're forcing (because the index doesn't work for this purpose).
Try adding property_property_type_id, property_property_type_sale_id, property_property_housing_id, and any other columns that you refer to in your WHERE clause to the beginning of the index.
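A hedged sketch of that reshaped index, assuming ix_order_by_perf currently covers the three ORDER BY columns (the new index name here is hypothetical):

ALTER TABLE property ADD INDEX ix_filter_then_order (
    property_published,
    property_property_type_id,
    property_property_type_sale_id,
    property_property_housing_id,
    property_featured,
    property_ranking_date,
    property_ranking_total
);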
There comes a point when a query that was optimized around one model of the data is no longer valid for a given need.
A plan can look great, but even when the filters in your WHERE clause match the index definitions, that doesn't mean the server isn't reading many rows.
What you have to analyze is how selective your indexes are. For instance, with an index on (name, family_name) in a person table, performance is going to be poor if everybody has the same name and family name. An index becomes a real trap that drags performance down when it fails to describe a segment of your data set precisely enough.
Based on the output of your EXPLAIN of the query, here are my initial thoughts:
This portion of your query (rewritten to exclude the unneeded parentheses):
p.property_published = 1
AND p.property_property_type_id = '1'
AND p.property_property_type_sale_id = '1'
AND p.property_property_housing_id = '1'
puts so many conditions on the property table that it's unlikely any index you have can be used. Unless you have a single index covering all four of those attributes, you're forcing a full table scan just to find the rows that meet those conditions (though if you have an index on one of the attributes, the optimizer could possibly use that).
First, I'd add the following index (have not checked this for syntax errors):
CREATE INDEX property_published_type_sale_housing_idx
ON property (property_published,
property_property_type_id,
property_property_type_sale_id,
property_property_housing_id );
Then I'd re-run your EXPLAIN to see if you hit the index now. (Take off the FORCE INDEX on that part of the query).
Also, even given this issue, it's possible the slowdown is memory related. That is, you may have enough memory to process the table with a smaller number of rows, but when the table gets larger MySQL can't process the entire query in memory and is forced to start using disk. This would explain why there's a sudden drop-off in performance.
If that's the case, then two things might help:
Adding more memory (and tuning the MySQL config file to take advantage of it) so that the number of rows that can be processed at once is larger. This is at best a temporary solution.
Tuning the indexes (as suggested above) so that the number of rows MySQL needs to process is lower and it can be more precise in picking the rows it selects for processing.
Besides a good plan, you need enough resources to run the query.
Check the buffer sizes and other critical parameters in your config.
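For example, you can inspect a few of the usual suspects from the client (these are standard MySQL variables; appropriate values depend on your RAM and workload):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'key_buffer_size';               -- matters for MyISAM tables
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';  -- disk reads caused by cache misses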
And your query is?
I have this MySQL query and I am not sure what the implications of indexing all the fields in the query would be. I mean, is it OK to index all the fields in the CASE expression, the JOIN, and the WHERE clause? Are there any performance implications of indexing fields?
SELECT roots.id as root_id, root_words.*,
CASE
WHEN root_words.title LIKE '%text%' THEN 1
WHEN root_words.unsigned_title LIKE '%normalised_text%' THEN 2
WHEN unsigned_source LIKE '%normalised_text%' THEN 3
WHEN roots.root LIKE '%text%' THEN 4
END as priorities
FROM roots INNER JOIN root_words ON roots.id=root_words.root_id
WHERE (root_words.unsigned_title LIKE '%normalised_text%') OR (root_words.title LIKE '%text%')
OR (unsigned_source LIKE '%normalised_text%') OR (roots.root LIKE '%text%') ORDER BY priorities
Also, how can I further improve the speed of the query above?
Thanks!
You index columns in tables, not queries.
None of the search criteria you've specified will be able to make use of indexes (since the search terms begin with a wild card).
You should make sure that the id column is indexed, to speed the JOIN. (Presumably, it's already indexed as a PRIMARY KEY in one table and a FOREIGN KEY in the other).
To speed up this query you will need to use full text search. Adding indexes will not speed up this particular query and will cost you time on INSERTs, UPDATEs, and DELETEs.
Caveat: Indexes speed up retrieval time but cause inserts and updates to run slower.
To answer the implications of indexing every field: there is a performance hit whenever the indexed data is modified, whether through inserts, updates, or deletes, because the server needs to maintain the index. It's a balance between how often the data is read and how often it is modified.
In this specific query, the only index that could possibly help would be in your JOIN clause, on the fields roots.id and root_words.root_id.
None of the checks in your WHERE clause could be indexed, because of the leading '%'. This causes SQL to scan every row in these tables for a matching value.
If you are able to remove the leading '%', you would then benefit from indexes on these fields... if not, you should look into implementing full-text search; but be warned, this isn't trivial.
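A minimal sketch of the full-text route, assuming a MyISAM table or MySQL 5.6+ with InnoDB (note that MATCH ... AGAINST does word-based matching, which is not identical to a %substring% search):

ALTER TABLE root_words ADD FULLTEXT INDEX ft_titles (title, unsigned_title);

SELECT * FROM root_words
WHERE MATCH(title, unsigned_title) AGAINST('text' IN NATURAL LANGUAGE MODE);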
Indexing won't help when used in conjunction with LIKE '%something%'.
It's like looking for words in a dictionary that have ae in them somewhere. The dictionary (or Index in this case) is organised based on the first letter of the word, then the second letter, etc. It has no mechanism to put all the words with ae in them close together. You still end up reading the whole dictionary from beginning to end.
Indexing the fields used in the CASE clause will likely not help you. Indexing helps by making it easy to find records in a table. The CASE clause is about processing the records you have found, not finding them in the first place.
Optimisers can also struggle with optimising multiple unrelated OR conditions such as yours. The optimiser is trying to narrow down the amount of effort to complete your query, but that's hard to do when unrelated conditions could make a record acceptable.
All in all, your query would benefit from indexes on root_words(root_id) and/or roots(id), but not much else.
If you were to index additional fields though, the two main costs are:
- Increased write time (insert, update or delete) due to additional indexes to write to
- Increased space taken up on the disk
I currently have a summary table to keep track of my users' post counts, and I run SELECTs on that table to sort them by counts, like WHERE count > 10, for example. Now I know having an index on columns used in WHERE clauses speeds things up, but since these fields will also be updated quite often, would indexing provide better or worse performance?
If you have a query like
SELECT count(*) as rowcount
FROM table1
GROUP BY name
Then you cannot put an index on count, you need to put an index on the group by field instead.
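A minimal sketch (the table and column names are taken from the query above):

ALTER TABLE table1 ADD INDEX idx_name (name);
-- the GROUP BY can now be resolved by scanning the index in order
EXPLAIN SELECT COUNT(*) AS rowcount FROM table1 GROUP BY name;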
If you have a field named count, then putting an index on it may speed up the following query; it may also make no difference at all:
SELECT id, `count`
FROM table1
WHERE `count` > 10
Whether an index on count will speed up the query really depends on what percentage of the rows satisfy the where clause. If it's more than 30%, MySQL (or any SQL for that matter) will refuse to use an index.
It will just stubbornly insist on doing a full table scan. (i.e. read all rows)
This is because using an index requires reading 2 files (1 index file and then the real table file with the actual data).
If you select a large percentage of rows, reading the extra index file is not worth it and just reading all the rows in order will be faster.
If only a few rows pass the test, using an index will speed up this query a lot.
Know your data
Using explain select will tell you what indexes MySQL has available and which one it picked and (kind of/sort of in a complicated kind of way) why.
See: http://dev.mysql.com/doc/refman/5.0/en/explain.html
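For example (a sketch against the hypothetical table1 above; the backticks mirror the query's quoting of count):

ALTER TABLE table1 ADD INDEX idx_count (`count`);
EXPLAIN SELECT id, `count` FROM table1 WHERE `count` > 10;
-- type=range with key=idx_count means the index was chosen;
-- type=ALL means the optimizer judged the full scan cheaper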
Indexes in general provide better read performance at the cost of slightly worse insert, update and delete performance. Usually the tradeoff is worth it depending on the width of the index and the number of indexes that already exist on the table. In your case, I would bet that the overall performance (reading and writing) will still be substantially better with the index than without but you would need to run tests to know for sure.
It will improve read performance and worsen write performance. If the tables are MyISAM and you have a lot of people posting in a short amount of time you could run into issues where MySQL is waiting for locks, eventually causing a crash.
There's no way of really knowing without trying it. A lot depends on the ratio of reads to writes, the storage engine, disk throughput, various MySQL tuning parameters, etc. You'd have to set up a simulation that resembles production and run it before and after.
I think it's unlikely that write performance will be a serious issue after adding the index.
But note that the index won't be used anyway if it is not selective enough - if more than for example 10% of your users have count > 10 the fastest query plan might be to not use the index and just scan the entire table.
Why does defining an index on MySQL tables increase performance for queries that have a JOIN?
If you are interested in a specific topic in a book, you go to the back of the book and find it alphabetically in the index. The index tells you the page number(s) where the topic is discussed. Then you jump straight to the pages that you are interested in. Much, much faster than reading the whole book.
It's the same in a database. The index means that you can jump to the joining rows instead of scanning every row in the table looking for a match.
Have a look at how a clustered index works (http://msdn.microsoft.com/en-us/library/ms177443.aspx). You can have one of those per table.
This article explains how a non-clustered index works (http://msdn.microsoft.com/en-us/library/ms177484.aspx). You can have as many of them as you want.
Both of these articles are about Microsoft SQL Server, but the theory behind indexes is the same across all relational database management systems.
Indexes do have an associated cost. Every time an insert or update is performed on the table, the affected index(es) may have to be updated as well. And of course indexes take up space -- but that is not really an issue for most of us. So you need to balance the performance benefits of faster joins and filtering against the costs of inserts and updates.
As a guide, you will generally want an index that matches each of the columns included in a join or where clause:
SELECT
*
FROM
Customer
WHERE
RegistrationDate > #registrationDate
AND RegistrationCountry = #registrationCountry;
So an index on the Customer table that includes the RegistrationDate and RegistrationCountry columns would speed up this query. Since we are using a ">" in our query, this would be a good candidate for a clustered index (the first article shows that a clustered index physically arranges the data in index order so range queries can very quickly isolate a range of the index).
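A hedged sketch of such an index (MySQL has no user-selectable clustered index apart from the InnoDB primary key, so this is a plain secondary index; putting the equality column first lets the range condition on RegistrationDate use the rest of the index):

CREATE INDEX ix_customer_registration
ON Customer (RegistrationCountry, RegistrationDate);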
SELECT
*
FROM
Customer c
INNER JOIN Order o
ON o.CustomerID = c.CustomerID
AND o.OrderType = #orderType
Here, we would want an index on the Customer table that contains the CustomerID column. And we'd want an index on the Order table that contains the CustomerID and the OrderType columns. Then both sides of the join will not need to do a table scan.
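As a sketch (the index name is hypothetical; Order needs quoting in MySQL because ORDER is a reserved word):

-- CustomerID is usually already the PRIMARY KEY of Customer
CREATE INDEX ix_order_customer_type ON `Order` (CustomerID, OrderType);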
Typically there will only be a small number of ways that data is queried from a table, so you won't get index overload. Lots of indexes is sometimes a sign that your tables have mixed concerns and could be normalized.
You might want to read up on the basics of database indexes. Indexes are basically used to organize data.
I have found that sometimes it can be significantly faster to replace the JOIN query with two smaller queries, then join them in PHP or whatever language is calling MySQL. So try both and time them to see which is better for the particular situation, but bear in mind that the "fastest" solution may change as database size increases.
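As a sketch of that split, reusing the hypothetical users/accounts tables from the first question:

-- step 1: collect the matching ids in the application
SELECT user_id FROM accounts WHERE amount > 1000;
-- step 2: feed the collected ids back in a second query
SELECT id, name FROM users WHERE id IN (3, 17, 42);  -- list built by the calling code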