I have the following tables (example)
t1 (20,000 rows, 60 columns, primary key t1_id)
t2 (40,000 rows, 8 columns, primary key t2_id)
t3 (50,000 rows, 3 columns, primary key t3_id)
t4 (30,000 rows, 4 columns, primary key t4_id)
SQL query:
SELECT COUNT(*) AS count FROM (t1)
JOIN t2 ON t1.t2_id = t2.t2_id
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id
I have created indexes on the columns involved in the joins (e.g. on t1.t2_id) and foreign keys where necessary. The query is slow (600 ms), and if I add WHERE clauses (e.g. WHERE t1.column10 = 1, where column10 has no index), it becomes much slower. The queries I run with SELECT * and LIMIT are fast, and I can't understand the COUNT behaviour. Any solution?
EDIT: EXPLAIN output added
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t4 index PRIMARY user_id 4 NULL 5259 Using index
1 SIMPLE t2 ref PRIMARY,t4_id t4_id 4 t4.t4_id 1 Using index
1 SIMPLE t1 ref t2_id t2_id 4 t2.t2_id 1 Using index
1 SIMPLE t3 ref PRIMARY PRIMARY 4 t2.t2_id 1 Using index
where user_id is a column of the t4 table
EDIT: I changed from InnoDB to MyISAM and got a speed increase, especially with WHERE clauses, but I still see times of 100-150 ms. The reason I want the count in my application is to show the user who is submitting a search form the number of results to expect, via AJAX. Maybe there is a better solution for this, for example a temporary table that is updated every hour?
The count query is faster simply because of an index-only scan, as stated in the query plan ("Using index"). The query you mention touches only indexed columns, so during execution there is no need to read the physical data: the whole query is answered from the indexes. When you add a clause on columns that are not indexed, or that are indexed in a way that prevents index usage, the server has to fetch the rows stored in the heap table by physical address, which is much slower.
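If you also need the WHERE t1.column10 = 1 filter, a minimal sketch of one option (the composite index and its name idx_col10_t2 are my assumption, not something from the question) is to cover both the filter column and the join column, so the filtered count can stay index-only:

-- Assumed composite index: covers the filter (column10) and the join key (t2_id)
ALTER TABLE t1 ADD INDEX idx_col10_t2 (column10, t2_id);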
EDIT:
Another important thing is that those are PKs, so they are UNIQUE. The optimizer chooses to perform an index range scan on the first index, and only checks whether the keys exist in the subsequent indexes (that's why the plan states there will be only one row returned for each).
EDIT2:
Thanks to J. Bruni: that is in fact a clustered index, so the above isn't the whole truth. There is probably a full scan on the first table, followed by three subsequent index accesses to confirm the FK existence.
COUNT iterates over the whole result set and does not depend on indexes alone. Use EXPLAIN ANALYZE on your query to check how it is executed.
SELECT with LIMIT does not iterate over the whole result set, hence it's faster.
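For example, prepending EXPLAIN to the question's query shows the chosen access plan without running the whole query:

EXPLAIN SELECT COUNT(*) AS count FROM t1
JOIN t2 ON t1.t2_id = t2.t2_id
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id;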
Regarding the COUNT(*) slow performance: are you using InnoDB engine? See:
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/
"SELECT COUNT(*)" is slow, even with where clause
The main information seems to be: "InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages."
So, one possible solution is to create a separate index and force its usage with USE INDEX in the SQL query. Look at this comment for a sample usage report:
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/comment-page-1/#comment-529049
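A minimal sketch of that approach, reusing the t2_id index name that appears in the question's EXPLAIN output (whether it helps depends on the data):

SELECT COUNT(*) AS count
FROM t1 USE INDEX (t2_id)
JOIN t2 ON t1.t2_id = t2.t2_id
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id;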
Regarding the WHERE issue, the query will perform better if you put the condition in the JOIN clause, like this:
SELECT COUNT(t1.t1_id) AS count FROM (t1)
JOIN t2 ON (t1.column10 = 1) AND (t1.t2_id = t2.t2_id)
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id
Related
I am executing queries where two tables authors and work_authors are being joined. I am using MySQL. authors and work_authors have a one-to-many relationship and authors contains around 7 million records and work_authors contains around 21 million records.
SELECT t2.author_id, t1.work_id, t2.name
FROM work_author AS t1,
authors AS t2
WHERE t1.author_id = t2.id AND
t2.name LIKE '%Tolkien%';
Firstly, with no index at all in the database, i.e. no primary key or other indexes, the query takes about 24 seconds on average on my machine. When I add an index, i.e. make the id column of authors the primary key, but add no index or primary key on the work_author table, the query takes about 64 seconds on average.
Secondly, when I also add a non-unique index on the author_id column of the work_author table, the query takes about 4 seconds on average.
How can this be the case? Why does the query take longer with an index on only one of the join attributes than with no index at all? And why, with indexes on both join keys, does it execute faster than the other two alternatives? Can anyone explain this?
No index can help LIKE '%Tolkien%'. However, ...
Consider adding a FULLTEXT index to authors.name and using
MATCH(name) AGAINST('+Tolkien' IN BOOLEAN MODE)
This will run a lot faster.
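A minimal sketch of that change (the index name ft_name is illustrative):

ALTER TABLE authors ADD FULLTEXT INDEX ft_name (name);

SELECT t2.author_id, t1.work_id, t2.name
FROM work_author AS t1
JOIN authors AS t2 ON t1.author_id = t2.id
WHERE MATCH(t2.name) AGAINST('+Tolkien' IN BOOLEAN MODE);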
PRIMARY KEYs are important.
Using InnoDB is important.
Do use the JOIN ... ON syntax instead of a comma join.
A many-to-many table is optimized by having two indexes as discussed here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
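A sketch of that two-index pattern, applied to a hypothetical version of the work_author mapping table (the column types and layout are assumptions, not from the question):

CREATE TABLE work_author (
    author_id INT UNSIGNED NOT NULL,
    work_id   INT UNSIGNED NOT NULL,
    PRIMARY KEY (author_id, work_id),   -- covers lookups by author
    INDEX (work_id, author_id)          -- covers lookups by work
) ENGINE=InnoDB;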
I have a pretty long INSERT query that inserts data from a SELECT query into a table. The problem is that the SELECT takes too long to execute. The table is MyISAM and the SELECT locks it, which affects other users who also use the table. I have found that the problem in the query is a join.
When I remove this part of the query, it takes less than a second to execute, but when I leave it in, the query takes more than 15 minutes:
LEFT JOIN enq_217 Pex_217
ON e.survey_panelId = Pex_217.survey_panelId
AND e.survey_respondentId = Pex_217.survey_respondentId
AND Pex_217.survey_respondentId != 0
db.table_1 contains 590,145 rows and e contains 4,703 rows.
Explain Output:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY e ALL survey_endTime,survey_type NULL NULL NULL 4703 Using where
1 PRIMARY Pex_217 ref survey_respondentId,idx_table_1 idx_table_1 8 e.survey_panelId,e.survey_respondentId 2 Using index
2 DEPENDENT SUBQUERY enq_11525_timing eq_ref code code 80 e.code 1
How can I edit this part of the query to be faster?
I suggest creating an index on the table db.table_1 covering the fields panelId and respondentId.
You want an index on the table. The best index for this logic is:
create index idx_table_1 on table_1(panelId, respondentId)
The order of these two columns in the index should not matter here, because both are used as equality conditions in the join.
You might want to include other columns in the index, depending on what the rest of the query is doing.
Note: a single index with both columns is different from two indexes with each column.
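To make that note concrete, a quick sketch (the index names are illustrative):

-- One composite index: supports lookups on both columns together
CREATE INDEX idx_panel_resp ON table_1 (panelId, respondentId);

-- Two single-column indexes: not equivalent; MySQL typically uses only one of them per table access
CREATE INDEX idx_panel ON table_1 (panelId);
CREATE INDEX idx_resp ON table_1 (respondentId);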
Why is it a LEFT join?
How many rows in Pex_217?
Run ANALYZE TABLE on each table used. (This sometimes helps MyISAM; rarely is needed for InnoDB.)
Since the 'real problem' seems to be that the query "holds up other users", switch to InnoDB.
Tips on conversion
The JOIN is not that bad (with the new index -- note Using index): 4703 rows scanned, then reach into the other table's index about 2 times each.
Perhaps the "Dependent subquery" is the costly part. Let's see that.
I'm trying to execute a select query over a fairly simple (but large) table and am getting over 10x slower performance when I don't join on a certain secondary table.
TableA is keyed on two columns, 'ID1' & 'ID2', and has a total of 10 numeric (int + dbl) columns.
TableB is keyed on 'ID1' and has a total of 2 numeric (int) columns.
SELECT
AVG(NULLIF(dollarValue, 0))
FROM
TableA
INNER JOIN
TableB
ON
TableA.ID1 = TableB.ID1
WHERE
TableA.ID2 = 5
AND
TableA.ID1 BETWEEN 15000 AND 20000
As soon as the join is removed, performance takes a major hit. The query above takes 0.016 seconds to run while the query below takes 0.216 seconds to run.
The end goal is to replace TableA's 'ID1' with TableB's 2nd (non-key) column and deprecate TableB.
SELECT
AVG(NULLIF(dollarValue, 0))
FROM
tableA
WHERE
ID2 = 5
AND
ID1 BETWEEN 15000 AND 20000
Both tables have indexes on their primary keys. The relationship between the two tables is One-to-Many. DB Engine is MyISAM.
Scenario 1 (fast):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE TableB range PRIMARY PRIMARY 4 NULL 498 Using where; Using index
1 SIMPLE TableA eq_ref PRIMARY PRIMARY 8 schm.TableA.ID1,const 1
Scenario 2 (slow):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE TableA range PRIMARY PRIMARY 8 NULL 288282 Using where
The row count and the lack of "Using index" in scenario 2 definitely stand out, but why would that be the case?
700 results from both queries -- same data.
Given your query, I'd say an index like this might be useful:
CREATE INDEX i ON tableA(ID2, ID1)
A possible reason why your first query is much faster is that you probably have only a few records in tableB, which makes the join predicate much more selective than the range predicate.
I suggest reading up on indexes. Knowing two or three details about them will help you easily tune your queries simply by choosing better indexes.
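As a further sketch (my assumption, not part of the answer above): extending that index with the aggregated column would let the no-join query be answered from the index alone:

-- Assumed covering index: filter columns first, then the aggregated value
CREATE INDEX i2 ON tableA (ID2, ID1, dollarValue);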
I have a table in MySQL (5.5.31) which has about 20M rows. The following query:
SELECT DISTINCT mytable.name name FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0 ORDER BY mytable.date_modified DESC LIMIT 0,21
is causing a full table scan: EXPLAIN reports type ALL with extra info "Using where; Using temporary; Using filesort". EXPLAIN results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable ALL NULL NULL NULL NULL 19001156 Using where; Using temporary; Using filesort
1 SIMPLE mytable_c eq_ref PRIMARY PRIMARY 108 mytable.id 1 Using index
Without the join explain looks like:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE mytable index NULL mytablemod 9 NULL 21 Using where; Using temporary
id_c is the primary key of mytable_c, and mytable_c has at most one row for every row in mytable. date_modified is indexed, but it looks like MySQL does not understand that. If I remove the DISTINCT clause, EXPLAIN uses the index and touches only 21 rows, just as expected. If I remove the join, it also does this. Is there any way to avoid the full table scan when the join is present? EXPLAIN shows MySQL knows it needs only one row from mytable_c and that it uses the primary key for it, but it still does a full scan on mytable.
The reason the DISTINCT is there is that the query is generated by an ORM, and there may be cases where JOINs produce multiple rows but the values of the SELECT fields are always unique (i.e. if the JOIN is against a multi-value link, only fields that are the same in every joined row appear in the SELECT).
These are just generic comments, not MySQL-specific ones.
To find all the possible name values from mytable a full scan of either the table or an index needs to happen. Possible options:
full table scan
full index scan of an index starting with deleted (take advantage of the filter)
full index scan of an index starting with name (only column of concern for output)
If there were an index on deleted, the server could find all the deleted = 0 index entries and then look up the corresponding name value from the table. But if deleted has low cardinality, or the statistics aren't there to say differently, it could be more expensive to do the double read of first the index and then the corresponding data row. In that case, just scan the table.
If there was an index on name, an index scan could be sufficient, but then the table needs to be checked for the filter. Again frequent hopping from index to table.
The join columns also need to be considered in a similar manner.
If you forget about the join part and have a multi-part index on the columns name, deleted, then an index scan would probably happen (see the sketch below).
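A sketch of that multi-part index (the name idx_name_deleted is illustrative):

-- Assumed index: name first (the output column), deleted second (the filter)
CREATE INDEX idx_name_deleted ON mytable (name, deleted);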
Update
To me, the DISTINCT and ORDER BY parts are a bit confusing: for a given name, which record's date_modified should be used for sorting? I think something like this would be a bit clearer:
SELECT mytable.name name --, MIN(mytable.date_modified)
FROM mytable
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0
GROUP BY mytable.name
ORDER BY MIN(mytable.date_modified) DESC LIMIT 0,21
Either way, once the ORDER BY comes into play, a full scan needs to be done to find the order. Without the ORDER BY, the first 21 rows found could suffice.
Why don't you try moving the condition mytable.deleted = 0 from the WHERE clause into the JOIN's ON clause? You can also try FORCE INDEX (mytablemod), as sketched below.
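A sketch of the FORCE INDEX variant (reusing the mytablemod index name from the question; whether it actually helps depends on the data):

SELECT DISTINCT mytable.name name
FROM mytable FORCE INDEX (mytablemod)
LEFT JOIN mytable_c ON mytable_c.id_c = mytable.id
WHERE mytable.deleted = 0
ORDER BY mytable.date_modified DESC
LIMIT 0, 21;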
I wanted to find all hourly records that have a successor in a table of about 5 million rows.
I tried :
SELECT DISTINCT (date_time)
FROM my_table
JOIN (SELECT DISTINCT (DATE_ADD( date_time, INTERVAL 1 HOUR)) date_offset
FROM my_table) offset_dates
ON date_time = date_offset
and
SELECT DISTINCT(date_time)
FROM my_table
WHERE date_time IN (SELECT DISTINCT(DATE_ADD(date_time, INTERVAL 1 HOUR))
FROM my_table)
The first one completes in a few seconds; the second hangs for hours.
I can understand that the former is better, but why such a huge performance gap?
-------- EDIT ---------------
Here are the EXPLAIN for both queries
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1710 Using temporary
1 PRIMARY my_table ref PRIMARY PRIMARY 8 offset_dates.date_offset 555 Using index
2 DERIVED my_table index NULL PRIMARY 13 NULL 5644204 Using index; Using temporary
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY my_table range NULL PRIMARY 8 NULL 9244 Using where; Using index for group-by
2 DEPENDENT SUBQUERY my_table index NULL PRIMARY 13 NULL 5129983 Using where; Using index; Using temporary
In general, a query using a join will perform better than an equivalent query using IN (...), because the former can take advantage of indexes while the latter can't; the entire IN list must be scanned for each row which might be returned.
(Do note that some database engines perform better than others in this case; for example, SQL Server can produce equivalent performance for both types of queries.)
You can see what the MySQL query optimizer intends to do with a given SELECT query by prepending EXPLAIN to the query and running it. This will give you, among other things, a count of rows the engine will have to examine for each step in a query; multiply these counts to get the overall number of rows the engine will have to visit, which can serve as a rough estimate of likely performance.
I would prefix both queries by explain, and then compare the difference in the access plans. You will probably find that the first query looks at far fewer rows than the second.
But my hunch is that the JOIN is applied earlier than the WHERE clause. So, in the WHERE version you are taking every record from my_table, applying an arithmetic function to it, and then sorting, because SELECT DISTINCT usually requires a sort and sometimes creates a temporary table in memory or on disk. The number of rows examined is probably the product of the sizes of the two tables.
But in the JOIN version, a lot of the rows that would be examined and sorted in the WHERE version are probably eliminated beforehand. You probably end up looking at far fewer rows, and the database probably takes easier measures to accomplish it.
But I think this post answers your question best: SQL fixed-value IN() vs. INNER JOIN performance
The IN clause is usually slow for huge tables. As far as I remember, for the second statement you posted, MySQL will simply loop through all rows of my_table (unless you have an index there), checking each row against the WHERE clause. In general, IN is treated as a set of OR clauses with all the set's elements in it.
That's why, I think, using the temporary table that is created behind the scenes for the JOIN query is faster.
Here are some helpful links about that:
MySQL Query IN() Clause Slow on Indexed Column
inner join and where in() clause performance?
http://explainextended.com/2009/08/18/passing-parameters-in-mysql-in-list-vs-temporary-table/
Another thing to consider is that with the IN style, very little future optimization is possible compared to the JOIN. With the join you can possibly add an index which, depending on the data set, might speed things up by 2, 5, or 10 times; see the sketch below. With the IN, it's going to run that query as written.
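For instance, a sketch of such an index (an assumption only; in the question's EXPLAIN output date_time already appears to lead the PRIMARY key, so this is purely illustrative):

CREATE INDEX idx_date_time ON my_table (date_time);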