How to optimize this complex query?

How can I optimize this query? For now it's executing in 0.0100 seconds.
SELECT comments.comment_content, comments.comment_votes, comments.comment_date,
users.user_login, users.user_level, users.user_avatar_source,
groups.group_safename
FROM comments
LEFT JOIN links ON comment_link_id=link_id
LEFT JOIN users ON comment_user_id=user_id
LEFT JOIN groups ON comment_group_id=link_group_id
WHERE comment_status='published' AND link_status='published'
ORDER BY comment_id DESC
[EXPLAIN output and the index listings for the comments, users, and groups tables omitted]

Sub-twenty-millisecond query times aren't usually considered slow. As some folks have mentioned in the comments, you'll need to redo your optimization when your tables get larger, because MySQL's optimizer (and the optimizers in other RDBMSs) makes decisions based on index size.
I recommend you always qualify your column names in JOIN clauses with table names or aliases. For example, you will gain clarity and maintainability by using a style like this:
FROM comments AS c
LEFT JOIN links AS L ON c.comment_link_id=L.link_id
LEFT JOIN users AS u ON c.comment_user_id=u.user_id
LEFT JOIN groups AS g ON c.comment_group_id=g.link_group_id
This query selects a fairly broad subset of your tables, so it will run slower the larger your tables are. That's inevitable unless you can narrow the subset somehow.
Are the columns you're using for JOIN ... ON operations all declared NOT NULL? They should be.
Looking at how you are using the groups table: You're joining on link_group_id and retrieving group_safename. So, try a compound covering index on (link_group_id,group_safename). At a minimum, index link_group_id.
The users table: You've already got an index on user_id. When your tables get bigger a compound covering index on (user_id, user_login, user_level, user_avatar_source) may help. But that's a low-priority thing to try.
The links table: You're using link_status and link_id. Your LEFT JOIN for this table should be a plain inner JOIN, because one of its columns shows up in your WHERE clause. If link_status can be NOT NULL in your application, make sure it is declared that way. Then try a compound index on (link_status, link_id).
The comments table: You have no index on comment_status as far as I can see. Try adding one.
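Putting those suggestions together, a rough sketch of the DDL might look like this (the index names are made up, and the users index is the low-priority one mentioned above; verify each index with EXPLAIN before keeping it):
-- `groups` needs backticks because GROUPS is a reserved word in MySQL 8.0+
ALTER TABLE `groups` ADD INDEX idx_groups_safename (link_group_id, group_safename);
ALTER TABLE links    ADD INDEX idx_links_status_id (link_status, link_id);
ALTER TABLE comments ADD INDEX idx_comments_status (comment_status);
-- Low priority: a covering index for the user columns this query reads
ALTER TABLE users    ADD INDEX idx_users_cover (user_id, user_login, user_level, user_avatar_source);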
Then put a bunch of data in your tables, run OPTIMIZE LOCAL TABLE for each table, then try your query with EXPLAIN again.


Does a JOIN query that selects on the joined table benefit from an index?

This is a pretty simple question; however, I'm having trouble finding a straight answer for it on the internet.
Let's say I have two tables:
article - id, some other properties
localisation - id, articleId, locale, title, content
and 1 article has many localisations, there's an index on locale, and we want to filter by locale.
My question is: does querying article and joining on localisation with a WHERE clause, like this:
SELECT * FROM article AS a JOIN localisation AS l ON a.id = l.articleId WHERE l.locale = 5;
benefit from the locale index in the same way as querying the reverse:
SELECT * FROM localisation AS l JOIN article AS a ON l.articleId = a.id WHERE l.locale = 5;
Or do I need to do the latter to make proper use of my index? Assuming the cardinality is correct of course.
By default, the order you specify tables in your query isn't necessarily the order they will be joined.
Inner join is commutative. That is, A JOIN B and B JOIN A produce the same result.
MySQL's optimizer knows this fact, and it can reorder the tables if it estimates it would be a less expensive query if it joined the tables in the opposite order to that which you listed them in your query. You can specify an optimizer hint to prevent it from reordering tables, but by default this behavior is enabled.
Using EXPLAIN will tell you which table order the optimizer prefers to use for a given query. There may be edge cases where the optimizer chooses something you didn't expect. Some of the optimizer's estimate depends on the frequency of data values in your table, so you should test in your own environment.
P.S.: I expect this query would probably benefit from a compound index on the pair of columns: (locale, articleId).
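For example, a minimal sketch of that index (the index name is arbitrary):
-- Covers both the WHERE filter and the join column
CREATE INDEX idx_localisation_locale_article ON localisation (locale, articleId);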

Should I create 2 indexes for the same column to speed up a join?

I am new to database indexes and I've just read about what an index is, the differences between clustered and non-clustered indexes, and what a composite index is.
So for an inner join query like this:
SELECT table1.columnA
FROM table1
INNER JOIN table2
ON table1.columnA = table2.columnA;
In order to speed up the join, should I create 2 indexes, one for table1.columnA and the other for table2.columnA, or just create 1 index for table1 or table2?
Is one good enough? I don't get it. For example, if I select some data from table2 first and join on columnA based on that result, then I am looping through the results from table2 one by one, so an index on table2.columnA is totally useless here, because I don't need to look anything up in table2 anymore. What I need then is an index on table1.columnA.
And vice versa, I need an index on table2.columnA if I select some results from table1 first and then want to join on columnA.
Well, I don't know what "select xxxx first, then join based on ..." looks like in reality, but that scenario just came to mind. It would be much appreciated if someone could also give a simple example.
One index is sufficient, but the question is which one?
It depends on how the MySQL optimizer decides to order the tables in the join.
For an inner join, the results are the same for table1 INNER JOIN table2 versus table2 INNER JOIN table1, so the optimizer may choose to change the order. It is not constrained to join the tables in the order you specified them in your query.
The difference from an indexing perspective is whether it will first loop over rows of table1, and do lookups to find matching rows in table2, or vice-versa: loop over rows of table2 and do lookups to find rows in table1.
MySQL does joins as "nested loops". It's as if you had written code in your favorite language like this:
foreach row in table1 {
    look up rows in table2 matching table1.column_name
}
This lookup will make use of the index in table2. An index in table1 is not relevant to this example, since your query is scanning every row of table1 anyway.
How can you tell which table order is used? You can use EXPLAIN. It will show you a row for each table reference in the query, and it will present them in the join order.
Keep in mind the presence of an index in either table may influence the optimizer's choice of how to order the tables. It will try to pick the table order that results in the least expensive query.
So maybe it doesn't matter which table you add the index to, because whichever one you put the index on will become the second table in the join order, since that makes the lookup more efficient. Use EXPLAIN to find out.
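As a sketch of that workflow, using the tables from the question (the index name is made up, and you'd repeat the experiment with the index on table1 instead):
-- Index the join column on one side, then let EXPLAIN show the chosen join order
CREATE INDEX idx_table2_columnA ON table2 (columnA);

EXPLAIN
SELECT table1.columnA
FROM table1
INNER JOIN table2 ON table1.columnA = table2.columnA;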
90% of the time in a properly designed relational database, one of the two columns you join on is a primary key, and so it already has a clustered index (in InnoDB, the primary key is the clustered index).
So as long as you're in that case, you don't need to do anything at all. The main reason to add additional non-clustered indexes is when you're also filtering the join with a WHERE clause at the end of your statement; then you need to make sure both the join columns and the filtered columns are covered by a suitable index together (i.e. in the correct order, etc).
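For example (purely hypothetical, since the original query has no WHERE clause): if you also filtered on a column such as table2.status, a composite index covering both the filter and the join column would be the one to add:
-- status is a hypothetical filter column, not part of the original question
CREATE INDEX idx_table2_status_columnA ON table2 (status, columnA);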

Join vs subquery to count nested objects

Let's say my model contains 2 tables: persons and addresses. One person can have 0, 1 or more addresses. I'm trying to execute a query that lists all persons and includes the number of addresses each one has. Here are the 2 queries I have to achieve that:
SELECT
persons.*,
count(addresses.id) AS number_of_addresses
FROM `persons`
LEFT JOIN addresses ON persons.id = addresses.person_id
GROUP BY persons.id
and
SELECT
persons.*,
(SELECT COUNT(*)
FROM addresses
WHERE addresses.person_id = persons.id) AS number_of_addresses
FROM `persons`
And I was wondering if one is better than the other in term of performance.
The way to determine performance characteristics is to actually run the queries and see which is better.
If you have no indexes, then the first is probably better. If you have an index on addresses(person_id), then the second is probably better.
The reason is a little complicated. The basic reason is that GROUP BY (in MySQL) uses a sort, and sorts are O(n * log(n)) in complexity. So the time to do a sort grows faster than the data (not much faster, but a bit faster). The consequence is that a bunch of small aggregations, one per person, is faster than a single aggregation by person over all the data.
That is conceptual. In fact, MySQL will use the index for the correlated subquery, so it is often faster than the overall group by, which does not make use of an index.
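If you don't already have it, the index that makes the correlated-subquery form fast is simply the one on the foreign key column (the index name is arbitrary):
CREATE INDEX idx_addresses_person_id ON addresses (person_id);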
I think the first query is optimal, and more optimization can be provided by changing the table structure. For example, define both the person_id and address_id fields (order is important) as the primary key of the addresses table to make the join faster.
MySQL's table storage structure is an index-organized table (clustered index), so the primary key index is much faster than a normal secondary index, especially in join operations.
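A sketch of that suggestion, assuming the addresses table uses InnoDB and its single-column primary key is named id as in the question:
-- Re-cluster the table so address rows are stored grouped by person
ALTER TABLE addresses
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (person_id, id),
  ADD UNIQUE KEY uk_addresses_id (id);  -- keeps id usable on its own (required if id is AUTO_INCREMENT)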

Which is faster: subquery with filter and then join, or join and then filter, in MySQL?

I have two tables: one is a user table, the other is a specialty table.
Fields in user table: userid, username, userLocation
Fields in specialty table: userid, userSpecialty
Now I want to join these two tables. Please let me know which approach will be better:
select * from ( select * from user where userLocation = 'value') u
inner join specialty s on u.userid = s.userid;
or
select * from user u inner join specialty s on u.userid = s.userid where userLocation = 'value';
Is it good practice to minimize the number of records wherever we can, or will the SQL optimizer do that automatically?
For the best shot at optimal performance, give preference to the pattern in the second query, the one without the inline view.
For earlier versions of MySQL (5.5 and earlier), the first query requires MySQL to run the inline view query and materialize a derived table (u). Once that is done, the outer query runs against the derived table, and that table won't be indexed. For large sets, that can be a significant performance hit. For small sets, the performance impact for a single query isn't noticeable.
With the second query, the optimizer isn't forced to create and populate a derived table, so there's potential for better performance.
The existence (or non-existence) of suitable indexes is going to have a much bigger impact on performance. Retrieving all columns, including columns that aren't needed by the query (SELECT *), also has an impact. Specifying only the columns and expressions that are actually needed will give better performance, especially if a covering index is available to avoid lookups into the underlying data pages of the table.
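For instance, a sketch of both points (the column list and index name are illustrative, not taken from the question):
-- A covering index for the filter, the join column, and the selected column on the user table
CREATE INDEX idx_user_location ON user (userLocation, userid, username);

-- Select only the columns you actually need
SELECT u.userid, u.username, s.userSpecialty
FROM user AS u
INNER JOIN specialty AS s ON u.userid = s.userid
WHERE u.userLocation = 'value';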

Create composite index across tables

I am doing a JOIN and would like to speed it up by creating a composite index across the two joined tables:
SELECT * FROM catalog_product_entity AS p
INNER JOIN catalog_product_flat_1 AS cpf
ON cpf.entity_id = p.entity_id
in a way similar to this:
create index foo on catalog_product_flat_1 (entity_id,catalog_product_entity.entity_id);
The approach above generates a syntax error. What is the correct way to create a composite index that uses cross-table columns?
When joining two tables, the server takes a record from one side of the join and looks up the matching rows in the table on the other side. An index across two tables therefore wouldn't help with this; an index is only useful on the side of the join that is actually being looked up.
Consequently, an index spanning multiple tables is not possible.
The query planner takes this into account and resolves the join condition in the way that uses the most efficient lookup. In your example, the query planner might first check for an index on cpf.entity_id and p.entity_id; if there is no index, it will scan the smaller table and try other optimizations. MySQL's EXPLAIN can provide further insight.
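In practice, that means indexing the join column on each table separately and letting EXPLAIN confirm which lookup the planner chooses (the index name below is made up, and entity_id may already be the primary key of catalog_product_entity):
-- One single-table index per side of the join is all that is possible
CREATE INDEX idx_cpf_entity_id ON catalog_product_flat_1 (entity_id);

EXPLAIN
SELECT * FROM catalog_product_entity AS p
INNER JOIN catalog_product_flat_1 AS cpf ON cpf.entity_id = p.entity_id;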