Create composite index across tables - mysql

I am doing a JOIN and would like to speed it up by creating a composite index between the joining tables:
SELECT * FROM catalog_product_entity AS p
INNER JOIN catalog_product_flat_1 AS cpf
ON cpf.entity_id = p.entity_id;
in a way similar to this:
create index foo on catalog_product_flat_1 (entity_id,catalog_product_entity.entity_id);
The approach above generates a syntax error. What is the correct way to create a composite index that uses cross-table columns?

When joining two tables, the server has to look up the information of one record on one side of the join in the table of the other side. Therefore, an index across two tables does not help in this regard. The index is only useful on that side of the join that is actually looked up.
Consequently, an index spanning multiple tables is not possible.
The query planner takes this into account and resolves the join condition in the way that allows the most efficient lookup. In your example, the query planner might first check for an index on cpf.entity_id and on p.entity_id; if there is no index, it will search the smaller table and try other optimizations. MySQL's EXPLAIN can provide further insight.

Related

How can I speed up the left join in my query using indexes?

I am new to SQL. At the moment I am experiencing some slower MySQL queries. I think I need to improve my indexes but not sure how.
drop temporary table if exists temp ;
CREATE TEMPORARY TABLE temp
(index idx_a (EXTRACT_DATE, project_id, SERVICE_NAME) )
select distinct DATE(c.EXTRACT_DATETIME) as EXTRACT_DATE,p.project_id, p.project_name, c.CLUSTER_NAME, c.SERVICE_NAME,
UPPER(CONCAT(SUBSTRING_INDEX(c.ENV_NAME, '-', 1),'-',c.CLUSTER_NAME)) as CLUSTER_ID
from p
left join c
on p.project_id = c.project_id ;
The short answer is that you need indexes at least to optimize the lookups done by the JOIN. The EXPLAIN shows that both tables you are joining are doing a full table scan, then joining them the hard way, using "block nested loop", which indicates it is not using an index.
It would help to at least create an index on c.project_id.
ALTER TABLE c ADD INDEX (project_id);
This would mean there is still a table-scan to read the p table (estimated 5720 rows), but at least when it needs to find the related rows in c, it only reads the rows it needs, without doing a table-scan of 287K rows for each row of p.
The query you posted in an earlier question had another condition:
where DAYNAME(c.EXTRACT_DATETIME) = 'Friday' ;
I don't know why you haven't included this condition in the new question you posted.
If this is still a condition you need to handle, this could help optimize the query further. MySQL 5.7 (which you said in the other question you are using) supports virtual columns, defined for an expression, and you can index virtual columns.
ALTER TABLE c
-- a generated column needs an explicit data type
ADD COLUMN isFriday TINYINT(1) AS (DAYNAME(EXTRACT_DATETIME) = 'Friday'),
ADD INDEX (isFriday);
Then if you search on the new isFriday column, or even if you search on the same expression used for the virtual column definition, it will use the index.
So what you really need is an index on c that uses both columns, one for the join, and then for the additional condition.
ALTER TABLE c
-- a generated column needs an explicit data type
ADD COLUMN isFriday TINYINT(1) AS (DAYNAME(EXTRACT_DATETIME) = 'Friday'),
ADD INDEX (project_id, isFriday);
You aren’t filtering on anything other than the outer join column. This leads me to expect that most of the rows in both tables are going to need reading. In order to do this only once, you may be best off using a hash join rather than a nested loop and index. A hash join will allow both tables to be read completely once rather than the back and forth approach of a nested loop which will likely mean the same pages read each time a row is looked up.
In order to use hash joins, you need to be running MySQL 8.0.18 or later. Using the latest available stable release is recommended.

MYSQL search query optimization from two many-to-many tables

I have three tables.
tbl_post for a table of posts. (post_idx, post_created, post_title, ...)
tbl_mention for a table of mentions. (mention_idx, mention_name, mention_img, ...)
tbl_post_mention for a unique many-to-many relation between the two tables. (post_idx, mention_idx)
For example,
PostA can have MentionA and MentionB.
PostB can have MentionA and MentionC.
PostC cannot have MentionC and MentionC.
tbl_post has about a million rows, tbl_mention has fewer than a hundred rows, and tbl_post_mention has a couple of million rows. All three tables are heavily loaded with foreign keys, unique indices, etc.
I am trying to make two separate search queries.
Search for post ids with all the given mention ids[AND condition]
Search for post ids with any of the given mention ids[OR condition]
Then join with tbl_post and tbl_mention to populate with meaningful data, order the results, and return the top n. In the end, I hope to have a list of n posts with all the data required for my service to display on the front end.
Here are the respective simpler queries
SELECT post_idx
FROM
(SELECT post_idx, count(*) as c
FROM tbl_post_mention
WHERE mention_idx in (1,95)
GROUP BY post_idx) AS A
WHERE c >= 2;
The problem with this query is that it is already inefficient before the joins and ordering. This process alone takes 0.2 seconds.
SELECT DISTINCT post_idx
FROM tbl_post_mention
WHERE mention_idx in (1,95);
This is a simple index range scan, but because of the IN statement, the query becomes expensive again once you start joining it with other tables.
I tried more complex and "clever" queries and tried indexing different sets of columns, to no avail. Is there special syntax that I could use in this case? Maybe a clever trick? Partitioning? Or am I missing some fundamental concept here... :(
Send help.
The query you want is this:
SELECT post_idx
FROM tbl_post_mention
WHERE mention_idx in (1,95)
GROUP BY post_idx
HAVING COUNT(*) >= 2
The HAVING clause does your post-GROUP BY filtering.
The index that will help you is this.
CREATE INDEX mentionsdex ON tbl_post_mention (mention_idx, post_idx);
It covers your query by allowing rapid lookup by mention_idx then grouping by post_idx.
Often so-called join tables with two columns -- like your tbl_post_mention -- work most efficiently when they have a pair of indexes with the columns in opposite orders.
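The two searches above (all of the given mentions, any of the given mentions) can be sanity-checked on a handful of rows. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, so it checks the query logic and says nothing about MySQL's plan or timings. The sample rows and the helper names posts_with_all and posts_with_any are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; only the query logic is checked here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_post_mention (
        post_idx    INTEGER NOT NULL,
        mention_idx INTEGER NOT NULL,
        PRIMARY KEY (post_idx, mention_idx)
    );
    -- Second index with the columns in the opposite order.
    CREATE INDEX mentionsdex ON tbl_post_mention (mention_idx, post_idx);
    -- Hypothetical sample rows: post 10 has both mentions, 11 and 12 have one each.
    INSERT INTO tbl_post_mention (post_idx, mention_idx) VALUES
        (10, 1), (10, 95), (11, 1), (12, 95), (13, 7);
""")


def posts_with_all(mention_ids):
    """AND search: posts that carry every one of the given mention ids."""
    marks = ",".join("?" * len(mention_ids))
    sql = (
        "SELECT post_idx FROM tbl_post_mention "
        f"WHERE mention_idx IN ({marks}) "
        "GROUP BY post_idx HAVING COUNT(*) >= ? ORDER BY post_idx"
    )
    # COUNT(*) works because (post_idx, mention_idx) pairs are unique.
    return [row[0] for row in conn.execute(sql, (*mention_ids, len(mention_ids)))]


def posts_with_any(mention_ids):
    """OR search: posts that carry at least one of the given mention ids."""
    marks = ",".join("?" * len(mention_ids))
    sql = (
        "SELECT DISTINCT post_idx FROM tbl_post_mention "
        f"WHERE mention_idx IN ({marks}) ORDER BY post_idx"
    )
    return [row[0] for row in conn.execute(sql, tuple(mention_ids))]


all_hits = posts_with_all([1, 95])
any_hits = posts_with_any([1, 95])
print(all_hits, any_hits)
```

Passing the number of ids as the HAVING threshold keeps the AND search correct when the list of mentions grows beyond two.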

Need some clarification on indexes (WHERE, JOIN)

We are facing performance issues in some reports that work on millions of rows. I tried optimizing the SQL queries, but that only cut the execution time in half.
The next step is to analyse and modify or add some indexes, so I have some questions:
1- The SQL queries contain a lot of joins: do I have to create an index for each foreign key?
2- Imagine the request SELECT * FROM A LEFT JOIN B on a.b_id = b.id where a.attribute2 = 'someValue', and we have an index on table A based on b_id and attribute2: does my request use this index for the WHERE part? (I know that if both conditions were in the WHERE clause, the index would be used.)
3- If an index is based on columns C1, C2 and C3, and I decide to add an index based on C2, do I need to remove C2 from the first index?
Thanks for your time
You can use EXPLAIN on the query to see what MySQL will do when executing it. This helps a LOT when trying to figure out why it's slow.
JOIN-ing happens one table at a time, and the order is determined by MySQL analyzing the query and trying to find the fastest order. You will see it in the EXPLAIN result.
Only one index can be used per JOIN and it has to be on the table being joined. In your example the index used will be the id (primary key) on table B. Creating an index on every FK will give MySQL more options for the query plan, which may help in some cases.
There is only a difference between WHERE and JOIN conditions when there are NULL (missing rows) for the joined table (there is no difference at all for INNER JOIN). For your example the index on b_id does nothing. If you change it to an INNER JOIN (e.g. by adding b.something = 42 in the where clause), then it might be used if MySQL determines that it should do the query in reverse (first b, then a).
No. It is 100% OK to have a column in multiple indexes. If you have an index on (A,B,C) and you add another one on (A), that will be redundant and pointless (because it is a prefix of another index). An index on B is perfectly fine.
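The point about WHERE versus JOIN conditions on a LEFT JOIN can be seen on a few rows. This is a sketch under assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, and the tables a and b, the column something, and the sample values are invented for the demonstration. It only illustrates which rows come back, not which index either engine would pick.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table contents are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, something INTEGER);
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER, attribute2 TEXT);
    INSERT INTO b (id, something) VALUES (1, 42), (2, 7);
    INSERT INTO a (id, b_id, attribute2) VALUES
        (1, 1, 'someValue'), (2, 2, 'someValue'), (3, NULL, 'someValue');
""")


def a_ids(sql):
    """Run a query and return the a.id values it yields."""
    return [row[0] for row in conn.execute(sql)]


# Filter on b inside the ON clause: every row of a survives the LEFT JOIN.
filter_in_on = a_ids("""
    SELECT a.id FROM a
    LEFT JOIN b ON a.b_id = b.id AND b.something = 42
    WHERE a.attribute2 = 'someValue'
    ORDER BY a.id
""")

# Same filter in WHERE: rows where b is missing (NULL) or differs are dropped,
# so the LEFT JOIN returns the same rows as an INNER JOIN.
filter_in_where = a_ids("""
    SELECT a.id FROM a
    LEFT JOIN b ON a.b_id = b.id
    WHERE a.attribute2 = 'someValue' AND b.something = 42
    ORDER BY a.id
""")

inner_join = a_ids("""
    SELECT a.id FROM a
    INNER JOIN b ON a.b_id = b.id
    WHERE a.attribute2 = 'someValue' AND b.something = 42
    ORDER BY a.id
""")

print(filter_in_on, filter_in_where, inner_join)
```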

How to optimize this complex query?

How can I optimize this query? For now it's executing in 0.0100 seconds.
SELECT comments.comment_content, comments.comment_votes, comments.comment_date,
users.user_login, users.user_level, users.user_avatar_source,
groups.group_safename
FROM comments
LEFT JOIN links ON comment_link_id=link_id
LEFT JOIN users ON comment_user_id=user_id
LEFT JOIN groups ON comment_group_id=link_group_id
WHERE comment_status='published' AND link_status='published'
ORDER BY comment_id DESC
[EXPLAIN output and index listings for the comments, users, and groups tables are not reproduced here.]
Sub-twenty-millisecond query times aren't usually considered to be slow. As some folks have mentioned in the comments, it will be necessary for you to redo your optimization when your tables get larger, because MySQL's optimizer (and optimizers for other RDBMSs) makes decisions based on index size.
I recommend you always qualify your column names in JOIN clauses with table names or aliases. For example, you will gain clarity and maintainability by using a style like this:
FROM comments AS c
LEFT JOIN links AS L ON c.comment_link_id=L.link_id
LEFT JOIN users AS u ON c.comment_user_id=u.user_id
LEFT JOIN groups AS g ON c.comment_group_id=g.link_group_id
This query selects a fairly broad subset of your tables, so it will run slower the larger your tables are. That's inevitable unless you can narrow the subset somehow.
Are the columns you're using for JOIN ... ON operations all declared NOT NULL? They should be.
Looking at how you are using the groups table: You're joining on link_group_id and retrieving group_safename. So, try a compound covering index on (link_group_id,group_safename). At a minimum, index link_group_id.
The users table: You've already got an index on user_id. When your tables get bigger a compound covering index on (user_id, user_login, user_level, user_avatar_source) may help. But that's a low-priority thing to try.
The links table: You're using link_status and link_id. Your LEFT JOIN for this table should be a plain inner JOIN because one of its columns shows up in your WHERE clause. If link_status can be NOT NULL in your application make sure it is declared that way. Then try a compound index on (link_status, link_id).
The comments table: You have no index on comment_status as far as I can see. Try adding one.
Then put a bunch of data in your tables, run OPTIMIZE LOCAL TABLE for each table, then try your query with EXPLAIN again.

How do I tell the MySQL Optimizer to use the index on a derived table?

Suppose you have a query like this...
SELECT T.TaskID, T.TaskName, TAU.AssignedUsers
FROM `tasks` T
LEFT OUTER JOIN (
SELECT TaskID, GROUP_CONCAT(U.FirstName, ' ',
U.LastName SEPARATOR ', ') AS AssignedUsers
FROM `tasks_assigned_users` TAU
INNER JOIN `users` U ON (TAU.UserID=U.UserID)
GROUP BY TaskID
) TAU ON (T.TaskID=TAU.TaskID)
Multiple people can be assigned to a given task. The purpose of this query is to show one row per task, but with the people assigned to the task in a single column.
Now... suppose you have the proper indexes set up on tasks, users, and tasks_assigned_users. The MySQL Optimizer will still not use the TaskID index when joining tasks to the derived table. WTF?!?!?
So, my question is... how can you make this query use the index on tasks_assigned_users.TaskID? Temporary tables are lame, so if that's the only solution... the MySQL Optimizer is stupid.
Indexes used:
tasks
PRIMARY - TaskID
users
PRIMARY - UserID
tasks_assigned_users
PRIMARY - (TaskID,UserID)
Additional index UNIQUE - (UserID,TaskID)
EDIT: Also, this page says that derived tables are executed/materialized before joins occur. Why not re-use the keys to perform the join?
EDIT 2: MySQL Optimizer won't let you put index hints on derived tables (presumably because there are no indexes on derived tables)
EDIT 3: Here is a really nice blog post about this: http://venublog.com/2010/03/06/how-to-improve-subqueries-derived-tables-performance/ Notice that Case #2 is the solution I'm looking for, but it appears that MySQL does not support this at this time. :(
EDIT 4: Just found this: "As of MySQL 5.6.3, the optimizer more efficiently handles subqueries in the FROM clause (that is, derived tables):... During query execution, the optimizer may add an index to a derived table to speed up row retrieval from it." Seems promising...
There is a solution to this in MySQL Server 5.6 - the preview release (at the time of this writing).
http://dev.mysql.com/doc/refman/5.6/en/from-clause-subquery-optimization.html
Although, I'm not sure if the MySQL Optimizer will re-use indexes that already exist when it "adds indexes to the derived table"
Consider the following query:
SELECT * FROM t1
JOIN (SELECT * FROM t2) AS derived_t2 ON t1.f1=derived_t2.f1;
The documentation says: "The optimizer constructs an index over column f1 from derived_t2 if doing so would permit the use of ref access for the lowest cost execution plan."
OK, that's great, but does the optimizer re-use indexes from t2? In other words, what if an index existed for t2.f1? Does this index get re-used, or does the optimizer recreate this index for the derived table? Who knows?
EDIT: The best solution until MySQL 5.6 is to create a temporary table, create an index on that table, and then run the SELECT query on the temp table.
The problem I see is that by doing a subquery there is no underlying indexed table.
If you are having a performance problem, I'd do the grouping at the end, something like this:
SELECT T.TaskID, T.TaskName, GROUP_CONCAT(U.FirstName, ' ', U.LastName SEPARATOR ', ') AS AssignedUsers
FROM `tasks` T
LEFT OUTER JOIN `tasks_assigned_users` TAU ON (T.TaskID=TAU.TaskID)
LEFT OUTER JOIN `users` U ON (TAU.UserID=U.UserID)
GROUP BY T.TaskID, T.TaskName
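One thing worth checking in a flattened rewrite like this is the join to users: with an inner join there, tasks that have no assignees drop out of the result, whereas the derived-table version keeps them with a NULL AssignedUsers. The sketch below compares the variants on made-up rows. It runs on Python's built-in sqlite3 module as a stand-in for MySQL, so GROUP_CONCAT is written in SQLite's form (string concatenation with ||, separator as second argument) rather than with MySQL's SEPARATOR keyword; the sample data and the normalize helper are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks (TaskID INTEGER PRIMARY KEY, TaskName TEXT);
    CREATE TABLE users (UserID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE tasks_assigned_users (
        TaskID INTEGER NOT NULL,
        UserID INTEGER NOT NULL,
        PRIMARY KEY (TaskID, UserID)
    );
    INSERT INTO tasks VALUES (1, 'Write docs'), (2, 'Ship release');
    INSERT INTO users VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
    -- Task 1 has two assignees, task 2 has none.
    INSERT INTO tasks_assigned_users VALUES (1, 1), (1, 2);
""")

DERIVED = """
    SELECT T.TaskID, T.TaskName, TAU.AssignedUsers
    FROM tasks T
    LEFT OUTER JOIN (
        SELECT TAU.TaskID AS TaskID,
               GROUP_CONCAT(U.FirstName || ' ' || U.LastName, ', ') AS AssignedUsers
        FROM tasks_assigned_users TAU
        INNER JOIN users U ON TAU.UserID = U.UserID
        GROUP BY TAU.TaskID
    ) TAU ON T.TaskID = TAU.TaskID
    ORDER BY T.TaskID
"""

# Flattened query; {join} is filled with either LEFT OUTER or INNER.
FLAT = """
    SELECT T.TaskID, T.TaskName,
           GROUP_CONCAT(U.FirstName || ' ' || U.LastName, ', ') AS AssignedUsers
    FROM tasks T
    LEFT OUTER JOIN tasks_assigned_users TAU ON T.TaskID = TAU.TaskID
    {join} JOIN users U ON TAU.UserID = U.UserID
    GROUP BY T.TaskID, T.TaskName
    ORDER BY T.TaskID
"""


def normalize(rows):
    """Make results comparable regardless of name order inside GROUP_CONCAT."""
    return [
        (task_id, name, None if users is None else sorted(users.split(", ")))
        for task_id, name, users in rows
    ]


derived = normalize(conn.execute(DERIVED))
flat_left = normalize(conn.execute(FLAT.format(join="LEFT OUTER")))
flat_inner = normalize(conn.execute(FLAT.format(join="INNER")))
print(derived, flat_left, flat_inner, sep="\n")
```

On this data the LEFT OUTER variant matches the derived-table query row for row, while the INNER variant returns only the task that has assignees.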
I'm afraid it's not possible. You have to create a temporary table or a view to use an index.