Indexing for joins in mysql database - mysql

Had a quick question about joins in mysql and indexing. If I had 2 tables:
table1
id
name
table2
table1id
table2title
If I join table2 and table1 on id and table1id, should I add an index on id in table1 and on table1id in table2, or on just one of the tables? I'm using MySQL 5.x with MyISAM.

If table1.id is a primary key, then you don't need to index it. (Primary keys are automatically indexed)
If not, then you'll need to index table1.id and table2.table1id
Use "EXPLAIN" on your selects to see which indexes you are hitting.

Yes, add an index on both id columns (as another poster said, primary key columns are indexed automatically). Indices allow MySQL to quickly locate a row in the data file instead of reading it sequentially.
If you need rows from both tables given an id, index both tables for optimal performance. Otherwise, the initial clause (SELECT...WHERE) will run quickly and the JOIN will be slow (or vice versa), resulting in a slow query.
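
As a rough sketch of the advice above, using the table and column names from the question: the index name and the literal id value are made up for illustration, and the first statement assumes table1.id is not already a primary key.

```sql
-- Assumption: table1.id is not yet a primary key; skip this statement if it is.
ALTER TABLE table1 ADD PRIMARY KEY (id);

-- Index the referencing column in table2 (index name is hypothetical).
CREATE INDEX idx_table2_table1id ON table2 (table1id);

-- Check which indexes the join actually uses (42 is a placeholder id).
EXPLAIN
SELECT t1.name, t2.table2title
FROM table1 AS t1
JOIN table2 AS t2 ON t2.table1id = t1.id
WHERE t1.id = 42;
```

In the EXPLAIN output, the "key" column shows the index chosen for each table; NULL there means no index is being used for that table.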

Related

Why does adding an index to one of the two keys used to join two tables make the execution time slower than not using any index at all?

I am executing queries where two tables authors and work_authors are being joined. I am using MySQL. authors and work_authors have a one-to-many relationship and authors contains around 7 million records and work_authors contains around 21 million records.
SELECT t2.author_id, t1.work_id, t2.name
FROM work_author AS t1,
authors AS t2
WHERE t1.author_id = t2.id AND
t2.name LIKE '%Tolkien%';
First, there are no indexes at all in the database, i.e. no primary keys or other indexes. In that case the query takes 24 seconds on average on my machine. When I add an index by making the id column of authors the primary key, but add no index or primary key to the work_author table, the query takes 64 seconds on average.
Second, when I also add a non-unique index on column author_id of the work_author table, the query takes 4 seconds on average.
How can this be the case? Why does the query take longer with an index on one of the join columns than with no index at all? And why is it faster than both alternatives once both join keys are indexed? Can anyone explain this?
No index can help LIKE '%Tolkien%'. However, ...
Consider adding a FULLTEXT index to authors.name and using
MATCH(name) AGAINST('+Tolkien' IN BOOLEAN MODE)
This will run a lot faster.
PRIMARY KEYs are important.
Using InnoDB is important.
Do use the JOIN...ON syntax instead of the comma join.
A many-to-many table is optimized by having two indexes as discussed here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
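
Putting those suggestions together, the query from the question might be rewritten roughly as follows. This is only a sketch: the index name is hypothetical, and FULLTEXT indexes on InnoDB tables require MySQL 5.6 or later.

```sql
-- Hypothetical index name; FULLTEXT on InnoDB needs MySQL 5.6+.
ALTER TABLE authors ADD FULLTEXT INDEX ft_authors_name (name);

-- Same join as in the question, written with JOIN...ON and a full-text match.
SELECT a.id AS author_id, wa.work_id, a.name
FROM authors AS a
JOIN work_author AS wa ON wa.author_id = a.id
WHERE MATCH(a.name) AGAINST('+Tolkien' IN BOOLEAN MODE);
```

Note that full-text search matches whole words, so unlike LIKE '%Tolkien%' it will not find the string inside a longer word.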

Indexing for a Greater/Less than query

I am trying to do an update between two very large tables (millions of records)
Update TableA as T1
Inner Join TableB as T2
On T1.Field1=T2.Field1
And T1.Value >= T2.MinValue
And T1.Value <= T2.MaxValue
Set T1.Field2=T2.Field2
I have indexes on all the fields individually in both tables, Value fields included. They are all Normal/BTree (default).
Is there some better way to index the value fields to get better performance? The range conditions add a huge overhead to what is otherwise (using just =) a pretty fast update. My first test took 3 minutes to update 25 records, and I have 5 million to update.
Generally, only one index per table can be used for a particular WHERE clause. So if it uses the index on Field1, it won't be able to use the index on Value, and vice versa. The solution is to use a composite index:
ALTER TABLE TableA ADD INDEX (Field1, Value);
Since an index on multiple columns is effectively an index on any prefix set of the columns, you can remove the individual index on Field1 when you add this index. But you should still keep the individual index on Value if that's needed for other queries.
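
A possible way to try this out is sketched below. The index names are invented, and the matching composite index on TableB is an assumption that goes beyond the answer: it may help if the optimizer decides to drive the join from TableA instead. EXPLAIN is run on the equivalent SELECT so that it works on older MySQL versions too.

```sql
-- Composite index from the answer, with an explicit (hypothetical) name.
ALTER TABLE TableA ADD INDEX idx_a_field1_value (Field1, Value);

-- Assumption, not from the answer: a matching composite index on TableB.
ALTER TABLE TableB ADD INDEX idx_b_field1_range (Field1, MinValue, MaxValue);

-- Inspect the join plan using the SELECT equivalent of the UPDATE.
EXPLAIN
SELECT T1.Field1, T1.Value, T2.Field2
FROM TableA AS T1
INNER JOIN TableB AS T2
   ON T1.Field1 = T2.Field1
  AND T1.Value >= T2.MinValue
  AND T1.Value <= T2.MaxValue;
```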

sql count results query with joins performance

I have the following tables (example)
t1 (20.000 rows, 60 columns, primary key t1_id)
t2 (40.000 rows, 8 columns, primary key t2_id)
t3 (50.000 rows, 3 columns, primary key t3_id)
t4 (30.000 rows, 4 columns, primary key t4_id)
sql query:
SELECT COUNT(*) AS count FROM (t1)
JOIN t2 ON t1.t2_id = t2.t2_id
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id
I have created indexes on columns that affect the join (e.g on t1.t2_id) and foreign keys where necessary. The query is slow (600 ms) and if I put where clauses (e.g. WHERE t1.column10 = 1, where column10 doesn't have index), the query becomes much slower. The queries I do with select (*) and LIMIT are fast, and I can't understand count behaviour. Any solution?
EDIT: EXPLAIN SQL ADDED
id  select_type  table  type   possible_keys  key      key_len  ref       rows  Extra
1   SIMPLE       t4     index  PRIMARY        user_id  4        NULL      5259  Using index
1   SIMPLE       t2     ref    PRIMARY,t4_id  t4_id    4        t4.t4_id  1     Using index
1   SIMPLE       t1     ref    t2_id          t2_id    4        t2.t2_id  1     Using index
1   SIMPLE       t3     ref    PRIMARY        PRIMARY  4        t2.t2_id  1     Using index
where user_id is a column of t4 table
EDIT: I changed from InnoDB to MyISAM and got a speed increase, especially when I add WHERE clauses. But I still get times of 100-150 ms. The reason I want the count in my application is to show the user who is filling in a search form, via Ajax, the number of results to expect. Maybe there is a better solution for this, for example creating a temporary table that is updated every hour?
The count query is faster simply because of the INDEX ONLY SCAN, as stated in the query plan ("Using index"). The query you mention involves only indexed columns, which is why there is no need to touch the physical row data during execution: the whole query is answered from the indexes. When you add a clause on columns that are not indexed, or are indexed in a way that prevents index usage, the data stored in the table has to be accessed by physical address, which is very slow.
EDIT:
Another important thing is that those are PKs, so they are UNIQUE. The optimizer chooses to perform an INDEX RANGE SCAN on the first index, and only checks whether the keys exist in the subsequent indexes (that's why the plan states there will be only one row returned).
EDIT2:
Thanks to J. Bruni: in fact that is a clustered index, so the above isn't the "whole truth". There is probably a full scan on the first table, and three subsequent INDEX ACCESSes to confirm the FK existence.
COUNT iterates over the whole result set and does not depend on indexes. Use EXPLAIN (or EXPLAIN ANALYZE, available in MySQL 8.0.18 and later) on your query to check how it is executed.
SELECT + LIMIT does not iterate over the whole result set, hence it's faster.
Regarding the COUNT(*) slow performance: are you using InnoDB engine? See:
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/
"SELECT COUNT(*)" is slow, even with where clause
The main information seems to be: "InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages."
So, one possible solution is to create a separate index and force its usage through the USE INDEX hint in the SQL query. Look at this comment for a sample usage report:
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/comment-page-1/#comment-529049
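
A minimal sketch of that idea applied to the query from the question. The index name is hypothetical; the question says t1.t2_id is already indexed, in which case the existing index name would go in the hint and the CREATE INDEX can be skipped.

```sql
-- Hypothetical secondary index; skip if t1.t2_id already has one.
CREATE INDEX ix_t1_t2_id ON t1 (t2_id);

-- Ask the optimizer to consider only the secondary index for t1.
SELECT COUNT(*) AS count
FROM t1 USE INDEX (ix_t1_t2_id)
JOIN t2 ON t1.t2_id = t2.t2_id
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id;
```

Whether this helps depends on the data and engine, so it is worth comparing EXPLAIN output and timings with and without the hint.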
Regarding the WHERE issue, the query will perform better if you put the condition in the JOIN clause, like this:
SELECT COUNT(t1.t1_id) AS count FROM (t1)
JOIN t2 ON (t1.column10 = 1) AND (t1.t2_id = t2.t2_id)
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id
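
Another option for the filtered count, following the index-only reasoning in the first answer, would be a composite index that contains both the filter column and the join column, so that the WHERE clause no longer forces row lookups. This is a sketch under that assumption; the index name is made up.

```sql
-- Hypothetical index covering both the filter and the join column of t1.
CREATE INDEX ix_t1_column10_t2_id ON t1 (column10, t2_id);

-- "Using index" in the Extra column for t1 would indicate an index-only read.
EXPLAIN
SELECT COUNT(*) AS count
FROM t1
JOIN t2 ON t1.t2_id = t2.t2_id
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id
WHERE t1.column10 = 1;
```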

Is there a way to index information on different tables in MySQL?

My MySql schema looks like the following
create table TBL1 (id, person_id, ....otherData)
create table TBL2 (id, tbl1_id, month,year, ...otherData)
I am querying this schema as
select * from TBL1 join TBL2 on (TBL2.tbl1_id=TBL1.id)
where TBL1.person_id = ?
and TBL2.month=?
and TBL2.year=?
The current problem is that there are about 18K records in TBL1 associated with a given person_id, and also about 20K records in TBL2 associated with the same month/year values.
For now I have two indexes:
index1 on TBL1(person_id) and index2 on TBL2(month,year)
When the database runs the query, it uses either index1 (ignoring the month and year params) or index2 (ignoring the person_id param). So in both cases it scans about 20K records and doesn't perform as expected.
Is there any way for me to create a single index across both tables, or to tell MySQL to merge the indexes when querying?
No, an index can belong to only one table. You will need to look at the EXPLAIN for this query to see if you can determine where the performance issue is coming from.
Do you have indexes on TBL2.tbl1_id and TBL1.id?
No. Indexes are on single tables.
You need compound indices on both tables that include the join column. If you add the "ID" column to both indices (TBL1.id to index1, TBL2.tbl1_id to index2), the query optimizer should pick that up.
Can you post an "EXPLAIN"?
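
One way to read the compound-index suggestion is sketched below. The index names and the literal parameter values are placeholders, and the exact column order is an assumption worth checking against EXPLAIN on real data.

```sql
-- Extend each existing index with the join column (index names are hypothetical).
CREATE INDEX idx_tbl1_person_id ON TBL1 (person_id, id);
CREATE INDEX idx_tbl2_month_year_tbl1 ON TBL2 (month, year, tbl1_id);

-- Placeholder values stand in for the ? parameters of the original query.
EXPLAIN
SELECT *
FROM TBL1
JOIN TBL2 ON TBL2.tbl1_id = TBL1.id
WHERE TBL1.person_id = 1
  AND TBL2.month = 6
  AND TBL2.year = 2012;
```

With these in place, whichever table the optimizer reads first, the lookup into the other table can use all of its conditions (the filter columns plus the join column) instead of only one of them.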

How to properly index a linking table for many-to-many connection in MySQL?

Let's say I have a simple many-to-many table between tables "table1" and "table2" that consists of two int fields: "table1-id" and "table2-id". How should I index this linking table?
I used to just make a composite primary index (table1-id,table2-id), but I read that this index might not work if you change the order of the fields in the query. So what's the optimal solution then: independent indexes for each field, without a primary index?
Thanks.
It depends on how you search.
If you search like this:
/* Given a value from table1, find all related values from table2 */
SELECT *
FROM table1 t1
JOIN table_table tt ON (tt.table_1 = t1.id)
JOIN table2 t2 ON (t2.id = tt.table_2)
WHERE t1.id = @id
then you need:
ALTER TABLE table_table ADD CONSTRAINT pk_table1_table2 PRIMARY KEY (table_1, table_2)
In this case, table1 will be leading in NESTED LOOPS, and your index will be usable only when table_1 is its first column.
If you search like this:
/* Given a value from table2, find all related values from table1 */
SELECT *
FROM table2 t2
JOIN table_table tt ON (tt.table_2 = t2.id)
JOIN table1 t1 ON (t1.id = tt.table_1)
WHERE t2.id = @id
then you need:
ALTER TABLE table_table ADD CONSTRAINT pk_table1_table2 PRIMARY KEY (table_2, table_1)
for the reasons above.
You don't need independent indices here. A composite index can be used everywhere where a plain index on the first column can be used. If you use independent indices, you won't be able to search efficiently for both values:
/* Check if relationship exists between two given values */
SELECT 1
FROM table_table
WHERE table_1 = @id1
AND table_2 = @id2
For a query like this, you'll need at least one index on both columns.
It's never bad to have an additional index for the second field:
ALTER TABLE table_table ADD CONSTRAINT pk_table1_table2 PRIMARY KEY (table_1, table_2)
CREATE INDEX ix_table2 ON table_table (table_2)
Primary key will be used for searches on both values and for searches based on value of table_1, additional index will be used for searches based on value of table_2.
As long as you are specifying both keys in the query, it doesn't matter what order they have in the query, nor does it matter what order you specify them in the index.
However, it's not unlikely that you will sometimes have only one or the other of the keys. If you sometimes have id_1 only, then that should be the first (but you still only need one index).
If you sometimes have one, sometimes the other, sometimes both, you'll need one index with both keys, and a second (non-unique) index with one field - the more selective of the two keys - and the primary composite index should start with the other key.
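
The layout discussed in these answers could be sketched as a single table definition. The column types and the choice to make the secondary index two-column (table_2, table_1) rather than just (table_2) are assumptions for illustration; the literal ids are placeholders.

```sql
-- Hypothetical definition of the linking table with both access paths indexed.
CREATE TABLE table_table (
  table_1 INT NOT NULL,
  table_2 INT NOT NULL,
  PRIMARY KEY (table_1, table_2),   -- serves lookups by table_1, or by both columns
  KEY ix_table2 (table_2, table_1)  -- serves lookups by table_2
);

-- Each query should show a different "key" in its EXPLAIN output.
EXPLAIN SELECT table_2 FROM table_table WHERE table_1 = 1;
EXPLAIN SELECT table_1 FROM table_table WHERE table_2 = 2;
EXPLAIN SELECT 1 FROM table_table WHERE table_1 = 1 AND table_2 = 2;
```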
@Quassnoi, in your first query you're actually using only the tt.table_1 key, as we can see from the WHERE clause: WHERE t1.id = @id. And in the second query, only tt.table_2.
So the multi-column index could be useful only in the third query, because of WHERE table_1 = @id1 AND table_2 = @id2. If queries of this kind are not going to be used, do you think it's worth using two separate one-column indices instead?