I'm trying to optimise the following MySQL query
SELECT Hotel.HotelId, Hotel.Name, Hotel.Enabled, Hotel.IsClosed,
HotelRoom.HotelId, HotelRoom.RoomId, HotelRoom.Name AS RoomName
FROM Hotel
INNER JOIN
HotelRoom ON Hotel.HotelId = HotelRoom.HotelId
WHERE Hotel.IsClosed = 0
AND Hotel.Enabled = 1
AND HotelRoom.Deleted = 0
AND HotelRoom.Enabled = 1
AND IF(LENGTH(TRIM(sAuxiliaryIds)) > 0 AND sAuxiliaryIds IS NOT NULL,
FIND_IN_SET(Hotel.AuxiliaryId, sAuxiliaryIds), 1=1) > 0
ORDER BY Hotel.HotelId ASC, HotelRoom.RoomId ASC
The PRIMARY KEYs are Hotel.HotelId and HotelRoom.RoomId, and I've got a FOREIGN KEY from HotelRoom.HotelId to Hotel.HotelId.
Should I be creating an INDEX for (Hotel.IsClosed, Hotel.Enabled) and (HotelRoom.Deleted, HotelRoom.Enabled), which are used in the WHERE clause? And should this index include the PRIMARY KEY, so that, for example, I would create an INDEX for (Hotel.HotelId, Hotel.IsClosed, Hotel.Enabled)?
EDIT 1
I've added the following to the WHERE clause: AND IF(LENGTH(TRIM(sAuxiliaryIds)) > 0 AND sAuxiliaryIds IS NOT NULL, FIND_IN_SET(Hotel.AuxiliaryId, sAuxiliaryIds), 1=1) > 0. Should these columns also be included in an INDEX?
This is what the EXPLAIN statement is showing for this query
I added both INDEX suggestions but when I ran the EXPLAIN statement they both showed that no key was going to be used
There are two potential indexing strategies here, depending on which of the two tables appears on the left side of the inner join (either table could potentially appear on either side of the join). Given that the HotelRoom table likely contains many more records than the Hotel table, I would suggest placing the Hotel table on the left side of the join. This would imply that the Hotel table would be scanned, and the index used for the join to HotelRoom. Then, we can try using the following index on HotelRoom:
CREATE INDEX hotel_room_idx ON HotelRoom (HotelId, Deleted, Enabled, Name, RoomId);
This should speed up the join substantially; it covers the WHERE clause and also covers all columns in the select on HotelRoom. Note that the following simplified index might also be very effective:
CREATE INDEX hotel_room_idx ON HotelRoom (HotelId, Deleted, Enabled);
This just covers the join and WHERE clause, but MySQL might still choose to use it.
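To speed up the scan of Hotel itself, a covering index on Hotel might also help. A sketch (the index name is mine; AuxiliaryId is included only because of the FIND_IN_SET test from EDIT 1):
CREATE INDEX hotel_idx ON Hotel (IsClosed, Enabled, HotelId, Name, AuxiliaryId);
This covers the WHERE filter on Hotel plus every Hotel column the query touches, so the hotel rows can be read from the index alone.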
MySQL's Optimizer does not care which table comes first in a JOIN. It will look at statistics (etc) to decide for itself whether to start with Hotel or HotelRoom. So, you should write indexes for both cases, so as not to restrict the Optimizer.
MySQL almost always performs a JOIN by scanning one table. Then, for each row in that table, look up the necessary row(s) in the other table. See "Nested Loop Join" or "NLJ". This implies that the optimal indexes are (often) thus: For the 'first' table, columns of the WHERE clause involving the first table. For the second table, the columns from both the WHERE and ON clauses involving the second table.
Assuming that the Optimizer started with Hotel:
Hotel: INDEX(IsClosed, Enabled) -- in either order
HotelRoom: INDEX(Deleted, Enabled, HotelId) -- in any order
If it started with HotelRoom:
HotelRoom: INDEX(Deleted, Enabled) -- in either order
Hotel: PRIMARY KEY(HotelId) -- which you already have?
If there are a lot of closed/disabled hotels, then this may be beneficial:
Hotel: INDEX(IsClosed, Enabled, HotelId)
As Tim mentioned, it may be beneficial to augment an index to include the rest of the columns mentioned, thereby making the index "covering". (But don't do this with the PRIMARY KEY or any UNIQUE key.)
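Spelled out as DDL, the above boils down to two statements (a sketch; index names are mine). Note that INDEX(Deleted, Enabled, HotelId) also serves the HotelRoom-first case, because (Deleted, Enabled) is its leftmost prefix:
CREATE INDEX hotel_closed_enabled ON Hotel (IsClosed, Enabled);
CREATE INDEX hotelroom_deleted_enabled_hotelid ON HotelRoom (Deleted, Enabled, HotelId);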
If you provide SHOW CREATE TABLE and the sizes of the tables, we might have further suggestions.
Related
I am using MySQL 5.6 and am trying to optimize the following query:
SELECT t1.field1,
...
t1.field30,
t2.field1
FROM Table1 t1
JOIN Table2 t2 ON t1.fk_int = t2.pk_int
WHERE t1.int_field = ?
AND t1.enum_filed != 'value'
ORDER BY t1.created_datetime desc;
A response can contain millions of records and every row consists of 31 columns.
Now EXPLAIN says in Extra that the planner uses 'Using where'.
I tried to add the following index:
create index test_idx ON Table1 (int_field, enum_filed, created_datetime, fk_int);
After that, EXPLAIN says in Extra that the planner uses "Using index condition; Using filesort".
The "rows" value from EXPLAIN is lower with the index than without it, but in practice the execution time is longer.
So, my questions are:
What is the best index for this query?
Why does EXPLAIN say that the 'key_len' of the query with the index is 5? Shouldn't it be 4+1+8+4=17?
Should the fields from the ORDER BY be in the index?
Should the fields from the JOIN be in the index?
Try refactoring your index this way: avoid the created_datetime column (or move it to the right, after fk_int), and move fk_int before the enum_filed column. This way the three columns used for filtering should be used more effectively:
create index test_idx ON Table1 (int_field, fk_int, enum_filed);
Also be sure you have a specific index on the Table2 column pk_int. If you don't, add one:
create index test_idx2 ON Table2 (pk_int);
What is the best index for this query?
Maybe (int_field, created_datetime) (See next Q&A for reason.)
Why does EXPLAIN say that the 'key_len' of the query with the index is 5? Shouldn't it be 4+1+8+4=17?
enum_filed != defeats the optimizer. If there is only one other value for that enum (and it is NOT NULL), then use = and the other value. And try INDEX(int_field, enum_filed, created_datetime). The Optimizer is much happier with = than with any inequality.
"5" could be indicating 2 columns, or it could be indicating one INT that is Nullable. If int_field can be NULL, consider changing it to NOT NULL; then the "5" would drop to "4".
Should the fields from the ORDER BY be in the index?
Only if the index can completely handle the WHERE. This usually occurs only if all the WHERE tests are =. (Hence, my previous answer.)
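For illustration, combining this with the earlier = suggestion (a sketch; 'other_value' stands in for the enum's one other value):
CREATE INDEX idx_int_enum_created ON Table1 (int_field, enum_filed, created_datetime);
SELECT t1.field1, t1.fk_int
FROM Table1 t1
WHERE t1.int_field = ?              -- equality
  AND t1.enum_filed = 'other_value' -- equality instead of !=
ORDER BY t1.created_datetime DESC;  -- index scanned in reverse, no filesort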
Another case for including those columns is "covering"; see next Q&A.
Should the fields from the JOIN be in the index?
It depends. One thing that gives some performance benefit is to include all columns mentioned anywhere in the SELECT. This is called a "covering" index and is indicated in EXPLAIN by Using index (not Using index condition). There are too many columns in t1 to add a "covering" index. I think the practical limit is about 5 columns.
My guess for your question № 1:
create index my_idx on Table1(int_field, created_datetime desc, fk_int)
or one of these (but neither will probably be worthwhile):
create index my_idx on Table1(int_field, created_datetime desc, enum_filed, fk_int)
create index my_idx on Table1(int_field, created_datetime desc, fk_int, enum_filed)
I'm supposing 3 things:
Table2.pk_int is already a primary key, judging by the name
The where condition on Table1.int_field is only satisfied by a small subset of Table1
The inequality on Table1.enum_filed (I would fix the typo, if I were you) only excludes a small subset of Table1
Question № 2: the key_len refers to the key columns used. Don't forget that there is one extra byte for nullable columns. In your case, if int_field is nullable, it means that it is the only key column used; otherwise both int_field and enum_filed are used.
As for questions 3 and 4: if, as I suppose, it's more efficient to start the query plan from the WHERE condition on Table1.int_field, the composite index, in this case also with the correct sort order (desc), enables a scan of the index that yields the output rows in the correct order, without an extra sort step. Furthermore, also adding fk_int to the index makes the retrieval of any record of Table1 unnecessary unless a corresponding record is present in Table2. For a similar reason you could also add enum_filed to the index, but if this doesn't considerably reduce the output record count, the increase in index size will make things worse instead of better. In the end, you will have to try it out (with realistic data!).
Note that if you put another column between int_field and created_datetime in the index, the index won't provide the created_datetime (for a given int_field) in the desired output order.
The issue was fixed by adding more filters (in the WHERE clause) to the query.
Regarding indexes, two of the proposed indexes were helpful:
From @WalterTross, with the following index for the initial query:
(int_field, created_datetime desc, enum_filed, fk_int)
With my short comment: DESC indexes are not supported in MySQL 5.6; the keyword is parsed but ignored.
From @RickJames, with the following index for the modified query:
(int_field, created_datetime)
Thanks everyone who tried to help. I really appreciate it.
I have two tables (requests and results):
requests:
email
results:
email, processed_at
I now want to get all results that have a request with the same email and that have not been processed:
SELECT * FROM results
INNER JOIN requests ON requests.email = results.email
AND results.processed_at IS NULL
I have an index on each individual column, but the query is very slow. So I assume I need a multi-column index on results; I am just not sure what order the columns should be in:
ALTER TABLE results
ADD INDEX results_email_processed_at (email,processed_at)
ALGORITHM=INPLACE LOCK=NONE;
or
ALTER TABLE results
ADD INDEX results_processed_at_email (processed_at,email)
ALGORITHM=INPLACE LOCK=NONE;
Either composite index will be equally beneficial.
However, if you are fetching 40% of the table, then the Optimizer may choose to ignore any index and simply scan the table.
Is that SELECT the actual query? If not, please show us the actual query; a number of seemingly minor changes could make a big difference in optimization options.
Please provide EXPLAIN SELECT ... so we can see what it thinks with the current index(es). And please provide SHOW CREATE TABLE in case there are datatype issues that are relevant.
Notwithstanding any indexing issues, you explicitly asked about all requests that WERE NOT processed. You have an INNER JOIN, which means "I want matching rows from BOTH sides", so your NULL check in the WHERE would never qualify.
You need a LEFT JOIN to the results table.
As for the index, since the join is on email, I would have EMAIL as the primary component of the index. Having a covering index that includes the processed_at column would be faster, as the engine would not have to go to the raw data page to qualify the results. Order the index as (email, processed_at) so that EMAIL is the first qualifier and processed_at comes along for the ride to complete the query's required fields.
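A minimal sketch of that suggestion, keeping the question's table and column names:
SELECT r.*
FROM requests r
LEFT JOIN results res ON res.email = r.email
WHERE res.processed_at IS NULL;  -- also matches requests with no results row at all
The (email, processed_at) index from the question fits this shape: email first for the join lookup, with processed_at riding along so rows can be qualified from the index alone.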
We are facing some performance issues in reports that work on millions of rows. I tried optimizing the SQL queries, but that only cut the execution time in half.
The next step is to analyse and modify or add some indexes; therefore I have some questions:
1- The SQL queries contain a lot of joins: do I have to create an index for each foreign key?
2- Imagine the query SELECT * FROM A LEFT JOIN B ON a.b_id = b.id WHERE a.attribute2 = 'someValue', and we have an index on table A based on (b_id, attribute2): does my query use this index for the WHERE part? (I know the index would be used if both conditions were in the WHERE clause.)
3- If an index is based on columns C1, C2 and C3, and I decide to add an index based on C2, do I need to remove C2 from the first index?
Thanks for your time
You can use an EXPLAIN query to see what MySQL will do when executing it. This helps a LOT when trying to figure out why it's slow.
JOIN-ing happens one table at a time, and the order is determined by MySQL analyzing the query and trying to find the fastest order. You will see it in the EXPLAIN result.
Only one index can be used per JOIN and it has to be on the table being joined. In your example the index used will be the id (primary key) on table B. Creating an index on every FK will give MySQL more options for the query plan, which may help in some cases.
There is only a difference between WHERE and JOIN conditions when there are NULLs (missing rows) for the joined table; there is no difference at all for an INNER JOIN. For your example, the index on b_id does nothing. If you change it to an INNER JOIN (e.g. by adding b.something = 42 to the WHERE clause), then it might be used if MySQL determines that it should do the query in reverse (first B, then A).
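To make that concrete (b.something is the placeholder column from this answer):
-- The question's query: rows of A with no B match survive, with B columns NULL
SELECT * FROM A LEFT JOIN B ON a.b_id = b.id WHERE a.attribute2 = 'someValue';
-- A WHERE condition on B filters those NULL rows out,
-- effectively turning the LEFT JOIN into an INNER JOIN:
SELECT * FROM A LEFT JOIN B ON a.b_id = b.id
WHERE a.attribute2 = 'someValue' AND b.something = 42;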
No. It is 100% OK to have a column in multiple indexes. If you have an index on (A, B, C) and you add another one on (A), that will be redundant and pointless (because it is a prefix of another index). An index on (B) is perfectly fine.
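As a sketch (using a placeholder table t):
CREATE INDEX idx_abc ON t (A, B, C);
CREATE INDEX idx_a ON t (A);  -- redundant: (A) is a leftmost prefix of idx_abc
CREATE INDEX idx_b ON t (B);  -- fine: B alone is not a leftmost prefix of idx_abc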
I have the following query:
SELECT region.id, region.world_id, min_x, min_y, min_z, max_x, max_y, max_z, version, mint_version
FROM minecraft_worldguard.region
LEFT JOIN minecraft_worldguard.region_cuboid
ON region.id = region_cuboid.region_id
AND region.world_id = region_cuboid.world_id
LEFT JOIN minecraft_srvr.lot_version
ON id=lot
WHERE region.world_id = 10
AND region_cuboid.world_id=10;
The MySQL slow query log tells me that it takes more than 5 seconds to execute and returns 2300 rows, but it examines 15'404'545 rows to return them.
The three tables each have only about 6500 rows, with unique keys on the id and lot fields as well as keys on the world_id fields. I tried to minimize the number of rows examined by filtering both cuboid and world by their ID and the double WHERE on world_id, but it did not seem to help.
Any idea how I can optimize this query?
Here is the sqlfiddle with the indexes as of current status.
MySQL can't use an index in this case because the joined fields have different collations:
`lot` varchar(20) COLLATE utf8_unicode_ci NOT NULL
`id` varchar(128) COLLATE utf8_bin NOT NULL
If you change these fields to a common collation (for example, region.id to utf8_unicode_ci), MySQL uses the primary key (fiddle).
According to the docs:
Comparison of dissimilar columns (comparing a string column to a temporal or numeric column, for example) may prevent use of indexes if values cannot be compared directly without conversion.
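Concretely, the fix might look like this (a sketch based on the column definitions above; adjust to whichever common collation you standardize on):
ALTER TABLE minecraft_worldguard.region
MODIFY id VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;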
You have joined the two tables "minecraft_worldguard.region" and "minecraft_worldguard.region_cuboid" on region.world_id and region_cuboid.world_id, so the WHERE clause doesn't require both conditions.
The two columns in the WHERE clause have been equated in the JOIN condition, hence you don't need to check both of them in the WHERE clause. Remove one of them and add an index on the column that remains in the WHERE condition.
In your example, leave the WHERE clause as below:
WHERE region.world_id = 10
and add an index on the region.world_id column; that would improve the performance a bit.
NOTE: observe that I am suggesting you to discard "AND region_cuboid.world_id=10;" part of the WHERE clause.
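As DDL (a sketch; the index name is mine):
CREATE INDEX idx_region_world_id ON minecraft_worldguard.region (world_id);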
Hope that helps.
First, when writing queries that involve multiple tables, it is a very good idea to use "alias" references to the tables so you don't have to retype the entire long name throughout. It is also a really good idea to identify which table each column comes from; that helps readers understand what is where, and it can also help improve performance (for example, by suggesting a covering index).
That said, I have applied aliases to your original query, but I AM GUESSING which table each column belongs to; you can obviously identify them quickly and adjust.
SELECT
R.id,
R.world_id,
RC.min_x,
RC.min_y,
RC.min_z,
RC.max_x,
RC.max_y,
RC.max_z,
LV.version,
LV.mint_version
FROM
minecraft_worldguard.region R
LEFT JOIN minecraft_worldguard.region_cuboid RC
ON R.id = RC.region_id
AND R.world_id = RC.world_id
LEFT JOIN minecraft_srvr.lot_version LV
ON R.id = LV.lot
WHERE
R.world_id = 10
I also removed "region_cuboid.world_id = 10" from your WHERE clause, as it is redundant given the JOIN condition on region AND world.
As for index suggestions, and if I have the proper alias references to the columns, I would suggest a covering index on the region table of (world_id, id). The "world_id" in the first position quickly qualifies the WHERE clause, and the "id" is there for the RC and LV tables.
For the region_cuboid table, I would also have an index on (world_id, region_id) to match the region table being joined to it.
For the lot_version table, an index on (lot), or a covering index on (lot, version, mint_version).
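Spelled out as DDL, those suggestions might look like this (a sketch; index names are mine):
CREATE INDEX idx_region_world_id ON minecraft_worldguard.region (world_id, id);
CREATE INDEX idx_cuboid_world_region ON minecraft_worldguard.region_cuboid (world_id, region_id);
CREATE INDEX idx_lot_covering ON minecraft_srvr.lot_version (lot, version, mint_version);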
explain
select
*
from
zipcode_distances z
inner join
venues v
on z.zipcode_to=v.zipcode
inner join
events e
on v.id=e.venue_id
where
z.zipcode_from='92108' and
z.distance <= 5
I'm trying to find all "events at venues within 5 miles of zipcode 92108", however, I am having a hard time optimizing this query.
Here is what the explain looks like:
id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, SIMPLE, e, ALL, idx_venue_id, , , , 60024,
1, SIMPLE, v, eq_ref, PRIMARY,idx_zipcode, PRIMARY, 4, comedyworld.e.venue_id, 1,
1, SIMPLE, z, ref, idx_zip_from_distance,idx_zip_to_distance,idx_zip_from_to, idx_zip_from_to, 30, const,comedyworld.v.zipcode, 1, Using where; Using index
I'm getting a full table scan on the "e" table, and I can't figure out what index I need to create to get it to be fast.
Any advice would be appreciated
Thank you
Based on the EXPLAIN output in your question, you already have all the indexes the query should be using, namely:
CREATE INDEX idx_zip_from_distance
ON zipcode_distances (zipcode_from, distance, zipcode_to);
CREATE INDEX idx_zipcode ON venues (zipcode, id);
CREATE INDEX idx_venue_id ON events (venue_id);
(I'm not sure from your index names whether idx_zip_from_distance really includes the zipcode_to column. If not, you should add it to make it a covering index. Also, I've included the venues.id column in idx_zipcode for completeness, but, assuming it's the primary key for the table and that you're using InnoDB, it will be included automatically anyway.)
However, it looks like MySQL is choosing a different, and possibly suboptimal, query plan, where it scans through all events, finds their venues and zip codes, and only then filters the results on distance. This could be the optimal query plan, if the cardinality of the events table was low enough, but from the fact that you're asking this question I assume it's not.
One reason for the suboptimal query plan could be the fact that you have too many indexes which are confusing the planner. For instance, do you really need all three of those indexes on the zipcode table, given that the data it stores is presumably symmetric? Personally, I'd suggest only the index I described above, plus a unique index (which can also be the primary key, if you don't have an artificial one) on (zipcode_to, zipcode_from) (preferably in that order, so that any occasional queries on zipcode_to=? can make use of it).
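As a sketch of that suggestion (assuming the table has no artificial primary key that would conflict):
ALTER TABLE zipcode_distances
ADD UNIQUE KEY uq_zip_to_from (zipcode_to, zipcode_from);  -- or make this the PRIMARY KEY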
However, based on some testing I did, I suspect the main issue why MySQL is choosing the wrong query plan comes simply down to the relative cardinalities of your tables. Presumably, your actual zipcode_distances table is huge, and MySQL isn't smart enough to realize quite how much the conditions in the WHERE clause really narrow it down.
If so, the best and simplest fix may be to simply force MySQL to use the indexes you want:
select
*
from
zipcode_distances z
FORCE INDEX (idx_zip_from_distance)
inner join
venues v
FORCE INDEX (idx_zipcode)
on z.zipcode_to=v.zipcode
inner join
events e
FORCE INDEX (idx_venue_id)
on v.id=e.venue_id
where
z.zipcode_from='92108' and
z.distance <= 5
With that query, you should indeed get the desired query plan. (You do need FORCE INDEX here, since with just USE INDEX the query planner could still decide to use a table scan instead of the suggested index, defeating the purpose. I had this happen when I first tested this.)
Ps. Here's a demo on SQLize, both with and without FORCE INDEX, demonstrating the issue.
Have you indexed the columns in both tables, v.id and e.venue_id?
If you have not, create indexes in both tables. If you already have, it could be that you have few records in one or more tables and the analyzer detects that it is more efficient to perform a full scan rather than an indexed read.
You could use a subquery:
select * from zipcode_distances z, venues v, events e
where
z.id in (select zd.id from zipcode_distances zd where zd.zipcode_from='92108' and zd.distance <= 5)
and z.zipcode_to=v.zipcode
and v.id=e.venue_id
You are selecting all columns from all tables (select *), so there is little point in the optimizer using an index when the query engine will then have to do a lookup from the index back to the table for every single row.