MySQL index column order for join

I have two tables (requests, results):
requests:
email
results:
email, processed_at
I now want to get all results that have a request with the same email and that have not been processed:
SELECT * FROM results
INNER JOIN requests ON requests.email = results.email
AND results.processed_at IS NULL
I have an index on each individual column, but the query is very slow. So I assume I need a multi-column index on results.
I am just not sure which order the columns have to be in:
ALTER TABLE results
ADD INDEX results_email_processed_at (email,processed_at)
ALGORITHM=INPLACE LOCK=NONE;
or
ALTER TABLE results
ADD INDEX results_processed_at_email (processed_at,email)
ALGORITHM=INPLACE LOCK=NONE;

Either composite index will be equally beneficial.
However, if you are fetching 40% of the table, then the Optimizer may choose to ignore any index and simply scan the table.
Is that SELECT the actual query? If not, please show us the actual query; a number of seemingly minor changes could make a big difference in optimization options.
Please provide EXPLAIN SELECT ... so we can see what it thinks with the current index(es). And please provide SHOW CREATE TABLE in case there are datatype issues that are relevant.
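For reference, the diagnostics requested above could be gathered like this. It is only a sketch against the table names in the question; the output depends on your data and MySQL version.

```sql
-- Show which index (if any) the optimizer picks for the current query.
EXPLAIN
SELECT * FROM results
INNER JOIN requests ON requests.email = results.email
AND results.processed_at IS NULL;

-- Show column datatypes, collations and existing indexes for both tables.
SHOW CREATE TABLE results;
SHOW CREATE TABLE requests;
```

One thing worth checking in the SHOW CREATE TABLE output is whether the two email columns share the same datatype and collation, since a mismatch can prevent an index from being used for the join.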

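Notwithstanding any indexing issues, check the join type first. An INNER JOIN returns only rows that match on both sides. If you wanted requests that have no result row at all, a NULL check on the results side would never qualify,
and you would need a LEFT JOIN to the results table. Since processed_at is a column of results that can itself be NULL, the query as written does return the unprocessed results that have a matching request.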
As for the index: since the join is on email, I would make email the leading column of the index. Including the processed_at column as well is faster, because MySQL does not have to go to the raw data page to qualify the rows. So order the index as (email, processed_at): email is the first qualifier, and processed_at comes along for the ride to complete the fields the join and filter need.
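A minimal sketch of the (email, processed_at) index described above, keeping the INNER JOIN from the question. The SELECT list is narrowed to the two indexed columns purely to illustrate the covering case; with SELECT * the row data still has to be read.

```sql
-- Index name taken from the question; email leads, processed_at follows.
ALTER TABLE results
  ADD INDEX results_email_processed_at (email, processed_at);

-- If the index covers the query, the Extra column of EXPLAIN
-- reports "Using index" for the results table.
EXPLAIN
SELECT results.email, results.processed_at
FROM results
INNER JOIN requests ON requests.email = results.email
WHERE results.processed_at IS NULL;
```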

Related

Should i create any indexes here to optimize my query?

Now I'm trying to figure out what I should do to improve my query's execution time.
Currently, it's 47.55.
So, should I create indexes on any columns? Please tell me.
SELECT bw.workloadId, lrer.lecturerId, lrer.lastname, lrer.name, lrer.fathername, bt.title, ac.activityname, cast(bw.exactday as char(45)) as "date", bw.exacttime as "time" FROM base_workload as bw
right join unioncourse as uc on uc.idunioncourse = bw.idunioncourse
right join basecoursea as bc on bc.idbasecoursea = uc.idbasecourse
right join lecturer as lrer on lrer.lecturerId = uc.lecturerId
right join basetitle as bt on bt.idbasetitle = bc.idbasetitle
right join activity as ac on ac.activityId = bc.activityId
where lrer.lecturerId is not null AND bc.idbasecoursea is not null and bw.idunioncourse != ""
ORDER BY bw.exactday, bw.exacttime ASC;
From MySQL 8.0 documentation:
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.
MySQL uses indexes for these operations:
To find the rows matching a WHERE clause quickly.
To eliminate rows from consideration.
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows.
To retrieve rows from other tables when performing joins.
To find the MIN() or MAX() value for a specific indexed column key_col.
To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable index (for example, ORDER BY key_part1, key_part2).
In some cases, a query can be optimized to retrieve values without consulting the data rows.
For your requirements, you could add an index on the columns used in the WHERE clause for faster data retrieval.
I think you can get rid of
lrer.lecturerId is not null
AND bc.idbasecoursea is not null
By changing the first 3 RIGHT JOINs to JOINs.
What is the datatype of exactday? What is the purpose of
cast(bw.exactday as char(45)) as "date"
The CAST may be unnecessary.
Re bw.exactday, bw.exacttime: It is usually better to use a single column for DATETIME instead of two columns (DATE and TIME).
What are the PRIMARY KEYs of the tables?
Please convert to LEFT JOIN if possible; I can't wrap my head around RIGHT JOINs.
This index on bw may help: INDEX(exactday, exacttime).
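Putting the suggestions above together, the query might look like the sketch below. It assumes the filter on bw.idunioncourse stays in the WHERE clause: that filter already discards every row in which bw is NULL, so all five joins behave as inner joins and the two IS NOT NULL checks become redundant. The index name is hypothetical, and the CAST is kept because the datatype of exactday is not known.

```sql
-- Hypothetical index name; the columns are those of the ORDER BY.
ALTER TABLE base_workload
  ADD INDEX idx_exactday_exacttime (exactday, exacttime);

SELECT bw.workloadId, lrer.lecturerId, lrer.lastname, lrer.name, lrer.fathername,
       bt.title, ac.activityname,
       CAST(bw.exactday AS CHAR(45)) AS `date`, bw.exacttime AS `time`
FROM base_workload AS bw
JOIN unioncourse AS uc   ON uc.idunioncourse = bw.idunioncourse
JOIN basecoursea AS bc   ON bc.idbasecoursea = uc.idbasecourse
JOIN lecturer    AS lrer ON lrer.lecturerId  = uc.lecturerId
JOIN basetitle   AS bt   ON bt.idbasetitle   = bc.idbasetitle
JOIN activity    AS ac   ON ac.activityId    = bc.activityId
WHERE bw.idunioncourse != ''
ORDER BY bw.exactday, bw.exacttime;
```

Running EXPLAIN on both versions is the way to confirm whether the plan actually changes on your data.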

What index should I use when using JOIN on PRIMARY KEY

I'm trying to optimise the following MySQL query
SELECT Hotel.HotelId, Hotel.Name, Hotel.Enabled, Hotel.IsClosed,
HotelRoom.HotelId, HotelRoom.RoomId, HotelRoom.Name AS RoomName
FROM Hotel
INNER JOIN
HotelRoom ON Hotel.HotelId = HotelRoom.HotelId
WHERE Hotel.IsClosed = 0
AND Hotel.Enabled = 1
AND HotelRoom.Deleted = 0
AND HotelRoom.Enabled = 1
AND IF(LENGTH(TRIM(sAuxiliaryIds)) > 0 AND sAuxiliaryIds IS NOT NULL,
FIND_IN_SET(Hotel.AuxiliaryId, sAuxiliaryIds), 1=1) > 0
ORDER BY Hotel.HotelId ASC, HotelRoom.RoomId ASC
The PRIMARY KEYs are Hotel.HotelId and HotelRoom.RoomId, and I've got a FOREIGN KEY from HotelRoom.HotelId to Hotel.HotelId.
Should I be creating an INDEX for (Hotel.IsClosed, Hotel.Enabled) and (HotelRoom.Deleted, HotelRoom.Enabled), which are used in the WHERE clause? And should this index include the PRIMARY KEY, so that, for example, I would create an INDEX for (Hotel.HotelId, Hotel.IsClosed, Hotel.Enabled)?
EDIT 1
I've added the following to the WHERE clause: AND IF(LENGTH(TRIM(sAuxiliaryIds)) > 0 AND sAuxiliaryIds IS NOT NULL, FIND_IN_SET(Hotel.AuxiliaryId, sAuxiliaryIds), 1=1) > 0. Should these columns also be included in the INDEX?
This is what the EXPLAIN statement is showing for this query
I added both INDEX suggestions but when I ran the EXPLAIN statement they both showed that no key was going to be used
There are two potential indexing strategies here, depending on which of the two tables appears on the left side of the inner join (either table could potentially appear on either side of the join). Given that the HotelRoom table likely contains many more records than the Hotel table, I would suggest placing the Hotel table on the left side of the join. This would imply that the Hotel table would be scanned, and the index used for the join to HotelRoom. Then, we can try using the following index on HotelRoom:
CREATE INDEX hotel_room_idx ON HotelRoom (HotelId, Deleted, Enabled, Name, RoomId);
This should speed up the join substantially, covers the WHERE clause, and also covers all columns in the select on HotelRoom. Note that the following simplified index might also be very effective:
CREATE INDEX hotel_room_idx ON HotelRoom (HotelId, Deleted, Enabled);
This just covers the join and WHERE clause, but MySQL might still choose to use it.
MySQL's Optimizer does not care which table comes first in a JOIN. It will look at statistics (etc) to decide for itself whether to start with Hotel or HotelRoom. So, you should write indexes for both cases, so as not to restrict the Optimizer.
MySQL almost always performs a JOIN by scanning one table and then, for each row in that table, looking up the necessary row(s) in the other table. See "Nested Loop Join" or "NLJ". This implies that the optimal indexes are (often) thus: for the 'first' table, the columns of the WHERE clause involving the first table; for the second table, the columns from both the WHERE and ON clauses involving the second table.
Assuming that the Optimizer started with Hotel:
Hotel: INDEX(IsClosed, Enabled) -- in either order
HotelRoom: INDEX(Deleted, Enabled, HotelId) -- in any order
If it started with HotelRoom:
HotelRoom: INDEX(Deleted, Enabled) -- in either order
Hotel: PRIMARY KEY(HotelId) -- which you already have?
If there are a lot of closed/disabled hotels, then this may be beneficial:
Hotel: INDEX(IsClosed, Enabled, HotelId)
As Tim mentioned, it may be beneficial to augment an index to include the rest of the columns mentioned, thereby making the index "covering". (But don't do this with the PRIMARY KEY or any UNIQUE key.)
If you provide SHOW CREATE TABLE and the sizes of the tables, we might have further suggestions.
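As a sketch, the indexes suggested above could be created as follows. The index names are hypothetical. Note that (Deleted, Enabled) is a leftmost prefix of (Deleted, Enabled, HotelId), so one HotelRoom index serves both starting orders. The EXPLAIN uses a simplified form of the query without the sAuxiliaryIds condition.

```sql
-- Used when the Optimizer starts with Hotel.
CREATE INDEX idx_hotel_closed_enabled ON Hotel (IsClosed, Enabled);

-- Serves both cases on HotelRoom: the full index for lookups by HotelId,
-- and its (Deleted, Enabled) prefix when HotelRoom is read first.
CREATE INDEX idx_room_deleted_enabled_hotel ON HotelRoom (Deleted, Enabled, HotelId);

-- Check which table the Optimizer reads first and which keys it picks.
EXPLAIN
SELECT Hotel.HotelId, Hotel.Name, Hotel.Enabled, Hotel.IsClosed,
       HotelRoom.HotelId, HotelRoom.RoomId, HotelRoom.Name AS RoomName
FROM Hotel
INNER JOIN HotelRoom ON Hotel.HotelId = HotelRoom.HotelId
WHERE Hotel.IsClosed = 0
  AND Hotel.Enabled = 1
  AND HotelRoom.Deleted = 0
  AND HotelRoom.Enabled = 1
ORDER BY Hotel.HotelId ASC, HotelRoom.RoomId ASC;
```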

Index when using OR in query

What is the best way to create index when I have a query like this?
... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...
I know that only one index can be used in a query so I can't create two indexes, one for user_1 and one for user_2.
Also could solution for this type of query be used for this query?
WHERE ((user_1 = '$user_id' AND user_2 = '$friend_id') OR (user_1 = '$friend_id' AND user_2 = '$user_id'))
MySQL has a hard time with OR conditions. In theory, there's an index merge optimization that @duskwuff mentions, but in practice, it doesn't kick in when you think it should. Besides, it doesn't perform as well as a single index when it does.
The solution most people use to work around this is to split up the query:
SELECT ... WHERE user_1 = ?
UNION
SELECT ... WHERE user_2 = ?
That way each query will be able to use its own choice for index, without relying on the unreliable index merge feature.
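Spelled out, the split query might look like this. The table name friendship and the value 42 are hypothetical, since the question does not show them. The sketch uses a server-side prepared statement so that the user id is passed as a parameter rather than interpolated into the SQL text.

```sql
-- `friendship` is a hypothetical table with columns user_1 and user_2.
SET @user_id = 42;  -- illustrative value

PREPARE find_friends FROM
  'SELECT user_1, user_2 FROM friendship WHERE user_1 = ?
   UNION
   SELECT user_1, user_2 FROM friendship WHERE user_2 = ?';

-- The same id is bound to both placeholders.
EXECUTE find_friends USING @user_id, @user_id;
DEALLOCATE PREPARE find_friends;
```

UNION removes duplicates, so a row where both columns equal the user id is returned once; UNION ALL would return it twice.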
Your second query is optimizable more simply. It's just a tuple comparison. It can be written this way:
WHERE (user_1, user_2) IN (('$user_id', '$friend_id'), ('$friend_id', '$user_id'))
In old versions of MySQL, tuple comparisons would not use an index, but since 5.7.3, it will (see https://dev.mysql.com/doc/refman/5.7/en/row-constructor-optimization.html).
P.S.: Don't interpolate application code variables directly into your SQL expressions. Use query parameters instead.
I know that only one index can be used in a query…
This is incorrect. Under the right circumstances, MySQL will routinely use multiple indexes in a query. (For example, a query JOINing multiple tables will almost always use at least one index on each table involved.)
In the case of your first query, MySQL will use an index merge union optimization. If both columns are indexed, the EXPLAIN output will give an explanation along the lines of:
Using union(index_on_user_1,index_on_user_2); Using where
The query shown in your second example is covered by an index on (user_1, user_2). Create that index if you plan on running those queries routinely.
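A sketch of both setups, again using a hypothetical table name friendship and illustrative ids. The single-column index names match the ones in the EXPLAIN output quoted above. Whether the optimizer actually picks the index merge depends on its statistics.

```sql
-- One index per column, so the OR query can use an index merge union.
CREATE INDEX index_on_user_1 ON friendship (user_1);
CREATE INDEX index_on_user_2 ON friendship (user_2);

EXPLAIN
SELECT * FROM friendship
WHERE user_1 = 42 OR user_2 = 42;

-- Composite index for the pair lookup in the second query.
CREATE INDEX index_on_user_1_user_2 ON friendship (user_1, user_2);

EXPLAIN
SELECT * FROM friendship
WHERE (user_1, user_2) IN ((42, 7), (7, 42));
```

Once the composite index exists, index_on_user_1 is a leftmost prefix of it and could be dropped.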
The two cases are different.
In the first case, both columns need to be searched for the same value. If you have a two-column index (u1,u2), it may be used for column u1 but cannot be used for column u2. If you have two separate indexes on u1 and u2, probably both of them will be used. The choice comes from statistics on how many rows are expected to be returned. If few rows are expected, an index seek will be selected, provided the appropriate index is available. If the number is high, a scan is preferable, either of the table or of an index.
In the second case, both columns again need to be checked, but within each search there are two sub-searches, where the second sub-search runs on the results of the first one, due to the AND condition. Here it matters more, and two indexes on u1 and u2 will help, as whichever field is searched first will have an index. The choice to use an index is made as I describe above.
In either case, however, every OR forces one more search or set of searches. So the proposed solution of splitting the query with UNION costs nothing extra, as the table will be searched x times whether you write one SELECT with ORs or x SELECTs with UNION, regardless of index selection and type of search (seek or scan). Since each SELECT in the UNION gets its own part of the execution plan, it is more likely that the (single-column) indexes will be used, and the result sets from all parts of the OR are then combined. If you do not want to copy a large SELECT statement into many UNION branches, you can fetch the primary key values first and then select those rows, or use a view so that the bulk of the statement lives in one place.
Finally, if you exclude the union option, there is a way to trick the optimizer to use a single index. Create a double index u1,u2 (or u2,u1 - whatever column has higher cardinality goes first) and modify your statement so all OR parts use all columns:
... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...
will be converted to:
... WHERE ((user_1 = '$user_id' and user_2=user_2) OR (user_1=user_1 and user_2 = '$user_id')) ...
This way a double index (u1,u2) will be used at all times. Please note that this will not work if the columns are nullable, and working around that with ISNULL or COALESCE may cause the index not to be selected. It will work with ANSI nulls off, however.

Optimize query through the order of columns in index

I have a table that holds a domain and an id.
The query is:
select distinct domain
from user
where id = '1'
With the index in the order idx_domain_id, the query is faster than with idx_id_domain.
If the order of execution is
(FROM clause, WHERE clause, GROUP BY clause, HAVING clause, SELECT
clause, ORDER BY clause)
then the query should be faster when the index leads with the WHERE column rather than the SELECT column.
From 15:00 to 17:00, this video shows the same query I am working on:
https://serversforhackers.com/laravel-perf/mysql-indexing-three
The table has 4.6 million rows.
Time using idx_domain_id
Time after changing the order
This is your query:
select distinct first_name
from user
where id = '1';
You are observing that user(first_name, id) is faster than user(id, first_name).
Why might this be the case? First, this could simply be an artifact of how you are doing the timing. If your table is really small (i.e. the data fits on a single data page), then indexes are generally not very useful for improving performance.
Second, if you are only running the queries once, then the first time you run the query, you might have a "cold cache". The second time, the data is already stored in memory, so it runs faster.
Other issues can come up as well. You don't specify what the timings are. Small differences can be due to noise and might be meaningless.
You don't provide enough information to give a more definitive explanation. That would include:
Repeated timings run on cold caches.
Size information on the table and the number of matching rows.
Layout information, particularly the type of id.
Explain plans for the two queries.
select distinct domain
from user
where id = '1'
Since id is the PRIMARY KEY, there is at most one row involved. Hence, the keyword DISTINCT is useless.
And the most useful index is what you already have, PRIMARY KEY(id). It will drill down the BTree to find id='1' and deliver the value of domain that is sitting right there.
On the other hand, consider
select distinct domain
from user
where something_else = '1'
Now, the obvious index is INDEX(something_else, domain). This is optimal for the WHERE clause, and it is "covering" (meaning that all the columns needed by the query exist in the index). Swapping the columns in the index will be slower. Meanwhile, since there could be multiple rows, DISTINCT means something. However, it is not the logical thing to use.
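A sketch of that index, using the placeholder column something_else from the example above. The index name is hypothetical.

```sql
-- The "=" column from the WHERE clause goes first; domain follows
-- so that the index covers the query.
ALTER TABLE `user`
  ADD INDEX idx_something_else_domain (something_else, domain);

-- With a covering index, the Extra column of EXPLAIN reports "Using index".
EXPLAIN
SELECT DISTINCT domain
FROM `user`
WHERE something_else = '1';
```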
Concerning your title question (order of columns): The = columns in the WHERE clause should come first. (More details in the link below.)
DISTINCT means to gather all the rows, then de-duplicate them. If all the matching rows share the same domain, why go to that much effort when this gives the same answer:
select domain
from user
where something_else = '1'
LIMIT 1
This hits only one row, not all the 1s.
Read my Indexing Cookbook.
(And, yes, Gordon has a lot of good points.)

Need some clarification on indexes (WHERE, JOIN)

We are facing performance issues in some reports that work on millions of rows. I tried optimizing the SQL queries, but that only cuts the execution time in half.
The next step is to analyse and modify or add some indexes, so I have some questions:
1- The SQL queries contain a lot of joins: do I have to create an index for each foreign key?
2- Imagine the query SELECT * FROM A LEFT JOIN B on a.b_id = b.id where a.attribute2 = 'someValue', with an index on table A on (b_id, attribute2): does my query use this index for the WHERE part? (I know that if both conditions were in the WHERE clause, the index would be used.)
3- If an index is based on columns C1, C2 and C3, and I decide to add an index based on C2, do I need to remove C2 from the first index?
Thanks for your time
You can use EXPLAIN on a query to see what MySQL will do when executing it. This helps a LOT when trying to figure out why it's slow.
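For the query in question 2, that would look like the sketch below. The aliases are added only so that the a. and b. prefixes resolve regardless of how the server treats table-name case.

```sql
-- The "table" column of the output shows the join order,
-- and the "key" column shows the index chosen for each table.
EXPLAIN
SELECT *
FROM A AS a
LEFT JOIN B AS b ON a.b_id = b.id
WHERE a.attribute2 = 'someValue';
```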
JOIN-ing happens one table at a time, and the order is determined by MySQL analyzing the query and trying to find the fastest order. You will see it in the EXPLAIN result.
Only one index can be used per JOIN and it has to be on the table being joined. In your example the index used will be the id (primary key) on table B. Creating an index on every FK will give MySQL more options for the query plan, which may help in some cases.
There is only a difference between WHERE and JOIN conditions when there are NULL (missing rows) for the joined table (there is no difference at all for INNER JOIN). For your example the index on b_id does nothing. If you change it to an INNER JOIN (e.g. by adding b.something = 42 in the where clause), then it might be used if MySQL determines that it should do the query in reverse (first b, then a).
No. It is 100% OK to have a column in multiple indexes. If you have an index on (A,B,C) and you add another one on (A), that will be redundant and pointless (because it is a prefix of another index). An index on B is perfectly fine.
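To make the prefix rule concrete, here is a small sketch with the column names from question 3. The table name t, the id column, and the index names are hypothetical.

```sql
CREATE TABLE t (
  id INT PRIMARY KEY,
  C1 INT,
  C2 INT,
  C3 INT,
  INDEX idx_c1_c2_c3 (C1, C2, C3),  -- usable for C1, (C1,C2), (C1,C2,C3)
  INDEX idx_c2 (C2)                 -- not a prefix of the above, so not redundant
);

-- A lookup on C2 alone cannot seek into idx_c1_c2_c3, but it can use idx_c2.
EXPLAIN SELECT id FROM t WHERE C2 = 5;

-- An additional index on (C1) alone would be redundant,
-- because idx_c1_c2_c3 already starts with C1.
```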