How do I create one MySQL index for 2 SQL queries?

SELECT * FROM messages_messages WHERE (from_user_id=? AND to_user_id=?) OR (from_user_id=? AND to_user_id=?) ORDER BY created_at DESC
I have another query, which is this:
SELECT COUNT(*) FROM messages_messages WHERE from_user_id=? AND to_user_id=? AND read_at IS NULL
I want to index both of these queries, but I don't want to create 2 separate indexes.
Right now, I'm using 2 indexes:
[from_user_id, to_user_id, created_at]
[from_user_id, to_user_id, read_at]
I was wondering if I could do this with one index instead of 2?
These are the only 2 queries I have for this table.

The docs explain fairly completely how MySQL uses indices. In particular, its optimizer can use any left prefix of a multi-column index. Therefore, you could drop either of your two existing indices, and the other would be eligible for use in both queries, though it would be more selective / useful for one than for the other.
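A toy model of the left-prefix rule may help. Each index is modeled as a sorted list of key tuples, which supports the same "seek, then scan forward" access as a B-tree. The rows, user ids and timestamps are hypothetical, and NULL is sorted first, as MySQL does in ascending order. Both indexes serve the conversation query through the (from_user_id, to_user_id) prefix. Only the read_at index lets the unread count seek straight to the NULL entries, though. The created_at index has to check read_at on every message in the conversation.

```python
from bisect import bisect_left

def nkey(value):
    """Sort key that puts NULL (None) before any value, as MySQL does in ascending order."""
    return (0,) if value is None else (1, value)

# Hypothetical rows: row_id -> (from_user_id, to_user_id, created_at, read_at)
ROWS = {
    1: (7, 9, 100, 150),
    2: (7, 9, 110, None),
    3: (7, 9, 120, 130),
    4: (7, 9, 140, None),
    5: (9, 7, 105, None),
    6: (3, 9, 101, None),
}

# Each entry ends with the row id, the way InnoDB secondary indexes carry the primary key.
INDEX_CREATED = sorted((f, t, c, rid) for rid, (f, t, c, r) in ROWS.items())     # (from, to, created_at)
INDEX_READ = sorted((f, t, nkey(r), rid) for rid, (f, t, c, r) in ROWS.items())  # (from, to, read_at)

def prefix_scan(index, prefix):
    """Seek to the first entry starting with `prefix`, then scan forward while it matches."""
    pos = bisect_left(index, prefix)
    out = []
    while pos < len(index) and index[pos][: len(prefix)] == prefix:
        out.append(index[pos])
        pos += 1
    return out

def conversation(index, a, b):
    """Query 1: one left-prefix scan per direction, then sort by created_at DESC."""
    entries = prefix_scan(index, (a, b)) + prefix_scan(index, (b, a))
    return sorted((e[-1] for e in entries), key=lambda rid: ROWS[rid][2], reverse=True)

def count_unread(index, a, b, read_at_indexed):
    """Query 2: returns (unread count, index entries examined)."""
    if read_at_indexed:
        entries = prefix_scan(index, (a, b, nkey(None)))  # seek straight to the NULLs
        return len(entries), len(entries)
    entries = prefix_scan(index, (a, b))  # prefix only, then check read_at row by row
    unread = [e for e in entries if ROWS[e[-1]][3] is None]
    return len(unread), len(entries)

print(conversation(INDEX_CREATED, 7, 9), conversation(INDEX_READ, 7, 9))  # [4, 3, 2, 5, 1] twice
print(count_unread(INDEX_CREATED, 7, 9, False))  # (2, 4)
print(count_unread(INDEX_READ, 7, 9, True))      # (2, 2)
```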
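In principle, it could be more beneficial to keep your first index, provided that the created_at column was indexed in descending order. In practice, MySQL versions before 8.0 accept a DESC specification for an index column but ignore it and store the column in ascending order. MySQL 8.0 does implement descending indexes, and MySQL can also read an ascending index backward. More importantly, the OR condition matches two separate (from_user_id, to_user_id) ranges, so their combined rows still have to be sorted. Therefore, having created_at in your index probably doesn't help very much.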

No, you need both indexes for these two queries if you want to optimize fully.
Once you reach the column used for either sorting or range comparison (IS [NOT] NULL counts as a range predicate for this purpose), you don't get any benefit from putting more columns in the index. In other words, your index can have:
Some columns that are used in equality predicates
One column that is used either in a range predicate, or to avoid a filesort -- but not both.
Extra columns used in neither searching nor sorting, but only for the sake of a covering index.
So you cannot make a four-column index that serves both queries.
The only way you can reduce this to one index, as @JohnBollinger says, is to make an index that optimizes for one query and uses a subset of the index for the second query. But that index won't serve the second query as well.
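The same kind of toy model shows the conflict between the two possible four-column orders. Sorted tuples stand in for B-tree indexes, the rows are hypothetical and cover a single from/to direction, and NULL sorts first. Putting read_at before created_at breaks the created_at order that the first query needs. Putting created_at first means the second query cannot seek on read_at.

```python
from bisect import bisect_left

def nkey(value):
    """Sort key that puts NULL (None) before any value, as MySQL does in ascending order."""
    return (0,) if value is None else (1, value)

# Hypothetical rows for one direction: row_id -> (from_user_id, to_user_id, created_at, read_at)
ROWS = {1: (7, 9, 100, 150), 2: (7, 9, 110, None), 3: (7, 9, 120, 130), 4: (7, 9, 140, None)}
POS = {"from": 0, "to": 1, "created_at": 2, "read_at": 3}

def build(order):
    """Sorted index over the named columns; the row id is appended to every entry."""
    return sorted(
        tuple(nkey(row[POS[c]]) if c == "read_at" else row[POS[c]] for c in order) + (rid,)
        for rid, row in ROWS.items()
    )

def prefix_scan(index, prefix):
    """Seek to `prefix`, then return the matching entries in index order."""
    pos = bisect_left(index, prefix)
    out = []
    while pos < len(index) and index[pos][: len(prefix)] == prefix:
        out.append(index[pos])
        pos += 1
    return out

READ_FIRST = build(["from", "to", "read_at", "created_at"])
CREATED_FIRST = build(["from", "to", "created_at", "read_at"])

# Query 1 (one direction): after the (from, to) seek, is created_at already sorted?
created_via_read_first = [e[3] for e in prefix_scan(READ_FIRST, (7, 9))]
created_via_created_first = [e[2] for e in prefix_scan(CREATED_FIRST, (7, 9))]
print(created_via_read_first)     # [110, 140, 120, 100]: not sorted, so a filesort is needed
print(created_via_created_first)  # [100, 110, 120, 140]: already sorted

# Query 2: index entries examined to count unread messages
print(len(prefix_scan(READ_FIRST, (7, 9, nkey(None)))))  # 2: seeks straight to the NULL read_at entries
print(len(prefix_scan(CREATED_FIRST, (7, 9))))           # 4: read_at must be checked on every entry
```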

Related

How to optimize this SQL SELECT that uses ORDER BY

select id, col1,col2,col3,seq
from `table`
order by seq asc
I have already created an index on seq, but I found that MySQL doesn't use it and uses a filesort instead when selecting. Because col1 may store large data, I don't want to create a covering index on this table. Are there any other ways to optimize this SQL, the table, or the index? Thanks, my English is not good 😂😂😂
The SQL query optimizer apparently estimated the cost of using the index and concluded that it would be better to just do a table-scan and use a filesort of the result.
There is overhead to using a non-covering index. It reads the index in sorted order, but then has to dereference the primary key to get the other columns not covered by the index.
By analogy, this is like reading a book by looking up every word in alphabetical order in the index at the back, and then flipping to the respective pages of the book, one word at a time. It's time-consuming, but it's one way of reading the book in order by keyword.
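Here is a toy model of that overhead. It assumes InnoDB-style storage, where rows live in primary-key order. It also assumes 10 rows per data page (an arbitrary number) and a hypothetical seq column whose values are a scrambled permutation of the primary keys. The model counts how often consecutive row reads move to a different page. A real server caches pages in its buffer pool, so this illustrates access patterns, not actual disk I/O.

```python
ROWS_PER_PAGE = 10  # assumption: 10 rows per data page
N = 1000            # hypothetical table size

# Hypothetical data: seq is a scrambled permutation of the primary keys
# (multiplying by 37, which is coprime to 1000, maps 0..999 onto 0..999).
seq_of_pk = {pk: (pk * 37) % N for pk in range(N)}

def page_switches(pk_order):
    """Count how often the next row to read lives on a different data page."""
    pages = [pk // ROWS_PER_PAGE for pk in pk_order]
    return sum(1 for a, b in zip(pages, pages[1:]) if a != b)

table_scan = list(range(N))                                   # full scan in primary-key order
index_order = sorted(range(N), key=lambda pk: seq_of_pk[pk])  # rows fetched in INDEX(seq) order

print(page_switches(table_scan))   # 99: each page is read once, in sequence
print(page_switches(index_order))  # 999: every fetch jumps to another page
```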
That said, a filesort has overhead too. The query engine has to collect matching rows, and sort them manually, potentially using temporary files. This is expensive if the number of rows is large. You haven't described the size of your table in number of rows.
If the table you are testing has a small number of rows, the optimizer might have reasoned that it would be quick enough to do the filesort, so it would be unnecessary to read the rows by the index.
Testing with a larger table might give you different results from the optimizer.
The query optimizer makes the right choice in the majority of cases. But it's not infallible. If you think forcing it to use the index is better in this case, you can use the FORCE INDEX hint to make it believe that a table-scan is prohibitively expensive. Then if the index is usable at all, it'll prefer the index.
select id, col1,col2,col3,seq
from `table` FORCE INDEX(seq)
order by seq asc

How to order columns in a multi-column index for best performance in MySQL

Let's say I have a transactions table in a MySQL database, and I want to create a multi-column index on 3 columns: reference, kind and status.
I have this query that I am trying to speed up:
Transaction.where(parent_ref: merchant_ref, kind: 'OFFER', status: 1), which runs the following SQL:
SELECT `merchant_transactions`.* FROM `merchant_transactions`
WHERE `merchant_transactions`.`parent_ref` = '1-0001'
AND `merchant_transactions`.`kind` = 'BATCH_BET'
AND `merchant_transactions`.`status` = 1
The parent_ref column can take a really wide variety of values so if I have 1M records in that table I will have 500K different references. Status can only take 6 different values and kind only 3.
What would be the best order for the columns in my index for optimal performance?
Does the spread of values in my columns have an impact? Intuitively, I would say that I need to start with the column with the lowest spread of values. In that example I would thus do index(kind, status, reference).
Are there any other factors related to the values in my tables to take into account when figuring out the order of columns for my index?
Okay, now that you've shared the query, we can see that you reference all three columns in your WHERE clause, all three predicates are doing equality comparisons, and the expression in the WHERE clause uses only AND operations.
There are no more exotic parts of the query like JOIN, GROUP BY, ORDER BY, DISTINCT, etc. to complicate the optimization of this query.
Given these conditions, my experience is that the order of columns hardly matters. If there's any difference, it's barely perceptible.
I'd put the column that is unique first, based on some assumption that it's most selective and therefore narrows down the search most effectively. But I'm not sure it would make any noticeable difference either way.
In your example, each of the 3 columns is tested with =, and they are ANDed together. So build a 3-column composite index with those 3 columns. The order of the columns will not matter for this query. Contrary to what others may say, "cardinality" of the individual columns does not matter in a composite INDEX.
See my indexing cookbook
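Here is a toy check of that claim. Each possible column order is modeled as a sorted list of (parent_ref, kind, status) tuples over hypothetical data: 3000 rows, 500 references, 3 kinds and 6 statuses. The third kind, "REFUND", is made up. When all three columns are compared with =, every order seeks to the same contiguous run of matching entries and stops at the first non-match, so no order reads extra entries.

```python
import itertools
from bisect import bisect_left

KINDS = ["OFFER", "BATCH_BET", "REFUND"]  # "REFUND" is a hypothetical third kind
# Hypothetical rows: (id, parent_ref, kind, status)
ROWS = [(i, f"1-{i % 500:04d}", KINDS[i % 3], i % 6) for i in range(3000)]
COLS = {"parent_ref": 1, "kind": 2, "status": 3}

def build(order):
    """Composite index over `order`, with the row id appended to each entry."""
    return sorted(tuple(row[COLS[c]] for c in order) + (row[0],) for row in ROWS)

def lookup(index, key):
    """Seek to the full key; the scan stops at the first entry that no longer matches."""
    pos = bisect_left(index, key)
    ids = []
    while pos < len(index) and index[pos][: len(key)] == key:
        ids.append(index[pos][-1])
        pos += 1
    return tuple(sorted(ids))

target = {"parent_ref": "1-0001", "kind": "BATCH_BET", "status": 1}
results = {
    order: lookup(build(order), tuple(target[c] for c in order))
    for order in itertools.permutations(COLS)
}
print(set(results.values()))  # {(1, 1501)}: all 6 column orders return the same rows
```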

MySQL: required indexes for multi-column ordering

I have a table with 10+ million rows and three columns, one, two and three, and a query like SELECT * FROM table ORDER BY one, two, three LIMIT 1. Do I really need to create a multi-column index using all three columns?
I know for sure that if one and two match, there will be at most 10 rows with distinct values of three.
Is this enough for fast SELECTs?
CREATE INDEX MY_INDEX ON table (one, two);
With INDEX(one, two, three), the query will go straight down the BTree to the one (LIMIT 1) desired row.
With INDEX(one, two), the query will go straight down the BTree to the first such row, then scan forward the up-to-10 rows, save them to a tmp table, sort them (ORDER BY includes three) (probably done in memory), and deliver the first one. Although this sounds more complex it will not (in this example) be much slower.
It will not be a "table scan" ("ALL"), but perhaps a "range" scan. Use EXPLAIN SELECT ... to see.
If three is a bulky string, then the 3-col index will be bulkier; this has some impact on disk space and performance.
If you need only (one, two) for some other queries, then either index works reasonably well (barring the "bulky" comment).
If you do SELECT one, two, three FROM ..., the 3-part index will be better because it will be "covering". SELECT * won't have such a bonus.
Bottom line: Either index is "OK", many other factors factor in, making it hard to say for sure what to do.
You might think MySQL is clever enough to only read at most the first 10 rows using the index and then sort these. Unfortunately, it isn't (because the optimizer doesn't regard the limit at this point). You can verify that by using explain select ..., it will show that MySQL will do a full table scan ("ALL").
The documentation describes conditions to be able to use an index to optimize order by:
The index can also be used even if the ORDER BY does not match the index exactly, as long as all unused portions of the index and all extra ORDER BY columns are constants in the WHERE clause.
Your third column does not satisfy this, so this query will not use this index (which does not mean it can't be useful for something else).
Since MySQL 5.6, however, there is the so-called filesort priority queue optimization to accommodate the LIMIT: while MySQL will still read the whole table, it will not sort the whole table (which would be time-consuming). Instead, it keeps only the rows that can still be among the first LIMIT rows in a priority queue as it reads, which makes your query acceptably fast.
But you can rewrite your query to do exactly what you are thinking of:
SELECT * FROM
(select * from table ORDER BY one, two LIMIT 10) sub
order by one, two, three limit 1;
This will read the first 10 rows using that index, and then just sort these. It will of course only work correctly if you are absolutely sure you will only have at most 10 rows.
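You can convince yourself that the rewrite returns the same row with a quick sketch. It uses Python's built-in sqlite3 as a stand-in engine and random hypothetical data in which every (one, two) pair has at most 10 rows. It checks result equivalence only and says nothing about MySQL's execution plan. The table is named t because table is a reserved word in both SQLite and MySQL.

```python
import random
import sqlite3

DIRECT = "SELECT * FROM t ORDER BY one, two, three LIMIT 1"
REWRITE = """
    SELECT * FROM (SELECT * FROM t ORDER BY one, two LIMIT 10) sub
    ORDER BY one, two, three LIMIT 1
"""

def make_table(rng):
    """Hypothetical data: every (one, two) pair gets 1 to 10 rows with distinct `three`."""
    rows = []
    for one in range(3):
        for two in range(3):
            for three in rng.sample(range(100), rng.randint(1, 10)):
                rows.append((one, two, three))
    rng.shuffle(rows)  # insertion order should not matter
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (one INTEGER, two INTEGER, three INTEGER)")
    con.execute("CREATE INDEX my_index ON t (one, two)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return con

rng = random.Random(42)
for _ in range(50):
    con = make_table(rng)
    assert con.execute(DIRECT).fetchone() == con.execute(REWRITE).fetchone()
    con.close()
print("the rewrite matched the direct query on 50 random tables")
```

If some pair could ever hold 11 or more rows, the inner LIMIT 10 may cut off the row you want. That is exactly the implied condition this answer warns about.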
A more general way to optimize your query independently from knowing the maximum number of possible rows would be e.g.
SELECT * FROM table
where one = (select min(one) from table)
order by one, two, three limit 1;
This will use the index to reduce the number of rows that have to be read and filesorted by looking up the lowest value for one first (using the index) and only considering these rows. You can similarly include a condition for two.
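The same sqlite3 stand-in can check the MIN(one) rewrite. This time the hypothetical data has no cap on group size: 600 random rows spread over 36 (one, two) pairs, so many pairs hold more than 10 rows. That is the point of this version, since correctness no longer depends on knowing the maximum number of rows per group. Again, this checks results only, not MySQL's plan.

```python
import random
import sqlite3

DIRECT = "SELECT * FROM t ORDER BY one, two, three LIMIT 1"
REWRITE = """
    SELECT * FROM t
    WHERE one = (SELECT MIN(one) FROM t)
    ORDER BY one, two, three LIMIT 1
"""

def make_table(rng):
    """Hypothetical data: 600 random rows, so a single (one, two) pair can hold many rows."""
    rows = [(rng.randint(0, 5), rng.randint(0, 5), rng.randint(0, 1000)) for _ in range(600)]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (one INTEGER, two INTEGER, three INTEGER)")
    con.execute("CREATE INDEX my_index ON t (one, two)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return con

rng = random.Random(1)
for _ in range(50):
    con = make_table(rng)
    assert con.execute(DIRECT).fetchone() == con.execute(REWRITE).fetchone()
    con.close()
print("the MIN(one) rewrite matched the direct query on 50 random tables")
```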
Or you can simply use all three columns in your index (although depending on the size of your third column, it can make sense not to do this). These kinds of optimizations tend to catch up with you at some point. If you e.g. use the first method, and in 2 years there can be 11 rows, you (or your successor) will have to remember that you have this implied condition in your code.

(Why) Can't MySQL use index in such cases?

1 - PRIMARY used in a secondary index, e.g. secondary index on (PRIMARY,column1)
2 - I'm aware mysql cannot continue using the rest of an index as soon as one part was used for a range scan, however: IN (...,...,...) is not considered a range, is it? Yes, it is a range, but I've read on mysqlperformanceblog.com that IN behaves differently than BETWEEN according to the use of index.
Could anyone confirm those two points? Or tell me why this is not possible? Or how it could be possible?
UPDATE:
Links:
http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/
http://www.mysqlperformanceblog.com/2006/08/14/mysql-followup-on-union-for-query-optimization-query-profiling/comment-page-1/#comment-952521
UPDATE 2: example of nested SELECT:
SELECT * FROM user_d1 uo
WHERE EXISTS (
SELECT 1 FROM `user_d1` ui
WHERE ui.birthdate BETWEEN '1990-05-04' AND '1991-05-04'
AND ui.id=uo.id
)
ORDER BY uo.timestamp_lastonline DESC
LIMIT 20
So, the outer SELECT uses timestamp_lastonline for sorting, the inner either PK to connect with the outer or birthdate for filtering.
What other options rather than this query are there if MySQL cannot use index on a range scan and for sorting?
The column(s) of the primary key can certainly be used in a secondary index, but it's not often worthwhile. The primary key guarantees uniqueness, so once the primary key columns are matched there is at most one row, and any columns listed after them cannot narrow a lookup further. The only time it will help is when a query can be answered from the index alone (a covering index).
As for your nested select, the extra complication should not beat the simplest query:
SELECT * FROM user_d1 uo
WHERE uo.birthdate BETWEEN '1990-05-04' AND '1991-05-04'
ORDER BY uo.timestamp_lastonline DESC
LIMIT 20
MySQL will choose between a birthdate index or a timestamp_lastonline index based on which it feels will have the best chance of scanning fewer rows. In either case, the column should be the first one in the index. The birthdate index will also carry a sorting penalty, but might be worthwhile if a large number of recent users will have birth dates outside of that range.
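Here is a toy comparison of the two plans. It counts rows examined and ignores sort and I/O costs. The hypothetical users have a timestamp_lastonline equal to their id and a birth year of 1970 + id % 40, so only 1 in 20 falls in the 1990-1991 range. birthdate is simplified to a year. Walking the timestamp index newest-first needs no sort, but it has to skip every non-matching recent user. The birthdate range touches only matching rows, but it then pays for the sort.

```python
from bisect import bisect_left

# Hypothetical users: (id, timestamp_lastonline, birth_year); higher timestamp = more recent.
USERS = [(uid, uid, 1970 + uid % 40) for uid in range(1000)]
LOW, HIGH, LIMIT = 1990, 1991, 20

def via_timestamp_index():
    """Walk a timestamp_lastonline index newest-first, filtering on birth year."""
    examined, result = 0, []
    for uid, ts, year in sorted(USERS, key=lambda u: u[1], reverse=True):
        examined += 1
        if LOW <= year <= HIGH:
            result.append(uid)
            if len(result) == LIMIT:
                break
    return result, examined

def via_birthdate_index():
    """Range-scan a birth-year index, then sort the matches (the 'sorting penalty')."""
    by_year = sorted(USERS, key=lambda u: u[2])
    pos = bisect_left([u[2] for u in by_year], LOW)
    matches = []
    while pos < len(by_year) and by_year[pos][2] <= HIGH:
        matches.append(by_year[pos])
        pos += 1
    ranked = sorted(matches, key=lambda u: u[1], reverse=True)
    return [u[0] for u in ranked[:LIMIT]], len(matches)

print(via_timestamp_index()[1])  # 380 rows examined, no sort
print(via_birthdate_index()[1])  # 50 rows examined, plus a sort of those 50
```

With this data the birthdate plan wins. If most recent users fell inside the range, the timestamp plan would stop after about 20 rows and win instead.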
If you wish to control the order, or potentially improve performance, a (timestamp_lastonline, birthdate) or (birthdate, timestamp_lastonline) index might help. If it doesn't, and you really need to select based on the birthdate first, then you should select from the inner query instead of filtering on it:
SELECT * FROM (
SELECT * FROM user_d1 ui
WHERE ui.birthdate BETWEEN '1990-05-04' AND '1991-05-04'
) as uo
ORDER BY uo.timestamp_lastonline DESC
LIMIT 20
Even then, MySQL's optimizer might choose to rewrite your query if it finds a timestamp_lastonline index but no birthdate index.
And yes, IN (..., ..., ...) behaves differently than BETWEEN. Only the latter can effectively use a range scan over an index; the former would look up each item individually.
2. IN will obviously differ from BETWEEN. If you have an index on that column, BETWEEN only needs to find the starting point and then read forward to the end of the range. With IN, it looks up each listed value in the index separately, so it performs as many lookups as there are values, compared to BETWEEN's single lookup.
Yes, @Andrius_Naruševičius is right. The IN statement is merely shorthand for EQUALS OR EQUALS OR EQUALS and has no inherent order whatsoever, whereas BETWEEN is a comparison operator with an implicit greater-than and less-than, and therefore absolutely loves indexes.
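Here is a toy model of the difference these two answers describe. A sorted Python list stands in for a single-column index over hypothetical values (the even numbers 0 to 998), and each bisect stands for one descent from the root of the B-tree. IN performs one descent per listed value, while BETWEEN descends once and then reads forward. For a short IN list the extra descents are cheap, since each costs only O(log n).

```python
from bisect import bisect_left

INDEX = list(range(0, 1000, 2))  # hypothetical indexed values, already sorted

def lookup_in(index, values):
    """IN (...): one seek per distinct value; returns (seeks, matches)."""
    seeks, hits = 0, []
    for v in sorted(set(values)):
        seeks += 1
        pos = bisect_left(index, v)
        while pos < len(index) and index[pos] == v:
            hits.append(index[pos])
            pos += 1
    return seeks, hits

def lookup_between(index, low, high):
    """BETWEEN low AND high: one seek to `low`, then a forward scan up to `high`."""
    pos = bisect_left(index, low)
    hits = []
    while pos < len(index) and index[pos] <= high:
        hits.append(index[pos])
        pos += 1
    return 1, hits

print(lookup_in(INDEX, [10, 12, 14, 16]))  # (4, [10, 12, 14, 16])
print(lookup_between(INDEX, 10, 16))       # (1, [10, 12, 14, 16])
```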
I honestly have no idea what you are talking about, but it does seem you are asking a good question; I just have no notion what it is :-). Are you saying that a primary key cannot be part of a secondary index? Because it absolutely can. The primary key never needs an extra index because it is ALWAYS indexed automatically. So if you are getting an error or warning (I assume you are?) about supplementary indexes, it's not the second or third index causing it. It's the PRIMARY KEY not needing one, and mentioning it is probably the error. Having said that, I have no idea what question you asked; this is my answer to my best guess at your actual question.

MySQL: combine these two indexes into one?

I have the following query:
SELECT * FROM items
WHERE collection_id = 10
ORDER BY item_order ASC,id DESC
LIMIT 25
Right now I have two indexes, one on collection_id,id and another on collection_id,item_order.
item_order can be null if the user has not specified an order for the items, in which case I want them sorted by id.
Is my index setup optimal, or is there a way to have one three-column index that handles sorting by both item_order and id? It seems redundant to index the collection_id column twice.
The optimal index for this query is (collection_id,item_order,id).
MySQL will generally use only one index per table in a query, and it matches the index's columns in order against the query. The easiest way to determine what an index should look like for this query is to take the columns in the WHERE conditions, followed by the columns in the ORDER BY.
When in doubt, use EXPLAIN liberally and make sure it's not unnecessarily creating temporary tables or using filesort.
Using EXPLAIN before a select statement will tell you which of your indexes it is using. The official documentation is here:
MySQL 5: Using EXPLAIN
A good tutorial is here:
Optimizing MySQL Queries and Indexes
For the query above, the ideal index will be along the lines of (collection_id, item_order, id).
Indexing the same column multiple times is by no means a waste of time - so long as you don't end up with two identical indexes, or indexes which are never used.
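Here is a toy check of the (collection_id, item_order, id) order suggested above. It uses hypothetical rows, with NULL sorted first as MySQL does in ascending order. The ORDER BY mixes ASC and DESC, so an index can only deliver this order directly if id is a descending key part. MySQL supports descending key parts from 8.0 on. On older versions the mixed directions still force a filesort, so confirm the plan with EXPLAIN.

```python
# Hypothetical rows: (id, collection_id, item_order); None means no order was set.
ROWS = [(1, 10, None), (2, 10, 2), (3, 10, None), (4, 10, 1), (5, 10, 2), (6, 11, 1), (7, 10, None)]

def nkey(value):
    """Sort key that puts NULL (None) first, as MySQL does in ascending order."""
    return (0,) if value is None else (1, value)

def wanted(collection_id):
    """WHERE collection_id = ? ORDER BY item_order ASC, id DESC, computed by sorting."""
    rows = [r for r in ROWS if r[1] == collection_id]
    return [r[0] for r in sorted(rows, key=lambda r: (nkey(r[2]), -r[0]))]

def index_order(entry_key, collection_id):
    """Ids for one collection, in the order an index with this sort key would return them."""
    return [r[0] for r in sorted(ROWS, key=entry_key) if r[1] == collection_id]

# INDEX(collection_id, item_order, id DESC): matches the ORDER BY exactly.
good = index_order(lambda r: (r[1], nkey(r[2]), -r[0]), 10)
# INDEX(collection_id, id, item_order): returns rows in id order instead.
bad = index_order(lambda r: (r[1], r[0], nkey(r[2])), 10)

print(wanted(10))  # [7, 3, 1, 4, 5, 2]
print(good)        # [7, 3, 1, 4, 5, 2]
print(bad)         # [1, 2, 3, 4, 5, 7]
```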