Can a query only use one index per table? - mysql

I have a query like this:
( SELECT * FROM mytable WHERE author_id = ? AND seen IS NULL )
UNION
( SELECT * FROM mytable WHERE author_id = ? AND date_time > ? )
Also I have these two indexes:
(author_id, seen)
(author_id, date_time)
I read somewhere:
A query can generally only use one index per table when processing the WHERE clause
As you can see, my query has two separate WHERE clauses. So I want to know: does "only one index per table" mean that my query can use just one of those two indexes, or can it use one index for each subquery, so that both indexes are useful?
In other words, is this sentence true?
"Only one of those indexes will ever be used, and the other one is useless"

That statement about only using one index is no longer true of MySQL. For instance, it implements the index merge optimization, which can take advantage of two indexes for some WHERE clauses that contain OR. Here is a description in the documentation.
You should try this form of your query and see if it uses index merge:
SELECT *
FROM mytable
WHERE author_id = ? AND (seen IS NULL OR date_time > ? );
This should be more efficient than the union version, because it does not incur the overhead of removing duplicates.
Also, depending on the distribution of your data, the above query with an index on mytable(author_id, date_time, seen) might work as well or better than your version.
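As a quick sanity check that the OR form returns the same rows as the UNION form, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only compares result sets and says nothing about which plan MySQL would pick. The id column, the index names and the sample rows are assumptions made up for the illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL purely to compare result sets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        id        INTEGER PRIMARY KEY,  -- hypothetical key column
        author_id INTEGER NOT NULL,
        seen      INTEGER,              -- NULL means "not seen yet"
        date_time TEXT NOT NULL
    );
    CREATE INDEX idx_author_seen ON mytable (author_id, seen);
    CREATE INDEX idx_author_date ON mytable (author_id, date_time);
""")
rows = [
    (1, 7, None, "2016-01-01 10:00:00"),  # unseen, old
    (2, 7, 1,    "2016-03-01 10:00:00"),  # seen, recent
    (3, 7, None, "2016-03-02 10:00:00"),  # unseen and recent: matches both branches
    (4, 7, 1,    "2016-01-02 10:00:00"),  # seen, old: matches neither branch
    (5, 8, None, "2016-03-03 10:00:00"),  # different author
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

author, since = 7, "2016-02-01 00:00:00"

union_sql = """
    SELECT * FROM mytable WHERE author_id = ? AND seen IS NULL
    UNION
    SELECT * FROM mytable WHERE author_id = ? AND date_time > ?
"""
or_sql = """
    SELECT * FROM mytable
    WHERE author_id = ? AND (seen IS NULL OR date_time > ?)
"""

# Collect the ids returned by each form and compare them.
union_ids = sorted(r[0] for r in conn.execute(union_sql, (author, author, since)))
or_ids = sorted(r[0] for r in conn.execute(or_sql, (author, since)))
print(union_ids, or_ids)
```

On MySQL itself, running EXPLAIN on both forms is the way to see whether index merge is actually chosen.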

UNION combines the results of subqueries. Each subquery is executed independently of the others, and then the results are merged. So in this case the WHERE conditions are applied to each subquery and not to the combined result.
In answer to your question: yes, each subquery can use some index.

There are cases when the database engine can use more than one index for a single SELECT statement; however, when filtering one set of rows it is generally not possible. If you want to use indexing on two columns, build one index on both columns instead of two separate indexes.

Every subquery, or part of a composite query, is itself a query and can be evaluated as a single query with respect to performance and index access. You can also force the use of a different index for each query. In your case you are using UNION, so these are two separate queries whose results are combined into one.
For a brief guide to how MySQL uses indexes, see:
http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html

Related

SQL gets slow on a simple query with ORDER BY

I have a problem with MySQL ORDER BY: it slows down my query and I really don't know why. My query was a little more complex, so I simplified it to a light query with no joins, but it still runs really slowly.
Query:
SELECT
W.`oid`
FROM
`z_web_dok` AS W
WHERE
W.`sent_eRacun` = 1 AND W.`status` IN(8, 9) AND W.`Drzava` = 'BiH'
ORDER BY W.`oid` ASC
LIMIT 0, 10
The table has 946,566 rows and takes up 500 MB. The fields I am selecting are all indexed as follows:
oid - INT PRIMARY KEY AUTOINCREMENT
status - INT INDEXED
sent_eRacun - TINYINT INDEXED
Drzava - VARCHAR(3) INDEXED
I am posting screenshots of the EXPLAIN output first:
Next is the query as executed against the database:
And this is the speed after I remove ORDER BY.
I have also tried sorting by a DATETIME field, which is also indexed, but I get the same slow query as when ordering by the primary key. This started today; until now it was always fast and light.
What can cause something like this?
The kind of query you use here calls for a composite covering index. This one should handle your query very well.
CREATE INDEX someName ON z_web_dok (Drzava, sent_eRacun, status, oid);
Why does this work? You're looking for equality matches on the first three columns, and sorting on the fourth column. The query planner will use this index to satisfy the entire query. It can random-access the index to find the first row matching your query, then scan through the index in order to get the rows it needs.
Pro tip: Indexes on single columns are generally harmful to performance unless they happen to match the requirements of particular queries in your application, or are used for primary or foreign keys. You generally choose your indexes to match your most active, or your slowest, queries.
Edit: You asked whether it's better to create specific indexes for each query in your application. The answer is yes.
There may be an even faster way. (Or it may not be any faster.)
The IN(8, 9) gets in the way of easily handling the WHERE..ORDER BY..LIMIT completely efficiently. The possible solution is to treat that as OR, then convert to UNION and do some tricks with the LIMIT, especially if you might also be using OFFSET.
( SELECT ... WHERE .. = 8 AND ... ORDER BY oid LIMIT 10 )
UNION ALL
( SELECT ... WHERE .. = 9 AND ... ORDER BY oid LIMIT 10 )
ORDER BY oid LIMIT 10
This will allow the covering index described by OJones to be fully used in each of the subqueries. Furthermore, each will provide up to 10 rows without any temp table or filesort. Then the outer part will sort up to 20 rows and deliver the 'correct' 10.
For OFFSET, see http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or
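To check that the UNION ALL rewrite delivers the same 10 rows as the IN(8, 9) query, here is a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, with synthetic data invented for the illustration. SQLite does not accept parenthesized SELECTs with their own ORDER BY and LIMIT inside a UNION, so each branch is wrapped in a derived table; that wrapping is an adaptation for SQLite and not part of the MySQL query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE z_web_dok (
        oid         INTEGER PRIMARY KEY,
        status      INTEGER,
        sent_eRacun INTEGER,
        Drzava      TEXT
    )
""")
conn.execute(
    "CREATE INDEX someName ON z_web_dok (Drzava, sent_eRacun, status, oid)"
)

# Synthetic rows (assumption): status cycles through 7, 8, 9 and every
# fifth row belongs to a different country.
data = [(oid, 7 + oid % 3, 1, "BiH" if oid % 5 else "HR") for oid in range(1, 201)]
conn.executemany("INSERT INTO z_web_dok VALUES (?, ?, ?, ?)", data)

in_sql = """
    SELECT oid FROM z_web_dok
    WHERE sent_eRacun = 1 AND status IN (8, 9) AND Drzava = 'BiH'
    ORDER BY oid LIMIT 10
"""

# Each branch fetches at most 10 rows for one status value;
# the outer ORDER BY ... LIMIT picks the final 10 out of at most 20.
union_sql = """
    SELECT oid FROM (
        SELECT oid FROM z_web_dok
        WHERE sent_eRacun = 1 AND status = 8 AND Drzava = 'BiH'
        ORDER BY oid LIMIT 10
    )
    UNION ALL
    SELECT oid FROM (
        SELECT oid FROM z_web_dok
        WHERE sent_eRacun = 1 AND status = 9 AND Drzava = 'BiH'
        ORDER BY oid LIMIT 10
    )
    ORDER BY oid LIMIT 10
"""

in_ids = [r[0] for r in conn.execute(in_sql)]
union_ids = [r[0] for r in conn.execute(union_sql)]
print(in_ids)
print(union_ids)
```

The inner LIMIT must be at least as large as the outer one, otherwise a branch could cut off rows that belong in the final result.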

Optimizing a "distinct where equals" query and indices

I'm trying to optimize a query that looks something like
SELECT DISTINCT(some_attribute)
FROM some_table
WHERE soft_deleted=0
I already have indices on some_attribute and soft_deleted individually.
The table I am pulling from is relatively large (>100GB), so this query can take tens of minutes. Would a multi-column index on some_attribute and soft_deleted make a significant impact, or are there some other optimizations that I can make?
We are going to assume this table uses the InnoDB storage engine, that the soft_deleted column has an integer-ish datatype, and that some_attribute is a smallish column.
For the exact query text shown in the question, the optimal execution plan will likely make use of an index with soft_deleted and some_attribute as the leading columns, in that order, i.e.
... ON some_table (soft_deleted, some_attribute, ...)
The index will also contain the columns of the clustered index (even if they aren't listed), so we could also name those columns explicitly in the index, following the two leading columns. MySQL will also be able to make use of an index that includes additional columns, again following the two leading columns.
Use EXPLAIN to see the execution plan.
I expect the optimal execution plan will include "Using index for group-by" in the Extra column, and avoid a "Using filesort" operation.
With the index suggested above, compare the execution plan for this query:
SELECT t.some_attribute
FROM some_table t
WHERE t.soft_deleted = 0
GROUP
BY t.soft_deleted
, t.some_attribute
ORDER
BY NULL
If we already have an index defined with some_attribute as the leading column, and also including the soft_deleted column, e.g.
... ON some_table (some_attribute, soft_deleted, ... )
(an index on just the some_attribute column would be redundant, and could be dropped)
we might re-write the SQL and check the EXPLAIN output for a query like this:
SELECT t.some_attribute
FROM some_table t
GROUP
BY t.some_attribute
, IF(t.soft_deleted = 0,1,0)
HAVING t.soft_deleted = 0
ORDER
BY NULL
If we have a guarantee that soft_deleted only has two distinct values, then we could simplify to just
SELECT t.some_attribute
FROM some_table t
GROUP
BY t.some_attribute
, t.soft_deleted
HAVING t.soft_deleted = 0
ORDER
BY NULL
Optimal performance of a query against this table, to return the specified resultset, is likely going to be found in an execution plan that avoids a "Using filesort" operation and uses an index to satisfy the DISTINCT/GROUP BY operation.
Note that DISTINCT is a keyword not a function. The parens around some_attribute have no effect, and can be omitted. (Including the spurious parens almost makes it look like we think DISTINCT is a function.)
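Before comparing execution plans, it is worth confirming that the GROUP BY rewrites return the same values as the DISTINCT query. The sketch below does that with Python's sqlite3 module as a stand-in for MySQL. The id column, the index name and the sample rows are assumptions for the illustration, and ORDER BY NULL is left out because it is a MySQL-specific hint that does not affect which rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE some_table (
        id             INTEGER PRIMARY KEY,  -- hypothetical key column
        some_attribute TEXT,
        soft_deleted   INTEGER
    )
""")
conn.execute(
    "CREATE INDEX idx_deleted_attr ON some_table (soft_deleted, some_attribute)"
)
conn.executemany(
    "INSERT INTO some_table (some_attribute, soft_deleted) VALUES (?, ?)",
    [("a", 0), ("a", 0), ("b", 0), ("b", 1), ("c", 1), ("d", 0), ("d", 1)],
)

distinct_sql = """
    SELECT DISTINCT some_attribute FROM some_table WHERE soft_deleted = 0
"""
# Rewrite 1: filter first, then group on (soft_deleted, some_attribute).
group_where_sql = """
    SELECT t.some_attribute FROM some_table t
    WHERE t.soft_deleted = 0
    GROUP BY t.soft_deleted, t.some_attribute
"""
# Rewrite 2: group on (some_attribute, soft_deleted), then filter groups.
group_having_sql = """
    SELECT t.some_attribute FROM some_table t
    GROUP BY t.some_attribute, t.soft_deleted
    HAVING t.soft_deleted = 0
"""

def values(sql):
    # Sort in Python so the comparison does not depend on row order.
    return sorted(r[0] for r in conn.execute(sql))

distinct_vals = values(distinct_sql)
group_where_vals = values(group_where_sql)
group_having_vals = values(group_having_sql)
print(distinct_vals, group_where_vals, group_having_vals)
```

Which of the three is fastest on a 100GB table depends on the available index and the data distribution, so EXPLAIN on the real table remains the deciding step.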

Index when using OR in query

What is the best way to create index when I have a query like this?
... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...
I know that only one index can be used in a query so I can't create two indexes, one for user_1 and one for user_2.
Also could solution for this type of query be used for this query?
WHERE ((user_1 = '$user_id' AND user_2 = '$friend_id') OR (user_1 = '$friend_id' AND user_2 = '$user_id'))
MySQL has a hard time with OR conditions. In theory, there's an index merge optimization that @duskwuff mentions, but in practice, it doesn't kick in when you think it should. Besides, it doesn't perform as well as a single index when it does.
The solution most people use to work around this is to split up the query:
SELECT ... WHERE user_1 = ?
UNION
SELECT ... WHERE user_2 = ?
That way each query will be able to use its own choice for index, without relying on the unreliable index merge feature.
Your second query is optimizable more simply. It's just a tuple comparison. It can be written this way:
WHERE (user_1, user_2) IN (('$user_id', '$friend_id'), ('$friend_id', '$user_id'))
In old versions of MySQL, tuple comparisons would not use an index, but since 5.7.3, it will (see https://dev.mysql.com/doc/refman/5.7/en/row-constructor-optimization.html).
P.S.: Don't interpolate application code variables directly into your SQL expressions. Use query parameters instead.
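A minimal sketch of the UNION split with query parameters instead of interpolated variables, using Python's sqlite3 module as a stand-in for MySQL. The table name friendships, its id column, the index names and the sample rows are assumptions for the illustration. The row where both columns match shows why UNION rather than UNION ALL is needed to get the same result as the OR query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendships (            -- hypothetical table name
        id     INTEGER PRIMARY KEY,
        user_1 INTEGER NOT NULL,
        user_2 INTEGER NOT NULL
    );
    CREATE INDEX idx_user_1 ON friendships (user_1);
    CREATE INDEX idx_user_2 ON friendships (user_2);
""")
conn.executemany(
    "INSERT INTO friendships VALUES (?, ?, ?)",
    [(1, 10, 20), (2, 30, 10), (3, 10, 10), (4, 20, 30)],
)

user_id = 10  # bound as a parameter, never pasted into the SQL text

or_sql = "SELECT * FROM friendships WHERE user_1 = ? OR user_2 = ?"
union_sql = """
    SELECT * FROM friendships WHERE user_1 = ?
    UNION
    SELECT * FROM friendships WHERE user_2 = ?
"""
union_all_sql = union_sql.replace("UNION", "UNION ALL")

or_ids = sorted(r[0] for r in conn.execute(or_sql, (user_id, user_id)))
union_ids = sorted(r[0] for r in conn.execute(union_sql, (user_id, user_id)))
union_all_ids = sorted(r[0] for r in conn.execute(union_all_sql, (user_id, user_id)))

print(or_ids)         # rows matched by the OR form
print(union_ids)      # same rows, duplicates removed by UNION
print(union_all_ids)  # row 3 appears twice because it matches both branches
```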
I know that only one index can be used in a query…
This is incorrect. Under the right circumstances, MySQL will routinely use multiple indexes in a query. (For example, a query JOINing multiple tables will almost always use at least one index on each table involved.)
In the case of your first query, MySQL will use an index merge union optimization. If both columns are indexed, the EXPLAIN output will give an explanation along the lines of:
Using union(index_on_user_1,index_on_user_2); Using where
The query shown in your second example is covered by an index on (user_1, user_2). Create that index if you plan on running those queries routinely.
The two cases are different.
In the first case, both columns need to be searched for the same value. If you have a two-column index (u1,u2), it may be used for column u1 but cannot be used for column u2. If you have two separate indexes on u1 and u2, probably both of them will be used. The choice is driven by statistics on how many rows are expected to be returned. If few rows are expected, an index seek will be selected, provided an appropriate index is available. If the number is high, a scan is preferable, either of the table or of an index.
In the second case both columns need to be checked again, but within each search there are two sub-searches, where the second sub-search operates on the results of the first one because of the AND condition. Here it matters more, and two indexes on u1 and u2 will help, since whichever field is searched first will have an index. The choice to use an index is made as I describe above.
In either case, however, every OR will force one more search or set of searches. So the proposed solution of splitting the query with UNION does no extra harm: the table will be searched x times whether you write one SELECT with OR(s) or x SELECTs with UNION, regardless of index selection and type of search (seek or scan). As a result, since each SELECT in the UNION gets its own part of the execution plan, it is more likely that (single-column) indexes will be used to collect the row sets from all parts around the OR(s). If you do not want to copy a large SELECT statement into many UNION branches, you can fetch the primary key values first and then select those rows, or use a view so that the majority of the statement lives in one place.
Finally, if you exclude the UNION option, there is a way to trick the optimizer into using a single index. Create a double index u1,u2 (or u2,u1 - whichever column has higher cardinality goes first) and modify your statement so all OR parts use all columns:
... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...
will be converted to:
... WHERE ((user_1 = '$user_id' and user_2=user_2) OR (user_1=user_1 and user_2 = '$user_id')) ...
This way a double index (u1,u2) will be used at all times. Please note that this will not work if the columns are nullable, because user_2 = user_2 evaluates to NULL rather than true when user_2 is NULL, and working around this with ISNULL or COALESCE may cause the index not to be selected. It will work with ANSI nulls off, however.
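The effect of NULLs on the rewritten predicate can be checked directly. The sketch below uses Python's sqlite3 module as a stand-in (standard SQL NULL comparison semantics apply in both SQLite and MySQL). The table name pairs, its id column and the sample rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pairs (                  -- hypothetical table name
        id     INTEGER PRIMARY KEY,
        user_1 INTEGER,                   -- nullable on purpose
        user_2 INTEGER                    -- nullable on purpose
    )
""")
conn.execute("CREATE INDEX idx_u1_u2 ON pairs (user_1, user_2)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?, ?)",
    [(1, 10, 20), (2, 10, None), (3, None, 10), (4, 30, 40)],
)

user_id = 10

plain_sql = "SELECT id FROM pairs WHERE user_1 = ? OR user_2 = ?"
# The rewritten form: "col = col" is NULL (not true) when col is NULL,
# so rows 2 and 3 no longer satisfy either side of the OR.
rewritten_sql = """
    SELECT id FROM pairs
    WHERE (user_1 = ? AND user_2 = user_2)
       OR (user_1 = user_1 AND user_2 = ?)
"""

plain_ids = sorted(r[0] for r in conn.execute(plain_sql, (user_id, user_id)))
rewritten_ids = sorted(r[0] for r in conn.execute(rewritten_sql, (user_id, user_id)))
print(plain_ids)      # all rows where either column equals 10
print(rewritten_ids)  # rows with a NULL in the other column are lost
```

With both columns declared NOT NULL the two forms return the same rows.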

Math calculations in MySql WHERE

Can math calculations be done in the WHERE portion of a MySQL statement?
For example, lets say I have the following SQL statement:
SELECT
employee_id,
max_hours,
sum(hours) AS total_hours
FROM
some_table
WHERE
total_hours < (max_hours * 1.5)
I looked around and found that MySQL does have math functions, but all the examples are in the SELECT portion of the statement.
You can use any (supported) arithmetic you like in a WHERE or JOIN clause, as long as the final result is a boolean: true, false, or NULL (where NULL is treated as false).
This will usually mean indexes cannot be used, as their structure only allows their use for direct equality, inequality, or range lookups. In the example you gave there is no useful index you could define, so the query runner would be forced to perform a table scan. For simple filtering clauses referring to one table, an index will only get used if one side is a constant (or a variable that is constant for the run time of the query).
With joining clauses an index might be used for one side of the match, if that side is a direct column reference (i.e. no arithmetic), though if the join is likely to cover many rows a scan may still be used, as an index (or even table) scan can be quicker than a great many index seeks.
You might try something like this...
SELECT
employee_id,
max_hours,
SUM(hours)
FROM
some_table
GROUP BY
employee_id,
max_hours
HAVING
SUM(hours) < (max_hours * 1.5)
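Here is a small runnable sketch of the HAVING approach, using Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented for the illustration, and max_hours is listed in GROUP BY on the assumption that it is constant for each employee. The point is that the aggregate is filtered in HAVING, after grouping, because WHERE is evaluated before SUM(hours) exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE some_table (
        employee_id INTEGER,
        max_hours   INTEGER,
        hours       INTEGER
    )
""")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?)",
    [
        (1, 40, 30), (1, 40, 20),  # total 50, limit 40 * 1.5 = 60 -> kept
        (2, 40, 40), (2, 40, 30),  # total 70, limit 60            -> dropped
        (3, 10, 5),                # total 5,  limit 15            -> kept
    ],
)

having_sql = """
    SELECT employee_id, max_hours, SUM(hours) AS total_hours
    FROM some_table
    GROUP BY employee_id, max_hours
    HAVING SUM(hours) < (max_hours * 1.5)
    ORDER BY employee_id
"""
result = conn.execute(having_sql).fetchall()
print(result)
```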

Mysql: combine these two indexes into one?

I have the following query:
SELECT * FROM items
WHERE collection_id = 10
ORDER BY item_order ASC,id DESC
LIMIT 25
Right now I have two indexes, one on collection_id,id and another on collection_id,item_order.
item_order can be null if the user has not specified an order for the items, in which case I want them sorted by id.
Is my index setup optimal, or is there a way to have one three-column index that handles sorting by both id and item_order? It seems redundant to index the "collection_id" column twice.
The optimal index for this query is (collection_id, item_order, id).
MySQL will generally use only one index per table per query (the index merge optimization is the main exception), and it looks for matching indexes by order of columns in the query. The easiest way to determine what an index should look like for this query is by looking at the WHERE conditions followed by the ORDER BY conditions.
When in doubt, use EXPLAIN liberally and make sure it's not unnecessarily creating temporary tables or using filesort.
Using EXPLAIN before a select statement will tell you which of your indexes it is using. The official documentation is here:
MySQL 5: Using EXPLAIN
A good tutorial is here:
Optimizing MySQL Queries and Indexes
For the query above, the ideal index will be along the lines of (collection_id, item_order, id).
Indexing the same column multiple times is by no means a waste of time - so long as you don't end up with two identical indexes, or indexes which are never used.