I have a MySQL table which I'm trying to search a pair of columns for a single value. It's quite a large table, so I want the search time as fast as possible.
I have simplified the tables below for ease of understanding
SELECT * FROM clients WHERE name=? OR sirname=?
VS
SELECT * FROM clients WHERE ? IN (name, sirname)
with indexes on name and sirname
EXPLAIN on the former uses the indexes, but not on the latter
Is this accurate, or is there some optimisation going on under the hood which EXPLAIN doesn't catch?
Strongly related to Checking multiple columns for one value, but cannot discuss there due to age of thread.
Because MySQL generally uses one index per table reference in a query, you will probably have to do it this way:
SELECT * FROM clients WHERE name=?
UNION
SELECT * FROM clients WHERE sirname=?
This will count as two table references for purposes of index selection. The appropriate index will be used in each case.
Related
I just read part of an optimization article and segfaulted on the following statement:
When using SQL replace statements using OR with a UNION:
select username from users where company = ‘bbc’ or company = ‘itv’;
to:
select username from users where company = ‘bbc’ union
select username from users where company = ‘itv’;
From a quick EXPLAIN:
Using OR:
Using UNION:
Doesn't this mean UNION does in double the work?
While I appreciate UNION may be more performant for certain RDBMSes and certain table schemas, this is not categorically true as the author suggestions.
Question
Am I wrong?
Either the article you read used a bad example, or you misinterpreted their point.
select username from users where company = 'bbc' or company = 'itv';
This is equivalent to:
select username from users where company IN ('bbc', 'itv');
MySQL can use an index on company for this query just fine. There's no need to do any UNION.
The more tricky case is where you have an OR condition that involves two different columns.
select username from users where company = 'bbc' or city = 'London';
Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.
The UNION solution is for this type of case.
select username from users where company = 'bbc'
union
select username from users where city = 'London';
Now each sub-query can use the index for its search, and the results of the subquery are combined by the UNION.
An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.
My response is that that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.
The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
Those are not the same query.
I don't have much experience with MySQL, so I am not sure what the query optimizer does or does not do, but here are my thoughts from my general background (primarily ms sql server).
Typically, the query analyzer can take the above two queries and make the exact same plan out of them (if they were the same), so it wouldn't matter. I would suspect that there is no performance difference between these queries (which are equivalent)
select distinct username from users where company = ‘bbc’ or company = ‘itv’;
and
select username from users where company = ‘bbc’
union
select username from users where company = ‘itv’;
Now, the question is, would there be a difference between the following queries, of which I actually don't know, but I would suspect that the optimizer would make it more like the first query
select username from users where company = ‘bbc’ or company = ‘itv’;
and
select username from users where company = ‘bbc’
union all
select username from users where company = ‘itv’;
It depends on what the optimizer ends up doing based on the size of the data, indexes, software version, etc.
I would guess that using OR would give the optimizer a better chance at finding some efficiencies, since everything is in a single logical statement.
Also, UNION has some overhead, since it creates a reset set (no duplicates).
Each statement in the UNION should execute pretty quickly if company is indexed... not sure it'd really be doing double the work.
Bottom line
Unless you really have a burning need to squeeze every bit of speed out of your query, it's probably better to just go with the form that best communicates your intention... the OR
Update
I also meant to mention IN. I believe the following query will give better performance than the OR (it's also the form I prefer):
select username from users where company in ('bbc', 'itv');
This my benchmark result
When use UNION - Query took 13.8699 seconds
row examined primary select type - 247685
when use OR - Query took 0.0126 seconds and row examined primary
select type - 495371
MySQL uses one index for a query, so when we are using or then mysql use one column index and scan full table for another column
another part union same work can 2 times
that's why or is faster then union
In almost all cases, the union or union all version is going to do two full table scans of the users table.
The or version is much better in practice, since it will only scan the table once. It will also use an index only once, if available.
The original statement just seems wrong, for just about any database and any situation.
Bill Karwin's answer is pretty right. When the both part of the OR statement has its own index, it's better doing union because once you have a small subset of results, it's easier to sort them and eliminate duplicates. Total cost is almost less than using only one index (for one of the column) and table scan for the other column (because mysql only uses one index for one column).
It depends of the table's structure and needs generally but in large tables union gave to me better results.
What is the best way to create index when I have a query like this?
... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...
I know that only one index can be used in a query so I can't create two indexes, one for user_1 and one for user_2.
Also could solution for this type of query be used for this query?
WHERE ((user_1 = '$user_id' AND user_2 = '$friend_id') OR (user_1 = '$friend_id' AND user_2 = '$user_id'))
MySQL has a hard time with OR conditions. In theory, there's an index merge optimization that #duskwuff mentions, but in practice, it doesn't kick in when you think it should. Besides, it doesn't give as performance as a single index when it does.
The solution most people use to work around this is to split up the query:
SELECT ... WHERE user_1 = ?
UNION
SELECT ... WHERE user_2 = ?
That way each query will be able to use its own choice for index, without relying on the unreliable index merge feature.
Your second query is optimizable more simply. It's just a tuple comparison. It can be written this way:
WHERE (user_1, user_2) IN (('$user_id', '$friend_id'), ('$friend_id', '$user_id'))
In old versions of MySQL, tuple comparisons would not use an index, but since 5.7.3, it will (see https://dev.mysql.com/doc/refman/5.7/en/row-constructor-optimization.html).
P.S.: Don't interpolate application code variables directly into your SQL expressions. Use query parameters instead.
I know that only one index can be used in a query…
This is incorrect. Under the right circumstances, MySQL will routinely use multiple indexes in a query. (For example, a query JOINing multiple tables will almost always use at least one index on each table involved.)
In the case of your first query, MySQL will use an index merge union optimization. If both columns are indexed, the EXPLAIN output will give an explanation along the lines of:
Using union(index_on_user_1,index_on_user_2); Using where
The query shown in your second example is covered by an index on (user_1, user_2). Create that index if you plan on running those queries routinely.
The two cases are different.
At the first case both columns needs to be searched for the same value. If you have a two column index (u1,u2) then it may be used at the column u1 as it cannot be used at column u2. If you have two indexes separate for u1 and u2 probably both of them will be used. The choice comes from statistics based on how many rows are expected to be returned. If returned rows expected few an index seek will be selected, if the appropriate index is available. If the number is high a scan is preferable, either table or index.
At the second case again both columns need to be checked again, but within each search there are two sub-searches where the second sub-search will be upon the results of the first one, due to the AND condition. Here it matters more and two indexes u1 and u2 will help as any field chosen to be searched first will have an index. The choice to use an index is like i describe above.
In either case however every OR will force 1 more search or set of searches. So the proposed solution of breaking using union does not hinder more as the table will be searched x times no matter 1 select with OR(s) or x selects with union and no matter index selection and type of search (seek or scan). As a result, since each select at the union get its own execution plan part, it is more likely that (single column) indexes will be used and finally get all row result sets from all parts around the OR(s). If you do not want to copy a large select statement to many unions you may get the primary key values and then select those or use a view to be sure the majority of the statement is in one place.
Finally, if you exclude the union option, there is a way to trick the optimizer to use a single index. Create a double index u1,u2 (or u2,u1 - whatever column has higher cardinality goes first) and modify your statement so all OR parts use all columns:
... WHERE (user_1 = '$user_id' OR user_2 = '$user_id') ...
will be converted to:
... WHERE ((user_1 = '$user_id' and user_2=user_2) OR (user_1=user_1 and user_2 = '$user_id')) ...
This way a double index (u1,u2) will be used at all times. Please not that this will work if columns are nullable and bypassing this with isnull or coalesce may cause index not to be selected. It will work with ansi nulls off however.
In my script, I have a lot of SQL WHERE clauses, e.g.:
SELECT * FROM cars WHERE active=1 AND model='A3';
SELECT * FROM cars WHERE active=1 AND year=2017;
SELECT * FROM cars WHERE active=1 AND brand='BMW';
I am using different SQL clauses on same table because I need different data.
I would like to set index key on table cars, but I am not sure how to do it. Should I set separate keys for each column (active, model, year, brand) or should I set keys for groups (active,model and active,year and active,brand)?
WHERE a=1 AND y='m'
is best handled by INDEX(a,y) in either order. The optimal set of indexes is several pairs like that. However, I do not recommend having more than a few indexes. Try to limit it to queries that users actually make.
INDEX(a,b,c,d):
WHERE a=1 AND b=22 -- Index is useful
WHERE a=1 AND d=44 -- Index is less useful
Only the "left column(s)" of an index are used. Hence the second case, uses a, but stops because b is not in the WHERE.
You might be tempted to also have (active, year, model). That combination works well for active AND year, active AND year AND model, but not active AND model (but no year).
More on creating indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Since model implies a make, there is little use to put both of those in the same composite index.
year is not very selective, and users might want a range of years. These make it difficult to get an effective index on year.
How many rows will you have? If it is millions, we need to work harder to avoid performance problems. I'm leaning toward this, but only because the lookup would be more compact.
We use single indexing when we want to query for just one column, same asin your case and multiple group indexing when we have multiple condition in the same where clause.
Go for single indexing.
For more detailed explanation, refer this article: https://www.sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/
I have the following question. I have two tables and I want to store them separately. However, in my application I need to regularly perform UNION operation on them, so I effectively need to treat them as one. I think this is not good in terms of performance. I was thinking about creating the following view:
CREATE VIEW merged_tables(fields from both tables) AS
SELECT * FROM table_a
UNION
SELECT * FROM table_b
How does it impact on the performance of SELECT query? Does it have any real impact (inner representation is different) or it is just a matter of using a simpler query to select?
Using a UNION inside a view will be no different in performance than using the UNION in a discrete SELECT statement. Since you are using SELECT * on both tables and require no inner complexity (e.g. no JOINS or WHERE clauses), there won't really be any way to further optimize it.
However, if the tables do hold similar data and you want to keep them logically separated, you might consider storing it all in one table with a boolean column that indicates whether a row would otherwise have been a resident of table_a or table_b. You gain a way to tell the rows apart and avoid the added confusion of the UNION then, and performance isn't significantly impacted either.
It is just a matter of using a simpler query to select. There will be no speed difference, however the cost of querying the union should not be much worse (in most cases) than if you kept all the data in a single table.
If the tables are really the same structure, you could also consider another design in which you used a single table for storage and two views to logically separate the records:
CREATE VIEW table_a AS SELECT * FROM table_all WHERE rec_type = 'A'
CREATE VIEW table_b AS SELECT * FROM table_all WHERE rec_type = 'B'
Because these are single-table, non-aggregated VIEWs you can use them like tables in INSERT, UPDATE, DELETE, and SELECT but you have the advantage of also being able to program against table_all when it makes sense. The advantage here over your solution is that you can update against either the table_a, table_b entities or the table_all entity, and you don't have two physical tables to maintain.
I'm deploying a Rails application that aggregates coupon data from various third-party providers into a searchable database. Searches are conducted across four fields for each coupon: headline, coupon code, description, and expiration date.
Because some of these third-party providers do a rather bad job of keeping their data sorted, and because I don't want duplicate coupons to creep into my database, I've implemented a unique compound index across those four columns. That prevents the same coupon from being inserted into my database more than once.
Given that I'm searching against these columns (via simple WHERE column LIKE %whatever% matching for the time being), I want these columns to each individually benefit from the speed gains to be had by indexing them.
So here's my question: will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column? Or will it only guarantee uniqueness among the rows?
Complicating the matter somewhat is that I'm developing in Rails, so my question pertains both to SQLite3 and MySQL (and whatever we might port over to in the future), rather than one specific RDBMS.
My guess is that the indexes will speed up searching across individual columns, but I really don't have enough "under the hood" database expertise to feel confident in that judgement.
Thanks for lending your expertise.
will the compound index across all
columns provide the same searching
speed gains as if I had applied an
individual index to each column?
Nope. The order of the columns in the index is very important. Lets suppose you have an index like this: create unique index index_name on table_name (headline, coupon_code, description,expiration_date)
In this case these queries will use the index
select * from table_name where headline = 1
select * from table_name where headline = 1 and cupon_code = 2
and these queries wont use the unique index:
select * from table_name where coupon_code = 1
select * from table_name where description = 1 and cupon_code = 2
So the rule is something like this. When you have multiple fields indexed together, then you have to specify the first k field to be able to use the index.
So if you want to be able to search for any one of these fields then you should create on index on each of them separately (besides the combined unique index)
Also, be careful with the LIKE operator.
this will use index SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
and this will not SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
index usage http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
multiple column index http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html