I have a table with 3 columns. The table contains many rows (millions). When I select rows from the table I frequently use the following where clauses:
where column2=value1 and column3=value2
where column1=value
To speed up the select queries I want to declare column1 and column2 as indexes. My question is whether declaring the second column as an index will reduce the positive effect of declaring the first column as an index.
I would also like to ask whether declaring the second column as an index will speed up queries of this type: where column2=value1 and column3=value2.
ADDED
The column1, column2, and column3 are entity, attribute, value. It's very general. As entities I use person, movies, cities, countries and so on. Attributes are things like: "located in", "date of birth", "produced by".
You should create indexes that support your queries. In this case you want to create an index on column2,column3 together (not two separate indexes, but one index for the combination of columns) to support the first query, and another on column1 to support the second query. More generally, if a query uses a set of columns, adding an index for all those columns will speed it up (although there are many exceptions, of course).
An index on column2 alone would speed up the query column2=value1 and column3=value2, and so would an index on column2,column3 (the important thing is that column2 is the first column in the index).
When working with indexes the EXPLAIN keyword is very useful. Prefix your queries with EXPLAIN (e.g. EXPLAIN SELECT * FROM table) to get a description of how the database is going to perform your query. It will tell you if it's going to use an index, and in that case which.
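As a rough illustration of the advice above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The table name eav and the index names are made up for the example, and SQLite spells the command EXPLAIN QUERY PLAN; other engines have their own planners and output formats, so treat this only as a way to see the idea.

```python
import sqlite3

# In-memory database; the table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eav (column1 TEXT, column2 TEXT, column3 TEXT);
    CREATE INDEX idx_entity ON eav (column1);               -- supports: where column1=value
    CREATE INDEX idx_attr_value ON eav (column2, column3);  -- supports: where column2=... and column3=...
""")

def plan(sql, params):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

plan_attr_value = plan(
    "SELECT * FROM eav WHERE column2 = ? AND column3 = ?",
    ("located in", "France"),
)
plan_entity = plan("SELECT * FROM eav WHERE column1 = ?", ("Paris",))

print(plan_attr_value)  # names the index chosen for the two-column lookup
print(plan_entity)      # names the index chosen for the entity lookup
```

Each query should report a search on the index that was created for it, which is the thing to look for in any engine's EXPLAIN output.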
Seems like neither of your plans are going to work. Based on both of the where clauses I would suggest having the primary key on column1 and a second index column2,column3. This would speed up both of your queries.
I have a table with two fields: a,b
Both fields are indexed separately -- no compound index.
When I run a select query on both fields:
select * from table where a=<sth> and b=<sth>
it takes over 400ms, while
select * from table where a=<sth>
takes only 30ms.
Do I need to set a compound index for (a,b)?
Since I have indexes on both a and b, shouldn't a query on a AND b like the one above be fast?
For this query:
select *
from table
where a = <sth> and b = <sth>;
The best index is on table(a, b). This index can be used for your second query as well.
Usually (but not always).
In your case, the number of distinct values in a (and b) and the number of columns you select can change how the db engine decides between using the index and reading the table.
For example,
if the table has, say, 100,000 records and 80,000 of them have the same value for a, then when you query for:
SELECT * FROM table WHERE a=<your value>
the db engine could decide to scan the table directly without using the index, while if you query
SELECT a, b FROM table WHERE a=<your value>
and you added column b to the index too (directly in the index or with INCLUDE), it's quite probable that the db engine will use the index.
It is worth searching online for index tips; see also the question "How can I index these queries?".
The SQLite documentation explains how index lookups work.
Once the database has used an index to look up some rows, the other index is no longer efficient to use (there is no easy method to filter the results of the first lookup because the other index refers to rows in the original table, not to entries in the first index). See Multiple AND-Connected WHERE-Clause Terms.
To make index lookups on two columns as fast as possible, you need Multi-Column Indices.
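A small sketch of that difference, using Python's sqlite3 module. The table t, the payload column and the index names are assumptions for the example. It asks SQLite for its plan with two single-column indexes and then again after a multi-column index is added.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

QUERY = "SELECT * FROM t WHERE a = 1 AND b = 2"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER, payload TEXT);  -- hypothetical table
    CREATE INDEX idx_a ON t (a);
    CREATE INDEX idx_b ON t (b);
""")

# With only single-column indexes, one of them is used for the lookup and
# the other condition is checked row by row.
plan_separate = plan(conn, QUERY)

# With a multi-column index, both conditions are resolved inside one index.
conn.execute("CREATE INDEX idx_ab ON t (a, b)")
plan_composite = plan(conn, QUERY)

print(plan_separate)
print(plan_composite)
```

The first plan should mention only one of idx_a or idx_b with a single condition, while the second should show both conditions applied through idx_ab.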
I have a table in MySQL with two columns
id int(11) unsigned NOT NULL AUTO_INCREMENT,
B varchar(191) CHARACTER SET utf8mb4 DEFAULT NULL,
The id being the PK.
I need to do a lookup in a query using either one of these. id in (:idList) or B in (:bList)
Would this query perform better if, there is a composite index with these two columns in them?
No, it will not.
Indexes can be used to look up values from the leftmost columns in an index:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
So, if you have a composite index on id, B fields (in this order), then the index can be used to look up values based on their id, or a combination of id and B values. But cannot be used to look up values based on B only. However, in case of an or condition that's what you need to do: look up values based on B only.
If both fields in the or condition are leftmost fields in an index, then MySQL attempts to do an index merge optimisation, so you may actually be better off having separate indexes for these two fields.
Note: if you use the InnoDB table engine, there is no point in adding the primary key to any multi-column index, because InnoDB silently appends the PK to every secondary index.
For OR I don't think so.
The optimizer will try to find a match on the first side and, if that fails, will try the second side. So an individual index for each search will be better.
For AND a composite index will help.
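The leftmost-prefix rule can be observed with a minimal sketch on SQLite through Python's sqlite3 module. The table t with columns x and y is hypothetical, and MySQL's optimizer is a different piece of software, so this shows the general rule rather than MySQL's exact behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x INTEGER, y TEXT, payload TEXT);  -- hypothetical table
    CREATE INDEX idx_xy ON t (x, y);                   -- composite index, x first
""")

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

plan_first_only = plan("SELECT * FROM t WHERE x = 1")             # leftmost column
plan_both = plan("SELECT * FROM t WHERE x = 1 AND y = 'a'")       # both columns
plan_second_only = plan("SELECT * FROM t WHERE y = 'a'")          # second column only
plan_or = plan("SELECT * FROM t WHERE x = 1 OR y = 'a'")          # OR across both

for p in (plan_first_only, plan_both, plan_second_only, plan_or):
    print(p)
```

The first two plans should search idx_xy, while the last two fall back to scanning the table, because a lookup on y alone cannot start from the leftmost column of the index.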
MySQL index TIPS
Of course you can always add the index and compare the explain plan.
MySQL Explain Plan
The trick for optimizing OR is to use UNION. (At least, it works well in some cases.)
( SELECT ... FROM ... WHERE id IN (...) )
UNION DISTINCT
( SELECT ... FROM ... WHERE B IN (...) )
Notes:
Need separate indexes on id and B.
No benefit from any composite index (unless it is also "covering").
Change DISTINCT to ALL if you know that there won't be any rows found by both the id and B tests. (This avoids a de-dup pass.)
If you need ORDER BY, add it after the SQL above.
If you need LIMIT, it gets messier. (This is probably not relevant for IN, but it often is with ORDER BY.)
If the rows are 'wide' and the resultset has very few rows, it may be further beneficial to do something like this:
SELECT t...
FROM t
JOIN (
( SELECT id FROM t WHERE id IN (...) )
UNION DISTINCT
( SELECT id FROM t WHERE B IN (...) )
) AS u USING(id);
Notes:
This needs PRIMARY KEY(id) and INDEX(B, id). (Actually there is no diff, as Michael pointed out.)
The UNION is cheaper here because of collecting only id, not the bulky columns.
The SELECTs in the UNION are faster because you should be able to provide "covering" indexes.
ORDER BY would go at the very end.
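To check that the rewrites return the same rows as the OR form, here is a small sketch on SQLite via Python's sqlite3 module. The sample rows and the payload column are invented for the example. SQLite does not accept parenthesised SELECTs inside a UNION or the UNION DISTINCT spelling, so plain UNION (which de-duplicates) is used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, B TEXT, payload TEXT);
    CREATE INDEX idx_b ON t (B);  -- separate index on B; id is already the PK
""")
conn.executemany(
    "INSERT INTO t (id, B, payload) VALUES (?, ?, ?)",
    [(1, "x", "row 1"), (2, "y", "row 2"), (3, "z", "row 3"), (4, "y", "row 4")],
)

# Original form: OR across two differently indexed columns.
or_rows = conn.execute(
    "SELECT id, B, payload FROM t WHERE id IN (1, 2) OR B IN ('y') ORDER BY id"
).fetchall()

# Rewrite 1: one SELECT per condition, de-duplicated by UNION.
union_rows = conn.execute("""
    SELECT id, B, payload FROM t WHERE id IN (1, 2)
    UNION
    SELECT id, B, payload FROM t WHERE B IN ('y')
    ORDER BY id
""").fetchall()

# Rewrite 2: collect only the ids in the UNION, then join back for the wide columns.
join_rows = conn.execute("""
    SELECT t.id, t.B, t.payload
    FROM t
    JOIN (
        SELECT id FROM t WHERE id IN (1, 2)
        UNION
        SELECT id FROM t WHERE B IN ('y')
    ) AS u USING (id)
    ORDER BY t.id
""").fetchall()

print(or_rows == union_rows == join_rows)
```

Row 2 matches both conditions, so it is the case that separates UNION from UNION ALL: with ALL it would appear twice.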
I've just heard the term covered index in some database discussion - what does it mean?
A covering index is an index that contains all of, and possibly more, the columns you need for your query.
For instance, this:
SELECT *
FROM tablename
WHERE criteria
will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.
However, suppose the index contained the columns column1, column2 and column3, and you ran this SQL:
SELECT column1, column2
FROM tablename
WHERE criteria
Provided that particular index could be used to speed up the resolution of which rows to retrieve, it already contains the values of the columns you're interested in, so the engine won't have to go to the table to retrieve the rows, but can produce the results directly from the index.
This also suggests a tuning technique: if a typical query uses 1-2 columns to resolve which rows to fetch and then returns another 1-2 columns, it can be beneficial to append those extra columns to the index (if they're the same across your queries), so that the query processor can get everything from the index itself.
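The distinction can be seen in a minimal sketch on SQLite through Python's sqlite3 module. The extra column4 and the index name are assumptions added so that SELECT * needs something the index does not hold. SQLite labels index-only access as COVERING INDEX in its plan; other engines report it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (column1 INTEGER, column2 INTEGER,
                            column3 INTEGER, column4 TEXT);
    CREATE INDEX idx_123 ON tablename (column1, column2, column3);
""")

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Every requested column lives in the index, so the table is never touched.
plan_covered = plan("SELECT column1, column2 FROM tablename WHERE column1 = 1")

# column4 is not in the index, so each match needs a lookup in the table.
plan_not_covered = plan("SELECT * FROM tablename WHERE column1 = 1")

print(plan_covered)
print(plan_not_covered)
```

Both queries use idx_123 to find the rows; only the first can be answered from the index alone.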
Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.
A covering index is just an ordinary index. It's called "covering" if it can satisfy a query without the need to read the table data.
example:
CREATE TABLE MyTable
(
ID INT IDENTITY PRIMARY KEY,
Foo INT
)
CREATE NONCLUSTERED INDEX index1 ON MyTable(ID, Foo)
SELECT ID, Foo FROM MyTable -- All requested data are covered by index
This is one of the fastest methods to retrieve data from SQL server.
Covering indexes are indexes which "cover" all columns needed from a specific table, removing the need to access the physical table at all for a given query/ operation.
Since the index contains the desired columns (or a superset of them), table access can be replaced with an index lookup or scan -- which is generally much faster.
Columns to cover:
parameterized or static conditions; columns restricted by a parameterized or constant condition.
join columns; columns dynamically used for joining
selected columns; to answer selected values.
While covering indexes can often provide good benefit for retrieval, they do add somewhat to insert/ update overhead; due to the need to write extra or larger index rows on every update.
Covering indexes for Joined Queries
Covering indexes are probably most valuable as a performance technique for joined queries. This is because joined queries are more costly and more likely than single-table retrievals to suffer high-cost performance problems.
in a joined query, covering indexes should be considered per-table.
each 'covering index' removes a physical table access from the plan & replaces it with index-only access.
investigate the plan costs & experiment with which tables are most worthwhile to replace by a covering index.
by this means, the multiplicative cost of large join plans can be significantly reduced.
For example:
select poi.title, c.name, c.address
from porderitem poi
join porder po on po.id = poi.fk_order
join customer c on c.id = po.fk_customer
where po.orderdate > ? and po.status = 'SHIPPING';
create index porder_custitem on porder (orderdate, id, status, fk_customer);
See:
http://literatejava.com/sql/covering-indexes-query-optimization/
Let's say you have a simple table with the columns below, and you have only indexed Id:
Id (Int), Telephone_Number (Int), Name (VARCHAR), Address (VARCHAR)
Imagine you have to run the query below and check whether it uses an index and whether it performs efficiently, without extra I/O calls. Remember, you have only created an index on Id.
SELECT Id FROM mytable WHERE Telephone_Number = '55442233';
When you check the performance of this query you will be disappointed: since Telephone_Number is not indexed, the rows have to be fetched from the table using I/O calls. So this is not a covering index, because a column in the query is not indexed, which leads to frequent I/O calls.
To make it a covering index you need a composite index containing both columns. (Id, Telephone_Number) covers the query, but putting the filtered column first, as in (Telephone_Number, Id), also lets the index be used for the lookup itself.
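Column order in such a composite index matters for how it can be used. The sketch below, on SQLite via Python's sqlite3 module, compares the two possible orders for this query. The index name idx_demo is made up, Id is declared as an ordinary column, and other engines may plan this differently.

```python
import sqlite3

QUERY = "SELECT Id FROM mytable WHERE Telephone_Number = 55442233"

def plan_with_index(index_columns):
    """Build the example table with one composite index and return the query plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (Id INTEGER, Telephone_Number INTEGER, "
        "Name TEXT, Address TEXT)"
    )
    conn.execute(f"CREATE INDEX idx_demo ON mytable ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    return " | ".join(row[3] for row in rows)

# Both indexes contain every column the query needs, but only the second one
# starts with the column used in the WHERE clause.
plan_id_first = plan_with_index("Id, Telephone_Number")
plan_phone_first = plan_with_index("Telephone_Number, Id")

print(plan_id_first)     # a scan: no direct lookup by Telephone_Number is possible
print(plan_phone_first)  # a search on the index by Telephone_Number
```

With Id first, the engine has to walk through every entry; with Telephone_Number first, it can jump straight to the matching entries and still read Id from the index.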
For more details, please refer to this blog:
https://www.percona.com/blog/2006/11/23/covering-index-and-prefix-indexes/
If I execute this query:
SELECT * FROM table1 WHERE name LIKE '%girl%'
It returns all records where name contains 'girl'. However, because of the leading wildcard % in the LIKE statement, it cannot (or does not) use indexes as stated here: Mysql Improve Search Performance with wildcards (%%)
Then I changed the query to:
SELECT * FROM table1 WHERE name LIKE 'girl%' OR name LIKE '%girl%'
On the left side of the OR I removed the leading wildcard so it can use indexes. But the performance win depends on how MySQL evaluates the query.
Hence my question: does the performance of my query increase when I add the OR condition?
No, the performance will be the same. Because of the OR, MySQL still has to evaluate the LIKE '%girl%' condition for every row. Only the LIKE 'girl%' condition could be evaluated using an index. You can see this when you EXPLAIN your query (MySQL will show that it still needs to do a full table scan, which means checking each row):
EXPLAIN SELECT * FROM table1 WHERE name LIKE 'girl%' OR name LIKE '%girl%'
For better performance with these kinds of queries you would need to use Fulltext indexes and the special syntax for querying them. But FT indexes behave differently and are not suited for everything.
(This answer provides a summary of the comments, plus contradicts some of the previous notes.)
Leading wildcard:
SELECT * FROM table1 WHERE name LIKE 'girl%' OR name LIKE '%girl%'
SELECT * FROM table1 WHERE name LIKE '%girl%'
Either of those will do a table scan and ignore any indexes. This is both because of the leading wildcard and because of the OR. (It will not use the index for 'girl%', contrary to what @Marki555 says; it's not worth the extra effort.)
Range query via LIKE (no leading wildcard):
SELECT * FROM table1 WHERE name LIKE 'girl%'
will probably use INDEX(name) in the following way:
1. Drill down the BTree for that index to the first name starting with "girl";
2. Scan forward (in the index) until the last row starting with "girl";
3. For each item in step 2, reach over into the data to get *.
Since Step 3 can be costly, the optimizer first estimates how many rows will need to be touched in Step 2. If more than 20% (approx) of the table, it will revert to a table scan. (Hence, my use of "probably".)
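The first two steps can be mimicked in plain Python, with a sorted list standing in for the index. This is only an analogy with invented sample names, not how MySQL implements its BTree. It shows why a prefix pattern turns into a short range scan while a pattern with a leading wildcard has to look at every entry.

```python
from bisect import bisect_left

def prefix_range_scan(sorted_names, prefix):
    """Seek to the first key >= prefix, then walk forward while keys still match."""
    start = bisect_left(sorted_names, prefix)  # step 1: "drill down" by binary search
    matches = []
    for i in range(start, len(sorted_names)):  # step 2: scan forward in key order
        if not sorted_names[i].startswith(prefix):
            break                              # first non-match ends the range
        matches.append(sorted_names[i])
    return matches

def substring_full_scan(names, fragment):
    """A leading wildcard gives no starting point, so every entry is examined."""
    return [name for name in names if fragment in name]

# The sorted list plays the role of INDEX(name).
index = sorted(["zebra", "girls", "boy", "showgirl", "girl", "girlfriend"])

prefix_hits = prefix_range_scan(index, "girl")       # like: name LIKE 'girl%'
substring_hits = substring_full_scan(index, "girl")  # like: name LIKE '%girl%'

print(prefix_hits)
print(substring_hits)
```

All keys that share a prefix sit next to each other in sorted order, which is what makes the early break in the range scan safe; "showgirl" is only found by the full scan.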
"Covering index":
SELECT name FROM table1 WHERE name LIKE '%girl%'
This will always use INDEX(name). That is because the index "covers". That is, all the columns in the SELECT are found in the INDEX. Since an INDEX looks and feels like a table, scanning the index is the best way to do the query. Since an index is usually smaller than the table, an index scan is usually faster than a table scan.
Here's a less obvious "covering index", but it applies only to InnoDB:
PRIMARY KEY(id)
INDEX(name)
SELECT id FROM table1 WHERE name LIKE '%girl%'
Every secondary key (name) in InnoDB implicitly includes the PK (id). Hence the index looks like (name, id). Hence all the columns in the SELECT are in the index. Hence it is a "covering index". Hence it will use the index and do an "index scan".
A "covering index" is indicated by Using index showing up in the EXPLAIN SELECT ....
I am new to MySQL, and I need to add indexes on an existing table (which contains roughly 200K rows).
Table mytable: (id:integer, completed_time:timestamp, deleted_time:timestamp)
I have 2 queries which need to benefit from the index:
select s.id from mytable s
where s.completed_time is not null
and s.completed_time < ?
and ( s.deleted_time is null
or s.deleted_time >= ? );
and :
select s.id from mytable s
where
s.completed_time is not null
and (
( s.deleted_time is not null
and s.deleted_time >= ?
and s.deleted_time < ? )
or ( s.completed_time >= ?
and s.completed_time < ? ) ) ;
I am considering introducing a multi column index (on completed_time and deleted_time)
However, I am not sure if the condition "s.completed_time is not null" matches the criteria to make these queries use the composite index.
Do you have any thoughts about what is best (composite index or 2 indexes)? I am trying to use "explain" to figure out what's best but I am unsure on how to interpret the results.
And more generally: with a table having a composite index on (column1, column2), I understand that filtering on column2 only will not use the index.
But what if I introduce a dummy condition like (column1 > MIN_VALUE), or (column1 is not null) when it is correct to do so?
Thanks!
Assuming the two queries you mentioned will be used "frequently", I would advise a composite index rather than two distinct indexes on the two columns.
As you already know, a query searching on two columns might sometimes use two separate indexes by (roughly) merging these two indexes into one. But this is sub-optimal, and has a cost in terms of performance.
Conversely, a composite index can only be used if the left-most columns are involved in the search condition, or as the manual puts it:
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on
With regards to your suggested hack (introducing dummy conditions so as to be able to use the index), this might work, but I would rather advise creating a second index on column2 only (besides the two-column index on (column1, column2)). This comes at a (minor) cost, but is so much more elegant and reusable.
As for the suggestion of getting rid of NULL values, I strongly disagree. It is semantically incorrect to use 0. 0 means "zero", NULL means "no value". All your tests would need to account for this special value, whereas IS NULL is standard and well understood everywhere. It is also just impractical in some situations (try to insert 0 with SQL_MODE='TRADITIONAL').
On the other hand, the performance gain is dubious (I believe this is mostly based on the false assumption that NULL values are not indexed). It is easy to verify that a query like s.completed_time IS NOT NULL will hit an index if such an index exists.
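One way to experiment with the first query from the question is the sketch below, which runs it on SQLite via Python's sqlite3 module against a composite index on (completed_time, deleted_time). Integer timestamps, the sample rows and the index name are assumptions for the example, and SQLite's planner is not MySQL's, so the plan is indicative only. The same EXPLAIN check should be repeated on the real MySQL table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY,
                          completed_time INTEGER,   -- timestamps as plain integers here
                          deleted_time INTEGER);
    CREATE INDEX idx_completed_deleted ON mytable (completed_time, deleted_time);
""")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 10, None),    # completed, never deleted        -> matches
    (2, 10, 5),       # deleted before the cutoff       -> filtered out
    (3, 10, 50),      # deleted after the cutoff        -> matches
    (4, None, None),  # never completed (NULL)          -> filtered out
    (5, 99, None),    # completed after the cutoff      -> filtered out
])

QUERY = """
    SELECT s.id FROM mytable s
    WHERE s.completed_time IS NOT NULL
      AND s.completed_time < ?
      AND (s.deleted_time IS NULL OR s.deleted_time >= ?)
"""
params = (20, 20)

matching_ids = sorted(row[0] for row in conn.execute(QUERY, params))
query_plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params)
)

print(matching_ids)
print(query_plan)
```

The NULL rows live in the table alongside the others and the query still returns the right ids while the plan refers to the composite index, so there is no need to replace NULL with a sentinel value just to make the index usable.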
Provided you get rid of the nulls, as another user suggested, a composite index might work for the first query.
For the second query, however, the index may not be used, because there is an OR between your indexed columns.
Usually the index is matched on its left-most columns, in the order they are defined.
I suggest creating separate indexes, although that may bring the overhead of index merging: http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html