I am new to MySQL, and I need to add indexes on an existing table (which contains roughly 200K rows).
Table mytable: (id:integer, created_time:timestamp, deleted_time:timestamp)
I have 2 queries which need to benefit from the index:
select s.id from mytable s
where s.completed_time is not null
and s.completed_time < ?
and ( s.deleted_time is null
or s.deleted_time >= ? );
and :
select s.id from mytable s
where
s.completed_time is not null
and (
( s.deleted_time is not null
and s.deleted_time >= ?
and s.deleted_time < ? )
or ( s.completed_time >= ?
and s.completed_time < ? ) ) ;
I am considering introducing a multi-column index (on completed_time and deleted_time).
However, I am not sure if the condition "s.completed_time is not null" matches the criteria to make these queries use the composite index.
Do you have any thoughts about what is best (composite index or 2 indexes)? I am trying to use "explain" to figure out what's best but I am unsure on how to interpret the results.
And more generally: with a table having a composite index on (column1, column2), I understand that filtering on column2 only will not use the index.
But what if I introduce a dummy condition like (column1 > MIN_VALUE), or (column1 is not null) when it is correct to do so?
Thanks!
Assuming the two queries you mentioned will be used "frequently", I would advise a composite index rather than two distinct indexes on the two columns.
As you already know, a query searching on two columns might sometimes use two separate indexes by (roughly) merging these two indexes into one. But this is sub-optimal, and has a cost in terms of performance.
Conversely, a composite index can only be used if the left-most columns are involved in the search condition, or as the manual puts it:
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on
With regards to your suggested hack (introducing dummy conditions so as to be able to use the index), this might work, but I would rather advise creating a second index on column2 only (besides the two-column index on (column1, column2)). This comes at a (minor) cost, but is so much more elegant and reusable.
As for the suggestion of getting rid of NULL values, I strongly disagree. It is semantically incorrect to use 0. 0 means "zero", NULL means "no value". All your tests would need to account for this special value, whereas IS NULL is standard and well understood everywhere. It is also just impractical in some situations (try to insert 0 with SQL_MODE='TRADITIONAL').
On the other hand, the performance gain is dubious (I believe this is mostly based on the false assumption that NULL values are not indexed). It is easy to verify that a query like s.completed_time IS NOT NULL will hit an index if such an index exists.
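One way to check this on your own data is sketched below. The index name and the date literals are placeholders I made up for illustration; the table and column names come from the first query in the question.

```sql
-- Hypothetical index name; columns are the ones filtered on in the question.
CREATE INDEX idx_completed_deleted
    ON mytable (completed_time, deleted_time);

-- First query from the question, with placeholder values for the "?" parameters.
EXPLAIN
SELECT s.id
FROM mytable s
WHERE s.completed_time IS NOT NULL
  AND s.completed_time < '2015-01-01 00:00:00'
  AND (s.deleted_time IS NULL
       OR s.deleted_time >= '2015-01-01 00:00:00');
```

If the key column of the EXPLAIN output names idx_completed_deleted, the optimizer chose the composite index. It may still prefer a full scan when the range on completed_time matches most of the table, so it is worth testing with realistic parameter values.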
Provided you get rid of the NULLs, as another user suggested, a composite index might work for the first query.
For the second query, however, the index may not be used, because the condition has an OR between your indexed columns.
An index is usually matched starting from its left-most columns, in the order they are defined.
I suggest creating separate indexes, although this may carry the overhead of an index merge: http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html
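A minimal sketch of the separate-index approach, assuming the table from the question. The index names and date literals are hypothetical.

```sql
-- Hypothetical index names, one index per column.
CREATE INDEX idx_completed ON mytable (completed_time);
CREATE INDEX idx_deleted   ON mytable (deleted_time);

-- Second query from the question, with placeholder values for the "?" parameters.
EXPLAIN
SELECT s.id
FROM mytable s
WHERE s.completed_time IS NOT NULL
  AND (
        (s.deleted_time IS NOT NULL
         AND s.deleted_time >= '2015-01-01 00:00:00'
         AND s.deleted_time <  '2015-02-01 00:00:00')
     OR (s.completed_time >= '2015-01-01 00:00:00'
         AND s.completed_time <  '2015-02-01 00:00:00')
      );
```

If the optimizer decides to combine both indexes, the type column should read index_merge and the key column should list both index names. Whether it does so depends on the data distribution, so treat this as something to verify rather than a guarantee.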
I have a client running a PHP photo gallery (on PHP 5.5, MySQL 5.5, using MyISAM tables) that uses the directory tree method. Unfortunately, some of the queries in their gallery application require horribly long filesorts. The offending query:
SELECT `name`, `slug`
FROM `db_table`
WHERE `left_ptr` <= '914731'
AND `right_ptr` >= '914734'
AND `id` <> 1
ORDER BY `left_ptr` ASC
There are indexes on id, left_ptr and right_ptr, but according to the EXPLAIN, none of them are being used in the query.
I heard that creating a composite index (on the 'condition' columns) would make things faster, but does that apply to this case? The last condition is really just an 'anything but 1' clause, so would a composite index apply to that, too? Thanks for any insight into this.
Yes, a composite index on (left_ptr, right_ptr) should make this query run better.
MySQL will usually use only one index per table in a query (index merge is the exception). It is probably not using any of the single-column indexes because it has determined that none of them would be much faster than a full table scan. For example, id <> 1 matches every row except one, so a full table scan is just as good. The other two filters depend on how the data is distributed, but if neither of them alone filters out a significant portion of the table, MySQL won't use its index.
Don't bother including id in the composite index: as noted above, id <> 1 only filters out one row.
MySQL can use the first column of a composite index alone, so this composite index also replaces the one on left_ptr alone.
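As a rough sketch, the index could be created and checked like this. The index name is hypothetical, and the query is the one from the question.

```sql
-- Hypothetical index name; left_ptr first so it also serves the ORDER BY.
CREATE INDEX idx_left_right ON `db_table` (`left_ptr`, `right_ptr`);

EXPLAIN
SELECT `name`, `slug`
FROM `db_table`
WHERE `left_ptr` <= '914731'
  AND `right_ptr` >= '914734'
  AND `id` <> 1
ORDER BY `left_ptr` ASC;
```

If the key column now shows idx_left_right and "Using filesort" is gone from Extra, the index is doing its job. Once that is confirmed, the single-column index on left_ptr can be dropped.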
I have a table in MySQL with two columns
id int(11) unsigned NOT NULL AUTO_INCREMENT,
B varchar(191) CHARACTER SET utf8mb4 DEFAULT NULL,
The id being the PK.
I need to do a lookup in a query using either one of these: id in (:idList) or B in (:bList)
Would this query perform better if there were a composite index containing both of these columns?
No, it will not.
Indexes can be used to look up values from the leftmost columns in an index:
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
So, if you have a composite index on the id, B fields (in this order), then the index can be used to look up values based on their id, or on a combination of id and B values, but it cannot be used to look up values based on B only. However, in the case of an OR condition, that is exactly what you need to do: look up values based on B only.
If both fields in the OR condition are leftmost fields in an index, then MySQL attempts to do an index merge optimisation, so you may actually be better off having separate indexes for these two fields.
Note: if you use the InnoDB table engine, then there is no point in adding the primary key to any multi-column index, because InnoDB silently adds the PK to every index.
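To illustrate, here is a minimal sketch with a separate index on B. The table name t, the index name, and the literal values are assumptions; id is already indexed as the primary key.

```sql
-- Hypothetical table and index names; id is covered by the PRIMARY KEY.
CREATE INDEX idx_b ON t (B);

-- Placeholder value lists standing in for :idList and :bList.
EXPLAIN
SELECT id, B
FROM t
WHERE id IN (1, 2, 3)
   OR B IN ('x', 'y');
```

When index merge is chosen, EXPLAIN reports type index_merge, lists both PRIMARY and idx_b in the key column, and mentions a union or sort_union in Extra. If it shows a full scan instead, the optimizer judged the merge to be more expensive for that data.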
For OR, I don't think so.
The optimizer will try to find a match on the first side and, if that fails, will try the second side. So an individual index for each search will be better.
For AND, a composite index will help.
MySQL index TIPS
Of course you can always add the index and compare the explain plan.
MySQL Explain Plan
The trick for optimizing OR is to use UNION. (At least, it works well in some cases.)
( SELECT ... FROM ... WHERE id IN (...) )
UNION DISTINCT
( SELECT ... FROM ... WHERE B IN (...) )
Notes:
Need separate indexes on id and B.
No benefit from any composite index (unless it is also "covering").
Change DISTINCT to ALL if you know that there won't be any rows found by both the id and B tests. (This avoids a de-dup pass.)
If you need ORDER BY, add it after the SQL above.
If you need LIMIT, it gets messier. (This is probably not relevant for IN, but it often is with ORDER BY.)
If the rows are 'wide' and the resultset has very few rows, it may be further beneficial to do something like this:
SELECT t...
FROM t
JOIN (
( SELECT id FROM t WHERE id IN (...) )
UNION DISTINCT
( SELECT id FROM t WHERE B IN (...) )
) AS u USING(id);
Notes:
This needs PRIMARY KEY(id) and INDEX(B, id). (Actually there is no diff, as Michael pointed out.)
The UNION is cheaper here because of collecting only id, not the bulky columns.
The SELECTs in the UNION are faster because you should be able to provide "covering" indexes.
ORDER BY would go at the very end.
I am having an issue with a table that uses a compound primary key.
The key consists of a date followed by a bigint.
Selects on the table appear to be scanning even when only selecting fields from the PK and using a where clause that contains both columns. For example:
SELECT mydate, my_id from foo WHERE mydate >='2014-08-26' AND my_id = 1234;
Explain select shows using where and the number of rows considered is in the millions.
One oddity is the key_len which is shown as 7 which seems far too small.
My instinct says the key is broken but I may be missing something obvious.
Any thoughts?
Thank you
Richard
For this query, the index you want is on (my_id, mydate):
create index idx_foo_myid_mydate on foo(my_id, mydate);
This is because the conditions in the where clause have an equality and inequality. The equality conditions need to match the index from left to right, before the inequalities can be applied.
MySQL documentation actually does a good job (in my opinion) in explaining composite indexes.
Your existing index will be used for the inequality on mydate. However, every index entry from that date onward will then be scanned to satisfy the condition on my_id. With the right index, MySQL can go directly to the matching rows.
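To see the difference, the two access paths can be compared with EXPLAIN. This is only a sketch: it assumes the compound primary key is (mydate, my_id) as described in the question and that the index above has been created.

```sql
-- Force the existing primary key: range scan on mydate, then filter on my_id.
EXPLAIN
SELECT mydate, my_id
FROM foo FORCE INDEX (PRIMARY)
WHERE my_id = 1234
  AND mydate >= '2014-08-26';

-- Force the new index: equality on my_id first, then range on mydate.
EXPLAIN
SELECT mydate, my_id
FROM foo FORCE INDEX (idx_foo_myid_mydate)
WHERE my_id = 1234
  AND mydate >= '2014-08-26';
```

The rows estimate should be much lower for the second plan, and its key_len should be larger, since both columns of the index are used to narrow the search.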
I have a query SELECT.. WHERE user_id='' && date>:expire && used=0;
When I create the index, should I create one index on all the columns together, like
CREATE INDEX new_index ON table (user_id, date, used)
or should I separate them and create index for each column?
This really depends on how you plan to query these columns. Your safest bet would be to just make an index for each column, though this may not yield optimum performance.
Multi-column indexes are useful for cases where:
You need to enforce a unique constraint across the combination of column values
You know that you will always use the columns for joins, where clauses, order by, group by, etc. in a specific combination
For example for the combo index you proposed (user_id, date, used) you would be able to utilize the index only in the following conditions:
You are doing join, where, etc. only on user_id
You are doing join, where, etc. on user_id and date
You are doing join, where, etc. on all three columns
You would not be able to utilize the index in these cases:
You are doing join, where, etc. on date or used individually
You are doing join, where, etc. on date and used
For further reading, here is MySQL documentation on multi-column indexes:
http://dev.mysql.com/doc/refman/5.5/en/multiple-column-indexes.html
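The cases listed above can be tried out with EXPLAIN. In this sketch the table name my_table and the literal values are hypothetical; the index columns are the ones proposed in the question.

```sql
-- Hypothetical table name; date is backticked because it is also a keyword.
CREATE INDEX new_index ON my_table (user_id, `date`, used);

-- Leftmost column only: the index can be used.
EXPLAIN SELECT * FROM my_table WHERE user_id = 42;

-- First two columns: the index can be used.
EXPLAIN SELECT * FROM my_table
WHERE user_id = 42 AND `date` > '2015-01-01';

-- Skips user_id: the index cannot be used to look up the rows.
EXPLAIN SELECT * FROM my_table
WHERE `date` > '2015-01-01' AND used = 0;
```

For the first two statements the key column should show new_index. For the last one it will typically be NULL, meaning a full table scan.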
Based on your query, you should use a multi-column index. However, the right index is:
CREATE INDEX new_index ON table (user_id, used, date)
Note that date is last in the index, because you have an inequality.
MySQL happens to have a very good discussion here on multi-column indexes and how they are used.
I have a table with 3 columns. This table contains many rows (millions). When I select rows from the table I frequently use the following where clauses:
where column2=value1 and column3=value2
where column1=value
To speed up the select queries I want to declare column1 and column2 as indexes. My question is whether declaring the second column as an index will reduce the positive effect of declaring the first column as an index.
I also would like to ask if declaring the second column as index will speed up the queries of this type: where column2=value1 and column3=value2.
ADDED
The column1, column2, and column3 are entity, attribute, value. It's very general. As entities I use person, movies, cities, countries and so on. Attributes are things like: "located in", "date of birth", "produced by".
You should create indexes that support your queries. In this case you want to create an index on column2,column3 together (not two separate indexes, but one index for the combination of columns) to support the first query, and another on column1 to support the second query. More generally, if a query uses a set of columns, adding an index for all those columns will speed it up (although there are many exceptions, of course).
An index on column2 would speed up the query column2=value1 and column3=value2, and so would an index on column2,column3 (the important thing is that column2 is the first column in the index).
When working with indexes the EXPLAIN keyword is very useful. Prefix your queries with EXPLAIN (e.g. EXPLAIN SELECT * FROM table) to get a description of how the database is going to perform your query. It will tell you if it's going to use an index, and in that case which.
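Put together, the suggestion might look like the sketch below. The table name eav, the index names, and the literal values are hypothetical, and the columns are assumed to be VARCHAR (TEXT columns would need a prefix length in the index definition).

```sql
-- Hypothetical table and index names.
CREATE INDEX idx_col2_col3 ON eav (column2, column3);
CREATE INDEX idx_col1      ON eav (column1);

-- Supported by idx_col2_col3.
EXPLAIN SELECT * FROM eav
WHERE column2 = 'date of birth' AND column3 = '1970-01-01';

-- Supported by idx_col1.
EXPLAIN SELECT * FROM eav
WHERE column1 = 'some person';
```

The key column of each EXPLAIN result shows which index, if any, the optimizer picked for that query.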
It seems like neither of your plans is going to work. Based on both of the where clauses, I would suggest having the primary key on column1 and a second index on column2,column3. This would speed up both of your queries.