MySQL index misses

I have a query that looks like the following:
select count(*) from `foo` where expires_at < now()
Since expires_at is indexed, the query hits the index with no problem. However, with the following query:
select count(*) from `foo` where expires_at < now() and some_id != 5
the index never gets hit.
Both expires_at and some_id are indexed.
Is my index not properly created?

This query:
SELECT COUNT(*)
FROM foo
WHERE expires_at < NOW()
can be satisfied from the index alone, without referring to the table itself. You can see this from "Using index" in the Extra column of the EXPLAIN output.
This query:
SELECT COUNT(*)
FROM foo
WHERE expires_at < NOW()
AND some_id <> 5
needs to look into the table to find the value of some_id.
Since a table lookup for every matching index entry is quite expensive, the optimizer may decide it is more efficient to scan the table and filter the records.
If you had a composite index on expires_at, some_id, the query would probably use the index both for ranging on expires_at and filtering on some_id.
SQL Server even offers a feature known as included columns for this. This command
CREATE INDEX ix_foo_expires__someid ON foo (expires_at) INCLUDE (some_id)
would create an index on expires_at which would additionally store some_id in the leaf entries (without the overhead of sorting on it).
MySQL, unfortunately, does not support it.
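As a rough sketch of the composite index suggested above (the index name is made up for illustration; the table and column names are the ones from the question), you could create it and then compare the EXPLAIN output before and after:

```sql
-- Hypothetical index name; foo, expires_at and some_id come from the question.
CREATE INDEX ix_foo_expires_someid ON foo (expires_at, some_id);

-- Run this before and after creating the index and compare the plans.
EXPLAIN
SELECT COUNT(*)
FROM foo
WHERE expires_at < NOW()
  AND some_id <> 5;
```

With both columns present in one index, the query no longer needs the table rows, so the plan will probably show "Using index" again. Whether the optimizer actually picks the index still depends on your data.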

Probably what's happening is that for the first query, the index can be used to count the rows satisfying the WHERE clause. In other words, the query would result in a table scan, but happily all the columns involved in the WHERE condition are in an index, so the index is scanned instead.
In the second query though, there's no single index that contains all the columns in the WHERE clause. So MySQL resorts to a full table scan. In the case of the first query, it was using your index, but not to find the rows to check - in the special case of a COUNT() query, it could use the index to count rows. It was doing the equivalent of a table scan, but on the index instead of the table.

1) It seems you have two single-column indices. You can try to create a multi-column index.
For a detailed explanation why this is different than multiple single column indices, see the following:
http://www.mysqlfaqs.net/mysql-faqs/Indexes/When-does-multi-column-index-come-into-use-in-MySQL
2) Do you have a B-tree index on the expires_at column? Since you are doing a range query (<), that might give better performance.
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Best of luck!

Related

mysql single column condition query on a multi-column index

Suppose a table has only one index: idx_goods (is_deleted, price)
and the is_deleted column is either 0 or 1.
My query looks like this: where price < 10. Which of the following behaviors applies in this case?
1 MySQL does a full scan of the secondary index for the price match
2 MySQL partially scans the secondary index for the price match: it starts at is_deleted=0 in the secondary index, reaches price=10, then jumps to is_deleted=1 in the secondary index and continues there
3 MySQL ignores the secondary index and scans the base table for the price match; in other words, a condition column that is not in the index key prefix is not matched against the secondary index but against the base table, even though the condition column is part of the index key
To use an index on multiple fields, the WHERE condition must generally use fields from the beginning of the index. So, if the index is on fields (field1, field2, field3), then the condition must contain field1.
More about MySQL index optimisation
There are some DB systems which can use an index even if its first part is omitted from the condition, provided that part has a very limited range of values.
The query plan can be checked by using EXPLAIN SELECT ..... (see the documentation).
For tables with a small number of rows (fewer than 10, according to the documentation) indexes may not be used, so it is better to prepare more data for realistic tests.
And, as seen in this example, the index is not used for the query
explain select * from test where price < 10
but for the query below the index is used, with good results.
explain select * from test where is_deleted in (0,1) and price < 10
If is_deleted can be null, then the condition should be changed to (is_deleted in (0,1) or is_deleted is null); it still uses the index.
And, as @Luuk mentioned, in this case (a condition only on the price field) an index on price will be the best option.
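A minimal sketch of that last option, assuming the same test table as in the queries above (the index name idx_price is hypothetical):

```sql
-- Hypothetical index name; `test` and `price` come from the example queries above.
CREATE INDEX idx_price ON test (price);

-- Check whether the optimizer now chooses the new index for the price-only filter.
EXPLAIN SELECT * FROM test WHERE price < 10;
```

On a very small table, or when most rows satisfy price < 10, the optimizer may still prefer a full scan, so test with a realistic amount of data.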

MySQL how to index a query that searches for a substring in column while filtering integer columns

I have a table with a billion+ rows. I have the query below, which I execute frequently:
SELECT SUM(price) FROM mytable WHERE domain IN ('com') AND url LIKE '%/shop%' AND date BETWEEN '2001-01-01' AND '2007-01-01';
Here domain is varchar(10), url is varchar(255) and price is float. I understand that any query with %..% will not use any index. So logically, I created an index on price, domain and date:
create index price_date on mytable(price, domain, date)
The problem persists: this index is also not used, because the query contains url LIKE '%.com/shop%'
On the other hand a FULLTEXT index still will not work since I have other non text filters in the query.
How can I optimise the above query? I have too many rows not to use an index.
UPDATE
Is this an SQL limit? Could such a query perform better on a NoSQL database?
You have two range conditions, one uses IN() and the other uses BETWEEN. The best you can hope is that the condition on the first column of the index uses the index to examine rows, and the condition on the second column of the index uses index condition pushdown to make the storage engine do some pre-filtering.
Then it's up to you to choose which column should be the first column in the index, based on how well each condition would narrow down the search. If your condition on date is more likely to reduce the set of examined rows, then put that first in the index definition.
The order of terms in the WHERE clause does not have to match the order of columns in the index.
MySQL does not support optimizing with both a fulltext index and a B-tree index on the same table reference in the same query.
You can't use a fulltext index anyway for the pattern you are searching for. Fulltext indexes don't allow searches for punctuation characters, only words.
I vote for this order:
INDEX(domain, -- first because of "="
date, -- then range
url, price) -- "covering"
but since the constants look like they would hit most of the billion rows, I don't expect good performance.
If this is a common query and/or "shop" is one of only a few possible filters, we can discuss whether a summary table would be useful.
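For reference, the column order voted for above could be written out roughly as follows. The index name is invented for illustration, and building an index on a table of this size is itself a long operation, so treat this as a sketch to try on a copy first:

```sql
-- Hypothetical index name; table and columns are taken from the question.
ALTER TABLE mytable
  ADD INDEX domain_date_url_price (domain, date, url, price);

-- Check which index is chosen and how many rows the optimizer expects to examine.
EXPLAIN
SELECT SUM(price)
FROM mytable
WHERE domain IN ('com')
  AND url LIKE '%/shop%'
  AND date BETWEEN '2001-01-01' AND '2007-01-01';
```

The LIKE '%/shop%' test still has to be evaluated for every entry in the domain and date range. The index only narrows that range and saves the trip to the table rows.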

Is there a benefit of a second index if column is already part of other index?

I have a MySQL 5.7 table with about 500m records and a unique index on (column1, column2, column3, date).
In some cases I only query for records from a particular day, e.g. DATE = curdate() - interval 2 day. This can take up to 1 minute.
Would there be a benefit to having a separate index on date?
The point to appreciate here is that for queries only involving the date column, MySQL would not be able to use your current 4-column index efficiently. The reason for this is that the B-tree splits on the 3 other columns before reaching the date column. As a result, this index could only satisfy the date query if MySQL were to do a complete index scan. Typically, in this case, MySQL would rather do a full table scan and not bother with the index.
So, if you want your date filter only query to use an index, you would have to add a new index:
CREATE INDEX idx2 ON yourTable (date);
Regarding index "bloat," namely having too many indices, you are not really near that with just 2 indices defined. If you do find that you have say 5-7 indices defined, then you might be able to combine or consolidate them.
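To check that the new index is actually picked up, something along these lines may help. This is only a sketch: yourTable is the placeholder name used above, and the second query assumes the column might be a DATETIME rather than a DATE, which the question does not say:

```sql
-- If `date` is a DATE column, a plain equality can use idx2.
EXPLAIN
SELECT *
FROM yourTable
WHERE date = CURDATE() - INTERVAL 2 DAY;

-- If the column is a DATETIME (an assumption), a half-open range keeps the
-- predicate on the bare column so the index can still be used.
EXPLAIN
SELECT *
FROM yourTable
WHERE date >= CURDATE() - INTERVAL 2 DAY
  AND date <  CURDATE() - INTERVAL 1 DAY;
```

In both cases the key column of the EXPLAIN output should name idx2 if the optimizer decides to use it.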

MySQL: composite index fulltext+btree?

I want a query that does a fulltext search on one field and then a sort on a different field (imagine searching some text document and order by publication date). The table has about 17M rows and they are more or less uniformly distributed in dates. This is to be used in a webapp request/response cycle, so the query has to finish in at most 200ms.
Schematically:
SELECT * FROM table WHERE MATCH(text) AGAINST('query') ORDER BY date=my_date DESC LIMIT 10;
One possibility is having a fulltext index on the text field and a btree on the publication date:
ALTER TABLE table ADD FULLTEXT index_name(text);
CREATE INDEX index_name ON table (date);
This doesn't work very well in my case. What happens is that MySQL evaluates two execution paths. One is using the fulltext index to find the relevant rows, and once they are selected use a FILESORT to sort those rows. The second is using the BTREE index to sort the entire table and then look for matches using a FULL TABLE SCAN. They're both bad. In my case MySQL chooses the former. The problem is that the first step can select some 30k results which it then has to sort, which means the entire query might take on the order of 10 seconds.
So I was thinking: do composite indexes of FULLTEXT+BTREE exist? If you know how a FULLTEXT index works, it first tokenizes the column you're indexing and then builds an index for the tokens. It seems reasonable to me to imagine a composite index such that the second index is a BTREE in dates for each token. Does this exist in MySQL and if so what's the syntax?
BONUS QUESTION: If it doesn't exist in MySQL, would PostgreSQL perform better in this situation?
Use IN BOOLEAN MODE.
The date index is not useful. There is no way to combine the two indexes.
Beware: if a user searches for something that shows up in 30K rows, the query will be slow. There is no straightforward way around it.
I suspect you have a TEXT column in the table? If so, there is hope. Instead of blindly doing SELECT *, let's first find the ids and get the LIMIT applied, then do the *.
SELECT a.*
FROM tbl AS a
JOIN ( SELECT date, id
FROM tbl
WHERE MATCH(...) AGAINST (...)
ORDER BY date DESC
LIMIT 10 ) AS x
USING(date, id)
ORDER BY date DESC;
Together with
PRIMARY KEY(date, id),
INDEX(id),
FULLTEXT(...)
This formulation and indexing should work like this:
Use FULLTEXT to find 30K rows, deliver the PK.
With the PK, sort 30K rows by date.
Pick the last 10, delivering date, id
Reach back into the table 10 times using the PK.
Sort again. (Yeah, this is necessary.)
More (Responding to a plethora of Comments):
The goal behind my reformulation is to avoid fetching all columns of 30K rows. Instead, it fetches only the PRIMARY KEY, then whittles that down to 10, then fetches * only 10 rows. Much less stuff shoveled around.
Concerning COUNT on an InnoDB table:
INDEX(col) makes it so that an index scan works for SELECT COUNT(*) or SELECT COUNT(col) without a WHERE.
Without INDEX(col), SELECT COUNT(*) will use the "smallest" index; but SELECT COUNT(col) will need a table scan.
A table scan is usually slower than an index scan.
Be careful of timing -- It is significantly affected by whether the index and/or table is already cached in RAM.
Another thing about FULLTEXT is the + in front of words -- to say that each word must exist, else there is no match. This may cut down on the 30K.
The FULLTEXT index will deliver the date, id in random order, not PK order. Anyway, it is 'wrong' to assume any ordering, hence it is 'right' to add ORDER BY, then let the Optimizer toss it if it knows that it is redundant. And sometimes the Optimizer can take advantage of the ORDER BY (not in your case).
Removing just the ORDER BY, in many cases, makes a query run much faster. This is because it avoids fetching, say, 30K rows and sorting them. Instead it simply delivers "any" 10 rows.
(I have no experience with Postgres, so I cannot address that question.)
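Putting the pieces above together, the reformulated query with IN BOOLEAN MODE and the + operator might look roughly like this. The column name body and the search words are placeholders, not taken from the question:

```sql
-- Sketch only: `body` stands in for the FULLTEXT-indexed column and the
-- search terms are made up. The + prefix requires each word to be present.
SELECT a.*
FROM tbl AS a
JOIN ( SELECT date, id
       FROM tbl
       WHERE MATCH(body) AGAINST ('+database +replication' IN BOOLEAN MODE)
       ORDER BY date DESC
       LIMIT 10 ) AS x
USING (date, id)
ORDER BY date DESC;
```

Requiring every word can shrink the candidate set that has to be sorted, but a common pair of words can still match many rows.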

Mysql query slow performance

I have a table with 500k rows. Every query against this particular table takes a really long time to run.
One of the queries is:
SELECT *
FROM player_data
WHERE `user_id` = '61120'
AND `opzak` = 'ja'
ORDER BY opzak_nummer ASC
The opzak_nummer column is a tinyint holding a number.
EXPLAIN:
Is there any way to improve the performance of this query, and of this table in general?
The table name is player_data and includes about 25 columns, most of them are integers with values of stats.
The only index is on id (AUTO_INCREMENT).
You need to run the following query; it will alter the table and add an index. You can read more details here: http://dev.mysql.com/doc/refman/5.7/en/drop-index.html
ALTER TABLE player_data ADD INDEX index_name (user_id, opzak);
The optimal index for that query is either of these:
INDEX(user_id, opzak, opzak_nummer)
INDEX(opzak, user_id, opzak_nummer)
The first two columns do the filtering; the last avoids a tmp table and sort by consuming the ORDER BY.
Is any combination of columns 'unique' (other than id)? If so, we might be able to make it run even faster.
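As a sketch, the first of the two indexes above could be added and checked like this (the index name is hypothetical; the table, columns and query are the ones from the question):

```sql
-- Hypothetical index name; columns follow INDEX(user_id, opzak, opzak_nummer).
ALTER TABLE player_data
  ADD INDEX idx_user_opzak_nummer (user_id, opzak, opzak_nummer);

-- Re-run the original query under EXPLAIN to see whether the index is used.
EXPLAIN
SELECT *
FROM player_data
WHERE `user_id` = '61120'
  AND `opzak` = 'ja'
ORDER BY opzak_nummer ASC;
```

If the index is used as intended, the Extra column should no longer report "Using filesort", since the rows come out of the index already ordered by opzak_nummer.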