MySQL Index sometimes not being used - mysql

I have a table with 150k rows of data, and I have column with a UNIQUE INDEX, It has a type of VARCHAR(10) and stores 10 digit account numbers.
Now whenever I query, like a simple one:
SELECT * FROM table WHERE account_number LIKE '0103%'
It results 30,000+ ROWS, and when I run a EXPLAIN on my query It shows no INDEX is used.
But when I do:
SELECT * FROM table WHERE account_number LIKE '0104%'
It results 4,000+ ROWS, with the INDEX used.
Anyone can explain this?
I'm using MySQL 5.7 Percona XtraDB.

30k+/150k > 20% and I guess it is faster to do table scan. From 8.2.1.19 Avoiding Full Table Scans:
The output from EXPLAIN shows ALL in the type column when MySQL uses a full table scan to resolve a query. This usually happens under the following conditions:
You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
If you don't need all values try to use:
SELECT account_number FROM table WHERE account_number LIKE '0103%'
instead of SELECT *. Then your index will become covering index and optimizer should always use it (as long as WHERE condition is SARGable).

The most database uses B tree for indexing. In this case the database optimizer don't use the index because its faster to scan without index. Like #lad2025 explained.
Your database column is unique and i think your cardinality of your index is high. But since your query using the like filter the database optimizer decides for you to choose not to use the index.
You can use try force index to see the result. Your using varchar with unique index. I would choose another data type or change your index type. If your table only contains numbers change it to numbers. This will help to optimize you query a lot.
In some cases when you have to use like you can use full text index.
If you need help with optimizing your query and table. Provide us more info and which info you want to fetch from your table.

lad2025 is correct. The database is attempting to make an intelligent optimization.
Benchmark with:
SELECT * FROM table FORCE INDEX(table_index) WHERE account_number LIKE '0103%'
and see who is smarter :-) You can always try your hand at questioning the optimizer. That's what index hints are for...
https://dev.mysql.com/doc/refman/5.7/en/index-hints.html

Related

MySQL how to index a query that searches for a substring in column while filtering integer columns

I have a table with a billion+ rows. I have have the below query which I frequently execute:
SELECT SUM(price) FROM mytable WHERE domain IN ('com') AND url LIKE '%/shop%' AND date BETWEEN '2001-01-01' AND '2007-01-01';
Where domain is varchar(10) and url is varchar(255) and price is float. I understand that any query with %..% will not use any index. So logically, I created an index on price domain and date:
create index price_date on mytable(price, domain, date)
The problem here persists, this index is also not used because query contains: url LIKE '%.com/shop%'
On the other hand a FULLTEXT index still will not work since I have other non text filters in the query.
How can I optimise the above query? I have too many rows not to use an index.
UPDATE
Is this an sql limit? could such a query provide better performance on a noSQL database?
You have two range conditions, one uses IN() and the other uses BETWEEN. The best you can hope is that the condition on the first column of the index uses the index to examine rows, and the condition on the second column of the index uses index condition pushdown to make the storage engine do some pre-filtering.
Then it's up to you to choose which column should be the first column in the index, based on how well each condition would narrow down the search. If your condition on date is more likely to reduce the set of examined rows, then put that first in the index definition.
The order of terms in the WHERE clause does not have to match the order of columns in the index.
MySQL does not support optimizing with both a fulltext index and a B-tree index on the same table reference in the same query.
You can't use a fulltext index anyway for the pattern you are searching for. Fulltext indexes don't allow searches for punctuation characters, only words.
I vote for this order:
INDEX(domain, -- first because of "="
date, -- then range
url, price) -- "covering"
but, since the constants look like most of the billion rows would be hit, I don't expect good performance.
If this is a common query and/or "shop" is one of only a few possible filters, we can discuss whether a summary table would be useful.

Index on mysql partitioned tables

I have a table with two partitions. Partitions are pactive = 1 and pinactive = 0. I understand that two partitions does not make so much of a gain, but I have used it to truncate and load in one partition and plain inserts in another partition.
The problem comes when I create indexes.
Query goes this way
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
Created index -
create index idx_try on customformattributes(partitionflag,companyid,activityname,completiondate,attributename,isclosed)
there are around 200000 records that will be retreived from the above query. But the query along with the mentioned index takes 30+ seconds. What is the reason for such a long time? Also, if remove the partitionflag from the mentioned index, the index is not even used.
And is the understanding that,
Even with the partitions available, the optimizer needs to have the required partition mentioned in the index definition, so that it only hits the required partition ---- Correct?
Any ideas on understanding this would be very helpful
You can optimize your index by reordering the columns in it. Usually the columns in the index are ordered by its cardinality (starting from the highest and go down to the lowest). Cardinality is the uniqueness of data in the given column. So in your case I suppose there are many variations of companyid in customformattributes table while partitionflag will have cardinality of 2 (if all the options for this column are 1 and 0).
Your query will first filter all the rows with partitionflag=0, then it will filter by company id and so on.
When you remove partitionflag from the index the query did not used the index because may be the optimizer decides that it will be faster to make full table scan instead of using the index (in most of the cases the optimizer is right)
For the given query:
select partitionflag,companyid,activityname
from customformattributes
where companyid=47
and activityname = 'Activity 1'
and partitionflag=0
the following index may be would be better (but of course :
create index idx_try on customformattributes(companyid,activityname, completiondate,attributename, partitionflag, isclosed)
For the query to use index the following rule must be met - the left most column in the index should be present in the where clause ... and depending on the mysql version you are using additional query requirements may be needed. For example if you are using old version of mysql - you may need to order the columns in the where clause in the same order they are listed in the index. In the last versions of mysql the query optimizer is responsible for ordering the columns in the where clause in the correct order.
Your SELECT query took 30+ seconds because it returns 200k rows and because the index might not be the optimal for the given query.
For the second question about the partitioning: the common rule is that the column you are partitioning by must be part of all the UNIQUE keys in a table (Primary key is also unique key by definition so the column should be added to the PK also). If table structure and logic allows you to add the partitioning column to all the UNIQUE indexes in the table then you add it and partition the table.
When the partitioning is made correctly you can take the advantage of partitioning pruning - this is when SELECT query searches the data only in the partitions where given data is stored (otherwise it looks in all partitions)
You can read more about partitioning here:
https://dev.mysql.com/doc/refman/5.6/en/partitioning-overview.html
The query is slow simply because disks are slow.
Cardinality is not important when designing an index.
The optimal index for that query is
INDEX(companyid, activityname, partitionflag) -- in any order
It is "covering" since it includes all the columns mentioned anywhere in the SELECT. This is indicated by "Using index" in the EXPLAIN.
Leaving off the other 3 columns makes the query faster because it will have to read less off the disk.
If you make any changes to the query (add columns, change from '=' to '>', add ORDER BY, etc), then the index may no longer be optimal.
"Also, if remove the partitionflag from the mentioned index, the index is not even used." -- That is because it was no longer "covering".
Keep in mind that there are two ways an index may be used -- "covering" versus being a way to look up the data. When you don't have a "covering" index, the optimizer chooses between using the index and bouncing between the index and the data versus simply ignoring the index and scanning the table.

Performance LIKE including wildcards (%)

If I execute this query:
SELECT * FROM table1 WHERE name LIKE '%girl%'
It returns all records where name contains 'girl'. However, because of the first wildcard % in the LIKE statment, it cannot (or does not) use indexes as stated here: Mysql Improve Search Performance with wildcards (%%)
Then I changed the query to:
SELECT * FROM table1 WHERE name LIKE 'girl%' OR name LIKE '%girl%'
On the leftside of the OR I removed the wildcard so it can use indexes. But the performance win depends on how MySQL evaluates the query.
Hence my question: Does the performance of my query increases when I add the OR statement?
No, the performance will be the same. MySQL still has to evaluate the first condition (LIKE '%girl%') because of the OR. Then it can evaluate the second condition using index. You can see this info when you EXPLAIN your query (mysql will show that it stills needs to do a full table scan, which means check each row):
EXPLAIN SELECT * FROM table1 WHERE name LIKE 'girl%' OR name LIKE '%girl%'
For better performance for these kinds of queries you would need to use Fulltext indexes and special syntax for querying them. But FT indexes behave different and are not suited for everything.
(This answer provides a summary of the comments, plus contradicts some of the previous notes.)
Leading wildcard:
SELECT * FROM table1 WHERE name LIKE 'girl%' OR name LIKE '%girl%'
SELECT * FROM table1 WHERE name LIKE '%girl%'
Either of those will do a table scan and ignore any indexes. This both because of the leading wild card and the OR. (It will not use the index for 'girl%', contrary to what #Marki555 says -- it's not worth the extra effort.)
Range query via LIKE (no leading wildcard):
SELECT * FROM table1 WHERE name LIKE 'girl%'
will probably use INDEX(name) in the following way:
Drill down the BTree for that index to the first name starting with "girl";
Scan forward (in the index) until the last row starting with "girl";
For each item in step 2, reach over into the data to get *.
Since Step 3 can be costly, the optimizer first estimates how many rows will need to be touched in Step 2. If more than 20% (approx) of the table, it will revert to a table scan. (Hence, my use of "probably".)
"Covering index":
SELECT name FROM table1 WHERE name LIKE '%girl%'
This will always use INDEX(name). That is because the index "covers". That is, all the columns in the SELECT are found in the INDEX. Since an INDEX looks and feels like a table, scanning the index is the best way to do the query. Since an index is usually smaller than the table, an index scan is usually faster than a table scan.
Here's a less obvious "covering index", but it applies only to InnoDB:
PRIMARY KEY(id)
INDEX(name)
SELECT id FROM table1 WHERE name LIKE '%girl%'
Every secondary key (name) in InnoDB implicitly includes the PK (id). Hence the index looks like (name, id). Hence all the columns in the SELECT are in the index. Hence it is a "covering index". Hence it will use the index and do an "index scan".
A "covering index" is indicated by Using index showing up in the EXPLAIN SELECT ....

Use an index on a join without "where"

I'm trying to understand if it's possible to use an index on a join if there is no limiting where on the first table.
Note: this is not a line-by-line real-case usage, just a thing I draft together for understanding purposes. Don't point out the obvious "what are your trying to obtain with this schema?", "you should use UNSIGNED" or the likes because that's not the question.
Note2: this MySQL JOINS without where clause is somehow related but not the same
Schema:
CREATE TABLE posts (
id_post INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
text VARCHAR(100)
);
CREATE TABLE related (
id_relation INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
id_post1 INT NOT NULL,
id_post2 INT NOT NULL
);
CREATE INDEX related_join_index ON related(id_post1) using BTREE;
Query:
EXPLAIN SELECT * FROM posts FORCE INDEX FOR JOIN(PRIMARY) INNER JOIN related ON id_post=id_post1 LIMIT 0,10;
SQL Fiddle: http://sqlfiddle.com/#!2/84597/3
As you can see, the index is being used on the second table, but the engine is doing a full table scan on the first one (the FORCE INDEX is there just to highlight the general question).
I'd like to understand if it's possible to get a "ref" on the left side too.
Thanks!
Update: if the first table has significantly more record than the second, the thing swap: the engine uses an index for the first one and a full table scan for the second http://sqlfiddle.com/#!2/3a3bb/1 Still, no way to get indexes used on both.
The DBMS has an optimizer to figure out the best plan to execute a query. It's up to the optimizer to decide whether to use an index or simply read the table directly.
An index makes sense when the DBMS expects only few records to read from a table (say 1% of all rows only). But once it expects to read many records (say 99% of all rows) it will not use the index. The threshold may lie at low as 5% (i.e. <= 5% -> index; > 5% table scan).
There are exceptions. One is when an index holds all columns needed. Then the table itself doesn't have to be read at all. Another may be when the optimizer thinks an index access may result faster in spite of having to read many rows. It's also always possible the optimizer simply guesses wrong.
There is a page on the MySQL documentation about this subject.
Regarding the possibility to get a ref on the first table from the query, the short answer is NO.
The reason is obvious: because there is no WHERE clause ALL the rows from table posts are analyzed because they could be included in the result set. There is no reason to use an index for that, a full table scan is better because it gets all the rows; and because the order doesn't matter, the access is (more or less) sequential. Using an index requires reading more information from the storage (index and data).
MySQL will use the join type index if all the columns that appear in the SELECT clause are present in an index. In this case MySQL will perform a full index scan (join type index) instead of a full table scan (join type ALL) because it requires reading less information from the storage (an index is usually smaller than the entire table data).

Use an index for a string column

If I have a query with ordering by a string column, like this...
SELECT * FROM foo ORDER BY name
...should I create an index for foo.name? (foo.name may be VARCHAR(255) or VARCHAR(400)
Obviously, you should create an index on name.
If you run queries with order_by by a column or where conditions by a columns, then those columns should be indexed.
It will increase the speed in which you get the result from any database.
But indexing should be used with caution. Too much of indexing may slow up your database.
You should index those columns which are searched frequently or ordered frequently.
Doesn't seem to affect, in MySQL with my test table. My test table is small, though. Explain revealed that Mysql has to do file sort in both the cases; with and without index.
Itz better to check your query with explain to confirm the same.