MySQL query not using the index I want

I have the following query, which left joins 2 tables:
explain
select
n.* from npi n left join npi_taxonomy nt on n.NPI=nt.NPI_CODE
where
n.Provider_First_Name like '%s%' and
n.Provider_Last_Name like '%b%' and
n.Provider_Business_Practice_Location_Address_State_Name = 'SC' and
n.Provider_Business_Practice_Location_Address_City_Name = 'charleston' and
n.Provider_Business_Practice_Location_Address_Postal_Code in (29001,29003,29010,29016,29018,29020,29030,29032,29033,29038,29039,29040,29041,29042,29044,29045,29046,29047,29048,29051,29052,29053,29056,29059,29061,29062,29069,29071,29072,29073,29078,29079,29080,29081,29082,29102,29104,29107,29111,29112,29113,29114,29115,29116,29117,29118,29123,29125,29128,29133,29135,29137,29142,29143,29146,29147,29148,29150,29151,29152,29153,29154,29160,29161,29162,29163,29164,29168,29169,29170,29171,29172,29201,29202,29203,29204,29205,29206,29207,29208,29209,29210,29212,29214,29215,29216,29217,29218,29219,29220,29221,29222,29223,29224,29225,29226,29227,29228,29229,29230,29240,29250,29260,29290,29292,29401,29402,29403,29404,29405,29406,29407,29409) and
n.Entity_Type_Code = 1 and
nt.Healthcare_Provider_Taxonomy_Code in ('101Y00000X')
limit 0,10;
I have added a multi-column index,
npi_fname_lname_state_city_zip_entity, on the table npi. It indexes the columns in the following order:
NPI,
Provider_First_Name,
Provider_Last_Name,
Provider_Business_Practice_Location_Address_State_Name, Provider_Business_Practice_Location_Address_City_Name, Provider_Business_Practice_Location_Address_Postal_Code,
Entity_Type_Code
However, when I run EXPLAIN on the query, it shows that it uses the primary index (NPI). It also says rows examined = 1.
What's worse, the query takes roughly 120 seconds to execute. How do I optimize this?
I would really appreciate some help with this.

Your multi-column index doesn't help because you are filtering with a leading wildcard, as in like '%s%'.
An index can only be used for filtering on a leftmost prefix of its columns, which means that 1) it cannot serve a "contains" search, and 2) if the leftmost column of a multi-column index cannot be used, the other columns in the index cannot be used either.
You should switch the order of the columns in the index to
Provider_Business_Practice_Location_Address_State_Name,
Provider_Business_Practice_Location_Address_City_Name,
Provider_Business_Practice_Location_Address_Postal_Code,
Entity_Type_Code
That way MySQL will only scan the rows that match the criteria for those columns (SC, charleston, etc.).
Alternatively, look into full text indexes.
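As a rough sketch of the reordering suggested above (the index name is made up for illustration, and the column types are unknown, so the key length may need adjusting):

```sql
-- Hypothetical index name; the column order follows the answer above.
ALTER TABLE npi
  ADD INDEX npi_state_city_zip_entity (
    Provider_Business_Practice_Location_Address_State_Name,
    Provider_Business_Practice_Location_Address_City_Name,
    Provider_Business_Practice_Location_Address_Postal_Code,
    Entity_Type_Code
  );

-- Re-check the plan afterwards; ideally the `key` column for table n
-- names the new index instead of PRIMARY.
-- The postal codes are quoted here on the ASSUMPTION that the column
-- is a string type; use unquoted numbers if it is numeric.
EXPLAIN
SELECT n.*
FROM npi n
LEFT JOIN npi_taxonomy nt ON n.NPI = nt.NPI_CODE
WHERE n.Provider_First_Name LIKE '%s%'
  AND n.Provider_Last_Name LIKE '%b%'
  AND n.Provider_Business_Practice_Location_Address_State_Name = 'SC'
  AND n.Provider_Business_Practice_Location_Address_City_Name = 'charleston'
  AND n.Provider_Business_Practice_Location_Address_Postal_Code IN ('29401', '29403')
  AND n.Entity_Type_Code = 1
  AND nt.Healthcare_Provider_Taxonomy_Code IN ('101Y00000X')
LIMIT 0, 10;
```

The two name filters still have to be checked row by row, but only on the rows that survive the state, city, postal code and entity type conditions.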

Related

Should I create separate MySQL indexes for url_title vs url_title, url_description, url_keywords?

Using MySQL 5.7, I have a table of urls containing url_title, url_description, url_keywords
Sometimes I just need to look in url_title, but sometimes look for something in all columns.
Is it better to create just one index containing all 3 columns, or to create a separate index for url_title alone and another index containing all 3 columns?
E.g., will a search on url_title be slower with the 3-column index than with a single-column index?
Or can MySQL search only the given column even if the index contains 3 columns?
Later edit: this is a sample query but I do have other less important variations:
SELECT *
FROM urls
WHERE match(url_title, url_description,
url_keywords, url_paragraphs)
against('red boots' IN BOOLEAN MODE)
LIMIT 500
Update: You didn't mention in your original post that you were talking about fulltext indexes, not conventional B-tree indexes.
Fulltext indexes are a different type. You must specify ALL the columns of the fulltext index in your MATCH() clause. No fewer, and no more, and they must be in the same order as they appear in the index definition.
If you sometimes want to do a fulltext search on only a single column, then you will have to create another fulltext index on just that column.
Below is my original answer, which I wrote before you clarified that you were using a fulltext index. Perhaps it will help someone else.
MySQL can use the index if the column(s) you search are the leftmost column(s) of that index. It can use a subset of the columns of a multi-column index.
For example, given an index on (a, b, c), the following query uses all three columns:
SELECT ... WHERE a = ? AND b = ? AND c = ?
The following query uses the first column a of the index, because it's the leftmost column.
SELECT ... WHERE a = ?
The following query uses the first two columns of the index, because they're consecutive and the leftmost subset of columns.
SELECT ... WHERE a = ? AND b = ?
The following query uses only the first column a of the index, because the conditions don't match consecutive columns of the index. It will use the index to narrow down the search to rows matching the a condition, but then it will have to examine each of those rows to evaluate the c condition, even though c is part of the same index.
SELECT ... WHERE a = ? AND c = ?
MySQL has an optimization called index condition pushdown that provides a shortcut for this. It delegates evaluation of the c condition to the storage engine, knowing that c is part of the index. It still counts as examining the row, but it makes the row read a little less costly.
The following query cannot use the index at all, because the conditions are not on leftmost columns of that index.
SELECT ... WHERE b = ? AND c = ?
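To see the leftmost-prefix rule on your own server, a small experiment along these lines may help. The table t, its columns and the index name are hypothetical, and on an empty or tiny table the optimizer may choose a different plan than on real data:

```sql
-- Hypothetical table used only to illustrate the leftmost-prefix rule.
CREATE TABLE t (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  payload VARCHAR(100),
  INDEX idx_abc (a, b, c)
);

-- Compare the `key` and `key_len` columns of each plan.
-- Each NOT NULL INT column used from the index adds 4 bytes to key_len.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;  -- a, b and c
EXPLAIN SELECT * FROM t WHERE a = 1;                      -- a only
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;            -- a and b
EXPLAIN SELECT * FROM t WHERE a = 1 AND c = 3;            -- a only; c is filtered afterwards
EXPLAIN SELECT * FROM t WHERE b = 2 AND c = 3;            -- no leftmost column to seek on
```

For the fourth query, "Using index condition" in the Extra column is how EXPLAIN reports index condition pushdown.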
The guidelines for FULLTEXT indexes and MATCH...AGAINST are different than for INDEX. For this:
SELECT *
FROM urls
WHERE match(url_title, url_description,
url_keywords, url_paragraphs)
against('red boots' IN BOOLEAN MODE)
LIMIT 500
(and assuming ENGINE=InnoDB), you need a FULLTEXT index with all 4 columns in it.
FULLTEXT(url_title, url_description,
url_keywords, url_paragraphs)
If you might also be searching, say, just url_title in another query, then you would also need FULLTEXT(url_title). (Etc)
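A minimal sketch of the two index definitions, assuming InnoDB; the index names are made up for illustration. The statements are kept separate because InnoDB builds only one FULLTEXT index per ALTER TABLE statement:

```sql
-- Hypothetical index names; the column lists come from the query above.
ALTER TABLE urls
  ADD FULLTEXT INDEX ft_urls_all (url_title, url_description,
                                  url_keywords, url_paragraphs);

ALTER TABLE urls
  ADD FULLTEXT INDEX ft_urls_title (url_title);

-- Served by ft_urls_all: the MATCH() list equals that index's column list.
SELECT *
FROM urls
WHERE MATCH(url_title, url_description, url_keywords, url_paragraphs)
      AGAINST('red boots' IN BOOLEAN MODE)
LIMIT 500;

-- Served by ft_urls_title.
SELECT *
FROM urls
WHERE MATCH(url_title) AGAINST('red boots' IN BOOLEAN MODE)
LIMIT 500;
```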
See if either of these would be 'better' for your application:
against('+red +boots' IN BOOLEAN MODE)
against('red boots')

MySQL not using the right index

I have a framework that generates SQL. One of the queries uses my index "A" and returns results in 7 seconds. I saw that I could optimize this, so I created an index "B".
Now if I run EXPLAIN on my query, it still uses index A. However, if I force the use of index B, I get my results in 1 second (7x faster).
So clearly my index B is faster than my index A. I can't use FORCE INDEX or USE INDEX because my SQL is generated by a framework that does not support them.
So why is MySQL not naturally using the fastest index? And is there a way to tell MySQL to always use a certain index without adding USE or FORCE?
The query:
SELECT *
FROM soumission
LEFT OUTER JOIN region_administrative
ON soumission.region_administrative_oid=region_administrative.oid
WHERE (soumission.statut=2
AND ((soumission.telephone LIKE '%007195155134070067132211046052045128049212213255%'
OR (soumission.autre_telephone LIKE '%007195155134070067132211046052045128049212213255%'))
OR (soumission.cellulaire LIKE '%007195155134070067132211046052045128049212213255%')))
ORDER BY soumission.date_confirmation DESC, soumission.numero;
I added an index on multiple columns: "statut", "telephone", "autre_telephone", "cellulaire".
If I force this index, my query is 7x faster, but if I don't specify which index to use, it uses another index (only on the statut field), which is 7x slower.
Here is the EXPLAIN when I select a large date period (using the wrong index):
Here is the EXPLAIN when I select a small date window:
This seems to be what you are doing...
SELECT s.*, ra.*
FROM soumission AS s
LEFT OUTER JOIN region_administrative AS ra ON s.region_administrative_oid=ra.oid
WHERE s.statut = 2
AND ( s.telephone LIKE '%007195155134070067132211046052045128049212213255%'
OR s.autre_telephone LIKE '%007195155134070067132211046052045128049212213255%'
OR s.cellulaire LIKE '%007195155134070067132211046052045128049212213255%'
)
ORDER BY s.date_confirmation DESC, s.numero;
If you don't need ra.*, get rid of the LEFT JOIN.
The multi-column index you propose is useless and won't be used unless... statut = 2 for less than 20% of the rows. In that case, it will only use the first column of the index.
OR defeats indexing. (See below)
Leading wildcard on LIKE defeats indexing. Do you need the leading or trailing wild cards?
The mixing of DESC and ASC in the ORDER BY defeats using an index to avoid sorting.
So, what to do? Instead of having 3 columns for exactly 3 phone numbers, have another table for phone numbers. Then have any number of rows for a given soumission. Searching that table may then be faster because it avoids OR -- but only if you get rid of the leading wildcard.
(That's an awfully long phone number! Is it real?)
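A possible shape for that separate phone table, as a sketch only. The table name, its columns and the assumption that soumission has a primary key called oid are all hypothetical:

```sql
-- Hypothetical table and column names; soumission.oid is an ASSUMED primary key.
CREATE TABLE soumission_telephone (
  soumission_oid   INT NOT NULL,
  numero_telephone VARCHAR(64) NOT NULL,
  PRIMARY KEY (soumission_oid, numero_telephone),
  INDEX idx_numero_telephone (numero_telephone)
);

-- One condition on one indexed column replaces the three OR'ed LIKEs.
-- There is no leading wildcard, so idx_numero_telephone can be used.
SELECT s.*
FROM soumission AS s
WHERE s.statut = 2
  AND EXISTS (
    SELECT 1
    FROM soumission_telephone AS t
    WHERE t.soumission_oid = s.oid
      AND t.numero_telephone LIKE '007195155134070067132211046052045128049212213255%'
  )
ORDER BY s.date_confirmation DESC, s.numero;
```

EXISTS is used rather than a plain JOIN so that a soumission with several matching numbers is returned only once.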
As to the query itself:
Try to avoid the leading LIKE wildcard (removed in the query below).
Split the query into several parts, combined with UNION, so that indexes can be used.
So, create these indexes:
ALTER TABLE `region_administrative` ADD INDEX `region_administrativ_idx_oid` (`oid`);
ALTER TABLE `soumission` ADD INDEX `soumission_idx_statut_oid_cellulaire` (`statut`,`region_administrative_oid`,`cellulaire`);
ALTER TABLE `soumission` ADD INDEX `soumission_idx_statut_oid_autre_telephone` (`statut`,`region_administrative_oid`,`autre_telephone`);
ALTER TABLE `soumission` ADD INDEX `soumission_idx_statut_oid_telephone` (`statut`,`region_administrative_oid`,`telephone`);
Then try this query:
SELECT *
FROM (
    (SELECT *
     FROM soumission
     LEFT OUTER JOIN region_administrative
         ON soumission.region_administrative_oid = region_administrative.oid
     WHERE soumission.statut = 2
       AND soumission.cellulaire LIKE '007195155134070067132211046052045128049212213255%'
     ORDER BY soumission.date_confirmation DESC, soumission.numero)
    UNION DISTINCT
    (SELECT *
     FROM soumission
     LEFT OUTER JOIN region_administrative
         ON soumission.region_administrative_oid = region_administrative.oid
     WHERE soumission.statut = 2
       AND soumission.autre_telephone LIKE '007195155134070067132211046052045128049212213255%'
     ORDER BY soumission.date_confirmation DESC, soumission.numero)
    UNION DISTINCT
    (SELECT *
     FROM soumission
     LEFT OUTER JOIN region_administrative
         ON soumission.region_administrative_oid = region_administrative.oid
     WHERE soumission.statut = 2
       AND soumission.telephone LIKE '007195155134070067132211046052045128049212213255%'
     ORDER BY soumission.date_confirmation DESC, soumission.numero)
) AS union1
ORDER BY union1.date_confirmation DESC, union1.numero

Optimizing MySQL Left join query between 3 tables to reduce execution time

I have the following query:
SELECT region.id, region.world_id, min_x, min_y, min_z, max_x, max_y, max_z, version, mint_version
FROM minecraft_worldguard.region
LEFT JOIN minecraft_worldguard.region_cuboid
ON region.id = region_cuboid.region_id
AND region.world_id = region_cuboid.world_id
LEFT JOIN minecraft_srvr.lot_version
ON id=lot
WHERE region.world_id = 10
AND region_cuboid.world_id=10;
The MySQL slow query log tells me that it takes more than 5 seconds to execute and returns 2,300 rows, but examines 15,404,545 rows to do so.
The three tables each have only about 6,500 rows, with unique keys on the id and lot fields as well as keys on the world_id fields. I tried to minimize the number of rows examined by joining on both the region ID and the world ID and by filtering on world_id twice in the WHERE clause, but it did not seem to help.
Any idea how I can optimize this query?
Here is the sqlfiddle with the indexes as of current status.
MySQL can't use an index for the join to lot_version in this case because the joined fields have different types and collations:
`lot` varchar(20) COLLATE utf8_unicode_ci NOT NULL
`id` varchar(128) COLLATE utf8_bin NOT NULL
If you change these fields to a common type and collation (for example, region.id to utf8_unicode_ci), MySQL uses the primary key (fiddle).
According to docs:
Comparison of dissimilar columns (comparing a string column to a
temporal or numeric column, for example) may prevent use of indexes if
values cannot be compared directly without conversion.
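A sketch of the collation change described above, assuming the character set is utf8 (implied by the utf8_bin and utf8_unicode_ci collations). Only one of the two directions is needed. Note that utf8_unicode_ci compares case-insensitively, so check for ids that differ only by case first, and that columns joined to region.id elsewhere (such as region_cuboid.region_id) may need the same change:

```sql
-- Option 1: make region.id use the same collation as lot_version.lot.
ALTER TABLE minecraft_worldguard.region
  MODIFY id VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;

-- Option 2 (alternative, not in addition): keep region.id as utf8_bin
-- and change lot_version.lot instead.
-- ALTER TABLE minecraft_srvr.lot_version
--   MODIFY lot VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL;
```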
You have joined the two tables minecraft_worldguard.region and minecraft_worldguard.region_cuboid on region.world_id and region_cuboid.world_id, so the WHERE clause doesn't need both conditions.
Because the JOIN condition already equates the two columns, checking both in the WHERE clause is unnecessary. Remove one of them and add an index on the column that remains in the WHERE clause.
In your example, leave the WHERE clause as below:
WHERE region.world_id = 10
and add an index on the region.world_id column; that should improve performance a bit.
NOTE: I am suggesting that you discard the "AND region_cuboid.world_id=10" part of the WHERE clause. Be aware that with a LEFT JOIN that condition also filters out regions that have no matching region_cuboid row, so the results differ if such regions exist.
Hope that helps.
First, when writing queries that involve multiple tables, it is a very good habit to use table aliases so you don't have to retype the full long names throughout. It is also a really good idea to qualify each column with its table, so readers can see what comes from where; that also makes it easier to suggest performance improvements (such as a covering index).
That said, I have applied aliases to your original query. I AM GUESSING which table each column belongs to, but you can quickly check and adjust.
SELECT
R.id,
R.world_id,
RC.min_x,
RC.min_y,
RC.min_z,
RC.max_x,
RC.max_y,
RC.max_z,
LV.version,
LV.mint_version
FROM
minecraft_worldguard.region R
LEFT JOIN minecraft_worldguard.region_cuboid RC
ON R.id = RC.region_id
AND R.world_id = RC.world_id
LEFT JOIN minecraft_srvr.lot_version LV
ON R.id = LV.lot
WHERE
R.world_id = 10
I also removed "region_cuboid.world_id = 10" from your WHERE clause, as it is redundant given the JOIN condition on both region AND world.
As for indexes, and assuming I have the proper alias references for the columns, I would suggest a covering index on the region table of
( world_id, id ). The world_id in the first position quickly satisfies the WHERE clause, and the id is there for the joins to the RC and LV tables.
For the region_cuboid table, I would also have an index on ( world_id, region_id ) to match the region table being joined to it.
For the lot_version table, an index on ( lot ) or a covering index on ( lot, version, mint_version ).
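Written out as statements, the suggested indexes might look like this. The index names are invented for illustration, and the column-to-table mapping is the same guess as in the aliased query; some of these may duplicate keys that already exist on the tables:

```sql
-- Hypothetical index names; skip any that duplicate an existing key.
CREATE INDEX idx_region_world_id
  ON minecraft_worldguard.region (world_id, id);

CREATE INDEX idx_region_cuboid_world_region
  ON minecraft_worldguard.region_cuboid (world_id, region_id);

-- Covering variant for lot_version: the join column first, then the
-- two columns the query selects.
CREATE INDEX idx_lot_version_covering
  ON minecraft_srvr.lot_version (lot, version, mint_version);
```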

Best way to use indexes on large mysql like query

This MySQL query is run on a large MyISAM table (about 200,000 records, 41 columns):
select t1.* from table t1 where 1 and t1.inactive = '0' and (t1.code like '%searchtext%' or t1.name like '%searchtext%' or t1.ext like '%searchtext%' ) order by t1.id desc LIMIT 0, 15
id is the primary index.
I tried adding a multi-column index on all 3 searched (LIKE) columns. It works OK, but the results feed an auto-filled AJAX table on a website, and the 2-second delay is a bit too slow.
I also tried adding separate indexes on all 3 columns and a fulltext index on all 3 columns, without significant improvement.
What would be the best way to optimize this type of query? I would like to achieve sub-second performance. Is it doable?
The best thing you can do is implement paging. No matter what you do, that IO cost is going to be huge. If you only return one page of records (10, 25, or whatever), that will help a lot.
As for the index, you need to check the plan to see whether your index is actually being used. A full-text index might help, but that depends on how many rows you return and what you pass in. Wildcards such as % really drain performance. An index can still be used if the pattern ends with %, but not if it starts with %. If you put % on both sides of the search text, indexes can't help much.
You can create a full-text index that covers the three columns: code, name, and ext. Then perform a full-text query using the MATCH() AGAINST () function:
select t1.*
from table t1
where match(code, name, ext) against ('searchtext')
order by t1.id desc
limit 0, 15
If you omit the ORDER BY clause the rows are sorted by default using the MATCH function result relevance value. For more information read the Full-Text Search Functions documentation.
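A sketch of the index definition that the MATCH() query relies on. The table name my_table and the index name are placeholders. Keep in mind that full-text search matches whole words (or word prefixes in boolean mode), not arbitrary substrings like '%searchtext%', and that MyISAM ignores words shorter than ft_min_word_len (4 by default):

```sql
-- Placeholder table and index names.
ALTER TABLE my_table
  ADD FULLTEXT INDEX ft_code_name_ext (code, name, ext);

-- Boolean mode with a trailing * matches words that START with the term,
-- which is often what an autocomplete box needs.
SELECT t1.*
FROM my_table t1
WHERE t1.inactive = '0'
  AND MATCH(code, name, ext) AGAINST ('searchtext*' IN BOOLEAN MODE)
ORDER BY t1.id DESC
LIMIT 0, 15;
```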
As @Vulcronos notes, the query optimizer is not able to use the index when the LIKE operator is used with an expression that starts with a wildcard %.

I'm not sure if I have the correct indexes or if I can improve the speed of my query in MySQL?

My query has a join, and it looks like it's using two indexes which makes it more complicated. I'm not sure if I can improve on this, but I thought I'd ask.
The query produces a list of records with keywords similar to those of the record being queried.
Here's my query.
SELECT match_keywords.padid,
COUNT(match_keywords.word) AS matching_words
FROM keywords current_program_keywords
INNER JOIN keywords match_keywords
ON match_keywords.word = current_program_keywords.word
WHERE match_keywords.word IS NOT NULL
AND current_program_keywords.padid = 25695
GROUP BY match_keywords.padid
ORDER BY matching_words DESC
LIMIT 0, 11
The EXPLAIN
Word is varchar(40).
You can start by removing the IS NOT NULL test: the equality join never matches NULL words, and COUNT on the field skips NULLs anyway. It also looks like you would want to exclude 25695 from match_keywords; otherwise 25695 itself would surely show up as the "best" match within your 11-row limit.
SELECT match_keywords.padid,
COUNT(match_keywords.word) AS matching_words
FROM keywords current_program_keywords
INNER JOIN keywords match_keywords
ON match_keywords.word = current_program_keywords.word
WHERE current_program_keywords.padid = 25695
GROUP BY match_keywords.padid
ORDER BY matching_words DESC
LIMIT 0, 11
Next, consider how you would do it as a person.
You would start with a padid (25695) and retrieve all the words for that padid
From that list of words, go back into the table again and, for each matching word,
get the padids (assuming no duplicates on padid + word)
group the padids together and count them
order the counts and return the highest 11
With your 3 separate single-column indexes, the first two steps (both involve only 2 columns) will always have to jump from the index back to the data to get the other column. Covering indexes may help here. Create two composite indexes to test:
create index ix_keyword_pw on keywords(padid, word);
create index ix_keyword_wp on keywords(word, padid);
With these composite indexes in place, you can remove the single-column indexes on padid and word since they are covered by these two.
Note: You always have to temper SELECT performance against
size of indexes (the more you create the more to store)
insert/update performance (the more indexes, the longer it takes to commit since it has to update the data, then update all indexes)
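One way to check whether the composite indexes are really acting as covering indexes is to run EXPLAIN on the query and look at the Extra column. This is only a sketch of the check; the plan you get depends on your data and MySQL version:

```sql
-- "Using index" in Extra for a table means that step was answered
-- from the index alone, without reading the data rows.
EXPLAIN
SELECT match_keywords.padid,
       COUNT(match_keywords.word) AS matching_words
FROM keywords current_program_keywords
INNER JOIN keywords match_keywords
    ON match_keywords.word = current_program_keywords.word
WHERE current_program_keywords.padid = 25695
GROUP BY match_keywords.padid
ORDER BY matching_words DESC
LIMIT 0, 11;
```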
Try the following: ensure there is an index on padid and one on word. Changing the order of the WHERE qualifiers should then let the query filter on the padid of the CURRENT keyword first and join to the others afterwards. Exclude the join of the record to itself. Also, because the inner join to the matching keywords is an equality test, checking the current keyword for NULL is enough: it can never join to a NULL value, so there is no need to test every row of the match_keywords alias for NULL.
SELECT STRAIGHT_JOIN
match_keywords.padid,
COUNT(*) AS matching_words
FROM
keywords current_program_keywords
INNER JOIN keywords match_keywords
ON match_keywords.word = current_program_keywords.word
and match_keywords.padid <> 25695
WHERE
current_program_keywords.padid = 25695
AND current_program_keywords.word IS NOT NULL
GROUP BY
match_keywords.padid
ORDER BY
matching_words DESC
LIMIT
0, 11
You should index the following fields (check which table each corresponds to):
match_keywords.padid
current_program_keywords.padid
match_keywords.word
current_program_keywords.word
Hope it helps speed things up.