MySQL not using the right index

I have a framework that generates SQL. One of the queries uses my index "A" and returns results in 7 seconds. I saw that I could optimize this, so I created an index "B".
Now if I run EXPLAIN on my query, it still uses index A. However, if I force the use of index B, I get my results in 1 second (7x faster).
So clearly my index B is faster than my index A. I can't use the FORCE INDEX or USE INDEX hints, because my SQL is generated by a framework that does not support them.
So why is MySQL not naturally using the fastest index? And is there a way I can tell MySQL to always use a certain index without adding USE or FORCE?
the query :
SELECT *
FROM soumission
LEFT OUTER JOIN region_administrative
ON soumission.region_administrative_oid=region_administrative.oid
WHERE (soumission.statut=2
AND ((soumission.telephone LIKE '%007195155134070067132211046052045128049212213255%'
OR (soumission.autre_telephone LIKE '%007195155134070067132211046052045128049212213255%'))
OR (soumission.cellulaire LIKE '%007195155134070067132211046052045128049212213255%')))
ORDER BY soumission.date_confirmation DESC, soumission.numero;
I added a multi-column index on "statut", "telephone", "autre_telephone", "cellulaire".
If I force this index, my query is 7x faster, but if I don't specify which index to use, MySQL picks another index (on the statut field only), which is 7x slower.
Here is the EXPLAIN when I select a large date period (using the wrong index):
Here is the EXPLAIN when I select a small date window:

This seems to be what you are doing...
SELECT s.*, ra.*
FROM soumission AS s
LEFT OUTER JOIN region_administrative AS ra ON s.region_administrative_oid=ra.oid
WHERE s.statut = 2
AND ( s.telephone LIKE '%007195155134070067132211046052045128049212213255%'
OR s.autre_telephone LIKE '%007195155134070067132211046052045128049212213255%'
OR s.cellulaire LIKE '%007195155134070067132211046052045128049212213255%'
)
ORDER BY s.date_confirmation DESC, s.numero;
If you don't need ra.*, get rid of the LEFT JOIN.
The multi-column index you propose is useless; it won't be used unless statut = 2 holds for less than about 20% of the rows, and even then only its first column will be used.
OR defeats indexing. (See below)
Leading wildcard on LIKE defeats indexing. Do you need the leading or trailing wild cards?
The mixing of DESC and ASC in the ORDER BY defeats using an index to avoid sorting.
So, what to do? Instead of having 3 columns for exactly 3 phone numbers, have another table for phone numbers, with any number of rows for a given soumission. Searching that table may be faster because it avoids the OR -- but only if you also get rid of the leading wildcard (see the sketch below).
(That's an awfully long phone number! Is it real?)
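A minimal sketch of that separate phone-number table and the search against it (table, column, and index names here are hypothetical, and it assumes soumission has an oid primary key):

-- Hypothetical child table: one row per (soumission, phone number)
CREATE TABLE soumission_telephone (
    soumission_oid INT NOT NULL,
    telephone VARCHAR(64) NOT NULL,
    PRIMARY KEY (soumission_oid, telephone),
    INDEX idx_telephone (telephone)
);

-- The three ORs collapse into one indexable prefix search
-- (DISTINCT in case more than one of a soumission's numbers matches):
SELECT DISTINCT s.*
FROM soumission AS s
JOIN soumission_telephone AS st ON st.soumission_oid = s.oid
WHERE s.statut = 2
  AND st.telephone LIKE '007195155134070067132211046052045128049212213255%'
ORDER BY s.date_confirmation DESC, s.numero;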

As to the query itself:
Try avoiding the leading LIKE wildcard (removed in the query below).
Split the query into several parts, combined with a UNION clause, so that indexes can be used.
So, create these indexes:
ALTER TABLE `region_administrative` ADD INDEX `region_administrativ_idx_oid` (`oid`);
ALTER TABLE `soumission` ADD INDEX `soumission_idx_statut_oid_cellulaire` (`statut`,`region_administrative_oid`,`cellulaire`);
ALTER TABLE `soumission` ADD INDEX `soumission_idx_statut_oid_autre_telephone` (`statut`,`region_administrative_oid`,`autre_telephone`);
ALTER TABLE `soumission` ADD INDEX `soumission_idx_statut_oid_telephone` (`statut`,`region_administrative_oid`,`telephone`);
Then try this query:
SELECT *
FROM (
    (SELECT *
     FROM soumission
     LEFT OUTER JOIN region_administrative
         ON soumission.region_administrative_oid = region_administrative.oid
     WHERE soumission.statut = 2
       AND soumission.cellulaire LIKE '007195155134070067132211046052045128049212213255%'
     ORDER BY soumission.date_confirmation DESC, soumission.numero)
    UNION DISTINCT
    (SELECT *
     FROM soumission
     LEFT OUTER JOIN region_administrative
         ON soumission.region_administrative_oid = region_administrative.oid
     WHERE soumission.statut = 2
       AND soumission.autre_telephone LIKE '007195155134070067132211046052045128049212213255%'
     ORDER BY soumission.date_confirmation DESC, soumission.numero)
    UNION DISTINCT
    (SELECT *
     FROM soumission
     LEFT OUTER JOIN region_administrative
         ON soumission.region_administrative_oid = region_administrative.oid
     WHERE soumission.statut = 2
       AND soumission.telephone LIKE '007195155134070067132211046052045128049212213255%'
     ORDER BY soumission.date_confirmation DESC, soumission.numero)
) AS union1
ORDER BY
    union1.date_confirmation DESC,
    union1.numero;

Related

Do queries from subqueried tables get optimized?

About query optimizations, I'm wondering if statements like the one below get optimized:
select *
from (
select *
from table1 t1
join table2 t2 using (entity_id)
order by t2.sort_order, t1.name
) as foo -- main query of object
where foo.name = ?; -- inserted
Consider that the query is taken care of by a dependency object that just (rightly?) allows one to tack on a WHERE condition. I'm thinking that at least not a lot of data gets pulled into your favorite language, but I'm having second thoughts about whether that's an adequate optimization, and maybe the database is still taking some time going through the query.
Or is it better to take that query out and write a separate query method that has the WHERE and maybe a LIMIT 1 clause, too?
In MySQL, no.
The predicate in an outer query does not get "pushed" down into the inline view query.
The query in the inline view is processed first, independent of the outer query. (MySQL will optimize that view query just like it would optimize that query if you submitted that separately.)
The way that MySQL processes this query: the inline view query gets run first, and the result is materialized as a 'derived table'. That is, the result set from that query gets stored as a temporary table, in memory in some cases (if it's small enough, and doesn't contain any columns that aren't supported by the MEMORY engine). Otherwise, it's spun out to disk as a MyISAM table, using the MyISAM storage engine.
Once the derived table is populated, then the outer query runs.
(Note that the derived table does not have any indexes on it. That's true in MySQL versions before 5.6; I think there are some improvements in 5.6 where MySQL will actually create an index.)
Clarification: indexes on derived tables: As of MySQL 5.6.3 "During query execution, the optimizer may add an index to a derived table to speed up row retrieval from it." Reference: http://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html
Also, I don't think MySQL "optimizes out" any unneeded columns from the inline view. If the inline view query is a SELECT *, then all of the columns will be represented in the derived table, whether those are referenced in the outer query or not.
This can lead to some significant performance issues, especially when we don't understand how MySQL processes a statement. (And the way that MySQL processes a statement is significantly different from other relational databases, like Oracle and SQL Server.)
You may have heard a recommendation to "avoid using views in MySQL". The reasoning behind this general advice (which applies to both "stored" views and "inline" views) is the significant performance issues that can be unnecessarily introduced.
As an example, for this query:
SELECT q.name
FROM ( SELECT h.*
FROM huge_table h
) q
WHERE q.id = 42
MySQL does not "push" the predicate id=42 down into the view definition. MySQL first runs the inline view query, and essentially creates a copy of huge_table, as an un-indexed MyISAM table. Once that is done, then the outer query will scan the copy of the table, to locate the rows satisfying the predicate.
If we instead re-write the query to "push" the predicate into the view definition, like this:
SELECT q.name
FROM ( SELECT h.*
FROM huge_table h
WHERE h.id = 42
) q
We expect a much smaller resultset to be returned from the view query, and the derived table should be much smaller. MySQL will also be able to make effective use of an index ON huge_table (id). But there's still some overhead associated with materializing the derived table.
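For reference, the index mentioned there could be created like this (a minimal sketch; the index name is made up):
CREATE INDEX idx_huge_table_id ON huge_table (id);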
If we eliminate the unnecessary columns from the view definition, that can be more efficient (especially if there are a lot of columns, there are any large columns, or any columns with datatypes not supported by the MEMORY engine):
SELECT q.name
FROM ( SELECT h.name
FROM huge_table h
WHERE h.id = 42
) q
And it would be even more efficient to eliminate the inline view entirely:
SELECT q.name
FROM huge_table q
WHERE q.id = 42
I can't speak for MySQL - not to mention the fact that it probably varies by storage engine and MySQL version, but for PostgreSQL:
PostgreSQL will flatten this into a single query. The inner ORDER BY isn't a problem, because adding or removing a predicate cannot affect the ordering of the remaining rows.
It'll get flattened to:
select *
from table1 t1
join table2 t2 using (entity_id)
where t1.name = ?
order by t2.sort_order, t1.name;
then the join predicate will get internally converted, producing a plan corresponding to the SQL:
select t1.col1, t1.col2, ..., t2.col1, t2.col2, ...
from table1 t1, table2 t2
where
t1.entity_id = t2.entity_id
and t1.name = ?
order by t2.sort_order, t1.name;
Example with a simplified schema:
regress=> CREATE TABLE demo1 (id integer primary key, whatever integer not null);
CREATE TABLE
regress=> INSERT INTO demo1 (id, whatever) SELECT x, x FROM generate_series(1,100) x;
INSERT 0 100
regress=> EXPLAIN SELECT *
FROM (
SELECT *
FROM demo1
ORDER BY id
) derived
WHERE whatever % 10 = 0;
                        QUERY PLAN
-----------------------------------------------------------
 Sort  (cost=2.51..2.51 rows=1 width=8)
   Sort Key: demo1.id
   ->  Seq Scan on demo1  (cost=0.00..2.50 rows=1 width=8)
         Filter: ((whatever % 10) = 0)
 Planning time: 0.173 ms
(5 rows)
... which is the same plan as:
EXPLAIN SELECT *
FROM demo1
WHERE whatever % 10 = 0
ORDER BY id;
                        QUERY PLAN
-----------------------------------------------------------
 Sort  (cost=2.51..2.51 rows=1 width=8)
   Sort Key: id
   ->  Seq Scan on demo1  (cost=0.00..2.50 rows=1 width=8)
         Filter: ((whatever % 10) = 0)
 Planning time: 0.159 ms
(5 rows)
If there was a LIMIT, OFFSET, a window function, or certain other things that prevent qualifier push-down/pull-up/flattening in the inner query then PostgreSQL would recognise that it can't safely flatten it. It'd evaluate the inner query either by materializing it or by iterating over its output and feeding that to the outer query.
The same applies for a view. PostgreSQL will in-line and flatten views into the containing query where it is safe to do so.
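As a rough sketch of that caveat, reusing the demo1 table from above: a LIMIT in the inner query blocks the flattening, because pushing the filter inside would change which rows the LIMIT keeps, so PostgreSQL has to evaluate the subquery first and only then filter its output.
EXPLAIN SELECT *
FROM (
    SELECT *
    FROM demo1
    ORDER BY id
    LIMIT 50
) derived
WHERE whatever % 10 = 0;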

MySQL left join very slow

I have a left join:
$query = "SELECT a.`id`, a.`documenttitle`, a.`committee`, a.`issuedate`, b.`tagname`
FROM `#__document_management_documents` AS a
LEFT JOIN `#__document_managment_tags` AS b
ON a.id = b.documentid
".$tagexplode."
".$issueDateText."
AND a.committee in (".$committeeQueryTextExplode.")
AND a.documenttitle LIKE '".$documentNameFilter."%'
GROUP BY a.id ORDER BY a.documenttitle ASC
";
It's really slow, about 7 seconds on 4000 records.
Any ideas what I might be doing wrong?
SELECT a.`id`, a.`documenttitle`, a.`committee`, a.`issuedate`, b.`tagname`
FROM `w4c_document_management_documents` AS a
LEFT JOIN `document_managment_tags` AS b
ON a.id = b.documentid WHERE a.issuedate >= ''
AND a.committee in ('1','8','9','10','11','12','13','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47')
AND a.documenttitle LIKE '%' GROUP BY a.id ORDER BY a.documenttitle ASC
I would put an index on a.committee, and a full-text index on the doctitle column. The IN and LIKE are immediate flags to me. Issue date should also have an index because you are using >= on it.
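A sketch of what those indexes could look like (index names are made up; FULLTEXT needs MyISAM, or InnoDB on MySQL 5.6+):
ALTER TABLE `#__document_management_documents` ADD INDEX idx_docs_committee (committee);
ALTER TABLE `#__document_management_documents` ADD INDEX idx_docs_issuedate (issuedate);
ALTER TABLE `#__document_management_documents` ADD FULLTEXT INDEX ftx_docs_title (documenttitle);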
Try running the following commands in a MySQL client:
show index from #__document_management_documents;
show index from #__document_managment_tags;
Check to see if there are keys/indexes on the id and documentid fields of the respective tables. If there aren't, MySQL will do a full table scan to look up the values. Creating indexes on these fields makes the search time logarithmic, because the values are kept sorted in a B-tree stored in the index file. Even better is to use primary keys (if possible), because that way the row data is stored in the leaf, which saves MySQL another I/O operation to look up the data (a sketch follows below).
It could also simply be that the IN and >= operators have bad performance, in which case you might have to rewrite your queries or redesign your tables.
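For example, a minimal sketch assuming the tag table currently has no key on documentid (the index name is made up):
ALTER TABLE `#__document_managment_tags` ADD INDEX idx_tags_documentid (documentid);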
As mentioned above, try to find out whether your columns have indexes. You can also put the EXPLAIN command at the start of your query in your MySQL client to see whether the query is actually using indexes; look at the 'key' and 'Extra' columns (an example follows below). Get more information here.
This will help you optimize your query. Also, GROUP BY causes 'Using temporary; Using filesort', which makes MySQL create a temporary table and go through each row. If you could do the grouping in PHP it would be faster.
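For example, a sketch using a shortened version of the query from the question (the committee list is trimmed here):
EXPLAIN
SELECT a.id, a.documenttitle, a.committee, a.issuedate, b.tagname
FROM `#__document_management_documents` AS a
LEFT JOIN `#__document_managment_tags` AS b ON a.id = b.documentid
WHERE a.issuedate >= ''
  AND a.committee IN ('1', '8', '9')
GROUP BY a.id
ORDER BY a.documenttitle ASC;
In the output, a NULL in the key column means no index is being used for that table, and 'Using temporary; Using filesort' in Extra comes from the GROUP BY / ORDER BY.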

SQL Specific Statement Query

I am doing quite a large query to a database and for some reason it is returning many results that do not match any of my search terms. It also seems to duplicate my results so I get the same SQL item 16 times. Any ideas why?
SELECT a.*
FROM
j20e8_datsogallery AS a,
j20e8_datsogallery_tags AS t
WHERE
(a.id LIKE "%bear%" OR
a.imgtitle LIKE "%bear%" OR
a.imgtext LIKE "%bear%" OR
a.imgauthor LIKE "%bear%" OR
t.tag LIKE "%bear%")
ORDER BY a.id DESC
LIMIT 0, 16
I think it may be something to do with the LIKE '%term%' section, but I cannot get it working at all.
I'd make sure you qualify your join. Otherwise you'll end up with a full join, or worse, a Cartesian product from a cross join. Something along these lines:
SELECT a.*
FROM
j20e8_datsogallery AS a
JOIN j20e8_datsogallery_tags AS t ON a.ID = t.GalleryID
WHERE
...
ORDER BY a.id DESC
LIMIT 0, 16
Also, consider using a FULLTEXT INDEX ... it could combine all those columns into a single index, and would make searching all of them quite efficient.
A FULLTEXT INDEX in MySql can be used to 'combine' several different columns into one big pile of text, which you can then MATCH() columns AGAINST search terms.
To create a FULLTEXT INDEX, you can simply use the CREATE INDEX syntax documented here.
CREATE FULLTEXT INDEX FDX_datsogallery
ON j20e8_datsogallery ( id, imgtitle, imgtext, imgauthor )
You can then use it in a query with the MATCH() ... AGAINST statements, which are documented here:
SELECT a.*
FROM j20e8_datsogallery AS a
WHERE MATCH( id, imgtitle, imgtext, imgauthor ) AGAINST( 'bear' )
It's bringing back multiples because:
SELECT a.*
FROM j20e8_datsogallery AS a, j20e8_datsogallery_tags AS t
brings back every combination of records from the two tables on its own. So 'bear' in one table joins to every record in the other table.
You need to specify a relationship between the tables, preferably using an explicit JOIN
You have a cross join between the two tables, which means every row in a will be joined with every row in t, and as I said in my comment, you will be getting every record that has bear in one of those fields.
You should have a join condition somewhere. Then do your filtering.
Your results are a Cartesian product of your data, because you don't have a join condition. This means that it is returning every combination of matching rows from a and t.
You probably need to do something like this:
SELECT a.*
FROM
j20e8_datsogallery AS a
INNER JOIN j20e8_datsogallery_tags AS t ON a.id = t.a_id -- (or whatever the foreign key is)
WHERE
(a.id LIKE "%bear%" OR
a.imgtitle LIKE "%bear%" OR
a.imgtext LIKE "%bear%" OR
a.imgauthor LIKE "%bear%" OR
t.tag LIKE "%bear%")
ORDER BY a.id DESC
LIMIT 0, 16

MySQL query not using the index I want

I have the following query left joining 2 tables:
explain
select
n.* from npi n left join npi_taxonomy nt on n.NPI=nt.NPI_CODE
where
n.Provider_First_Name like '%s%' and
n.Provider_Last_Name like '%b%' and
n.Provider_Business_Practice_Location_Address_State_Name = 'SC' and
n.Provider_Business_Practice_Location_Address_City_Name = 'charleston' and
n.Provider_Business_Practice_Location_Address_Postal_Code in (29001,29003,29010,29016,29018,29020,29030,29032,29033,29038,29039,29040,29041,29042,29044,29045,29046,29047,29048,29051,29052,29053,29056,29059,29061,29062,29069,29071,29072,29073,29078,29079,29080,29081,29082,29102,29104,29107,29111,29112,29113,29114,29115,29116,29117,29118,29123,29125,29128,29133,29135,29137,29142,29143,29146,29147,29148,29150,29151,29152,29153,29154,29160,29161,29162,29163,29164,29168,29169,29170,29171,29172,29201,29202,29203,29204,29205,29206,29207,29208,29209,29210,29212,29214,29215,29216,29217,29218,29219,29220,29221,29222,29223,29224,29225,29226,29227,29228,29229,29230,29240,29250,29260,29290,29292,29401,29402,29403,29404,29405,29406,29407,29409) and
n.Entity_Type_Code = 1 and
nt.Healthcare_Provider_Taxonomy_Code in ('101Y00000X')
limit 0,10;
I have added a multi-column index npi_fname_lname_state_city_zip_entity on the table npi, which indexes the columns in the following order:
NPI,
Provider_First_Name,
Provider_Last_Name,
Provider_Business_Practice_Location_Address_State_Name,
Provider_Business_Practice_Location_Address_City_Name,
Provider_Business_Practice_Location_Address_Postal_Code,
Entity_Type_Code
However, when I do an EXPLAIN on the query, it shows that it uses the primary index (NPI). Also, it says rows examined = 1.
What's worse: the query takes roughly 120 seconds to execute. How do I optimize this?
I would really appreciate some help regarding this.
The reason your multi-column index doesn't help is that you are filtering with a leading wildcard, like '%s%'.
Indexes can only be used when filtering on the leftmost prefix of the index, which means that 1) you cannot do a 'contains' search, and 2) if the leftmost column of the multi-column index cannot be used, the other columns in the index cannot be used either.
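For illustration, with a hypothetical table and a composite index on (state, city, zip), only filters that start at the leftmost column can use the index:
CREATE TABLE providers (state CHAR(2), city VARCHAR(50), zip VARCHAR(10), first_name VARCHAR(50));
CREATE INDEX idx_state_city_zip ON providers (state, city, zip);

-- Can use the index (filters start at the leftmost column):
SELECT * FROM providers WHERE state = 'SC';
SELECT * FROM providers WHERE state = 'SC' AND city = 'charleston';
-- Cannot use it (the leftmost column, state, is not filtered):
SELECT * FROM providers WHERE city = 'charleston';
-- Cannot use an index at all: the leading wildcard leaves no prefix to seek to
SELECT * FROM providers WHERE first_name LIKE '%s%';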
You should switch the order of the columns in the index to
Provider_Business_Practice_Location_Address_State_Name,
Provider_Business_Practice_Location_Address_City_Name,
Provider_Business_Practice_Location_Address_Postal_Code,
Entity_Type_Code
That way MySQL will only scan the rows that match the criteria on those columns (SC, charleston, etc.).
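A possible DDL sketch for that reordered index (the index name is made up):
ALTER TABLE npi ADD INDEX npi_state_city_zip_entity (
    Provider_Business_Practice_Location_Address_State_Name,
    Provider_Business_Practice_Location_Address_City_Name,
    Provider_Business_Practice_Location_Address_Postal_Code,
    Entity_Type_Code
);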
Alternatively, look into full text indexes.

I'm not sure if I have the correct indexes or if I can improve the speed of my query in MySQL?

My query has a join, and it looks like it's using two indexes which makes it more complicated. I'm not sure if I can improve on this, but I thought I'd ask.
The query produces a list of records with similar keywords to the record being queried.
Here's my query.
SELECT match_keywords.padid,
COUNT(match_keywords.word) AS matching_words
FROM keywords current_program_keywords
INNER JOIN keywords match_keywords
ON match_keywords.word = current_program_keywords.word
WHERE match_keywords.word IS NOT NULL
AND current_program_keywords.padid = 25695
GROUP BY match_keywords.padid
ORDER BY matching_words DESC
LIMIT 0, 11
The EXPLAIN
Word is varchar(40).
You can start by trying to remove the IS NOT NULL test, which is implicitly removed by COUNT on the field. It also looks like you would want to omit 25695 from match_keywords, otherwise 25695 (or other) would surely show up as the "best" match within your 11 row limit?
SELECT match_keywords.padid,
COUNT(match_keywords.word) AS matching_words
FROM keywords current_program_keywords
INNER JOIN keywords match_keywords
ON match_keywords.word = current_program_keywords.word
WHERE current_program_keywords.padid = 25695
GROUP BY match_keywords.padid
ORDER BY matching_words DESC
LIMIT 0, 11
Next, consider how you would do it as a person.
You would start with a padid (25695) and retrieve all the words for that padid.
From that list of words, go back into the table again and, for each matching word, get its padid (assuming there is no duplicate on padid + word).
Group the padids together and count them.
Order the counts and return the highest 11.
With your list of 3 separate single-column indexes, the first two steps (both involving only 2 columns) will always have to jump from the index back to the data to get the other column. Covering indexes may help here - create two composite indexes to test:
create index ix_keyword_pw on keywords(padid, word);
create index ix_keyword_wp on keywords(word, padid);
With these composite indexes in place, you can remove the single-column indexes on padid and word since they are covered by these two.
Note: You always have to temper SELECT performance against
size of indexes (the more you create the more to store)
insert/update performance (the more indexes, the longer it takes to commit since it has to update the data, then update all indexes)
Try the following... ensure there is an index on padid and one on word. Then, changing the order of the WHERE qualifiers should make the optimizer filter on the padid of the CURRENT keywords first, and then join to the others. Also exclude the join of the pad to itself. Finally, since the inner join to matching keywords is an equality on word, checking the current keyword for NULL means it will never join to a NULL value, which avoids comparing every match_keywords row against NULL.
SELECT STRAIGHT_JOIN
match_keywords.padid,
COUNT(*) AS matching_words
FROM
keywords current_program_keywords
INNER JOIN keywords match_keywords
ON match_keywords.word = current_program_keywords.word
and match_keywords.padid <> 25695
WHERE
current_program_keywords.padid = 25695
AND current_program_keywords.word IS NOT NULL
GROUP BY
match_keywords.padid
ORDER BY
matching_words DESC
LIMIT
0, 11
You should index the following fields (check which table each corresponds to; a DDL sketch follows below):
match_keywords.padid
current_program_keywords.padid
match_keywords.word
current_program_keywords.word
Hope it helps speed things up.
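Since current_program_keywords and match_keywords are both aliases of the same keywords table, that boils down to two single-column indexes. A minimal sketch (index names are made up); note that the composite indexes from the previous answer would already cover these:
CREATE INDEX ix_keywords_padid ON keywords (padid);
CREATE INDEX ix_keywords_word ON keywords (word);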