Mysql range check instead of index usage on inner join - mysql

I'm having a serious problem with MySQL (innoDB) 5.0.
A very simple SQL query is executed with a very unexpected query plan.
The query:
SELECT
SQL_NO_CACHE
mbCategory.*
FROM
MBCategory mbCategory
INNER JOIN ResourcePermission as rp
ON rp.primKey = mbCategory.categoryId
where mbCategory.groupId = 12345 AND mbCategory.parentCategoryId = 0
limit 20;
MBCategory - contains 216583 rows
ResourcePermission - contains 3098354 rows.
In MBCategory I've multiple indexes (columns order as in index):
Primary (categoryId)
A (groupId,parentCategoryId,categoryId)
B (groupId,parentCategoryId)
In ResourcePermission I've multiple indexes (columns order as in index):
Primary - on some column
A (primKey).
When I look into query plan Mysql changes tables sequence and selects rows from ResourcePermission at first and then it joins the MBCategory table (crazy idea) and it takes ages. So I added STRAIGHT_JOIN to force the innodb engine to use correct table sequence:
SELECT
STRAIGHT_JOIN SQL_NO_CACHE
mbCategory.*
FROM
MBCategory
mbCategory
INNER JOIN ResourcePermission as rp
ON rp.primKey = mbCategory.categoryId
where mbCategory.groupId = 12345 AND mbCategory.parentCategoryId = 0
limit 20;
But here the second problem materialzie:
In my opinion mysql should use index A (primKey) on the join operation instead it performs Range checked for each record (index map: 0x400) and it again takes ages !
Force index doesn't help, mysql still performing Range checked for each record .
There are only 23 rows in the MBCategory which fulfill where criteria, and after join there are only 75 rows.
How can I make mysql to choose correct index on this operation ?

Ok,
elementary problem.
I owe myself a beer.
The system I'm recently tunning is not a system I've developted - I've been assigned to it by my management to improve performance (originall team doesn't have knowledge on this topic).
After fee weeks of improving SQL queries, indexes, number of sql queries that are beeing executed by application I didn't check one of the most important things in this case !!
COLUMN TYPES ARE DIFFERENT !
Developer who have written than kind of code should get quite a big TALK.
Thanks for help !

I had the same problem with a different cause. I was joining a large table, and the ON clause used OR to compare the primary key (ii.itemid) to two different columns:
SELECT *
FROM share_detail sd
JOIN box_view bv ON sd.container_id = bv.id
JOIN boxes b ON b.id = bv.shared_id
JOIN item_index ii ON ii.itemid = bv.shared_id OR b.parent_itemid = ii.itemid;
Fortunately, it turned out the parent_itemid comparison was redundant, so I was able to remove it. Now the index is being used as expected. Otherwise, I was going to try splitting the item_index join into two separate joins.

Related

Optimizing INNER JOIN across multiple tables

I have trawled many of the similar responses on this site and have improved my code at several stages along the way. Unfortunately, this 3-row query still won't run.
I have one table with 100k+ rows and about 30 columns of which I can filter down to 3-rows (in this example) and then perform INNER JOINs across 21 small lookup tables.
In my first attempt, I was lazy and used implicit joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `lookup_table` x 21
WHERE `master_table`.`indexed_col` = "value"
AND `lookup_table`.`id` = `lookup_col` x 21
The query looked to be timing out:
#2013 - Lost connection to MySQL server during query
Following this, I tried being explicit about the joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `master_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `master_table`.`lookup_col` x 21
WHERE `master_table`.`indexed_col` = "value"
Still got the same result. I then realised that the query was probably trying to perform the joins first, then filter down via the WHERE clause. So after a bit more research, I learned how I could apply a subquery to perform the filter first and then perform the joins on the newly created table. This is where I got to, and it still returns the same error. Is there any way I can improve this query further?
SELECT `temp_table`.*, `lookup_table`.`data_point` x 21
FROM (SELECT * FROM `master_table` WHERE `indexed_col` = "value") as `temp_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `temp_table`.`lookup_col` x 21
Is this the best way to write up this kind of query? I tested the subquery to ensure it only returns a small table and can confirm that it returns only three rows.
First, at its most simple aspect you are looking for
select
mt.*
from
Master_Table mt
where
mt.indexed_col = 'value'
That is probably instantaneous provided you have an index on your master table on the given indexed_col in the first position (in case you had a compound index of many fields)…
Now, if I am understanding you correctly on your different lookup columns (21 in total), you have just simplified them for redundancy in this post, but actually doing something in the effect of
select
mt.*,
lt1.lookupDescription1,
lt2.lookupDescription2,
...
lt21.lookupDescription21
from
Master_Table mt
JOIN Lookup_Table1 lt1
on mt.lookup_col1 = lt1.pk_col1
JOIN Lookup_Table2 lt2
on mt.lookup_col2 = lt2.pk_col2
...
JOIN Lookup_Table21 lt21
on mt.lookup_col21 = lt21.pk_col21
where
mt.indexed_col = 'value'
I had a project well over a decade ago dealing with a similar situation... the Master table had about 21+ million records and had to join to about 30+ lookup tables. The system crawled and queried died after running a query after more than 24 hrs.
This too was on a MySQL server and the fix was a single MySQL keyword...
Select STRAIGHT_JOIN mt.*, ...
By having your master table in the primary position, where clause and its criteria directly on the master table, you are good. You know the relationships of the tables. Do the query in the exact order I presented it to you. Don't try to think for me on this and try to optimize based on a subsidiary table that may have smaller record count and somehow think that will help the query faster... it won't.
Try the STRAIGHT_JOIN keyword. It took the query I was working on and finished it in about 1.5 hrs... it was returning all 21 million rows with all corresponding lookup key descriptions for final output, hence still needed a longer duration than just 3 records.
First, don't use a subquery. Write the query as:
SELECT mt.*, lt.`data_point`
FROM `master_table` mt INNER JOIN
`lookup_table` l
ON l.`id` = mt.`lookup_col`
WHERE mt.`indexed_col` = value;
The indexes that you want are master_table(value, lookup_col) and lookup_table(id, data_point).
If you are still having performance problems, then there are multiple possibilities. High among them is that the result set is simply too big to return in a reasonable amount of time. To see if that is the case, you can use select count(*) to count the number of returned rows.

Lengthy Query Time for MySQL

I have a mysql db with 4 tables that get updated independently. 2 of the tables will have roughly 25k to 35k rows at any given time (bhMacAcct and sv). I have to join various bits from each table.
This is the query I ended up landing on:
SELECT bhMacAcct.acct,pkg,sv.mac,sv.recData
FROM bhMacAcct,bhPkgAcct,sv
WHERE (bhMacAcct.acct = bhPkgAcct.acct) AND (bhMacAcct.mac = sv.mac);
It takes between 3-5 minutes when using this query. Is there possibly an easier way? Can this be further optimized for speed?
First, always use proper, explicit JOIN syntax and qualify all column names:
SELECT ma.acct, pkg, sv.mac, sv.recData
FROM bhMacAcct ma JOIN
bhPkgAcct pa
ON ma.acct = pa.acct JOIN
sv
ON ma.mac = sv.mac;
For this query, start with indexes on all the columns used for joins. I would recommend: bhMacAcct(acct, mac), bhPkgAcct(acct), and sv(mac, recData). I would also put pkg in the appropriate index, so the indexes cover the query.

SQL query takes too much time (3 joins)

I'm facing an issue with an SQL Query. I'm developing a php website, and to avoid making too much queries, I prefer to make a big one looking like :
select m.*, cj.*, cjb.*, me.pseudo as pseudo_acheteur
from mercato m
JOIN cartes_joueur cj
ON m.ID_carte = cj.ID_carte_joueur
JOIN cartes_joueur_base cjb
ON cj.ID_carte_joueur_base = cjb.ID_carte_joueur_base
JOIN membres me
ON me.ID_membre = cj.ID_membre
where not exists (select * from mercato_encheres me where me.ID_mercato = m.ID_mercato)
and cj.ID_membre = 2
and m.status <> 'cancelled'
ORDER BY total_carac desc, cj.level desc, cjb.nom_carte asc
This should return all cards sold by the member without any bet on it. In the result, I need all the information to display them.
Here is the approximate rows in each table :
mercato : 1200
cartes_joueur : 800 000
carte_joueur_base : 62
membres : 2000
mercato_enchere : 15 000
I tried to reduce them (in dev environment) by deleting old data; but the query still needs 10~15 seconds to execute (which is way too long on a website )
Thanks for your help.
Let's take a look.
The use of * in SELECT clauses is harmful to query performance. Why? It's wasteful. It needlessly adds to the volume of data the server must process, and in the case of JOINs, can force the processing of columns with duplicate values. If you possibly can do so, try to enumerate the columns you need.
You may not have useful indexes on your tables for accelerating this. We can't tell. Please notice that MySQL can't exploit multiple indexes in a single query, so to make a query fast you often need a well-chosen compound index. I suggest you try defining the index (ID_membre, ID_carte_jouer, ID_carte_joueur_base) on your cartes_joueur table. Why? Your query matches for equality on the first of those columns, and then uses the second and third column in ON conditions.
I have often found that writing a query with the largest table (most rows) first helps me think clearly about optimizing. In your case your largest table is cartes_jouer and you are choosing just one ID_membre value from that table. Your clearest path to optimization is the knowledge that you only need to examine approximately 400 rows from that table, not 800 000. An appropriate compound index will make that possible, and it's easiest to imagine that index's columns if the table comes first in your query.
You have a correlated subquery -- this one.
where not exists (select *
from mercato_encheres me
where me.ID_mercato = m.ID_mercato)
MySQL's query planner can be stupidly literal-minded when it sees this, running it thousands of times. In your case it's even worse: it's got SELECT * in it: see point 1 above.
It should be refactored to use the LEFT JOIN ... IS NULL pattern. Here's how that goes.
select whatever
from mercato m
JOIN ...
JOIN ...
LEFT JOIN mercato_encheres mench ON mench.ID_mercato = m.ID_mercato
WHERE mench.ID_mercato IS NULL
and ...
ORDER BY ...
Explanation: The use of LEFT JOIN rather than ordinary inner JOIN allows rows from the mercato table to be preserved in the output even when the ON condition does not match them to tables in the mercato_encheres table. The mismatching rows get NULL values for the second table. The mench.ID_mercato IS NULL condition in the WHERE clause then selects only the mismatching rows.

Mysql, PHP Laravel

I have following tables on Production with the respective counts of records,
people count= '565367'
donors count= '556325'
telerec_recipients count= '115147'
person_addresses count= '563183'
person_emails count= '106958'
person_phones count= '676474'
person_sms count= '22275'
On the UI end I want to display some data by applying some joins or left joins, and at the end by grouping, ordering and paginating I'm generating a json response.
So for achieving this I tried 2 methods, one by creating the view file of the join queries and apply where, group by and order by clauses inside controller, and second by directly firing the laravel syntax of joins in my controller.
Because I have a large data in all the tables, The join query works good but when it comes to group by statement it takes much time to execute.
Help me out in order to optimize or faster this process.
My Example Query is:
SELECT people.id as person_id,donors.donor_id,telerec_recipients.id as telerec_recipient_id,people.first_name,people.last_name,people.birth_date,
person_addresses.address_id,person_emails.email_id,person_phones.phone_id,person_sms.sms_id
FROM `people`
inner join `donors`
on `people`.`id` = `donors`.`person_id`
left join `telerec_recipients`
on `people`.`id` = `telerec_recipients`.`person_id`
left join `person_addresses`
on `people`.`id` = `person_addresses`.`person_id`
left join `person_emails`
on `people`.`id` = `person_emails`.`person_id`
left join `person_phones`
on `people`.`id` = `person_phones`.`person_id`
left join `person_sms`
on `people`.`id` = `person_sms`.`person_id`
GROUP BY `people`.`id`, `telerec_recipients`.`id`
ORDER BY `last_name` ASC LIMIT 25 offset 0;
Result of the explain:
Donors Table Indexes
The problem is that MySQL uses the wrong index on the donors table, you need to instruct MySQL using force index index hint to use the primary key instead.
Explanation
From the output of the explain it is clear that sg really goes wrong with donors table, since you can see using temporary, using filesort in the extra column.
The key column tells you that MySQL uses a unique index on the donors.donor_id field. However, donor_id field is not used anywhere in the query for joining / grouping / filtering purposes. donors.person_id field is used in the join. Since the primary key of donors table is on the person_id field, myql should use that for the join. force index index hint tells MySQL that you firmly believe that a certain index should be used.
Note 1:
You have several duplicate indexes. Remove all duplicates, they are only slowing your system own.
Note 2:
I think you query is against the sql standards because you have fields in the select list that are neither in the group by list, nor are subject of an aggregate function such as sum(), nor are functionally dependent on the grouping fields. In MySQL under certain sql mode settings such queries are allowed to run, but they may not produce entirely the same output as you would like.

MySQL indexing in an "or" statement

I have 3 tables that I need to join, these join together fine using indexes. However, we are transitioning over from using one legacy field as the identifier to another one in another table.
LEGACYID is that legacy field, while NEWID is the new field. Both fields are varchars.
Both fields are indexed exclusively with a btree index, both tables are MyISAM.
SELECT Username
FROM CUST C use index(primary,NEWID)
JOIN TBLSHP S ON S.CUSID = C.CUSID
JOIN TBLQ Q ON Q.SHPID = S.SHPID
WHERE C.LEGACYID = '692041'
OR Q.NEWID = '692041'
This query takes 5.147 seconds, that's 5 seconds longer than I expect.
When doing an EXPLAIN EXTENDED query the index type for NEWID is ALL i.e. full table scan , possible keys are (primary,NEWID) and key(null). If I remove the LEGACYID from the Or statement, explain says key (NEWID) will now be used. If I remove NEWID from the OR statement changes occur as following:
the type of the table joins for (S,C) change from type ref to eq_ref
key_len changes from 4 to 5 (on both)
extra changes from empty to "Using where" .
With either one of the statements removed from the the OR statement the query runs at expected speeds.
Table Q has 183k records; C:115k; S:169k.
One last point. if I move the query placement:
SELECT Username
FROM CUST C use index(primary,NEWID)
JOIN TBLSHP S ON S.CUSID = C.CUSID
LEFT JOIN TBLQ Q ON Q.SHPID = S.SHPID
AND Q.NEWID = '692041'
WHERE C.LEGACYID = '692041'
Although its not the same query, for the way the data works, it will provide the results I need, and the speed is down to under a .1 of a second again.
I did want to clarify that I don't really need a query that works solution. Thanks to Ponies below that already has provided one. What I need to know is if anyone else has run into this problem and can explain why this is happening and what I can do for this simple or statement to use both indexes.
If you know there won't be duplicates, change UNION to UNION ALL (UNION ALL is faster because it doesn't remove duplicates). Otherwise, use:
SELECT Username
FROM CUST C use index(primary,NEWID)
JOIN TBLSHP S ON S.CUSID = C.CUSID
JOIN TBLQ Q ON Q.SHPID = S.SHPID
WHERE C.LEGACYID = '692041'
UNION
SELECT Username
FROM CUST C use index(primary,NEWID)
JOIN TBLSHP S ON S.CUSID = C.CUSID
JOIN TBLQ Q ON Q.SHPID = S.SHPID
WHERE Q.NEWID = '692041'
ORs are notoriously bad performers, because it splinters the execution path. The UNION alleviates that splintering, and the combines the two results sets. That said, IN is preferable to ORs because though being logically the same, the execution of IN is generally more optimized.
UNION isn't always the answer
Investigate many options, comparing the EXPLAIN PLAN output before determining a solution. I've come across a couple recently that perform better using a cursor than a single query using esoteric functionality.
Also, make sure foreign key columns (what you're using in the ON clause when JOINing) are indexed. MySQL has started (v5.5+?) to automatically do this when a foreign key constraint is made, but that's only for InnoDB tables.