MySQL: Usage of indices in UNION subselects

In MySQL 5.0.75-0ubuntu10.2 I've got a fixed table layout like this:
Table Parent with an id
Table Parent2 with an id
Table Children1 with a parentId
CREATE TABLE `Parent` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(200) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `Parent2` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(200) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `Children1` (
`id` int(11) NOT NULL auto_increment,
`parentId` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `parent` (`parentId`)
) ENGINE=InnoDB
A child has a parent in one of the tables Parent or Parent2. When I need to get the children, I use a query like this:
select * from Children1 c
inner join (
select id as parentId from Parent
union
select id as parentId from Parent2
) p on p.parentId = c.parentId
Explaining this query yields:
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | Parent | index | NULL | PRIMARY | 4 | NULL | 1 | Using index |
| 3 | UNION | Parent2 | index | NULL | PRIMARY | 4 | NULL | 1 | Using index |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+-------+---------------+---------+---------+------+------+-----------------------------------------------------+
4 rows in set (0.00 sec)
which is reasonable given the layout.
Now the problem: the previous query is somewhat useless, since it returns no columns from the parent rows. As soon as I add more columns to the inner query, no index is used anymore:
mysql> explain select * from Children1 c inner join ( select id as parentId,name from Parent union select id as parentId,name from Parent2 ) p on p.parentId = c.parentId;
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED | Parent | ALL | NULL | NULL | NULL | NULL | 1 | |
| 3 | UNION | Parent2 | ALL | NULL | NULL | NULL | NULL | 1 | |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+------+---------------+------+---------+------+------+-----------------------------------------------------+
4 rows in set (0.00 sec)
Can anyone explain why the (PRIMARY) indices are not used any more? Is there a workaround for this problem if possible without having to change the DB layout?
Thanks!

I think that the optimizer falls down once you start pulling out multiple columns in the derived query because of the possibility that it would need to convert data types on the union (not in this case, but in general). It may also be due to the fact that your query essentially wants to be a correlated derived subquery, which isn't possible (from dev.mysql.com):
Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.
What you are trying to do (but isn't valid) is:
select * from Children1 c
inner join (
select id as parentId from Parent where Parent.id = c.parentId
union
select id as parentId from Parent2 where Parent2.id = c.parentId
) p
Result: "Unknown column 'c.parentId' in 'where clause'".
Is there a reason you don't use two LEFT JOINs and IFNULL()?
select *, IFNULL(p1.name, p2.name) AS name from Children1 c
left join Parent p1 ON p1.id = c.parentId
left join Parent2 p2 ON p2.id = c.parentId
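A minimal sketch to compare which rows each form returns, assuming an in-memory SQLite database (via Python's sqlite3) as a stand-in for MySQL. It only checks result sets, not MySQL's index usage, and the sample rows are hypothetical. It shows that the LEFT JOIN form also returns children whose parentId matches neither table (with a NULL name), unless you filter them out.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parent    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Parent2   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Children1 (id INTEGER PRIMARY KEY, parentId INTEGER NOT NULL);
    -- Hypothetical rows: child 12 points at a parent id present in neither table.
    INSERT INTO Parent    VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO Parent2   VALUES (3, 'gamma');
    INSERT INTO Children1 VALUES (10, 1), (11, 3), (12, 99);
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# The question's form: INNER JOIN against a UNION derived table.
union_rows = rows("""
    SELECT c.id, p.name
    FROM Children1 c
    INNER JOIN (
        SELECT id AS parentId, name FROM Parent
        UNION
        SELECT id AS parentId, name FROM Parent2
    ) p ON p.parentId = c.parentId
    ORDER BY c.id
""")

# Two LEFT JOINs, taking the name from whichever parent table matched.
left_sql = """
    SELECT c.id, IFNULL(p1.name, p2.name) AS name
    FROM Children1 c
    LEFT JOIN Parent  p1 ON p1.id = c.parentId
    LEFT JOIN Parent2 p2 ON p2.id = c.parentId
"""
left_rows = rows(left_sql + " ORDER BY c.id")
# Same, but keeping only children that matched at least one parent table.
left_rows_matched = rows(left_sql + " WHERE p1.id IS NOT NULL OR p2.id IS NOT NULL ORDER BY c.id")

print(union_rows)         # [(10, 'alpha'), (11, 'gamma')]
print(left_rows)          # [(10, 'alpha'), (11, 'gamma'), (12, None)]
print(left_rows_matched)  # [(10, 'alpha'), (11, 'gamma')]
```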
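The differences between the queries are that in yours you'll get two rows if a parent with the same id but a different name exists in each table, and that the LEFT JOINs above also return children with no parent in either table (with a NULL name). If the UNION behaviour is what you want/need, then this will work well too, and the joins will be fast and can make use of the primary key indexes: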
(select * from Children1 c join Parent p1 ON p1.id = c.parentId)
union
(select * from Children1 c join Parent2 p2 ON p2.id = c.parentId)
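The duplicate-parent case mentioned above can be checked the same way. This sketch again uses an in-memory SQLite database as a stand-in for MySQL, with hypothetical rows in which id 1 exists in both Parent and Parent2 under different names. It names the columns explicitly instead of using select *, and drops the parentheses around each SELECT because SQLite does not accept them in a compound query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parent    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Parent2   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Children1 (id INTEGER PRIMARY KEY, parentId INTEGER NOT NULL);
    -- Hypothetical rows: parent id 1 exists in BOTH tables with different names.
    INSERT INTO Parent    VALUES (1, 'alpha');
    INSERT INTO Parent2   VALUES (1, 'alpha-2'), (3, 'gamma');
    INSERT INTO Children1 VALUES (10, 1), (11, 3);
""")

# UNION of two plain joins: child 10 comes back once per matching parent table.
union_of_joins = conn.execute("""
    SELECT c.id, p1.name FROM Children1 c JOIN Parent  p1 ON p1.id = c.parentId
    UNION
    SELECT c.id, p2.name FROM Children1 c JOIN Parent2 p2 ON p2.id = c.parentId
    ORDER BY 1, 2
""").fetchall()

# Two LEFT JOINs + IFNULL: one row per child, Parent's name wins.
ifnull_rows = conn.execute("""
    SELECT c.id, IFNULL(p1.name, p2.name)
    FROM Children1 c
    LEFT JOIN Parent  p1 ON p1.id = c.parentId
    LEFT JOIN Parent2 p2 ON p2.id = c.parentId
    ORDER BY c.id
""").fetchall()

print(union_of_joins)  # [(10, 'alpha'), (10, 'alpha-2'), (11, 'gamma')]
print(ifnull_rows)     # [(10, 'alpha'), (11, 'gamma')]
```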

My first thought is to insert a "significant" number of records in the tables and use ANALYZE TABLE to update the statistics. A table with 4 records will always be faster to read using a full scan rather than going via the index!
Further, you can try USE INDEX to force the usage of the index and look how the plan changes.
I would also recommend reading this documentation and seeing which bits are relevant:
MYSQL::Optimizing Queries with EXPLAIN
This article can also be useful
7 ways to convince MySQL to use the right index

Related

MySQL 8 is not using INDEX when subquery has a group column

We have just moved from MariaDB 5.5 to MySQL 8, and some of the update queries have suddenly become slow. On further investigation, we found that MySQL 8 does not use an index when the subquery has an aggregated (grouped) column.
For example, below is a sample database. Table users maintains the current balance of the users per type, and table accounts maintains the total balance history per day.
CREATE DATABASE test;
CREATE TABLE `users` (
`uid` int(10) unsigned NOT NULL DEFAULT '0',
`balance` int(10) unsigned NOT NULL DEFAULT '0',
`type` int(10) unsigned NOT NULL DEFAULT '0',
KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `accounts` (
`uid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`balance` int(10) unsigned NOT NULL DEFAULT '0',
`day` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`uid`),
KEY `day` (`day`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Below is the EXPLAIN output for the query that updates accounts:
mysql> explain update accounts a inner join (
select uid, sum(balance) balance, day(current_date()) day from users) r
on r.uid=a.uid and r.day=a.day set a.balance=r.balance;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| 1 | UPDATE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | no matching row in const table |
| 2 | DERIVED | users | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set, 1 warning (0.00 sec)
As you can see, MySQL is not using an index.
On further investigation, I found that if I remove sum() from the subquery, it starts using the index. However, that's not the case with MariaDB 5.5, which correctly used the index in all cases.
Below are two select queries, with and without sum(). I've used select queries to cross-check with MariaDB 5.5, since 5.5 cannot EXPLAIN update queries.
mysql> explain select * from accounts a inner join (
select uid, balance, day(current_date()) day from users
) r on r.uid=a.uid and r.day=a.day ;
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
| 1 | SIMPLE | a | NULL | ref | PRIMARY,day | day | 4 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | users | NULL | eq_ref | PRIMARY | PRIMARY | 4 | test.a.uid | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
2 rows in set, 1 warning (0.00 sec)
and with sum()
mysql> explain select * from accounts a inner join (
select uid, sum(balance) balance, day(current_date()) day from users
) r on r.uid=a.uid and r.day=a.day ;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | no matching row in const table |
| 2 | DERIVED | users | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set, 1 warning (0.00 sec)
Below is output from mariadb 5.5
MariaDB [test]> explain select * from accounts a inner join (
select uid, sum(balance) balance, day(current_date()) day from users
) r on r.uid=a.uid and r.day=a.day ;
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
| 1 | PRIMARY | a | ALL | PRIMARY,day | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 10 | test.a.uid,test.a.day | 2 | Using where |
| 2 | DERIVED | users | ALL | NULL | NULL | NULL | NULL | 1 | |
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
3 rows in set (0.00 sec)
Any idea what are we doing wrong?
As others have commented, break your update query apart into the form
update accounts join ( your query ) on ( the join condition )
Your inner select query of
select uid, sum(balance) balance, day(current_date()) day from users
is all that runs: it returns one row with some uid, the sum of ALL balances, and the current day. You never know which user is getting updated, let alone with the correct amount. Start by writing a query that shows your expected results per user ID. That said, the context is unclear: your users table has a "uid" but no primary key, IMPLYING there are multiple records for the same "uid". The accounts table (to me) implies, for example, that I am a bank representative who signs up multiple user accounts, so my active portfolio of client balances on a given day is the sum from the users table.
Having said that, let's look at getting that answer:
select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid
This will show you each user's total balance as of right now. The GROUP BY gives you the "ID" key to tie back to the accounts table. In MySQL, the syntax of a correlated update for this scenario would be as follows (I am using the above query with the alias "PQ", for PreQuery, in the join):
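To see why the GROUP BY matters, here is a small sketch against an in-memory SQLite database (a stand-in for MySQL via Python's sqlite3; the sample rows are hypothetical). Without GROUP BY the aggregate collapses the whole users table into one row; with it you get one row per uid. The ungrouped query leaves uid out, because its value there would be indeterminate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uid INTEGER NOT NULL, balance INTEGER NOT NULL, type INTEGER NOT NULL);
    -- Hypothetical rows: uid 1 has two rows (two types), uid 2 has one.
    INSERT INTO users VALUES (1, 100, 1), (1, 50, 2), (2, 70, 1);
""")

# No GROUP BY: the aggregate folds the whole table into a single row.
ungrouped = conn.execute("SELECT SUM(balance) FROM users").fetchall()

# GROUP BY uid: one total per user, with the uid to join back on.
grouped = conn.execute("""
    SELECT u.uid, SUM(u.balance) AS allUserBalance
    FROM users u
    GROUP BY u.uid
    ORDER BY u.uid
""").fetchall()

print(ungrouped)  # [(220,)]
print(grouped)    # [(1, 150), (2, 70)]
```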
update accounts a
JOIN
( select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid ) PQ
-- NOW, the JOIN ON clause ties the Accounts ID to the SUM TOTALS per UID balance
on a.uid = PQ.uid
-- NOW you can SET the values
set Balance = PQ.allUserBalance,
Day = day( current_date())
Now, the above will not give a proper answer if you have accounts that no longer have associated user entries (for example, when all of an account's users are removed). Whatever accounts have no users will keep the balance and day of some prior day. To fix this, you could do a LEFT JOIN such as:
update accounts a
LEFT JOIN
( select
u.uid,
sum( u.balance ) allUserBalance
from
users u
group by
u.uid ) PQ
-- NOW, the JOIN ON clause ties the Accounts ID to the SUM TOTALS per UID balance
on a.uid = PQ.uid
-- NOW you can SET the values
set Balance = coalesce( PQ.allUserBalance, 0 ),
Day = day( current_date())
With the left-join and COALESCE(), if there is no record summation in the user table, it will set the account balance to zero.
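The difference between the JOIN and the LEFT JOIN + COALESCE() versions can be sketched in SQLite, with one loud caveat: SQLite does not support MySQL's multi-table UPDATE ... JOIN syntax, so this sketch swaps in correlated subqueries that produce the same per-account results. TODAY is a hypothetical stand-in for DAY(CURRENT_DATE()), and the sample rows are made up.

```python
import sqlite3

TODAY = 17  # hypothetical stand-in for MySQL's DAY(CURRENT_DATE())

def make_db():
    """Fresh in-memory copy of users/accounts with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (uid INTEGER NOT NULL, balance INTEGER NOT NULL, type INTEGER NOT NULL);
        CREATE TABLE accounts (uid INTEGER PRIMARY KEY,
                               balance INTEGER NOT NULL DEFAULT 0,
                               day INTEGER NOT NULL DEFAULT 0);
        -- Account 3 no longer has any users.
        INSERT INTO users VALUES (1, 100, 1), (1, 50, 2), (2, 70, 1);
        INSERT INTO accounts VALUES (1, 0, 0), (2, 0, 0), (3, 999, 5);
    """)
    return conn

def snapshot(conn):
    return conn.execute("SELECT uid, balance, day FROM accounts ORDER BY uid").fetchall()

# JOIN version: only accounts that have user rows are updated.
inner = make_db()
inner.execute("""
    UPDATE accounts
    SET balance = (SELECT SUM(u.balance) FROM users u WHERE u.uid = accounts.uid),
        day = ?
    WHERE uid IN (SELECT uid FROM users)
""", (TODAY,))

# LEFT JOIN + COALESCE() version: every account is updated, missing sums become 0.
left = make_db()
left.execute("""
    UPDATE accounts
    SET balance = COALESCE((SELECT SUM(u.balance) FROM users u WHERE u.uid = accounts.uid), 0),
        day = ?
""", (TODAY,))

inner_rows, left_rows = snapshot(inner), snapshot(left)
print(inner_rows)  # [(1, 150, 17), (2, 70, 17), (3, 999, 5)]
print(left_rows)   # [(1, 150, 17), (2, 70, 17), (3, 0, 17)]
```

Account 3 keeps its stale balance and day under the JOIN version, which is exactly the gap the LEFT JOIN + COALESCE() version closes.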

Outside query of subqueries is extremely slow (Mysql)

I have an aggregate query with subqueries nested two levels deep. What is strange is that the two subqueries run acceptably fast, but the outside query is unacceptably slow.
The basic idea behind the query is to use a lookup table to find all elements linked to a key, where the key is selected via one of those elements. This resulting set should then be passed to the outside query, which will match it according to its own keys/indexes.
Here with all outputs and statements:
We start with the two table definitions
CREATE TABLE `table1` (
`id1` int(11) NOT NULL DEFAULT '0',
`id2` int(11) NOT NULL,
`value` int(11) DEFAULT '0',
PRIMARY KEY (`id1`,`id2`),
KEY `k_id1` (`id1`),
KEY `k_id2` (`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `lookuptable1` (
`id3` int(11) NOT NULL,
`id4` int(11) NOT NULL,
PRIMARY KEY (`id3`,`id4`),
UNIQUE KEY `id4_idx` (`id4`),
KEY `id3_idx` (`id3`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The inside subquery with its own subquery:
SELECT lt1.id4
FROM lookuptable1 lt1
WHERE lt1.id3 = (SELECT pt1.id3
FROM lookuptable1 pt1
WHERE pt1.id4 = 5960)
+-----------+
| id4 |
+-----------+
| 5960 |
| 17215 |
| 3625734 |
| 9312798 |
+-----------+
4 rows in set (0.00 sec)
As you can see: Fast enough.
But the outside query is where the bad bottleneck lies.
Complete query
SELECT
t1.id1,
sum(t1.value)
FROM table1 t1
WHERE t1.id2 = 3 AND t1.id1 IN
(
SELECT lt1.id4
FROM lookuptable1 lt1
WHERE lt1.id3 = (SELECT pt1.id3
FROM lookuptable1 pt1
WHERE pt1.id4 = 5960)
);
+-----------+-----------------------+
| id1       | sum(t1.value)         |
+-----------+-----------------------+
| 9312798 | 0 |
+-----------+-----------------------+
1 row in set (8.01 sec)
That is 8 seconds too slow.
Here is the EXPLAIN EXTENDED output for this query:
+----+--------------------+-------+--------+-------------------+-------------+---------+------------+---------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+--------+-------------------+-------------+---------+------------+---------+----------+--------------------------+
| 1 | PRIMARY | t1 | index | NULL | PRIMARY | 8 | NULL | 1454343 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | lt1 | eq_ref | PRIMARY,id3,id4 | PRIMARY | 8 | const,func | 1 | 100.00 | Using where; Using index |
| 3 | SUBQUERY | pt1 | const | id4 | id4_idx | 4 | | 1 | 100.00 | Using index |
+----+--------------------+-------+--------+-------------------+-------------+---------+------------+---------+----------+--------------------------+
As I understand from this, the outside query doesn't actually use the index that it could.
What could we possibly be doing wrong in this query? Surely it should be running much, much faster.
I tried running the outside query with the subqueries' result copy-pasted inside the IN clause (in other words, the subqueries aren't run). It runs normally fast. Here's the EXPLAIN EXTENDED output in that case:
+----+-------------+-------+-------+----------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+----------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | range | PRIMARY,k_id1 | PRIMARY | 4 | NULL | 5 | 100.00 | Using where |
+----+-------------+-------+-------+----------------+---------+---------+------+------+----------+-------------+
Oh yeah. This is running on MySQL 5.5
You could avoid the IN clause by using an inner join:
SELECT
t1.id1,
sum(t1.value)
FROM table1 t1
INNER JOIN (
SELECT lt1.id4
FROM lookuptable1 lt1
WHERE lt1.id3 = (SELECT pt1.id3
FROM lookuptable1 pt1
WHERE pt1.id4 = 5960)
) t on t.id4 = t1.id1 and t1.id2 = 3
and this could improve your query.
Be sure you have a proper index on table1 (id1, id2).
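A quick way to confirm that the rewrite returns the same total is to run both forms side by side. This sketch uses an in-memory SQLite database via Python's sqlite3 as a stand-in (it does not reproduce MySQL 5.5's DEPENDENT SUBQUERY plan), with hypothetical rows, and selects only sum(t1.value), since without GROUP BY the t1.id1 value is indeterminate. Note that the JOIN form is only equivalent because id4 is UNIQUE in lookuptable1: if the derived table could return the same id4 twice, the JOIN would add the matching table1 rows twice, while IN would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id1   INTEGER NOT NULL DEFAULT 0,
        id2   INTEGER NOT NULL,
        value INTEGER DEFAULT 0,
        PRIMARY KEY (id1, id2)
    );
    CREATE TABLE lookuptable1 (
        id3 INTEGER NOT NULL,
        id4 INTEGER NOT NULL UNIQUE,
        PRIMARY KEY (id3, id4)
    );
    -- Hypothetical rows: group 7 links 5960, 17215 and 9312798; 42 is unrelated.
    INSERT INTO lookuptable1 VALUES (7, 5960), (7, 17215), (7, 9312798), (8, 42);
    INSERT INTO table1 VALUES (9312798, 3, 4), (17215, 3, 6), (17215, 1, 100), (42, 3, 1000);
""")

# The two-level subquery from the question, shared by both forms.
linked_ids = """
    SELECT lt1.id4
    FROM lookuptable1 lt1
    WHERE lt1.id3 = (SELECT pt1.id3 FROM lookuptable1 pt1 WHERE pt1.id4 = 5960)
"""

in_version = conn.execute(f"""
    SELECT SUM(t1.value) FROM table1 t1
    WHERE t1.id2 = 3 AND t1.id1 IN ({linked_ids})
""").fetchone()

join_version = conn.execute(f"""
    SELECT SUM(t1.value) FROM table1 t1
    INNER JOIN ({linked_ids}) t ON t.id4 = t1.id1 AND t1.id2 = 3
""").fetchone()

print(in_version, join_version)  # (10,) (10,)
```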

Need help to improve MYSQL SubQuery Performance

I'm just learning MySQL, and I have a MySQL subquery like this:
EXPLAIN EXTENDED SELECT brand_name, stars, hh_stock, hh_stock_value, sales_monthly_1, sales_monthly_2, sales_monthly_3, sold_monthly_1, sold_monthly_2,
sold_monthly_3, price_uvp, price_ecp, price_default, price_margin AS margin, vc_percent as vc, cogs, products_length, products_id, material_expenses,
MAX(price) AS products_price, SUM(total_sales) AS total_sales,
IFNULL(MAX(active_age), DATEDIFF(NOW(), products_date_added)) AS products_age, DATEDIFF(NOW(), products_date_added) AS jng_products_age,
AVG(sales_weekly) AS sales_weekly, AVG(sales_monthly) AS sales_monthly, SUM(total_sold) AS total_sold, SUM(total_returned) AS total_returned,
((SUM(total_returned)/SUM(total_sold)) * 100) AS returned_rate
FROM
(
SELECT p.products_id, jc.price, jc.price_end_customer AS price_ecp, jc.total_sales, jc.active_age, jc.sales_weekly,
jc.sales_monthly, jc.total_sold, jc.total_returned, jc.price_uvp, p.price_margin, p.vc_percent, p.material_expenses,
p.products_date_added, p.stars , pb.brand_name, p.family_id, p.products_price_default AS price_default, pl.sales_monthly_1,
pl.sales_monthly_2, pl.sales_monthly_3, pl.sold_monthly_1, pl.sold_monthly_2, pl.sold_monthly_3, pst.stock AS hh_stock,
(pst.stock * p.average_stock_value) AS hh_stock_value, pnc.products_length,
IF(ploc.cogs IS NULL OR ploc.cogs=0,
(CASE p.complexity
WHEN 'F' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+1.7+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+1.7+0.25+2.2),2)
WHEN 'E' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+1.7+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+1.7+0.25+2.2),2)
WHEN 'N' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+2.4+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+2.4+0.25+2.2),2)
WHEN 'M' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+2.4+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+2.4+0.25+2.2),2)
WHEN 'I' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+3.5+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+3.5+0.25+2.2),2)
WHEN 'H' THEN ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+3.5+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+3.5+0.25+2.2),2)
ELSE ROUND(5*(p.material_expenses+(7.5/100*p.material_expenses)+5+0.25+2.2)/100+(p.material_expenses+(7.5/100*p.material_expenses)+5+0.25+2.2),2) END), ploc.cogs) AS cogs
FROM products p
LEFT JOIN jng_sp_catalog jc ON jc.products_id=p.products_id
LEFT JOIN products_description pd ON pd.products_id = p.products_id AND pd.language_id = 2
LEFT JOIN products_description2 pd2 ON pd2.products_id = p.products_id
LEFT JOIN products_brand pb ON pb.products_brand_id = p.products_brand_id
LEFT JOIN products_log pl ON pl.products_id = p.products_id
LEFT JOIN products_log_static pls ON pls.products_id=p.products_id
LEFT JOIN products_local ploc ON ploc.products_id = p.products_id
LEFT JOIN products_non_configurator pnc ON pnc.products_id = p.products_id
INNER JOIN
(
SELECT shp.products_id, CONCAT(',', GROUP_CONCAT(shp.styles_id), ',') AS styles_id
FROM styles_has_products shp GROUP BY shp.products_id HAVING styles_id NOT LIKE '%,1967,%') subquery_styles ON subquery_styles.products_id = p.products_id
LEFT JOIN products_stock_temp pst ON pst.products_id=p.products_id WHERE p.active_status='1' AND p.categories_top_id = '1') dt GROUP BY products_id ORDER BY products_id;
The result of explain is like this:
+----+-------------+------------+------------+--------+---------------------+-------------+---------+------------------------------------+--------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+---------------------+-------------+---------+------------------------------------+--------+----------+----------------------------------------------+
| 1 | PRIMARY | p | NULL | ALL | PRIMARY | NULL | NULL | NULL | 40458 | 1.00 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | pb | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_brand_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | ploc | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | pl | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | pls | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | Using index |
| 1 | PRIMARY | pst | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | NULL |
| 1 | PRIMARY | pd2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | Using index |
| 1 | PRIMARY | pnc | NULL | eq_ref | PRIMARY | PRIMARY | 4 | manobo_central.p.products_id | 1 | 100.00 | Using where |
| 1 | PRIMARY | pd | NULL | eq_ref | PRIMARY | PRIMARY | 8 | manobo_central.p.products_id,const | 1 | 100.00 | Using index |
| 1 | PRIMARY | jc | NULL | ref | products_id | products_id | 4 | manobo_central.p.products_id | 4 | 100.00 | Using where |
| 1 | PRIMARY | <derived3> | NULL | ref | <auto_key0> | <auto_key0> | 4 | manobo_central.p.products_id | 10 | 100.00 | Using where |
| 3 | DERIVED | shp | NULL | index | PRIMARY,products_id | PRIMARY | 8 | NULL | 208226 | 100.00 | Using index; Using filesort |
+----+-------------+------------+------------+--------+---------------------+-------------+---------+------------------------------------+--------+----------+----------------------------------------------+
I have two options in mind:
I will drop the subquery and use VIEWs to output the data, just like the query does. Because I have a subquery in the FROM clause, I would have to use VIEWs on top of VIEWs. But some people said this will hurt performance. What do you think about this?
I will keep using the subquery, but will try to find out how to optimize the query. For this option, I wanted to ask: the first row of the EXPLAIN output shows table products p with type 'ALL'. How can I avoid 'ALL'? I've managed to get type 'eq_ref' for the other tables, but still have no clue why the products table is 'ALL'.
Again,
Do you think I need to switch to a VIEW? Or should I just try to optimize the subquery again?
Many Thanks!
EDIT: table products index
create index family_id on products (family_id);
create index idx_products_date_added on products (products_date_added);
create index material_expenses on products (material_expenses);
create index products_brand_id on products (products_brand_id);
create index products_ean on products (products_ean);
create index products_status on products (products_status);
create index tb_status on products (tb_status);
EDIT: table styles_has_products
CREATE TABLE `styles_has_products` (
`styles_id` int(10) unsigned NOT NULL DEFAULT '0',
`products_id` int(10) unsigned NOT NULL DEFAULT '0',
`date_added` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`styles_id`,`products_id`),
KEY `products_id` (`products_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
First and foremost, never write such a complex query for real-time use. I'd suggest doing batch processing and maintaining a data warehouse, and running the real-time queries against the data warehouse.
Beyond that, there are many things you should avoid in real-time SQL queries if you want performance: too many JOIN operations, too many IF/ELSE conditions, and GROUP BY, especially if the table is huge. Look for proper indexes and a suitable partition structure for your tables.
The first thing I notice is your subquery_styles. You don't use its result other than to filter. Criteria, however, belong in the WHERE clause in my opinion. As it seems you want to exclude products for which a styles_id 1967 exists, I'd use NOT EXISTS or NOT IN:
WHERE p.active_status = 1
AND p.categories_top_id = 1
AND p.products_id NOT IN
(
SELECT products_id
FROM styles_has_products
WHERE styles_id = 1967
)
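Before switching, it is worth checking that the two filters select the same products. This sketch runs both against an in-memory SQLite database (a stand-in for MySQL via Python's sqlite3, with || in place of CONCAT(), hypothetical rows, and tables trimmed to the relevant columns). It shows one difference: the INNER JOIN to the grouped subquery also drops products that have no rows in styles_has_products at all, while NOT IN keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        products_id       INTEGER PRIMARY KEY,
        active_status     TEXT    NOT NULL,
        categories_top_id INTEGER NOT NULL
    );
    CREATE TABLE styles_has_products (
        styles_id   INTEGER NOT NULL,
        products_id INTEGER NOT NULL,
        PRIMARY KEY (styles_id, products_id)
    );
    -- Hypothetical rows: product 1 carries style 1967, product 2 only other
    -- styles (including 19670), product 3 has no style rows at all.
    INSERT INTO products VALUES (1, '1', 1), (2, '1', 1), (3, '1', 1);
    INSERT INTO styles_has_products VALUES (1967, 1), (5, 1), (5, 2), (19670, 2);
""")

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

# Original filter: INNER JOIN to a grouped, comma-wrapped list of style ids.
having_ids = ids("""
    SELECT p.products_id
    FROM products p
    INNER JOIN (
        SELECT shp.products_id
        FROM styles_has_products shp
        GROUP BY shp.products_id
        HAVING ',' || group_concat(shp.styles_id) || ',' NOT LIKE '%,1967,%'
    ) subquery_styles ON subquery_styles.products_id = p.products_id
    WHERE p.active_status = '1' AND p.categories_top_id = 1
    ORDER BY p.products_id
""")

# Suggested filter: plain NOT IN in the WHERE clause.
not_in_ids = ids("""
    SELECT p.products_id
    FROM products p
    WHERE p.active_status = '1'
      AND p.categories_top_id = 1
      AND p.products_id NOT IN (
          SELECT products_id FROM styles_has_products WHERE styles_id = 1967
      )
    ORDER BY p.products_id
""")

print(having_ids)  # [2]
print(not_in_ids)  # [2, 3]
```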
The second thing is that there is no appropriate index for your query. You are selecting products with active_status 1 and categories_top_id 1, but there is no index on these columns. With the third condition on products_id not matching styles_id 1967, I'd suggest one of the following indexes:
create index idx1 on products (active_status, categories_top_id, products_id);
create index idx2 on products (categories_top_id, active_status, products_id);
Create both, see which is being used, and drop the other.
A last point that can and maybe should be optimized/changed is your aggregation. But in order to help here, I must know the table's unique keys. As soon as you post them, I'll extend this answer :-)
Building on what Thorsten suggests, instead of NOT IN ( SELECT ), use
NOT EXISTS( SELECT * FROM styles_has_products
WHERE products_id = p.products_id
AND styles_id = 1967 )
styles_has_products needs INDEX(products_id, styles_id) in either order.
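One reason sometimes given for preferring NOT EXISTS over NOT IN is NULL handling: if the subquery can return a NULL, NOT IN matches nothing. With the schema shown in the question (products_id NOT NULL) both forms agree, and the existing PRIMARY KEY (styles_id, products_id) already provides the composite index suggested above. This sketch, again using in-memory SQLite as a stand-in and a hypothetical nullable loose_map table, shows both the agreement and the NULL pitfall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (products_id INTEGER PRIMARY KEY);
    CREATE TABLE styles_has_products (
        styles_id   INTEGER NOT NULL,
        products_id INTEGER NOT NULL,
        PRIMARY KEY (styles_id, products_id)
    );
    INSERT INTO products VALUES (1), (2), (3);
    INSERT INTO styles_has_products VALUES (1967, 1), (5, 2);
    -- Hypothetical nullable mapping table, only to show the NULL pitfall.
    CREATE TABLE loose_map (styles_id INTEGER, products_id INTEGER);
    INSERT INTO loose_map VALUES (1967, 1), (1967, NULL);
""")

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

def not_in(table):
    return ids(f"""
        SELECT p.products_id FROM products p
        WHERE p.products_id NOT IN (SELECT products_id FROM {table} WHERE styles_id = 1967)
        ORDER BY p.products_id
    """)

def not_exists(table):
    return ids(f"""
        SELECT p.products_id FROM products p
        WHERE NOT EXISTS (SELECT * FROM {table} m
                          WHERE m.products_id = p.products_id AND m.styles_id = 1967)
        ORDER BY p.products_id
    """)

print(not_in("styles_has_products"), not_exists("styles_has_products"))  # [2, 3] [2, 3]
print(not_in("loose_map"), not_exists("loose_map"))                      # [] [2, 3]
```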
Please show us SHOW CREATE TABLE styles_has_products. If it is a many:many mapping table, then see the tips here.
Indexes need to be on the table you are going into, not coming from. So the list of indexes for products probably won't be used. This composite index may be useful:
INDEX(categories_top_id, active_status) -- in either order
VIEWs are just syntactic sugar; they do not inherently provide any performance benefit. In some situations they hurt performance.
pd, pd2, pls, and some others, are not used; remove their JOINs.
The SUMs and AVGs will probably be incorrect. This is because of the "explode-implode" happening with JOIN + GROUP BY. Clean up some of the other stuff, then we can discuss how to rearrange things so that the SUMs and AVGs are done with only one row per products_id.
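The explode-implode effect can be shown on a toy example. In this sketch (in-memory SQLite via Python's sqlite3 as a stand-in; the catalog and styles tables are hypothetical stand-ins for jng_sp_catalog and styles_has_products), one product has two catalog rows and three style rows, so the plain JOIN produces 2 x 3 = 6 rows and SUM counts each sale three times. Aggregating each child table in its own derived table first keeps one row per product before the final join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (products_id INTEGER PRIMARY KEY);
    -- Hypothetical child tables standing in for jng_sp_catalog and styles_has_products.
    CREATE TABLE catalog (products_id INTEGER NOT NULL, total_sales INTEGER NOT NULL);
    CREATE TABLE styles  (products_id INTEGER NOT NULL, styles_id INTEGER NOT NULL);
    INSERT INTO products VALUES (1);
    INSERT INTO catalog  VALUES (1, 10), (1, 20);
    INSERT INTO styles   VALUES (1, 5), (1, 6), (1, 7);
""")

# Explode: 2 catalog rows x 3 style rows = 6 joined rows; implode: GROUP BY.
exploded = conn.execute("""
    SELECT p.products_id, COUNT(*) AS joined_rows, SUM(c.total_sales) AS total_sales
    FROM products p
    JOIN catalog c ON c.products_id = p.products_id
    JOIN styles  s ON s.products_id = p.products_id
    GROUP BY p.products_id
""").fetchall()

# Aggregate each child table first, so every join is one row per product.
fixed = conn.execute("""
    SELECT p.products_id, c.total_sales
    FROM products p
    JOIN (SELECT products_id, SUM(total_sales) AS total_sales
          FROM catalog GROUP BY products_id) c ON c.products_id = p.products_id
    JOIN (SELECT DISTINCT products_id FROM styles) s ON s.products_id = p.products_id
""").fetchall()

print(exploded)  # [(1, 6, 90)]
print(fixed)     # [(1, 30)]
```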

Query takes too much time with JOIN

I need a little help improving the following query performance
SELECT *
FROM dle_pause
LEFT JOIN dle_post_plus
ON ( dle_pause.pause_postid = dle_post_plus.puuid )
LEFT JOIN dle_post
ON ( dle_post_plus.news_id = dle_post.id )
LEFT JOIN dle_playerfiles
ON ( dle_post.id = dle_playerfiles.post_id )
WHERE pause_user = '2';
It returns 3 rows in 0.35 sec. The problem is with the third join: one of the rows has no dle_playerfiles row with dle_post.id = dle_playerfiles.post_id, so it scans the whole table.
It looks like I have all the needed indexes:
+----+-------------+-----------------+--------+----------------------------------+---------+---------+-----------------------------------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+--------+----------------------------------+---------+---------+-----------------------------------+--------+------------------------------------------------+
| 1 | SIMPLE | dle_pause | ALL | pause_user | NULL | NULL | NULL | 3 | Using where |
| 1 | SIMPLE | dle_post_plus | ref | puuid | puuid | 36 | func | 1 | Using where |
| 1 | SIMPLE | dle_post | eq_ref | PRIMARY | PRIMARY | 4 | online_test.dle_post_plus.news_id | 1 | NULL |
| 1 | SIMPLE | dle_playerFiles | ALL | ix_dle_playerFiles__post_id_type | NULL | NULL | NULL | 131454 | Range checked for each record (index map: 0x2) |
+----+-------------+-----------------+--------+----------------------------------+---------+---------+-----------------------------------+--------+------------------------------------------------+
If you have not put an index on dle_playerfiles.post_id, add one.
If you already have an index on it, add an index hint to the last join in your query, like this:
SELECT *
FROM
dle_pause
LEFT JOIN dle_post_plus
ON ( dle_pause.pause_postid = dle_post_plus.puuid )
LEFT JOIN dle_post
ON ( dle_post_plus.news_id = dle_post.id )
LEFT JOIN dle_playerfiles USE INDEX (ix_dle_playerFiles__post_id_type)
ON ( dle_post.id = dle_playerfiles.post_id )
WHERE
pause_user = '2';
This should make MySQL use the index for the fourth table as well. Right now your EXPLAIN shows that it is not using any index on the fourth table, and hence it scans 131454 rows.
I can suggest two alternatives for solving this.
First alternative:
Create temporary tables that contain only the rows with non-NULL values for the key you are comparing in the LEFT JOIN.
Something like this (MySQL syntax):
CREATE TEMPORARY TABLE tmp_dle_pause AS
SELECT * FROM dle_pause
WHERE pause_postid IS NOT NULL;
Do it for all three tables.
Then run your original query against the temporary tables, which do not include NULL values.
Second alternative:
Create an index for each key you are comparing in the left join, in this way the index will do the job for you.
Of course, you can always combine the two methods I suggested.

MySQL & nested set: slow JOIN (not using index)

I have two tables:
localities:
CREATE TABLE `localities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`type` varchar(30) NOT NULL,
`parent_id` int(11) DEFAULT NULL,
`lft` int(11) DEFAULT NULL,
`rgt` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_localities_on_parent_id_and_type` (`parent_id`,`type`),
KEY `index_localities_on_name` (`name`),
KEY `index_localities_on_lft_and_rgt` (`lft`,`rgt`)
) ENGINE=InnoDB;
locatings:
CREATE TABLE `locatings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`localizable_id` int(11) DEFAULT NULL,
`localizable_type` varchar(255) DEFAULT NULL,
`locality_id` int(11) NOT NULL,
`category` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_locatings_on_locality_id` (`locality_id`),
KEY `localizable_and_category_index` (`localizable_type`,`localizable_id`,`category`),
KEY `index_locatings_on_category` (`category`)
) ENGINE=InnoDB;
localities table is implemented as a nested set.
Now, when user belongs to some locality (through some locating) he also belongs to all its ancestors (higher level localities). I need a query that will select all the localities that all the users belong to into a view.
Here is my try:
select distinct lca.*, lt.localizable_type, lt.localizable_id
from locatings lt
join localities lc on lc.id = lt.locality_id
left join localities lca on (lca.lft <= lc.lft and lca.rgt >= lc.rgt)
The problem here is that it takes way too much time to execute.
I consulted EXPLAIN:
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
| 1 | SIMPLE | lt | ALL | index_locatings_on_locality_id | NULL | NULL | NULL | 4926 | 100.00 | Using temporary |
| 1 | SIMPLE | lc | eq_ref | PRIMARY | PRIMARY | 4 | bzzik_development.lt.locality_id | 1 | 100.00 | |
| 1 | SIMPLE | lca | ALL | index_localities_on_lft_and_rgt | NULL | NULL | NULL | 11439 | 100.00 | |
+----+-------------+-------+--------+---------------------------------+---------+---------+----------------------------------+-------+----------+-----------------+
3 rows in set, 1 warning (0.00 sec)
The last join obviously doesn’t use lft, rgt index as I expect it to. I’m desperate.
UPDATE:
After adding a condition as @cairnz suggested, the query still takes too much time to process.
UPDATE 2: Column names instead of the asterisk
Updated query:
SELECT DISTINCT lca.id, lt.`localizable_id`, lt.`localizable_type`
FROM locatings lt FORCE INDEX(index_locatings_on_category)
JOIN localities lc
ON lc.id = lt.locality_id
INNER JOIN localities lca
ON lca.lft <= lc.lft AND lca.rgt >= lc.rgt
WHERE lt.`category` != "Unknown";
Updated EXPLAIN:
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
| 1 | SIMPLE | lt | range | index_locatings_on_category | index_locatings_on_category | 153 | NULL | 2545 | 100.00 | Using where; Using temporary |
| 1 | SIMPLE | lc | eq_ref | PRIMARY,index_localities_on_lft_and_rgt | PRIMARY | 4 | bzzik_production.lt.locality_id | 1 | 100.00 | |
| 1 | SIMPLE | lca | ALL | index_localities_on_lft_and_rgt | NULL | NULL | NULL | 11570 | 100.00 | Range checked for each record (index map: 0x10) |
+----+-------------+-------+--------+-----------------------------------------+-----------------------------+---------+---------------------------------+-------+----------+-------------------------------------------------+
Any help appreciated.
Ah, it just occurred to me.
Since you are asking for everything in the table, mysql decides to use a full table scan instead, as it deems it more efficient.
In order to get some key usage, add in some filters to restrict looking for every row in all the tables anyways.
Updating Answer:
Your second query does not make sense. You are LEFT JOINing to lca, yet you have a filter on it, which by itself negates the LEFT JOIN. Also, you're looking for data in the last step of the query, meaning you will have to look through all of lt, lc and lca in order to find your data. Also, you have no index with 'type' as its left-most column on locations, so you still need a full table scan to find your data.
If you had some sample data and example of what you are trying to achieve it would perhaps be easier to help.
Try experimenting with forcing an index (http://dev.mysql.com/doc/refman/5.1/en/index-hints.html); maybe it's just an optimizer issue.
It looks like you're wanting the parents of the single result.
According to the person credited with defining Nested Sets in SQL, Joe Celko at http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html "This model is a natural way to show a parts explosion, because a final assembly is made of physically nested assemblies that break down into separate parts."
In other words, Nested Sets are used to filter children efficiently to an arbitrary number of independent levels within a single collection. You have two tables, but I don't see why the properties of the set "locatings" can't be de-normalized into "localities".
If the localities table had a geometry column, could I not find the one locality from a "locating" and then select on the one table using a single filter: parent.lft <= row.lft AND parent.rgt >= row.rgt?
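The single-filter ancestor lookup described above is easy to try out. This sketch uses an in-memory SQLite database (via Python's sqlite3) as a stand-in for MySQL and a hypothetical five-node tree; with the inclusive <= / >= comparisons from the question, the node itself is returned along with its ancestors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE localities (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER,
        lft       INTEGER,
        rgt       INTEGER
    );
    CREATE INDEX index_localities_on_lft_and_rgt ON localities (lft, rgt);
    -- Hypothetical tree: World > Europe > France > Paris, plus World > Asia.
    INSERT INTO localities VALUES
        (1, 'World',  NULL, 1, 10),
        (2, 'Europe', 1,    2,  7),
        (3, 'France', 2,    3,  6),
        (4, 'Paris',  3,    4,  5),
        (5, 'Asia',   1,    8,  9);
""")

def ancestors_of(locality_id):
    """Names of the node and all its ancestors, outermost first."""
    rows = conn.execute("""
        SELECT lca.name
        FROM localities lc
        JOIN localities lca ON lca.lft <= lc.lft AND lca.rgt >= lc.rgt
        WHERE lc.id = ?
        ORDER BY lca.lft
    """, (locality_id,))
    return [name for (name,) in rows]

ancestors = ancestors_of(4)
print(ancestors)  # ['World', 'Europe', 'France', 'Paris']
```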
UPDATED
In this answer https://stackoverflow.com/a/1743952/3018894, there is an example from http://explainextended.com/2009/09/29/adjacency-list-vs-nested-sets-mysql/ that gets all the ancestors of the node with id 1000000, to arbitrary depth, by following the parent column:
SELECT hp.id, hp.parent, hp.lft, hp.rgt, hp.data
FROM (
SELECT @r AS _id,
@level := @level + 1 AS level,
(
SELECT @r := NULLIF(parent, 0)
FROM t_hierarchy hn
WHERE id = _id
)
FROM (
SELECT @r := 1000000,
@level := 0
) vars,
t_hierarchy hc
WHERE @r IS NOT NULL
) hc
JOIN t_hierarchy hp
ON hp.id = hc._id
ORDER BY
level DESC
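On MySQL 8.0 and later, the same upward walk can be written with a recursive common table expression instead of user variables. This sketch swaps in WITH RECURSIVE and runs it on an in-memory SQLite database (via Python's sqlite3) as a stand-in; the t_hierarchy rows are hypothetical and reduced to id, parent and data. The query text should also run on MySQL 8.0+, but treat that as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical adjacency list; parent = 0 marks the root, as in the query above.
    CREATE TABLE t_hierarchy (id INTEGER PRIMARY KEY, parent INTEGER NOT NULL, data TEXT);
    INSERT INTO t_hierarchy VALUES
        (1, 0, 'root'), (2, 1, 'child'), (3, 2, 'grandchild'), (4, 1, 'sibling');
""")

def path_to_root(start_id):
    """(id, data) for start_id and all its ancestors, root first."""
    return conn.execute("""
        WITH RECURSIVE ancestors(id, parent, data, level) AS (
            SELECT id, parent, data, 1 FROM t_hierarchy WHERE id = ?
            UNION ALL
            -- Step to the parent row; the walk stops when no row has id = parent (e.g. 0).
            SELECT h.id, h.parent, h.data, a.level + 1
            FROM t_hierarchy h
            JOIN ancestors a ON h.id = a.parent
        )
        SELECT id, data FROM ancestors ORDER BY level DESC
    """, (start_id,)).fetchall()

path = path_to_root(3)
print(path)  # [(1, 'root'), (2, 'child'), (3, 'grandchild')]
```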