Slow inner join order query - mysql

I have a problem with the speed of a query. The question is similar to this one, but I can't find a solution there:
Mysql slow query: INNER JOIN + ORDER BY causes filesort
EXPLAIN says that MySQL is using "Using index condition; Using where; Using temporary; Using filesort" on the companies table.
Slow query:
SELECT * FROM companies
INNER JOIN post_indices
ON companies.post_index_id = post_indices.id
WHERE companies.deleted_at is NULL
ORDER BY post_indices.id
LIMIT 1;
# 1 row in set (5.62 sec)
But if I remove the WHERE clause from the query, it is really fast:
SELECT * FROM companies
INNER JOIN post_indices
ON companies.post_index_id = post_indices.id
ORDER BY post_indices.id
LIMIT 1;
# 1 row in set (0.00 sec)
I've tried using different indexes on companies table:
index_companies_on_deleted_at
index_companies_on_post_index_id
index_companies_on_deleted_at_and_post_index_id
index_companies_on_post_index_id_and_deleted_at
The index_companies_on_deleted_at index is selected automatically by MySQL. Timings for the same query using each of the above indexes, in the same order:
5.6 sec
3.4 sec
8.5 sec
3.5 sec
Any ideas how to improve my query speed? Again, without the WHERE deleted_at IS NULL condition the query is instant.
The companies table has 1.3 million rows.
The post_indices table has 3k rows.
UPDATE 1:
ORDER BY post_indices.id is used for simplicity, since it's already indexed. In practice the sort will be on other columns of the joined table (post_indices), so sorting on companies.post_index_id won't solve this issue.
UPDATE 2: for Rick James
Your query takes only 0.04 sec to run, and EXPLAIN says that the index_companies_on_deleted_at_and_post_index_id index is used. So yes, it works better, but it doesn't solve my problem: I need to order on post_indices columns. post_indices.id is used only to keep the example simple; in future the sort will be on, for example, post_indices.city.
My query with WHERE, but without ORDER BY is instant.
UPDATE 3:
EXPLAIN output. I also noticed that the order of the indexes matters: index_companies_on_deleted_at is used if it is listed higher (created earlier) than index_companies_on_deleted_at_and_post_index_id; otherwise the latter index is used. By "used" I mean selected automatically by MySQL.
mysql> EXPLAIN SELECT * FROM companies INNER JOIN post_indices ON post_indices.id = companies.post_index_id WHERE companies.deleted_at IS NULL ORDER BY post_indices.id LIMIT 1;
+----+-------------+--------------+------------+--------+----------------------------------------------------------------------------------------------------------------+-------------------------------+---------+------------------------------------------------------+--------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+----------------------------------------------------------------------------------------------------------------+-------------------------------+---------+------------------------------------------------------+--------+----------+---------------------------------------------------------------------+
| 1 | SIMPLE | companies | NULL | ref | index_companies_on_post_index_id,index_companies_on_deleted_at,index_companies_on_deleted_at_and_post_index_id | index_companies_on_deleted_at | 6 | const | 638692 | 100.00 | Using index condition; Using where; Using temporary; Using filesort |
| 1 | SIMPLE | post_indices | NULL | eq_ref | PRIMARY | PRIMARY | 4 | enbro_purecrm_eu_development.companies.post_index_id | 1 | 100.00 | NULL |
+----+-------------+--------------+------------+--------+----------------------------------------------------------------------------------------------------------------+-------------------------------+---------+------------------------------------------------------+--------+----------+---------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM companies USE INDEX(index_companies_on_post_index_id) INNER JOIN post_indices ON post_indices.id = companies.post_index_id WHERE companies.deleted_at IS NULL ORDER BY post_indices.id LIMIT 1;
+----+-------------+--------------+------------+--------+----------------------------------+---------+---------+------------------------------------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+----------------------------------+---------+---------+------------------------------------------------------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | companies | NULL | ALL | index_companies_on_post_index_id | NULL | NULL | NULL | 1277385 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | post_indices | NULL | eq_ref | PRIMARY | PRIMARY | 4 | enbro_purecrm_eu_development.companies.post_index_id | 1 | 100.00 | NULL |
+----+-------------+--------------+------------+--------+----------------------------------+---------+---------+------------------------------------------------------+---------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT * FROM companies USE INDEX(index_companies_on_deleted_at_and_post_index_id) INNER JOIN post_indices ON post_indices.id = companies.post_index_id WHERE companies.deleted_at IS NULL ORDER BY post_indices.id LIMIT 1;
+----+-------------+--------------+------------+--------+-------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------------+--------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+-------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------------+--------+----------+--------------------------------------------------------+
| 1 | SIMPLE | companies | NULL | ref | index_companies_on_deleted_at_and_post_index_id | index_companies_on_deleted_at_and_post_index_id | 6 | const | 638692 | 100.00 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | post_indices | NULL | eq_ref | PRIMARY | PRIMARY | 4 | enbro_purecrm_eu_development.companies.post_index_id | 1 | 100.00 | NULL |
+----+-------------+--------------+------------+--------+-------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------------+--------+----------+--------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
UPDATE 4:
I've removed unrelated columns:
| companies | CREATE TABLE `companies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`address` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`post_index_id` int(11) DEFAULT NULL,
`vat` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`note` text COLLATE utf8_unicode_ci,
`state` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'new',
`deleted_at` datetime DEFAULT NULL,
`lead_list_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_companies_on_vat` (`vat`),
KEY `index_companies_on_post_index_id` (`post_index_id`),
KEY `index_companies_on_state` (`state`),
KEY `index_companies_on_deleted_at` (`deleted_at`),
KEY `index_companies_on_deleted_at_and_post_index_id` (`deleted_at`,`post_index_id`),
KEY `index_companies_on_lead_list_id` (`lead_list_id`),
CONSTRAINT `fk_rails_5fc7f5c6b9` FOREIGN KEY (`lead_list_id`) REFERENCES `lead_lists` (`id`),
CONSTRAINT `fk_rails_79719355c6` FOREIGN KEY (`post_index_id`) REFERENCES `post_indices` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2523518 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
| post_indices | CREATE TABLE `post_indices` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`county` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`postal_code` int(11) DEFAULT NULL,
`group_part` int(11) DEFAULT NULL,
`group_number` int(11) DEFAULT NULL,
`group_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3101 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
UPDATE 5:
Another developer tested the same query on his local machine with exactly the same data set (dump/restore), and he got a totally different EXPLAIN:
mysql> explain SELECT * FROM companies INNER JOIN post_indices ON companies.post_index_id = post_indices.id WHERE companies.deleted_at is NULL ORDER BY post_indices.id LIMIT 1;
+----+-------------+--------------+-------+----------------------------------------------------------------------------------------------------------------+-------------------------------------------------+---------+----------------------------------------------------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+----------------------------------------------------------------------------------------------------------------+-------------------------------------------------+---------+----------------------------------------------------+------+-----------------------+
| 1 | SIMPLE | post_indices | index | PRIMARY | PRIMARY | 4 | NULL | 1 | NULL |
| 1 | SIMPLE | companies | ref | index_companies_on_post_index_id,index_companies_on_deleted_at,index_companies_on_deleted_at_and_post_index_id | index_companies_on_deleted_at_and_post_index_id | 11 | const,enbro_purecrm_eu_development.post_indices.id | 283 | Using index condition |
+----+-------------+--------------+-------+----------------------------------------------------------------------------------------------------------------+-------------------------------------------------+---------+----------------------------------------------------+------+-----------------------+
2 rows in set (0,00 sec)
The same query is instant on his PC, and I have no idea why. I've also tried STRAIGHT_JOIN: when I force MySQL to read the post_indices table first, it is blazing fast too. But it is still a mystery to me why the same query is fast on another machine (MySQL 5.6.27) and slow on mine (MySQL 5.7.10).
So it seems the problem is that MySQL picks the wrong table to read first.

Does this work better?
SELECT * FROM companies AS c
INNER JOIN post_indices AS pi
ON c.post_index_id = pi.id
WHERE c.deleted_at is NULL
ORDER BY c.post_index_id -- Note
LIMIT 1;
INDEX(deleted_at, post_index_id) -- note
For that matter, how fast does it run with the WHERE, but without the ORDER BY?

Using the following optimizer hints should force MySQL to use the plan that your colleague observed:
SELECT * FROM post_indices
STRAIGHT_JOIN companies FORCE INDEX(index_companies_on_deleted_at_and_post_index_id)
ON companies.post_index_id = post_indices.id
WHERE companies.deleted_at is NULL
ORDER BY post_indices.id
LIMIT 1;
If you will be sorting on other columns of post_indices, you will need an index on those columns to make this plan work well.
Note that which plan is optimal depends on how often deleted_at is NULL. If deleted_at is frequently NULL, the above plan will be fast. If not, the above plan will have to run through many rows of post_indices before a match is found. Note also that for queries with OFFSET, the same plan may not be the most effective.
I think the issue here is that MySQL decides the join order without considering the effects of ORDER BY and LIMIT. In other words, it chooses the join order that it thinks is fastest for executing the full join.
Since there is a restriction on the companies table (deleted_at is NULL), I am not surprised that it will start with this table.
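To make the point about sorting on other post_indices columns concrete, here is a minimal sketch for the post_indices.city case mentioned in the question. The index name index_post_indices_on_city is made up for illustration, and whether the optimizer really walks that index in order has to be confirmed with EXPLAIN on the actual data.

```sql
-- Assumption: the sort is on post_indices.city, as mentioned in UPDATE 2.
-- The index name below is hypothetical.
ALTER TABLE post_indices
  ADD INDEX index_post_indices_on_city (city);

-- Read post_indices first, in city order, then probe companies through
-- the existing (deleted_at, post_index_id) index.
EXPLAIN
SELECT *
FROM post_indices FORCE INDEX (index_post_indices_on_city)
STRAIGHT_JOIN companies FORCE INDEX (index_companies_on_deleted_at_and_post_index_id)
  ON companies.post_index_id = post_indices.id
WHERE companies.deleted_at IS NULL
ORDER BY post_indices.city
LIMIT 1;
```

If the plan is the intended one, the post_indices row of the EXPLAIN should no longer mention "Using temporary; Using filesort". The same caveat as above applies: this only pays off when rows with deleted_at IS NULL are common.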

Related

How to use correct indexes with a double inner join query?

I have a query with 2 INNER JOIN statements that fetches only a few columns, but it is very slow even though I have indexes on all the required columns.
My query
SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
INNER JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE
com.prestataireLAD REGEXP '.*'
AND pe_nom REGEXP 'bordeaux|chambéry-annecy|grenoble|lyon|marseille|metz|montpellier|nancy|nice|nimes|rouen|strasbourg|toulon|toulouse|vitry|vitry bis 1|vitry bis 2|vlg'
AND com.date_livraison BETWEEN '2022-06-11 00:00:00'
AND '2022-07-08 00:00:00';
It takes around 20 seconds to compute and fetch 4123 rows.
The problem
In order to find out what's wrong and why it is so slow, I've used the EXPLAIN statement. Here is the output:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|--------|----------------------------|-------------|---------|------------------------|--------|----------|-------------|
| 1 | SIMPLE | dys | | ALL | id_commande,id_commande_2 | | | | 878588 | 100.00 | Using where |
| 1 | SIMPLE | com | | eq_ref | id_commande,date_livraison | id_commande | 110 | db.dys.id_commande | 1 | 7.14 | Using where |
| 1 | SIMPLE | pe | | ref | pe_id | pe_id | 5 | db.com.code_pe | 1 | 100.00 | Using where |
I can see that the join on dysfonctionnements is off: it scans the whole table (type ALL) and doesn't use a key even though it could...
Table definitions
commandes (included relevant columns only)
CREATE TABLE `commandes` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) NOT NULL DEFAULT '',
`date_commande` datetime NOT NULL,
`date_livraison` datetime NOT NULL,
`code_pe` int(11) NOT NULL,
`traitement_dysfonctionnement` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`),
KEY `date_livraison` (`date_livraison`),
KEY `traitement_dysfonctionnement` (`traitement_dysfonctionnement`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
dysfonctionnements (again, relevant columns only)
CREATE TABLE `dysfonctionnements` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_commande` varchar(36) DEFAULT NULL,
`dysfonctionnement` varchar(150) DEFAULT NULL,
`responsable` varchar(50) DEFAULT NULL,
`reimputation` varchar(50) DEFAULT NULL,
`montant` float DEFAULT NULL,
`listRembArticles` text,
PRIMARY KEY (`id`),
UNIQUE KEY `id_commande` (`id_commande`,`dysfonctionnement`),
KEY `id_commande_2` (`id_commande`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
pe (again, relevant columns only)
CREATE TABLE `pe` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`pe_id` int(11) DEFAULT NULL,
`pe_nom` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `pe_nom` (`pe_nom`),
KEY `pe_id` (`pe_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Investigation
If I remove the db.pe table from the query and the WHERE clause on pe_nom, the query takes 1.7 seconds to fetch 7k rows, and with the EXPLAIN statement, I can see it is using keys as I expect it to do:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|-------|----------------------------|----------------|---------|------------------------|--------|----------|-----------------------------------------------|
| 1 | SIMPLE | com | | range | id_commande,date_livraison | date_livraison | 5 | | 389558 | 100.00 | Using index condition; Using where; Using MRR |
| 1 | SIMPLE | dys | | ref | id_commande,id_commande_2 | id_commande_2 | 111 | ooshop.com.id_commande | 1 | 100.00 | |
I'm open to any suggestions, I see no reason not to use the key when it does on a very similar query and it definitely makes it faster...
I had a similar experience when the MySQL optimizer chose a join order that was far from optimal. At that time I used the MySQL-specific STRAIGHT_JOIN operator to override the default optimizer behaviour. In your case I would try this:
SELECT
dysfonctionnement,
montant,
listRembArticles,
case when dys.reimputation is not null then dys.reimputation else dys.responsable end as responsable_final
FROM
db.commandes AS com
STRAIGHT_JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
Also, one of the REGEXP conditions in your WHERE clause could probably be replaced with an IN operator, which I assume can use an index.
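As a rough sketch of what the IN version could look like, assuming every pe_nom value is exactly one of the listed site names (the question does not say so). REGEXP with alternation matches any value that merely contains one of the names, so the two filters only agree under that assumption.

```sql
-- Assumption: pe_nom holds exactly one of these names per row, with no
-- extra text around it; otherwise IN returns fewer rows than the REGEXP.
-- The com.prestataireLAD REGEXP '.*' filter from the question is left out;
-- it only changes the result if that column can be NULL.
SELECT
  dys.dysfonctionnement,
  dys.montant,
  dys.listRembArticles,
  COALESCE(dys.reimputation, dys.responsable) AS responsable_final
FROM db.commandes AS com
STRAIGHT_JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE pe.pe_nom IN (
        'bordeaux', 'chambéry-annecy', 'grenoble', 'lyon', 'marseille', 'metz',
        'montpellier', 'nancy', 'nice', 'nimes', 'rouen', 'strasbourg',
        'toulon', 'toulouse', 'vitry', 'vitry bis 1', 'vitry bis 2', 'vlg'
      )
  AND com.date_livraison BETWEEN '2022-06-11 00:00:00' AND '2022-07-08 00:00:00';
```

COALESCE here is just a shorter spelling of the original CASE expression. An IN list on pe_nom can use the existing pe_nom index, which a REGEXP condition cannot.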
Remove com.prestataireLAD REGEXP '.*'. The Optimizer probably won't realize that this has no impact on the resultset. If you are dynamically building the WHERE clause, then eliminate anything else you can.
id_commande_2 is redundant. In queries where it might be useful, the UNIQUE can take care of it.
These indexes might help:
com: INDEX(date_livraison, id_commande, code_pe)
pe: INDEX(pe_nom, pe_id)
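Spelled out as DDL, the index suggestions above might look like the following sketch. The index names are invented for illustration; only the column lists come from the suggestion.

```sql
-- Hypothetical index names; column order as suggested above.
ALTER TABLE commandes
  ADD INDEX idx_livraison_commande_pe (date_livraison, id_commande, code_pe);

ALTER TABLE pe
  ADD INDEX idx_pe_nom_pe_id (pe_nom, pe_id);

-- id_commande_2 repeats the leading column of the UNIQUE key
-- (id_commande, dysfonctionnement), so that key can serve the same lookups.
ALTER TABLE dysfonctionnements
  DROP INDEX id_commande_2;

-- Re-check the plan afterwards: ideally com is read first through a range
-- on date_livraison, and dys is reached by a ref lookup on id_commande.
EXPLAIN
SELECT dys.dysfonctionnement, dys.montant
FROM db.commandes AS com
INNER JOIN db.dysfonctionnements AS dys ON com.id_commande = dys.id_commande
INNER JOIN db.pe AS pe ON com.code_pe = pe.pe_id
WHERE com.date_livraison BETWEEN '2022-06-11 00:00:00' AND '2022-07-08 00:00:00';
```

Whether the optimizer picks the new indexes depends on the data distribution, so comparing EXPLAIN output before and after each change is the safer way to proceed.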

Mysql optimizer chooses wrong table order in query

We have a simple database with 4 tables: files, file_versions, users, organizations.
I select all files owned by a given organization, with a condition on the trashing date, using this query:
select * FROM organizations o
LEFT JOIN users u ON o.id=u.organization_id
LEFT JOIN files f ON u.user_identity=f.owner_identity
LEFT JOIN file_versions fv ON f.owner_identity=fv.owner_identity
AND f.local_path=fv.local_path
WHERE o.id=2001237 AND o.trashed_file_age_limit>=1
AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
EXPLAIN shows that the optimizer chooses the wrong table order, which differs from the order in the query (organizations -> users -> files -> file_versions):
mysql> explain select * FROM organizations o LEFT JOIN users u ON o.id=u.organization_id LEFT JOIN files f ON u.user_identity=f.owner_identity LEFT JOIN file_versions fv ON f.owner_identity=fv.owner_identity AND f.local_path=fv.local_path WHERE o.id=2001237 AND o.trashed_file_age_limit>=1 AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
| 1 | SIMPLE | o | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | f | ALL | PRIMARY | NULL | NULL | NULL | 109615125 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY,identity,organization_id | identity | 36 | filemirror.f.owner_identity | 1 | Using where |
| 1 | SIMPLE | fv | ref | PRIMARY | PRIMARY | 3035 | filemirror.u.user_identity,filemirror.f.local_path | 1 | |
+----+-------------+-------+--------+----------------------------------+----------+---------+----------------------------------------------------+-----------+-------------+
4 rows in set (0.01 sec)
Of course this query is slow because of the full scan of the files table, and I have to use STRAIGHT_JOIN (which is not equivalent to LEFT JOIN) to fix the table order and make the query faster.
mysql> explain select * FROM organizations o STRAIGHT_JOIN users u ON o.id=u.organization_id STRAIGHT_JOIN files f ON u.user_identity=f.owner_identity STRAIGHT_JOIN file_versions fv ON f.owner_identity=fv.owner_identity AND f.local_path=fv.local_path WHERE o.id=2001237 AND o.trashed_file_age_limit>=1 AND f.trashing_date<(1433943058 - o.trashed_file_age_limit*24*60*60);
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
| 1 | SIMPLE | o | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | u | ref | PRIMARY,identity,organization_id | PRIMARY | 4 | const | 36 | |
| 1 | SIMPLE | f | ref | PRIMARY | PRIMARY | 36 | filemirror.u.user_identity | 6089324 | Using where |
| 1 | SIMPLE | fv | ref | PRIMARY | PRIMARY | 3035 | filemirror.u.user_identity,filemirror.f.local_path | 1 | |
+----+-------------+-------+-------+----------------------------------+---------+---------+----------------------------------------------------+---------+-------------+
4 rows in set (0.00 sec)
My question is: why is MySQL allowed to change the table order in a non-symmetric join operation?
Tables structure:
CREATE TABLE `file_versions` (
`owner_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
`local_path` varchar(999) character set utf8 NOT NULL,
`version_number` int(11) unsigned NOT NULL,
...
PRIMARY KEY (`owner_identity`,`local_path`,`version_number`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `files` (
`owner_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
`local_path` varchar(999) character set utf8 NOT NULL,
`version_number` int(11) unsigned NOT NULL,
..
`trashing_date` int(11) default NULL,
...
PRIMARY KEY (`owner_identity`,`local_path`),
KEY `trashing_date` (`trashing_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `organizations` (
`id` int(11) NOT NULL,
...
`trashed_file_age_limit` int(11) default NULL,
...
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
CREATE TABLE `users` (
`organization_id` int(11) NOT NULL,
`id` int(11) NOT NULL,
`user_identity` char(36) character set latin1 collate latin1_bin NOT NULL,
...
PRIMARY KEY (`organization_id`,`id`),
UNIQUE KEY `identity` (`user_identity`),
KEY `organization_id` (`organization_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Mysql version 5.5
Look at the row estimates: MySQL thinks it will need to read 109M rows of the files table in the first plan, and 6M rows for each of 36 users (about 216M rows) in the second plan. So it seems reasonable to read all 109M rows only once, in primary key order, instead of reading them in separate blocks. Those estimates do not seem very plausible to me, so I would try running ANALYZE TABLE on files, but they are only estimates, so you may not get better numbers.
Using a LEFT JOIN and then adding a condition on that table to the WHERE clause turns it into an INNER JOIN, as Strawberry says in their comment: the joined row has to have a value for the WHERE condition ever to be true. MySQL therefore feels free to reorder those tables, and the optimizer may even prefer to do the "really inner" joins first, which may be a second reason for that plan.
You can try using STRAIGHT_JOIN in a different way: if you put it just once, right after SELECT, the optimizer uses your join order if possible (it usually is, barring some weird right joins and other corner cases) without changing the join type on specific tables. It then acts as a sort of flag, the way SQL_NO_CACHE signals something, rather than as a special join type.
To make it even better, you may try adding an index on files (owner_identity, trashing_date), which should help locate the relevant files per user rather than globally, as with the current key on (trashing_date) only.
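Put together, the three suggestions above could be tried roughly as follows. This is only a sketch: the index name is invented, and on a table of about 109M rows both ANALYZE TABLE and ADD INDEX can take a long time, so they are better tested on a copy first.

```sql
-- 1. Refresh the index statistics behind the row estimates.
ANALYZE TABLE files;

-- 2. Hypothetical index name; columns as suggested above.
ALTER TABLE files
  ADD INDEX owner_trashing_date (owner_identity, trashing_date);

-- 3. STRAIGHT_JOIN as a SELECT modifier: the tables are joined in the order
--    they are written, while each join stays a LEFT JOIN.
EXPLAIN
SELECT STRAIGHT_JOIN *
FROM organizations o
LEFT JOIN users u ON o.id = u.organization_id
LEFT JOIN files f ON u.user_identity = f.owner_identity
LEFT JOIN file_versions fv ON f.owner_identity = fv.owner_identity
                          AND f.local_path = fv.local_path
WHERE o.id = 2001237
  AND o.trashed_file_age_limit >= 1
  AND f.trashing_date < (1433943058 - o.trashed_file_age_limit * 24 * 60 * 60);
```

With the new index in place, the f row of the EXPLAIN would ideally show a ref or range access on (owner_identity, trashing_date) with a far smaller row estimate than the 6M per user seen earlier.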

Excluding large sets of objects from a query on a table with fast changing order

I have a table of products with a score column, which has a B-Tree Index on it. I have a query which returns products that have not been shown to the user in the current session. I can't simply use simple pagination with LIMIT for it, because the result should be ordered by the score column, which can change between query calls.
My current solution works like this:
SELECT *
FROM products p
LEFT JOIN product_seen ps
ON (ps.session_id = ? AND p.product_id = ps.product_id )
WHERE ps.product_id is null
ORDER BY p.score DESC
LIMIT 30;
This works fine for the first few pages, but the response time grows linearly with the number of products already shown in the session and reaches about one second by the time that number is around 300. Is there a way to speed this up in MySQL? Or should I solve this problem in an entirely different way?
Edit:
These are the two tables:
CREATE TABLE `products` (
`product_id` int(15) NOT NULL AUTO_INCREMENT,
`shop` varchar(15) NOT NULL,
`shop_id` varchar(25) NOT NULL,
`shop_category_id` varchar(20) DEFAULT NULL,
`shop_subcategory_id` varchar(20) DEFAULT NULL,
`shop_designer_id` varchar(20) DEFAULT NULL,
`shop_designer_name` varchar(40) NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`product_url` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`description` mediumtext NOT NULL,
`price_cents` int(10) NOT NULL,
`list_image_url` varchar(255) NOT NULL,
`list_image_height` int(4) NOT NULL,
`ending` timestamp NULL DEFAULT NULL,
`category_id` int(5) NOT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`included_at` timestamp NULL DEFAULT NULL,
`hearts` int(5) NOT NULL,
`score` decimal(10,5) NOT NULL,
`rand_field` decimal(16,15) NOT NULL,
`last_score_update` timestamp NULL DEFAULT NULL,
`active` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`product_id`),
UNIQUE KEY `unique_shop_id` (`shop`,`shop_id`),
KEY `score_index` (`active`,`score`),
KEY `included_at_index` (`included_at`),
KEY `active_category_score` (`active`,`category_id`,`score`),
KEY `active_category` (`active`,`category_id`,`product_id`),
KEY `active_products` (`active`,`product_id`),
KEY `active_rand` (`active`,`rand_field`),
KEY `active_category_rand` (`active`,`category_id`,`rand_field`)
) ENGINE=InnoDB AUTO_INCREMENT=55985 DEFAULT CHARSET=utf8
CREATE TABLE `product_seen` (
`seenby_id` int(20) NOT NULL AUTO_INCREMENT,
`session_id` varchar(25) NOT NULL,
`product_id` int(15) NOT NULL,
`last_seen` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sorting` varchar(10) NOT NULL,
`in_category` int(3) DEFAULT NULL,
PRIMARY KEY (`seenby_id`),
KEY `last_seen_index` (`last_seen`),
KEY `session_id` (`session_id`,`seenby_id`),
KEY `session_id_2` (`session_id`,`sorting`,`seenby_id`)
) ENGINE=InnoDB AUTO_INCREMENT=17431 DEFAULT CHARSET=utf8
Edit 2:
The query above is a simplification, this is the real query with EXPLAIN:
EXPLAIN SELECT
DISTINCT p.product_id AS id,
p.list_image_url AS image,
p.list_image_height AS list_height,
hearts,
active AS available,
(UNIX_TIMESTAMP( ) - ulp.last_action) AS last_loved
FROM `looksandgoods`.`products` p
LEFT JOIN `looksandgoods`.`user_likes_products` ulp
ON ( p.product_id = ulp.product_id AND ulp.user_id =1 )
LEFT JOIN `looksandgoods`.`product_seen` sb
ON (sb.session_id = 'y7lWunZKKABgMoDgzjwDjZw1'
AND sb.sorting = 'trend'
AND p.product_id = sb.product_id )
WHERE p.active =1
AND sb.product_id IS NULL
ORDER BY p.score DESC
LIMIT 30 ;
EXPLAIN output. There is still a temporary table and a filesort, although the keys for the join exist:
+----+-------------+-------+-------+----------------------------------------------------------------------------------------------------+------------------+---------+----------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------------------------------------------------------------------------------------+------------------+---------+----------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | p | range | score_index,active_category_score,active_category,active_products,active_rand,active_category_rand | score_index | 1 | NULL | 2299 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ulp | ref | love_count_index,user_to_product_index,product_id | love_count_index | 9 | looksandgoods.p.product_id,const | 1 | |
| 1 | SIMPLE | sb | ref | session_id,session_id_2 | session_id | 77 | const | 711 | Using where; Not exists; Distinct |
+----+-------------+-------+-------+----------------------------------------------------------------------------------------------------+------------------+---------+----------------------------------+------+----------------------------------------------+
New answer
I think the problem with the real query is the DISTINCT clause. The implication is that either or both of the product_seen and user_likes_products tables can join multiple rows for each product_id which could potentially appear in the result set (given the somewhat disturbing lack of UNIQUE KEYs on the product_seen table), and this is the reason you've included the DISTINCT clause. Unfortunately, it also means MySQL will have to create a temp table to process the query.
Before I go any further, if it's possible to do...
ALTER TABLE product_seen ADD UNIQUE KEY (session_id, product_id, sorting);
...and...
ALTER TABLE user_likes_products ADD UNIQUE KEY (user_id, product_id);
...then the DISTINCT clause is redundant, and removing it should eliminate the problem. N.B. I'm not suggesting you necessarily need to add these keys, but rather just to confirm that these fields are always unique.
If it's not possible, then there may be another solution, but I'd need to know a lot more about the tables involved in the joins.
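One way to confirm that those column combinations are already unique, without adding the keys, is to look for duplicates directly. A minimal sketch, assuming the column names used in the queries above (the user_likes_products schema is not shown in the question):

```sql
-- Any row returned is a (session_id, product_id, sorting) combination that
-- occurs more than once, which would make the UNIQUE KEY fail.
SELECT session_id, product_id, sorting, COUNT(*) AS copies
FROM product_seen
GROUP BY session_id, product_id, sorting
HAVING COUNT(*) > 1
LIMIT 10;

-- Same check for the likes table.
SELECT user_id, product_id, COUNT(*) AS copies
FROM user_likes_products
GROUP BY user_id, product_id
HAVING COUNT(*) > 1
LIMIT 10;
```

If both queries return no rows, the joins cannot multiply product rows and the DISTINCT can be dropped safely. As a side note, neither existing session_id key on product_seen includes product_id, so a key starting with (session_id, product_id) may also make the anti-join lookup itself cheaper.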
Old answer
An EXPLAIN for your query yields...
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------------------+
| 1 | SIMPLE | p | ALL | NULL | NULL | NULL | NULL | 10 | Using filesort |
| 1 | SIMPLE | ps | ref | session_id | session_id | 27 | const | 1 | Using where; Not exists |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------------------+
...which shows it's not using an index on the products table, so it's having to do a table scan and a filesort, which is why it's slow.
I noticed there's an index on (active, score) which you could use by changing the query to only show active products...
SELECT *
FROM products p
LEFT JOIN product_seen ps
ON (ps.session_id = ? AND p.product_id = ps.product_id )
WHERE p.active=TRUE AND ps.product_id is null
ORDER BY p.score DESC
LIMIT 30;
...which changes the EXPLAIN to...
+----+-------------+-------+-------+-----------------------------+-------------+---------+-------+------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------------------+-------------+---------+-------+------+-------------------------+
| 1 | SIMPLE | p | range | score_index,active_products | score_index | 1 | NULL | 10 | Using where |
| 1 | SIMPLE | ps | ref | session_id | session_id | 27 | const | 1 | Using where; Not exists |
+----+-------------+-------+-------+-----------------------------+-------------+---------+-------+------+-------------------------+
...which is now doing a range scan and no filesort, which should be much faster.
Or if you want it to also return inactive products, then you'll need to add an index on score only, with...
ALTER TABLE products ADD KEY (score);

Mysql query with join, limit and order is very slow (filesort)

I have the following query:
explain select * from users, dls where dls.user_id=users.id and users.status = 'accepted' and users.acc = 0 order by users.user_name desc limit 18416, 16
Which results in the following EXPLAIN:
+----+-------------+-------+------+------------------------+-------------+---------+---------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+-------------+---------+---------------------------------+-------+---------------------------------+
| 1 | SIMPLE | dls | ALL | PRIMARY,user_id | NULL | NULL | NULL | 19910 | Using temporary; Using filesort |
| 1 | SIMPLE | users | ref | PRIMARY,id_user_name | id_user_name | 4 | dls.user_id | 1 | Using where |
+----+-------------+-------+------+------------------------+-------------+---------+---------------------------------+-------+---------------------------------+
2 rows in set (0.00 sec)
This query is really, really slow and I cannot figure out how to fix it. I tried all kinds of indexes from reading articles on how to optimize order by / limit queries, but the result remains the same. Can anyone please help?
Edit: schemas:
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL auto_increment,
`user_name` varchar(100) character set utf8 NOT NULL,
`status` enum('accepted','rejected') character set utf8 NOT NULL,
`acc` varchar(6) character set utf8 NOT NULL,
PRIMARY KEY (`id`),
KEY `user_name` (`user_name`),
KEY `id_user_name` (`id`,`user_name`)
)
CREATE TABLE `dls` (
`user_id` int(10) unsigned NOT NULL,
`category_id` bigint(20) NOT NULL,
`download_url` varchar(255) character set utf8 NOT NULL,
PRIMARY KEY (`user_id`,`category_id`),
KEY `user_id` (`user_id`)
)
Output for query by Scrummeister;
+----+-------------+-------+------+------------------------+--------+---------+------------------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+--------+---------+------------------------------+-------+-----------------------------+
| 1 | SIMPLE | u | ALL | PRIMARY,id_user_name | NULL | NULL | NULL | 10838 | Using where; Using filesort |
| 1 | SIMPLE | dls | ref | PRIMARY,user_id | user_id | 4 | u.id | 2 | |
+----+-------------+-------+------+------------------------+--------+---------+------------------------------+-------+-----------------------------+
MySQL is known to have issues with a LIMIT that uses a large offset.
The STRAIGHT_JOIN keyword tells MySQL to scan the users table first and then, for every user, look up the matching rows in the dls table.
SELECT STRAIGHT_JOIN *
FROM users u JOIN dls ON dls.user_id = u.id
WHERE u.status = 'accepted' AND u.acc = 0
ORDER BY u.user_name DESC
LIMIT 18416, 16;
Using STRAIGHT_JOIN is not recommended unless there is a need for it. In this specific case I believe it might work, since it can use the user_name index for sorting.
Other options you have:
Increase the size of sort_buffer_size
Increase the size of read_rnd_buffer_size (with caution!)
Do the paging on the users table only, regardless of how many dls rows each user has, and only then apply the JOIN.
Handle the paging in your code. Assuming a user goes from page to page without skipping too many, you can store the first and last user names of each page. If the user clicks the next page, add a condition such as WHERE user_name < "{LastPageLastUsername}" with LIMIT 0,16 (use < because the query sorts descending); this should speed things up, since MySQL no longer has to read and discard the skipped rows.
For other optimization, read ORDER BY Optimization and Limit Optimization
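The two paging suggestions in the list above could look roughly like this. It is only a sketch against the schema posted in the question: the alias page and the placeholder 'last_name_on_previous_page' are made up for illustration, and neither query has been run against the real data.

```sql
-- Option A: page over users first, then join dls to the 16 users found.
-- Note: this pages by user, so a user with several dls rows produces several
-- result rows, whereas the original query pages by joined row.
SELECT page.*, dls.*
FROM (
    SELECT u.*
    FROM users u
    WHERE u.status = 'accepted' AND u.acc = 0
    ORDER BY u.user_name DESC
    LIMIT 18416, 16
) AS page
JOIN dls ON dls.user_id = page.id
ORDER BY page.user_name DESC;

-- Option B: keyset ("seek") paging. Remember the last user_name shown on the
-- current page and continue from there instead of using a large offset.
-- The sort is descending, so the next page holds the smaller names.
SELECT u.*
FROM users u
WHERE u.status = 'accepted' AND u.acc = 0
  AND u.user_name < 'last_name_on_previous_page'  -- hypothetical placeholder
ORDER BY u.user_name DESC
LIMIT 16;
```

user_name is not declared unique in the posted schema, so Option B can skip rows when two users share a name; adding id as a tie-breaker to both the ORDER BY and the comparison would avoid that.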
Try adding an index to the users table with the following columns:
status, acc, user_name
or
acc, status, user_name
whichever is faster.
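As a sketch, the first variant could be created and checked like this. The index name is hypothetical. The idea is that the two equality columns come first and the sort column last, so the index can potentially serve both the WHERE and the ORDER BY. One assumption to flag: acc is a varchar(6) in the posted schema, and comparing a string column with the number 0 generally stops MySQL from using the index on that column, so the literal is quoted here. Quoting changes which rows match if acc holds values other than '0' that convert to 0, so check the data first.

```sql
-- Hypothetical index name; create one variant, check EXPLAIN, then try the other.
ALTER TABLE users ADD KEY status_acc_user_name (status, acc, user_name);
-- ALTER TABLE users ADD KEY acc_status_user_name (acc, status, user_name);

EXPLAIN
SELECT *
FROM users u
JOIN dls ON dls.user_id = u.id
WHERE u.status = 'accepted'
  AND u.acc = '0'            -- quoted because acc is varchar(6); an assumption
ORDER BY u.user_name DESC
LIMIT 18416, 16;
```

If the plan is better, the users row should show the new key and the Extra column should no longer list Using temporary; Using filesort.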

Help: Optimize this query in MySQL

These are my tables; the AUTO_INCREMENT value shows the approximate size of each:
tbl_clientes:
CREATE TABLE `tbl_clientes` (
`int_clientes_id_pk` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`str_clientes_documento` varchar(255) DEFAULT NULL,
`str_clientes_nome_original` char(255) DEFAULT NULL,
PRIMARY KEY (`int_clientes_id_pk`),
UNIQUE KEY `str_clientes_documento` (`str_clientes_documento`),
KEY `str_clientes_nome_original` (`str_clientes_nome_original`),
KEY `nome_original_cliente_id` (`str_clientes_nome_original`,`int_clientes_id_pk`),
KEY `cliente_id_nome_original` (`int_clientes_id_pk`,`str_clientes_nome_original`)
) ENGINE=MyISAM AUTO_INCREMENT=2815520 DEFAULT CHARSET=utf8
tbl_clienteEnderecos:
CREATE TABLE `tbl_clienteEnderecos` (
`int_clienteEnderecos_id_pk` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`int_clienteEnderecos_cliente_id_fk` bigint(20) unsigned NOT NULL,
`str_clienteEnderecos_endereco` varchar(255) NOT NULL,
`str_clienteEnderecos_cep` varchar(255) DEFAULT NULL,
`str_clienteEnderecos_numero` varchar(255) DEFAULT NULL,
`str_clienteEnderecos_complemento` varchar(255) DEFAULT NULL,
`str_clienteEnderecos_bairro` varchar(255) DEFAULT NULL,
`str_clienteEnderecos_cidade` varchar(255) DEFAULT NULL,
`str_clienteEnderecos_uf` varchar(2) DEFAULT NULL,
`int_clienteEnderecos_correspondencia` tinyint(1) NOT NULL DEFAULT '0',
`int_clienteEnderecos_tipo` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`int_clienteEnderecos_id_pk`),
KEY `int_clienteEnderecos_cliente_id_fk` (`int_clienteEnderecos_cliente_id_fk`),
KEY `str_clienteEnderecos_cidade` (`str_clienteEnderecos_cidade`),
KEY `str_clienteEnderecos_uf` (`str_clienteEnderecos_uf`),
KEY `uf_cidade` (`str_clienteEnderecos_uf`,`str_clienteEnderecos_cidade`)
) ENGINE=MyISAM AUTO_INCREMENT=1542038 DEFAULT CHARSET=utf8
When I run this search query it is fast, because it uses the indexes:
EXPLAIN
SELECT * FROM tbl_clientes LEFT JOIN tbl_clienteEnderecos ON int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk
GROUP BY str_clientes_nome_original, int_clientes_id_pk
ORDER BY str_clientes_nome_original, int_clientes_id_pk
LIMIT 0,20
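The result of EXPLAIN is:
+----+-------------+----------------------+-------+------------------------------------+------------------------------------+---------+---------------------------------------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |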
+----+-------------+----------------------+-------+------------------------------------+------------------------------------+---------+---------------------------------------------------+------+-------+
| 1 | SIMPLE | tbl_clientes | index | NULL | nome_original_cliente_id | 774 | NULL | 20 | |
| 1 | SIMPLE | tbl_clienteEnderecos | ref | int_clienteEnderecos_cliente_id_fk | int_clienteEnderecos_cliente_id_fk | 8 | mydb.tbl_clientes.int_clientes_id_pk | 1 | |
+----+-------------+----------------------+-------+------------------------------------+------------------------------------+---------+---------------------------------------------------+------+-------+
All right, but I need to filter by tbl_clienteEnderecos.str_clienteEnderecos_uf. As soon as I do, the indexes no longer help with the sort: MySQL uses a temporary table and a filesort. Here's the query:
EXPLAIN
SELECT * FROM tbl_clientes LEFT JOIN tbl_clienteEnderecos ON int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk
WHERE str_clienteEnderecos_uf = "SP"
GROUP BY str_clientes_nome_original, int_clientes_id_pk
ORDER BY str_clientes_nome_original, int_clientes_id_pk
LIMIT 0,20
Look, this is the output of EXPLAIN:
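+----+-------------+----------------------+--------+----------------------------------------------------------------------+-----------+---------+---------------------------------------------------------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |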
+----+-------------+----------------------+--------+----------------------------------------------------------------------+-----------+---------+---------------------------------------------------------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | tbl_clienteEnderecos | ref | int_clienteEnderecos_cliente_id_fk,str_clienteEnderecos_uf,uf_cidade | uf_cidade | 9 | const | 670654 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | tbl_clientes | eq_ref | PRIMARY,cliente_id_nome_original | PRIMARY | 8 | mydb.tbl_clienteEnderecos.int_clienteEnderecos_cliente_id_fk | 1 | |
+----+-------------+----------------------+--------+----------------------------------------------------------------------+-----------+---------+---------------------------------------------------------------------------+--------+----------------------------------------------+
With Using where; Using temporary; Using filesort it can't be fast. I've tried a lot of things. How can I optimize this query?
Is it time to switch to NoSQL/MongoDB?
MySQL will typically not use an index if it will not help narrow the results down enough. According to the EXPLAIN estimate, "SP" occurs in roughly 670654 rows. Since that is more than a third of the roughly 1.5 million rows in tbl_clienteEnderecos, it is more efficient to read the table in disk order.
You can try an index to tbl_clienteEnderecos:
KEY `test` (`str_clienteEnderecos_uf`, `int_clienteEnderecos_cliente_id_fk`)
This might be enough to get it to use the index.
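To try it, the key can be added with ALTER TABLE and the plan checked again. This is only a sketch: whether the optimizer actually picks the new index depends on the data and the MySQL version.

```sql
-- Add the suggested composite index (name taken from the KEY line above).
ALTER TABLE tbl_clienteEnderecos
  ADD KEY test (str_clienteEnderecos_uf, int_clienteEnderecos_cliente_id_fk);

-- Re-run EXPLAIN and look at the key and Extra columns for tbl_clienteEnderecos.
EXPLAIN
SELECT *
FROM tbl_clientes
LEFT JOIN tbl_clienteEnderecos
  ON int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk
WHERE str_clienteEnderecos_uf = 'SP'
GROUP BY str_clientes_nome_original, int_clientes_id_pk
ORDER BY str_clientes_nome_original, int_clientes_id_pk
LIMIT 0, 20;
```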
What is the difference between these two columns? They look like they should be the same.
int_clienteEnderecos_id_pk
int_clienteEnderecos_cliente_id_fk
Edit
I understand what the names of the columns imply. I was just curious if the two values should be identical. If they are, it would simplify a few things and have them be joined on the primary key of the tables. I am not sure about the specific meaning of the tables involved, so I don't know if there is a 1-1 or 1-0 relationship between them or a one to many relationship.
I suggest trying to retrieve just the primary key of the tables that you want. For instance, instead of select * try:
EXPLAIN
SELECT int_clienteEnderecos_id_pk, int_clientes_id_pk
FROM tbl_clientes
LEFT JOIN tbl_clienteEnderecos ON int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk
WHERE str_clienteEnderecos_uf = "SP"
GROUP BY str_clientes_nome_original, int_clientes_id_pk
ORDER BY str_clientes_nome_original, int_clientes_id_pk
LIMIT 0,20
If this works out the way I hope it will, you should see "Using index" in the Extra column. If you need additional fields returned, you can either make another round trip to fetch them, or add them to your index. Or use a nested query to fetch them based on the results of the query above.
Also, why are you grouping by and ordering by the same thing? Are you expecting multiple matches of the foreign key?
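The nested-query idea could be sketched as follows: find the 20 key pairs first, then join back to both tables for the full rows. The aliases ids, c and e and the column aliases cliente_id and endereco_id are made up for illustration. Two assumptions are worth flagging: MIN() picks one "SP" address per client so that the GROUP BY is well defined, and the inner join replaces the LEFT JOIN because the WHERE condition on str_clienteEnderecos_uf already discards clients without a matching address.

```sql
SELECT c.*, e.*
FROM (
    -- Step 1: fetch only the keys of the 20 rows wanted.
    SELECT int_clientes_id_pk AS cliente_id,
           MIN(int_clienteEnderecos_id_pk) AS endereco_id
    FROM tbl_clientes
    JOIN tbl_clienteEnderecos
      ON int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk
    WHERE str_clienteEnderecos_uf = 'SP'
    GROUP BY str_clientes_nome_original, int_clientes_id_pk
    ORDER BY str_clientes_nome_original, int_clientes_id_pk
    LIMIT 0, 20
) AS ids
-- Step 2: look up the full rows by primary key.
JOIN tbl_clientes c ON c.int_clientes_id_pk = ids.cliente_id
JOIN tbl_clienteEnderecos e ON e.int_clienteEnderecos_id_pk = ids.endereco_id
ORDER BY c.str_clientes_nome_original, c.int_clientes_id_pk;
```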
I'd suggest giving the following a try; the subquery might use the key better than the join in this context. Take care, though; I couldn't swear on a stack of K & R's that the query is the same as your original.
SELECT *,
(SELECT *
FROM tbl_clienteEnderecos
WHERE int_clienteEnderecos_cliente_id_fk = int_clientes_id_pk AND
str_clienteEnderecos_uf = "SP") AS T2
FROM tbl_clientes
GROUP BY str_clientes_nome_original, int_clientes_id_pk
HAVING T2.int_clienteEnderecos_id_pk IS NOT NULL
ORDER BY str_clientes_nome_original, int_clientes_id_pk
LIMIT 0, 20
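One caveat about the query above: MySQL rejects a scalar subquery in the SELECT list that returns more than one column, so SELECT * inside it will fail as written. A hedged alternative that seems to express the same intent (clients that have at least one "SP" address) uses EXISTS. It is a sketch rather than a verified equivalent, the aliases c and e are made up, and it returns only the client columns.

```sql
SELECT c.*
FROM tbl_clientes c
WHERE EXISTS (
    -- Keep a client only if it has at least one address in SP.
    SELECT 1
    FROM tbl_clienteEnderecos e
    WHERE e.int_clienteEnderecos_cliente_id_fk = c.int_clientes_id_pk
      AND e.str_clienteEnderecos_uf = 'SP'
)
ORDER BY c.str_clientes_nome_original, c.int_clientes_id_pk
LIMIT 0, 20;
```

Written this way the GROUP BY is no longer needed, and the optimizer may be able to read tbl_clientes in index order and stop after 20 matches; check EXPLAIN to see whether it does.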