Slow MySQL Query, Group By Order By Limit - mysql

I currently join 5 tables to select 20 objects to show the user. Unfortunately, as soon as I use GROUP BY and ORDER BY, the query gets really slow.
An example query looks like this:
SELECT r.name, l.name, o.typ, o.id, persons, children, description, rating, totalratings, minprice, picture FROM angebote as a
JOIN objekte as o ON a.fid_objekt = o.id
JOIN regionen as r ON a.fid_region = r.id
JOIN laender as l ON a.fid_land = l.id
WHERE l.slug="aegypten" AND a.letztes_angebot >= 1
GROUP BY a.fid_objekt ORDER BY rating DESC LIMIT 0,20
The EXPLAIN of the query shows this:
+------+-------------+-------+--------+----------------------------+------------+---------+---------------------------------------+--------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+----------------------------+------------+---------+---------------------------------------+--------+--------------------------------------------------------+
| 1 | SIMPLE | l | ref | PRIMARY,slug | slug | 767 | const | 1 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | o | ALL | PRIMARY | NULL | NULL | NULL | 186779 | Using join buffer (flat, BNL join) |
| 1 | SIMPLE | a | ref | unique_key,letztes_angebot | unique_key | 8 | ferienhaeuser.o.id,ferienhaeuser.l.id | 1 | Using where |
| 1 | SIMPLE | r | eq_ref | PRIMARY | PRIMARY | 4 | ferienhaeuser.a.fid_region | 1 | |
+------+-------------+-------+--------+----------------------------+------------+---------+---------------------------------------+--------+--------------------------------------------------------+
So it looks like no key is used for the table objekte; profiling shows 2.7s spent in "Copying to tmp table".
Instead of FROM angebote or JOIN objekte I tried a subquery (SELECT * GROUP BY id), but unfortunately this doesn't improve anything.
The fields used for WHERE, ORDER BY and GROUP BY are also indexed.
I think I'm missing some basic concept here, and any help will be appreciated.
Since I most probably made a mistake with the tables, here are their definitions:
Objekte
CREATE TABLE `objekte` (
`id` int(11) NOT NULL,
`typ` varchar(50) NOT NULL,
`persons` int(11) NOT NULL,
`children` int(11) NOT NULL,
`description` text NOT NULL,
`rating` float NOT NULL,
`totalratings` int(11) NOT NULL,
`minprice` float NOT NULL,
`picture` varchar(255) NOT NULL,
`last_offer` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `minprice` (`minprice`),
KEY `rating` (`rating`),
KEY `last_offer` (`last_offer`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Angebote
CREATE TABLE `angebote` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fid_objekt` int(11) NOT NULL,
`fid_land` int(11) NOT NULL,
`fid_region` int(11) NOT NULL,
`fid_subregion` int(11) NOT NULL,
`letztes_angebot` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_key` (`fid_objekt`,`fid_land`,`fid_region`,`fid_subregion`),
KEY `letztes_angebot` (`letztes_angebot`),
KEY `fid_objekt` (`fid_objekt`),
KEY `fid_land` (`fid_land`),
KEY `fid_region` (`fid_region`),
KEY `fid_subregion` (`fid_subregion`)
) ENGINE=InnoDB AUTO_INCREMENT=2433073 DEFAULT CHARSET=utf8
laender, regionen, subregionen (same structure)
CREATE TABLE `laender` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`iso` varchar(2) NOT NULL,
`name` varchar(255) NOT NULL,
`slug` varchar(255) NOT NULL,
`letztes_angebot` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `iso` (`iso`),
KEY `slug` (`slug`),
KEY `letztes_angebot` (`letztes_angebot`)
) ENGINE=InnoDB AUTO_INCREMENT=107 DEFAULT CHARSET=utf8

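First of all, this is a non-standard GROUP BY: the query selects columns that are neither aggregated nor listed in the GROUP BY clause. Under the default SQL mode of MySQL 5.7 (ONLY_FULL_GROUP_BY, enabled by default since 5.7.5) such a query is rejected, so it will stop working when you upgrade unless you change the SQL mode.
The biggest problem is that no index is used on the objekte table. To make matters worse, you are ordering by the rating column of that table, but its index is not being used either. A possible solution is to create a composite index like this: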
CREATE INDEX objekte_idx ON objekte(id,rating);
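
As a sketch of how the GROUP BY could be made standard-compliant: group by the primary key of objekte (MySQL 5.7.5 and later treat the other objekte columns as functionally dependent on it) and wrap the columns from the other tables in an aggregate. Using MIN() to pick one region and one country name per object is an assumption about what the asker wants, and the aliases region_name and land_name are made up for this example.

```sql
-- Sketch only: MIN() arbitrarily picks one region/country name per object (assumption).
SELECT MIN(r.name) AS region_name,
       MIN(l.name) AS land_name,
       o.typ, o.id, o.persons, o.children, o.description,
       o.rating, o.totalratings, o.minprice, o.picture
FROM angebote AS a
JOIN objekte  AS o ON a.fid_objekt = o.id
JOIN regionen AS r ON a.fid_region = r.id
JOIN laender  AS l ON a.fid_land   = l.id
WHERE l.slug = 'aegypten'
  AND a.letztes_angebot >= 1
GROUP BY o.id            -- primary key of objekte
ORDER BY o.rating DESC
LIMIT 20;
```

This only addresses portability; whether it changes the execution plan has to be checked with EXPLAIN on the real data.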

You do not need GROUP BY here, since you are not using any aggregate functions, so remove it from the query. Removing the GROUP BY should improve query performance. There is also no need to specify the offset 0 in LIMIT.
SELECT r.name, l.name, o.typ, o.id, persons, children, description, rating, totalratings, minprice, picture FROM angebote as a
JOIN objekte as o ON a.fid_objekt = o.id
JOIN regionen as r ON a.fid_region = r.id
JOIN laender as l ON a.fid_land = l.id
WHERE l.slug="aegypten" AND a.letztes_angebot >= 1
ORDER BY rating DESC LIMIT 20
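
One caveat worth checking before dropping the GROUP BY: the unique key on angebote spans (fid_objekt, fid_land, fid_region, fid_subregion), so one object can have several offer rows, and without GROUP BY it may then appear more than once in the result. A minimal illustration, using hypothetical miniature tables (demo_objekte and demo_angebote are made up for this sketch and are not part of the question):

```sql
-- Hypothetical miniature tables, for illustration only.
CREATE TEMPORARY TABLE demo_objekte (
    id     INT   NOT NULL PRIMARY KEY,
    rating FLOAT NOT NULL
);
CREATE TEMPORARY TABLE demo_angebote (
    fid_objekt INT NOT NULL,
    fid_region INT NOT NULL,
    UNIQUE KEY (fid_objekt, fid_region)
);

INSERT INTO demo_objekte VALUES (1, 4.5), (2, 3.0);
-- Object 1 is offered in two regions, object 2 in one.
INSERT INTO demo_angebote VALUES (1, 10), (1, 11), (2, 10);

-- Without GROUP BY: one row per offer, so object 1 is returned twice.
SELECT o.id, o.rating
FROM demo_angebote AS a
JOIN demo_objekte  AS o ON a.fid_objekt = o.id
ORDER BY o.rating DESC;

-- With GROUP BY: one row per object.
SELECT o.id, o.rating
FROM demo_angebote AS a
JOIN demo_objekte  AS o ON a.fid_objekt = o.id
GROUP BY o.id, o.rating
ORDER BY o.rating DESC;
```

If each object has exactly one matching offer for the chosen country, both forms return the same rows and the GROUP BY can be dropped safely.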


Slow mysql query with multiple joins

I have the following tables in my database:
product_fav:
CREATE TABLE `product_fav` (
`user_id` int(9) unsigned NOT NULL,
`asin` varchar(10) NOT NULL DEFAULT '',
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`price` decimal(7,2) NOT NULL,
PRIMARY KEY (`user_id`,`asin`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
product_info:
CREATE TABLE `product_info` (
`asin` varchar(10) NOT NULL,
`name` varchar(200) DEFAULT NULL,
`brand` varchar(50) DEFAULT NULL,
`part_number` varchar(50) DEFAULT NULL,
`url` text,
`image` text,
`availabillity` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`asin`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
product_price:
CREATE TABLE `product_price` (
`asin` varchar(10) NOT NULL,
`date` date NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`price` decimal(7,2) NOT NULL DEFAULT '0.00',
PRIMARY KEY (`asin`,`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I have the following query:
SELECT pi.*,
pp.price,
pf.date,
pf.price AS price_added,
round((100.0 * (pp.price - pf.price) / pf.price),0) AS percentdiff
FROM product_info pi
JOIN
(
SELECT *
FROM product_price
ORDER BY date DESC) pp
ON pp.asin = pi.asin
JOIN product_fav pf
ON pp.asin = pf.asin
WHERE pf.user_id=". $user['user_id'] ."
GROUP BY asin
product_price has many records, and the query takes about 3 seconds. Is it possible to make it faster?
I also have the same issue with the search query:
SELECT pi.*,
price,
date
FROM product_info pi
JOIN (SELECT *
FROM product_price
ORDER BY date DESC) pp
ON pi.asin = pp.asin
WHERE ( `name` LIKE '%".$search."%' )
GROUP BY pi.asin
ORDER BY price
EXPLAIN returns this:
+----+-------------+---------------+--------+---------------+---------+---------+---------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+---------------+---------+---------+---------------+--------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 106709 | Using temporary; Using filesort |
| 1 | PRIMARY | pi | eq_ref | PRIMARY | PRIMARY | 32 | pp.asin | 1 | |
| 1 | PRIMARY | pf | eq_ref | PRIMARY | PRIMARY | 36 | const,pp.asin | 1 | |
| 2 | DERIVED | product_price | ALL | NULL | NULL | NULL | NULL | 112041 | Using filesort |
+----+-------------+---------------+--------+---------------+---------+---------+---------------+--------+---------------------------------+
Don't ORDER BY inside the derived table before the JOIN. If you need an ordering, apply it after the WHERE and GROUP BY so there is less data to sort. This is the part in question:
JOIN
(
SELECT *
FROM product_price
ORDER BY date DESC) pp
Create an index on asin so that the join ON pp.asin = pi.asin is more efficient.
Create an index on user_id so that the filter WHERE pf.user_id=". $user['user_id'] ." is more efficient.
Try running an EXPLAIN on your query to figure out where the bottleneck is.
What's with the ORDER BY date in the inner query? Try getting rid of it. Also try replacing the inner query with a JOIN; joins tend to be faster.
Also, do you have an index on the date field? Try adding one for the ORDER BY at the end of the query.
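
A sketch of what replacing the sorted inner query with a join could look like, assuming the intent of the ORDER BY date DESC plus GROUP BY trick is to get the most recent price per asin (the question does not say so explicitly). The alias latest, the column max_date and the variable @user_id are made up for this example.

```sql
-- Assumption: the goal is the most recent product_price row per asin.
SET @user_id = 1;  -- hypothetical user id, stands in for $user['user_id']

SELECT pi.*,
       pp.price,
       pf.date,
       pf.price AS price_added,
       ROUND(100.0 * (pp.price - pf.price) / pf.price, 0) AS percentdiff
FROM product_fav AS pf
JOIN product_info AS pi
  ON pi.asin = pf.asin
JOIN (
    -- Newest date per asin; the primary key (asin, date) can serve this grouping.
    SELECT asin, MAX(`date`) AS max_date
    FROM product_price
    GROUP BY asin
) AS latest
  ON latest.asin = pf.asin
JOIN product_price AS pp
  ON pp.asin = latest.asin
 AND pp.`date` = latest.max_date
WHERE pf.user_id = @user_id;
```

Because (asin, date) is the primary key of product_price, each favourite matches at most one price row, so the outer GROUP BY is no longer needed. Whether this is actually faster should be confirmed with EXPLAIN.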

Terrible and slow query

I have some speed problems with a query that shows the list of users in my DB.
I want to show the list of users with traffic info and the last employee who worked with each user.
The DB looks like this:
users table (contains users info):
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ip` tinytext NOT NULL,
`name` varchar(64) NOT NULL,
... some other fields
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`),
KEY `ip` (`ip`(15)) USING BTREE,
)
users_trf table (contains information about users traffic; uid - id of users from users table):
CREATE TABLE `users_trf` (
`uid` int(11) unsigned NOT NULL,
`uip` varchar(15) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
`in` bigint(20) NOT NULL DEFAULT '0',
`out` bigint(20) NOT NULL DEFAULT '0',
`test` tinyint(4) NOT NULL,
UNIQUE KEY `uid` (`uid`),
KEY `test` (`test`)
)
employees table with the list of all employees:
CREATE TABLE `employees` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`full_name` varchar(16) NOT NULL,
PRIMARY KEY (`id`)
)
and a log table where I store data about the jobs an employee did for a client (uid - id of the client from the users table, mid - id of the employee from the employees table):
CREATE TABLE `employees_log` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`uid` int(10) unsigned NOT NULL,
`mid` int(10) unsigned NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`note` text NOT NULL,
PRIMARY KEY (`id`)
)
My query:
SELECT SQL_CALC_FOUND_ROWS *
FROM users u
LEFT JOIN users_trf t ON u.id = t.uid
LEFT JOIN (
SELECT e2.full_name, e1.uid, e1.mid AS moid
FROM employees_log e1
LEFT JOIN employees e2 ON e1.mid = e2.id
WHERE NOT
EXISTS (
SELECT *
FROM employees_log e3
WHERE e1.uid = e3.uid
AND e1.id < e3.id
)
) e ON e.uid = u.id
LIMIT 0 , 50
It runs very slowly. I think the reason is this subquery (I'm trying to select the last employee who worked with the client):
SELECT e2.full_name, e1.uid, e1.mid AS moid
FROM employees_log e1
LEFT JOIN employees e2 ON e1.mid = e2.id
WHERE NOT
EXISTS (
SELECT *
FROM employees_log e3
WHERE e1.uid = e3.uid
AND e1.id < e3.id
)
Is it possible to speed up my query?
UPD:
I added an index with ALTER TABLE employees_log ADD INDEX ( uid, id ); and the query became 2 times faster, but can I make it even faster?
+----+--------------------+------------+--------+---------------+---------+---------+-------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+--------+---------------+---------+---------+-------------+-------+--------------------------+
| 1 | PRIMARY | u | ALL | NULL | NULL | NULL | NULL | 12029 | |
| 1 | PRIMARY | t | eq_ref | uid | uid | 4 | bill.u.id | 1 | |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2239 | |
| 2 | DERIVED | e1 | ALL | NULL | NULL | NULL | NULL | 2288 | Using where |
| 2 | DERIVED | e2 | eq_ref | PRIMARY | PRIMARY | 4 | bill.e1.mid | 1 | |
| 3 | DEPENDENT SUBQUERY | e3 | ref | PRIMARY,uid | uid | 4 | bill.e1.uid | 1 | Using where; Using index |
+----+--------------------+------------+--------+---------------+---------+---------+-------------+-------+--------------------------+
First of all, I think you should ask yourself why you are using int and bigint. Do you really expect that much data? Consider smallint or mediumint: they need less storage, which can also help performance. Used as unsigned, mediumint and smallint can still hold fairly large values; take a look at: http://dev.mysql.com/doc/refman/5.0/en/integer-types.html
Second, you need to combine some fields into one key:
ALTER TABLE `employees_log` ADD INDEX ( `uid` , `id` ) ;
If you are creating a new MySQL table, you can specify the columns to index using the INDEX keyword. Indexes are extra structures that you can add to your MySQL tables to improve performance.
http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm
http://www.tutorialspoint.com/mysql/mysql-indexes.htm gives a good overview.
Given your goal of joining to the log row of the last employee for each user (based on the key, at least), maybe try = <subquery> instead of NOT EXISTS?
SELECT e2.full_name, e1.uid, e1.mid AS moid
FROM employees_log e1
LEFT JOIN employees e2 ON e1.mid = e2.id
WHERE e1.id = (
SELECT MAX(e3.id)
FROM employees_log e3
WHERE e1.uid = e3.uid
)
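
For comparison, a sketch of an uncorrelated variant of the same idea, written out as the full query: the newest log id per user is computed once in a derived table and then joined back. This is an alternative formulation, not the answer's exact method; the alias last_log and the column last_id are made up, SQL_CALC_FOUND_ROWS is left out, and the ORDER BY u.id is added here only to make the LIMIT deterministic.

```sql
SELECT u.*, t.*, e.full_name, el.mid AS moid
FROM users AS u
LEFT JOIN users_trf AS t
       ON t.uid = u.id
LEFT JOIN (
    -- Newest log entry per user; an index on (uid, id) supports this grouping.
    SELECT uid, MAX(id) AS last_id
    FROM employees_log
    GROUP BY uid
) AS last_log
       ON last_log.uid = u.id
LEFT JOIN employees_log AS el
       ON el.id = last_log.last_id
LEFT JOIN employees AS e
       ON e.id = el.mid
ORDER BY u.id
LIMIT 50;
```

Whether this beats the correlated form depends on the MySQL version and the data, so it is worth comparing the EXPLAIN output of both.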
Consider adding an index on the columns mid and uid of employees_log; the EXPLAIN suggests that this join is not using an index.
Like so: CREATE INDEX compound ON employees_log (mid, uid);

Mysql Slow Query - Even with all the indices

mysql> explain
select c.userEmail,f.customerId
from comments c
inner join flows f
on (f.id = c.typeId)
inner join users u
on (u.email = c.userEmail)
where c.addTime >= 1372617000
and c.addTime <= 1374776940
and c.type = 'flow'
and c.automated = 0;
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| 1 | SIMPLE | f | index | PRIMARY | customerId | 4 | NULL | 144443 | Using index |
| 1 | SIMPLE | c | ref | userEmail_idx,addTime,automated,typeId | typeId | 198 | f.id,const | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | email | email | 386 | c.userEmail | 1 | Using index |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
How do I make the above query faster? It constantly shows up in the slow query logs.
Indexes present:
id is the auto incremented primary key of the flows table.
customerId of flows table.
userEmail of comments table.
composite index (typeId,type) on comments table.
email of users table (unique)
automated of comments table.
addTime of comments table.
Number of rows :
1. flows - 150k
2. comments - 500k (half of them have automated = 1 and others have automated = 0) (also value of type is 'flow' for all the rows except 500)
3. users - 50
Table schemas :
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=utf8
CREATE TABLE `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`userEmail` varchar(128) DEFAULT NULL,
`content` mediumtext NOT NULL,
`addTime` int(11) NOT NULL,
`typeId` int(11) NOT NULL,
`automated` tinyint(4) NOT NULL,
`type` varchar(64) NOT NULL,
PRIMARY KEY (`id`),
KEY `userEmail_idx` (`userEmail`),
KEY `addTime` (`addTime`),
KEY `automated` (`automated`),
KEY `typeId` (`typeId`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=572410 DEFAULT CHARSET=utf8
CREATE TABLE `flows` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(32) NOT NULL,
`status` varchar(128) NOT NULL,
`customerId` int(11) NOT NULL,
`createTime` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `flowType_idx` (`type`),
KEY `customerId` (`customerId`),
KEY `status` (`status`),
KEY `createTime` (`createTime`)
) ENGINE=InnoDB AUTO_INCREMENT=134127 DEFAULT CHARSET=utf8
You have the required indexes to perform the joins efficiently. However, it looks like MySQL is joining the tables in a less efficient manner. The EXPLAIN output shows that it is doing a full index scan of the flows table then joining the comments table.
It will probably be more efficient to read the comments table first and then join, that is, in the order you have specified in your query, so that the comment set is restricted by the predicates you have supplied (probably what you intended).
Running OPTIMIZE TABLE or ANALYZE TABLE can improve the decisions the query optimiser makes, particularly on tables that have had extensive changes.
If the query optimiser still gets it wrong you can force tables to be read in the order you specify in the query by beginning your statement with SELECT STRAIGHT_JOIN or by changing the INNER JOIN to STRAIGHT_JOIN.
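
A minimal sketch of both suggestions applied to the query from the question: first refresh the index statistics, then, only if the plan is still poor, force the join order as written.

```sql
-- Refresh index statistics so the optimiser has current cardinality estimates.
ANALYZE TABLE comments, flows, users;

-- STRAIGHT_JOIN makes MySQL read the tables in the order they appear in FROM:
-- comments first (restricted by the WHERE predicates), then flows, then users.
SELECT STRAIGHT_JOIN c.userEmail, f.customerId
FROM comments c
INNER JOIN flows f
        ON f.id = c.typeId
INNER JOIN users u
        ON u.email = c.userEmail
WHERE c.addTime >= 1372617000
  AND c.addTime <= 1374776940
  AND c.type = 'flow'
  AND c.automated = 0;
```

Prefixing the statement with EXPLAIN before and after the change shows whether comments is now the first table in the plan.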

Excluding large sets of objects from a query on a table with fast changing order

I have a table of products with a score column, which has a B-Tree index on it. I have a query which returns products that have not been shown to the user in the current session. I can't use simple pagination with LIMIT for it, because the result should be ordered by the score column, which can change between query calls.
My current solution works like this:
SELECT *
FROM products p
LEFT JOIN product_seen ps
ON (ps.session_id = ? AND p.product_id = ps.product_id )
WHERE ps.product_id is null
ORDER BY p.score DESC
LIMIT 30;
This works fine for the first few pages, but the response time grows linearly with the number of products already shown in the session and reaches about one second by the time this number reaches ~300. Is there a way to speed this up in MySQL? Or should I solve this problem in an entirely different way?
Edit:
These are the two tables:
CREATE TABLE `products` (
`product_id` int(15) NOT NULL AUTO_INCREMENT,
`shop` varchar(15) NOT NULL,
`shop_id` varchar(25) NOT NULL,
`shop_category_id` varchar(20) DEFAULT NULL,
`shop_subcategory_id` varchar(20) DEFAULT NULL,
`shop_designer_id` varchar(20) DEFAULT NULL,
`shop_designer_name` varchar(40) NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`product_url` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`description` mediumtext NOT NULL,
`price_cents` int(10) NOT NULL,
`list_image_url` varchar(255) NOT NULL,
`list_image_height` int(4) NOT NULL,
`ending` timestamp NULL DEFAULT NULL,
`category_id` int(5) NOT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`included_at` timestamp NULL DEFAULT NULL,
`hearts` int(5) NOT NULL,
`score` decimal(10,5) NOT NULL,
`rand_field` decimal(16,15) NOT NULL,
`last_score_update` timestamp NULL DEFAULT NULL,
`active` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`product_id`),
UNIQUE KEY `unique_shop_id` (`shop`,`shop_id`),
KEY `score_index` (`active`,`score`),
KEY `included_at_index` (`included_at`),
KEY `active_category_score` (`active`,`category_id`,`score`),
KEY `active_category` (`active`,`category_id`,`product_id`),
KEY `active_products` (`active`,`product_id`),
KEY `active_rand` (`active`,`rand_field`),
KEY `active_category_rand` (`active`,`category_id`,`rand_field`)
) ENGINE=InnoDB AUTO_INCREMENT=55985 DEFAULT CHARSET=utf8
CREATE TABLE `product_seen` (
`seenby_id` int(20) NOT NULL AUTO_INCREMENT,
`session_id` varchar(25) NOT NULL,
`product_id` int(15) NOT NULL,
`last_seen` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sorting` varchar(10) NOT NULL,
`in_category` int(3) DEFAULT NULL,
PRIMARY KEY (`seenby_id`),
KEY `last_seen_index` (`last_seen`),
KEY `session_id` (`session_id`,`seenby_id`),
KEY `session_id_2` (`session_id`,`sorting`,`seenby_id`)
) ENGINE=InnoDB AUTO_INCREMENT=17431 DEFAULT CHARSET=utf8
Edit 2:
The query above is a simplification, this is the real query with EXPLAIN:
EXPLAIN SELECT
DISTINCT p.product_id AS id,
p.list_image_url AS image,
p.list_image_height AS list_height,
hearts,
active AS available,
(UNIX_TIMESTAMP( ) - ulp.last_action) AS last_loved
FROM `looksandgoods`.`products` p
LEFT JOIN `looksandgoods`.`user_likes_products` ulp
ON ( p.product_id = ulp.product_id AND ulp.user_id =1 )
LEFT JOIN `looksandgoods`.`product_seen` sb
ON (sb.session_id = 'y7lWunZKKABgMoDgzjwDjZw1'
AND sb.sorting = 'trend'
AND p.product_id = sb.product_id )
WHERE p.active =1
AND sb.product_id IS NULL
ORDER BY p.score DESC
LIMIT 30 ;
EXPLAIN output; there is still a temp table and filesort, although the keys for the join exist:
+----+-------------+-------+-------+----------------------------------------------------------------------------------------------------+------------------+---------+----------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------------------------------------------------------------------------------------+------------------+---------+----------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | p | range | score_index,active_category_score,active_category,active_products,active_rand,active_category_rand | score_index | 1 | NULL | 2299 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ulp | ref | love_count_index,user_to_product_index,product_id | love_count_index | 9 | looksandgoods.p.product_id,const | 1 | |
| 1 | SIMPLE | sb | ref | session_id,session_id_2 | session_id | 77 | const | 711 | Using where; Not exists; Distinct |
+----+-------------+-------+-------+----------------------------------------------------------------------------------------------------+------------------+---------+----------------------------------+------+----------------------------------------------+
New answer
I think the problem with the real query is the DISTINCT clause. The implication is that either or both of the product_seen and user_likes_products tables can join multiple rows for each product_id which could potentially appear in the result set (given the somewhat disturbing lack of UNIQUE KEYs on the product_seen table), and this is the reason you've included the DISTINCT clause. Unfortunately, it also means MySQL will have to create a temp table to process the query.
Before I go any further, if it's possible to do...
ALTER TABLE product_seen ADD UNIQUE KEY (session_id, product_id, sorting);
...and...
ALTER TABLE user_likes_products ADD UNIQUE KEY (user_id, product_id);
...then the DISTINCT clause is redundant, and removing it should eliminate the problem. N.B. I'm not suggesting you necessarily need to add these keys, but rather just to confirm that these fields are always unique.
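
To confirm that these column combinations are in fact unique without adding the keys, a duplicate check along these lines could be run first. This is a sketch; the column names are taken from the query in the question, and the alias n is made up.

```sql
-- Any rows returned here mean (user_id, product_id) is not unique.
SELECT user_id, product_id, COUNT(*) AS n
FROM user_likes_products
GROUP BY user_id, product_id
HAVING COUNT(*) > 1
LIMIT 10;

-- Same check for the combination proposed as a unique key on product_seen.
SELECT session_id, product_id, sorting, COUNT(*) AS n
FROM product_seen
GROUP BY session_id, product_id, sorting
HAVING COUNT(*) > 1
LIMIT 10;
```

If both queries return no rows, the current data contains no duplicates; only a UNIQUE KEY guarantees that this stays true for future inserts.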
If it's not possible, then there may be another solution, but I'd need to know a lot more about the tables involved in the joins.
Old answer
An EXPLAIN for your query yields...
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------------------+
| 1 | SIMPLE | p | ALL | NULL | NULL | NULL | NULL | 10 | Using filesort |
| 1 | SIMPLE | ps | ref | session_id | session_id | 27 | const | 1 | Using where; Not exists |
+----+-------------+-------+------+---------------+------------+---------+-------+------+-------------------------+
...which shows it's not using an index on the products table, so it's having to do a table scan and a filesort, which is why it's slow.
I noticed there's an index on (active, score) which you could use by changing the query to only show active products...
SELECT *
FROM products p
LEFT JOIN product_seen ps
ON (ps.session_id = ? AND p.product_id = ps.product_id )
WHERE p.active=TRUE AND ps.product_id is null
ORDER BY p.score DESC
LIMIT 30;
...which changes the EXPLAIN to...
+----+-------------+-------+-------+-----------------------------+-------------+---------+-------+------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------------------+-------------+---------+-------+------+-------------------------+
| 1 | SIMPLE | p | range | score_index,active_products | score_index | 1 | NULL | 10 | Using where |
| 1 | SIMPLE | ps | ref | session_id | session_id | 27 | const | 1 | Using where; Not exists |
+----+-------------+-------+-------+-----------------------------+-------------+---------+-------+------+-------------------------+
...which is now doing a range scan and no filesort, which should be much faster.
Or if you want it to also return inactive products, then you'll need to add an index on score only, with...
ALTER TABLE products ADD KEY (score);

Optimise MySql query using temporary and filesort

I have this query (shown below) which currently uses temporary and filesort in order to generate a grouped by set of ordered results. I would like to get rid of their usage if possible. I have looked into the underlying indexes used in this query and I just can't see what is missing.
SELECT
b.institutionid AS b__institutionid,
b.name AS b__name,
COUNT(DISTINCT f2.facebook_id) AS f2__0
FROM education_institutions b
LEFT JOIN facebook_education_matches f ON b.institutionid = f.institutionid
LEFT JOIN facebook_education f2 ON f.school_uid = f2.school_uid
WHERE
(
b.approved = '1'
AND f2.facebook_id IN ( [lots of facebook ids here ])
)
GROUP BY b__institutionid
ORDER BY f2__0 DESC
LIMIT 10
Here is the output for EXPLAIN EXTENDED :
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | f | index | PRIMARY,institutionId | institutionId | 4 | NULL | 308 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | f2 | ref | facebook_id_idx,school_uid_idx | school_uid_idx | 9 | f.school_uid | 1 | 100.00 | Using where |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | f.institutionId | 1 | 100.00 | Using where |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
The CREATE TABLE statements for each table are shown below so you know the schema.
CREATE TABLE facebook_education (
education_id int(11) NOT NULL AUTO_INCREMENT,
name varchar(255) DEFAULT NULL,
school_uid bigint(20) DEFAULT NULL,
school_type varchar(255) DEFAULT NULL,
year smallint(6) DEFAULT NULL,
facebook_id bigint(20) DEFAULT NULL,
degree varchar(255) DEFAULT NULL,
PRIMARY KEY (education_id),
KEY facebook_id_idx (facebook_id),
KEY school_uid_idx (school_uid),
CONSTRAINT facebook_education_facebook_id_facebook_user_facebook_id FOREIGN KEY (facebook_id) REFERENCES facebook_user (facebook_id)
) ENGINE=InnoDB AUTO_INCREMENT=484 DEFAULT CHARSET=utf8;
CREATE TABLE facebook_education_matches (
school_uid bigint(20) NOT NULL,
institutionId int(11) NOT NULL,
created_at timestamp NULL DEFAULT NULL,
updated_at timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (school_uid),
KEY institutionId (institutionId),
CONSTRAINT fk_facebook_education FOREIGN KEY (school_uid) REFERENCES facebook_education (school_uid) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT fk_education_institutions FOREIGN KEY (institutionId) REFERENCES education_institutions (institutionId) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT;
CREATE TABLE education_institutions (
institutionId int(11) NOT NULL AUTO_INCREMENT,
name varchar(100) NOT NULL,
type enum('School','Degree') DEFAULT NULL,
approved tinyint(1) NOT NULL DEFAULT '0',
deleted tinyint(1) NOT NULL DEFAULT '0',
normalisedName varchar(100) NOT NULL,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (institutionId)
) ENGINE=InnoDB AUTO_INCREMENT=101327 DEFAULT CHARSET=utf8;
Any guidance would be greatly appreciated.
The filesort probably happens because you have no suitable index for the ORDER BY.
It's mentioned in the MySQL "ORDER BY Optimization" docs.
What you can do is load a temp table and select from that afterwards. When you load the temp table, use ORDER BY NULL. When you select from the temp table, use ORDER BY .. LIMIT
The issue is that GROUP BY adds an implicit ORDER BY <group by clause> ASC (in MySQL versions before 8.0) unless you disable that behavior by adding ORDER BY NULL.
It's one of those MySQL-specific gotchas.
I can see two possible optimizations:
b.approved = '1' - You definitely need an index on the approved column for quick filtering.
f2.facebook_id IN ( [lots of facebook ids here ]) - Store the facebook ids in a temp table, create an index on it, and then join with the temp table instead of using the IN clause.
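
A sketch of the temp-table suggestion, assuming the id list can be inserted before the main query runs. The table name tmp_facebook_ids and the three ids are placeholders made up for this example. Because the WHERE clause already requires a matching f2 row, the LEFT JOINs behave like inner joins, so they are written as such here.

```sql
-- Hypothetical temp table holding the ids that were previously in the IN (...) list.
CREATE TEMPORARY TABLE tmp_facebook_ids (
    facebook_id BIGINT NOT NULL,
    PRIMARY KEY (facebook_id)   -- the primary key doubles as the index
);

INSERT INTO tmp_facebook_ids (facebook_id)
VALUES (1001), (1002), (1003);  -- placeholder ids

SELECT b.institutionId AS b__institutionid,
       b.name          AS b__name,
       COUNT(DISTINCT f2.facebook_id) AS f2__0
FROM education_institutions b
JOIN facebook_education_matches f
  ON b.institutionId = f.institutionId
JOIN facebook_education f2
  ON f.school_uid = f2.school_uid
JOIN tmp_facebook_ids ids
  ON ids.facebook_id = f2.facebook_id
WHERE b.approved = 1
GROUP BY b.institutionId, b.name
ORDER BY f2__0 DESC
LIMIT 10;
```

Since the result is ordered by an aggregate, some sorting step is likely to remain in the plan; the aim of this rewrite is only to make the id filter cheaper.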