Debugging Slow mySQL query with Explain - mysql

Have found an inefficient query in our system. content holds versions of slides, and this is supposed to select the highest version of a slide by id.
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
group by `slide_id`
) c ON `c`.`version` = `content`.`version`;
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | ref | PRIMARY,version | PRIMARY | 8 | const | 9703 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------+------+----------+--------------------------+
One big issue is that it returns almost all the slides in the system as the outer query does not filter by slide id. After adding that I get...
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';
EXPLAIN
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | content | NULL | const | PRIMARY,fk_content_slides_idx,version,slide_id | PRIMARY | 16 | const,const | 1 | 100.00 | NULL |
| 2 | DERIVED | content | NULL | ref | PRIMARY,fk_content_slides_idx,thumbnail_asset_id,version,slide_id | fk_content_slides_idx | 8 | const | 1 | 100.00 | Using where; Using index |
+----+-------------+------------------+------------+--------+--------------------------------------------------------------------------------+------------------------------------+---------+-------------+------+----------+--------------------------+
That reduces the amount of rows down to one correctly, but doesnt really speed things up.
There are indexes on version, slide_id and a unique key on version AND slide_id.
Is there anything else I can do to speed this up?
Use a TOP LIMIT 1 insetead of Max ?
m

MySQL seems to take an index (version, slide_id) to join the tables. You should get a better result with
SELECT `content`.*
FROM `content`
FORCE INDEX FOR JOIN (fk_content_slides_idx)
join (
SELECT `slide_id`, max(version) as `version` from `content`
WHERE `slide_id` = '16901' group by `slide_id`
) c ON `c`.`slide_id` = `content`.`slide_id` and `c`.`version` = `content`.`version`
You need an index that has slide_id as first column, I just guessed that's fk_content_slides_idx, if not, take another one.
The part FORCE INDEX FOR JOIN (fk_content_slides_idx) is just to enforce it, you should try if mysql takes it by itself without forcing (it should).
You might get even a slightly better result with an index (slide_id, version), it depends on the amount of data (e.g. the number of versions per id) if you see a difference (but you should not spam indexes, and you already have a lot on this table, but you can try it for fun.)

Just a suggestion i think you should avoid the group by slide_id because you are filter by one slide_id only (16901)
SELECT `content`.*
FROM (`content`)
JOIN (
SELECT max(version) as `version` from `content`
WHERE `slide_id` = '16901'
) c ON `c`.`version` = `content`.`version`
WHERE `slide_id` = '16901';

Related

MySQL: Out of sort memory, consider increasing server sort buffer size

I cant find explanation of MySQL behavior. Examples are below.
Fields name and name_full are text type and field price_steps is json type. None of them has index.
SELECT
name,
name_full,
price_steps
FROM
`lots`
WHERE
EXISTS (
SELECT
*
FROM
`categories`
INNER JOIN `category_lot` ON `categories`.`id` = `category_lot`.`category_id`
WHERE
`lots`.`id` = `category_lot`.`lot_id`
AND `category_id` IN (25)
)
ORDER BY
`created_at` DESC
LIMIT 31 OFFSET 0
MySQL throws error: [Err] 1038 - Out of sort memory, consider increasing server sort buffer size.
Ok, let it be.
Then I added extra field into select part.
SUBSTRING(`name_full`, 1, 200000000000) as name_full2
and query runs successfully (Why? Extra field should lead to extra memory allocation, isnt it?).
Then I decided to make query heavier and replace string
AND `category_id` IN (25)
with
AND `category_id` IN (1,2,3,4,5,6,7,8,9,10, 25)
and the query also finishes successfully.
Count of rows with category = 25 is only about 250, but with categories in (1,2,3,4,5,6,7,8,9,10, 25) is about 40000 rows. This must lead to extra memory demands, but mysql doesnt throw error. Why?
Any explanation to this paradox? Thanks in advance!
UPDATE1
EXPLAIN with failing query
mysql> EXPLAIN SELECT name, name_full, price_steps FROM `lots` WHERE EXISTS ( SELECT * FROM `categories` INNER JOIN `category_lot` ON `categories`.`id` = `category_lot`.`category_id` WHERE `lots`.`id` = `category_lot`.`lot_id` AND `category_id` IN(25) ) ORDER BY `created_at` DESC;
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | categories | NULL | const | PRIMARY | PRIMARY | 8 | const | 1 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | category_lot | NULL | ref | category_lot_lot_id_foreign,category_lot_category_id_foreign | category_lot_category_id_foreign | 8 | const | 1099 | 100.00 | Start temporary |
| 1 | SIMPLE | lots | NULL | eq_ref | PRIMARY | PRIMARY | 8 | torgs.category_lot.lot_id | 1 | 100.00 | End temporary |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
3 rows in set, 2 warnings (0.00 sec)
EXPLAIN with success query (added 4th field)
mysql> EXPLAIN SELECT name, name_full, price_steps,SUBSTRING(`name_full`, 1, 200000000000) as name_full2 FROM `lots` WHERE EXISTS ( SELECT * FROM `categories` INNER JOIN `category_lot` ON `categories`.`id` = `category_lot`.`category_id` WHERE `lots`.`id` = `category_lot`.`lot_id` AND `category_id` IN(25) ) ORDER BY `created_at` DESC;
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | categories | NULL | const | PRIMARY | PRIMARY | 8 | const | 1 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | category_lot | NULL | ref | category_lot_lot_id_foreign,category_lot_category_id_foreign | category_lot_category_id_foreign | 8 | const | 1099 | 100.00 | Start temporary |
| 1 | SIMPLE | lots | NULL | eq_ref | PRIMARY | PRIMARY | 8 | torgs.category_lot.lot_id | 1 | 100.00 | End temporary |
+----+-------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------------+------+----------+----------------------------------------------+
3 rows in set, 2 warnings (0.00 sec)
EXPLAIN with success query (added 4th field and category_id in (1,2,3,4,5,6,7,8,9,10,25))
mysql> EXPLAIN SELECT name, name_full, price_steps, SUBSTRING(`name_full`, 1, 200000000000) as name_full2 FROM `lots` WHERE EXISTS ( SELECT * FROM `categories` INNER JOIN `category_lot` ON `categories`.`id` = `category_lot`.`category_id` WHERE `lots`.`id` = `category_lot`.`lot_id` AND `category_id` IN(1,2,3,4,5,6,7,8,9,10,25) ) ORDER BY `created_at` DESC;
+----+--------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------+------+----------+---------------------------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | lots | NULL | eq_ref | PRIMARY | PRIMARY | 8 | <subquery2>.lot_id | 1 | 100.00 | NULL |
| 2 | MATERIALIZED | categories | NULL | range | PRIMARY | PRIMARY | 8 | NULL | 11 | 100.00 | Using where; Using index |
| 2 | MATERIALIZED | category_lot | NULL | ref | category_lot_lot_id_foreign,category_lot_category_id_foreign | category_lot_category_id_foreign | 8 | torgs.categories.id | 1883 | 100.00 | NULL |
+----+--------------+--------------+------------+--------+--------------------------------------------------------------+----------------------------------+---------+---------------------+------+----------+---------------------------------+
4 rows in set, 2 warnings (0.00 sec)
UPDATE2
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1276 | Field or reference 'torgs.lots.id' of SELECT #2 was resolved in SELECT #1 |
| Note | 1003 | /* select#1 */ select `torgs`.`lots`.`name` AS `name`,`torgs`.`lots`.`name_full` AS `name_full`,`torgs`.`lots`.`price_steps` AS `price_steps`,substr(`torgs`.`lots`.`name_full`,1,200000000000) AS `name_full2` from `torgs`.`lots` semi join (`torgs`.`categories` join `torgs`.`category_lot`) where ((`torgs`.`category_lot`.`category_id` = `torgs`.`categories`.`id`) and (`torgs`.`lots`.`id` = `<subquery2>`.`lot_id`) and (`torgs`.`categories`.`id` in (1,2,3,4,5,6,7,8,9,10,25))) order by `torgs`.`lots`.`created_at` desc |
+-------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

MySQL ORDER BY columns across multiple joined tables

I have a database which consists of three tables, with the following structure:
restaurant table: restaurant_id, location_id, rating. Example: 1325, 77, 4.5
restaurant_name table: restaurant_id, language, name. Example: 1325, 'en', 'Pizza Express'
location_name table: location_id, language, name. Example: 77, 'en', 'New York'
I would like to get the restaurant info in English, sorted by location name and restaurant name, and use the LIMIT clause to paginate the result. So my SQL is:
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
This is terribly slow - so I refined my SQL with deferred JOIN, which make things a lot faster (from over 10 seconds to 2 seconds):
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN (
SELECT r.restaurant_id
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
) r1
ON r.restaurant_id = r1.restaurant_id
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
2 seconds is unfortunately still not very acceptable to the user, so I go and check the EXPLAIN of the my query, and it appears that the slow part is on the ORDER BY clause, which I see "Using temporary; Using filesort". I checked the official reference manual about ORDER BY optimization and I come across this statement:
In some cases, MySQL cannot use indexes to resolve the ORDER BY,
although it may still use indexes to find the rows that match the
WHERE clause. Examples:
The query joins many tables, and the columns in the ORDER BY are not
all from the first nonconstant table that is used to retrieve rows.
(This is the first table in the EXPLAIN output that does not have a
const join type.)
So for my case, given that the two columns I'm ordering by are from the nonconstant joined tables, index cannot be used. My question is, is there any other approach I can take to speed things up, or what I've done so far is already the best I can achieve?
Thanks in advance for your help!
EDIT 1
Below is the EXPLAIN output with the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | ALL | idx_restaurant_name_1 | NULL | NULL | NULL | 8484 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
Below is the EXPLAIN output without the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | index | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | NULL | 8484 | Using where; Using index |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where; Using index |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
EDIT 2
Below are the DDL of the table. I built them for illustrating this problem only, the real table has much more columns.
CREATE TABLE restaurant (
restaurant_id INT NOT NULL AUTO_INCREMENT,
location_id INT NOT NULL,
rating INT NOT NULL,
PRIMARY KEY (restaurant_id),
INDEX idx_restaurant_1 (location_id)
);
CREATE TABLE restaurant_name (
restaurant_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_restaurant_name_1 (restaurant_id, language),
INDEX idx_restaurant_name_2 (name)
);
CREATE TABLE location_name (
location_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_location_name_1 (location_id, language),
INDEX idx_location_name_2 (name)
);
Based on the EXPLAIN numbers, there could be about 170 "pages" of restaurants (8484/50)? I suggest that that is impractical for paging through. I strongly recommend you rethink the UI. In doing so, the performance problem you state will probably vanish.
For example, the UI could be 2 steps instead of 170 to get to the restaurants in Zimbabwe. Step 1, pick a country. (OK, that might be page 5 of the countries.) Step 2, view the list of restaurants in that country; it would be only a few pages to flip through. Much better for the user; much better for the database.
Addenda
In order to optimize the pagination, get the paginated list of pages from a single table (so that you can 'remember where you left off'). Then join the language table(s) to look up the translations. Note that this only looks up on page's worth of translations, not thousands.

MySQL LEFT JOIN or WHERE IN SUBQUERY

I need a piece of advice, building an app now and I need to run some queries on rather large tables, possibly at a very frequent rate, so I'm trying to get the best approach performance wise.
I have the following 2 tables:
Albums:
+---------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| eventid | int(11) | NO | MUL | NULL | |
| album | varchar(200) | NO | | NULL | |
| filename | varchar(200) | NO | | NULL | |
| obstacle_time | time | NO | | NULL | |
+---------------+--------------+------+-----+---------+----------------+
and keywords:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| eventid | int(11) | NO | MUL | NULL | |
| filename | varchar(200) | NO | | NULL | |
| bibnumbers | varchar(200) | NO | | NULL | |
| gender | varchar(20) | YES | | NULL | |
| top_style | varchar(20) | YES | | NULL | |
| pants_style | varchar(20) | YES | | NULL | |
| other | varchar(20) | YES | | NULL | |
| cap | varchar(200) | NO | | NULL | |
| tshirt | varchar(200) | NO | | NULL | |
| pants | varchar(200) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
Both table have a unique_index declared which is a constraint of the eventid+filename column.
Both table contains information about some images, but the albums table is available instantly (as soon as I have the images), while the keywords table usually becomes available several days later after a manual tagging of the images is completed
Now I will have people searching for all kind of things once the tagging is enabled, but since the results can be HUGE (up to 10.000 or more) I'm only showing them in small chunks so the browser doesn't get killed with trying to load a huge amount of images, because of this my server will be hit with loads of query requests (every time the visitor scrolls to the bottom of the page, an ajax query will return the next chunk of images).
Now my question is, which of the following queries is better performance wise:
SELECT `albums`.`filename`,`basket`.`id`,`albums`.`id`,`obstacle_time`
FROM `albums`
LEFT JOIN `basket`
ON `basket`.`eventid` = `albums`.`eventid`
AND `basket`.`fileid` = `albums`.`id`
AND `basket`.`visitor_id` = 1
LEFT JOIN `keywords`
ON `keywords`.`eventid` = `albums`.`eventid`
AND `albums`.`filename` = `keywords`.`filename`
WHERE
`albums_2015`.`eventid` = 1
AND `album` LIKE '%string%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `gender` = 1
AND `top_style` REGEXP '[[:<:]]0[[:>:]]|[[:<:]]1[[:>:]]'
AND `cap` = '2'
AND `tshirt` = '1'
AND `pants` = '3'
ORDER BY `obstacle_time`
LIMIT X, 10
OR using an IN CLAUSE inside WHERE like:
SELECT `albums`.`filename`,`basket`.`id`,`albums`.`id`,`obstacle_time`
FROM `albums`
LEFT JOIN `basket`
ON `basket`.`eventid` = `albums`.`eventid`
AND `basket`.`fileid` = `albums`.`id`
AND `basket`.`visitor_id` = 1
WHERE
`albums_2015`.`eventid` = 1
AND `album` LIKE '%string%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `filename` IN (
SELECT `filename`
FROM `keywrods`
WHERE
`eventid` = 1
AND `gender` = 1
AND `top_style` REGEXP '[[:<:]]0[[:>:]]|[[:<:]]1[[:>:]]'
AND `cap` = '2'
AND `tshirt` = '1'
AND `pants` = '3'
)
ORDER BY `obstacle_time`
LIMIT X, 10
I had looked to similar questions but wasn't able to figure it out which is the best course of action.
My understanding so far is that:
Using LEFT JOINtakes advantages of INDEXING, BUT!!! if I use it I will get a full join of the tables even when I only need a significantly smaller result set, so it's almost a wast to join thousands of rows just to then filter out most of them.
Using IN and subquery isn't indexed??? I'm not 100% sure about this, I'm using MySQL 5.6 and to the best of my understanding since 5.6 even subqueries get automatically indexed my MySQL. I think this method has benefits when there result is significantly filtered, not sure if there will be any benefit if the subquery will return all the possible filenames.
As footnote questions:
Should I consider returning the whole result to the client on the first query and use client side (HTML) techniques to load the images gradually rather than re-querying the server each time?
Should I consider merging the 2 tables into 1, how much of a performance impact will that have? (can be tricky due to various reasons, which have no place in the question)
Thanks.
EDIT 1
Explain for JOIN query:
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | albums_2015 | ref | unique_index | unique_index | 4 | const | 6475 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | basket | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | keywords_2015 | eq_ref | unique_index | unique_index | 206 | const,mybibnumber.albums_2015.filename | 1 | Using index |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+
Using WHERE IN:
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+--+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+--+
| 1 | SIMPLE | albums_2015 | ref | unique_index | unique_index | 4 | const | 6475 | Using where; Using temporary; Using filesort | |
| 1 | SIMPLE | keywords_2015 | eq_ref | unique_index | unique_index | 206 | const,mybibnumber.albums_2015.filename | 1 | Using where | |
| 1 | SIMPLE | basket | ALL | NULL | NULL | NULL | NULL | 2 | Using where; Using join buffer (Block Nested Loop) | |
+----+-------------+---------------+--------+---------------+--------------+---------+----------------------------------------+------+----------------------------------------------------+--+
EDIT 2
I wasn't able to set up a SQL Fiddler (keep getting error of something went wrong), so I have created a test database on one of my servers.
Address: http://188.165.217.185/phpmyadmin/, user: temp_test, pass: test_temp
I'm still building the whole thing and I don't have all the values filled in yet, like top_style, pants_style, etc, so a more appropriate query for the test scenario will be:
WHERE IN:
SELECT `albums_2015`.`filename`,
`albums_2015`.`id`,
`obstacle_time`
FROM `albums_2015`
WHERE `albums_2015`.`eventid` = 1
AND `album` LIKE '%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `filename` IN (SELECT `filename`
FROM `keywords_2015`
WHERE eventid = 1
AND
`bibnumbers` REGEXP '[[:<:]]113[[:>:]]|[[:<:]]106[[:>:]]')
ORDER BY `obstacle_time`
LIMIT 0, 10
LEFT JOIN
SELECT `albums_2015`.`filename`,`albums_2015`.`id`,`obstacle_time`
FROM `albums_2015`
LEFT JOIN `keywords_2015`
ON `keywords_2015`.`eventid` = `albums_2015`.`eventid`
AND `albums_2015`.`filename` = `keywords_2015`.`filename`
WHERE
`albums_2015`.`eventid` = 1
AND `album` LIKE '%'
AND `obstacle_time` >= '08:00:00'
AND `obstacle_time` <= '14:11:10'
AND `bibnumbers` REGEXP '[[:<:]]113[[:>:]]|[[:<:]]106[[:>:]]'
ORDER BY `obstacle_time`
LIMIT 0, 10
More a bunch of tips :
Join using index are the best if you have to deal with multi table query,
Don't mind adding some index to speed up your query (index take space, but on INT field it's nothing and you gain way more than you lose).
In case of big table, caching the data in the distant table is usually a good idea.
An insert Trigger on TAG_table that cache the displayed part in the distant table (like the tag name for the overview of albums) can help you keeping your join query at a descent frequency.
Be careful with REGEX, it's something that hurt badly the perf. Adding a new table to split data is a better idea (and use indexing which is native optimisation)
For every field in a WHERE clause of a big and frequent query you should have an index on it. If you can't put one, then your DB model is f**cked-up and need to be changed.

Proper index/query when using INNER JOIN

I am not sure on how to make a decent index that will capture category/log_code properly. Maybe I also need to change my query? Appreciate any input!
All SELECTS contain:
SELECT logentry_id, date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
ORDER BY logentry_id DESC
Query can be as above, but usually has a WHERE to specify the category of log_codes to show, and/or partner, and/or customer. Examples of WHERE:
WHERE partner_id = 1
WHERE log_codes.category_overview = 1
WHERE partner_id = 1 AND log_codes.category_overview = 1
WHERE partner_id = 1 AND customer_id = 1 AND log_codes.category_overview = 1
Database structure:
CREATE TABLE IF NOT EXISTS `log_codes` (
`log_code` smallint(6) NOT NULL,
`log_desc` varchar(255),
`category_mail` tinyint(1) NOT NULL,
`category_overview` tinyint(1) NOT NULL,
`category_cron` tinyint(1) NOT NULL,
`category_documents` tinyint(1) NOT NULL,
`category_error` tinyint(1) NOT NULL,
PRIMARY KEY (`log_code`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `log_entries` (
`logentry_id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`log_code` smallint(6) NOT NULL,
`partner_id` int(11) NOT NULL,
`customer_id` int(11) NOT NULL,
PRIMARY KEY (`logentry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
EDIT: Added indexes on fields, here is output of SHOW INDEXES:
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_codes | 0 | PRIMARY | 1 | log_code | A | 97 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_mail | 1 | category_mail | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_overview | 1 | category_overview | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_cron | 1 | category_cron | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_documents | 1 | category_documents | A | 1 | NULL | NULL | | BTREE | | |
| log_codes | 1 | category_error | 1 | category_error | A | 1 | NULL | NULL | | BTREE | | |
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| log_entries | 0 | PRIMARY | 1 | logentry_id | A | 163020 | NULL | NULL | | BTREE | | |
| log_entries | 1 | log_code | 1 | log_code | A | 90 | NULL | NULL | | BTREE | | |
| log_entries | 1 | partner_id | 1 | partner_id | A | 6 | NULL | NULL | YES | BTREE | | |
| log_entries | 1 | customer_id | 1 | customer_id | A | 20377 | NULL | NULL | YES | BTREE | | |
+-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EDIT 2: Added composite indexes: (log_code, category_overview) and (log_code, category_overview) on log_codes. (customer_id, partner_id) on log_entries.
Here are some EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | SIMPLE | log_codes | eq_ref | PRIMARY,code_overview,overview_code | PRIMARY | 2 | log_entries.log_code | 1 | Using where |
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
But I also have some LEFT JOINs that I did not think would affect the index design, but they cause a "Using temporary" problem. Here is EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
LEFT JOIN partners ON log_entries.partner_id = partners.partner_id
LEFT JOIN joined_table1 ON log_entries.t1_id = joined_table1.t1_id
LEFT JOIN joined_table2 ON log_entries.t2_id = joined_table2.t2_id
LEFT JOIN joined_table3 ON log_entries.t3_id = joined_table3.t3_id
LEFT JOIN joined_table4 ON joined_table3.t4_id = joined_table4.t4_id
LEFT JOIN joined_table5 ON log_entries.t5_id = joined_table5.t5_id
LEFT JOIN joined_table6 ON log_entries.t6_id = joined_table6.t6_id
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC;
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | log_codes | ref | PRIMARY,code_overview,overview_code | overview_code | 1 | const | 54 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | log_entries | ref | log_code,partner_id | log_code | 2 | log_codes.log_code | 1811 | Using where |
| 1 | SIMPLE | partners | const | PRIMARY | PRIMARY | 2 | const | 1 | Using index |
| 1 | SIMPLE | joined_table1 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t1_id | 1 | Using index |
| 1 | SIMPLE | joined_table2 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t2_id | 1 | Using index |
| 1 | SIMPLE | joined_table3 | eq_ref | PRIMARY | PRIMARY | 3 | log_entries.t3_id | 1 | |
| 1 | SIMPLE | joined_table4 | eq_ref | PRIMARY | PRIMARY | 3 | joined_table3.t4_id | 1 | Using index |
| 1 | SIMPLE | joined_table5 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t5_id | 1 | Using index |
| 1 | SIMPLE | joined_table6 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t6_id | 1 | Using index |
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
Don't know if it's a good or bad idea, but a subquery seems to get rid of the "Using temporary". Here is EXPLAIN output of two common scenarios. This query returns 66818 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
And a overview on customer, query returns 12 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_entries.customer_id = 10000
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
| 1 | PRIMARY | log_entries | ref | log_code,partner_id,customer_id,customer_partner | customer_id | 4 | const | 27 | Using where; Using filesort |
| 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | |
| 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where |
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
There isn't a simple rule for guaranteed success when it comes to indexing - you need to look at a reasonable period of typical calls to work out what will help in terms of performance.
All subsequent comments are therefore to be taken not as absolute rules:
An index is "good" if it quickly gets you to a small subset of the data rather than if it eliminates only half of the data (e.g. there is rarely value in an index on a gender column where there are only M/F as the possible entries). So how unique are the values within e.g. log_code, category_overview and partner_id?
For a given query it is often helpful to have a "covering" index, that is one that includes all the fields that are used by the query - however, if there are too many fields from a single table in a query you instead want an index that includes the fields in the "where" or "join" clause to identify the row and then join back to the table storage to get all the fields required.
So given the information you've provided, a candidate index on log_codes would include log_code and category_overview. Similarly on log_entries for log_code and partner_id. However these would need to be evaluated for how they affect performance.
Bear in mind that any given index may improve the read performance of a single query retrieving data but it will also slow down writes to the table where there is then a requirement to write more information i.e. where the new row fits in the additional index. This is why you need to look at the big picture of activity on the database to determine where indexes are worth it.
Well done for taking the time to update your question with the detail requested. I am sorry if that sounds patronising but it is amazing the number people who are not prepared to take the time to help themselves.
Adding a composite index across (customer_id, partner_id) on the log_entries table should give a significant benefit for the last of your example where clauses.
The output of your SHOW INDEXES for the log_codes table would suggest that it is not currently populated as it shows NULL for all but the PK. Is this the case?
EDIT Sorry. Just read your comment to KAJ's answer detailing table content. It might be worth running that SHOW INDEXES statement again as it looks like MySQL may have been building its stats.
Adding a composite index across (log_code, category_overview) for the log_codes table should help but you will need to check the explain output to see if it is being used.
As a very crude general rule you want to create composite indices starting with the columns with the highest cardinality but this is not always the case. It will depend heavily on data distribution and query structure.
UPDATE I have created a mockup of your dataset and added the following indices. They give significant improvement based on your sample WHERE clauses -
ALTER TABLE `log_codes`
ADD INDEX `IX_overview_code` (`category_overview`, `log_code`);
ALTER TABLE `log_entries`
ADD INDEX `IX_partner_code` (`partner_id`, `log_code`),
ADD INDEX `IX_customer_partner_code` (`customer_id`, `partner_id`, `log_code`);
The last index is quite expensive in terms of disk space and degradation of insert performance but gives very fast SELECT based on your final WHERE clause example. My sample dataset has just over 1M records in the log_entries table with quite even distribution across the partner and customer IDs. Three of your sample WHERE clauses execute in less than a second but the one with category_overview as the only criterion is very slow although still sub-second with only 200k rows.

MySQL query optimisation help

hoping you can help me on the right track to start optimising my queries. I've never thought too much about optimisation before, but I have a few queries similar to the one below and want to start concentrating on improving their efficiency. An example of a query which I badly need to optimise is as follows:
SELECT COUNT(*) AS `records_found`
FROM (`records_owners` AS `ro`, `records` AS `r`)
WHERE r.reg_no = ro.contact_no
AND `contacted_email` <> "0000-00-00"
AND `contacted_post` <> "0000-00-00"
AND `ro`.`import_date` BETWEEN "2010-01-01" AND "2010-07-11" AND `r`.`pa_date_of_birth` > "2010-01-01" AND EXISTS ( SELECT `number` FROM `roles` WHERE `roles`.`number` = r.`reg_no` )
Running EXPLAIN on the above produces the following:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+-------------+
| 1 | PRIMARY | r | ALL | NULL | NULL | NULL | NULL | 21533 | Using where |
| 1 | PRIMARY | ro | eq_ref | PRIMARY | PRIMARY | 4 | r.reg_no | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | roles | ALL | NULL | NULL | NULL | NULL | 189 | Using where |
As you can see, you have a dependent subquery, which is one of the worst thing performance-wise in MySQL. See here for tips:
http://dev.mysql.com/doc/refman/5.0/en/select-optimization.html
http://dev.mysql.com/doc/refman/5.0/en/in-subquery-optimization.html