MySQL equality slower than greater-than - mysql

I'm having a curious issue with my MySQL database. The following query runs in 0.003 seconds:
SELECT * FROM `post` where `thread_id` > 12117484 and `index` > -1 limit 1;
If I change the second > to an =, the query doesn't complete (it runs for over a minute):
SELECT * FROM `post` where `thread_id` > 12117484 and `index` = 0 limit 1;
It's worth noting that the result from the first query has index = 0. I know it's bad form to name a column index...but it's the database I've been given. Here's the MySQL Explain for the second query:
+----+-------------+-------+-------+---------------------+---------------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------+---------------------+---------+------+------+------------------------------------+
| 1 | SIMPLE | post | range | post_thread_id_idx1 | post_thread_id_idx1 | 5 | NULL | 1 | Using index condition; Using where |
+----+-------------+-------+-------+---------------------+---------------------+---------+------+------+------------------------------------+

Related

MySQL - nested select queries running many time slower than sequential queries (on a large table)

I have a MySQL query that I am having performance problems with that I do not understand. When I try to debug and run the overall query as a sequence of separate subqueries they seem to perform reasonably well, given the volume of data. When I combine them into a single nested query I get much much much longer execution times.
The main ratings table mentioned below is approx 30 million rows (4GB of disk space), with a couple of foreign keys (it's a many-to-many table linking users and items with a small amount of additional supplementary user specific item information - approx 13 fields and 30 bytes).
Query 1 - approx 23s
SELECT COUNT(1) FROM (SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1) AS t1;
Query 1 saved to table - approx 65s if I save the results to a temporary table
CREATE TABLE temp_table SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1;
Query 2 - approx 3s
SELECT COUNT(1) FROM temp_table WHERE id IN (SELECT id from item_stats WHERE
ratings_count > 1000);
Based on this I would expect a combined query to take approx 30s or so, and not more than approx 70s.
Combined query (Query 1 + Query 2) - indeterminate time (10s of minutes before I give up and cancel)
SELECT COUNT(1) from (SELECT * FROM (SELECT fields FROM ratings WHERE (id >= 0
AND id < 10000) AND item_type = 1) AS t1 WHERE t1.id IN (SELECT id FROM
item_stats WHERE ratings_count > 1000)) as t2;
Can anyone help explain this difference and guide me in creating a query that works? If I need to I can rely on the sequential queries (which would take approx 70s), but that is cumbersome and does not seem the right way to go.
I have tried using INNER JOIN instead of IN but this did not seem to make much difference. The ID count from the item_stats table is about 2700 IDs.
It's using MySQL 8.0 on a laptop (16GB RAM, SSD).
Response to suggestions / questions:
Query 1
EXPLAIN select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1;
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | collections | NULL | ALL | user_id | NULL | NULL | NULL | 32898400 | 1.31 | Using where |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Query 2
EXPLAIN select * from temp_coll where game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | NULL |
| 1 | SIMPLE | temp_coll | NULL | ALL | NULL | NULL | NULL | NULL | 1674386 | 10.00 | Using where; Using join buffer (hash join) |
| 2 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
Combined query
EXPLAIN select * from (select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1) as t1 where t1.game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| 1 | SIMPLE | <subquery3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | collections | NULL | ref | user_id,game_id | game_id | 5 | <subquery3>.game_id | 199 | 1.31 | Using where |
| 3 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
Your query appears to be functionally identical to the following (rather implausible) query:
SELECT COUNT(*) total
FROM ratings r
JOIN item_stats s
ON s.id = r.id
WHERE r.id >= 0
AND r.id < 10000
AND r.item_type = 1
AND s.ratings_count > 1000
r.id is, presumably, the PRIMARY KEY, so it's automatically included in any InnoDB secondary index, which leaves just item_type and ratings_count requiring indexes.
You would benefit a lot from an online tutorial on learning how to read the EXPLAIN plan. The EXPLAINS you shared clearly show missing indexes.
As a general rule, queries should not take 23 seconds or 65 seconds, even with millions of rows. Proper indexes + partitioning should resolve the slowness.
Query 1: The user_id index on that table is not helping performance, as 99% of users are within the range in the WHERE clause. You can add an index on item_type_id:
ALTER TABLE collections ADD KEY (item_type_id)
Query 2: The temp_coll table is missing a game_id index. Also, I'm not sure whether games_ratings_stats has an index on ratings_count, or whether that would help. I don't have experience with MySQL materialized tables.
ALTER TABLE temp_coll ADD KEY (game_id)
Query 3:
Would benefit from above indexes.
Increasing the InnoDB Buffer Pool Size (now set to 8GB) seems to have made a significant improvement. If anyone has any further setup or tuning advice on MySQL then that would be appreciated!

MariaDB SELECT with index used but looks like table scan

I have a MariaDB 10.4 database with a huge table (about 100 million rows) for storing crawled posts. The table contains 4x columns, and one of them is lastUpdate (datetime), which is indexed.
Recently I tried to select posts by lastUpdate. Most queries return quickly using the index, but some take minutes while returning fewer records, and look like a table scan.
This is the EXPLAIN for the query without conditions.
> explain select 1 from SourceAttr;
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| 1 | SIMPLE | SourceAttr | index | NULL | idxCreateDate | 5 | NULL | 79830491 | Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
This is the EXPLAIN and the number of rows returned for the slow one. The number of rows in the EXPLAIN is almost equal to the one above.
> explain select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| 1 | SIMPLE | SourceAttr | index | idxLastUpdate | idxLastUpdate | 5 | NULL | 79827437 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
394454 rows in set (14 min 40.908 sec)
This is the fast one.
> explain select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | SourceAttr | range | idxLastUpdate | idxLastUpdate | 5 | NULL | 3699041 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
1352552 rows in set (2.982 sec)
Any idea what might cause this?
Thanks a lot.
When you see type: index it's called an index scan. This is almost as bad as a table scan.
Notice rows: 79830491 and rows: 79827437 in the EXPLAIN output of the unconditioned query and the slow query. This means each is examining over 79 million entries in the scanned index, idxCreateDate or idxLastUpdate respectively. So it's basically examining every index entry, which takes nearly as long as examining every row of the table.
Whereas the quick query says type: range and rows: 3699041, so it's estimating fewer than 3.7 million rows examined. More than 20x fewer.
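To make the difference between type: index and type: range concrete, here is a minimal Python sketch. It is only a model: a sorted list stands in for idxLastUpdate, the hourly timestamps are made up, and the function names are hypothetical, not anything MariaDB exposes. It counts how many index entries each access method has to touch for the same date range.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Hypothetical stand-in for idxLastUpdate: a sorted list of timestamps,
# one entry per hour over 100 days (2,400 entries in total).
start = datetime(2020, 1, 1)
index_entries = [start + timedelta(hours=h) for h in range(24 * 100)]

def index_scan(entries, lo, hi):
    """Model of type: index. Visit every entry and test the condition on each."""
    examined = 0
    matches = 0
    for ts in entries:
        examined += 1
        if lo <= ts < hi:
            matches += 1
    return matches, examined

def range_scan(entries, lo, hi):
    """Model of type: range. Binary-search both bounds, read only that slice."""
    first = bisect_left(entries, lo)  # first entry with ts >= lo
    last = bisect_left(entries, hi)   # first entry with ts >= hi
    return last - first, last - first

lo = datetime(2020, 1, 15, 11, 46, 37)
hi = datetime(2020, 1, 16, 11, 46, 37)
scan_result = index_scan(index_entries, lo, hi)
range_result = range_scan(index_entries, lo, hi)
print("index scan (matches, examined):", scan_result)
print("range scan (matches, examined):", range_result)
```

Both functions find the same matches, but in this toy data the index scan touches all 2,400 entries while the range scan touches only the 24 entries inside the range. That is the same kind of gap the rows column shows between the slow and fast plans above.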

Speed up query with large table DESC limit 1

MariaDB 10 (myisam)
Query executes rather slowly, takes about 90 seconds.
I tried deleting some old rows and then optimizing the table.
SELECT ceil(rate * 8 / 1000000)
FROM db.Octets
WHERE id = 5344
order by datetime DESC
LIMIT 1;
Query takes a really long time to execute.
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | Octets | index | NULL | Octets_1_idx | 8 | NULL | 1 | Using where |
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
You could try adding a redundant composite index:
create index idx2 on Octets ( id , datetime, rate);

MySQL select query gets quite slow with BOTH where and descending order

I have this select query, ItemType is varchar type and ItemComments is int type:
select * from ItemInfo where ItemType="item_type" order by ItemComments desc limit 1
You can see this query has 3 conditions:
1. where 'ItemType' equals a specific value;
2. order by 'ItemComments'
3. with descending order
The interesting thing is, when I select rows with all three conditions, the query gets very slow. But if I drop any one of the three (except condition 2), the query runs quite fast. See:
select * from ItemInfo where ItemType="item_type" order by ItemComments desc limit 1;
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 16.318 sec. */
select * from ItemInfo where ItemType="item_type" order by ItemComments limit 1;
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 0.140 sec. */
select * from ItemInfo order by ItemComments desc limit 1;
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 0.015 sec. */
Plus,
I'm using MySQL 5.7 with InnoDB engine.
I have created indexes on both ItemType and ItemComments and table ItemInfo contains 2 million rows.
I have searched for many possible explanations, such as MySQL support for descending indexes, composite indexes and so on. But these still can't explain why query #1 runs slowly while queries #2 and #3 run well.
It would be much appreciated if anyone could help me out.
Update: CREATE TABLE and EXPLAIN info
Create code:
CREATE TABLE `ItemInfo` (
`ItemID` VARCHAR(255) NOT NULL,
`ItemType` VARCHAR(255) NOT NULL,
`ItemPics` VARCHAR(255) NULL DEFAULT '0',
`ItemName` VARCHAR(255) NULL DEFAULT '0',
`ItemComments` INT(50) NULL DEFAULT '0',
`ItemScore` DECIMAL(10,1) NULL DEFAULT '0.0',
`ItemPrice` DECIMAL(20,2) NULL DEFAULT '0.00',
`ItemDate` DATETIME NULL DEFAULT '1971-01-01 00:00:00',
PRIMARY KEY (`ItemID`, `ItemType`),
INDEX `ItemDate` (`ItemDate`),
INDEX `ItemComments` (`ItemComments`),
INDEX `ItemType` (`ItemType`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
Explain result:
mysql> explain select * from ItemInfo where ItemType="item_type" order by ItemComments desc limit 1;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | i | NULL | index | ItemType | ItemComments | 5 | NULL | 83 | 1.20 | Using where |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
mysql> explain select * from ItemInfo where ItemType="item_type" order by ItemComments limit 1;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | i | NULL | index | ItemType | ItemComments | 5 | NULL | 83 | 1.20 | Using where |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
mysql> explain select * from ItemInfo order by ItemComments desc limit 1;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| 1 | SIMPLE | i | NULL | index | NULL | ItemComments | 5 | NULL | 1 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
Query from O. Jones:
mysql> explain
-> SELECT a.*
-> FROM ItemInfo a
-> JOIN (
-> SELECT MAX(ItemComments) ItemComments, ItemType
-> FROM ItemInfo
-> GROUP BY ItemType
-> ) maxcomm ON a.ItemType = maxcomm.ItemType
-> AND a.ItemComments = maxcomm.ItemComments
-> WHERE a.ItemType = 'item_type';
+----+-------------+------------+------------+-------+----------------------------------------+-------------+---------+---------------------------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+----------------------------------------+-------------+---------+---------------------------+---------+----------+--------------------------+
| 1 | PRIMARY | a | NULL | ref | ItemComments,ItemType | ItemType | 767 | const | 27378 | 100.00 | Using where |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 772 | mydb.a.ItemComments,const | 10 | 100.00 | Using where; Using index |
| 2 | DERIVED | ItemInfo | NULL | index | PRIMARY,ItemDate,ItemComments,ItemType | ItemType | 767 | NULL | 2289466 | 100.00 | NULL |
+----+-------------+------------+------------+-------+----------------------------------------+-------------+---------+---------------------------+---------+----------+--------------------------+
I'm not sure whether I ran this query correctly, but it didn't return any records even after quite a long time.
Query from Vijay. I added an ItemType join condition, because joining on max_comnt alone returns items from other ItemTypes:
SELECT ifo.* FROM ItemInfo ifo
JOIN (SELECT ItemType, MAX(ItemComments) AS max_comnt FROM ItemInfo WHERE ItemType="item_type") inn_ifo
ON ifo.ItemComments = inn_ifo.max_comnt and ifo.ItemType = inn_ifo.ItemType
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 7.441 sec. */
explain result:
+----+-------------+------------+------------+-------------+-----------------------+-----------------------+---------+-------+-------+----------+-----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------------+-----------------------+-----------------------+---------+-------+-------+----------+-----------------------------------------------------+
| 1 | PRIMARY | <derived2> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | ifo | NULL | index_merge | ItemComments,ItemType | ItemComments,ItemType | 5,767 | NULL | 88 | 100.00 | Using intersect(ItemComments,ItemType); Using where |
| 2 | DERIVED | ItemInfo | NULL | ref | ItemType | ItemType | 767 | const | 27378 | 100.00 | NULL |
+----+-------------+------------+------------+-------------+-----------------------+-----------------------+---------+-------+-------+----------+-----------------------------------------------------+
I also want to explain why I used ORDER BY with LIMIT in the first place: I was planning to fetch a record from the table randomly with a specific probability. The random index was generated in Python and sent to MySQL as a variable. But then I found it took so much time that I decided to just use the first record I got.
Inspired by O. Jones and Vijay, I tried using the MAX function, but it doesn't perform well:
select max(ItemComments) from ItemInfo where ItemType='item_type'
/* Affected rows: 0 Found rows: 1 Warnings: 0 Duration for 1 query: 6.225 sec. */
explain result:
+----+-------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
| 1 | SIMPLE | ItemInfo | NULL | ref | ItemType | ItemType | 767 | const | 27378 | 100.00 | NULL |
+----+-------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
Thanks to everyone who contributed to this question. I hope you can suggest more solutions based on the information above.
Please provide CURRENT SHOW CREATE TABLE ItemInfo.
For most of those queries, you need the composite index
INDEX(ItemType, ItemComments)
For the last one, you need
INDEX(ItemComments)
For that especially slow query, please provide EXPLAIN SELECT ....
Discussion - Why does INDEX(ItemType, ItemComments) help with where ItemType="item_type" order by ItemComments desc limit 1?
An index is structured in a BTree (see Wikipedia), thereby making searching for an individual item very fast, and making scanning in a particular order very fast.
where ItemType="item_type" says to filter on ItemType, but there are a lot of such entries in the index. In this index, they are ordered by ItemComments (for a given ItemType). The direction desc suggests starting with the highest value of ItemComments; that is the 'end' of the index entries for that ItemType. Finally limit 1 says to stop after one item is found. (Somewhat like finding the last "S" in your Rolodex.)
So the query is to 'drill down' the BTree to the end of the entries for ItemType in the composite INDEX(ItemType, ItemComments) and grab one entry -- a very efficient task.
Actually SELECT * implies that there is one more step, namely to get all the columns for that one row. That info is not in the index, but over in the BTree for ItemInfo -- which contains all the columns for all the rows, ordered by the PRIMARY KEY.
The "secondary index" (INDEX(ItemType, ItemComments)) implicitly contains a copy of the relevant PRIMARY KEY columns, so we now have the values of ItemID and ItemType. With those, we can drill down this other BTree to find the desired row and fetch all (*) the columns.
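The two-step lookup described above can be sketched in Python. This is a model under loud assumptions, not how InnoDB is implemented: a sorted list of tuples stands in for the BTree of INDEX(ItemType, ItemComments), a dict stands in for the PRIMARY KEY BTree, and the rows and the helper name top_commented are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical table contents, keyed by the PRIMARY KEY (ItemID, ItemType).
clustered = {
    ("b1", "book"): {"ItemName": "Atlas", "ItemComments": 12},
    ("b2", "book"): {"ItemName": "Novel", "ItemComments": 97},
    ("b3", "book"): {"ItemName": "Guide", "ItemComments": 40},
    ("p1", "phone"): {"ItemName": "Handset", "ItemComments": 88},
    ("t1", "toy"): {"ItemName": "Robot", "ItemComments": 5},
    ("t2", "toy"): {"ItemName": "Kite", "ItemComments": 63},
}

# Model of INDEX(ItemType, ItemComments): entries sorted by type, then by
# comment count, each carrying a copy of the primary-key column ItemID.
composite_index = sorted(
    (item_type, row["ItemComments"], item_id)
    for (item_id, item_type), row in clustered.items()
)

def top_commented(index, table, item_type):
    """Model of WHERE ItemType = ? ORDER BY ItemComments DESC LIMIT 1."""
    # Step 1: binary-search to just past the last entry for this type.
    # float("inf") sorts after every real comment count.
    pos = bisect_right(index, (item_type, float("inf")))
    if pos == 0 or index[pos - 1][0] != item_type:
        return None  # no entries for this type
    _, _, item_id = index[pos - 1]
    # Step 2: use the primary-key values to fetch the full row (SELECT *).
    return table[(item_id, item_type)]

best_book = top_commented(composite_index, clustered, "book")
best_toy = top_commented(composite_index, clustered, "toy")
missing = top_commented(composite_index, clustered, "game")
print(best_book, best_toy, missing)
```

The point of the sketch is that no sorting and no scanning of the other "book" entries happens: one binary search lands on the last entry for the type, and one keyed lookup fetches the remaining columns.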
Your first query ordering ascending can take advantage of your index on ItemComments.
SELECT * ... ORDER BY ... LIMIT 1 is a notorious performance antipattern. Why? The server must sort a whole mess of rows, just to discard all but the first.
You might try this (for your descending order variant). It's a little more verbose but much more efficient.
SELECT a.*
FROM ItemInfo a
JOIN (
SELECT MAX(ItemComments) ItemComments, ItemType
FROM ItemInfo
GROUP BY ItemType
) maxcomm ON a.ItemType = maxcomm.ItemType
AND a.ItemComments = maxcomm.ItemComments
WHERE a.ItemType = 'item_type'
Why does this work? It uses GROUP BY / MAX() to find the maximum value rather than ORDER BY ... DESC LIMIT 1. The subquery does your search.
To make this work as efficiently as possible you need a compound (multicolumn) index on (ItemType, ItemComments). Create that with
ALTER TABLE ItemInfo ADD INDEX ItemTypeCommentIndex (ItemType, ItemComments);
When you create the new index, drop your index on ItemType, because the new index makes that one redundant.
MySQL's query planner is smart enough to see the outer WHERE clause before it runs the inner GROUP BY query, so it doesn't have to aggregate the whole table.
With that compound index MySQL can use a loose index scan to satisfy the subquery. Those are almost miraculously fast. You should read up on the topic.
Your query will select all the rows that match the WHERE condition. After that it will sort the rows according to the ORDER BY clause, and then it will select the first row. A better query would be something like
SELECT ifo.* FROM ItemInfo ifo
JOIN (SELECT MAX(ItemComments) AS max_comnt FROM ItemInfo WHERE ItemType="item_type") inn_ifo
ON ifo.ItemComments = inn_ifo.max_comnt
This query only finds the maximum value of the column. Finding MAX() is only O(n), but the fastest sorting algorithm is O(n log n). So if you avoid the ORDER BY clause, the query will perform faster.
Hope this helped.

Why is mySQL query, left join 'considerably' faster than my inner join

I've researched this, but I still cannot explain why:
SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
INNER JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = 23155
Is significantly slower than:
SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
LEFT JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = 23155
115ms Vs 478ms. They are both using InnoDB and there are relationships defined. The 'card_legality' contains approx 200k rows, while the 'legality' table contains 11 rows. Here is the structure for each:
CREATE TABLE `card_legality` (
`card_id` varchar(8) NOT NULL DEFAULT '',
`legality_id` int(3) NOT NULL,
`cl_boolean` tinyint(1) NOT NULL,
PRIMARY KEY (`card_id`,`legality_id`),
KEY `legality_id` (`legality_id`),
CONSTRAINT `card_legality_ibfk_2` FOREIGN KEY (`legality_id`) REFERENCES `legality` (`legality_id`),
CONSTRAINT `card_legality_ibfk_1` FOREIGN KEY (`card_id`) REFERENCES `card` (`card_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
And:
CREATE TABLE `legality` (
`legality_id` int(3) NOT NULL AUTO_INCREMENT,
`l_name` varchar(16) NOT NULL DEFAULT '',
PRIMARY KEY (`legality_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;
I could simply use LEFT-JOIN, but it doesn't seem quite right... any thoughts, please?
UPDATE:
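As requested, I've included the results of EXPLAIN for each. I had run it previously, but I don't pretend to have a thorough understanding of it.
Left join:
+----+-------------+-------+--------+---------------+---------+---------+-------------------------------+--------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                           | rows   | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-------------------------------+--------+-------------+
|  1 | SIMPLE      | cl    | ALL    | PRIMARY       | NULL    | NULL    | NULL                          | 199747 | Using where |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY       | PRIMARY | 4       | hexproof.co.uk.cl.legality_id |      1 |             |
+----+-------------+-------+--------+---------------+---------+---------+-------------------------------+--------+-------------+
And inner join:
+----+-------------+-------+------+---------------------+-------------+---------+------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys       | key         | key_len | ref                          | rows  | Extra       |
+----+-------------+-------+------+---------------------+-------------+---------+------------------------------+-------+-------------+
|  1 | SIMPLE      | l     | ALL  | PRIMARY             | NULL        | NULL    | NULL                         |    11 |             |
|  1 | SIMPLE      | cl    | ref  | PRIMARY,legality_id | legality_id | 4       | hexproof.co.uk.l.legality_id | 33799 | Using where |
+----+-------------+-------+------+---------------------+-------------+---------+------------------------------+-------+-------------+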
It is because of the varchar on card_id. MySQL can't use the index on card_id when the column is compared with a number, as described in the MySQL documentation on type conversion. The important part is
For comparisons of a string column with a number, MySQL cannot use an
index on the column to look up the value quickly. If str_col is an
indexed string column, the index cannot be used when performing the
lookup in the following statement:
SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may
convert to the value 1, such as '1', ' 1', or '1a'.
If you change your queries to
SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
INNER JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = '23155'
and
SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
LEFT JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = '23155'
You should see a huge improvement in speed and also see a different EXPLAIN.
Here is a similar (but easier) test to show this:
> desc id_test;
+-------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id | varchar(8) | NO | PRI | NULL | |
+-------+------------+------+-----+---------+-------+
1 row in set (0.17 sec)
> select * from id_test;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+----+
9 rows in set (0.00 sec)
> explain select * from id_test where id = 1;
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | id_test | index | PRIMARY | PRIMARY | 10 | NULL | 9 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
> explain select * from id_test where id = '1';
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | id_test | const | PRIMARY | PRIMARY | 10 | const | 1 | Using index |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)
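In the first case type is index (all 9 entries are scanned) and Extra is Using where; Using index, while in the second type is const and Extra is just Using index. Also ref is NULL in the first and const in the second. Needless to say, the second one is better.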
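The documentation's remark that many different strings convert to the value 1 can be illustrated with a small Python sketch. It is only a rough model: to_number is a hypothetical helper that imitates a numeric-prefix cast and does not cover everything MySQL's real conversion handles (exponents, for example), and the index contents are made up.

```python
import re
from bisect import bisect_left

def to_number(text):
    """Rough model of a string-to-number cast: skip leading whitespace,
    read the longest numeric prefix, ignore the rest (no prefix gives 0)."""
    match = re.match(r"\s*([+-]?\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else 0.0

# Hypothetical contents of an index on a string column, in string sort order.
string_index = sorted(["1", "1.0", "10", "1a", "2", " 1", "01"])

# WHERE str_col = 1: every entry must be converted, then compared as a number.
numeric_positions = [i for i, s in enumerate(string_index) if to_number(s) == 1]
numeric_matches = [string_index[i] for i in numeric_positions]

# WHERE str_col = '1': one binary search finds the single exact entry.
pos = bisect_left(string_index, "1")
string_match = string_index[pos] if pos < len(string_index) and string_index[pos] == "1" else None

print(string_index)
print(numeric_positions, numeric_matches)
print(string_match)
```

In this toy index the numeric comparison matches five different strings, and they are not even adjacent in sort order ('10' sits between '1.0' and '1a'). So no single seek can find them all and every entry has to be checked, whereas the quoted '1' is found with one lookup.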
L2G has it pretty much summed up, although I suspect it could be because of the varchar type used for card_id.
I actually printed out this informative page for benchmarking / profiling quickies. Here is a quick poor man's profiling technique:
Time a SQL on MySQL
Enable Profiling
mysql> SET PROFILING = 1;
...
RUN your SQLs
...
mysql> SHOW PROFILES;
+----------+------------+-----------------------+
| Query_ID | Duration | Query |
+----------+------------+-----------------------+
| 1 | 0.00014600 | SELECT DATABASE() |
| 2 | 0.00024250 | select user from user |
+----------+------------+-----------------------+
mysql> SHOW PROFILE for QUERY 2;
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000034 |
| checking query cache for query | 0.000033 |
| checking permissions | 0.000006 |
| Opening tables | 0.000011 |
| init | 0.000013 |
| optimizing | 0.000004 |
| executing | 0.000011 |
| end | 0.000004 |
| query end | 0.000002 |
| freeing items | 0.000026 |
| logging slow query | 0.000002 |
| cleaning up | 0.000003 |
+--------------------------------+----------+
Good luck, oh and please post your findings!
I'd try EXPLAIN on both of those queries. Just prefix each SELECT with EXPLAIN and run them. It gives really useful info on how MySQL is optimizing and executing queries.
I'm pretty sure that MySQL has better optimization for left joins - no evidence to back this up at the moment.
ETA: A quick scout round and I can't find anything concrete to uphold my view, so...