I have a software stack which I run myself, but also deploy to customer premises.
There is a particular query which runs very well in my environment, but runs terribly in the customer's environment.
I have confirmed using EXPLAIN that my environment's query planner sees that there is a great index available (and uses it), whereas the same query in the customer's environment does not offer that index under possible_keys.
Here's the full query, somewhat anonymized:
SELECT t0.*,
t1.*,
t2.*,
t3.value
FROM table0 t0
LEFT OUTER JOIN table1 t1
ON t0.id = t1.table0_id
LEFT OUTER JOIN table2 t2
ON t1.id = t2.table1_id
AND t2.deleted = 0
LEFT OUTER JOIN table3 t3
ON t0.id = t3.table0_id
AND t3.type = 'whatever'
WHERE t0.business IN ('workcorp')
AND '2016-11-01 00:00:00' <= t0.date
AND t0.date < '2016-12-01 00:00:00'
ORDER BY t0.date DESC
The stage where our environments differ is on JOINing to table3. So theoretically you can ignore a large amount of the query and think of it like this:
SELECT t0.*,
t3.value
FROM table0 t0
LEFT OUTER JOIN table3 t3
ON t0.id = t3.table0_id
AND t3.type = 'whatever'
Both of our environments' query plans agree on how to JOIN to t1 and to t2. But they differ in their plan for how to JOIN to t3.
My environment correctly identifies two possible indexes for JOINing to t3, and correctly identifies that table0_id is the best choice for this query:
+----+-------------+-------+------+--------------------------+-----------+---------+------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+--------------------------+-----------+---------+------+-------+----------+-------------+
| 1 | SIMPLE | t3 | ref | table0_id,type_and_value | table0_id | 108 | func | 2 | 100.00 | Using where |
+----+-------------+-------+------+--------------------------+-----------+---------+------+-------+----------+-------------+
The customer's environment does not consider the index table0_id to be an option, and falls back to type_and_value (which is a really bad choice):
+----+-------------+-------+------+----------------+----------------+---------+-------+----------------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+----------------+----------------+---------+-------+----------------+----------+-------------+
| 1 | SIMPLE | t3 | ref | type_and_value | type_and_value | 257 | const | (far too many) | 100.00 | Using where |
+----+-------------+-------+------+----------------+----------------+---------+-------+----------------+----------+-------------+
What happens if we FORCE INDEX?
EXPLAIN EXTENDED SELECT t0.*,
t1.*,
t2.*,
t3.value
FROM table0 t0
LEFT OUTER JOIN table1 t1
ON t0.id = t1.table0_id
LEFT OUTER JOIN table2 t2
ON t1.id = t2.table1_id
AND t2.deleted = 0
LEFT OUTER JOIN table3 t3 FORCE INDEX (table0_id)
ON t0.id = t3.table0_id
AND t3.type = 'whatever'
WHERE t0.business IN ('workcorp')
AND '2016-11-01 00:00:00' <= t0.date
AND t0.date < '2016-12-01 00:00:00'
ORDER BY t0.date DESC
On my environment, I got:
+----+-------------+-------+------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+-----------+---------+------+------+-------------+
| 1 | SIMPLE | t3 | ref | table0_id | table0_id | 108 | func | 2 | Using where |
+----+-------------+-------+------+---------------+-----------+---------+------+------+-------------+
Compared to my original query plan (which proposed two possible_keys), this narrowed the choice down to just one.
But the customer got a different result:
+----+-------------+-------+------+---------------+------+---------+-------+---------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------+---------+-------+---------+----------+----------------------------------------------------+
| 1 | SIMPLE | t3 | ALL | NULL | NULL | NULL | NULL | (loads) | 100.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------+---------------+------+---------+-------+---------+----------+----------------------------------------------------+
Adding the FORCE INDEX narrows down the customer's possible_keys from one bad choice, to no choices.
So why is it that the customer's environment does not have the same indexes available in possible_keys? Naturally I was given to suspect "maybe they don't have that index". So we did a SHOW INDEXES FROM table3. Here's my environment (for comparison):
+--------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table3 | 0 | PRIMARY | 1 | id | A | 16696 | NULL | NULL | | BTREE | | |
| table3 | 1 | table0_id | 1 | table0_id | A | 16696 | NULL | NULL | | BTREE | | |
| table3 | 1 | type_and_value | 1 | type | A | 14 | NULL | NULL | | BTREE | | |
| table3 | 1 | type_and_value | 2 | value | A | 8348 | NULL | NULL | | BTREE | | |
+--------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Their environment had the same index, table0_id, available:
+--------+------------+-----------------+--------------+-----------------+-----------+-------------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+-----------------+--------------+-----------------+-----------+-------------------+----------+--------+------+------------+---------+---------------+
| table3 | 1 | table0_id | 1 | table0_id | A | (same as PRIMARY) | NULL | NULL | | BTREE | | |
+--------+------------+-----------------+--------------+-----------------+-----------+-------------------+----------+--------+------+------------+---------+---------------+
I was also careful to ask "is this a slave? is the master the same?": they assured me that all instances had this index, as required.
So I thought "maybe the index is broken in some way?" And proposed that they do the very simplest query relying upon that index:
EXPLAIN EXTENDED SELECT *
FROM table3
WHERE table0_id = 'whatever'
In this case: their environment behaved the same as mine (and correctly), proposing the use of the index table0_id:
+----+-------------+--------+------+---------------+-----------+---------+-------+------+----------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------+---------------+-----------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | table3 | ref | table0_id | table0_id | 108 | const | 1 | 100.00 | Using index condition |
+----+-------------+--------+------+---------------+-----------+---------+-------+------+----------+-----------------------+
So they definitely have that index. And their query planner can recognise that it is eligible for use (for some queries at least).
What's going on here? Why is table0_id ineligible for use on certain queries, but only on the customer's environment? Could it be that the index is broken in some way? Or that the query planner is making a mistake?
Are there any other tests I can do to figure out why it's not using the index for this query?
Turns out it was the charsets (and/or the collations).
I used this query to reveal the charset in each environment:
SELECT table_name, column_name, character_set_name FROM information_schema.`COLUMNS`
WHERE table_schema = "my_cool_database"
AND table_name IN ("table0", "table3")
ORDER BY 1 DESC, 2
And for bonus points I checked the character collation in each environment:
SHOW FULL COLUMNS FROM table0;
SHOW FULL COLUMNS FROM table3;
In my environment: all columns in both tables had utf8 charset and utf8_unicode_ci collation.
In the customer's environment: t0 matched my environment exactly, yet t3 was a unique snowflake; its columns had latin1 charset and latin1_swedish_ci collation.
So, what we were seeing is that the index on t3.table0_id (a latin1 column) could not be used when joining it to a utf8 column: MySQL has to convert one side of the comparison to the other side's charset, and that conversion stops it from using the index. Hence the index worked fine for:
SELECT *
FROM table3
WHERE table0_id = 'whatever'
Yet the index could not be used for:
SELECT t0.id,
t3.value
FROM table0 t0
LEFT OUTER JOIN table3 t3
ON t0.id = t3.table0_id
Similar symptoms are described on the Percona blog, John Watson's blog and Baron Schwartz's blog.
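The fix, once the mismatch is visible, is to bring the stray table into line with the rest of the schema. A minimal sketch, assuming table3 can simply be converted in place (this rewrites the table and its indexes, so test it on a copy first):
ALTER TABLE table3
    CONVERT TO CHARACTER SET utf8
    COLLATE utf8_unicode_ci;
After converting, the JOIN compares utf8 to utf8, so table0_id should show up in possible_keys again.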
Related
I was trying to improve performance on some queries through indexes using EXPLAIN, and I noticed that each time I ran SHOW index FROM TableB; the output of the rows column in the EXPLAIN of a query changed.
Ex:
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code
Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 10561 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
mysql> show index from TableB;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TableB | 0 | PRIMARY | 1 | id | A | 7 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 2 | address | A | 21 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 3 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 1 | address | A | 1 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 2 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 3 | id | A | 10402 | NULL | NULL | | BTREE | |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.03 sec)
and...
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 9800 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
Why does this happen?
The rows column should be taken as a rough estimate only. It's not a precise number.
It's based on statistical estimates of how many rows will be examined during a query. The actual number of rows cannot be known until you actually execute the query.
The statistics are based on samples read from the table periodically. These samples are re-read occasionally, for example after you run ANALYZE TABLE or certain INFORMATION_SCHEMA queries, or certain SHOW statements.
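If you want to refresh the statistics on demand rather than waiting for a background re-sample, you can run ANALYZE TABLE yourself and then re-run the EXPLAIN to watch the estimate move (using TableB from the question):
ANALYZE TABLE TableB;
-- re-run the EXPLAIN from the question and compare the rows estimate
EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code
Group by A.id
Having count(distinct B.id) = 1;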
I don't find a 20% variation in statistics to be a big deal. In many situations, think of the cost graph as being like an upturned parabola, where you only need to know which side of the minimum point you are on. In complex queries, where the Optimizer is likely to goof, it needs a lot more than simple stats, such as the Histograms of MariaDB 10.0 / 10.1. (I don't have enough experience with those to say whether they make much headway.)
Your particular query is probably going to be performed in only one way, regardless of the statistics. An example of a complicated query would be a JOIN with WHERE clauses filtering each table; the optimizer has to decide which table to start with. Another case is a single table with a WHERE and an ORDER BY that cannot both be handled by a single index -- should it use an index to filter, but then have to sort? Or should it use an index for the ORDER BY, but then have to filter on the fly?
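To make that last dilemma concrete, here is a hedged illustration on a hypothetical table t with separate single-column indexes on status and created_at; neither index can serve both the filter and the sort at once, so the optimizer has to pick which half to do the hard way:
-- hypothetical table and indexes, for illustration only
SELECT *
FROM t
WHERE status = 'open'      -- an index on status filters cheaply, but then needs a filesort
ORDER BY created_at DESC   -- an index on created_at avoids the sort, but then filters row by row
LIMIT 50;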
I have the following query
SELECT a.id, b.id from table1 AS a, table2 AS b
WHERE a.table2_id IS NULL
AND a.plane = SUBSTRING(b.imb, 1, 20)
AND (a.stat LIKE "f%" OR a.stat LIKE "F%")
Here is the output of EXPLAIN
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------------------------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------------------------------+---------+------+----------+-------------+
| 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 28578039 | |
| 1 | SIMPLE | a | ref | index_on_plane,index_on_table2_id_id,mysql_confirmstat_on_stat | index_on_plane | 258 | func| 2 | Using where |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------------------------------+---------+------+----------+-------------+
The query takes around 80 minutes to execute.
The indexes on table1 are as follows
+--------------+------------+--------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table1 | 0 | PRIMARY | 1 | id | A | 50319117 | NULL | NULL | | BTREE | | |
| table1 | 1 | index_on_post | 1 | post | A | 7188445 | NULL | NULL | YES | BTREE | | |
| table1 | 1 | index_on_plane | 1 | plane | A | 25159558 | NULL | NULL | YES | BTREE | | |
| table1 | 1 | index_on_table2_id | 1 | table2_id | A | 25159558 | NULL | NULL | YES | BTREE | | |
| table1 | 1 | index_on_stat | 1 | stat | A | 187 | NULL | NULL | YES | BTREE | | |
+--------------+------------+--------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
and table2 indexes are.
+-------+------------+---------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+---------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table2 | 0 | PRIMARY | 1 | id | A | 28578039 | NULL | NULL | | BTREE | | |
| table2 | 1 | index_on_post | 1 | post | A | 28578039 | NULL | NULL | YES | BTREE | | |
| table2 | 1 | index_on_ver | 1 | ver | A | 1371 | NULL | NULL | YES | BTREE | | |
| table2 | 1 | index_on_imb | 1 | imb | A | 28578039 | NULL | NULL | YES | BTREE | | |
+-------+------------+---------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
How can the execution time of this query be improved?
Here is the updated EXPLAIN:
EXPLAIN SELECT STRAIGHT_JOIN a.id, b.id from table1 AS a JOIN table2 AS b
ON a.plane=substring(b.imb,1,20)
WHERE a.table2_id IS NULL
AND (a.stat LIKE "f%" OR a.stat LIKE "F%");
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+-------------------------------+---------+-------+----------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+-------------------------------+---------+-------+----------+--------------------------------+
| 1 | SIMPLE | a | ref | index_on_plane,index_on_table2_id,index_on_stat | index_on_table2_id | 5 | const | 500543 | Using where |
| 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 28578039 | Using where; Using join buffer |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+-------------------------------+---------+-------+----------+--------------------------------+
SQL fiddle link http://www.sqlfiddle.com/#!2/362a6/4
Your schema dooms your query to slowness in at least three ways. You are going to need to modify your schema to get anything like decent performance out of this. I can see three ways to fix it.
First way (probably very easy to fix):
a.stat LIKE "f%" OR a.stat LIKE "F%"
This OR operation likely doubles the runtime of your query. But if you set the collation of your stat column to something case-insensitive you can change this to
a.stat LIKE "f%"
You already have an index on this column.
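As a rough sketch of that change (the column's type and charset here are assumptions, adjust to the real definition), switching stat to a case-insensitive collation might look like:
-- assumes stat is VARCHAR(32); any *_ci collation makes LIKE 'f%' match both cases
ALTER TABLE table1
    MODIFY stat VARCHAR(32)
    CHARACTER SET utf8
    COLLATE utf8_general_ci;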
Second way (maybe not so hard to fix). This clause tends to defeat efficient use of an index; indexes are far less useful when NULL values are involved.
WHERE a.table2_id IS NULL
Can you change the definition of table2_id to NOT NULL and provide a default value (perhaps zero) to indicate missing data? If so, you'll be in good shape because you'll be able to change this to a search predicate that uses an index.
WHERE a.table2_id = 0
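If you can make that change, the migration would be roughly as follows (the INT type is an assumption; back-fill the existing NULLs first, and check that no foreign key constraint objects to the placeholder value):
-- assumes table2_id is an INT and 0 is a safe "missing" placeholder
UPDATE table1 SET table2_id = 0 WHERE table2_id IS NULL;
ALTER TABLE table1
    MODIFY table2_id INT NOT NULL DEFAULT 0;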
Third way (probably hard). The presence of the function in this clause defeats the use of an index in joining.
WHERE ... a.plane = SUBSTRING(b.imb, 1, 20)
You need to make an extra column (yeah, yeah, in Oracle it could be a function index, but who has that kind of money?) called b.plane or something with that substring stored in it.
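A minimal sketch of that extra column, assuming 20 characters is the right width and the column is called plane as suggested (you would also need to keep it in sync on future writes, via application code or a trigger):
ALTER TABLE table2 ADD COLUMN plane VARCHAR(20);
UPDATE table2 SET plane = SUBSTRING(imb, 1, 20);
-- index name is mine; index the new column so the join can use it
ALTER TABLE table2 ADD INDEX idx_table2_plane (plane);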
If you do all this stuff and refactor your query just a bit, here's what it will look like:
SELECT a.id AS aid,
b.id AS bid
FROM table1 AS a
JOIN table2 AS b ON a.plane = b.plane /* the new column */
WHERE a.stat LIKE 'f%'
AND a.table2_id = 0
Finally, you can probably tweak this performance up a bit by creating the following compound indexes as covering indexes for the query. Look up covering indexes if you're not sure what that means.
table1 (table2_id, stat, plane, id)
table2 (plane, id) /* plane is your new column */
There's a tradeoff in covering indexes: they slow down insertion and update operations, but speed up queries. Only you have enough information to make that tradeoff wisely.
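Spelled out as DDL (the index names are mine, and long VARCHAR columns may force you to shorten or prefix some parts to stay within key-length limits), those covering indexes would be roughly:
ALTER TABLE table1 ADD INDEX idx_t1_cover (table2_id, stat, plane, id);
ALTER TABLE table2 ADD INDEX idx_t2_cover (plane, id);  -- plane is the new column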
The column on which the join operation is performed must be indexed, and the MySQL optimiser should use that index for better performance. It minimises the number of rows examined (the join size).
Try this one
SELECT STRAIGHT_JOIN a.id, b.id from table1 AS a JOIN table2 AS b ON a.plane=substring(b.imb,1,20)
WHERE a.table2_id IS NULL and (a.stat LIKE "f%" OR a.stat LIKE "F%")
Check the execution plan first. If it is still not using the index_on_imb index, create a composite index combining table2.imb and table2.id, with table2.imb first in the column order.
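If you do add it, the composite index would be along these lines (the index name is mine; use a prefix such as imb(20) if imb is too long to index in full):
ALTER TABLE table2 ADD INDEX idx_imb_id (imb, id);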
A derived table may improve performance in this case, depending on the indexes index_on_table2_id and index_on_stat.
SELECT a.id, b.id from table1 AS a, table2 AS b
WHERE a.table2_id IS NULL
AND a.plane = SUBSTRING(b.imb, 1, 20)
AND (a.stat LIKE "f%" OR a.stat LIKE "F%")
This may be rewritten as follows. The derived table forces MySQL to check only the 500543 rows that the last EXPLAIN reported, before joining to table2:
SELECT a.id, b.id
FROM (SELECT id, plane FROM table1 WHERE table2_id IS NULL AND (stat LIKE "f%" OR stat LIKE "F%")) a
INNER JOIN table2 b
ON a.plane = SUBSTRING(b.imb, 1, 20)
Aside from my comment about ID columns, it appears you are trying to back-fill a join on the "plane" instead of by the ID columns. If I am correct, you want all records from table2 where there is no match in table1:
select
a.id,
b.id
from
table2 b
left join table1 a
on b.id = a.table2_id
AND substr( b.imb, 1, 20 ) = a.plane
AND ( a.stat LIKE "f%"
OR a.stat LIKE "F%")
where
a.table2_id is null
Also, to help the index join, I would have covering indexes so the engine does not have to go back to the raw data to get qualifying records.
table1 -- index ( plane, stat, table2_id, id )
table2 -- index ( imb, id )
But again, please clarify the basis of the table join: is it, or is it not, based on a key? Since the sample columns of table1 include a table2_id column, I am GUESSING this relates to table2.id.
The purpose of doing a left join is basically: for each record in the left-side table (in my example table2), join to the right-side table (table1) on whatever criteria/conditions -- here using the KEY ID column as the primary basis, then the plane and stat conditions.
So, even though I'm joining the two tables on table2_id, if it DOES find a match the row will be excluded... only when it does NOT find a match will it be included.
Finally, since you are hiding the true structure of the tables, you are leaving those helping you to guesswork. Even if it is "personal" data, you are not showing any data, only asking how to get it. Giving a better mental image of what you are trying to retrieve is better than bogus table/column names with limited context.
How can I avoid a full table scan when performing inner joins in MySQL using IN in the WHERE clause? For example:
explain SELECT
-> COUNT(DISTINCT(n.nid))
-> FROM node n
-> INNER JOIN term_node tn ON n.nid = tn.nid
-> INNER JOIN content_type_article ca ON n.nid = ca.nid
-> WHERE tn.tid IN (67,100)
-> ;
+----+-------------+-------+--------+----------------------------------+---------+---------+----------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------+---------+---------+----------------------+-------+--------------------------+
| 1 | SIMPLE | tn | ALL | PRIMARY,nid | NULL | NULL | NULL | 42180 | Using where |
| 1 | SIMPLE | ca | ref | nid,field_article_date_nid_index | nid | 4 | drupal_mm_qas.tn.nid | 1 | Using index |
| 1 | SIMPLE | n | eq_ref | PRIMARY | PRIMARY | 4 | drupal_mm_qas.ca.nid | 1 | Using where; Using index |
+----+-------------+-------+--------+----------------------------------+---------+---------+----------------------+-------+--------------------------+
3 rows in set (0.00 sec)
It seems you're filtering by a column that MySQL has identified as not selective enough. When a filter's cardinality is too low (i.e. the number of distinct values for that column is low), MySQL decides, usually correctly, that a full table scan would be faster than using the index.
To confirm, please show the result of SELECT COUNT(DISTINCT tn.tid) FROM term_node tn and SELECT COUNT(*) FROM term_node tn
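If you want a single number to reason about, the ratio of those two counts is the selectivity the optimizer is effectively judging; something like this gives it directly:
SELECT COUNT(DISTINCT tid) / COUNT(*) AS selectivity
FROM term_node;
The closer that value is to zero, the more likely MySQL is to prefer a full scan over the index.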
I am using MySQL to get a category name from one table based on the Category ID in a "Module" table.
I have the below SQL working fine for my needs, but I want to know whether this is considered a JOIN or not, since it never uses the JOIN keyword:
SELECT `mo_category_fk` , `mo_name_vc` , `mc_name_vc`
FROM x_modcats mc, x_modules m
WHERE mc.mc_id_pk = m.mo_category_fk
AND m.mo_folder_vc = :module
Yes - in MySQL, implicit and explicit joins have identical execution plans. You can verify this with EXPLAIN; here is a sample from another thread:
mysql> explain select * from table1 a inner join table2 b on a.pid = b.pid;
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| 1 | SIMPLE | b | ALL | PRIMARY | NULL | NULL | NULL | 986 | |
| 1 | SIMPLE | a | ref | pid | pid | 4 | schema.b.pid | 70 | |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
2 rows in set (0.02 sec)
mysql> explain select * from table1 a, table2 b where a.pid = b.pid;
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
| 1 | SIMPLE | b | ALL | PRIMARY | NULL | NULL | NULL | 986 | |
| 1 | SIMPLE | a | ref | pid | pid | 4 | schema.b.pid | 70 | |
+----+-------------+-------+------+---------------+------+---------+--------------+------+-------+
2 rows in set (0.00 sec)
Yes, you are joining. Per the documentation, the comma (,) can be used as a substitute for the JOIN keyword, except that you can't use the very helpful ON clause with it. However, you do have a condition that connects the tables in the WHERE clause. In my opinion, it makes more sense to express it as part of the FROM clause:
SELECT mo_category_fk, mo_name_vc, mc_name_vc
FROM x_modcats mc
JOIN x_modules m ON (mc.mc_id_pk = m.mo_category_fk)
WHERE m.mo_folder_vc = :module
The object of my query is to get all rows from table a where gender = f and username does not exist in table b where campid = xxxx. Here is the query I am using with success:
SELECT `id`
FROM pool
LEFT JOIN sent
ON pool.username = sent.username
AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
AND pool.gender = 'f'
The problem is that the query takes over 9 minutes to complete, the pool table contains over 10 million rows and the sent table is eventually going to grow even larger than that. I have created indexes for many of the columns including username and gender. However, MySQL refuses to use any of my indexes for this query. I even tried using FORCE INDEX. Here are my indexes from pool and the output of EXPLAIN for my query:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| pool | 0 | PRIMARY | 1 | id | A | 9326880 | NULL | NULL | | BTREE | |
| pool | 1 | username | 1 | username | A | 9326880 | NULL | NULL | | BTREE | |
| pool | 1 | source | 1 | source | A | 6 | NULL | NULL | | BTREE | |
| pool | 1 | gender | 1 | gender | A | 9 | NULL | NULL | | BTREE | |
| pool | 1 | location | 1 | location | A | 59030 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)
mysql> explain SELECT `id` FROM pool FORCE INDEX (username) LEFT JOIN sent ON pool.username = sent.username AND sent.campid = 'YA1LGfh9' WHERE sent.username IS NULL AND pool.gender = 'f';
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------------------+
| 1 | SIMPLE | pool | ALL | NULL | NULL | NULL | NULL | 9326881 | Using where |
| 1 | SIMPLE | sent | ALL | NULL | NULL | NULL | NULL | 351 | Using where; Not exists |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------------------+
2 rows in set (0.00 sec)
also, here are my indexes for the sent table:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| sent | 0 | PRIMARY | 1 | primary_key | A | 351 | NULL | NULL | | BTREE | |
| sent | 1 | username | 1 | username | A | 351 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
2 rows in set (0.00 sec)
You can see that no indexes are being used, and so my query takes far too long. If anyone has a solution that involves reworking the query, please show me an example of how to do it using my data structure so that I won't have any confusion about how to implement and test it. Thank you.
First, your original query was correct in its placement of everything, including the campid. Using a LEFT JOIN from pool to sent and then pulling a required equality such as the campid up into the WHERE clause, as previously suggested, ultimately converts it into an INNER JOIN, requiring a matching row on both sides. Leave it as you had it.
You already have an index on username on the sent table, but I would do the following.
Build an index on the sent table on (campid, username) as a composite (i.e. multi-column) index. This way the LEFT JOIN will be optimized for BOTH conditions.
On your pool table, try a composite index on the 3 fields (gender, username, id).
By doing this, you can take advantage of NOT having to go through all the actual pages of data that make up your 10+ million records. Since the index HAS the columns being compared, the engine doesn't have to find the actual record and look at its columns; it can use the index entries directly.
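Sketched out as DDL (the index names here are mine), those two composite indexes would be roughly:
ALTER TABLE sent ADD INDEX idx_campid_username (campid, username);
ALTER TABLE pool ADD INDEX idx_gender_username_id (gender, username, id);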
Also, for grins, I added the keyword STRAIGHT_JOIN, which tells MySQL to join in exactly the order I wrote and not to try to think for me. MANY times I've found this to significantly improve query performance; only rarely have I been given feedback that it has NOT helped.
SELECT STRAIGHT_JOIN
p.id
FROM
pool p
LEFT JOIN sent s
ON s.campid = 'YA1LGfh9'
AND p.username = s.username
WHERE
p.gender = 'f'
AND s.username IS NULL
All that said, think about how many records you are still going to return out of the 10+ million: if pool has 10+ million rows and the single camp only has 5,000, you will still be returning almost the entire set.