using an index when SELECTing from a MySQL join

I have the following two MySQL/MariaDB tables:
CREATE TABLE requests (
request_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
unix_timestamp DOUBLE NOT NULL,
[...]
INDEX unix_timestamp_index (unix_timestamp)
);
CREATE TABLE served_objects (
request_id BIGINT UNSIGNED NOT NULL,
object_name VARCHAR(255) NOT NULL,
[...]
FOREIGN KEY (request_id) REFERENCES requests (request_id)
);
There are several million rows in each table. There are zero or more served_objects per request. I have a view that provides a complete served_objects view by joining these two tables:
CREATE VIEW served_objects_view AS
SELECT
r.request_id AS request_id,
unix_timestamp,
object_name
FROM requests r
RIGHT JOIN served_objects so ON r.request_id=so.request_id;
This all seems pretty straightforward so far. But when I do a simple SELECT like this:
SELECT * FROM served_objects_view ORDER BY unix_timestamp LIMIT 5;
It takes a full minute or more. It's obviously not using the index. I've tried many different approaches, including flipping things around and using a LEFT or INNER join instead, but to no avail.
This is the output of the EXPLAIN for this SELECT:
+------+-------------+-------+--------+---------------+---------+---------+------------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+---------------+---------+---------+------------------+---------+---------------------------------+
| 1 | SIMPLE | so | ALL | NULL | NULL | NULL | NULL | 5196526 | Using temporary; Using filesort |
| 1 | SIMPLE | r | eq_ref | PRIMARY | PRIMARY | 8 | db.so.request_id | 1 | |
+------+-------------+-------+--------+---------------+---------+---------+------------------+---------+---------------------------------+
Is there something fundamental here that prevents the index from being used? I understand that it needs to use a temporary table to satisfy the view and that this interferes with the ability to use the index. But I'm hoping that some trick exists that will allow me to SELECT from the view while honouring the indexes in the requests table.

You're using a notorious performance antipattern.
SELECT * FROM served_objects_view ORDER BY unix_timestamp LIMIT 5;
You've told the query planner to make a copy of your whole view (in RAM or temp storage), sort it, and toss out all but five rows. So, it obeyed. It really didn't care how long it took.
SELECT * is generally considered harmful to query performance, and this is the kind of case why that's true.
Try this deferred-join optimization
SELECT a.*
FROM served_objects_view a
JOIN (
SELECT request_id
FROM served_objects_view
ORDER BY unix_timestamp
LIMIT 5
) b ON a.request_id = b.request_id
This sorts a smaller subset of data (just the request_id and timestamp values). It then fetches a small subset of the view's rows.
If it's still too slow for your purposes, try creating a compound index on requests (unix_timestamp, request_id). But that's probably unnecessary. If it is necessary, concentrate on optimizing the subquery.
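A minimal sketch of that compound index, assuming the requests table from the question; the index name is made up for illustration. On InnoDB a secondary index already carries the primary key, so the existing unix_timestamp_index may already cover the subquery. EXPLAIN shows which index is actually chosen.

```sql
-- Hypothetical index name; the columns come from the question's schema.
CREATE INDEX unix_timestamp_request_id_index
    ON requests (unix_timestamp, request_id);

-- Check how the subquery of the deferred join is executed.
EXPLAIN
SELECT request_id
FROM served_objects_view
ORDER BY unix_timestamp
LIMIT 5;
```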
Remark: RIGHT JOIN? Really? Don't you want just JOIN?

VIEWs are not always well-optimized. Does the query still run slowly when you use the underlying SELECT directly instead of the view? Have you added the suggested index?
What version of MySQL/MariaDB are you using? There may have been optimization improvements in newer versions, and an upgrade might help.
My point is, you may have to abandon VIEW.

The answer provided by O. Jones was the right approach; thanks! The big saviour here is that if the inner SELECT refers only to columns from the requests table (such as the case when SELECTing only request_id), the optimizer can satisfy the view without performing a join, making it lickety-split.
I had to make two adjustments, though, to make it produce the same results as the original SELECT. First, if non-unique request_ids are returned by the inner SELECT, the outer JOIN creates a cross-product of these non-unique entries. These duplicate rows can be effectively discarded by changing the outer SELECT into a SELECT DISTINCT.
Second, if the ORDER BY column can contain non-unique values, the result can contain irrelevant rows. These can be effectively discarded by also SELECTing orderByCol and adding AND a.orderByCol = b.orderByCol to the JOIN rule.
So my final solution, which works well if orderByCol comes from the requests table, is the following:
SELECT DISTINCT a.*
FROM served_objects_view a
JOIN (
SELECT request_id, <orderByCol> FROM served_objects_view
<whereClause>
ORDER BY <orderByCol> LIMIT <startRow>,<nRows>
) b ON a.request_id = b.request_id AND a.<orderByCol> = b.<orderByCol>
ORDER BY <orderByCol>;
This is a more convoluted solution than I was hoping for, but it works, so I'm happy.
One final comment. An INNER JOIN and a RIGHT JOIN are effectively the same thing here, so I originally formulated it in terms of a RIGHT JOIN because that's the way I was conceptualizing it. However, after some experimentation (after your challenge) I discovered that an INNER join is much more efficient. (It's what allows the optimizer to satisfy the view without performing a join if the inner SELECT refers only to columns from the requests table.) Thanks again!
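For reference, a sketch of the view reformulated with an INNER JOIN, as described above. It assumes the column list is unchanged from the original definition and qualifies each column with its table alias.

```sql
-- Same columns as the original view, but with INNER JOIN instead of RIGHT JOIN.
CREATE OR REPLACE VIEW served_objects_view AS
SELECT
    r.request_id     AS request_id,
    r.unix_timestamp AS unix_timestamp,
    so.object_name   AS object_name
FROM requests r
INNER JOIN served_objects so ON r.request_id = so.request_id;
```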

Related

Optimize query with 1 join, on tables with 10+ millions rows

I am looking at making a request using 2 tables faster.
I have the following 2 tables :
Table "logs"
id varchar(36) PK
date timestamp(2)
more varchar fields, and one text field
That table has what the PHP Laravel Framework calls a "polymorphic many to many" relationship with several other objects, so there is a second table "logs_pivot" :
id unsigned int PK
log_id varchar(36) FOREIGN KEY (logs.id)
model_id varchar(40)
model_type varchar(50)
There is one or several entries in logs_pivot per entry in logs. They have 20+ and 10+ millions of rows, respectively.
We do queries like so :
select * from logs
join logs_pivot on logs.id = logs_pivot.log_id
where model_id = 'some_id' and model_type = 'My\Class'
order by date desc
limit 50;
Obviously we have a compound index on both the model_id and model_type fields, but the requests are still slow: several (sometimes dozens of) seconds every time.
We also have an index on the date field, but an EXPLAIN shows that the model_id_model_type index is the one that is used.
Explain statement:
+----+-------------+-------------+------------+--------+--------------------------------------------------------------------------------+-----------------------------------------------+---------+-------------------------------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+--------+--------------------------------------------------------------------------------+-----------------------------------------------+---------+-------------------------------------------+------+----------+---------------------------------+
| 1 | SIMPLE | logs_pivot | NULL | ref | logs_pivot_model_id_model_type_index,logs_pivot_log_id_index | logs_pivot_model_id_model_type_index | 364 | const,const | 1 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | logs | NULL | eq_ref | PRIMARY | PRIMARY | 146 | the_db_name.logs_pivot.log_id | 1 | 100.00 | NULL |
+----+-------------+-------------+------------+--------+--------------------------------------------------------------------------------+-----------------------------------------------+---------+-------------------------------------------+------+----------+---------------------------------+
In other tables, I was able to make a similar request much faster by including the date field in the index. But in that case they are in a separate table.
When we want to access these data, they are typically a few hours/days old.
Our InnoDB pools are much too small to hold all that data (+ all the other tables) in memory, so the data is most probably always queried on disk.
What would be all the ways we could make that request faster ?
Ideally only with another index, or by changing how it is done.
Thanks a lot !
Edit 17h05 :
Thank you all for your answers so far. I will try something like what O. Jones suggests, and also try to somehow include the date field in the pivot table, so that I can include it in the index.
Edit 14/10 10h.
Solution :
So I ended up changing how the request was really done, by sorting on the id field of the pivot table, which does allow the sort column to be part of an index.
Also, the request that counts the total number of rows now runs on the pivot table only, when it is not filtered by date.
Thank you all !
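A rough sketch of what that solution could look like. It assumes that pivot ids grow in roughly the same order as the log dates, and the index name is hypothetical. On InnoDB the primary key is already appended to every secondary index, so the existing (model_id, model_type) index may be enough on its own.

```sql
-- Hypothetical index name; filter columns first, sort column last.
CREATE INDEX logs_pivot_model_lookup
    ON logs_pivot (model_id, model_type, id);

-- Filter and sort on logs_pivot only, so the LIMIT can stop after 50 index entries.
-- The backslash is doubled because it is the escape character in MySQL string
-- literals by default.
SELECT logs.*
FROM logs_pivot
JOIN logs ON logs.id = logs_pivot.log_id
WHERE logs_pivot.model_id = 'some_id'
  AND logs_pivot.model_type = 'My\\Class'
ORDER BY logs_pivot.id DESC
LIMIT 50;
```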

Just a suggestion. Using a compound index is obviously a good thing. Another option might be to pre-qualify an ID by date, and extend the index on your logs_pivot table to (model_id, model_type, log_id).
If you're querying data and the entire history is 20+ million records, how far back does the data need to go when you only want 50 records for a given model id/type? Say 3 months, versus a log covering 5 years (not listed in the post, just a for-instance)? If you can query the minimum log ID where the date is later than, say, 3 months back, that one ID can limit what else is read from your logs_pivot table.
Something like
select
lp.*,
l.date
from
logs_pivot lp
JOIN logs l
on lp.log_id = l.id
where
model_id = 'some_id'
and model_type = 'My\Class'
and log_id >= ( select min( id )
from logs
where date >= date_sub( curdate(), interval 3 month ))
order by
l.date desc
limit
50;
So, the where clause for the log_id is done once and returns just an ID from as far back as 3 months and not the entire history of the logs_pivot. Then you query with the optimized two-part key of model id/type, but also jumping to the end of its index with the ID included in the index key to skip over all the historical.
Another thing you MAY want to include are some pre-aggregate tables of how many records such as per month/year per given model type/id. Use that as a pre-query to present to users, then you can use that as a drill-down to further get more detail. A pre-aggregate table can be done on all the historical stuff once since it would be static and not change. The only one you would have to constantly update would be whatever the current single month period is, such as on a nightly basis. Or even possibly better, via a trigger that either inserts a record every time an add is done, or updates a count for the given model/type based on year/month aggregations. Again, just a suggestion as no other context on how / why the data will be presented to the end-user.

I see two problems:
UUIDs are costly when tables are huge relative to RAM size.
The LIMIT cannot be handled optimally because the WHERE clauses come from one table, but the ORDER BY column comes from another table. That is, it will do all of the JOIN, then sort and finally peel off a few rows.

SELECT columns FROM big table ORDER BY something LIMIT small number is a notorious query performance antipattern. Why? The server sorts a whole mess of long rows, then discards almost all of them. It doesn't help that one of your columns is a LOB -- a TEXT column.
Here's an approach that can reduce that overhead: Figure out which rows you want by finding the set of primary keys you want, then fetch the content of only those rows.
What rows do you want? This subquery finds them.
SELECT logs.id
FROM logs
JOIN logs_pivot
ON logs.id = logs_pivot.log_id
WHERE logs_pivot.model_id = 'some_id'
AND logs_pivot.model_type = 'My\Class'
ORDER BY logs.date DESC
LIMIT 50
This does all the heavy lifting of working out the rows you want. So, this is the query you need to optimize.
It can be accelerated by this index on logs
CREATE INDEX logs_date_desc ON logs (date DESC);
and this three-column compound index on logs_pivot
CREATE INDEX logs_pivot_lookup ON logs_pivot (model_id, model_type, log_id);
This index is likely to be better, since the Optimizer will see the filtering on logs_pivot but not logs. Hence, it will look in logs_pivot first.
Or maybe
CREATE INDEX logs_pivot_lookup ON logs_pivot (log_id, model_id, model_type);
Try one then the other to see which yields faster results. (I'm not sure how the JOIN will use the compound index.) (Or simply add both, and use EXPLAIN to see which one it uses.)
Then, when you're happy -- or satisfied anyway -- with the subquery's performance, use it to grab the rows you need, like this
SELECT *
FROM logs
WHERE id IN (
SELECT logs.id
FROM logs
JOIN logs_pivot
ON logs.id = logs_pivot.log_id
WHERE logs_pivot.model_id = 'some_id'
AND model_type = 'My\Class'
ORDER BY logs.date DESC
LIMIT 50
)
ORDER BY date DESC
This works because it sorts less data. The covering three-column index on logs_pivot will also help.
Notice that both the sub query and main query have ORDER BY clauses, to make sure the returned detail result set is in the order you need.
Edit Darnit, been on MariaDB 10+ and MySQL 8+ so long I forgot about the old limitation. Try this instead.
SELECT *
FROM logs
JOIN (
SELECT logs.id
FROM logs
JOIN logs_pivot
ON logs.id = logs_pivot.log_id
WHERE logs_pivot.model_id = 'some_id'
AND model_type = 'My\Class'
ORDER BY logs.date DESC
LIMIT 50
) id_set ON logs.id = id_set.id
ORDER BY date DESC
Finally, if you know you only care about rows newer than some certain time you can add something like this to your subquery.
AND logs.date >= NOW() - INTERVAL 5 DAY
This will help a lot if you have tonnage of historical data in your table.

Query speed drops on two "=" comparisons in WHERE clause

I have a music database with a table for releases and the release titles. This "releases_view" gets the title/title_id and the alternative title/alternative title_id of a track. This is the code of the view:
SELECT
t1.`title` AS title,
t1.`id` AS title_id,
t2.`title` AS title_alt,
t2.`id` AS title_alt_id
FROM
releases
LEFT JOIN titles t1 ON t1.`id`=`releases`.`title_id`
LEFT JOIN titles t2 ON t2.`id`=`releases`.`title_alt_id`
The title_id and title_alt_id fields in the joined tables are both int(11), title and title_alt are varchars.
The issue
This query will take less than 1 ms:
SELECT * FROM `releases_view` WHERE title_id=12345
This query will take less than 1 ms, too:
SELECT * FROM `releases_view` WHERE title_id=12345 OR title_alt_id!=54321
BUT: This query will take 0.2 s. It's 200 times slower!
SELECT * FROM `releases_view` WHERE title_id=20956 OR title_alt_id=38849
As soon as I have two comparisons using "=" in the WHERE clause, things really get slow (although all queries only have a couple of results).
Can you help me to understand what is going on?
EDIT
EXPLAIN shows a "Using where" for title_alt_id, but I do not understand why. How can I avoid this?
EDIT
Here is the EXPLAIN DUMP.
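+----+-------------+----------+------------+--------+---------------+---------+---------+--------------------------+-------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------------+--------+---------------+---------+---------+--------------------------+-------+---------------------------------+
| 1 | SIMPLE | releases | NULL | ALL | NULL | NULL | NULL | NULL | 76802 | Using temporary; Using filesort |
| 1 | SIMPLE | t1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | db.releases.title_id | 1 | |
| 1 | SIMPLE | t2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | db.releases.title_alt_id | 1 | Using where |
+----+-------------+----------+------------+--------+---------------+---------+---------+--------------------------+-------+---------------------------------+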

The "really slow" is because the Optimizer does not work well with OR.
Plan A (of the Optimizer): Scan the entire table, evaluating the entire OR.
Plan B: "Index Merge Union" could be used for title_id = 20956 OR title_alt_id = 38849 if you have separate indexes on title_id and title_alt_id: use each index to get a list of PRIMARY KEYs, "merge" the two lists, then reach into the table to get *. Multiple steps, not cheap. So Plan B is rarely used.
title_id = 12345 OR title_alt_id != 54321 is a mystery, since it should return most of the table. Please provide EXPLAIN SELECT....
LEFT JOIN (as opposed to JOIN) needs to assume that the row may be missing in the 'right' table.
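A sketch of what Plan B would need, under some assumptions: the index names are invented, and the query is written against the base tables rather than the view, filtering on the releases columns so that the two indexes are candidates for an index merge. This matches the view-based query only when every title_id and title_alt_id in releases has a matching row in titles. Whether the optimizer actually picks the merge has to be checked with EXPLAIN.

```sql
-- Hypothetical index names; one single-column index per OR branch.
CREATE INDEX releases_title_id ON releases (title_id);
CREATE INDEX releases_title_alt_id ON releases (title_alt_id);

-- Same shape as the view, but filtering on releases directly.
EXPLAIN
SELECT
    t1.title AS title,
    t1.id    AS title_id,
    t2.title AS title_alt,
    t2.id    AS title_alt_id
FROM releases r
LEFT JOIN titles t1 ON t1.id = r.title_id
LEFT JOIN titles t2 ON t2.id = r.title_alt_id
WHERE r.title_id = 20956
   OR r.title_alt_id = 38849;
```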

MySql Indexes are not applied in GROUP BY

I have two tables to make my search engine, one containing all keywords and the other contains all the possible targets for each keyword.
Table: keywords
id (int)
keyword (varchar)
Table: results
id (int)
keyword_id (int)
table_id (int)
target_id (int)
For both tables, I set MyISAM as the storage engine, since 95% of the time I am just running select queries on these tables and only 5% of the time insert queries. And of course, I already compared the performance using InnoDB, and it was poor for the queries shown below.
I also added the following indexes
keywords.keyword (unique)
results.keyword_id (index)
results.table_id (index)
results.target_id (index)
In the keywords table I have about 1.2 million records, and in the results table I have about 9.8 million records.
Now the issue is that when I run the following query, the result is returned in 0.0014 seconds
SELECT rs.table_id, rs.target_id
FROM keywords ky INNER JOIN results rs ON ky.id=rs.keyword_id
WHERE ky.keyword LIKE "x%" OR ky.keyword LIKE "y%"
But when I add GROUP BY, the result takes 0.2 seconds
SELECT rs.table_id, rs.target_id
FROM keywords ky INNER JOIN results rs ON ky.id=rs.keyword_id
WHERE ky.keyword LIKE "x%" OR ky.keyword LIKE "y%"
GROUP BY rs.table_id, rs.target_id
I tested composite indexes, single column indexes and even dropping table_id and target_id indexes but in all the cases the performance is the same and it seems that in Group By clause, the index is not applied.
The explain plan shows that:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | ky | range | PRIMARY,keyword | keyword | 767 | NULL | 3271 | Using index condition; Using where; Using temporary; Using filesort
1 | SIMPLE | rs | ref | keyword_id | keyword_id | 4 | ky.id | 3
I have the following composite key already added
ALTER TABLE results ADD INDEX `table_id` (`table_id`, `target_id`) USING BTREE;

Here's MySQL documentation for GROUP BY optimization, this is what it says:
The most important preconditions for using indexes for GROUP BY are
that all GROUP BY columns reference attributes from the same index
So, if you have different index on these two columns, they won't be used by GROUP BY. You should try creating a composite index on table_id and target_id.
Also, the query seems to be using the LIKE operator. Please note that if the value being compared in LIKE has a leading wildcard in it, then MySQL won't be able to use any index for that column anyway. Have a look at the explain plan of the query and see which indexes are used.

JOIN + GROUP BY (or DISTINCT) is what I call "explode-implode" -- First the JOIN multiplies the number of 'rows' to look at, then the GROUP BY deflates the row count.
One work around to avoid this is to focus on the primary table, then check for EXISTS in the other table:
SELECT rs.table_id, rs.target_id
FROM results rs
WHERE EXISTS(
SELECT 1
FROM keywords ky
WHERE ky.id = rs.keyword_id
AND ( ky.keyword LIKE "x%"
OR ky.keyword LIKE "y%" )
);
rs requires INDEX(keyword_id).
An improvement on that might be to get rid of the OR via
WHERE ky.id = rs.keyword_id
AND ky.keyword REGEXP "^[xy]"
But that is not very helpful since it still needs to fully check keyword.
Another improvement could be to turn the OR into UNION:
( SELECT rs.table_id, rs.target_id
FROM keywords ky
INNER JOIN results rs ON ky.id=rs.keyword_id
WHERE ky.keyword LIKE "x%"
) UNION ALL
( SELECT rs.table_id, rs.target_id
FROM keywords ky
INNER JOIN results rs ON ky.id=rs.keyword_id
WHERE ky.keyword LIKE "y%"
)
ky: INDEX(keyword, id)
rs: INDEX(keyword_id)
The advantage here (other than avoiding the inflate-deflate) is that the index can be used in each SELECT.
(Please provide SHOW CREATE TABLE for both tables; there may be other tips.)

What does MySQL perform first: The `WHERE` clause or the `ORDER BY` clause?

What does MySQL perform first: The WHERE clause or the ORDER BY clause?
The reason I ask is to determine whether I should add an index to a given column.
I have a table such as the following:
| Column | Type | Index |
|-----------|-------------|-------|
| id | INT (pk) | Yes |
| listorder | INT | ?? |
| data | VARCHAR(16) | No |
| fk | INT (fk) | Yes |
I will often execute queries such as SELECT id, data FROM mytable WHERE fk=12345 ORDER BY listorder ASC. For my data set, it will only result in a small number of records (~5) for a given fk, however, there are many records in the table with many fk values, and many duplicated listorder values spanning the many fk values.
If the WHERE clause is performed first, then I expect I shouldn't add an index to listorder as it will result in UPDATE performance degradation without significant improvement for SELECT.

The way SQL (all makes and models of servers) uses indexes to satisfy queries is a little more complex than you're assuming. Usually a query gets satisfied by filtering first (WHERE) then ordering.
For the exact query you showed us, if you have a compound index on (fk, listorder) the SQL engine will be able to use the index to satisfy both clauses of your query. The index will first be random-accessed by the WHERE clause, then it will be already in the order needed to satisfy your sorting clause.
Read this: http://use-the-index-luke.com/
Updating a compound index is not much more expensive than updating a single column index. Either way, using an index is better than having to scan the table for a WHERE operation.

The WHERE clause is evaluated first. I think this is always true in MySQL, but there might be an occasional exception (at least in other databases there is).
For this query:
SELECT id, data
FROM mytable
WHERE fk = 12345
ORDER BY listorder ASC;
The most practical index is mytable(fk, listorder).
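As a sketch, that index could be created and checked like this; the index name is only an illustration. If the index is used for both filtering and ordering, the Extra column of the EXPLAIN output should not mention "Using filesort".

```sql
-- Hypothetical index name; fk serves the WHERE clause, listorder the ORDER BY.
CREATE INDEX mytable_fk_listorder ON mytable (fk, listorder);

EXPLAIN
SELECT id, data
FROM mytable
WHERE fk = 12345
ORDER BY listorder ASC;
```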

In SQL processing, a WHERE clause that equates columns of two tables acts as an implicit join. Query optimizers generally treat it as equivalent to an INNER JOIN. The explicit INNER JOIN syntax was only adopted into the ANSI standard in the early 90s, so many older SQL select statements read as below:
SELECT *
FROM table1, table2
WHERE table1.ID = table2.ID
whereas the later, now-preferred form is as follows:
SELECT *
FROM table1
INNER JOIN table2
ON table1.ID = table2.ID
However both statements are equivalent. But many argue INNER JOIN is more human readable. See this hearty SO post on INNER vs WHERE.
Unlike most programming languages, in SQL the order of syntax does not determine order of processing. Ironically though, the last line ORDER BY (unless TOP or LIMIT is declared) is usually the very last step and WHERE among the first just after the FROM clause:
FROM table source
JOIN condition
WHERE condition
GROUP BY expression
HAVING condition
SELECT fields
ORDER BY fields
Essentially, the engine structures the table and/or virtual tables determined by FROM, JOIN, and WHERE clauses. Once that structure is set up, then aggregation, field selection, and ordering is handled. So you could not order the table before you have the table!
Indices help in nearly all aspects of the processing. Setting an index on ORDER BY would not lead to performance degradation. But aligning WHERE and ORDER BY can facilitate sorting optimization. See this MySQL reference. In fact, MySQL is known to leave out indices if not needed.

MySQL query painfully slow on large data

I'm no MySQL whiz but I get it, I have just inherited a pretty large table (600,000 rows and around 90 columns (Please kill me...)) and I have a smaller table that I've created to link it with a categories table.
I'm trying to query said table with a left join so I have both sets of data in one object but it runs terribly slow and I'm not hot enough to sort it out; I'd really appreciate a little guidance and explanation as to why it's so slow.
SELECT
`products`.`Product_number`,
`products`.`Price`,
`products`.`Previous_Price_1`,
`products`.`Previous_Price_2`,
`products`.`Product_number`,
`products`.`AverageOverallRating`,
`products`.`Name`,
`products`.`Brand_description`
FROM `product_categories`
LEFT OUTER JOIN `products`
ON `products`.`product_id`= `product_categories`.`product_id`
WHERE COALESCE(product_categories.cat4, product_categories.cat3,
product_categories.cat2, product_categories.cat1) = '123456'
AND `product_categories`.`product_id` != 0
The two tables are MyISAM, the products table has indexing on Product_number and Brand_Description and the product_categories table has a unique index on all columns combined; if this info is of any help at all.
Having inherited this system I need to get this working asap before I nuke it and do it properly so any help right now will earn you my utmost respect!
[Edit]
Here is the output of the explain extended:
+----+-------------+--------------------+-------+---------------+------+---------+------+---------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------------+-------+---------------+------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | product_categories | index | NULL | cat1 | 23 | NULL | 1224419 | 100.00 | Using where; Using index |
| 1 | SIMPLE | products | ALL | Product_id | NULL | NULL | NULL | 512376 | 100.00 | |
+----+-------------+--------------------+-------+---------------+------+---------+------+---------+----------+--------------------------+

Optimize Table
To establish a baseline, I would first recommend running an OPTIMIZE TABLE command on both tables. Please note that this might take some time. From the docs:
OPTIMIZE TABLE should be used if you have deleted a large part of a
table or if you have made many changes to a table with variable-length
rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns).
Deleted rows are maintained in a linked list and subsequent INSERT
operations reuse old row positions. You can use OPTIMIZE TABLE to
reclaim the unused space and to defragment the data file. After
extensive changes to a table, this statement may also improve
performance of statements that use the table, sometimes significantly.
[...]
For MyISAM tables, OPTIMIZE TABLE works as follows:
If the table has deleted or split rows, repair the table.
If the index pages are not sorted, sort them.
If the table's statistics are not up to date (and the repair could not be accomplished by sorting the index), update them.
Indexing
If space and index management isn't a concern, you can try adding a composite index on
product_categories.cat4, product_categories.cat3, product_categories.cat2, product_categories.cat1
This would be advised if you use a leftmost subset of these columns often in your queries. The query plan indicates that it can use the cat1 index of product_categories. This most likely only includes the cat1 column. By adding all four category columns to an index, it can more efficiently seek to the desired row. From the docs:
MySQL can use multiple-column indexes for queries that test all the
columns in the index, or queries that test just the first column, the
first two columns, the first three columns, and so on. If you specify
the columns in the right order in the index definition, a single
composite index can speed up several kinds of queries on the same
table.
Structure
Furthermore, given that your table has 90 columns you should also be aware that a wider table can lead to slower query performance. You may want to consider Vertically Partitioning your table into multiple tables:
Having too many columns can bloat your record size, which in turn
results in more memory blocks being read in and out of memory causing
higher I/O. This can hurt performance. One way to combat this is to
split your tables into smaller more independent tables with smaller
cardinalities than the original. This should now allow for a better
Blocking Factor (as defined above) which means less I/O and faster
performance. This process of breaking apart the table like this is a
called a Vertical Partition.

The meaning of your query seems to be "find all products that have the category '123456'." Is that correct?
COALESCE is an extraordinarily expensive function to use in a WHERE statement, because it operates on index-hostile NULL values. Your explain result shows that your query is not being very selective on your product_categories table. In MySQL you need to avoid functions in WHERE statements altogether if you want to exploit indexes to make your queries fast.
The thing someone else said about 90-column tables being harmful is also true. But you're stuck with it, so let's just deal with it.
Can we rework your query to get rid of the function-based WHERE? Let's try this.
SELECT /* some columns from the products table */
FROM products
WHERE product_id IN
(
SELECT DISTINCT product_id
FROM product_categories
WHERE product_id <> 0
AND ( cat1='123456'
OR cat2='123456'
OR cat3='123456'
OR cat4='123456')
)
For this to work fast you're going to need to create separate indexes on your four cat columns. The composite unique index ("on all columns combined") is not going to help you. It still may not be so good.
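A sketch of those four separate indexes, with made-up index names. EXPLAIN on the inner query shows whether the optimizer combines them (for example with an index merge) or still scans the table.

```sql
-- Hypothetical index names; one single-column index per category column.
CREATE INDEX product_categories_cat1 ON product_categories (cat1);
CREATE INDEX product_categories_cat2 ON product_categories (cat2);
CREATE INDEX product_categories_cat3 ON product_categories (cat3);
CREATE INDEX product_categories_cat4 ON product_categories (cat4);

-- Check the plan of the inner query on its own.
EXPLAIN
SELECT DISTINCT product_id
FROM product_categories
WHERE product_id <> 0
  AND ( cat1 = '123456'
     OR cat2 = '123456'
     OR cat3 = '123456'
     OR cat4 = '123456' );
```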
A better solution might be FULLTEXT searching IN BOOLEAN MODE. You're working with the MyISAM access method so this is possible. It's definitely worth a try. It could be very fast indeed.
SELECT /* some columns from the products table */
FROM products
WHERE product_id IN
(
SELECT product_id
FROM product_categories
WHERE MATCH(cat1,cat2,cat3,cat4)
AGAINST('123456' IN BOOLEAN MODE)
AND product_id <> 0
)
For this to work fast you're going to need to create a FULLTEXT index like so.
CREATE FULLTEXT INDEX cat_lookup
ON product_categories (cat1, cat2, cat3, cat4)
Note that neither of these suggested queries produce precisely the same results as your COALESCE query. The way your COALESCE query is set up, some combinations won't match it that will match these queries. For example.
cat1 cat2 cat3 cat4
123451 123453 123455 123456 matches your and my queries
123456 123455 123454 123452 matches my queries but not yours
But it's likely that my queries will produce a useful list of products, even if it has a few more items than yours.
You can debug this stuff by just working with the inner queries on product_categories.

There is something strange. Does the table product_categories indeed have a product_id column? Shouldn't the from and where clauses be like this:
FROM `product_categories` pc
LEFT OUTER JOIN `products` p ON p.category_id = pc.id
WHERE
COALESCE(pc.cat4, pc.cat3, pc.cat2, pc.cat1) = '123456'
AND pc.id != 0
