Select two (or more) consecutive rows in a MySQL table

The question in short
What is an efficient, scalable way of selecting two (or more) rows from a table with consecutive IDs, especially if this table is joined with another table?
Related questions have been asked before on Stack Overflow, e.g.:
SQL check adjacent rows for sequence
How select where there are 2 consecutives rows with a specific value using MySQL?
The answers to these questions suggest a self-join. My working example described below uses that suggestion, but it performs very, very poorly on larger data sets. I've run out of ideas on how to improve it, and I'd really appreciate your input.
The issue in detail
Let's assume I were developing a database that keeps track of ball possession during a football/soccer match (please understand that I can't disclose the purpose of my real application). I require an efficient, scalable way that allows me to query changes of ball possession from one player to another (i.e. passes). For example, I might be interested in a list of all passes from any defender to any forward.
Mock database structure
My mock database consists of two tables. The first table, Players, stores the players' names in the Name column and their positions (GOA, DEF, MID, FOR for goalie, defender, midfield, forward) in the POS column.
The second table Possession keeps track of ball possession. Whenever ball possession changes, i.e. the ball is passed to a new player, a row is added to this table. The primary key ID also indicates the temporal order of possession changes: consecutive IDs indicate an immediate sequence of ball possessions.
CREATE TABLE Players(
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
POS VARCHAR(3) NOT NULL,
Name VARCHAR(7) NOT NULL);
CREATE TABLE Possession(
ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
PlayerID INT NOT NULL);
Next, we create some indices:
CREATE INDEX POS ON Players(POS);
CREATE INDEX Name ON Players(Name);
CREATE INDEX PlayerID ON Possession(PlayerID);
Now, we populate the Players table with a few players, and also add test entries to the Possession table:
INSERT INTO Players (POS, Name) VALUES
('DEF', 'James'), ('DEF', 'John'), ('DEF', 'Michael'),
('DEF', 'David'), ('MID', 'Charles'), ('MID', 'Thomas'),
('MID', 'Paul'), ('FOR', 'Bob'), ('GOA', 'Kenneth');
INSERT INTO Possession (PlayerID) VALUES
(1), (8), (2), (5), (3), (8), (3), (9), (6), (4), (7), (9);
Let's quickly check our database by joining the Possession and the Players table:
SELECT Possession.ID, PlayerID, POS, Name
FROM
Possession
INNER JOIN Players ON Possession.PlayerID = Players.ID
ORDER BY Possession.ID;
This looks good:
+----+----------+-----+---------+
| ID | PlayerID | POS | Name    |
+----+----------+-----+---------+
|  1 |        1 | DEF | James   |
|  2 |        8 | FOR | Bob     |
|  3 |        2 | DEF | John    |
|  4 |        5 | MID | Charles |
|  5 |        3 | DEF | Michael |
|  6 |        8 | FOR | Bob     |
|  7 |        3 | DEF | Michael |
|  8 |        9 | GOA | Kenneth |
|  9 |        6 | MID | Thomas  |
| 10 |        4 | DEF | David   |
| 11 |        7 | MID | Paul    |
| 12 |        9 | GOA | Kenneth |
+----+----------+-----+---------+
The table can be read like this: First, the DEFender James passed to the FORward Bob. Then, Bob passed to the DEFender John, who in turn passed to the MIDfield Charles. After some more passes, the ball ends with the GOAlkeeper Kenneth.
Working solution
I need a query that lists all passes from any defender to any forward. As we can see in the previous table, there are two instances of that: right at the start, James sends the ball to Bob (row ID 1 to ID 2), and later on, Michael sends the ball to Bob (row ID 5 to ID 6).
To do this in SQL, I create a self-join for the Possession table, with the second instance offset by one row. To be able to access the players' names and positions, I also join the two Possession table instances to the Players table. The following query does that:
SELECT
M1.ID AS "From",
M2.ID AS "To",
P1.Name AS "Sender",
P2.Name AS "Receiver"
FROM
Possession AS M1
INNER JOIN Possession AS M2 ON M2.ID = M1.ID + 1
INNER JOIN Players AS P1 ON M1.PlayerID = P1.ID AND P1.POS = 'DEF' -- see execution plan
INNER JOIN Players AS P2 ON M2.PlayerID = P2.ID AND P2.POS = 'FOR';
We get the expected output:
+------+----+---------+----------+
| From | To | Sender  | Receiver |
+------+----+---------+----------+
|    1 |  2 | James   | Bob      |
|    5 |  6 | Michael | Bob      |
+------+----+---------+----------+
The problem
While this query is executed virtually instantly in the mock football database, there appears to be a problem in the execution plan with this query. Here is the output of EXPLAIN for it:
+----+-------------+-------+------+------------------+----------+---------+------------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys    | key      | key_len | ref        | rows | Extra                                           |
+----+-------------+-------+------+------------------+----------+---------+------------+------+-------------------------------------------------+
|  1 | SIMPLE      | P2    | ref  | PRIMARY,POS      | POS      | 5       | const      |    1 | Using index condition                           |
|  1 | SIMPLE      | M2    | ref  | PRIMARY,PlayerID | PlayerID | 4       | MOCK.P2.ID |    1 | Using index                                     |
|  1 | SIMPLE      | P1    | ALL  | PRIMARY,POS      | NULL     | NULL    | NULL       |    9 | Using where; Using join buffer (flat, BNL join) |
|  1 | SIMPLE      | M1    | ref  | PlayerID         | PlayerID | 4       | MOCK.P1.ID |    1 | Using where; Using index                        |
+----+-------------+-------+------+------------------+----------+---------+------------+------+-------------------------------------------------+
I have to admit that I'm not very good at interpreting query execution plans, but it seems to me that the third row indicates a bottleneck for the join marked in the query above: apparently, a full scan is done for the P1 alias table, no key is used even though POS and the primary key are available, and the Using join buffer (flat, BNL join) part also looks suspicious. I don't know what any of that means, but I usually don't see this with normal joins.
Perhaps due to this bottleneck, the query does not finish within any acceptable time span for my real database. My real equivalent of the mock Players table has ~60,000 rows, and the Possession equivalent has ~1,160,000 rows. I monitored the execution of the query via SHOW PROCESSLIST. After more than 600 seconds, the process was still tagged as Sending data, at which point I killed it.
The query plan on this larger data set is rather similar to the one for the small mock data set. The third join again appears to be problematic: no key is used, a full table scan is performed, and there is the join buffer part that I still don't really understand:
+----+-------------+-------+------+---------------+----------+---------+------------------+-------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key      | key_len | ref              | rows  | Extra                                           |
+----+-------------+-------+------+---------------+----------+---------+------------------+-------+-------------------------------------------------+
|  1 | SIMPLE      | P2    | ref  | POS           | POS      | 1       | const            |  1748 | Using index condition                           |
|  1 | SIMPLE      | M2    | ref  | PlayerId      | PlayerId | 2       | REAL.P2.PlayerId |     7 |                                                 |
|  1 | SIMPLE      | P1    | ALL  | POS           | NULL     | NULL    | NULL             | 61917 | Using where; Using join buffer (flat, BNL join) |
|  1 | SIMPLE      | M1    | ref  | PlayerId      | PlayerId | 2       | REAL.P1.PlayerId |     7 | Using where                                     |
+----+-------------+-------+------+---------------+----------+---------+------------------+-------+-------------------------------------------------+
I tried forcing an index for the aliased table P1 by using Players AS P1 FORCE INDEX (POS) instead of Players AS P1 in the query shown above. This change does affect the execution plan. If I force POS to be used as the key, the third line in the output of EXPLAIN is very similar to the first line. The only difference is the number of rows, which is still very high (30912). Even this modified query did not complete after 600 seconds.
I don't think that this is a configuration issue. I have made up to 18 GB of RAM available to the MySQL server, and the server uses this memory for other queries. For the present query, memory consumption does not exceed 2 GB of RAM.
Back to the question
Thanks for staying with this somewhat long-winded explanation up to this point!
Let's return to the initial question: What is an efficient, scalable way of selecting two (or more) rows from a table with consecutive IDs, especially if this table is joined with another table?
My current query certainly is doing something wrong, as it didn't finish even after ten minutes. Is there something that I can change in my current query to make it useful for my larger real data set? If not: is there an alternative, better solution that I could use?

I believe the issue is that you only have single-field indexes on the Players table. MySQL can only use a single index per joined table.
For the Players table, two fields are key from a performance point of view:
playerid, since it is used in the join;
pos, since you filter on it.
You seem to have standalone indexes on both fields, but this forces MySQL to choose between using an index for joining the two tables and using one to filter on the WHERE criteria.
I would create a multi-column index on the (playerid, pos) fields, in this order. This way MySQL can use a single index to satisfy both the join and the WHERE filter.
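For the mock schema above, where the Players side of the join is its primary key ID, a minimal sketch of that composite index (the index name is made up):
CREATE INDEX ID_POS ON Players (ID, POS); -- one index serving both the join (ID) and the filter (POS)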
I would also use an explicit JOIN instead of a comma-separated list of tables with the join condition in the WHERE clause, for better readability.
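For illustration, the two styles side by side (a minimal sketch against the mock tables):
-- Comma-separated style, join condition buried in the WHERE:
SELECT P.Name FROM Possession M, Players P WHERE M.PlayerID = P.ID;
-- Equivalent explicit join, easier to read:
SELECT P.Name FROM Possession M INNER JOIN Players P ON M.PlayerID = P.ID;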

Here's a general plan:
SELECT
@n := @n + 1 AS N, -- Now the rows will be numbered 1,2,3,...
...
FROM ( SELECT @n := 0 ) AS init
JOIN tbl
ORDER BY ... -- based on your definition of 'consecutive'
Then you can use that query as a subquery somewhere else.
SELECT ...
FROM ( the above query ) AS x
GROUP BY ceiling(N/2) -- 1&2 will be grouped together; 3&4; etc
You can use `IF((N % 2) = 1, ..., ...)` to do different things with the first versus the second item in each pair.
You mentioned JOINing to another table. If possible, avoid doing the JOIN until this last SELECT.
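For instance, a sketch applying this plan to the mock Possession table (this pairs rows 1&2, 3&4, etc., as described above, not every adjacent pair; also note that the user-variable trick's evaluation order is not guaranteed in newer MySQL versions):
SELECT
CEILING(N / 2) AS Pair,
MIN(IF(N % 2 = 1, PlayerID, NULL)) AS SenderID, -- first row of each pair
MIN(IF(N % 2 = 0, PlayerID, NULL)) AS ReceiverID -- second row of each pair
FROM (
SELECT @n := @n + 1 AS N, PlayerID
FROM ( SELECT @n := 0 ) AS init
JOIN Possession
ORDER BY ID
) AS x
GROUP BY CEILING(N / 2);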

Related

Load average jumps up during a sort in mysql query

I have encountered a MySQL query that takes over 2 minutes to complete and pushes the server load up very high (e.g. from 2 to 14, or sometimes higher).
The query does a left join between tables, then sorts the data based on a float column of one of the joined tables, like this:
SELECT table1.*, table2.*, table3.field, table4.field
FROM table1
LEFT JOIN table2 ON table1...
LEFT JOIN table3 ON table1...
LEFT JOIN table4 ON table3...
LEFT JOIN table5 ON table1...
WHERE table1.deleted=0
ORDER BY table2.float_field ASC
LIMIT 1,300
The JOINs are all done on indexed keys, and table2 also has an index on float_field.
The same database structure and query are used on other databases without issues. table2 is a custom field table, alterable by users of this database; in this particular system, I see that it has 107 fields, more than 2/3 of them varchar(150). Could this be the cause of the high load, or is there some other reason? Any suggestions for how to handle it (ideally without having to redo the db schema)?
Thanks in advance.
EDIT: Here are the 'explain' results:
+----+-------------+--------+--------+---------------+---------+---------+------------------+-------+-------------+
| id | select_type | table  | type   | possible_keys | key     | key_len | ref              | rows  | Extra       |
+----+-------------+--------+--------+---------------+---------+---------+------------------+-------+-------------+
|  1 | SIMPLE      | table1 | ALL    | idx_1,idx_2   | NULL    | NULL    | NULL             | 33861 | Using where |
|  1 | SIMPLE      | table2 | eq_ref | PRIMARY       | PRIMARY | 108     | db.table1.id     |     1 |             |
|  1 | SIMPLE      | jtl0   | ref    | idx_X         | idx_X   | 111     | db.table1.id     |     1 |             |
|  1 | SIMPLE      | table4 | eq_ref | PRIMARY,...   | PRIMARY | 108     | db.jtl0.field    |     1 |             |
|  1 | SIMPLE      | jt1    | eq_ref | PRIMARY       | PRIMARY | 108     | db.table1.fieldX |     1 |             |
+----+-------------+--------+--------+---------------+---------+---------+------------------+-------+-------------+
Both idx_1 and idx_2 use the 'deleted' column as the first field in the index, and there is only this one field in the WHERE clause.
I also corrected the original text; there are 5 tables used, not 4 (although the last table has only 20 rows, so it doesn't matter here).
SELECT table2.* is generally bad style, returning a lot of columns you are not interested in. In this case it could be causing the slowness, given the large number of (text) columns in this table.
100 columns * 150 characters * 1300 rows is roughly 19.5 MB, so the slowness could well be reading all the data from disk and transmitting it across the network.
Do you still see the slowness if you restrict this to the particular columns of table2 that you are interested in?
EDIT : your explain select output seems to confirm that it is not a difficult query to run, with only a small number of rows. This makes the sheer data size of each row in table2 the most likely problem. You can test this by removing / limiting the reference to table2. If that is the case, then the only way to speed this query up will be to request fewer columns from table2.
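A sketch of that test, narrowing the select list of the original query (the join conditions stay elided as in the question; float_field is the only table2 column named so far, so keep whichever table2 columns are actually needed):
SELECT table1.*, table2.float_field, table3.field, table4.field -- no more table2.*
FROM table1
LEFT JOIN table2 ON table1...
LEFT JOIN table3 ON table1...
LEFT JOIN table4 ON table3...
LEFT JOIN table5 ON table1...
WHERE table1.deleted=0
ORDER BY table2.float_field ASC
LIMIT 1,300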

joining table in mysql not using index properly?

I have four tables that I am trying to join and output the result to a new table. My code looks like this:
CREATE TABLE tbl
SELECT a.dte, a.permno, (ret - rf) f0_xs_ret, (xs_ret - (betav*xs_mkt)) f0_resid, mkt_cap last_year_mkt_cap, betav beta_value
FROM a INNER JOIN b USING (dte)
INNER JOIN c ON (YEAR(a.dte) = c.yr AND a.permno = c.permno)
INNER JOIN d ON (a.permno = d.permno AND YEAR(a.dte)-1 = YEAR(d.dte));
All of the tables have multiple indexes. For table a, (dte, permno) identifies a unique record; for table b, dte identifies a unique record; for table c, (yr, permno) identifies a unique record; and for table d, (dte, permno) identifies a unique record. The EXPLAIN for the SELECT part of the query is:
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
| id | select_type | table | type   | possible_keys     | key     | key_len | ref                              | rows   | Extra             |
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
|  1 | SIMPLE      | d     | ALL    | idx1              | NULL    | NULL    | NULL                             | 264129 |                   |
|  1 | SIMPLE      | c     | ref    | idx2              | idx2    | 4       | achernya.d.permno                |     16 |                   |
|  1 | SIMPLE      | b     | ALL    | PRIMARY,idx2      | NULL    | NULL    | NULL                             |  12336 | Using join buffer |
|  1 | SIMPLE      | a     | eq_ref | PRIMARY,idx1,idx2 | PRIMARY | 7       | achernya.b.dte,achernya.d.permno |      1 | Using where       |
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
Why does MySQL have to read so many rows to process this? If I am reading this correctly, it has to read (264129 * 16 * 12336) rows, which should take a good month.
Could someone please explain what's going on here?
MySQL has to read the rows because you're using functions as your join conditions. An index on dte will not help resolve YEAR(dte) in a query. If you want to make this fast, then put the year in its own column to use in joins and move the index to that column, even if that means some denormalization.
As for the other columns in your index that you don't apply functions to, they may not be used if the index won't provide much benefit, or they aren't the leftmost column in the index and you don't use the leftmost prefix of that index in your join condition.
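A sketch of that denormalization for table d from the question (the new column and index names here are made up):
-- Materialize the year once, so the join can use a plain index instead of YEAR(dte):
ALTER TABLE d ADD COLUMN dte_year SMALLINT NOT NULL DEFAULT 0;
UPDATE d SET dte_year = YEAR(dte);
CREATE INDEX idx_d_year_permno ON d (dte_year, permno);
-- After giving table a the same treatment, the join condition
-- year(a.dte)-1 = year(d.dte) can be rewritten as: a.dte_year - 1 = d.dte_year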
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

Three Queries Faster than One -- What's Wrong with my Joins?

I've got a JPA ManyToMany relationship set up, which gives me three important tables: my Ticket table, my Join table, and my Inventory table. They're InnoDB tables on MySQL 5.1. The relevant bits are:
Ticket:
+--------+----------+------+-----+---------+----------------+
| Field  | Type     | Null | Key | Default | Extra          |
+--------+----------+------+-----+---------+----------------+
| ID     | int(11)  | NO   | PRI | NULL    | auto_increment |
| Status | longtext | YES  |     | NULL    |                |
+--------+----------+------+-----+---------+----------------+
JoinTable:
+-------------+---------+------+-----+---------+-------+
| Field       | Type    | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| InventoryID | int(11) | NO   | PRI | NULL    |       | Foreign Key - Inventory
| TicketID    | int(11) | NO   | PRI | NULL    |       | Foreign Key - Ticket
+-------------+---------+------+-----+---------+-------+
Inventory:
+--------------+-------------+------+-----+---------+----------------+
| Field        | Type        | Null | Key | Default | Extra          |
+--------------+-------------+------+-----+---------+----------------+
| ID           | int(11)     | NO   | PRI | NULL    | auto_increment |
| TStampString | varchar(32) | NO   | MUL | NULL    |                |
+--------------+-------------+------+-----+---------+----------------+
TStampStrings are of the form "yyyy.mm.dd HH:MM:SS Z" (for example, '2010.03.19 22:27:57 GMT'). Right now all of the Tickets created directly correspond to some specific hour TStampString, so that SELECT COUNT(*) FROM Ticket; is the same as SELECT COUNT(DISTINCT(SUBSTRING(TStampString, 1, 13))) FROM Inventory;
What I'd like to do is regroup certain Tickets based on the minute granularity of a TStampString: (SUBSTRING(TStampString, 1, 16)). So I'm profiling and testing the SELECT of an INSERT INTO ... SELECT statement:
EXPLAIN SELECT SUBSTRING(i.TStampString, 1, 16)
FROM Ticket t
JOIN JoinTable j ON t.ID = j.TicketID
JOIN Inventory i ON j.InventoryID = i.ID
WHERE t.Status = 'Regroup'
GROUP BY SUBSTRING(i.TStampString, 1, 16);
+----+------+-----+--------+-------------+------+------+---------------+-------+----------------------------------------------+
| id | type | tbl | type   | psbl_keys   | key  | len  | ref           | rows  | Extra                                        |
+----+------+-----+--------+-------------+------+------+---------------+-------+----------------------------------------------+
|  1 | SMPL | t   | ALL    | PRI         | NULL | NULL | NULL          | 35569 | Using where; Using temporary; Using filesort |
|  1 | SMPL | j   | ref    | PRI,FK1,FK2 | FK2  | 4    | t.ID          |   378 | Using index                                  |
|  1 | SMPL | i   | eq_ref | PRI         | PRI  | 4    | j.InventoryID |     1 |                                              |
+----+------+-----+--------+-------------+------+------+---------------+-------+----------------------------------------------+
What this implies to me is that for each row in Ticket, MySQL first does the joins then later decides that the row is invalid due to the WHERE clause. Certainly the runtime is abominable (I gave up after 30 minutes). Note that it goes no faster with t.Status = 'Regroup' moved to the first JOIN clause and no WHERE clause.
But what's interesting is that if I run this query manually in three steps, doing what I thought the optimizer would do, each step returns almost immediately:
--Step 1: Select relevant Tickets (results dumped to file)
SELECT ID FROM Ticket WHERE Status = 'Regroup';
--Step 2: Get relevant Inventory entries
SELECT InventoryID FROM JoinTable WHERE TicketID IN (step 1s file);
--Step 3: Select what I wanted all along
SELECT SUBSTRING(TStampString, 1, 16) FROM Inventory WHERE ID IN (step 2s file)
GROUP BY SUBSTRING(TStampString, 1, 16);
On my particular tables, the first query gives 154 results, the second creates 206,598 lines, and the third query returns 9198 rows. All of them combined take ~2 minutes to run, with the last query having the only significant runtime.
Dumping the intermediate results to a file is cumbersome, and more importantly I'd like to know how to write my original query such that it runs reasonably. So how do I structure this three-table join such that it runs as fast as I know is possible?
UPDATE: I've added a prefix index on Status(16), which changes my EXPLAIN profile rows to 153, 378, and 1 respectively (since the first row has a key to use). The JOIN version of my query now takes ~6 minutes, which is tolerable but still considerably slower than the manual version. I'd still like to know why the join performs so suboptimally, but it may be that one can't create independent subqueries in buggy MySQL 5.1. If enough time passes I'll accept Add Index as the solution to my problem, although it's not exactly the answer to my question.
In the end I did end up manually recreating every step of the join on disk. Tens of thousands of files each with a thousand queries was still significantly faster than anything I could get my version of MySQL to do. But since that process would be horribly specific and unhelpful for the layman, I'm accepting ypercube's answer of Add (Partial) Indexes.
What you can do to speed up the query:
Add an index on Status. Even if you don't change the type to VARCHAR, you can still add a partial index:
ALTER TABLE Ticket
ADD INDEX status_idx (Status(16)) ;
I assume that the Primary key of the Join table is (InventoryID, TicketID). You can add another index on (TicketID, InventoryID) as well. This may not benefit this particular query but it will be helpful in other queries you'll have.
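A sketch of that second index (the index name is made up):
ALTER TABLE JoinTable ADD INDEX ticket_inventory_idx (TicketID, InventoryID);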
The answer on why this happens is that the optimizer does not always choose the best plan. You can try this variation of your query and see how the EXPLAIN plan differs and if there is any efficiency gain:
SELECT SUBSTRING(i.TStampString, 1, 16)
FROM
( SELECT DISTINCT j.InventoryID
FROM Ticket t
JOIN JoinTable j
ON t.ID = j.TicketID
WHERE t.Status = 'Regroup'
) AS tmp
JOIN Inventory i
ON tmp.InventoryID = i.ID
GROUP BY SUBSTRING(i.TStampString, 1, 16) ;
Try giving the first SUBSTRING clause an alias and using it in the GROUP BY:
SELECT SUBSTRING(i.TStampString, 1, 16) AS blaa
FROM Ticket t
JOIN JoinTable j ON t.ID = j.TicketID
JOIN Inventory i ON j.InventoryID = i.ID
WHERE t.Status = 'Regroup'
GROUP BY blaa;
Also, you could avoid the join altogether, since you don't need it:
SELECT DISTINCT SUBSTRING(i.TStampString, 1, 16) FROM Inventory i WHERE i.ID IN
( SELECT j.InventoryID FROM JoinTable j WHERE j.TicketID IN
( SELECT t.ID FROM Ticket t WHERE t.Status = 'Regroup' ));
Would that work?
By the way, do you have an index on the Status field?

Is column order important in mysql?

I read somewhere that column order in MySQL is important. I believe they were referring to the indexed columns.
QUESTION: If column order is important, when and why is it important?
The reason I ask is because I have a table in mysql similar to the one below.
The primary index is on the left and I have an index on the far right. Is this bad?
It is a MyISAM table and will be used predominantly for selects (no inserts, deletes or updates).
-------------------------------------------------
| Primary index | data1 | data2   | d3 | Index |
-------------------------------------------------
| 1             | A     | cat     | 1  | A     |
| 2             | B     | toads   | 3  | A     |
| 3             | A     | yabby   | 7  | B     |
| 4             | B     | rabbits | 1  | B     |
-------------------------------------------------
Column order is only important when defining indexes, as it affects whether an index is suitable for executing a query. (This is true of all RDBMSs, not just MySQL.)
e.g.
Index defined on columns MyIndex(a, b, c) in that order.
A query such as
select a from mytable
where c = somevalue
probably won't use that index to execute the query (this depends on several factors, such as row count and column selectivity).
Whereas, it will most likely choose to use an index defined as MyIndex2(c,a,b)
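To make the contrast concrete, a sketch against the hypothetical mytable:
CREATE INDEX MyIndex ON mytable (a, b, c); -- leftmost column is a; a filter on c alone cannot seek here
CREATE INDEX MyIndex2 ON mytable (c, a, b); -- leftmost column is c; WHERE c = somevalue can use this one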
Update: see use-the-index-luke.com (thanks Greg).

SQL Query Optimization

I am trying to speed up this Django app (note: I didn't design this... just stuck maintaining it) and the biggest bottleneck seems to be these queries generated by the admin. We have a content class that 4-5 other sub-classes inherit from, and any time the master list is pulled up in the admin, a query like this is generated:
SELECT `content_content`.`id`,
`content_content`.`issue_id`,
`content_content`.`slug`,
`content_content`.`section_id`,
`content_content`.`priority`,
`content_content`.`group_id`,
`content_content`.`rotatable`,
`content_content`.`pub_status`,
`content_content`.`created_on`,
`content_content`.`modified_on`,
`content_content`.`old_pk`,
`content_content`.`content_type_id`,
`content_image`.`content_ptr_id`,
`content_image`.`caption`,
`content_image`.`kicker`,
`content_image`.`pic`,
`content_image`.`crop_x`,
`content_image`.`crop_y`,
`content_image`.`crop_side`,
`content_issue`.`id`,
`content_issue`.`special_issue_name`,
`content_issue`.`web_publish_date`,
`content_issue`.`issue_date`,
`content_issue`.`fm_name`,
`content_issue`.`arts_name`,
`content_issue`.`comments`,
`content_section`.`id`,
`content_section`.`name`,
`content_section`.`audiodizer_id`
FROM `content_image`
INNER JOIN `content_content`
ON `content_image`.`content_ptr_id` = `content_content`.`id`
INNER JOIN `content_issue`
ON `content_content`.`issue_id` = `content_issue`.`id`
INNER JOIN `content_section`
ON `content_content`.`section_id` = `content_section`.`id`
WHERE NOT ( `content_content`.`pub_status` = -1 )
ORDER BY `content_issue`.`issue_date` DESC LIMIT 30
I ran an EXPLAIN on this and got the following:
+----+-------------+-----------------+--------+--------------------------------------------------------------------------------------------------+---------+---------+------------------------------+-------+---------------------------------+
| id | select_type | table           | type   | possible_keys                                                                                    | key     | key_len | ref                          | rows  | Extra                           |
+----+-------------+-----------------+--------+--------------------------------------------------------------------------------------------------+---------+---------+------------------------------+-------+---------------------------------+
|  1 | SIMPLE      | content_image   | ALL    | PRIMARY                                                                                          | NULL    | NULL    | NULL                         | 40499 | Using temporary; Using filesort |
|  1 | SIMPLE      | content_content | eq_ref | PRIMARY,issue_id,content_content_issue_id,content_content_section_id,content_content_pub_status | PRIMARY | 4       | content_image.content_ptr_id |     1 | Using where                     |
|  1 | SIMPLE      | content_section | eq_ref | PRIMARY                                                                                          | PRIMARY | 4       | content_content.section_id   |     1 |                                 |
|  1 | SIMPLE      | content_issue   | eq_ref | PRIMARY                                                                                          | PRIMARY | 4       | content_content.issue_id     |     1 |                                 |
+----+-------------+-----------------+--------+--------------------------------------------------------------------------------------------------+---------+---------+------------------------------+-------+---------------------------------+
Now, from what I've read, I need to somehow figure out how to make the query to content_image not be terrible; however, I'm drawing a blank on where to start.
Currently, judging by the execution plan, MySQL is starting with content_image, retrieving all rows, and only thereafter using primary keys on the other tables: content_image has a foreign key to content_content, and content_content has foreign keys to content_issue and content_section. Also, only after all the joins are complete can it make much use of the ORDER BY content_issue.issue_date DESC LIMIT 30, since it can't tell which of these joins might fail and, therefore, how many records from content_issue will really be needed before it can produce the first thirty rows of output.
So, I would try the following (a combined sketch follows the list):
Change JOIN content_issue to JOIN (SELECT * FROM content_issue ORDER BY issue_date DESC LIMIT 30) content_issue. This will allow MySQL, if it starts with content_issue and works its way to the other tables, to grab a very small subset of content_issue.
Note: properly speaking, this changes the semantics of the query: it means that only records from at most the last 30 content_issues will be retrieved, and therefore that if some of those issues don't have published contents with images, then fewer than 30 records will be retrieved. I don't have enough information about your data to gauge whether this change of semantics would actually change the results you get.
Also note: I'm not suggesting to remove the ORDER BY content_issue.issue_date DESC LIMIT 30 from the end of the query. I think you want it in both places.
Add an index on content_issue.issue_date, to optimize the above subquery.
Add an index on content_image.content_ptr_id, so MySQL can work its way from content_content to content_image without doing a full table scan.
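Putting those suggestions together, a sketch (the index names are made up, and the derived-table change carries the semantic caveat noted above):
-- Indexes from suggestions 2 and 3:
CREATE INDEX content_issue_issue_date ON content_issue (issue_date);
CREATE INDEX content_image_content_ptr ON content_image (content_ptr_id);
-- Suggestion 1: pre-limit content_issue to the latest 30 issues:
SELECT ... -- same column list as the original query
FROM `content_image`
INNER JOIN `content_content`
ON `content_image`.`content_ptr_id` = `content_content`.`id`
INNER JOIN ( SELECT * FROM `content_issue`
ORDER BY `issue_date` DESC LIMIT 30 ) `content_issue`
ON `content_content`.`issue_id` = `content_issue`.`id`
INNER JOIN `content_section`
ON `content_content`.`section_id` = `content_section`.`id`
WHERE NOT ( `content_content`.`pub_status` = -1 )
ORDER BY `content_issue`.`issue_date` DESC LIMIT 30;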