Sum of averages raw query - mysql

I have the following code that I have to optimize:
These are the models:
class Question(models.Model):
    question_id = models.CharField(max_length=20)
    label = models.CharField(max_length=255, verbose_name='Question')

class Property(models.Model):
    name = models.CharField(max_length=200)

class Response(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    submit_date = models.DateTimeField()
    score = models.IntegerField(null=True, blank=True)
    is_null = models.BooleanField(default=False)
    ignore = models.BooleanField(default=False)
    property = models.ForeignKey(Property, on_delete=models.CASCADE)

class Plan(models.Model):
    name = models.CharField(max_length=100)
    questions = models.ManyToManyField(Question, through='PlanQuestion')
    start_date = models.DateField(null=True)
    completion_date = models.DateField(null=True)

class PlanQuestion(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    plan = models.ForeignKey(Plan, on_delete=models.CASCADE)
I first iterate over the plans then plan questions like this:
plans = Plan.objects.filter(
    start_date__isnull=False, completion_date__isnull=False
)
for plan in plans:
    plan_questions = plan.questions.through.objects.filter(plan=plan)
    for plan_question in plan_questions:
        # run the below query for each plan_question here
In the code above, this query is run for each plan question to calculate the average score:
SELECT AVG(score) AS average_score
FROM Response WHERE question_id=%(question_id)s
AND DATE(submit_date) >= %(start_date)s AND DATE(submit_date) <= %(end_date)s
The problem is this:
Say Plan1 has 5 questions:
P1 => Avg(Q1) + Avg(Q2) + Avg(Q3) + Avg(Q4) + Avg(Q5)
The query is run once per question, and it averages the scores of all of that question's responses (one question can have many responses). So for P1, 5 queries are run; if one query takes 0.5 seconds, one plan takes 2.5 seconds (5 * 0.5). As the number of plans grows, each with 5 questions, the total time grows with the number of plans times the number of questions per plan, and it quickly becomes too slow.
I want a way to reduce the number of queries so that I don't have to run a separate query per question. How can I combine all the per-question queries into a single query? Maybe I could use a UNION, but I don't see how to write a single query that way, and there may be a better solution than a UNION anyway.
I also tried adding prefetch_related, but it made no difference.
Edit:
Create Tables:
CREATE TABLE `Response` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`question_id` int(11) NOT NULL,
`score` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `Response_25110688` (`question_id`),
CONSTRAINT `question_id_refs_id_2dd82bdb` FOREIGN KEY (`question_id`) REFERENCES `Question` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=157533450 DEFAULT CHARSET=latin1
CREATE TABLE `Question` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`question_id` varchar(20) NOT NULL,
`label` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=353 DEFAULT CHARSET=latin1
CREATE TABLE `Plan` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`start_date` date DEFAULT NULL,
`completion_date` date DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=687 DEFAULT CHARSET=latin1
CREATE TABLE `PlanQuestion` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`plan_id` int(11) NOT NULL,
`question_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `PlanQuestion_plan_id_de8df699_fk_Plan_id` (`plan_id`),
KEY `PlanQuestion_question_id_49c10d5b_fk_Question_id` (`question_id`),
CONSTRAINT `PlanQuestion_plan_id_de8df699_fk_Plan_id` FOREIGN KEY (`plan_id`) REFERENCES `Plan` (`id`),
CONSTRAINT `PlanQuestion_question_id_49c10d5b_fk_Question_id` FOREIGN KEY (`question_id`) REFERENCES `Question` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2130 DEFAULT CHARSET=latin1
CREATE TABLE `Property` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(200) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=188651 DEFAULT CHARSET=latin1
Here is the full query:
SELECT id, COUNT(*) AS count, AVG(int_val) AS int_average
FROM Response WHERE question_id=%(question_id)s
AND property_id=%(property_id)s and is_null=0
AND Response.ignore=0 AND DATE(submit_date) >= %(start_date)s
AND DATE(submit_date) <= %(end_date)s

This does not make a lot of sense:
SELECT id, COUNT(*) AS count, AVG(int_val) AS int_average
FROM Response
WHERE question_id=%(question_id)s
AND DATE(submit_date) >= %(start_date)s
AND DATE(submit_date) <= %(end_date)s
Without a GROUP BY, the COUNT and AVG will be totals for the one "question_id". But id is different for every row, so which id are you hoping to get back?
OK, assuming id is removed, it needs this composite index with the columns in this order:
INDEX(question_id, submit_date)
Meanwhile, remove INDEX(question_id) because it will be in the way.
Sorry, but sometimes performance requires changes.
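A sketch of that change as DDL (the new index name is arbitrary, and the old name is taken from the CREATE TABLE above; if MySQL complains that the old index is needed for the foreign key, add the new index in a separate statement first):
ALTER TABLE Response
    ADD INDEX resp_question_date (question_id, submit_date),
    DROP INDEX Response_25110688;   -- the existing single-column index on question_id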
Secondly... "for plan_question in plan_questions" implies that you want that to be run for every "question"?
Then get rid of the loop and do all the work at the same time:
SELECT question_id, COUNT(*) AS count, AVG(int_val) AS int_average
FROM Response
WHERE DATE(submit_date) >= %(start_date)s
AND DATE(submit_date) <= %(end_date)s
GROUP BY question_id
This will return one row per question; then you can loop through the resultset to deliver the output.
Good news: Even if you don't add the above index, this will work better than what you have now.
Also... cur_date = datetime.now().date() could be removed from the app code; instead, use simply CURDATE() in SQL to get just the date or NOW() to get the date+time.
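For example, if the end of the range is simply "today", the grouped query above can compute that bound itself instead of receiving it from Python (a sketch, reusing the question's parameter style):
SELECT question_id, COUNT(*) AS count, AVG(int_val) AS int_average
FROM Response
WHERE DATE(submit_date) >= %(start_date)s
  AND DATE(submit_date) <= CURDATE()   -- computed by MySQL instead of being passed in from the app
GROUP BY question_id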
Indexing
Getting rid of "for plan_question in plan_questions" will be the biggest benefit. The query (as I wrote it) already benefits from the index on question_id. However, adding INDEX(submit_date) might make it run faster if the date range is narrow.
If there are other clauses in the WHERE, we need to see them. There may be other indexes to suggest.
More
SELECT id, COUNT(*) AS count
FROM response
-- (and not JOINing to any other tables)
GROUP BY id;
This query always has a count of 1 because each id occurs in response exactly once.
SELECT
-- (without id)
COUNT(*) AS count
FROM response
-- (and not JOINing to any other tables)
-- (without GROUP BY)
;
This query always returns exactly 1 row.
Still More
Based on
WHERE question_id=%(question_id)s
AND property_id=%(property_id)s and is_null=0
AND Response.ignore=0 AND DATE(submit_date)...
you need
INDEX(question_id, property_id, is_null, ignore)
and drop INDEX(question_id).
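The same ALTER pattern applies here; note that ignore is a reserved word in MySQL, so it has to be back-ticked inside the index definition (the index name is made up):
ALTER TABLE Response
    ADD INDEX resp_q_prop_flags (question_id, property_id, is_null, `ignore`),
    DROP INDEX Response_25110688;   -- the existing INDEX(question_id)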
But... My statement about doing a single query instead of an app loop still stands.
JOINing to Plan
SELECT r.question_id,
COUNT(*) AS count,
AVG(r.int_val) AS int_average,
p.name -- perhaps you want to say which "plan" is involved?
FROM Plan AS p
JOIN PlanQuestion AS pq ON pq.plan_id = p.id
JOIN Response AS r ON r.question_id = pq.question_id
WHERE p.... -- optionally filter on which plans to include
AND pq.... -- optionally filter on the other columns in pq
AND r.... -- optionally filter on which responses to include
GROUP BY p.id, r.question_id
ORDER BY ... -- optionally sort the results by any column(s) in any table(s)
And remove the two single-column indexes in PlanQuestion, replacing them with two 2-column indexes:
INDEX(plan_id, question_id),
INDEX(question_id, plan_id)
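A sketch of that swap (the index names are arbitrary; if MySQL objects because of the foreign keys, run the ADDs in a separate statement before the DROPs):
ALTER TABLE PlanQuestion
    ADD INDEX pq_plan_question (plan_id, question_id),
    ADD INDEX pq_question_plan (question_id, plan_id),
    DROP INDEX `PlanQuestion_plan_id_de8df699_fk_Plan_id`,
    DROP INDEX `PlanQuestion_question_id_49c10d5b_fk_Question_id`;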
Sargable
DATE(submit_date) >= "..." is "not sargable". This means that an index involving submit_date cannot help with that test. If submit_date is of datatype DATE, this is semantically identical and faster:
submit_date >= "..."
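If submit_date is actually a DATETIME (the Django model declares a DateTimeField), a half-open range keeps the test sargable and still covers the whole end day. A sketch, applied to the grouped query from above:
SELECT question_id, COUNT(*) AS count, AVG(int_val) AS int_average
FROM Response
WHERE submit_date >= %(start_date)s
  AND submit_date <  %(end_date)s + INTERVAL 1 DAY   -- no function around the column, so the index can be used
GROUP BY question_id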

Related

MySQL - how to optimize query with order by

I am trying to generate a list of the 5 most recent history items for a collection of user tasks. If I remove the ORDER BY, the execution time drops from ~2 seconds to < 20 msec.
Indexes are on
h.task_id
h.mod_date
i.task_id
i.user_id
This is the query
SELECT h.*
, i.task_id
, i.user_id
, i.name
, i.completed
FROM h
, i
WHERE i.task_id = h.task_id
AND i.user_id = 42
ORDER
BY h.mod_date DESC
LIMIT 5
Here is the explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE i ref PRIMARY,UserID UserID 4 const 3091 Using temporary; Using filesort
1 SIMPLE h ref TaskID TaskID 4 myDB.i.task_id 7
Here are the show create tables:
CREATE TABLE `h` (
`history_id` int(6) NOT NULL AUTO_INCREMENT,
`history_code` tinyint(4) NOT NULL DEFAULT '0',
`task_id` int(6) NOT NULL,
`mod_date` datetime NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`history_id`),
KEY `TaskID` (`task_id`),
KEY `historyCode` (`history_code`),
KEY `modDate` (`mod_date`)
) ENGINE=InnoDB AUTO_INCREMENT=185647 DEFAULT CHARSET=latin1
and
CREATE TABLE `i` (
`task_id` int(6) NOT NULL AUTO_INCREMENT,
`user_id` int(6) NOT NULL,
`name` varchar(60) NOT NULL,
`due_date` date DEFAULT NULL,
`create_date` date NOT NULL,
`completed` tinyint(1) NOT NULL DEFAULT '0',
`task_description` blob,
PRIMARY KEY (`task_id`),
KEY `name_2` (`name`),
KEY `UserID` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12085 DEFAULT CHARSET=latin1
INDEX(task_id, mod_date, history_id) -- in this order
Will be "covering" and the columns will be in the optimal order
Also, DROP
KEY `TaskID` (`task_id`)
So that the Optimizer won't be tempted to use it.
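In DDL form, that is something like the following (the new index name is arbitrary):
ALTER TABLE h
    ADD INDEX task_date_hist (task_id, mod_date, history_id),
    DROP INDEX TaskID;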
Try changing the index on h.task_id so it's this compound index.
ALTER TABLE h DROP INDEX TaskID, ADD INDEX TaskID (task_id, mod_date DESC);
This may (or may not) allow MySQL to shortcut some or all of the extra work in your ORDER BY ... LIMIT ... request. It's a notorious performance anti-pattern, by the way, but sometimes necessary.
Edit: the index didn't help. So let's try a so-called "deferred join" so we don't have to ORDER BY and then LIMIT all the data from your h table.
Start with this subquery. It retrieves only the primary key values for the rows involved in your results, and will generate just five rows.
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
Why this subquery? It handles the work-intensive ORDER BY ... LIMIT operation while manipulating only the primary keys and the date. It still must sort tons of rows only to discard all but five, but the rows it has to handle are much shorter. Because this subquery does the heavy work, you focus on optimizing it, rather than the whole query.
Keep the index I suggested above, because it covers the subquery for h.
Then, join it to the rest of your query like this. That way you'll only have to retrieve the expensive h.description column for the five rows you care about.
SELECT h.* , i.task_id, i.user_id , i.name, i.completed
FROM h
JOIN i ON i.task_id = h.task_id
JOIN (
SELECT h.history_id, i.task_id
FROM h
JOIN i ON h.task_id = i.task_id
WHERE i.user_id = 42
ORDER BY h.mod_date DESC
LIMIT 5
) selected ON h.history_id = selected.history_id
AND i.task_id = selected.task_id
ORDER BY h.mod_date DESC
LIMIT 5

MySQL query with IN clause loses performance

I have a table to store data from csv files. It is a large table (over 40 million rows). This is its structure:
CREATE TABLE `imported_lines` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`day` date NOT NULL,
`name` varchar(256) NOT NULL,
`origin_id` int(11) NOT NULL,
`time` time(3) NOT NULL,
`main_index` tinyint(4) NOT NULL DEFAULT 0,
`transaction_index` tinyint(4) NOT NULL DEFAULT 0,
`data` varchar(4096) NOT NULL,
`error` bit(1) NOT NULL,
`expressions_applied` bit(1) NOT NULL,
`count_records` smallint(6) NOT NULL DEFAULT 0,
`client_id` tinyint(4) NOT NULL DEFAULT 0,
`receive_date` datetime(3) NOT NULL,
PRIMARY KEY (`id`,`client_id`),
UNIQUE KEY `uq` (`client_id`,`name`,`origin_id`,`receive_date`),
KEY `dh` (`day`,`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY HASH (`client_id`) PARTITIONS 15 */
When I perform a SELECT with a one-day filter, it returns data very quickly (0.4 s). But as I increase the date range it gets slower, until it hits a timeout error.
This is the query:
SELECT origin_id, error, main_index, transaction_index,
expressions_applied, name, day,
COUNT(id) AS total, SUM(count_records) AS sum_records
FROM imported_lines FORCE INDEX (dh)
WHERE client_id = 1
AND day >= '2017-07-02' AND day <= '2017-07-03'
AND name IN ('name1', 'name2', 'name3', ...)
GROUP BY origin_id, error, main_index, transaction_index, expressions_applied, name, day;
I think the IN clause may be hurting performance. I also tried adding the uq index to this query (FORCE INDEX (dh, uq)), which gave a small gain.
I also tried INNER JOIN (SELECT name FROM providers WHERE id = 2) prov ON prov.name = il.name, but that doesn't make the query any quicker either.
EDIT
EXPLAINing the query
id - 1
select_type - SIMPLE
table - imported_lines
type - range
possible_keys - uq, dh
key - dh
key_len - 261
ref - NULL
rows - 297988
extra - Using where; Using temporary; Using filesort
Any suggestions on what I should do?
I have made a few changes: I added a new index with multiple columns (as suggested by @Uueerdo) and rewrote the query as another user suggested (he has since deleted his answer).
I ran a few EXPLAIN PARTITIONS on the queries and tested with SQL_NO_CACHE to guarantee the cache wasn't used; searching data for one whole month now takes 1.8 s.
It's so much faster!
This is what I did:
ALTER TABLE `imported_lines` DROP INDEX dh;
ALTER TABLE `imported_lines` ADD INDEX dhc (`day`, `name`, `client_id`);
Query:
SELECT origin_id, error, main_index, transaction_index,
expressions_applied, name, day,
COUNT(id) AS total, SUM(count_records) AS sum_records
FROM imported_lines il
INNER JOIN (
SELECT id FROM imported_lines
WHERE client_id = 1
AND day >= '2017-07-01' AND day <= '2017-07-31'
AND name IN ('name1', 'name2', 'name3', ...)
) AS il_filter
ON il_filter.id = il.id
WHERE il.client_id = 1
GROUP BY origin_id, error, main_index, transaction_index, expressions_applied, name, day;
I realized that with the INNER JOIN, EXPLAIN PARTITIONS showed the query beginning to use the index. Also, with WHERE il.client_id = 1, the query reduces the number of partitions it has to look up.
Thanks for your help!

select taking 8 seconds. improve ideas

I have this SELECT to get a chat (like the Facebook inbox).
It shows the most recent messages, grouped by the user who sent them.
SELECT c.id, c.from, c.to, c.sent, c.message, c.recd FROM chat c
WHERE c.id IN(
SELECT MAX(id) FROM chat
WHERE (`to` = 1 and `del_to_status` = '0') or (`from` = 1 and `del_from_status` = '0')
GROUP BY CASE WHEN 1 = `to` THEN `from` ELSE `to` END
)
ORDER BY id DESC
limit 60
The problem is it is taking about 8 seconds.
`chat` (
`id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`from` int(11) UNSIGNED NOT NULL,
`to` int(11) UNSIGNED NOT NULL,
`message` text NOT NULL,
`sent` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`recd` tinyint(1) NOT NULL DEFAULT '0',
`del_from_status` tinyint(1) NOT NULL DEFAULT '0',
`del_to_status` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `from` (`from`),
KEY `to` (`to`),
FOREIGN KEY (`from`) REFERENCES cadastro (`id`),
FOREIGN KEY (`to`) REFERENCES cadastro (`id`)
)
Any ideas for indexing or rewriting this SELECT to get better speed?
I am assuming chat.id is indexed. If not, of course you should add an index.
Even if it is indexed, MySQL is often very slow with subselects.
One thing you can do is convert your sub select to a temporary table and join with it.
It will look something like
CREATE TEMPORARY TABLE IF NOT EXISTS max_chat_ids
( INDEX(id) )
ENGINE=MEMORY
AS ( SELECT MAX(id) as id FROM chat
WHERE (`to` = 1 and `del_to_status` = '0') or (`from` = 1 and `del_from_status` = '0')
GROUP BY CASE WHEN 1 = `to` THEN `from` ELSE `to` END );
Then you just need to join with the temp table:
SELECT c.id, c.from, c.to, c.sent, c.message, c.recd FROM chat c
join max_chat_ids d on c.id=d.id
ORDER BY c.id DESC
limit 60
Temp tables only live for the duration of the session, so if you test this in phpMyAdmin, remember to execute both queries together with ';' between them.
If you try this, share your result.
I'll assume the column id is already indexed since it probably is the primary key of the table. If it's not the case, add the index:
create index ix1_chat on chat (id);
Then, if the selectivity of the subquery is good, an index will help. The selectivity is the percentage of rows the SELECT reads compared to the total number of rows. Is it 50%, 5%, 0.5%? If it's 5% or less, the following index will help:
create index ix2_chat on chat (`to`, del_to_status, `from`, del_from_status);
As a side note, please don't use reserved words for column names: I'm talking about the from column. It just makes life difficult for everyone.

Optimize a query

How can I make the response time faster? The average response time is approximately 0.2 s (8039 records in my items table and 81 records in my tracking table).
Query
SELECT a.name, b.cnt FROM `items` a LEFT JOIN
(SELECT guid, COUNT(*) cnt FROM tracking WHERE
date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day ) GROUP BY guid) b ON
a.`id` = b.guid WHERE a.`type` = 'streaming' AND a.`state` = 1
ORDER BY b.cnt DESC LIMIT 15 OFFSET 75
Tracking table structure
CREATE TABLE `tracking` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`ip` int(11) NOT NULL,
`date` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `i1` (`ip`,`guid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=4303 DEFAULT CHARSET=latin1;
Items table structure
CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`guid` int(11) DEFAULT NULL,
`type` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`embed` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`description` text,
`tags` varchar(255) DEFAULT NULL,
`date` int(11) DEFAULT NULL,
`vote_val_total` float DEFAULT '0',
`vote_total` float(11,0) DEFAULT '0',
`rate` float DEFAULT '0',
`icon` text CHARACTER SET ascii,
`state` int(11) DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=9258 DEFAULT CHARSET=latin1;
Your query, as written, doesn't make much sense. It produces all possible combinations of rows in your two tables and then groups them.
You may want this:
SELECT a.*, b.cnt
FROM `items` a
LEFT JOIN (
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE `date` > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day)
GROUP BY guid
) b ON a.guid = b.guid
ORDER BY b.cnt DESC
The high-volume data in this query come from the relatively large tracking table. So, you should add a compound index to it, using the columns (date, guid). This will allow your query to random-access the index by date and then scan it for guid values.
ALTER TABLE tracking ADD INDEX guid_summary (`date`, guid);
I suppose you'll see a nice performance improvement.
Pro tip: Don't use SELECT *. Instead, give a list of the columns you want in your result set. For example,
SELECT a.guid, a.name, a.description, b.cnt
Why is this important?
First, it makes your software more resilient against somebody adding columns to your tables in the future.
Second, it tells the MySQL server to sling around only the information you want. That can improve performance really dramatically, especially when your tables get big.
Since tracking has significantly fewer rows than items, I will propose the following.
SELECT i.name, c.cnt
FROM
(
SELECT guid, COUNT(*) cnt
FROM tracking
WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 day )
GROUP BY guid
) AS c
JOIN items AS i ON i.id = c.guid
WHERE i.type = 'streaming'
AND i.state = 1
ORDER BY c.cnt DESC
LIMIT 15 OFFSET 75
It will fail to display any items for which cnt is 0. (Your version displays the items with NULL for the count.)
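If you do want the zero-count items as well, here is a sketch of the same idea with the LEFT JOIN kept and the count defaulted to 0:
SELECT i.name, COALESCE(c.cnt, 0) AS cnt
FROM items AS i
LEFT JOIN (
    SELECT guid, COUNT(*) AS cnt
    FROM tracking
    WHERE date > UNIX_TIMESTAMP(NOW() - INTERVAL 1 DAY)
    GROUP BY guid
) AS c ON c.guid = i.id
WHERE i.type = 'streaming'
  AND i.state = 1
ORDER BY cnt DESC
LIMIT 15 OFFSET 75;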
Composite indexes needed:
items: The PRIMARY KEY(id) is sufficient.
tracking: INDEX(date, guid) -- "covering"
Other issues:
If ip is an IP-address, it needs to be INT UNSIGNED. But that covers only IPv4, not IPv6.
It seems like date is not just a "date", but really a date+time. Please rename it to avoid confusion.
float(11,0) -- Don't use FLOAT for integers. Don't use (m,n) on FLOAT or DOUBLE. INT UNSIGNED makes more sense here (a DDL sketch follows after these notes).
OFFSET is naughty when it comes to performance -- it must scan over the skipped records. But, in your query, there is no way to avoid collecting all the possible rows, sorting them, stepping over 75, and only finally delivering 15 rows. (And, with no more than 81, it won't be a full 15.)
What version are you using? There have been important changes to the Optimization of LEFT JOIN ( SELECT ... ). Please provide EXPLAIN SELECT for each query under discussion.
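Returning to the datatype points above, a sketch of the corresponding changes (check that the existing values fit before altering a production table):
ALTER TABLE tracking MODIFY ip INT UNSIGNED NOT NULL;         -- IPv4 addresses fit in INT UNSIGNED
ALTER TABLE items MODIFY vote_total INT UNSIGNED DEFAULT 0;   -- an integer count does not need FLOAT(11,0)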

Optimize table to reduce index size

I have this schema, which saves chat messages. Currently I have about 100k rows, which is about 5.5 MB of data. The index size is 6.5 MB. When the data size was ~4 MB the index size was ~3 MB, so is it growing exponentially?
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`author` int(11) unsigned DEFAULT NULL,
`time` int(10) unsigned DEFAULT NULL,
`text` text,
`dest` int(11) unsigned DEFAULT NULL,
`type` tinyint(4) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `history` (`author`,`dest`,`id`) USING BTREE,
KEY `messages_ibfk_1` (`dest`),
FULLTEXT KEY `msg` (`text`),
CONSTRAINT `au` FOREIGN KEY (`author`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `messages_ibfk_1` FOREIGN KEY (`dest`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=105895 DEFAULT CHARSET=utf8;
The main query that I run against this table, and that I've tried to optimize for, is the one that shows paginated history for a chat between 2 people:
SELECT id, time, text, dest, type, author
FROM `messages`
WHERE (
(author = ? AND dest = ?) OR (author = ? AND dest = ?)
) AND id <= ? ORDER BY id DESC LIMIT ?, 25
The other queries for history are identical except they have additional filters for a search term or date range.
Is there anything that can be done to reduce index size and maintain optimal performance?
Don't worry about the growth of the indexes. It is probably a fluke; certainly not "exponential".
Assuming the main issue is performance of
SELECT id, time, text, dest, type, author
FROM `messages`
WHERE (
(author = ? AND dest = ?) OR (author = ? AND dest = ?)
) AND id <= ? ORDER BY id DESC LIMIT ?, 25
I see three techniques that will help significantly: Change OR to UNION, deal with LIMIT in UNION, and don't use OFFSET for pagination.
( SELECT id, time, text, dest, type, author
FROM `messages`
WHERE author = ? -- one author & dest
AND dest = ?
AND id < ? -- where you "left off"
ORDER BY id DESC
LIMIT 25
) UNION ALL
( SELECT id, time, text, dest, type, author
FROM `messages`
WHERE author = ? -- the other author & dest
AND dest = ?
AND id < ? -- same as above
ORDER BY id DESC
LIMIT 25
)
ORDER BY id DESC
LIMIT 25; -- get the desired 25 from the 50 above
Pagination discussion explains why the OFFSET should be removed. It discusses other techniques, including using 26 (in all three places) instead of 25 so that you know if this is the 'last' page.
On the first iteration, AND id < ? could be left off. Or (simpler), you could substitute a very large number.
Your index (author, dest, id) is optimal for my formulation.
This complex formulation will shine as messages gets bigger and/or the user pages farther through the list.