MySQL: HAVING ... > AVG() doesn't work as expected

I've created two views: one to calculate each user's diary count (user_diary_number), and one to select the users whose diary count is greater than the average diary count across all users.
The two views are as follows:
create view user_diary_number as
(
select user_id,count( distinct diary_id ) as diary_num
from user_diary
group by user_id
);
And the second, using HAVING and AVG():
create view hw_diary as
(
select u.user_id, u.realname, ud.diary_num, school.school_name
from (user as u cross join user_diary_number as ud on u.user_id = ud.user_id )cross join school on u.school_id = school.school_id
having diary_num > avg(diary_num)
);
The problem is that the second view returns only one row, yet there are clearly several users whose diary count is above the average. In total I have 251 diaries and 103 users, and some users have 9, 4, or 5 diaries.
But the result contains only one user, who has 3 diaries.
My related tables are:
CREATE TABLE IF NOT EXISTS `school` (
`school_id` int(11) NOT NULL,
`school_name` varchar(45) NOT NULL,
`location` varchar(45) NOT NULL,
`master` varchar(45) NOT NULL,
`numbers_of_student` int(11) NOT NULL,
PRIMARY KEY (`school_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `user_diary` (
`diary_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`title` varchar(45) NOT NULL,
`content` varchar(255) NOT NULL,
`addtime` DATETIME NOT NULL,
PRIMARY KEY (`diary_id`,`user_id`),
KEY `fk_diary_user_id_idx` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
Is the problem with the CROSS JOIN, or something else?
Thanks a lot!

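You can't use AVG() that way. Without a GROUP BY, an aggregate function in HAVING collapses the whole result into a single group, so the query returns at most one row, with the non-aggregated columns taken from an arbitrary row (with ONLY_FULL_GROUP_BY enabled, MySQL rejects such a query instead). In my personal movie database,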
select * from movie having year > avg(year);
produces nothing, and
select * from movie having year > (select avg(year) from movie);
produces the expected result.

You must calculate the average in a separate subquery.
Something like:
select ...
from ...
group by ...
having diary_num > (
select avg(diary_num)
from ...)
You can fill in the blanks with whatever makes sense for your schema.

Something like this should return the resultset you are looking for:
SELECT u.user_id
, u.realname
, c.diary_num
, s.school_name
-- , a.diary_avg
FROM ( SELECT d.user_id
, COUNT(DISTINCT d.diary_id) AS diary_num
FROM user_diary d
GROUP BY d.user_id
) c
JOIN user u
ON u.user_id = c.user_id
JOIN school s
ON s.school_id = u.school_id
JOIN ( SELECT AVG(v.diary_num) AS diary_avg
FROM ( SELECT t.user_id
, COUNT(DISTINCT t.diary_id) AS diary_num
FROM user_diary t
GROUP BY t.user_id
) v
) a
ON a.diary_avg < c.diary_num
ORDER BY 1
The inline view aliased as c gets us the diary_num (count) for each user.
The inline view aliased as a gets us the average of all the diary_num for all users. That is getting us an "average" of the counts, which is what it looks like your original query was intending to do.
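To try the idea outside MySQL, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (SQLite's dialect differs from MySQL in places, and the rows are made-up sample data). It computes per-user counts in an inline view, joins them to the average of those counts, and also shows that the same inline view without GROUP BY user_id collapses to a single row.

```python
import sqlite3

# Hypothetical sample data: (diary_id, user_id) pairs; users 1, 2, 3 have 3, 1, 2 diaries.
SAMPLE = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3)]

PER_USER = """SELECT user_id, COUNT(DISTINCT diary_id) AS diary_num
                FROM user_diary
               GROUP BY user_id"""  # GROUP BY yields one row per user

def load(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE user_diary (diary_id INTEGER, user_id INTEGER)")
    con.executemany("INSERT INTO user_diary VALUES (?, ?)", rows)
    return con

def users_above_average(rows):
    con = load(rows)
    sql = f"""SELECT c.user_id, c.diary_num
                FROM ({PER_USER}) c
                JOIN (SELECT AVG(v.diary_num) AS diary_avg FROM ({PER_USER}) v) a
                  ON a.diary_avg < c.diary_num
               ORDER BY 1"""
    result = con.execute(sql).fetchall()
    con.close()
    return result

def ungrouped_row_count(rows):
    # Same inline view without GROUP BY: COUNT() folds every row into one group.
    con = load(rows)
    n = len(con.execute("SELECT user_id, COUNT(DISTINCT diary_id) FROM user_diary").fetchall())
    con.close()
    return n

print(users_above_average(SAMPLE))  # [(1, 3)]
print(ungrouped_row_count(SAMPLE))  # 1
```

With counts of 3, 1, and 2 the average is 2.0, so only user 1 qualifies.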
As an alternative, we could get the "average" number of diaries per user as ... the total count of all diaries divided by the total count of all users. To do that, replace that inline view aliased as a with something like this:
( SELECT COUNT(DISTINCT t.diary_id)
/ NULLIF(COUNT(DISTINCT v.user_id),0) AS diary_avg
FROM user v
LEFT
JOIN user_diary t
ON t.user_id = v.user_id
) a
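This yields slightly different results, because it is a ratio of total counts rather than an average of per-user counts: thanks to the LEFT JOIN, users with no diaries are included in the denominator, whereas per-user counts taken from user_diary only cover users who have at least one diary.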
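A small sketch of that difference, again using Python's sqlite3 module as a stand-in for MySQL with hypothetical rows in which user 4 has no diaries:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY);
    CREATE TABLE user_diary (diary_id INTEGER, user_id INTEGER);
    INSERT INTO user VALUES (1), (2), (3), (4);   -- hypothetical; user 4 has no diaries
    INSERT INTO user_diary VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3);
""")

# Average of the per-user counts: only users with at least one diary take part.
avg_of_counts = con.execute("""
    SELECT AVG(v.diary_num)
      FROM (SELECT COUNT(DISTINCT diary_id) AS diary_num
              FROM user_diary GROUP BY user_id) v
""").fetchone()[0]

# Total diaries / total users: the LEFT JOIN keeps user 4 in the denominator.
# (* 1.0 forces real division; SQLite's "/" truncates two integers, MySQL's does not.)
ratio_of_totals = con.execute("""
    SELECT COUNT(DISTINCT t.diary_id) * 1.0 / NULLIF(COUNT(DISTINCT v.user_id), 0)
      FROM user v LEFT JOIN user_diary t ON t.user_id = v.user_id
""").fetchone()[0]

print(avg_of_counts)    # 2.0
print(ratio_of_totals)  # 1.5
```

Six diaries over three users with diaries gives 2.0; the same six diaries over all four users gives 1.5.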
NOTE
The CROSS keyword has no influence on the MySQL optimizer. In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents, so the CROSS JOIN ... ON in your query is not what causes the problem.
We do typically include the CROSS keyword as documentation for future reviewers. It indicates that we have purposefully omitted the usual ON clause. (As reviewers, when we see a JOIN without an ON clause, our minds race to a "possible unintended Cartesian product"; the author's inclusion of the CROSS keyword alerts us that the omission of the ON clause was purposeful.)
But the MySQL optimizer doesn't care one whit whether the CROSS keyword is included or omitted.
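One more question: does MySQL support a view whose SELECT contains a subquery in the FROM clause ("View's SELECT contains a subquery in the FROM clause")?
A: Very old versions of MySQL (before 4.1) did not support subqueries at all, and MySQL 5.1 and later certainly do. Views were a special case, though: before MySQL 5.7.7, a view's SELECT could not contain a subquery in the FROM clause, which is what that error message reports. On those versions, create each inline view as a view of its own (as you already did with user_diary_number) and reference it from the outer view.
In an ordinary query, yes, a SELECT statement can be used as an inline view, that is, as a rowsource for another query, e.g.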
SELECT v.*
FROM (
SELECT 1 AS foo
) v

Related

Trying to join 2 tables based on the most recent timestamp *before* a specific date

Here is my SQL statement, which seemed to function perfectly well before we created a new database. This approach seems to work just fine on another, similarly structured, pair of tables.
SELECT *
FROM tele2_details AS d
INNER JOIN
tele2_usage AS u
ON
d.iccid = u.iccid
AND
u.timestamp = (
SELECT MAX(u.timestamp)
FROM tele2_usage
WHERE u.timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE
accountCustom1='Horizon'
What I'm attempting to do here is join the details table with the usage table; each usage row contains just the iccid of a SIM card, a timestamp, and its current usage in bytes. The query should find the most recent usage record before the specified date (2022-01-08 09:30:00). This should give me a set of SIMs, each joined with its most recent usage record before the specified time; however, I usually get zero results on this particular pair of tables.
Specifically, it does match records whose timestamp is exactly the one specified in the query, but not records from before that time. Can anybody help me see where I'm going wrong? This query worked fine in a previous database; we're rebuilding our systems, and this has now appeared as an issue.
Thanks in advance for any help.
Edit
Here's some more information that I hope makes the question clearer. Below is an outline of the details table; I've removed some of the columns, but it is at least illustrative of the table.
CREATE TABLE `tele2_details` (
`iccid` VARCHAR(255) NOT NULL,
`msisdn` VARCHAR(255) NOT NULL,
`status` VARCHAR(255) NOT NULL,
`ratePlan` VARCHAR(255) NOT NULL,
`communicationPlan` VARCHAR(255) NOT NULL,
PRIMARY KEY (`iccid`)
)
COLLATE='utf8mb4_0900_ai_ci'
ENGINE=InnoDB
;
Then we also have a usage table, which stores samples of the SIM cards' data usage, along with timestamps:
CREATE TABLE `tele2_usage` (
`id` INT NOT NULL AUTO_INCREMENT,
`iccid` VARCHAR(255) NOT NULL COLLATE 'latin1_swedish_ci',
`msisdn` VARCHAR(255) NOT NULL COLLATE 'latin1_swedish_ci',
`timestamp` DATETIME NOT NULL,
`ctd_data_usage` BIGINT NOT NULL,
`ctd_sms_usage` BIGINT NOT NULL,
`ctd_voice_usage` BIGINT NOT NULL,
`session_count` INT NOT NULL,
PRIMARY KEY (`id`)
)
COLLATE='utf8mb4_hr_0900_ai_ci'
ENGINE=InnoDB
AUTO_INCREMENT=10116319
;
The query I'm trying to create should return a set of sim details, joined with a usage record which is closest to, but not after, a particular time.
So if you look at the original query at the top, I'm trying to join the details onto the usage record that is closest to, but not after, 2022-01-08 09:30:00.
I hope that makes sense.
Let's have a look at a particular sim
SELECT iccid, msisdn, status, ratePlan, communicationPlan FROM tele2_details WHERE iccid='xxxx203605100034xxxx'
results in 1 match
"xxxx203605100034xxxx" "xxxx9120012xxxx" "ACTIVATED" "Pay as use - Existing Business" "Data LTE"
And if I look in the usage table for that same sim I can see many records that should satisfy my conditions
SELECT id, iccid, TIMESTAMP, ctd_data_usage FROM tele2_usage WHERE iccid='xxxx203605100034xxxx' AND TIMESTAMP <= '2022-01-08 09:30:00'
results in
"10096279" "xxxx203605100034xxxx" "2022-01-08 09:01:00" "77517560"
"10092271" "xxxx203605100034xxxx" "2022-01-08 08:01:03" "77002733"
"10088263" "xxxx203605100034xxxx" "2022-01-08 07:01:11" "76270445"
"10084255" "xxxx203605100034xxxx" "2022-01-08 06:01:05" "76270445"
of which I would like to select the first record (with the 09:01 timestamp) to join onto the details record. I can get that timestamp with the following query:
SELECT MAX(timestamp)
FROM tele2_usage
WHERE TIMESTAMP <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s') AND iccid='xxxx203605100034xxxx'
which results in '2022-01-08 09:01:00', which is exactly what I want. So now I put it all together...
SELECT *
FROM tele2_details AS d
INNER JOIN
tele2_usage AS u
ON
d.iccid = u.iccid
AND
u.timestamp = (
SELECT MAX(timestamp)
FROM tele2_usage
WHERE timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE
accountCustom1='Horizon' AND iccid='xxxx203605100034xxxx'
And I get nothing! I would expect to get back the details record joined with that particular SIM's usage record, but I actually get zero results, and I don't understand why.
Ideally I would remove the final AND for the iccid and expect to receive a set of all the SIMs for that client, each with the usage record closest to, but not after, the specified date.
So with that explanation, does anyone know why I'm not getting any records? I have a similar pair of tables for another SIM provider, structured exactly the same way, with a details table and a usage table, and this query works just fine there. I simply can't understand why it doesn't work here.
Edit 2
Serg suggested aliasing the table in the subquery (I think that's what it's called), which resulted in the following code:
SELECT *
FROM tele2_details AS d
INNER JOIN
tele2_usage AS u
ON
d.iccid = u.iccid
AND
u.timestamp = (
SELECT MAX(u2.timestamp)
FROM tele2_usage AS u2
WHERE u2.timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE
accountCustom1='Horizon'
Unfortunately this still resulted in zero results.
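The subquery is not correlated with the outer query, so it returns a single value: the latest timestamp at or before the cutoff across all SIMs in tele2_usage. Only usage rows whose timestamp equals that one global maximum can match, which is why you get nothing (or only exact-time matches). You should correlate the subquery on iccid so that the maximum is computed per SIM: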
SELECT *
FROM tele2_details AS d INNER JOIN tele2_usage AS u
ON d.iccid = u.iccid
AND u.timestamp = (
SELECT MAX(u2.timestamp)
FROM tele2_usage u2
WHERE d.iccid = u2.iccid AND u2.timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE d.accountCustom1='Horizon';
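A minimal sketch of the difference, using Python's sqlite3 module as a stand-in for MySQL. The iccid values and rows are hypothetical, and timestamps are stored as ISO-8601 text, which compares correctly as strings in SQLite (MySQL compares DATETIME values natively).

```python
import sqlite3

CUTOFF = "2022-01-08 09:30:00"

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tele2_details (iccid TEXT PRIMARY KEY, accountCustom1 TEXT);
    CREATE TABLE tele2_usage (id INTEGER PRIMARY KEY, iccid TEXT, timestamp TEXT);
    -- Hypothetical rows: SIM-B belongs to another account.
    INSERT INTO tele2_details VALUES ('SIM-A', 'Horizon'), ('SIM-B', 'Other');
    INSERT INTO tele2_usage (iccid, timestamp) VALUES
        ('SIM-A', '2022-01-08 08:01:03'),
        ('SIM-A', '2022-01-08 09:01:00'),
        ('SIM-A', '2022-01-08 10:01:00'),   -- after the cutoff
        ('SIM-B', '2022-01-08 09:15:00');   -- latest pre-cutoff row in the whole table
""")

# Uncorrelated: MAX() is computed once over every SIM, so it yields SIM-B's 09:15:00.
uncorrelated = con.execute("""
    SELECT d.iccid, u.timestamp
      FROM tele2_details d JOIN tele2_usage u
        ON d.iccid = u.iccid
       AND u.timestamp = (SELECT MAX(u2.timestamp) FROM tele2_usage u2
                           WHERE u2.timestamp <= ?)
     WHERE d.accountCustom1 = 'Horizon'
""", (CUTOFF,)).fetchall()

# Correlated on iccid: MAX() is recomputed for each SIM being joined.
correlated = con.execute("""
    SELECT d.iccid, u.timestamp
      FROM tele2_details d JOIN tele2_usage u
        ON d.iccid = u.iccid
       AND u.timestamp = (SELECT MAX(u2.timestamp) FROM tele2_usage u2
                           WHERE u2.iccid = d.iccid AND u2.timestamp <= ?)
     WHERE d.accountCustom1 = 'Horizon'
""", (CUTOFF,)).fetchall()

print(uncorrelated)  # []
print(correlated)    # [('SIM-A', '2022-01-08 09:01:00')]
```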
Or, with a join to an aggregation query:
SELECT *
FROM tele2_details AS d
INNER JOIN tele2_usage AS u ON d.iccid = u.iccid
INNER JOIN (
SELECT iccid, MAX(timestamp) timestamp
FROM tele2_usage
WHERE timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
GROUP BY iccid
) m ON m.iccid = u.iccid AND m.timestamp = u.timestamp
WHERE d.accountCustom1='Horizon';
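The aggregation-join variant can be sketched the same way, with Python's sqlite3 standing in for MySQL and hypothetical rows. Here the derived-table column is aliased latest_ts instead of timestamp only to keep it visibly distinct from the table column; the logic is otherwise that of the query above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tele2_details (iccid TEXT PRIMARY KEY, accountCustom1 TEXT);
    CREATE TABLE tele2_usage (id INTEGER PRIMARY KEY, iccid TEXT, timestamp TEXT,
                              ctd_data_usage INTEGER);
    -- Hypothetical rows for two SIMs on the same account.
    INSERT INTO tele2_details VALUES ('SIM-A', 'Horizon'), ('SIM-B', 'Horizon');
    INSERT INTO tele2_usage (iccid, timestamp, ctd_data_usage) VALUES
        ('SIM-A', '2022-01-08 08:01:03', 1),
        ('SIM-A', '2022-01-08 09:01:00', 2),
        ('SIM-A', '2022-01-08 10:01:00', 3),   -- after the cutoff, ignored
        ('SIM-B', '2022-01-08 07:00:00', 10),
        ('SIM-B', '2022-01-08 09:15:00', 20);
""")

# The derived table m holds one (iccid, latest pre-cutoff timestamp) pair per SIM;
# joining on both columns picks exactly that usage row for each SIM.
rows = con.execute("""
    SELECT d.iccid, u.timestamp, u.ctd_data_usage
      FROM tele2_details d
      JOIN tele2_usage u ON d.iccid = u.iccid
      JOIN (SELECT iccid, MAX(timestamp) AS latest_ts
              FROM tele2_usage
             WHERE timestamp <= ?
             GROUP BY iccid) m ON m.iccid = u.iccid AND m.latest_ts = u.timestamp
     WHERE d.accountCustom1 = 'Horizon'
     ORDER BY d.iccid
""", ("2022-01-08 09:30:00",)).fetchall()

print(rows)  # [('SIM-A', '2022-01-08 09:01:00', 2), ('SIM-B', '2022-01-08 09:15:00', 20)]
```

Unlike the correlated version, the grouped subquery is evaluated once for all SIMs, which can be cheaper when many SIMs are returned.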

MySQL - add clauses to left join

I have a table called properties (p) and another table called certificates (c). There can be more than one certificate allocated against each property or no certificate at all. I need to produce a query that uses a join and only displays one certificate from the certificates table per property. The one certificate that is shown needs to be the one with the most recent expiry date. There is a field in the certificates table named 'certificate_expiry_date'. The simple join would be p.property_id = c.certificate_property but this currently outputs all certificates.
My Query Attempt
Here's my query so far:
SELECT DISTINCT t.tenancy_property, t.*, p.*, c.* FROM tenancy t
INNER JOIN property p
on t.tenancy_property = p.property_id
LEFT JOIN
(
SELECT *
FROM certificate
WHERE certificate_expiry_date > CURDATE()
ORDER BY certificate_expiry_date DESC
LIMIT 1
) c ON p.property_id = c.certificate_property
WHERE t.tenancy_type='1' AND p.property_mains_gas_supply='1' AND p.property_availability='2' ORDER BY t.tenancy_id DESC LIMIT {$startpoint} , {$per_page}
This query executes fine but doesn't seem to take into account the left join on the certificates table.
Table structure for table certificate
CREATE TABLE IF NOT EXISTS `certificate` (
`certificate_id` int(11) NOT NULL AUTO_INCREMENT,
`certificate_property` int(11) DEFAULT NULL,
`certificate_type` tinyint(4) DEFAULT NULL,
`certificate_reference` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`certificate_start_date` date DEFAULT NULL,
`certificate_expiry_date` date DEFAULT NULL,
`certificate_notes` text COLLATE utf8_bin,
`certificate_renewal_instructed` tinyint(4) DEFAULT NULL,
`certificate_renewal_contractor` int(11) DEFAULT NULL,
PRIMARY KEY (`certificate_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=219 ;
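The derived table in your query returns a single certificate in total, because the LIMIT 1 applies to the whole certificate table rather than to each property; at most one property can match it, and every other property gets NULLs from the LEFT JOIN.
If we only need to return one or two columns from the certificate table, we can instead use correlated subqueries in the SELECT list, each evaluated once per property.
This approach has some performance implications for large tables; but for some use cases, with appropriate indexes available (for example, on (certificate_property, certificate_expiry_date)), this can be a workable approach.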
SELECT p.property_id
, p.property_availability
, ( SELECT c.certificate_id
FROM certificate c
WHERE c.certificate_property = p.property_id
ORDER BY c.certificate_expiry_date DESC, c.certificate_id DESC
LIMIT 1
) AS most_recent_cert_id
, ( SELECT c.certificate_expiry_date
FROM certificate c
WHERE c.certificate_property = p.property_id
ORDER BY c.certificate_expiry_date DESC, c.certificate_id DESC
LIMIT 1
) AS most_recent_cert_expiry_date
FROM property p
WHERE ...
ORDER BY ...
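A minimal sketch contrasting the two shapes, with Python's sqlite3 as a stand-in for MySQL. The rows are hypothetical (property 3 has no certificates), and the question's CURDATE() filter is left out so that the output does not depend on today's date.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE property (property_id INTEGER PRIMARY KEY);
    CREATE TABLE certificate (
        certificate_id INTEGER PRIMARY KEY,
        certificate_property INTEGER,
        certificate_expiry_date TEXT
    );
    -- Hypothetical rows; property 3 has no certificates.
    INSERT INTO property VALUES (1), (2), (3);
    INSERT INTO certificate VALUES
        (10, 1, '2030-01-01'),
        (11, 1, '2031-06-30'),   -- most recent expiry for property 1
        (12, 2, '2029-03-15');
""")

# Shape from the question: LIMIT 1 applies to the whole certificate table,
# so only one property can ever find a match.
limit_one = con.execute("""
    SELECT p.property_id, c.certificate_id
      FROM property p
      LEFT JOIN (SELECT * FROM certificate
                  ORDER BY certificate_expiry_date DESC LIMIT 1) c
        ON p.property_id = c.certificate_property
     ORDER BY p.property_id
""").fetchall()

# Correlated subquery: evaluated per property, NULL when nothing matches.
per_property = con.execute("""
    SELECT p.property_id,
           (SELECT c.certificate_id FROM certificate c
             WHERE c.certificate_property = p.property_id
             ORDER BY c.certificate_expiry_date DESC, c.certificate_id DESC
             LIMIT 1) AS most_recent_cert_id
      FROM property p
     ORDER BY p.property_id
""").fetchall()

print(limit_one)     # [(1, 11), (2, None), (3, None)]
print(per_property)  # [(1, 11), (2, 12), (3, None)]
```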
Updated answer with your updated information
Something like this?
(Note: for a property with no matching certificate, the scalar subquery qMostRecentExpire simply returns NULL; it does not fail.)
select
p.property_id
, p.*
, ( select
c.certificate_id
from
certificate as c
where
c.certificate_property = p.property_id -- all the cert of this property
and c.certificate_expiry_date < CURDATE() -- cert has expired
order by c.certificate_expiry_date desc
limit 1 -- most recent one
) as qMostRecentExpire
from
property as p
Updated answer after learning that some properties may have no certificates
To drop properties that have no certificate at all, filter with EXISTS; an inner join to the certificate table would instead repeat each property once per certificate it has:
select
p.property_id
, p.*
, ( select
c.certificate_id
from
certificate as c
where
c.certificate_property = p.property_id -- all the certs of this property
and c.certificate_expiry_date < CURDATE() -- cert has expired
order by c.certificate_expiry_date desc
limit 1 -- most recent one
) as qMostRecentExpire
from
property as p
where exists -- properties that have no cert are dropped
( select 1
from certificate as c2
where c2.certificate_property = p.property_id )

Get response time per day of week

I have 3 tables:
CREATE TABLE `ticket` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`sid` varchar(50) NOT NULL,
`open_date` datetime NOT NULL,
PRIMARY KEY (`tid`),
KEY `sid` (`sid`,`open_date`),
KEY `open_date` (`open_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `ticket_reply` (
`rid` int(11) NOT NULL AUTO_INCREMENT,
`tid` int(11) NOT NULL,
`reply_date` datetime NOT NULL,
PRIMARY KEY (`rid`),
KEY `tid` (`tid`,`reply_date`),
KEY `reply_date` (`reply_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `subscription` (
`sid` varchar(50) NOT NULL,
`response_time` int(11) NOT NULL DEFAULT '24',
PRIMARY KEY (`sid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I'm trying to get the sum of the response times (the time from when each ticket was opened until its first reply) and group it by DAYNAME (and maybe by MONTH as well). Currently I have this SQL:
SELECT
t.tid,
DAYNAME(t.open_date) AS day_opened,
SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min,
SUM(s.response_time * 60) AS response_time_min
FROM ticket t
INNER JOIN ticket_reply tr ON tr.tid = t.tid
INNER JOIN subscription s ON s.sid = t.sid
GROUP BY
t.tid #group by tid as ticket_reply may return many
ORDER BY t.open_date DESC;
So the first challenge was getting only the first ticket_reply row, which I solved with GROUP BY; I tried a subquery in the join, but it still returned a row per ticket_reply row.
So now I want to start grouping by DAYNAME and maybe MONTH but if I add it to the GROUP BY it doesn't group:
GROUP BY
t.tid,
DAYNAME(t.open_date)
I have tried putting DAYNAME before tid, but that didn't make any difference.
So I have a couple of questions: is there a better way to get the first row in ticket_reply and then group by DAYNAME? I have a feeling that getting the first row in a subquery may fix the grouping.
It is grouping, but because you have t.tid in the GROUP BY clause, and the tid column is unique in the ticket table, multiple rows from ticket are not getting collapsed, each is going to be on its own row.
It's not exactly clear what result you want to return.
(The SUM(s.response_time) expression in the SELECT list seems a bit odd, given that you could be matching multiple rows from ticket_reply.)
Given your existing statement, it looks like you might want to use an inline view to return the "earliest" reply_date for each ticket, in place of the reference to the ticket_reply table.
JOIN /*ticket_reply*/
( SELECT r.tid
, MIN(r.reply_date) AS reply_date
FROM ticket_reply r
GROUP BY r.tid
) tr ON tr.tid = t.tid
(Unfortunately, materializing the inline view (populating and accessing an intermediate "derived table") can be the source of a performance issue.)
As another option, rather than performing a JOIN operation, you could consider using a correlated subquery in the SELECT list. That is, in place of the reference to tr.reply_date, you could do something like:
(SELECT MIN(r.reply_date) FROM ticket_reply r WHERE r.tid = t.tid)
and remove the JOIN to the ticket_reply table.
But repeated execution of that subquery (once for each row returned) can also be a performance issue for large sets.
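A minimal sketch comparing the two ways of finding each ticket's first reply, using Python's sqlite3 as a stand-in for MySQL with hypothetical rows. One behavioural difference shows up: the inner join to the inline view drops tickets with no replies, while the correlated subquery keeps them with NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ticket (tid INTEGER PRIMARY KEY, open_date TEXT);
    CREATE TABLE ticket_reply (rid INTEGER PRIMARY KEY, tid INTEGER, reply_date TEXT);
    -- Hypothetical rows; ticket 3 has no replies yet.
    INSERT INTO ticket VALUES
        (1, '2024-01-01 09:00:00'),
        (2, '2024-01-02 10:00:00'),
        (3, '2024-01-03 11:00:00');
    INSERT INTO ticket_reply (tid, reply_date) VALUES
        (1, '2024-01-01 11:00:00'),
        (1, '2024-01-01 09:30:00'),   -- earliest reply to ticket 1
        (2, '2024-01-02 12:00:00');
""")

# Inline view: one pre-aggregated row per ticket, then an inner join.
inline_view = con.execute("""
    SELECT t.tid, tr.reply_date
      FROM ticket t
      JOIN (SELECT r.tid, MIN(r.reply_date) AS reply_date
              FROM ticket_reply r GROUP BY r.tid) tr ON tr.tid = t.tid
     ORDER BY t.tid
""").fetchall()

# Correlated subquery: evaluated for each ticket row; NULL when there is no reply.
correlated = con.execute("""
    SELECT t.tid,
           (SELECT MIN(r.reply_date) FROM ticket_reply r WHERE r.tid = t.tid) AS reply_date
      FROM ticket t
     ORDER BY t.tid
""").fetchall()

print(inline_view)  # [(1, '2024-01-01 09:30:00'), (2, '2024-01-02 12:00:00')]
print(correlated)   # [(1, '2024-01-01 09:30:00'), (2, '2024-01-02 12:00:00'), (3, None)]
```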
But the "big" question is whether you need to add up s.response_time for each occurrence of a matching ticket_reply (as your current query is doing), or whether you just need to include the s.response_time once for each ticket?
That is, if there are three ticket_reply for a given ticket, do we need to "triple" the value of response_time that we add to the row?
If you need to include the response_time in the total for each ticket_reply, then use this:
SELECT DAYNAME(t.open_date) AS day_opened
, SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min
, SUM(s.response_time * 60 * tr.cnt_replies) AS response_time_min
FROM ticket t
JOIN ( SELECT r.tid
, MIN(r.reply_date) AS reply_date
, COUNT(1) AS cnt_replies
FROM ticket_reply r
GROUP BY r.tid
) tr
ON tr.tid = t.tid
JOIN subscription s
ON s.sid = t.sid
GROUP
BY day_opened
If you only need to include the response_time in the total once for each ticket, remove the references to cnt_replies:
SELECT DAYNAME(t.open_date) AS day_opened
, SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min
, SUM(s.response_time) * 60 AS response_time_min
FROM ticket t
JOIN ( SELECT r.tid
, MIN(r.reply_date) AS reply_date
FROM ticket_reply r
GROUP BY r.tid
) tr
ON tr.tid = t.tid
JOIN subscription s
ON s.sid = t.sid
GROUP
BY day_opened
To GROUP BY month, just change the first expression in the SELECT list, and the reference in the GROUP BY clause.
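Here is a minimal sketch of the "once per ticket" version, run with Python's sqlite3 as a stand-in for MySQL on hypothetical rows. SQLite has no DAYNAME() or TIMESTAMPDIFF(), so this sketch substitutes strftime('%w') (weekday number, '0' = Sunday) and a julianday() difference multiplied by 1440 to get minutes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ticket (tid INTEGER PRIMARY KEY, sid TEXT, open_date TEXT);
    CREATE TABLE ticket_reply (rid INTEGER PRIMARY KEY, tid INTEGER, reply_date TEXT);
    CREATE TABLE subscription (sid TEXT PRIMARY KEY, response_time INTEGER DEFAULT 24);
    -- Hypothetical rows.
    INSERT INTO subscription VALUES ('S1', 24), ('S2', 4);
    INSERT INTO ticket VALUES
        (1, 'S1', '2024-01-01 09:00:00'),   -- Monday
        (2, 'S2', '2024-01-08 10:00:00'),   -- Monday
        (3, 'S1', '2024-01-02 08:00:00');   -- Tuesday
    INSERT INTO ticket_reply (tid, reply_date) VALUES
        (1, '2024-01-01 09:30:00'), (1, '2024-01-01 11:00:00'),
        (2, '2024-01-08 11:00:00'),
        (3, '2024-01-02 10:00:00');
""")

rows = con.execute("""
    SELECT strftime('%w', t.open_date) AS day_opened,
           SUM(CAST(ROUND((julianday(tr.reply_date) - julianday(t.open_date)) * 1440)
                    AS INTEGER)) AS num_min,
           SUM(s.response_time) * 60 AS response_time_min
      FROM ticket t
      JOIN (SELECT r.tid, MIN(r.reply_date) AS reply_date   -- first reply only
              FROM ticket_reply r GROUP BY r.tid) tr ON tr.tid = t.tid
      JOIN subscription s ON s.sid = t.sid
     GROUP BY day_opened
     ORDER BY day_opened
""").fetchall()

print(rows)  # [('1', 90, 1680), ('2', 120, 1440)]
```

Ticket 1's second reply is ignored, so Monday's total is 30 + 60 = 90 minutes, and each ticket's response_time is counted once.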

Can't find a way to reduce 2 SQL queries to 1 without killing performance

I have the following SQL query which works absolutely fine:
SELECT COUNT(*), COUNT(DISTINCT `fk_match_id`)
FROM `pass`
WHERE `passer` IN ('48717','33305','49413','1640')
AND `receiver` IN ('48717','33305','49413','1640');
The numbers in the IN clause are player IDs, and can be obtained from another table in the database called player. Each row in this table has a player ID, a team_id, and a match_id, which is a foreign key to the match table.
I would like to obtain those player IDs automatically using the match_id. I can do this as follows:
SELECT COUNT(*), COUNT(DISTINCT `fk_match_id`)
FROM `pass`
WHERE `passer` IN
(
SELECT player_id
FROM `player`
WHERE `team_id` = someTeamID
AND `match_id` = someMatchID)
AND `receiver` IN
(
SELECT player_id
FROM `player`
WHERE `team_id` = someTeamID
AND `match_id` = someMatchID
)
However, subqueries are apparently notoriously slow, and indeed this one is far too slow to use. Even using a join, as follows, is far too slow:
SELECT COUNT(*), COUNT(DISTINCT `fk_match_id`)
from `pass` st1
INNER JOIN `player` st2
ON (st1.passer = st2.player_id OR st1.receiver = st2.player_id);
That, too, is far too slow. So I want to know whether it is possible to do in just one query what I can do with two queries in effectively 0.0 seconds (fetching the player IDs in one query and then running the first query takes virtually no time at all), or whether that is completely impossible.
Any help would be greatly appreciated.
Thanks a lot!
EDIT:
I want to calculate the number of passes every player has made to another player in a given lineup, across history. I have a match id and a team id. I can obtain the players involved in a particular match for a team by querying the player table:
SELECT player_id
FROM `player`
WHERE `team_id` = someTeamID
AND `match_id` = someMatchID
This returns something like:
1803,1930,13310,1764,58845,15157,51938,2160,18892,12002,4101,14668,80979,59013
I then want to query the pass table and return every row where one of those id's is in the passer and the receiver columns.
You need a composite index on (passer, receiver):
ALTER TABLE pass ADD INDEX (passer, receiver);
After adding it, try the JOIN:
SELECT COUNT(*), COUNT(DISTINCT fk_match_id)
FROM pass
INNER JOIN player AS p
ON pass.passer = p.player_id
INNER JOIN player AS r
ON r.player_id = pass.receiver ;
If you want these results for a specific (team_id, match_id) combination, add a (team_id, match_id, player_id) index and then use:
SELECT COUNT(*), COUNT(DISTINCT fk_match_id)
FROM pass
INNER JOIN player AS p
ON p.team_id = someTeamID
AND p.match_id = someMatchID
AND p.player_id = pass.passer
INNER JOIN player AS r
ON r.team_id = someTeamID
AND r.match_id = someMatchID
AND r.player_id = pass.receiver ;
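A minimal sketch checking that the filtered double join counts the same passes as the IN-subquery version, using Python's sqlite3 as a stand-in for MySQL. The rows and index names are hypothetical, and the equivalence assumes each (team_id, match_id, player_id) combination appears only once in player; duplicates would inflate the join's COUNT(*).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE player (player_id INTEGER, team_id INTEGER, match_id INTEGER);
    CREATE TABLE pass (passer INTEGER, receiver INTEGER, fk_match_id INTEGER);
    -- Index names are made up; the column orders follow the answer.
    CREATE INDEX pass_passer_receiver ON pass (passer, receiver);
    CREATE INDEX player_team_match_player ON player (team_id, match_id, player_id);
    -- Hypothetical rows.
    INSERT INTO player VALUES
        (11, 1, 100), (12, 1, 100), (13, 1, 100),   -- team 1's lineup in match 100
        (21, 2, 100), (22, 2, 100),
        (11, 1, 101);
    INSERT INTO pass VALUES
        (11, 12, 100), (12, 13, 100), (13, 11, 101), (12, 11, 99),
        (11, 21, 100),                              -- receiver not in the lineup
        (21, 22, 100);
""")

params = {"team": 1, "match": 100}

with_in = con.execute("""
    SELECT COUNT(*), COUNT(DISTINCT fk_match_id) FROM pass
     WHERE passer   IN (SELECT player_id FROM player WHERE team_id = :team AND match_id = :match)
       AND receiver IN (SELECT player_id FROM player WHERE team_id = :team AND match_id = :match)
""", params).fetchone()

with_join = con.execute("""
    SELECT COUNT(*), COUNT(DISTINCT fk_match_id) FROM pass
      JOIN player AS p ON p.team_id = :team AND p.match_id = :match AND p.player_id = pass.passer
      JOIN player AS r ON r.team_id = :team AND r.match_id = :match AND r.player_id = pass.receiver
""", params).fetchone()

print(with_in, with_join)  # (4, 3) (4, 3)
```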

MySQL Query is Extremely Slow

Hello, I am looking for ways to optimize this MySQL query. I am fetching the articles for a user that belong to category_id = 25 and whose source_id is not in a table where I store the source IDs the user has unsubscribed from.
select
a.article_id,
a.article_title,
a.source_id,
a.article_publish_date,
a.article_details,
n.source_name
from sources n
INNER JOIN articles a
ON (a.source_id = n.source_id)
WHERE n.category_id = 25
AND n.source_id NOT IN(select
source_id
from news_sources_deselected
WHERE user_id = 5)
ORDER BY a.article_publish_date DESC
Schema for Articles Table
CREATE TABLE IF NOT EXISTS `articles` (
`article_id` int(255) NOT NULL auto_increment,
`article_title` varchar(255) NOT NULL,
`source_id` int(255) NOT NULL,
`article_publish_date` bigint(255) NOT NULL,
`article_details` text NOT NULL,
PRIMARY KEY (`article_id`),
KEY `source_id` (`source_id`),
KEY `article_publish_date` (`article_publish_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Contains articles.';
Structure for Sources table
CREATE TABLE IF NOT EXISTS `sources` (
`source_id` int(255) NOT NULL auto_increment,
`category_id` int(255) NOT NULL,
`source_name` varchar(255) character set latin1 NOT NULL,
`user_id` int(255) NOT NULL,
PRIMARY KEY (`source_id`),
KEY `category_id` (`category_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='News Sources.';
The articles table has around 300,000 records and the sources table around 1,000 records; the query takes around 180 seconds to execute.
Any help will be greatly appreciated.
Try using a derived table with an IS NULL condition. Your EXPLAIN output shows a dependent subquery; avoid it and use a derived table for your problem instead. This should improve performance.
select
a.article_id,
a.article_title,
a.source_id,
a.article_publish_date,
a.article_details,
n.source_name
from sources n
INNER JOIN articles a
ON (a.source_id = n.source_id)
LEFT JOIN (SELECT *
FROM news_sources_deselected
WHERE user_id = 5) AS nsd
ON nsd.source_id = n.source_id
WHERE n.category_id = 25
AND nsd.source_id IS NULL
ORDER BY a.article_publish_date DESC
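A minimal sketch checking that the LEFT JOIN ... IS NULL anti-join returns the same articles as the original NOT IN, with Python's sqlite3 standing in for MySQL and hypothetical rows (user 7's deselection of source 3 must not affect user 5):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sources (source_id INTEGER PRIMARY KEY, category_id INTEGER, source_name TEXT);
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, source_id INTEGER,
                           article_publish_date INTEGER);
    CREATE TABLE news_sources_deselected (user_id INTEGER, source_id INTEGER);
    -- Hypothetical rows.
    INSERT INTO sources VALUES (1, 25, 'A'), (2, 25, 'B'), (3, 25, 'C'), (4, 30, 'D');
    INSERT INTO articles VALUES (100, 1, 1000), (101, 2, 2000), (102, 3, 3000),
                                (103, 4, 4000), (104, 1, 5000);
    INSERT INTO news_sources_deselected VALUES (5, 2), (7, 3);  -- user 5 dropped source 2
""")

not_in = [r[0] for r in con.execute("""
    SELECT a.article_id FROM sources n JOIN articles a ON a.source_id = n.source_id
     WHERE n.category_id = 25
       AND n.source_id NOT IN (SELECT source_id FROM news_sources_deselected WHERE user_id = 5)
     ORDER BY a.article_publish_date DESC
""")]

anti_join = [r[0] for r in con.execute("""
    SELECT a.article_id FROM sources n JOIN articles a ON a.source_id = n.source_id
      LEFT JOIN (SELECT * FROM news_sources_deselected WHERE user_id = 5) AS nsd
        ON nsd.source_id = n.source_id
     WHERE n.category_id = 25
       AND nsd.source_id IS NULL          -- keep only sources with no deselection row
     ORDER BY a.article_publish_date DESC
""")]

print(not_in)     # [104, 102, 100]
print(anti_join)  # [104, 102, 100]
```

The two forms agree here because news_sources_deselected.source_id holds no NULLs; if the subquery returned a NULL, NOT IN would return no rows at all, while the anti-join would be unaffected.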
Put EXPLAIN in front of your query and analyze the results; that is where to start your optimization work.
I see a few issues you could check:
You're not using relations (foreign keys) despite using the InnoDB engine.
You're selecting fields without an index.
You're selecting all rows at once.
Do you need all those rows at once? Maybe consider splitting this query into pages (e.g. with LIMIT)?
Try this query
select
a.article_id,
a.article_title,
a.source_id,
a.article_publish_date,
a.article_details,
n.source_name
from
sources n
INNER JOIN
articles a
ON
n.category_id = 25 AND
a.source_id = n.source_id
LEFT JOIN
news_sources_deselected nsd
ON
nsd.user_id = 5 AND n.source_id = nsd.source_id
WHERE
nsd.source_id IS NULL
ORDER BY
a.article_publish_date DESC
I have removed the extra query and instead LEFT JOIN news_sources_deselected for user_id 5 only, keeping the rows that find no match (nsd.source_id IS NULL), that is, the sources user 5 has not deselected.
Or we can join only the records we need, using derived tables as user raheelshan suggested:
select
a.article_id,
a.article_title,
a.source_id,
a.article_publish_date,
a.article_details,
n.source_name
from
(select
*
from
sources
where
category_id = 25) n
INNER JOIN
articles a
ON
a.source_id = n.source_id
LEFT JOIN
(select
*
from
news_sources_deselected
where
user_id = 5) nsd
ON
n.source_id = nsd.source_id
WHERE
nsd.source_id IS NULL
ORDER BY
a.article_publish_date DESC
Hope this helps.
I fixed the issue by partitioning the table, but I am still open to suggestions.