Optimizing MySQL LEFT JOIN query - mysql

My goal is to select articles where the primary_category_id (articles table) or any of the secondary categories (articles_secondary_categories join table) matches a given value; in this example query, category 1. I tried other types of joins, but the caveat here is that an article might not have any secondary categories.
SELECT DISTINCT articles.*
FROM articles
LEFT JOIN articles_secondary_categories AS categories
ON categories.article_id = articles.id
WHERE
(
primary_category_id = 1
OR
categories.category_id = 1
)
AND articles.state = "published"
AND edition_id = 1
ORDER BY publish_at DESC
LIMIT 10;
Any help optimizing this, or any alternative, is welcome. In a DB with 4k articles and 7k rows in articles_secondary_categories (not categories), this query takes 5 seconds to run.

You can reverse the query on the secondary categories:
(SELECT articles.*
FROM articles
WHERE primary_category_id = 1
AND state = "published"
AND edition_id = 1)
UNION DISTINCT
(SELECT articles.*
FROM articles_secondary_categories AS categories
JOIN articles ON (categories.article_id = articles.id)
WHERE categories.category_id = 1
AND articles.state = "published"
AND articles.edition_id = 1
GROUP BY articles.id)
ORDER BY publish_at DESC
LIMIT 10;
It should give you a decent speed boost. Just make sure categories.article_id is indexed; an index on categories.category_id should also help the second branch.
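To check that the UNION rewrite returns the same articles as the original OR query, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names come from the question; the sample rows are invented for illustration. SQLite does not accept parentheses around the UNION branches, so they are written without them, and only id and publish_at are selected to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    primary_category_id INTEGER,
    state TEXT,
    edition_id INTEGER,
    publish_at TEXT
);
CREATE TABLE articles_secondary_categories (
    article_id INTEGER,
    category_id INTEGER
);
-- Invented sample data
INSERT INTO articles VALUES
    (1, 1, 'published', 1, '2012-01-01'),  -- primary category matches
    (2, 2, 'published', 1, '2012-01-02'),  -- only a secondary category matches
    (3, 1, 'published', 1, '2012-01-03'),  -- both match
    (4, 2, 'published', 1, '2012-01-04'),  -- nothing matches
    (5, 1, 'draft',     1, '2012-01-05');  -- wrong state
INSERT INTO articles_secondary_categories VALUES
    (2, 1), (3, 1), (3, 5), (4, 3);
""")

# The original query: LEFT JOIN plus OR across two tables.
or_query = """
SELECT DISTINCT articles.id AS id, articles.publish_at AS publish_at
FROM articles
LEFT JOIN articles_secondary_categories AS categories
    ON categories.article_id = articles.id
WHERE (primary_category_id = 1 OR categories.category_id = 1)
  AND articles.state = 'published'
  AND edition_id = 1
ORDER BY publish_at DESC
LIMIT 10
"""

# The rewrite: one branch per condition, deduplicated by UNION.
union_query = """
SELECT articles.id AS id, articles.publish_at AS publish_at
FROM articles
WHERE primary_category_id = 1
  AND state = 'published' AND edition_id = 1
UNION
SELECT articles.id AS id, articles.publish_at AS publish_at
FROM articles_secondary_categories AS categories
JOIN articles ON categories.article_id = articles.id
WHERE categories.category_id = 1
  AND articles.state = 'published' AND articles.edition_id = 1
ORDER BY publish_at DESC
LIMIT 10
"""

or_ids = [row[0] for row in conn.execute(or_query)]
union_ids = [row[0] for row in conn.execute(union_query)]
print(or_ids, union_ids)
```

Both lists contain articles 3, 2 and 1 in that order. This only checks that the results agree; whether the rewrite is actually faster has to be measured on the real MySQL tables with EXPLAIN.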

Avoid using OR in your WHERE clause. Optimizers often can't use indexes well with OR predicates.
Try moving the categories.category_id = 1 into the join condition:
SELECT articles.*
FROM articles
LEFT JOIN articles_secondary_categories AS categories
ON categories.article_id = articles.id and categories.category_id = 1
WHERE 1 in (ifnull(categories.category_id, primary_category_id), primary_category_id)
AND articles.state = "published"
AND edition_id = 1
ORDER BY publish_at DESC
LIMIT 10;
The key to this query is 1 in (ifnull(categories.category_id, primary_category_id), primary_category_id), which says "if we got a join to categories, use that in the list, otherwise use the primary_category_id, and in all cases use the primary_category_id."
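The predicate is easier to follow when evaluated case by case. The sketch below is an illustration in Python's sqlite3 rather than MySQL, and the helper name matches is made up for this example. It feeds the expression the two possible outcomes of the join (category_id is 1 when a row joined, NULL when none did) together with different primary categories.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same expression as in the WHERE clause, with the column values bound as parameters.
PREDICATE = "SELECT 1 IN (ifnull(?, ?), ?)"


def matches(joined_category_id, primary_category_id):
    """Return True if the article would pass the category filter."""
    row = conn.execute(
        PREDICATE,
        (joined_category_id, primary_category_id, primary_category_id),
    ).fetchone()
    return bool(row[0])


secondary_only = matches(1, 2)        # joined row found, primary is another category
primary_only = matches(None, 1)       # no joined row, primary is category 1
no_match = matches(None, 2)           # no joined row, primary is another category
print(secondary_only, primary_only, no_match)
```

In other words, because the join condition already restricts category_id to 1, the predicate behaves like "primary_category_id = 1 OR categories.category_id IS NOT NULL".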

Related

How to replace sub-query by joins?

How can I get rid of these subqueries?
(all tables have columns Created and LastEdited as timestamps)
table Process
- ID
- Title
table ProcessHistory
- ID
- ProcessID
- HistoryID
table History
- ID
- Title (new, open, closed etc.)
When I try to get a list of processes with columns for the latest status title, I do:
SELECT DISTINCT Process.*, History.Title AS HistoryTitle, History.ID AS HistoryID
FROM `Process`
LEFT JOIN ProcessHistory AS ProcessHistory ON Process.ID=ProcessHistory.ProcessID
LEFT JOIN History AS History ON HistoryID=ProcessHistory.HistoryID
WHERE History.ID = (
SELECT HistoryID FROM ProcessHistory
WHERE ProcessID=Process.ID
ORDER BY ProcessHistory.ID DESC LIMIT 1
)
GROUP BY Process.ID
ORDER BY Process.ID DESC LIMIT 0, 100
When I try to get a list filtered by a specific status
(Where the latest HistoryID is 1 - "all open Processes")
SELECT DISTINCT Process.*, History.Title AS HistoryTitle, History.ID AS HistoryID
FROM `Process`
LEFT JOIN ProcessHistory AS ProcessHistory ON Process.ID=ProcessHistory.ProcessID
LEFT JOIN History AS History ON HistoryID=ProcessHistory.HistoryID
WHERE History.ID=(
SELECT HistoryID FROM ProcessHistory
WHERE HistoryID =1
AND ProcessID=Process.ID ORDER BY ProcessHistory.ID DESC LIMIT 1
)
GROUP BY Process.ID
ORDER BY Process.ID DESC LIMIT 0, 100
For performance reasons I want to get rid of these subqueries.
How can I replace the subquery?
Thanks in advance !
The query you posted does not give the result that ORDER BY ProcessHistory.ID DESC LIMIT 1 suggests.
The query below returns the same result as yours:
SELECT DISTINCT p.*, h.Title AS HistoryTitle,h.ID AS HistoryID
FROM
Process p JOIN ProcessHistory ph ON p.ID=ph.ProcessID and ph.HistoryID=1
JOIN History h ON h.ID=ph.HistoryID
GROUP BY p.ID
ORDER BY p.ID DESC LIMIT 0, 100;
If you want a different result, leave a comment.
You'll still need to make another select to filter out the rest of the process history, but you should use a derived table instead of a subquery, like this sqlfiddle:
SELECT Process.*, History.Title AS HistoryTitle
FROM Process
JOIN (
SELECT ProcessID, max(HistoryID) as HistoryID
FROM ProcessHistory
GROUP BY ProcessID
) PH ON PH.ProcessID = Process.ID
JOIN History ON History.ID = PH.HistoryID
ORDER BY Process.ID DESC LIMIT 0, 100
As Himanshu Patel pointed out, your query to "get a list filtered by a specific status (Where the latest HistoryID is 1 - "all open Processes")" will not produce the desired effect. It will simply return all processes that have a HistoryID of 1. See this sqlfiddle.
Instead, you want to use a derived table to get those latest process history ids, join them with the processes, and filter on the history ids like this sqlfiddle:
SELECT DISTINCT Process.*, History.Title AS HistoryTitle
FROM Process
JOIN (
SELECT ProcessID, max(HistoryID) as HistoryID
FROM ProcessHistory
GROUP BY ProcessID
) PH ON PH.ProcessID = Process.ID
JOIN History ON History.ID = PH.HistoryID
WHERE PH.HistoryID = 1
ORDER BY Process.ID DESC LIMIT 0, 100
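One caveat about the derived table: MAX(HistoryID) picks the highest status ID, which is only the latest status if status IDs always increase over a process's lifetime. The question's subquery instead takes the row with the highest ProcessHistory.ID. The sketch below, written with Python's sqlite3 as a stand-in for MySQL and with invented sample rows, shows a derived table keyed on MAX(ID) and how the two variants differ when a process is reopened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Process (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE History (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE ProcessHistory (
    ID INTEGER PRIMARY KEY,
    ProcessID INTEGER,
    HistoryID INTEGER
);
INSERT INTO Process VALUES (1, 'first'), (2, 'second');
INSERT INTO History VALUES (1, 'open'), (2, 'closed');
-- Invented data: process 1 is opened, closed, reopened; process 2 is opened, closed.
INSERT INTO ProcessHistory VALUES
    (1, 1, 1), (2, 1, 2), (3, 1, 1),
    (4, 2, 1), (5, 2, 2);
""")

# Latest row per process: find MAX(ProcessHistory.ID), then join back for its status.
latest_row_query = """
SELECT p.ID, h.Title
FROM Process AS p
JOIN (
    SELECT ProcessID, MAX(ID) AS LatestID
    FROM ProcessHistory
    GROUP BY ProcessID
) AS latest ON latest.ProcessID = p.ID
JOIN ProcessHistory AS ph ON ph.ID = latest.LatestID
JOIN History AS h ON h.ID = ph.HistoryID
ORDER BY p.ID DESC
"""

# Highest status ID per process, as in the answer above.
max_status_query = """
SELECT p.ID, h.Title
FROM Process AS p
JOIN (
    SELECT ProcessID, MAX(HistoryID) AS HistoryID
    FROM ProcessHistory
    GROUP BY ProcessID
) AS ph ON ph.ProcessID = p.ID
JOIN History AS h ON h.ID = ph.HistoryID
ORDER BY p.ID DESC
"""

by_latest_row = conn.execute(latest_row_query).fetchall()
by_max_status = conn.execute(max_status_query).fetchall()
print(by_latest_row)
print(by_max_status)
```

With this data the reopened process 1 is reported as open by the first query and as closed by the second. To list only processes whose latest status is open, add WHERE ph.HistoryID = 1 to the first query.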
An alternative approach would be to create a view that filters the ProcessHistory table to the latest history per process and join on that. YMMV, but in some cases, performance can be improved that way.
SELECT *
FROM Process AS p
LEFT JOIN (
SELECT ph.ProcessID, ph.HistoryID, h.Title AS HistoryTitle
FROM ProcessHistory AS ph
JOIN History AS h ON ph.HistoryID = h.ID
WHERE h.ID = 1
) AS phh ON p.ID = phh.ProcessID
ORDER BY p.ID DESC LIMIT 100
Due to the actual subquery, used to get "last" row related to each "process", you can neither convert it to a JOIN nor can you use it as a separate query to set a session variable.

How can I use a join to combine these two queries?

I have two tables: articles and modifications. I want users to easily be able to revert their article back to its original state if they realize they shouldn't have modified it. Instead of using an extra query to find the id of the article's newest modification, I would like to use a join. So I want to get the information from the articles table and then join the modifications table to it to return the associated row. This is what I have now:
<?php
$query = "
SELECT
article_id, title, content
FROM articles
WHERE article_id = ".$article_id."
LIMIT 1";
$query_article = $this->db->query($query);
$article = $query_article->row_array();
$query_mod = "
SELECT
modification_id, article_id, title, content, date
FROM modifications
WHERE article_id = ".$article_id."
ORDER BY modification_id DESC
LIMIT 1";
$query_mod = $this->db->query($query_mod);
if($query_mod->num_rows() > 0){
$mod = $query_mod->row_array();
$article_title = $mod['title'];
$article_content = $mod['content'];
} else {
$article_title = $article['title'];
$article_content = $article['content'];
}
How could I combine these two queries into one using a join?
SELECT
a.title, a.content, a.article_id,
m.modification_id AS mod_id, m.title AS mod_title, m.content AS mod_content
FROM articles AS a
LEFT JOIN modifications AS m ON (...)
WHERE a.article_id = 1
LIMIT 1
Your skeletal attempt at a query is pretty much correct except for the empty ON clause, which merely needs to identify equality between article_id on the two tables. It is correct to use a LEFT JOIN since you need to return the article regardless of a match in modifications.
SELECT
a.title, a.content, a.article_id,
m.modification_id AS mod_id, m.title AS mod_title, m.content AS mod_content
FROM
articles AS a
LEFT JOIN modifications AS m ON a.article_id = m.article_id
WHERE a.article_id = 1
ORDER BY mod_id DESC LIMIT 1
However, your PHP logic shows that you conditionally use the title and content from the modifications table if a modification is present. For that, you may use COALESCE() directly in the SQL to return the first non-null argument, so if the LEFT JOIN has no match, the values from articles will be used.
SELECT
-- COALESCE to prefer the modifications value if non-null
COALESCE(m.title,a.title) AS title,
COALESCE(m.content, a.content) AS content,
a.article_id,
m.modification_id AS mod_id
FROM
articles AS a
LEFT JOIN modifications AS m ON a.article_id = m.article_id
WHERE a.article_id = 1
ORDER BY mod_id DESC LIMIT 1
Here's a demonstration: http://sqlfiddle.com/#!2/1085c/1
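The fallback behaviour can also be checked locally. This is a small sketch with Python's sqlite3 module standing in for MySQL; the table and column names follow the question, while the sample rows and the helper name current_version are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT, content TEXT);
CREATE TABLE modifications (
    modification_id INTEGER PRIMARY KEY,
    article_id INTEGER,
    title TEXT,
    content TEXT
);
-- Invented data: article 1 has two modifications, article 2 has none.
INSERT INTO articles VALUES
    (1, 'Original A', 'Body A'),
    (2, 'Original B', 'Body B');
INSERT INTO modifications VALUES
    (10, 1, 'A, first edit', 'Body A1'),
    (11, 1, 'A, second edit', 'Body A2');
""")

QUERY = """
SELECT COALESCE(m.title, a.title) AS title,
       COALESCE(m.content, a.content) AS content,
       a.article_id,
       m.modification_id AS mod_id
FROM articles AS a
LEFT JOIN modifications AS m ON a.article_id = m.article_id
WHERE a.article_id = ?
ORDER BY m.modification_id DESC
LIMIT 1
"""


def current_version(article_id):
    """Return the newest modification of an article, or the article itself."""
    return conn.execute(QUERY, (article_id,)).fetchone()


modified = current_version(1)
untouched = current_version(2)
print(modified)
print(untouched)
```

Article 1 comes back with the title and content of modification 11, and article 2 falls back to its own columns with a NULL mod_id. The article id is passed as a bound parameter here rather than concatenated into the SQL string.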
Because you're only attempting to return details for one article_id, no subqueries are needed. This gets a little more complicated if you want to return the latest for multiple article_id, requiring a subquery join with a MAX() aggregate.
SELECT
COALESCE(m.title,a.title) AS title,
COALESCE(m.content, a.content) AS content,
a.article_id,
m.modification_id AS mod_id
FROM
articles AS a
-- Join articles against a subquery to get the most recent mod_id only
LEFT JOIN (
SELECT article_id, MAX(modification_id) AS mod_id
FROM modifications
GROUP BY article_id
) mm ON mm.article_id = a.article_id
-- and then join that mod_id against the rest of the modifications table
LEFT JOIN modifications m ON mm.mod_id = m.modification_id
WHERE a.article_id IN (<multiple criteria for article_id>)
http://sqlfiddle.com/#!2/14051/2

Left join order by

I have property pictures in a table, with their sort_order running from 0 up to the number of pictures.
What I would like to do is select the pictures, but starting from 2.
My approach was:
SELECT * FROM property_photos AS pp1
JOIN property_photos AS pp2 ON pp1.p_id = pp2.p_id
where pp2.sort_order =2
and pp2.sort_order <2
and pp1.sort_order >2
and pp1.p_id = 3
So what I am trying to get is a sort order like 2,0,1,3,4,5,6,7.
I thought I needed a self join, but my query doesn't work.
You don't need a join for this:
SELECT *
FROM property_photos
WHERE p_id = 3
ORDER BY (sort_order = 2) DESC, sort_order
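This works because the comparison (sort_order = 2) evaluates to 1 for the matching row and 0 for all others, so sorting on it in descending order moves that row to the front, and the second sort key orders the rest. Here is a quick check using Python's sqlite3 module, where comparisons evaluate the same way; the eight sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE property_photos (p_id INTEGER, sort_order INTEGER)")
# Invented data: eight photos for property 3, sort_order 0 through 7.
conn.executemany(
    "INSERT INTO property_photos VALUES (?, ?)",
    [(3, n) for n in range(8)],
)

rows = conn.execute("""
SELECT sort_order
FROM property_photos
WHERE p_id = 3
ORDER BY (sort_order = 2) DESC, sort_order
""").fetchall()

order = [row[0] for row in rows]
print(order)
```

The list comes out as 2, 0, 1, 3, 4, 5, 6, 7, which is the order asked for in the question.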

Improve SQL Query performance with JOIN

I've got the following, slow performing, SQL query:
SELECT *
FROM news_events
WHERE 1 AND (user_id = 2416) OR id IN(SELECT content_id FROM likes WHERE user_id = 2416)
ORDER BY id DESC
LIMIT 0,10
The news_events table has indexes on user_id. And the likes table has an index on user_id.
To try to improve performance I have re-written the query using an INNER JOIN the following way:
SELECT a.*
FROM news_events a
INNER JOIN likes b ON (a.id = b.content_id)
WHERE (a.user_id = 2416) OR (b.user_id = 2416)
ORDER BY a.id DESC
LIMIT 0,10
But performance doesn't improve either. I've run explain on this last query and this is the result:
I appreciate any pointer on what I could do to improve the performance of this query.
SELECT *
FROM
(
SELECT a.*
FROM news_events a
WHERE a.user_id = 2416
UNION
SELECT ne.*
FROM news_events ne
INNER JOIN likes l
ON ne.id = l.content_id
WHERE l.user_id = 2416
) AS combined
ORDER BY id DESC
LIMIT 0,10
Try this query:
SELECT ne.* FROM news_events ne
LEFT JOIN (SELECT DISTINCT content_id FROM likes WHERE user_id = 2416) l
ON ne.id = l.content_id
WHERE ne.user_id = 2416 OR l.content_id IS NOT NULL
ORDER BY
ne.id DESC
LIMIT
0, 10
These columns should be indexed: news_events.user_id, news_events.id, likes.user_id, likes.content_id.
Your query is good enough as it is, and the queries posted in the other answers are fine too. But if you have a large data set and have not rebuilt the indexes in a long time, you need to rebuild the indexes on both tables.
It is standard practice for a DB admin to rebuild all indexes periodically and to recompile all objects and packages in the DB.
I hope it will help :)
Keep querying!

Keeping returned records from mysql unique

Is there a quick way to make sure the records you return from a MySQL JOIN query are unique?
The code below could potentially bring back the same category twice. It's the category ID that should be distinct!
SELECT
exp_categories.cat_name, exp_categories.cat_id, exp_categories.cat_url_title
,exp_category_posts.entry_id, exp_channel_titles.status
FROM (exp_categories
LEFT JOIN exp_category_posts
ON exp_categories.cat_id = exp_category_posts.cat_id)
LEFT JOIN exp_channel_titles
ON exp_category_posts.entry_id = exp_channel_titles.entry_id
WHERE exp_categories.group_id = 2
AND exp_category_posts.entry_id IS NOT NULL
AND exp_channel_titles.status = 'open'
ORDER BY RAND()
LIMIT 2
If I understand what you need you could close your query with:
GROUP BY exp_categories.cat_id
ORDER BY RAND()
LIMIT 2
The problem here is that you're joining one-to-many, so you will see as many rows as there are records in the "many" table. If you only want to bring back one row per category, you need to either only query categories or decide what criteria you want to use to choose single values from exp_category_posts.
One example option is to query for the most recent post in a category and join on that resultset instead of exp_category_posts.
SELECT
exp_categories.cat_name, exp_categories.cat_id, exp_categories.cat_url_title
,exp_category_posts.entry_id, exp_channel_titles.status
FROM (exp_categories
LEFT JOIN exp_category_posts
ON exp_categories.cat_id = exp_category_posts.cat_id)
LEFT JOIN exp_channel_titles
ON exp_category_posts.entry_id = exp_channel_titles.entry_id
WHERE exp_categories.group_id = 2
AND exp_category_posts.entry_id IS NOT NULL
AND exp_channel_titles.status = 'open'
GROUP BY exp_categories.cat_id
ORDER BY RAND()
LIMIT 2
edit: Adding the column you want to be distinct in the GROUP BY clause will ensure there's only 1 returned. Also, the ORDER BY RAND() is heavily discouraged, see: http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
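The row multiplication from a one-to-many join, and the way GROUP BY collapses it, can be seen on a toy example. The sketch below uses Python's sqlite3 module with simplified, hypothetical tables (categories and category_posts, not the real exp_ tables) and invented rows. It applies an aggregate, MAX(entry_id), to choose one post per category, assuming that a higher entry_id means a more recent post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-ins for the exp_ tables
CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT);
CREATE TABLE category_posts (cat_id INTEGER, entry_id INTEGER);
INSERT INTO categories VALUES (1, 'news'), (2, 'sport');
INSERT INTO category_posts VALUES (1, 100), (1, 101), (1, 102), (2, 200);
""")

# Plain join: one output row per (category, post) pair.
joined = conn.execute("""
SELECT c.cat_id, p.entry_id
FROM categories AS c
JOIN category_posts AS p ON p.cat_id = c.cat_id
ORDER BY c.cat_id, p.entry_id
""").fetchall()

# Grouped join: one row per category, with an explicit rule for which post to keep.
grouped = conn.execute("""
SELECT c.cat_id, c.cat_name, MAX(p.entry_id) AS latest_entry_id
FROM categories AS c
JOIN category_posts AS p ON p.cat_id = c.cat_id
GROUP BY c.cat_id, c.cat_name
ORDER BY c.cat_id
""").fetchall()

print(len(joined), grouped)
```

The plain join returns four rows, three of them for the first category, while the grouped query returns one row per category. Stating the aggregate explicitly also keeps the query valid when MySQL runs with ONLY_FULL_GROUP_BY enabled, which rejects non-aggregated columns that are not determined by the GROUP BY columns.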