ORDER BY and LIMIT 1 in joined table - mysql

I've been trying to figure this one out for a long time but am starting to give up.
To simplify the case, let's say I've got 2 tables. Main table is articles and I'm left joining it with contracts. The contracts table has an end date. I only want to pick 1 (one) row from here per article, selecting the latest contract_to date.
I've tried something like LEFT JOIN contracts ON (contracts.article = articles.id) ORDER BY contract_to DESC LIMIT 1, but obviously it's not working.
How do I go about doing this?
Please pretend that the date ranges on each row in the contracts table below are different.
Also the latest date is not the same for all article contracts, so I can't just determine what the latest date is and then stick it into a WHERE clause.

To get the latest contract_to value, you'll need a MAX() aggregate. The proper way to do this is to use a subquery join to get only the article and MAX(contract_to) values, then join that with the remaining values of the row. Finally, that whole structure can be joined against the articles table.
SELECT
  articles.*,
  contracts.*
FROM
  articles
  /* Join against a subquery which returns only the article and latest contract_to */
  LEFT JOIN (
    SELECT article, MAX(contract_to) AS contract_to
    FROM contracts
    GROUP BY article
  ) maxcontract ON articles.id = maxcontract.article
  /* and join that against the rest of the contracts table, on those two column values;
     LEFT JOIN keeps articles that have no contract at all */
  LEFT JOIN contracts
    ON maxcontract.article = contracts.article
    AND maxcontract.contract_to = contracts.contract_to
Because MySQL is lenient about which columns may be selected alongside a GROUP BY, the separate join back against the contracts table may not actually be necessary; you could probably get away with the subquery join alone. That won't work in most other RDBMSs, though, and MySQL does not guarantee that the non-grouped columns come from the row holding MAX(contract_to), so the double join is the right way to do it without relying on MySQL's nonstandard behavior.
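As a rough, runnable sketch of the MAX() subquery join described above, here it is against an in-memory SQLite database via Python's sqlite3 module (used only because it needs no server; the SQL itself is plain GROUP BY and joins). The name and price columns and all sample rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical miniature schema; only articles.id, contracts.article and
# contracts.contract_to come from the question, the rest is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contracts (id INTEGER PRIMARY KEY, article INTEGER,
                            contract_to TEXT, price REAL);
    INSERT INTO articles VALUES (1, 'bolt'), (2, 'nut'), (3, 'washer');
    INSERT INTO contracts VALUES
        (10, 1, '2011-12-31', 5.0),
        (11, 1, '2012-12-31', 6.0),
        (12, 2, '2010-06-30', 2.5);
""")

rows = conn.execute("""
    SELECT a.id, a.name, c.contract_to, c.price
    FROM articles a
    -- one row per article: the latest contract_to
    LEFT JOIN (
        SELECT article, MAX(contract_to) AS contract_to
        FROM contracts
        GROUP BY article
    ) maxcontract ON maxcontract.article = a.id
    -- fetch the rest of that contract row
    LEFT JOIN contracts c
        ON c.article = maxcontract.article
       AND c.contract_to = maxcontract.contract_to
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Article 3 has no contracts and still appears, with NULLs in the contract columns. If two contracts of one article share the same latest contract_to, both rows come back, so a tie-breaker would be needed in that case.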

MySQL (before version 8.0) doesn't have the nice analytic functions that some DBMSes offer for this, but you could write (for example):
SELECT ...
  FROM articles
  LEFT OUTER JOIN ( SELECT article,
                           MAX(contract_to) AS contract_to
                      FROM contracts
                     GROUP BY article
                  ) articles_to_max_contracts
    ON articles_to_max_contracts.article = articles.id
  LEFT OUTER JOIN contracts
    ON contracts.article = articles.id
   AND contracts.contract_to = articles_to_max_contracts.contract_to
;

To get just the price, you can also do this with a correlated subquery:
select a.*,
       (select price
        from contracts c
        where c.article = a.id
        order by contract_from desc
        limit 1
       ) as lastPrice
from articles a;
With an index on contracts(article, contract_from) this should even be relatively efficient.
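A minimal sketch of the correlated-subquery variant, again run through Python's sqlite3 module as a stand-in for MySQL. The name column, the index name, and the sample rows are assumptions; contract_from and price are taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contracts (article INTEGER, contract_from TEXT, price REAL);
    -- index matching the lookup: filter by article, then order by contract_from
    CREATE INDEX contracts_article_from ON contracts (article, contract_from);
    INSERT INTO articles VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO contracts VALUES
        (1, '2011-01-01', 5.0),
        (1, '2012-01-01', 6.0);
""")

last_prices = conn.execute("""
    SELECT a.id, a.name,
           (SELECT c.price
            FROM contracts c
            WHERE c.article = a.id
            ORDER BY c.contract_from DESC
            LIMIT 1) AS lastPrice
    FROM articles a
    ORDER BY a.id
""").fetchall()

print(last_prices)
```

The subquery yields NULL for an article without contracts, so no outer join is needed. It can only return a single column, which is why this form suits "just the price" rather than the whole contract row.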

Related

MySQL Spring complicated query - ways to order and query efficiency

I run this complicated query on Spring JPA Repository.
My goal is to get all info from the site table, ordering it by events severity on each site.
This is my query:
SELECT alls.* FROM sites AS alls JOIN
(
SELECT distinct ets.id FROM
(
SELECT s.id, et.`type`, et.severity_level, COUNT(et.`type`) FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
ORDER BY et.severity_level, COUNT(et.`type`) DESC
) AS ets
) as etsd ON alls.id = etsd.id
The second select (the one with "distinct") returns site_ids ordered correctly by severity.
Note that there are different event_types + severity in each site, and I use pagination on the answer, so I need the distinct.
The problem is - the main select doesn't keep this order.
Is there any way to keep the order in one complicated query?
Another related question - one of my ideas was making two queries:
1. The "select distinct" query that returns the order, saved in a list ("order list").
2. The main "sites" query (which becomes very simple) with "where id in {order list}".
3. Ordering the result of the second query in code by "order list".
I use the query every 10 seconds, so it is very sensitive on performance.
What seems to be faster in this case - original complicated query or those 2?
Any insight will be appreciated.
Tnx a lot.
A quirk of SQL's declarative set-oriented syntax for us procedural programmers: ORDER BY clauses in subqueries are not carried through to the outer query, except sometimes by accident. If you want ordering at any query level, you must specify it at that level or you will get unpredictable results. The query optimizers are usually smart enough to avoid wasting sort operations.
Your requirement: give at most one sites row for each sites.id value, ordered by the worst event. Worst: lowest event severity, and if there is more than one event with the lowest severity, the largest count.
Use this sort of thing to get the "worst" for each id, in place of DISTINCT.
SELECT id, MIN(severity_level) severity_level, MAX(num) num
  FROM (
        /* your inner query */
       ) ets
 GROUP BY id
This gives at most one row per sites.id value. Then your outer query is
SELECT alls.*
  FROM sites alls
  JOIN (
        SELECT id, MIN(severity_level) severity_level, MAX(num) num
          FROM (
                /* your inner query */
               ) ets
         GROUP BY id
       ) worstevents ON alls.id = worstevents.id
 ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
Putting it all together:
SELECT alls.*
  FROM sites alls
  JOIN (
        SELECT id, MIN(severity_level) severity_level, MAX(num) num
          FROM (
                SELECT s.id, et.severity_level, COUNT(et.`type`) num
                  FROM sites AS s
                  JOIN users_sites AS us ON (s.id=us.site_id)
                  JOIN users AS u ON (us.user_id=u.user_id)
                  JOIN areas AS a ON (s.id=a.site_id)
                  JOIN panels AS p ON (a.id=p.area_id)
                  JOIN events AS e ON (p.id=e.panel_id)
                  JOIN event_types AS et ON (e.event_type_id=et.id)
                 WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
                 GROUP BY s.id , et.`type`, et.severity_level
               ) ets
         GROUP BY id
       ) worstevents ON alls.id = worstevents.id
 ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
An index on users.user_id will help performance for these single-user queries.
If you still have performance trouble, please read this and ask another question.
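To illustrate the shape of that query, here is a small sketch using Python's sqlite3 module. It is a simplification under stated assumptions: the users/areas/panels/event_types chain is collapsed into a hypothetical events.site_id column, and the site names and rows are invented. The point is only that the ordering is stated at the outermost level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
    -- simplified: events point straight at a site (assumption for brevity)
    CREATE TABLE events (id INTEGER PRIMARY KEY, site_id INTEGER,
                         type TEXT, severity_level INTEGER);
    INSERT INTO sites VALUES (1, 'north'), (2, 'south'), (3, 'east'), (4, 'west');
    INSERT INTO events (site_id, type, severity_level) VALUES
        (1, 'door', 2), (1, 'door', 2), (1, 'door', 2),
        (2, 'fire', 1),
        (3, 'fire', 1), (3, 'fire', 1), (3, 'door', 2);
""")

ordered = conn.execute("""
    SELECT alls.id, alls.name
    FROM sites alls
    JOIN (
        -- at most one row per site: lowest severity and largest count
        SELECT id, MIN(severity_level) AS severity_level, MAX(num) AS num
        FROM (
            SELECT e.site_id AS id, e.type, e.severity_level, COUNT(*) AS num
            FROM events e
            GROUP BY e.site_id, e.type, e.severity_level
        ) ets
        GROUP BY id
    ) worstevents ON alls.id = worstevents.id
    -- the ordering lives on the outer query, where it is guaranteed
    ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
""").fetchall()

print(ordered)
```

Site 3 sorts first (severity 1, count 2), then site 2 (severity 1, count 1), then site 1 (severity 2). Site 4 has no events and drops out because of the inner join.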

MySql: order by along with group by - performance

I have a performance problem with a query that has ORDER BY and GROUP BY. I have checked similar problems on SO but did not find a solution to this :(
I have something like this in my db schema:
pattern has many pattern_file belongs to project_template which belongs to project
Now I want to get projects filtered by some data (additional tables that I join), ordered for example by projects.priority, and grouped by patterns.id. I have tried many things, and to get the desired result I came up with this query:
SELECT DISTINCT `projects`.* FROM `projects`
INNER JOIN `project_templates` ON `project_templates`.`project_id` = `projects`.`id`
INNER JOIN `pattern_files` ON `pattern_files`.`id` = `project_templates`.`pattern_file_id`
INNER JOIN `patterns` ON `patterns`.`id` = `pattern_files`.`pattern_id`
...[ truncated ]
INNER JOIN (SELECT DISTINCT projects.id FROM `projects` INNER JOIN `project_templates` ON `project_templates`.`project_id` = `projects`.`id`
INNER JOIN `pattern_files` ON `pattern_files`.`id` = `project_templates`.`pattern_file_id`
INNER JOIN `patterns` ON `patterns`.`id` = `pattern_files`.`pattern_id`
...[ truncated ]
WHERE [here my conditions] ORDER BY [here my order]) P
ON P.id = projects.id
WHERE [here my conditions]
GROUP BY patterns.id
ORDER BY [here my order]
From my research, I have to INNER JOIN with a subquery to get around the "ORDER BY before GROUP BY" problem; I then put the same conditions on the outer query for performance. I had to repeat the ORDER BY in the outer query too, otherwise the result comes back in the default order.
Now there is a real performance problem: I have about 6k projects, and when I run this query without any conditions it takes about 15s :/ When I narrow the result by specifying conditions, the time drops drastically. I've read somewhere that the subquery is run for every row of the outer query, which could be true judging by the execution time :/
Could you please give some advice on how I can optimize the query? I do not work much with SQL, so maybe I've been approaching it the wrong way from the very beginning?
P.S. I have tried WHERE projects.id IN (SELECT projects.id FROM projects ....), which removed the performance issue but also lost the "ORDER BY before GROUP BY" behavior.
EDIT.
I want to retrieve list of projects, but I want also to filter it and order, and finally I want to get patterns.id unique(that is why I use the group by).
The ORDER BY in your inner query (P) doesn't make sense (any inner sort will only have an arbitrary effect).
@Solarflare Unfortunately it does. GROUP BY will take the first row from the grouped result, and the order is preserved for the join. Well, I believe that this is specific to MySQL. Furthermore, to keep the order from the subquery I could use ORDER BY NULL in the outer query :-)
Also, select projects.* ... group by pattern.id is fishy (although MySQL, in contrast to every other dbms, allows you to do this)
so we can assume I retrieve only projects.id, but from docs:
MySQL extends the use of GROUP BY to permit selecting fields that are not mentioned in the GROUP BY clause

SQL check if thread timestamp is newer than reply timestamp in JOIN Statement

So I'm kinda new to SQL joins and was thinking on going full overkill probably.
What I want to do is join my four tables together.
What I want to accomplish is that I want all the information from category, and I want it to be matched to the replies with the newest timestamp and then I want to join the t.title which t.id matches r.thread_id
SELECT c.*, t.id, t.title, r.timestamp, u.id, u.username
FROM forum_category AS c
LEFT JOIN forum_threads AS t ON (c.id = t.category_id)
LEFT JOIN forum_replies AS r ON (t.id = r.thread_id
AND r.timestamp =
(
SELECT timestamp
FROM forum_replies
ORDER BY timestamp DESC LIMIT 1
))
LEFT JOIN users AS u ON (r.user_id = u.id)
GROUP BY c.id
As it is now this code seems to work, though I haven't tested it a lot.
However, I need to expand it to check whether t.timestamp is newer than the latest r.timestamp and, if so, join that one instead, with t.title, t.timestamp and t.user_id.
That is, when a thread is newer than its latest reply.
I know I could make the first post a reply and solve it that way. But I'm not doing that right now if it's possible to solve in the SQL statement.
SQL layout imgur here:
https://imgur.com/a/nCn2a
One helpful technique is to use Subqueries to break up the mental logic of what your query is trying to do. Basically, a subquery takes the place of a regular table in any query.
So, first up, we need to get the most recent time stamp in the replies for each thread:
select thread_id, max(timestamp) as LatestReply
from forum_replies
group by thread_id
Let's call this our MostRecentThreadSubquery. So, it would let us do something like:
select * from
forum_threads t
LEFT JOIN
(
    select thread_id, max(timestamp) as LatestReply
    from forum_replies
    group by thread_id
) as MostRecentThreadSubquery
on t.id = MostRecentThreadSubquery.thread_id
Make sense? We're no longer joining the forum_threads table against the forum_replies table - we've made a subquery to help us list the most recent reply for each thread id.
Now, we add a SQL CASE expression. A thread with no replies has a NULL LatestReply, so that case is checked explicitly:
select
    t.id as thread_id,
    CASE WHEN MostRecentThreadSubquery.LatestReply IS NULL
           OR t.timestamp > MostRecentThreadSubquery.LatestReply
         THEN t.timestamp
         ELSE MostRecentThreadSubquery.LatestReply
    END as MostRecentTimestamp
from -- ... the rest of that earlier SQL statement
Okay, so now we've got a query that, for every thread_id, has the most recent timestamp - whether that's from the forum_replies or from the forum_threads table.
... and you guessed it. We're going to make it another subquery. Let's call it our MostRecentPerThread
select *
from forum_category AS c
LEFT JOIN
(
    -- ... that previous query, with t.category_id added to its select list ...
) as MostRecentPerThread
on c.id = MostRecentPerThread.category_id
Make sense? You're using subqueries as a way of logically breaking down your query into smaller components. You no longer have one gigantic query. You've got a small subquery that simply gets the timestamp of the most recent reply. You've got a small subquery that compares that first subquery to the threads table to get the most recent timestamp. And you've got a main query that uses the second subquery to merge it with the categories table.
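Stacked together, the layers described above might look like the following sketch, run through Python's sqlite3 module. The table and column names follow the question where they are visible (forum_category.id, forum_threads.category_id, forum_replies.thread_id); the name column, the integer timestamps, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forum_category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE forum_threads (id INTEGER PRIMARY KEY, category_id INTEGER,
                                title TEXT, timestamp INTEGER);
    CREATE TABLE forum_replies (id INTEGER PRIMARY KEY, thread_id INTEGER,
                                timestamp INTEGER);
    INSERT INTO forum_category VALUES (1, 'general'), (2, 'empty');
    INSERT INTO forum_threads VALUES (1, 1, 'hello', 100), (2, 1, 'news', 300);
    INSERT INTO forum_replies (thread_id, timestamp) VALUES (1, 200), (1, 250);
""")

latest = conn.execute("""
    SELECT c.name, mr.title, mr.MostRecentTimestamp
    FROM forum_category c
    LEFT JOIN (
        -- per thread: the newer of the thread itself and its latest reply
        SELECT t.id AS thread_id, t.category_id, t.title,
               CASE WHEN r.LatestReply IS NULL
                      OR t.timestamp > r.LatestReply
                    THEN t.timestamp
                    ELSE r.LatestReply
               END AS MostRecentTimestamp
        FROM forum_threads t
        LEFT JOIN (
            -- per thread: timestamp of the latest reply
            SELECT thread_id, MAX(timestamp) AS LatestReply
            FROM forum_replies
            GROUP BY thread_id
        ) r ON t.id = r.thread_id
    ) mr ON c.id = mr.category_id
    ORDER BY c.id, mr.thread_id
""").fetchall()

print(latest)
```

This returns one row per thread (plus a NULL-filled row for a category with no threads). Reducing it further to one row per category would take one more MAX() layer of the same kind.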

Best way to write this query?

I am doing a subquery join to another table because I wanted to be able to sort the results I got back from it. I only need the first row, but I need the rows ordered in a certain way so that I get the lowest id.
I tried adding LIMIT 1 to this but then the full query returned 0 results; so now it has no limit and in the EXPLAIN I have two rows showing they are using the full 10k+ rows of the auction_media table.
I wrote it this way to avoid having to query the auction_media table for each row separately, but now I'm thinking that this way isn't that great if it has to use the whole auction_media table?
Which way is better? The way I have it or querying the auction_media table separately? ...or is there a better way!?
Here is the code:
SELECT
a.auction_id,
a.name,
media.media_url
FROM
auctions AS a
LEFT JOIN users AS u ON u.user_id=a.owner_id
INNER JOIN ( SELECT media_id,media_url,auction_id
FROM auction_media
WHERE media_type=1
AND upload_in_progress=0
ORDER BY media_id ASC
) AS media
ON a.auction_id=media.auction_id
WHERE a.hpfeat=1
AND a.active=1
AND a.approved=1
AND a.closed=0
AND a.creation_in_progress=0
AND a.deleted=0
AND (a.list_in='auction' OR u.shop_active='1')
GROUP BY a.auction_id;
Edit: Through my testing, using the above query seems like it would be the much faster method overall; however I worry if that will still be the case when the auction_media table grows to like 1M rows or something.
edit: As stated in the comments - DISTINCT is not required because the auctions table can only be associated with (at most) one user table row and one row in the inner query.
You may want to try this. The outer query's GROUP BY is removed since you don't have any aggregate function (it was initially replaced with DISTINCT; see the edit above). The inner query was replaced by a query to find the smallest media_id per auction_id, then JOINed back to get the media_url. (Since I didn't know if the media_id and auction_id were a composite unique key, I used the same WHERE clause to help eliminate potential duplicates.)
SELECT
a.auction_id,
a.name,
auction_media.media_url
FROM auctions AS a
LEFT JOIN users AS u
ON u.user_id=a.owner_id
INNER JOIN (SELECT auction_id, MIN(media_id) AS media_id
FROM auction_media
WHERE media_type=1
AND upload_in_progress=0
GROUP BY auction_id) AS media
ON a.auction_id=media.auction_id
INNER JOIN auction_media
ON auction_media.media_id = media.media_id
AND auction_media.auction_id = media.auction_id
AND auction_media.media_type=1
AND auction_media.upload_in_progress=0
WHERE a.hpfeat=1
AND a.active=1
AND a.approved=1
AND a.closed=0
AND a.creation_in_progress=0
AND a.deleted=0
AND (a.list_in='auction' OR u.shop_active='1');
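A pared-down sketch of the MIN(media_id) join, run with Python's sqlite3 module. It leaves out the users join and the auction filters to keep the example short, and it assumes media_id is unique on its own (hence the single-column join back); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auctions (auction_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE auction_media (media_id INTEGER PRIMARY KEY,
                                auction_id INTEGER, media_url TEXT,
                                media_type INTEGER,
                                upload_in_progress INTEGER);
    INSERT INTO auctions VALUES (1, 'lamp'), (2, 'chair'), (3, 'desk');
    INSERT INTO auction_media VALUES
        (5, 1, 'b.jpg', 1, 0),
        (3, 1, 'a.jpg', 1, 0),
        (2, 1, 'x.mp4', 2, 0),   -- wrong media_type, ignored
        (7, 2, 'c.jpg', 1, 1),   -- upload still in progress, ignored
        (9, 3, 'd.jpg', 1, 0);
""")

first_media = conn.execute("""
    SELECT a.auction_id, a.name, am.media_url
    FROM auctions a
    JOIN (
        -- smallest qualifying media_id per auction
        SELECT auction_id, MIN(media_id) AS media_id
        FROM auction_media
        WHERE media_type = 1 AND upload_in_progress = 0
        GROUP BY auction_id
    ) media ON media.auction_id = a.auction_id
    -- join back to pick up the url of exactly that row
    JOIN auction_media am ON am.media_id = media.media_id
    ORDER BY a.auction_id
""").fetchall()

print(first_media)
```

Auction 2 has no finished upload, so the inner join drops it, matching the INNER JOIN in the query above.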

Select corresponding records from another table, but just the last one

I have 2 tables authors and authors_sales
The table authors_sales is updated each hour so is huge.
What I need is to create a ranking, for that I need to join both tables (authors has all the author data while authors_sales has just sales numbers)
How can I create a final table with the ranking of authors ordering it by sales?
The common key is the: authorId
I tried with LEFT JOIN but I must be doing something wrong because I get all the authors_sales table, not just the last.
Any tip in the right direction much appreciated
If you're looking for aggregate data of the sales, you'd want to join the tables, group by the authorId. Something like...
select authors.author_id, SUM(author_sales.sale_amt) as total_sales
from authors
inner join author_sales on author_sales.author_id = authors.author_id
group by authors.author_id
order by total_sales desc
However (I couldn't distinguish from your question whether the above scenario or next is true), if you're only looking for the max value of the author_sales table (if the data in this table is already aggregated), you can join on a nested query for author_sales, such as...
select authors.author_id, t.sale_amt
from authors
inner join author_sales t
    on t.author_id = authors.author_id
inner join
    /* latest some_identifier per author */
    (select author_id, max(some_identifier) as some_identifier
     from author_sales
     group by author_id) latest
    on latest.author_id = t.author_id
    and latest.some_identifier = t.some_identifier
order by t.sale_amt desc
The some_identifier would be how you determine which record is the most recent for author_sales, whether it is a timestamp of when it was inserted or an incremental primary key, however it is set up. Depending on if the data in author_sales is aggregated already, one of these two should do it for you...
select a.*, sum(b.sales)
from authors as a
inner join authors_sales as b
using (authorId)
group by b.authorId
order by sum(b.sales) desc;
/* assuming column sales = total for each row in authors_sales */
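If each authors_sales row is already a running total and only the most recent row per author should count, a ranking could be sketched as below with Python's sqlite3 module. The recorded_at column (standing in for whatever marks the most recent row), the name column, and the sample data are assumptions; authorId comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (authorId INTEGER PRIMARY KEY, name TEXT);
    -- recorded_at is a hypothetical "when was this row written" column
    CREATE TABLE authors_sales (authorId INTEGER, recorded_at TEXT,
                                sales INTEGER);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO authors_sales VALUES
        (1, '2020-01-01 10:00', 10),
        (1, '2020-01-01 11:00', 15),
        (2, '2020-01-01 10:00', 50),
        (2, '2020-01-01 11:00', 40);
""")

ranking = conn.execute("""
    SELECT a.name, s.sales
    FROM authors a
    JOIN authors_sales s ON s.authorId = a.authorId
    JOIN (
        -- newest row per author
        SELECT authorId, MAX(recorded_at) AS recorded_at
        FROM authors_sales
        GROUP BY authorId
    ) latest ON latest.authorId = s.authorId
            AND latest.recorded_at = s.recorded_at
    ORDER BY s.sales DESC
""").fetchall()

print(ranking)
```

Only the 11:00 rows are used, so bob ranks first with 40 even though an older row of his says 50. With an hourly growing table, an index on (authorId, recorded_at) would likely help the inner MAX() lookup.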