MySQL aggregate functions with LEFT JOIN

It's Friday afternoon* and my brain has stopped working. Normally it is me who's answering dumb SQL questions like this, sorry!
I am trying to get every row of one table, along with the highest value of a column from another table, by LEFT JOINing the latter to the former.
SELECT
jobs.*,
MAX(notes.`timestamp`) AS complete_date
FROM jobs
LEFT JOIN notes ON (jobs.id=notes.job_id)
WHERE (jobs.status="complete" OR jobs.status="closed")
AND (notes.type="complete" OR notes.type IS NULL)
GROUP BY jobs.id
ORDER BY complete_date ASC
I am trying to get all jobs that meet the WHERE jobs.… criteria, and if they have one, the timestamp of the latest type=complete note associated with that job:
| Job ID | Complete Date |
|---|---|
| 1 | today |
| 2 | NULL |
| 4 | yesterday |
Job 3 doesn't appear because it doesn't meet the jobs.status criteria. But what I actually get is:
| Job ID | Complete Date |
|---|---|
| 1 | today |
| 4 | yesterday |
Job 2 is missing, i.e. the JOIN is acting like an INNER JOIN.
I am sure it's just me having a brain-dead moment, but I can't see why my LEFT (OUTER) JOIN is not giving me all jobs regardless of the value of the note.
Specifically, users can delete notes, so a complete/closed job may not have a type=complete note on it (the notes are entered when the status is changed). I am trying to catch the case where a user closes a job, adds a note, then deletes the note.
* somewhere in the east

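Since you have the filter for the notes table in the WHERE clause, the JOIN is acting like an INNER JOIN. The notes.type IS NULL check only rescues jobs that have no notes at all; a job whose remaining notes all have some other type (presumably job 2) still produces joined rows with a non-NULL type, and the WHERE clause discards every one of them. Move the filter to the JOIN condition: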
SELECT
jobs.*,
MAX(notes.`timestamp`) AS complete_date
FROM jobs
LEFT JOIN notes
ON (jobs.id=notes.job_id)
AND (notes.type="complete" OR notes.type IS NULL)
WHERE (jobs.status="complete" OR jobs.status="closed")
GROUP BY jobs.id
ORDER BY complete_date ASC;
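To see the difference concretely, here is a minimal sketch that runs both versions against an in-memory database using Python's standard sqlite3 module. SQLite is only a stand-in for MySQL here (the LEFT JOIN versus WHERE behaviour is the same for this query), and the sample rows are hypothetical: job 2 is closed and its only remaining note has type 'comment'. Results are ordered by job id rather than by complete_date so the two outputs are easy to compare.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: the
# LEFT JOIN / WHERE semantics shown here carry over unchanged).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs  (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE notes (id INTEGER PRIMARY KEY, job_id INTEGER,
                    type TEXT, timestamp TEXT);
-- Hypothetical data: job 2 is closed, but its 'complete' note was deleted
-- and only a 'comment' note remains; job 3 is still open.
INSERT INTO jobs VALUES (1, 'complete'), (2, 'closed'), (3, 'open'), (4, 'complete');
INSERT INTO notes VALUES
    (1, 1, 'complete', '2024-05-10'),
    (2, 2, 'comment',  '2024-05-08'),
    (3, 4, 'complete', '2024-05-09');
""")

# Notes filter in WHERE: job 2's joined row has type 'comment' (not NULL),
# so the WHERE clause throws it away and job 2 vanishes.
where_rows = conn.execute("""
    SELECT jobs.id, MAX(notes.timestamp) AS complete_date
    FROM jobs
    LEFT JOIN notes ON jobs.id = notes.job_id
    WHERE jobs.status IN ('complete', 'closed')
      AND (notes.type = 'complete' OR notes.type IS NULL)
    GROUP BY jobs.id
    ORDER BY jobs.id
""").fetchall()

# Notes filter in ON: a job with no matching note keeps one row whose
# notes columns are all NULL, so MAX() yields NULL instead of dropping it.
on_rows = conn.execute("""
    SELECT jobs.id, MAX(notes.timestamp) AS complete_date
    FROM jobs
    LEFT JOIN notes ON jobs.id = notes.job_id
                   AND (notes.type = 'complete' OR notes.type IS NULL)
    WHERE jobs.status IN ('complete', 'closed')
    GROUP BY jobs.id
    ORDER BY jobs.id
""").fetchall()

print(where_rows)  # [(1, '2024-05-10'), (4, '2024-05-09')]
print(on_rows)     # [(1, '2024-05-10'), (2, None), (4, '2024-05-09')]
```

The only change between the two statements is where the notes condition lives; the jobs.status filter can safely stay in WHERE because it only touches the preserved (left) table.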
This could also be done using a subquery, so you apply the notes filter inside the subquery:
SELECT
jobs.*,
n.complete_date
FROM jobs
LEFT JOIN
(
select job_id, MAX(`timestamp`) AS complete_date
from notes
where (type="complete" OR type IS NULL)
group by job_id
) n
ON (jobs.id=n.job_id)
WHERE (jobs.status="complete" OR jobs.status="closed")
ORDER BY complete_date ASC
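A minimal sketch of the derived-table version, again using Python's sqlite3 module as a stand-in for MySQL with the same hypothetical rows as above (job 2 has only a 'comment' note). It keeps the answer's ORDER BY complete_date ASC, with jobs.id added only as a tie-breaker.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL, with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs  (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE notes (id INTEGER PRIMARY KEY, job_id INTEGER,
                    type TEXT, timestamp TEXT);
INSERT INTO jobs VALUES (1, 'complete'), (2, 'closed'), (3, 'open'), (4, 'complete');
INSERT INTO notes VALUES
    (1, 1, 'complete', '2024-05-10'),
    (2, 2, 'comment',  '2024-05-08'),
    (3, 4, 'complete', '2024-05-09');
""")

# The notes filter lives inside the derived table n, so the outer LEFT JOIN
# sees at most one pre-aggregated row per job and never filters jobs out.
rows = conn.execute("""
    SELECT jobs.id, n.complete_date
    FROM jobs
    LEFT JOIN (
        SELECT job_id, MAX(timestamp) AS complete_date
        FROM notes
        WHERE type = 'complete' OR type IS NULL
        GROUP BY job_id
    ) AS n ON jobs.id = n.job_id
    WHERE jobs.status IN ('complete', 'closed')
    ORDER BY n.complete_date ASC, jobs.id
""").fetchall()

# NULL sorts first under ASC in both SQLite and MySQL, so job 2 comes first.
print(rows)  # [(2, None), (4, '2024-05-09'), (1, '2024-05-10')]
```

Because the outer query no longer needs GROUP BY, selecting jobs.* alongside n.complete_date is unambiguous here.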

Related

MySQL View in place of subquery does not return the same result

The query below grabs some information about a category of toys and shows the most recent sale price for each of three levels of condition (e.g., Brand New, Used, Refurbished). The price for each sale is almost always different. One other thing: the sales table row IDs are not necessarily in chronological order (e.g., a toy with a sale ID of 5 could have been sold later than a toy with a sale ID of 10).
This query works but is not performant. It runs in a manageable amount of time, usually about 1s. However, I need to add yet another left join to include some more data, which causes the query time to balloon up to about 9s, no bueno.
Here is the working but nonperformant query:
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN (
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
) AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
But, like I said, it's slow. The sales table has about 200k rows.
What I tried to do was create the subquery as a view, e.g.,
CREATE VIEW sales_view AS
SELECT date_sold, sold_price, catalog_product_id, condition_id
FROM sales
WHERE invalid = 0 AND condition_id <= 3
ORDER BY date_sold DESC
Then replace the subquery with the view, like
SELECT b.brand_name, t.toy_id, t.toy_name, t.toy_number, tt.toy_type_name, cp.catalog_product_id, s.date_sold, s.condition_id, s.sold_price FROM brands AS b
LEFT JOIN toys AS t ON t.brand_id = b.brand_id
JOIN toy_types AS tt ON t.toy_type_id = tt.toy_type_id
LEFT JOIN catalog_products AS cp ON cp.toy_id = t.toy_id
LEFT JOIN toy_category AS tc ON tc.toy_category_id = t.toy_category_id
LEFT JOIN sales_view AS s ON s.catalog_product_id = cp.catalog_product_id
WHERE tc.toy_category_id = 1
GROUP BY t.toy_id, s.condition_id
ORDER BY t.toy_id ASC, s.condition_id ASC
Unfortunately, this change causes the query to no longer grab the most recent sale, and the sales price it returns is no longer the most recent.
Why doesn't the view return the same result as the identical SELECT used as a subquery?
After reading just about every top-n-per-group Stack Overflow question and blog article I could find, getting a query that actually worked was fantastic. But now that I need to extend the query one more step, I'm running into performance issues. If anybody wants to sidestep the above question and offer some ways to optimize the original query, I'm all ears!
Thanks for any and all help.
The solution to the subquery performance issue was to use the answer provided here: Groupwise maximum
I thought that this approach could only be used when querying a single table, but it works even when you've joined many other tables. You just have to LEFT JOIN the sales table a second time (as s2), matching on the grouping columns (catalog_product_id and condition_id) plus the s.date_sold < s2.date_sold condition, and make sure the WHERE clause looks for a NULL value in the second copy's id column.
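The groupwise-maximum self-join described above can be sketched as follows, using Python's sqlite3 module with an in-memory database as a stand-in for MySQL. It only covers the sales part of the query (the brand, toy and catalog joins would wrap around it), and the sale_id column, the sample rows and the exact join conditions are assumptions based on the question. Unlike the GROUP BY plus inner ORDER BY trick, it does not depend on row order: the MySQL manual states that for non-aggregated columns in a GROUP BY query the server may pick any value from each group, and that an ORDER BY cannot influence which one, which is a likely reason the view and the subquery disagree.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; hypothetical sales rows where the
# sale ids are deliberately NOT in date order (sale 10 is older than sale 5).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    catalog_product_id INTEGER,
    condition_id INTEGER,
    date_sold TEXT,
    sold_price REAL,
    invalid INTEGER
);
INSERT INTO sales VALUES
    (5,  100, 1, '2024-03-01', 20.0, 0),
    (10, 100, 1, '2024-01-15', 25.0, 0),
    (7,  100, 2, '2024-02-01', 12.0, 0),
    (8,  100, 2, '2024-02-20', 14.0, 1),  -- invalid, must be ignored
    (3,  200, 1, '2024-01-01', 30.0, 0);
""")

latest = conn.execute("""
    SELECT s.catalog_product_id, s.condition_id, s.date_sold, s.sold_price
    FROM sales AS s
    -- s2 looks for a newer valid sale in the same (product, condition) group
    LEFT JOIN sales AS s2
           ON s2.catalog_product_id = s.catalog_product_id
          AND s2.condition_id = s.condition_id
          AND s2.invalid = 0
          AND s.date_sold < s2.date_sold
    WHERE s.invalid = 0
      AND s.condition_id <= 3
      AND s2.sale_id IS NULL  -- no newer sale exists, so s is the latest
    ORDER BY s.catalog_product_id, s.condition_id
""").fetchall()

print(latest)
# [(100, 1, '2024-03-01', 20.0), (100, 2, '2024-02-01', 12.0), (200, 1, '2024-01-01', 30.0)]
```

If two valid sales in the same group share the same date_sold, both rows survive; extending the condition to (s.date_sold < s2.date_sold OR (s.date_sold = s2.date_sold AND s.sale_id < s2.sale_id)) would keep exactly one. An index on (catalog_product_id, condition_id, date_sold) is the usual companion to this pattern.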

Problem fetching all records of the first table when joining 3 tables

Tables are Service, Plan, Subscribe. The requirement is to get all the records in the Service table, ordered by subscription count (most subscriptions first). I am able to fetch the records that satisfy the join condition, but not the Service records that have no subscription.
Please note: every Service record has a Plan, but some Services don't have a Subscription.
**Service**
service_id
service_name
status
is_active
**Plan**
plan_id
service_id
plan_name
plan_cost
**Subscribe**
subscribe_id
plan_id
person_name
SELECT DISTINCT s.service_id FROM service s JOIN plan p on s.service_id=p.service_id join
subscribe su on p.plan_id=su.plan_id WHERE
(s.status='Published' AND s.is_active=1) GROUP BY su.plan_id
ORDER BY COUNT(su.subscribe_id) DESC
Could someone please look into this problem and help.
This should probably be a three-way LEFT JOIN between the tables, aggregating by service (not by plan) and counting subscriptions. Something like this:
SELECT s.*
FROM service s
LEFT JOIN plan p
ON s.service_id = p.service_id
LEFT JOIN subscribe su
ON p.plan_id = su.plan_id
GROUP BY
s.service_id
ORDER BY
COUNT(su.subscribe_id) DESC;
If you also want to select the subscription count, then you may add the COUNT(su.subscribe_id) term to the select clause. Note that aggregating by service_id alone, while selecting all columns from the service table, should be valid here assuming that service_id is a primary key in that table.
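A minimal sketch of this query, run through Python's sqlite3 module as a stand-in for MySQL. The rows are hypothetical (Gamma has a plan but no subscriptions), and it also applies the status/is_active filter from the question; that filter only touches the service table, so it does not turn the LEFT JOINs back into inner joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (service_id INTEGER PRIMARY KEY, service_name TEXT,
                      status TEXT, is_active INTEGER);
CREATE TABLE plan (plan_id INTEGER PRIMARY KEY, service_id INTEGER,
                   plan_name TEXT, plan_cost REAL);
CREATE TABLE subscribe (subscribe_id INTEGER PRIMARY KEY, plan_id INTEGER,
                        person_name TEXT);
-- Hypothetical data: Gamma has a plan but no subscriptions at all.
INSERT INTO service VALUES
    (1, 'Alpha', 'Published', 1),
    (2, 'Beta',  'Published', 1),
    (3, 'Gamma', 'Published', 1),
    (4, 'Delta', 'Draft',     1);
INSERT INTO plan VALUES
    (10, 1, 'Alpha basic', 5), (11, 1, 'Alpha pro', 10),
    (20, 2, 'Beta basic', 5), (30, 3, 'Gamma basic', 5), (40, 4, 'Delta', 1);
INSERT INTO subscribe VALUES
    (100, 10, 'Ann'), (101, 11, 'Bob'), (102, 11, 'Cy'), (103, 20, 'Dee');
""")

rows = conn.execute("""
    SELECT s.service_id, s.service_name,
           COUNT(su.subscribe_id) AS subscription_count  -- NULLs are not counted
    FROM service s
    LEFT JOIN plan p       ON s.service_id = p.service_id
    LEFT JOIN subscribe su ON p.plan_id = su.plan_id
    WHERE s.status = 'Published' AND s.is_active = 1
    GROUP BY s.service_id
    ORDER BY subscription_count DESC, s.service_id
""").fetchall()

print(rows)  # [(1, 'Alpha', 3), (2, 'Beta', 1), (3, 'Gamma', 0)]
```

Alpha's count adds up subscriptions across both of its plans, which is what grouping by service (rather than by plan) buys you.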
Well, the simplest answer is that you want a LEFT JOIN so that you get all records in service plus any matching records in subscribe.
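SELECT s.service_id
FROM service s
JOIN plan p on s.service_id=p.service_id
left join subscribe su on p.plan_id=su.plan_id
WHERE
(s.status='Published' AND s.is_active=1)
-- group by the service: su.plan_id is NULL for services without subscriptions
GROUP BY s.service_id
-- COUNT(column) skips NULLs, so services without subscriptions count as 0
ORDER BY COUNT(su.subscribe_id) DESC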
Also, you probably want to move the count into the SELECT list, like this:
select * from (
SELECT s.service_id, COUNT(su.subscribe_id) as [cnt]
FROM service s
JOIN plan p on s.service_id=p.service_id
left join subscribe su on p.plan_id=su.plan_id
WHERE
(s.status='Published' AND s.is_active=1)
GROUP BY s.service_id ) A
ORDER BY [cnt] DESC
I used a SQL Server convention ([cnt]) for column aliases. If you're using a different DBMS, change it accordingly (in MySQL, just write cnt). If you have cases where a service doesn't have a plan, change that JOIN to a LEFT JOIN as well.
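One pitfall worth spelling out: COUNT(column) already ignores NULLs, so a service with no subscriptions naturally gets 0. Replacing the NULL before counting (SQL Server's two-argument ISNULL(x, 0), or COALESCE(x, 0) in MySQL and standard SQL) turns the unmatched row into a countable value, so such a service reports 1 instead. A minimal sketch of the difference, using Python's sqlite3 module as a stand-in and hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (service_id INTEGER PRIMARY KEY);
CREATE TABLE plan (plan_id INTEGER PRIMARY KEY, service_id INTEGER);
CREATE TABLE subscribe (subscribe_id INTEGER PRIMARY KEY, plan_id INTEGER);
-- Hypothetical data: service 1 has two subscriptions, service 2 has none.
INSERT INTO service VALUES (1), (2);
INSERT INTO plan VALUES (10, 1), (20, 2);
INSERT INTO subscribe VALUES (100, 10), (101, 10);
""")

rows = conn.execute("""
    SELECT s.service_id,
           COUNT(su.subscribe_id)              AS cnt,           -- skips NULL
           COUNT(COALESCE(su.subscribe_id, 0)) AS cnt_coalesced  -- NULL became 0, so it is counted
    FROM service s
    JOIN plan p ON s.service_id = p.service_id
    LEFT JOIN subscribe su ON p.plan_id = su.plan_id
    GROUP BY s.service_id
    ORDER BY s.service_id
""").fetchall()

print(rows)  # [(1, 2, 2), (2, 0, 1)]
```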

MySQL query where max(time) is less than today

I have two tables: the first (job) stores the data and the second (job_location) stores the locations for each job. I'm trying to show the number of jobs whose job locations have a cutoff time earlier than today.
I use the DATETIME type for the date column.
Unfortunately, the numbers I get when testing the following code are wrong.
My code:
SELECT *
FROM `job`
left join job_location
on job_location.job_id = job.id
where job_location.cutoff_time < CURDATE()
group by job.id
Please help me write a working query.
I think you need to rephrase your query slightly: for each job, select a count of its locations whose cutoff time is earlier than the start of today.
SELECT
j.id,
COUNT(CASE WHEN jl.cutoff_time < CURDATE() THEN 1 END) AS cnt
FROM job j
LEFT JOIN job_location jl
ON j.id = jl.job_id
GROUP BY
j.id;
Note that the LEFT JOIN is important here because it means we won't drop any jobs that have no matching locations. Instead, those jobs still appear in the result set, just with a zero count.
As a note, you can simplify the count in MySQL, because a comparison such as jl.cutoff_time < CURDATE() evaluates to 1 or 0. And, assuming that all jobs have at least one location, you don't need a JOIN at all. So:
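A minimal sketch of the CASE-based count, using Python's sqlite3 module as a stand-in for MySQL. Two assumptions to flag: the rows are hypothetical (job 3 has no locations), and a fixed date string replaces CURDATE() so the output is reproducible; ISO-formatted text compares in date order in SQLite, which stands in for MySQL's DATETIME comparison here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (id INTEGER PRIMARY KEY);
CREATE TABLE job_location (id INTEGER PRIMARY KEY, job_id INTEGER,
                           cutoff_time TEXT);
-- Hypothetical data: job 3 has no locations at all.
INSERT INTO job VALUES (1), (2), (3);
INSERT INTO job_location VALUES
    (1, 1, '2024-01-05 10:00:00'),
    (2, 1, '2024-03-01 10:00:00'),
    (3, 2, '2024-02-20 10:00:00');
""")

today = "2024-02-01"  # fixed stand-in for CURDATE()

# CASE without ELSE yields NULL for non-matching rows, and COUNT skips NULLs,
# so only locations before "today" are counted; job 3 still shows up with 0.
rows = conn.execute("""
    SELECT j.id,
           COUNT(CASE WHEN jl.cutoff_time < :today THEN 1 END) AS cnt
    FROM job j
    LEFT JOIN job_location jl ON j.id = jl.job_id
    GROUP BY j.id
    ORDER BY j.id
""", {"today": today}).fetchall()

print(rows)  # [(1, 1), (2, 0), (3, 0)]
```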
SELECT jl.job_id, sum( jl.cutoff_time < CURDATE() )
FROM job_location jl
GROUP BY jl.job_id;
If this is not correct (and you need the JOIN), then the condition on the date should go in the ON clause:
SELECT j.id, COUNT(jl.job_id)
FROM job j LEFT JOIN
job_location jl
ON jl.job_id = j.id AND jl.cutoff_time < CURDATE()
GROUP BY j.id;
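The two simplified variants can be compared side by side in a minimal sketch, again with Python's sqlite3 module standing in for MySQL, hypothetical rows, and a fixed date in place of CURDATE(). SQLite, like MySQL, evaluates a comparison to 1 or 0, so SUM over it counts matches. The ON-clause variant groups by the job's own id, because jl.job_id is NULL for jobs with no qualifying location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (id INTEGER PRIMARY KEY);
CREATE TABLE job_location (id INTEGER PRIMARY KEY, job_id INTEGER,
                           cutoff_time TEXT);
-- Hypothetical data: job 3 has no locations at all.
INSERT INTO job VALUES (1), (2), (3);
INSERT INTO job_location VALUES
    (1, 1, '2024-01-05 10:00:00'),
    (2, 1, '2024-03-01 10:00:00'),
    (3, 2, '2024-02-20 10:00:00');
""")
today = "2024-02-01"  # fixed stand-in for CURDATE()

# Variant 1: no JOIN; the comparison is 1 or 0, so SUM counts matches.
# Jobs without any location (job 3) simply do not appear.
sum_rows = conn.execute("""
    SELECT jl.job_id, SUM(jl.cutoff_time < :today) AS cnt
    FROM job_location jl
    GROUP BY jl.job_id
    ORDER BY jl.job_id
""", {"today": today}).fetchall()

# Variant 2: date condition in ON, grouped by j.id,
# so every job is listed and unmatched jobs count 0.
on_rows = conn.execute("""
    SELECT j.id, COUNT(jl.job_id) AS cnt
    FROM job j
    LEFT JOIN job_location jl
           ON jl.job_id = j.id AND jl.cutoff_time < :today
    GROUP BY j.id
    ORDER BY j.id
""", {"today": today}).fetchall()

print(sum_rows)  # [(1, 1), (2, 0)]
print(on_rows)   # [(1, 1), (2, 0), (3, 0)]
```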

SQL check if thread timestamp is newer than reply timestamp in JOIN Statement

So I'm kinda new to SQL joins and am probably going full overkill here.
What I want to do is join my four tables together.
What I want to accomplish: I want all the information from the category table, matched to the reply with the newest timestamp, and then I want to join the t.title of the thread whose t.id matches r.thread_id.
SELECT c.*, t.id, t.title, r.timestamp, u.id, u.username
FROM forum_category AS c
LEFT JOIN forum_threads AS t ON (c.id = t.category_id)
LEFT JOIN forum_replies AS r ON (t.id = r.thread_id
AND r.timestamp =
(
SELECT timestamp
FROM forum_replies
ORDER BY timestamp DESC LIMIT 1
))
LEFT JOIN users AS u ON (r.user_id = u.id)
GROUP BY c.id
As it is now, this code seems to work, though I haven't tested it a lot.
However, I need to expand it to check whether t.timestamp is newer than the latest r.timestamp and, if so, JOIN the thread instead, with its t.title, t.timestamp and t.user_id.
In other words, I need to handle the case where a thread is newer than the latest reply.
I know I could make the first post a reply and solve it that way, but I'd rather not do that if it's possible to solve in the SQL statement.
Screenshots of the table layouts (forum_category, forum_threads, forum_replies) are on Imgur here:
https://imgur.com/a/nCn2a
One helpful technique is to use subqueries to break up the mental logic of what your query is trying to do. Basically, a subquery in the FROM clause takes the place of a regular table in the query.
So, first up, we need to get the most recent timestamp in the replies for each thread:
select thread_id, max(timestamp) as LatestReply
from forum_replies
group by thread_id
Let's call this our MostRecentThreadSubquery. So, it would let us do something like:
select * from
forum_threads t
LEFT JOIN
(
select thread_id, max(timestamp) as LatestReply
from forum_replies
group by thread_id
) as MostRecentThreadSubquery
on t.id = MostRecentThreadSubquery.thread_id
Make sense? We're no longer joining the forum_threads table against the forum_replies table - we've made a subquery to help us list the most recent reply for each thread id.
Now, we add an SQL CASE expression, to get something like:
select
t.id as thread_id,
t.category_id,
CASE WHEN MostRecentThreadSubquery.LatestReply IS NULL -- thread has no replies yet
OR t.timestamp > MostRecentThreadSubquery.LatestReply
THEN t.timestamp
ELSE MostRecentThreadSubquery.LatestReply
END as MostRecentTimestamp
from -- ... the rest of that earlier SQL statement
Okay, so now we've got a query that, for every thread_id, has the most recent timestamp, whether that's from the forum_replies or from the forum_threads table.
...and you guessed it: we're going to make it another subquery. Let's call it MostRecentPerThread:
select *
from forum_category AS c
LEFT JOIN
(
-- ... that previous query ...
) as MostRecentPerThread
on c.id = MostRecentPerThread.category_id
Make sense? You're using subqueries as a way of logically breaking down your query into smaller components. You no longer have one gigantic query. You've got a small subquery that simply gets the timestamp of the most recent reply. You've got a small subquery that compares that first subquery to the threads table to get the most recent timestamp. And you've got a main query that uses the second subquery to merge it with the categories table.
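Assembled into one statement, the pieces above might look like the following minimal sketch, run through Python's sqlite3 module as a stand-in for MySQL. The column names (forum_threads.id, category_id, timestamp; forum_replies.thread_id, timestamp) come from the question's SQL, while the rows are hypothetical: thread 2 has no replies and category 3 has no threads. Note the IS NULL branch in the CASE: without it, a thread with no replies would get a NULL timestamp, because comparing with NULL is never true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE forum_threads  (id INTEGER PRIMARY KEY, category_id INTEGER,
                             title TEXT, timestamp TEXT, user_id INTEGER);
CREATE TABLE forum_replies  (id INTEGER PRIMARY KEY, thread_id INTEGER,
                             timestamp TEXT, user_id INTEGER);
-- Hypothetical data: thread 2 has no replies, category 3 has no threads.
INSERT INTO forum_category VALUES (1, 'General'), (2, 'Help'), (3, 'Empty');
INSERT INTO forum_threads VALUES
    (1, 1, 'Hello',      '2024-01-01 10:00', 1),
    (2, 1, 'New thread', '2024-01-05 09:00', 2),
    (3, 2, 'Question',   '2024-01-02 08:00', 1);
INSERT INTO forum_replies VALUES
    (1, 1, '2024-01-03 12:00', 2),
    (2, 1, '2024-01-04 12:00', 3),
    (3, 3, '2024-01-02 09:00', 2);
""")

rows = conn.execute("""
    SELECT c.id, MostRecentPerThread.thread_id,
           MostRecentPerThread.MostRecentTimestamp
    FROM forum_category AS c
    LEFT JOIN (
        -- Per thread: the newer of the thread's own timestamp and its latest reply
        SELECT t.id AS thread_id,
               t.category_id,
               CASE WHEN MostRecentThreadSubquery.LatestReply IS NULL
                      OR t.timestamp > MostRecentThreadSubquery.LatestReply
                    THEN t.timestamp
                    ELSE MostRecentThreadSubquery.LatestReply
               END AS MostRecentTimestamp
        FROM forum_threads AS t
        LEFT JOIN (
            -- Per thread: timestamp of the most recent reply
            SELECT thread_id, MAX(timestamp) AS LatestReply
            FROM forum_replies
            GROUP BY thread_id
        ) AS MostRecentThreadSubquery
          ON t.id = MostRecentThreadSubquery.thread_id
    ) AS MostRecentPerThread
      ON c.id = MostRecentPerThread.category_id
    ORDER BY c.id, MostRecentPerThread.thread_id
""").fetchall()

print(rows)
# [(1, 1, '2024-01-04 12:00'), (1, 2, '2024-01-05 09:00'), (2, 3, '2024-01-02 09:00'), (3, None, None)]
```

This still returns one row per thread; picking a single newest thread per category would need one more groupwise-maximum step on top of MostRecentPerThread.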

SQL Join with data associated to dates

Currently I have a simple SQL query to get all group departure dates and the associated group size (teamLength) between two dates, but it doesn't work properly.
SELECT `groups`.`departure`, COUNT(`group_users`.`group_id`) as 'teamLength'
FROM `groups`
INNER JOIN `group_users`
ON `groups`.`id` = `group_users`.`group_id`
WHERE departure BETWEEN '2017-03-01' AND '2017-03-31'
In fact, if I have more than one group between the two dates, only one date is returned, together with the total teamLength of all those groups.
For example, if I have two groups in the same interval, group 1 with 2 people and group 2 with 1 person, the result is a single row: one departure date with a teamLength of 3.
Here are 2 screenshots of the current state of my groups and group_users tables:
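Is it even possible to do what I want in a single SQL query? Thanks
In addition to what jarlh commented (JOIN with ON): don't ever aggregate data without an explicit GROUP BY. I don't know why MySQL still allows this (it is rejected only when the ONLY_FULL_GROUP_BY SQL mode is enabled, which has been the default since MySQL 5.7.5)...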
Change your query to something like this and you should get the result you are looking for. Currently, the other departure dates get lost in the aggregation.
-- `groups` is a reserved word as of MySQL 8.0.2, so it needs backticks
SELECT
`groups`.departure,
COUNT(1) as team_length
FROM
`groups`
INNER JOIN group_users
ON `groups`.id = group_users.group_id
WHERE
`groups`.departure BETWEEN '2017-03-01' AND '2017-03-31'
GROUP BY
`groups`.departure
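A minimal sketch of why the GROUP BY matters, using Python's sqlite3 module with an in-memory database as a stand-in for MySQL. The group_users columns id and user_id and all rows are hypothetical, mirroring the question's example of two March groups with 2 and 1 people plus one group outside the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "groups" is quoted: GROUPS is reserved in MySQL 8.0.2+ and a keyword in newer SQLite.
CREATE TABLE "groups" (id INTEGER PRIMARY KEY, departure TEXT);
CREATE TABLE group_users (id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER);
-- Hypothetical data: group 1 has 2 people, group 2 has 1, group 3 is outside March.
INSERT INTO "groups" VALUES (1, '2017-03-10'), (2, '2017-03-20'), (3, '2017-04-02');
INSERT INTO group_users VALUES (1, 1, 11), (2, 1, 12), (3, 2, 13), (4, 3, 14);
""")

# Without GROUP BY the whole filtered join collapses into a single row;
# the departure shown is an arbitrary one from the matching rows.
no_group = conn.execute("""
    SELECT g.departure, COUNT(1) AS team_length
    FROM "groups" AS g
    INNER JOIN group_users AS gu ON g.id = gu.group_id
    WHERE g.departure BETWEEN '2017-03-01' AND '2017-03-31'
""").fetchall()

# With GROUP BY there is one row per departure date.
grouped = conn.execute("""
    SELECT g.departure, COUNT(1) AS team_length
    FROM "groups" AS g
    INNER JOIN group_users AS gu ON g.id = gu.group_id
    WHERE g.departure BETWEEN '2017-03-01' AND '2017-03-31'
    GROUP BY g.departure
    ORDER BY g.departure
""").fetchall()

print(len(no_group), no_group[0][1])  # 1 3
print(grouped)  # [('2017-03-10', 2), ('2017-03-20', 1)]
```

Grouping by departure merges two groups that happen to leave on the same date; grouping by the group's id (and selecting its departure) would keep them apart.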
I think that you have a syntax issue in your query. You are missing the ON clause, so your database could be computing a Cartesian product, since there is no join condition.
SELECT `groups`.`departure`, COUNT(`group_users`.`id`) as 'teamLength'
FROM `groups`
INNER JOIN `group_users` ON `groups`.`id` = `group_users`.`group_id`
WHERE departure BETWEEN '2017-03-01' AND '2017-03-31'
GROUP BY `groups`.`departure`
You are also missing the GROUP BY clause, which is not mandatory in every RDBMS, but it is good practice to include it.