MySQL Statement with multi nested JOINs and Distinct Limited Ordering - mysql

I'm attempting to build a list of results based on three joins.
I have created a table of leads. As my sales team takes action on the leads, they attach event note records to them. One lead can have many notes. Each note has a timestamp and also a date/time field where they can set a future date in order to schedule callbacks and appointments.
I have no trouble building the list with all my leads associated with their respective event notes, but what I want to do in this particular case is query a smaller list of leads, each associated with only the event note containing the "newest"/highest value in the date_time column.
I've been digging about, especially here on Stack, for the last couple of days attempting to get the desired result from my statements. I get either all of the lead records with all of their associated event note records, or I get just one record, no matter whether I use (GROUP BY date_time ASC LIMIT 1) or (ORDER BY date_time ASC LIMIT 1). I've even tried to build a view with only the highest scheduled record for each lead.id.
SELECT
rr_leads.id AS 'Lead',
rr_leads.first,
rr_leads.last,
rr_leads.company,
rr_leads.phone,
rr_leads.email,
rr_leads.city,
rr_leads.zip,
rr_leads.status,
z.noteid,
z.taskid,
z.scheduled,
z.event
FROM rr_leads
LEFT JOIN
(
SELECT
rr_lead_notes.lead_id,
rr_lead_notes.id AS 'noteid',
rr_lead_tasks.id AS 'taskid',
rr_lead_notes.date_time AS 'scheduled',
rr_lead_notes.task_note,
rr_lead_tasks.task_step AS 'event'
FROM rr_lead_notes
LEFT JOIN rr_lead_tasks
ON rr_lead_notes.task_note = rr_lead_tasks.task_step
AND rr_lead_notes.id IS NOT NULL
AND rr_lead_notes.task_note IS NOT NULL
GROUP BY rr_lead_notes.id DESC
) z
ON rr_leads.id = z.lead_id
WHERE rr_leads.id IS NOT NULL
AND z.noteid IS NOT NULL
ORDER BY rr_leads.id DESC

Here is the general idea of getting data associated with a most recent event. You can adjust for your particular situation.
select yourfields
from table1 join othertables etc
join
(select id, max(time_stamp) maxts
from table1
where whatever
group by id) temp on table1.id = temp.id
and table1.time_stamp = maxts
where whatever
Make sure the where clauses in your main query and subquery are the same.
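As a rough illustration of that pattern applied to leads and notes, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server, and the table names (leads, lead_notes), columns, and sample rows are simplified assumptions rather than the asker's real schema.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema and data are hypothetical,
# loosely modelled on the rr_leads / rr_lead_notes tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (id INTEGER PRIMARY KEY, company TEXT);
CREATE TABLE lead_notes (
    id INTEGER PRIMARY KEY,
    lead_id INTEGER,
    date_time TEXT,
    task_note TEXT
);
INSERT INTO leads VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO lead_notes VALUES
    (10, 1, '2015-01-05 09:00:00', 'call back'),
    (11, 1, '2015-02-01 14:30:00', 'appointment'),
    (12, 2, '2015-01-20 11:00:00', 'call back');
""")

# The derived table finds the highest date_time per lead; joining back on
# both lead_id and date_time keeps only the note that carries that value.
LATEST_NOTE_PER_LEAD = """
SELECT l.id, l.company, n.id AS noteid, n.date_time AS scheduled, n.task_note
FROM leads AS l
JOIN lead_notes AS n ON n.lead_id = l.id
JOIN (
    SELECT lead_id, MAX(date_time) AS maxts
    FROM lead_notes
    GROUP BY lead_id
) AS latest
    ON latest.lead_id = n.lead_id
   AND latest.maxts = n.date_time
ORDER BY l.id DESC
"""

latest_rows = conn.execute(LATEST_NOTE_PER_LEAD).fetchall()
for row in latest_rows:
    print(row)
```

If two notes for the same lead share the same maximum date_time, this approach returns both of them, so a tie-breaker (for example the note id) may be needed.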

Related

How to select max date with other columns grouped by only one specific column

I want to select the latest assignee for each order before PRODUCTION_TASK_COMPLETED (or the current one if the production task is not completed)
I can't find a way to get the latest assignee. As soon as the assignee changes I get an extra row, since I have to put all the non-aggregated columns in the GROUP BY clause.
Here is something close to what I want in MySQL (I need it in jOOQ, but I'll rewrite it later):
SELECT
t_user.first_name,
t_user.last_name,
assigneehistory1.order_id,
MAX(assigneehistory1.entry_date)
FROM
t_entity_history_entry AS assigneehistory1
LEFT OUTER JOIN
t_user ON t_user.id = assigneehistory1.assignee_id
WHERE
(assigneehistory1.entry_type = 'ORDER_ASSIGNED'
AND assigneehistory1.entry_date <= COALESCE((SELECT
MAX(assigneehistory2.entry_date)
FROM
t_entity_history_entry AS assigneehistory2
WHERE
(assigneehistory2.entry_type = 'PRODUCTION_TASK_COMPLETED'
AND assigneehistory2.order_id = assigneehistory1.order_id)),
NOW()))
GROUP BY t_user.first_name , t_user.last_name , assigneehistory1.order_id
But as I said, this returns multiple rows for the same order for each different assignee, whereas I just want the latest one.

Incorrect ordering on query with group by clause

So I have the following query:
SELECT sensor.id as `sensor_id`,
sensor_reading.id as `reading_id`,
sensor_reading.reading as `reading`,
from_unixtime(sensor_reading.reading_timestamp) as `reading_timestamp`,
sensor_reading.lower_threshold as `lower_threshold`,
sensor_reading.upper_threshold as `upper_threshold`,
sensor_type.units as `unit`
FROM sensor
LEFT JOIN sensor_reading ON sensor_reading.sensor_id = sensor.id
LEFT JOIN sensor_type ON sensor.sensor_type_id = sensor_type.id
WHERE sensor.company_id = 1
GROUP BY sensor_reading.sensor_id
ORDER BY sensor_reading.reading_timestamp DESC
There are three tables in play here. A sensor_type table, which is just used for a single display field (units), a sensor table, which contains information on a sensor, and a sensor_reading table, which contains the individual readings for a sensor. There are multiple readings which apply to a single sensor, and so each entry in the sensor_reading table has a sensor_id which is linked to the ID field in the sensor table with a foreign key constraint.
In theory, this query should return the most recent sensor_reading for EACH unique sensor. Instead, it's returning the first reading for each sensor. I've seen a few posts on here with similar issues, but haven't been able to resolve this using any of their answers. Ideally, the query needs to be as efficient as possible, as this table has several thousand readings (and continues to grow).
Does anyone know how I might change this query to return the most recent reading? If I remove the GROUP BY clause, it returns the right order, but I then have to sift through the data to get the most recent for each sensor.
Ideally, I don't want to run sub-queries as this slows things down a lot, and speed is a big factor here.
Thanks!
In theory, this query should return the most recent sensor_reading for EACH unique sensor.
This is a fairly common misconception about the MySQL GROUP BY extension that allows you to select non-aggregated columns that are not contained in the GROUP BY clause. What the documentation states is:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause
So since you are grouping by sensor_reading.sensor_id, MySQL will choose any row from sensor_reading for each sensor_id, and only after choosing one row for each sensor_id will it apply the ordering to the chosen rows.
Since you only want the latest row for each sensor, the general approach would be:
SELECT *
FROM sensor_reading AS sr
WHERE NOT EXISTS
( SELECT 1
FROM sensor_reading AS sr2
WHERE sr2.sensor_id = sr.sensor_id
AND sr2.reading_timestamp > sr.reading_timestamp
);
However, MySQL has historically optimised LEFT JOIN/IS NULL better than NOT EXISTS, so a MySQL-specific solution would be:
SELECT sr.*
FROM sensor_reading AS sr
LEFT JOIN sensor_reading AS sr2
ON sr2.sensor_id = sr.sensor_id
AND sr2.reading_timestamp > sr.reading_timestamp
WHERE sr2.id IS NULL;
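To check that the two forms select the same rows, here is a small sketch that runs both against sample data. It uses Python's sqlite3 module in place of MySQL purely so it is runnable as-is, and the four sample readings are made up. It says nothing about relative performance on MySQL.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the readings below are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensor_reading (
    id INTEGER PRIMARY KEY,
    sensor_id INTEGER,
    reading REAL,
    reading_timestamp INTEGER
);
INSERT INTO sensor_reading VALUES
    (1, 1, 20.5, 1000),
    (2, 1, 21.0, 2000),
    (3, 2, 15.0, 1500),
    (4, 2, 14.2, 1200);
""")

# Keep a row only if no later reading exists for the same sensor.
NOT_EXISTS_SQL = """
SELECT sr.id, sr.sensor_id, sr.reading_timestamp
FROM sensor_reading AS sr
WHERE NOT EXISTS (
    SELECT 1
    FROM sensor_reading AS sr2
    WHERE sr2.sensor_id = sr.sensor_id
      AND sr2.reading_timestamp > sr.reading_timestamp
)
ORDER BY sr.sensor_id
"""

# Same idea as an anti-join: rows with no later partner have sr2.id NULL.
LEFT_JOIN_SQL = """
SELECT sr.id, sr.sensor_id, sr.reading_timestamp
FROM sensor_reading AS sr
LEFT JOIN sensor_reading AS sr2
    ON sr2.sensor_id = sr.sensor_id
   AND sr2.reading_timestamp > sr.reading_timestamp
WHERE sr2.id IS NULL
ORDER BY sr.sensor_id
"""

not_exists_rows = conn.execute(NOT_EXISTS_SQL).fetchall()
left_join_rows = conn.execute(LEFT_JOIN_SQL).fetchall()
print(not_exists_rows)
print(left_join_rows)
```

Both queries should yield one row per sensor, the one with the highest reading_timestamp. An index on (sensor_id, reading_timestamp) is the usual suggestion for this kind of self-join, though whether it helps depends on the data.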
So incorporating this into your query, you would end up with:
SELECT sensor.id as `sensor_id`,
sensor_reading.id as `reading_id`,
sensor_reading.reading as `reading`,
from_unixtime(sensor_reading.reading_timestamp) as `reading_timestamp`,
sensor_reading.lower_threshold as `lower_threshold`,
sensor_reading.upper_threshold as `upper_threshold`,
sensor_type.units as `unit`
FROM sensor
LEFT JOIN sensor_reading
ON sensor_reading.sensor_id = sensor.id
LEFT JOIN sensor_type
ON sensor.sensor_type_id = sensor_type.id
LEFT JOIN sensor_reading AS sr2
ON sr2.sensor_id = sensor_reading.sensor_id
AND sr2.reading_timestamp > sensor_reading.reading_timestamp
WHERE sensor.company_id = 1
AND sr2.id IS NULL
ORDER BY sensor_reading.reading_timestamp DESC;
An alternative method for getting the maximum per group is to inner join back to the latest row, so something like:
SELECT sr.*
FROM sensor_reading AS sr
INNER JOIN
( SELECT sensor_id, MAX(reading_timestamp) AS reading_timestamp
FROM sensor_reading
GROUP BY sensor_id
) AS sr2
ON sr2.sensor_id = sr.sensor_id
AND sr2.reading_timestamp = sr.reading_timestamp;
You may find that this is more efficient than the other method, or you may not, YMMV. It basically depends on your data and indexes, and as you have said, subqueries can be an issue in MySQL because the full result is materialised initially.

join 2 tables and sort using max value of one of them

I have a talks table and a talk_items table.
A talk has is_performed, created_at and family_id fields.
A talk_item has a created_at field.
(both have more fields but they can be ignored for the question)
A talk can have talk_items and can have none.
I need to sort talks first by the is_performed field (talks that were not performed should come first), then by the date of the latest talk_item if there is one, and otherwise by the creation date of the talk itself.
I have this query:
SELECT talks.*,
IFNULL(talk_items.created_at, talks.created_at) AS
sorting_date
FROM `talks`
LEFT JOIN talk_items
ON talk_items.talk_id = talks.id
WHERE talks.family_id = 35536
ORDER BY talks.is_performed ASC,
sorting_date DESC
But this query gives me multiple talks if a talk has more than one talk_item.
How can I get the DISTINCT talk and the date of the most recent talk_item of that talk (the id of the talk_item can be used for that)?
Database is MySQL.
Thx.
Doesn't adding a
group by talks.id
solve this?
Otherwise, reduce talk_items to the latest created_at per talk before joining, something like this:
SELECT talks.*,
IFNULL(items.created_at, talks.created_at) AS
sorting_date
FROM `talks`
LEFT JOIN
(SELECT talk_id, MAX(created_at) AS created_at FROM talk_items GROUP BY talk_id) AS items
ON items.talk_id = talks.id
WHERE talks.family_id = 35536
ORDER BY talks.is_performed ASC,
sorting_date DESC
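A minimal sketch of the sorting requirement, assuming the derived table aggregates talk_items to MAX(created_at) per talk so that each talk joins to at most one row. SQLite (via Python's sqlite3) stands in for MySQL here, and the sample talks and dates are invented.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are hypothetical test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE talks (
    id INTEGER PRIMARY KEY,
    family_id INTEGER,
    is_performed INTEGER,
    created_at TEXT
);
CREATE TABLE talk_items (
    id INTEGER PRIMARY KEY,
    talk_id INTEGER,
    created_at TEXT
);
INSERT INTO talks VALUES
    (1, 35536, 0, '2016-01-01'),
    (2, 35536, 0, '2016-03-01'),
    (3, 35536, 1, '2016-04-01');
INSERT INTO talk_items VALUES
    (1, 1, '2016-02-01'),
    (2, 1, '2016-05-01'),
    (3, 3, '2016-04-15');
""")

# One row per talk: the subquery collapses talk_items to its latest date,
# and IFNULL falls back to the talk's own created_at when it has no items.
SORTED_TALKS_SQL = """
SELECT talks.id,
       talks.is_performed,
       IFNULL(items.latest_item_at, talks.created_at) AS sorting_date
FROM talks
LEFT JOIN (
    SELECT talk_id, MAX(created_at) AS latest_item_at
    FROM talk_items
    GROUP BY talk_id
) AS items ON items.talk_id = talks.id
WHERE talks.family_id = 35536
ORDER BY talks.is_performed ASC, sorting_date DESC
"""

sorted_talks = conn.execute(SORTED_TALKS_SQL).fetchall()
for row in sorted_talks:
    print(row)
```

Talk 2 has no items, so it sorts by its own creation date, while talk 1 sorts by its most recent item.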

Complex SQL query

I need to write a query which is a little complicated for me to put together. The basic idea is to match a couple of fields from different tables and then edit another table based on the result.
There are three tables involved:
Schedules: sch_id, date, schedule, event_id
Link_Location_Schedules: id, loc_id, sch_id
Link_Location_Events: id, loc_id, event_id
Now what I need to try and do is:
Find schedules that are set after today's date in "Schedules".
For these schedules, get location ids from Link_Location_Events where the event_id equals the schedule's event_id.
For each pair of matched schedule (sch_id) and returned location (loc_id), check whether the pair already exists in Link_Location_Schedules, and if not, insert it.
Here are some SQL queries I have written for the above; I just need to combine them somehow:
SELECT sch_id FROM 'Schedules' WHERE DATE_FORMAT(sports_schedule_insert_date_time, "%Y-%m-%d") >= '2012-11-14';
SELECT loc_id from Link_Location_Events, Schedules WHERE Link_Location_Events.event_id = Schedules.event_id;
sounds like a simple insert from select statement...
insert into Link_Location_Schedules
( loc_id,
sch_id )
select
PreQuery.loc_id,
PreQuery.sch_id
from
( select
s.sch_id,
lle.loc_id
from
Schedules s
join Link_Location_Events lle
on s.event_id = lle.event_id ) PreQuery
LEFT JOIN Link_Location_Schedules lls
on PreQuery.loc_id = lls.loc_id
and PreQuery.sch_id = lls.sch_id
where
lls.loc_id is null
The innermost prequery is to get all possible schedule / location IDs. From that, left-join to the existing location/schedules on those found. Then, the WHERE clause will return only those where NO MATCH WAS FOUND (thus the lls.loc_id is null). Take that result and insert directly into the schedule / location table.
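The insert-from-select idea can be tried out with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL, and it adds a date filter on Schedules.date for the "after today's date" step, which the query above leaves out. The choice of that column, the cutoff value, and the sample rows are all assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; data below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Schedules (sch_id INTEGER PRIMARY KEY, date TEXT, event_id INTEGER);
CREATE TABLE Link_Location_Events (
    id INTEGER PRIMARY KEY, loc_id INTEGER, event_id INTEGER);
CREATE TABLE Link_Location_Schedules (
    id INTEGER PRIMARY KEY, loc_id INTEGER, sch_id INTEGER);
INSERT INTO Schedules VALUES
    (1, '2012-11-01', 100),
    (2, '2012-11-20', 100),
    (3, '2012-12-05', 200);
INSERT INTO Link_Location_Events (loc_id, event_id) VALUES
    (7, 100), (8, 100), (9, 200);
INSERT INTO Link_Location_Schedules (loc_id, sch_id) VALUES (7, 2);
""")

# Candidate pairs come from the inner query; the LEFT JOIN ... IS NULL
# filter keeps only pairs that are not already linked.
INSERT_MISSING_LINKS = """
INSERT INTO Link_Location_Schedules (loc_id, sch_id)
SELECT pre.loc_id, pre.sch_id
FROM (
    SELECT s.sch_id, lle.loc_id
    FROM Schedules AS s
    JOIN Link_Location_Events AS lle ON lle.event_id = s.event_id
    WHERE s.date >= ?          -- assumed cutoff column and value
) AS pre
LEFT JOIN Link_Location_Schedules AS lls
    ON lls.loc_id = pre.loc_id
   AND lls.sch_id = pre.sch_id
WHERE lls.loc_id IS NULL
"""

cutoff = "2012-11-14"
first_run = conn.execute(INSERT_MISSING_LINKS, (cutoff,)).rowcount
second_run = conn.execute(INSERT_MISSING_LINKS, (cutoff,)).rowcount
links = conn.execute(
    "SELECT loc_id, sch_id FROM Link_Location_Schedules ORDER BY sch_id, loc_id"
).fetchall()
print(first_run, second_run, links)
```

Running the statement a second time inserts nothing, because every candidate pair now finds a match in Link_Location_Schedules.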

SQL select with inner join, sub select and limit

I've been working with this SQL problem for about 2 days now and suspect I'm very close to resolving the issue but just can't seem to find a solution that completely works.
What I'm attempting to do is a selective join on two tables called application_info and application_status that are used to store information about open access journal article funding requests.
application_info has general information about the applicant and uses an auto indexing field called Application_ID as a key field. application_status is used to track the ongoing information about the status of the application (received, under review, funded, denied, withdrawn, etc.) as well as status of the journal article (submitted, accepted, resubmitted, published or rejected) and contains both an Application_ID field and an auto indexing field called Status_ID along with a status text and status date field.
Because we want to keep a running log of application, article, and funding status changes we don't want to overwrite existing rows in the application_status with updated values, but instead want to only show the most recent status values. Because an application will eventually have more than one status change this creates a need to apply some sort of limit on the inner join of the status data to the application data so that only one row is returned for each application ID.
Here's an example of what I am attempting to do in a query that currently throws an error:
-- simplified example
SELECT
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
SELECT
Status_ID,
Application_ID,
Status_State_Text,
Status_State_Date,
Status_State_InitiatedBy,
Status_State_ChangebBy,
Status_State_Notes
FROM application_status
WHERE Status_State_Text LIKE 'Article Status%'
AND Application_ID = application_info.Application_ID -- how to pass the current application_info.Application_ID from the ON clause to here?
-- and Application_ID = 29 -- this would be an option for specific IDs, but not an option for getting a complete list of application IDs with status
-- GROUP BY Application_ID -- reduces the sub query to 1 row (Yeah!) but returns the first row encountered before the ORDER BY comes into play
ORDER BY Status_ID DESC
-- a GROUP BY after the ORDER BY might resolve the issue if we could do a sort first
LIMIT 1 -- only want to get the first (most recent) row, only works correctly if passing an Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- need to get all IDs with status values as well as for specific ID requests
;
Eliminating the AND Application_ID = application_info.Application_ID portion of the sub query along with the LIMIT causes the select to work, but returns a row for every status for a given application ID. I've tried using MIN/MAX operators but have noticed that, when they work, they return unpredictable rows from the application_status table.
I've also attempted to do sub selects in the ON section of the join, but don't know how to make that work because the end result would always need to return an Application_ID (can both Application_ID and Status_ID be returned and used?).
Any hints on how to get this to work as I'm intending? Can this even be done?
Further edit: working query below. The key was to move the sub query in the join one level deeper and then return just a single status ID.
-- simplified example (now working)
SELECT
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
SELECT
Status_ID,
Application_ID,
Status_State_Text,
Status_State_Date,
Status_State_InitiatedBy,
Status_State_ChangebBy,
Status_State_Notes
FROM application_status AS artstatus_int
WHERE
-- sub query moved one level deeper so current join Application_ID can be passed
-- order by and limit can now be used
Status_ID = (
SELECT status_ID FROM application_status WHERE Application_ID = artstatus_int.Application_ID
AND status_State_Text LIKE 'Article Status%'
ORDER BY Status_ID DESC
LIMIT 1
)
ORDER BY Application_ID, Status_ID DESC
-- no need for GROUP BY or LIMIT here because only one row is returned per Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- works for specific application ID as well
-- more LEFT JOINS follow
;
You can't have a correlated subquery in the from clause.
Try this idea instead:
select <whatever>
from (select a.*,
(select max(s.status_id)
from application_status s
where s.application_id = a.application_id
) as maxstatusid
from application_info a
) a left outer join
application_status aps
on aps.status_id = a.maxstatusid
. . .
That is, put the correlated subquery in the select clause to get the most recent status. Then join this in to the status table to get other information. And, finish the query with other details.
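Here is a runnable sketch of that idea, again with Python's sqlite3 module standing in for MySQL. The Applicant column and the sample rows are hypothetical, and the 'Article Status%' filter from the question has been added inside the correlated subquery as an assumption about what the final query would need.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the Applicant column and all rows
# are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application_info (
    Application_ID INTEGER PRIMARY KEY,
    Applicant TEXT
);
CREATE TABLE application_status (
    Status_ID INTEGER PRIMARY KEY,
    Application_ID INTEGER,
    Status_State_Text TEXT,
    Status_State_Date TEXT
);
INSERT INTO application_info VALUES (29, 'Smith'), (30, 'Jones'), (31, 'Lee');
INSERT INTO application_status VALUES
    (1, 29, 'Article Status: submitted', '2013-01-10'),
    (2, 29, 'Application Status: received', '2013-01-11'),
    (3, 29, 'Article Status: accepted', '2013-02-15'),
    (4, 30, 'Article Status: submitted', '2013-03-01'),
    (5, 31, 'Application Status: received', '2013-03-02');
""")

# The correlated subquery lives in the SELECT list of the derived table,
# where it can see the current application row; the outer LEFT JOIN then
# pulls in the remaining columns of that single status row.
LATEST_ARTICLE_STATUS_SQL = """
SELECT a.Application_ID, a.Applicant, aps.Status_ID, aps.Status_State_Text
FROM (
    SELECT ai.*,
           (SELECT MAX(s.Status_ID)
            FROM application_status AS s
            WHERE s.Application_ID = ai.Application_ID
              AND s.Status_State_Text LIKE 'Article Status%') AS maxstatusid
    FROM application_info AS ai
) AS a
LEFT JOIN application_status AS aps ON aps.Status_ID = a.maxstatusid
ORDER BY a.Application_ID
"""

status_rows = conn.execute(LATEST_ARTICLE_STATUS_SQL).fetchall()
for row in status_rows:
    print(row)
```

Application 31 has no article status at all, so the LEFT JOIN keeps it with NULL status columns instead of dropping it.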
You seem pretty adept at your SQL skills, so it doesn't seem necessary to rewrite the whole query for you.