SQL select with inner join, sub select and limit - mysql

I've been working with this SQL problem for about 2 days now and suspect I'm very close to resolving the issue but just can't seem to find a solution that completely works.
What I'm attempting to do is a selective join on two tables called application_info and application_status that are used to store information about open access journal article funding requests.
application_info has general information about the applicant and uses an auto-incrementing field called Application_ID as its key. application_status tracks the ongoing status of the application (received, under review, funded, denied, withdrawn, etc.) as well as the status of the journal article (submitted, accepted, resubmitted, published or rejected). It contains an Application_ID field, an auto-incrementing Status_ID field, and status text and status date fields.
Because we want to keep a running log of application, article, and funding status changes, we don't want to overwrite existing rows in application_status with updated values; instead, we want to show only the most recent status values. Because an application will eventually have more than one status change, the join of the status data to the application data needs some sort of limit so that only one row is returned for each application ID.
Here's an example of what I am attempting to do in a query that currently throws an error:
-- simplified example
SELECT
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
SELECT
Status_ID,
Application_ID,
Status_State_Text,
Status_State_Date,
Status_State_InitiatedBy,
Status_State_ChangebBy,
Status_State_Notes
FROM application_status
WHERE Status_State_Text LIKE 'Article Status%'
AND Application_ID = application_info.Application_ID -- how to pass the current application_info.Application_ID from the ON clause to here?
-- and Application_ID = 29 -- this would be an option for specific IDs, but not an option for getting a complete list of application IDs with status
-- GROUP BY Application_ID -- reduces the sub query to 1 row (Yeah!) but returns the first row encountered before the ORDER BY comes into play
ORDER BY Status_ID DESC
-- a GROUP BY after the ORDER BY might resolve the issue if we could do a sort first
LIMIT 1 -- only want to get the first (most recent) row, only works correctly if passing an Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- need to get all IDs with status values as well as for specific ID requests
;
Eliminating the AND Application_ID = application_info.Application_ID portion of the subquery, along with the LIMIT, makes the select work, but it returns a row for every status of a given application ID. I've tried using MIN/MAX but noticed that, when they work, they return unpredictable rows from the application_status table.
I've also attempted to put subselects in the ON clause of the join, but don't know how to make that work, because the end result would always need to return an Application_ID (can both Application_ID and Status_ID be returned and used?).
Any hints on how to get this to work as I'm intending? Can this even be done?
Further edit: working query below. The key was to move the sub query in the join one level deeper and then return just a single status ID.
-- simplified example (now working)
SELECT
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
SELECT
Status_ID,
Application_ID,
Status_State_Text,
Status_State_Date,
Status_State_InitiatedBy,
Status_State_ChangebBy,
Status_State_Notes
FROM application_status AS artstatus_int
WHERE
-- sub query moved one level deeper so current join Application_ID can be passed
-- order by and limit can now be used
Status_ID = (
SELECT status_ID FROM application_status WHERE Application_ID = artstatus_int.Application_ID
AND status_State_Text LIKE 'Article Status%'
ORDER BY Status_ID DESC
LIMIT 1
)
ORDER BY Application_ID, Status_ID DESC
-- no need for GROUP BY or LIMIT here because only one row is returned per Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- works for specific application ID as well
-- more LEFT JOINS follow
;
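To check the working pattern end to end, here is a minimal sketch that runs it against SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL. The tables are trimmed to the columns the query needs, and the sample rows are invented for illustration.

```python
import sqlite3

# Invented sample rows; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE application_info (Application_ID INTEGER PRIMARY KEY, Applicant TEXT);
CREATE TABLE application_status (
    Status_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Application_ID INTEGER,
    Status_State_Text TEXT
);
INSERT INTO application_info VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO application_status (Application_ID, Status_State_Text) VALUES
    (1, 'Article Status: submitted'),
    (1, 'Funding Status: under review'),
    (1, 'Article Status: accepted'),
    (2, 'Article Status: submitted');
"""

QUERY = """
SELECT ai.Application_ID, artstatus.Status_ID, artstatus.Status_State_Text
FROM application_info AS ai
LEFT JOIN (
    SELECT Status_ID, Application_ID, Status_State_Text
    FROM application_status AS artstatus_int
    -- the correlated subquery sits one level deeper, so it can see
    -- artstatus_int.Application_ID and use ORDER BY ... LIMIT 1
    WHERE Status_ID = (
        SELECT Status_ID FROM application_status
        WHERE Application_ID = artstatus_int.Application_ID
          AND Status_State_Text LIKE 'Article Status%'
        ORDER BY Status_ID DESC
        LIMIT 1
    )
) AS artstatus
  ON ai.Application_ID = artstatus.Application_ID
ORDER BY ai.Application_ID
"""

def latest_article_status():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

for row in latest_article_status():
    print(row)
```

Application 1 gets only its newest article status (Status_ID 3), and application 3, which has no article status yet, is kept by the LEFT JOIN with NULL status columns.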

You can't have a correlated subquery in the FROM clause (at least not before MySQL 8.0.14, which added LATERAL derived tables).
Try this idea instead:
select <whatever>
from (select a.*,
(select max(aps.status_id) as maxstatusid
from application_status aps
where aps.application_id = a.application_id
) as maxstatusid
from application_info a
) a left outer join
application_status aps
on aps.status_id = a.maxstatusid
. . .
That is, put the correlated subquery in the select clause to get the most recent status. Then join this in to the status table to get other information. And, finish the query with other details.
You seem pretty adept at SQL, so it doesn't seem necessary to rewrite the whole query for you.
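The idea in this answer (a correlated MAX subquery in the SELECT list, then a join back to the status table) can be sketched the same way. This assumes SQLite in place of MySQL and invented rows, and it adds the question's 'Article Status%' filter inside the subquery, which the answer itself leaves out.

```python
import sqlite3

# Invented sample rows; SQLite stands in for MySQL in this sketch.
SETUP = """
CREATE TABLE application_info (Application_ID INTEGER PRIMARY KEY, Applicant TEXT);
CREATE TABLE application_status (
    Status_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Application_ID INTEGER,
    Status_State_Text TEXT
);
INSERT INTO application_info VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO application_status (Application_ID, Status_State_Text) VALUES
    (1, 'Article Status: submitted'),
    (1, 'Funding Status: under review'),
    (1, 'Article Status: accepted'),
    (2, 'Article Status: submitted');
"""

QUERY = """
SELECT a.Application_ID, aps.Status_ID, aps.Status_State_Text
FROM (
    SELECT ai.*,
           -- correlated subquery in the SELECT list: newest article status ID
           (SELECT MAX(s.Status_ID)
              FROM application_status s
             WHERE s.Application_ID = ai.Application_ID
               AND s.Status_State_Text LIKE 'Article Status%') AS maxstatusid
    FROM application_info ai
) AS a
LEFT OUTER JOIN application_status aps
  ON aps.Status_ID = a.maxstatusid  -- NULL maxstatusid matches nothing
ORDER BY a.Application_ID
"""

def latest_status_via_max():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

for row in latest_status_via_max():
    print(row)
```

Because Status_ID is auto-incrementing, the highest ID is also the most recently inserted status, which is what makes MAX a reliable "latest" here.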

Related

How to limit Row result on query when no Primary Key in table

I'm new to SQL; I haven't touched a line of code in over 20 years, so it feels like starting over.
I have a database with two tables, one for Projects, and another one for the Milestones. What I'm trying to achieve is to have a query that will retrieve the latest Milestone logged for each project. That way I can build a report with one project line with the latest update only.
I've managed to build the query to retrieve 1 (One) Milestone Record for each project. However when I've logged more than one update for the same date, the query returns all of them. I've tried to utilize the rowid, but it didn't work.
Here are my sample tables:
And here is the query I've tried; it currently retrieves more than one record when milestones were created on the same date.
select PROJECT_DATA.PARTNER_NAME as PARTNER_NAME,
PROJECT_DATA.SOLUTION_STATUS as SOLUTION_STATUS,
PROJECT_DATA.STRATEGY_MANAGER as STRATEGY_MANAGER,
PROJECT_DATA.SOLUTION_TYPE as SOLUTION_TYPE,
PROJECT_DATA.INTEGRATION_METHOD as INTEGRATION_METHOD,
PROJECT_MILESTONE.MILESTONE as MILESTONE,
PROJECT_MILESTONE.COMPLETED_ON as COMPLETED_ON,
PROJECT_MILESTONE.NOTES as NOTES
from PROJECT_DATA JOIN PROJECT_MILESTONE PROJECT_MILESTONE ON PROJECT_DATA.ID=PROJECT_MILESTONE.PROJECT_ID
where PROJECT_MILESTONE.COMPLETED_ON = (Select MAX (PROJECT_MILESTONE.COMPLETED_ON)
FROM PROJECT_MILESTONE
WHERE PROJECT_DATA.ID=PROJECT_MILESTONE.PROJECT_ID)
Any help on how to limit the result to just one (the newest) milestone when several are logged on the same date will be extremely helpful.
Assuming that COMPLETED_ON stores the time as well as the date, all you need to do is select TOP 1 with ORDER BY ... DESC (TOP is SQL Server syntax; MySQL and SQLite use LIMIT 1 at the end of the query instead).
Something like:
select top 1 PROJECT_DATA.PARTNER_NAME as PARTNER_NAME,
PROJECT_DATA.SOLUTION_STATUS as SOLUTION_STATUS,
PROJECT_DATA.STRATEGY_MANAGER as STRATEGY_MANAGER,
PROJECT_DATA.SOLUTION_TYPE as SOLUTION_TYPE,
PROJECT_DATA.INTEGRATION_METHOD as INTEGRATION_METHOD,
PROJECT_MILESTONE.MILESTONE as MILESTONE,
PROJECT_MILESTONE.COMPLETED_ON as COMPLETED_ON,
PROJECT_MILESTONE.NOTES as NOTES
from PROJECT_DATA JOIN PROJECT_MILESTONE PROJECT_MILESTONE ON
PROJECT_DATA.ID = PROJECT_MILESTONE.PROJECT_ID
order by COMPLETED_ON DESC
Also, the join condition has to be specified after ON in the join; then you can use a WHERE condition to filter the data.
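TOP 1 over the whole join returns a single row for the entire result, not one row per project. If the goal is one row per project with a tie-breaker for milestones logged on the same date, a correlated subquery can pick exactly one row per project. The sketch below assumes SQLite, where every ordinary table has an implicit rowid that normally increases with insertion order; the partner names and milestone rows are hypothetical.

```python
import sqlite3

# Hypothetical data; PROJECT_MILESTONE has no primary key, so SQLite's
# implicit rowid (insertion order here) serves as the tie-breaker.
SETUP = """
CREATE TABLE PROJECT_DATA (ID INTEGER, PARTNER_NAME TEXT);
CREATE TABLE PROJECT_MILESTONE (PROJECT_ID INTEGER, MILESTONE TEXT, COMPLETED_ON TEXT);
INSERT INTO PROJECT_DATA VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO PROJECT_MILESTONE VALUES
    (1, 'Kickoff', '2023-01-10'),
    (1, 'Design',  '2023-02-01'),
    (1, 'Review',  '2023-02-01'),
    (2, 'Kickoff', '2023-01-15');
"""

QUERY = """
SELECT pd.PARTNER_NAME, pm.MILESTONE, pm.COMPLETED_ON
FROM PROJECT_DATA pd
JOIN PROJECT_MILESTONE pm ON pm.PROJECT_ID = pd.ID
WHERE pm.rowid = (
    -- newest date first; for same-date ties, the later-inserted row wins
    SELECT pm2.rowid FROM PROJECT_MILESTONE pm2
    WHERE pm2.PROJECT_ID = pd.ID
    ORDER BY pm2.COMPLETED_ON DESC, pm2.rowid DESC
    LIMIT 1
)
ORDER BY pd.ID
"""

def latest_milestones():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

print(latest_milestones())
```

Relying on rowid order is SQLite-specific; with a real auto-increment ID column, that column would be the safer tie-breaker.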

MySQL Select Latest Row of Specific Value

I'm battling to wrap my head around producing a single MySQL query that would yield the correct results.
I've got a table that is structured as follows:
workflow_status_history:
id reference status
1 308ffn3oneb Lead Received
2 308ffn3oneb Quoted
3 308ffn3oneb Invoiced
4 853442ec2fc Lead Received
As you can see, the workflow_status_history table keeps a history of all the statuses of each workflow on our system, rather than replacing or overwriting the previous status with the new status. This helps with in-depth reporting and auditing. A workflow will always have a starting status of Lead Received.
The problem, however, is that I need to select the reference of each workflow whose latest (or only) status is Lead Received. So in the example above, row 4 should be returned, but rows 1, 2 and 3 should not, because the latest status for that workflow reference is Invoiced. And if 853442ec2fc (row 4) gets a new status other than Lead Received, it should not be returned the next time the query runs either.
My current query is as follows:
SELECT *, MAX(id) FROM workflow_status_history WHERE `status` = 'Lead Received' GROUP BY reference LIMIT 20
This, of course, doesn't return the desired result because the WHERE clause ensures that it returns all the rows that have a Lead Received status, irrespective of it being the latest status or not. So it will always return the first 20 grouped workflow references in the table.
How would I go about producing the correct query to return the desired results?
Thanks for your help.
This is a case for a left join of the table with itself. The idea in this query is:
select all references with status 'Lead Received' that do not have a row with the same reference and a higher ID. I assume you only use the id to determine which status is newer (no timestamp, etc.).
SELECT
DISTINCT h1.reference
FROM
workflow_status_history h1 LEFT JOIN workflow_status_history h2 ON
h1.reference = h2.reference AND
h1.id < h2.id
WHERE
h1.status = 'Lead Received' AND
h2.id IS NULL
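A quick way to see the self anti-join at work is to run it on the question's sample rows. This sketch uses SQLite via Python's sqlite3 module as a stand-in for MySQL; the second call adds a hypothetical row 5 to show that 853442ec2fc drops out once it gets a newer status.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, "308ffn3oneb", "Lead Received"),
    (2, "308ffn3oneb", "Quoted"),
    (3, "308ffn3oneb", "Invoiced"),
    (4, "853442ec2fc", "Lead Received"),
]

QUERY = """
SELECT DISTINCT h1.reference
FROM workflow_status_history h1
LEFT JOIN workflow_status_history h2
  ON h1.reference = h2.reference
 AND h1.id < h2.id
WHERE h1.status = 'Lead Received'
  AND h2.id IS NULL  -- no newer row exists for this reference
"""

def still_lead_received(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE workflow_status_history "
                    "(id INTEGER PRIMARY KEY, reference TEXT, status TEXT)")
        con.executemany("INSERT INTO workflow_status_history VALUES (?, ?, ?)", rows)
        return [r[0] for r in con.execute(QUERY)]
    finally:
        con.close()

print(still_lead_received(ROWS))
print(still_lead_received(ROWS + [(5, "853442ec2fc", "Quoted")]))
```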
Although @Martin Schneider's answer is correct, below are two other ways to achieve the expected output.
Using an inner join on the same table:
select a.*
from workflow_status_history a
join (
select reference,max(id) id
from workflow_status_history
group by reference
) b using(reference,id)
where a.status = 'Lead Received';
Using a correlated subquery:
select a.*
from workflow_status_history a
where a.status = 'Lead Received'
and a.id = (select max(id)
from workflow_status_history
where reference = a.reference)
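Both alternatives can be checked on the question's sample rows in the same way. This again assumes SQLite as a stand-in for MySQL; both queries should return only row 4.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, "308ffn3oneb", "Lead Received"),
    (2, "308ffn3oneb", "Quoted"),
    (3, "308ffn3oneb", "Invoiced"),
    (4, "853442ec2fc", "Lead Received"),
]

# Join each row to the per-reference maximum id, then filter on status.
QUERY_JOIN = """
SELECT a.*
FROM workflow_status_history a
JOIN (
    SELECT reference, MAX(id) id
    FROM workflow_status_history
    GROUP BY reference
) b USING (reference, id)
WHERE a.status = 'Lead Received'
"""

# Keep a row only if its id is the newest one for its reference.
QUERY_CORRELATED = """
SELECT a.*
FROM workflow_status_history a
WHERE a.status = 'Lead Received'
  AND a.id = (SELECT MAX(id)
              FROM workflow_status_history
              WHERE reference = a.reference)
"""

def run(query):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE workflow_status_history "
                    "(id INTEGER PRIMARY KEY, reference TEXT, status TEXT)")
        con.executemany("INSERT INTO workflow_status_history VALUES (?, ?, ?)", ROWS)
        return con.execute(query).fetchall()
    finally:
        con.close()

print(run(QUERY_JOIN))
print(run(QUERY_CORRELATED))
```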

MySQL Statement with multi nested JOINs and Distinct Limited Ordering

I'm attempting to build a list of results based on three joins.
I have created a table of leads; as my sales team takes action on the leads, they attach event note records to them. One lead can have many notes. Each note has a timestamp and also a date/time field where they can set a future date in order to schedule callbacks and appointments.
I have no trouble building the list with all my leads associated with their respective event notes, but what I want to do in this particular case is query a smaller list of leads, each associated only with the event note containing the "newest"/highest value in the date_time column.
I've been digging around, especially here on Stack Overflow, for the last couple of days trying to get the desired result from my statements. I get either all of the lead records with all of their associated event note records, or I get only one, no matter what I use (GROUP BY date_time ASC LIMIT 1) or (ORDER BY date_time ASC LIMIT 1). I've even tried to build a view with only the highest scheduled record for each lead.id.
SELECT
rr_leads.id AS 'Lead',
rr_leads.first,
rr_leads.last,
rr_leads.company,
rr_leads.phone,
rr_leads.email,
rr_leads.city,
rr_leads.zip,
rr_leads.status,
z.noteid,
z.taskid,
z.scheduled,
z.event
FROM rr_leads
LEFT JOIN
(
SELECT
rr_lead_notes.lead_id,
rr_lead_notes.id AS 'noteid',
rr_lead_tasks.id AS 'taskid',
rr_lead_notes.date_time AS 'scheduled',
rr_lead_notes.task_note,
rr_lead_tasks.task_step AS 'event'
FROM rr_lead_notes
LEFT JOIN rr_lead_tasks
ON rr_lead_notes.task_note = rr_lead_tasks.task_step
AND rr_lead_notes.id IS NOT NULL
AND rr_lead_notes.task_note IS NOT NULL
GROUP BY rr_lead_notes.id DESC
) z
ON rr_leads.id = z.lead_id
WHERE rr_leads.id IS NOT NULL
AND z.noteid IS NOT NULL
ORDER BY rr_leads.id DESC
Here is the general idea of getting data associated with a most recent event. You can adjust for your particular situation.
select yourfields
from table1 join othertables etc
join
(select id, max(time_stamp) maxts
from table1
where whatever
group by id) temp on table1.id = temp.id
and table1.time_stamp = temp.maxts
where whatever
Make sure the where clauses in your main query and subquery are the same.
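Applied to the leads and notes from the question, the pattern might look like the sketch below. It assumes SQLite as a stand-in for MySQL and trims rr_leads and rr_lead_notes to a few columns; the rows are invented. Leads without notes drop out because of the inner join (mirroring the asker's z.noteid IS NOT NULL filter), and two notes sharing the same maximum date_time for one lead would both be returned.

```python
import sqlite3

# Invented rows; only the columns needed for the groupwise-max join are kept.
SETUP = """
CREATE TABLE rr_leads (id INTEGER PRIMARY KEY, first TEXT);
CREATE TABLE rr_lead_notes (id INTEGER PRIMARY KEY, lead_id INTEGER, date_time TEXT);
INSERT INTO rr_leads VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO rr_lead_notes VALUES
    (1, 1, '2024-05-01 09:00'),
    (2, 1, '2024-05-03 14:00'),
    (3, 2, '2024-04-20 10:00'),
    (4, 2, '2024-04-18 16:00');
"""

QUERY = """
SELECT l.id, l.first, n.id AS noteid, n.date_time AS scheduled
FROM rr_leads l
JOIN rr_lead_notes n ON n.lead_id = l.id
JOIN (
    -- newest date_time per lead
    SELECT lead_id, MAX(date_time) AS maxts
    FROM rr_lead_notes
    GROUP BY lead_id
) temp ON n.lead_id = temp.lead_id
      AND n.date_time = temp.maxts
ORDER BY l.id DESC
"""

def latest_note_per_lead():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

for row in latest_note_per_lead():
    print(row)
```

Storing date_time as ISO-8601 text keeps MAX and string comparison consistent with chronological order in SQLite; in MySQL a DATETIME column behaves the same way.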

Join on 3 tables insanely slow on giant tables

I have a query which goes like this:
SELECT insanlyBigTable.description_short,
insanlyBigTable.id AS insanlyBigTable,
insanlyBigTable.type AS insanlyBigTableLol,
catalogpartner.id AS catalogpartner_id
FROM insanlyBigTable
INNER JOIN smallerTable ON smallerTable.id = insanlyBigTable.catalog_id
INNER JOIN smallerTable1 ON smallerTable1.catalog_id = smallerTable.id
AND smallerTable1.buyer_id = 'xxx'
WHERE smallerTable1.cont = 'Y' AND insanlyBigTable.type IN ('111','222','33')
GROUP BY smallerTable.id;
Now, when I run the query the first time, it copies the giant table into a temporary table. How can I prevent that? I am considering a nested query, or even reversing the join (not sure it would run any faster), but that is, well, not nice. Any other suggestions?
To figure out how to optimize your query, we first have to boil down exactly what it is selecting so that we can preserve that information while we change things around.
What your query does
So, it looks like we need the following
The GROUP BY clause limits the results to at most one row per catalog_id
smallerTable1.cont = 'Y', insanlyBigTable.type IN ('111','222','33'), and buyer_id = 'xxx' appear to be the filters on the query.
And we want data from insanlyBigTable and ... catalogpartner? I would guess that catalogpartner is smallerTable1, due to the id of smallerTable being linked to the catalog_id of the other tables.
I'm not sure what the purpose of putting the buyer_id filter in the ON clause was, but unless you tell me otherwise, I'll assume its placement there is unimportant (for an INNER JOIN it makes no difference to the result).
The point of the query
I am unsure about the intent of the query, based on that GROUP BY statement. You will obtain just one row per catalog_id in insanlyBigTable, but you don't appear to care which row it is. Indeed, you can run this query at all only because of a non-standard MySQL feature (available when ONLY_FULL_GROUP_BY is disabled) that lets you SELECT columns that do not appear in the GROUP BY clause... however, you don't get to choose which row those values come from. This means you could have information from 4 different rows in each result row.
My best guess, based on column names, is that you are trying to bring back a list of items that are in the same catalog as something that was purchased by a given buyer, but without any more than one item per catalog. In addition, you want something to connect back to the purchased item in that catalog, via the catalogpartner table's id.
So, something probably akin to Amazon's "You may like these items because you purchased these other items" feature.
The new query
We want 1 row per insanlyBigTable.catalog_id, based on which catalog_id exists in smallerTable1, after filtering.
SELECT
ibt.description_short,
ibt.id AS insanlyBigTable,
ibt.type AS insanlyBigTableLol,
(
SELECT st.id FROM smallerTable1 st
WHERE st.buyer_id = 'xxx'
AND st.cont = 'Y'
AND st.catalog_id = ibt.catalog_id
LIMIT 1
) AS catalogpartner_id
FROM insanlyBigTable ibt
WHERE ibt.id IN (
SELECT (
-- pick one row per catalog; the type filter has to be repeated here
SELECT ibt2.id
FROM insanlyBigTable ibt2
WHERE ibt2.catalog_id = sti.catalog_id
AND ibt2.type IN ('111','222','33')
LIMIT 1
) AS ibt_id
FROM (
SELECT DISTINCT catalog_id FROM smallerTable1 st
WHERE st.buyer_id = 'xxx'
AND st.cont = 'Y'
AND EXISTS (
SELECT * FROM insanlyBigTable ibt3
WHERE ibt3.type IN ('111','222','33')
AND ibt3.catalog_id = st.catalog_id
)
) AS sti
)
This query should generate the same result as your original query, but it breaks things down into smaller queries to avoid the use (and abuse) of the GROUP BY clause on the insanlyBigTable.
Give it a try and let me know if you run into problems.
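A scaled-down run of the rewritten query can help confirm that it keeps one row per catalog. This sketch uses SQLite and a handful of invented rows; the ORDER BY ibt2.id is an addition made only so the chosen row is deterministic. Note that the per-catalog pick repeats the type filter: without it, row 1 (type '999') would be the one chosen for catalog 10.

```python
import sqlite3

# Invented rows: only catalog 10 passes the buyer_id/cont filters.
SETUP = """
CREATE TABLE insanlyBigTable (id INTEGER PRIMARY KEY, catalog_id INTEGER,
                              type TEXT, description_short TEXT);
CREATE TABLE smallerTable1 (id INTEGER PRIMARY KEY, catalog_id INTEGER,
                            buyer_id TEXT, cont TEXT);
INSERT INTO insanlyBigTable VALUES
    (1, 10, '999', 'a'), (2, 10, '111', 'b'), (3, 10, '222', 'c'),
    (4, 20, '111', 'd'), (5, 30, '33', 'e');
INSERT INTO smallerTable1 VALUES
    (100, 10, 'xxx', 'Y'), (101, 20, 'xxx', 'N'), (102, 30, 'yyy', 'Y');
"""

QUERY = """
SELECT ibt.description_short, ibt.id AS insanlyBigTable,
       ibt.type AS insanlyBigTableLol,
       (SELECT st.id FROM smallerTable1 st
         WHERE st.buyer_id = 'xxx' AND st.cont = 'Y'
           AND st.catalog_id = ibt.catalog_id
         LIMIT 1) AS catalogpartner_id
FROM insanlyBigTable ibt
WHERE ibt.id IN (
    SELECT (SELECT ibt2.id FROM insanlyBigTable ibt2
             WHERE ibt2.catalog_id = sti.catalog_id
               AND ibt2.type IN ('111','222','33')
             ORDER BY ibt2.id  -- only to make the pick deterministic
             LIMIT 1)
    FROM (SELECT DISTINCT catalog_id FROM smallerTable1 st
           WHERE st.buyer_id = 'xxx' AND st.cont = 'Y'
             AND EXISTS (SELECT 1 FROM insanlyBigTable ibt3
                          WHERE ibt3.type IN ('111','222','33')
                            AND ibt3.catalog_id = st.catalog_id)) AS sti
)
"""

def one_item_per_catalog():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

print(one_item_per_catalog())
```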

How to select distinct rows from a table without a primary key

I need to show a notification on user login if there are any unread messages. If multiple users each send five messages while the user is offline, these messages should be shown on login; that is, I have to show the last message from each user.
I use a join to find the records.
In this scenario, MessageFrom is not a primary key.
This is my query:
SELECT
UserMessageConversations.MessageFrom, UserMessageConversations.MessageFromUserName,
UserMessages.MessageTo, UserMessageConversations.IsGroupChat,
UserMessageConversations.IsLocationChat,
UserMessageConversations.Message, UserMessages.UserGroupID,UserMessages.LocationID
FROM
UserMessageConversations
LEFT OUTER JOIN
UserMessages ON UserMessageConversations.UserMessageID = UserMessages.UserMessageID
WHERE
UserMessageConversations.MessageTo = 743
AND UserMessageConversations.ReadFlag = 0
This is the output obtained from the above query:
MessageFrom -582 appears twice. I need only one record for this user.
How is that possible?
I'm not entirely sure I totally understand your question - but one approach would be to use a CTE (Common Table Expression).
With this CTE, you can partition your data by some criteria - i.e. your MessageFrom - and have SQL Server number all your rows starting at 1 for each of those partitions, ordered by some other criteria - this is the point that's entirely unclear from your question, whether you even care what the rows for each MessageFrom number are sorted on (do you have some kind of a MessageDate or something that you could order by?) ...
So try something like this:
;WITH PartitionedMessages AS
(
SELECT
umc.MessageFrom, umc.MessageFromUserName,
um.MessageTo, umc.IsGroupChat,
umc.IsLocationChat,
umc.Message, um.UserGroupID, um.LocationID ,
ROW_NUMBER() OVER(PARTITION BY umc.MessageFrom
ORDER BY umc.MessageDate DESC) AS RowNum -- MessageDate is assumed; the ordering column is totally unclear yet
FROM
dbo.UserMessageConversations umc
LEFT OUTER JOIN
dbo.UserMessages um ON umc.UserMessageID = um.UserMessageID
WHERE
umc.MessageTo = 743
AND umc.ReadFlag = 0
)
SELECT
MessageFrom, MessageFromUserName, MessageTo,
IsGroupChat, IsLocationChat,
Message, UserGroupID, LocationID
FROM
PartitionedMessages
WHERE
RowNum = 1
Here, I am selecting only the "first" entry for each "partition" (i.e. for each MessageFrom), ordered by an "imagined" MessageDate column so that the most recent (newest) message would be selected.
Is that close to what you're looking for?
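The ROW_NUMBER approach can be tried outside SQL Server as well. The sketch below runs it in SQLite (window functions need SQLite 3.25 or later); MessageDate is the same assumed column as above, the UserMessages join is omitted for brevity, and the sender IDs and messages are invented.

```python
import sqlite3

# Invented rows; MessageDate is an assumed column, as in the answer above.
SETUP = """
CREATE TABLE UserMessageConversations (
    UserMessageID INTEGER, MessageFrom INTEGER, MessageTo INTEGER,
    ReadFlag INTEGER, Message TEXT, MessageDate TEXT);
INSERT INTO UserMessageConversations VALUES
    (1, 101, 743, 0, 'hi',               '2024-01-01 10:00'),
    (2, 101, 743, 0, 'are you there?',   '2024-01-01 10:05'),
    (3, 102, 743, 0, 'meeting at 3',     '2024-01-01 09:00'),
    (4, 102, 743, 1, 'old read message', '2024-01-01 11:00'),
    (5, 101, 900, 0, 'to someone else',  '2024-01-01 12:00');
"""

QUERY = """
WITH PartitionedMessages AS (
    SELECT MessageFrom, Message,
           -- number each sender's unread messages, newest first
           ROW_NUMBER() OVER (PARTITION BY MessageFrom
                              ORDER BY MessageDate DESC) AS RowNum
    FROM UserMessageConversations
    WHERE MessageTo = 743 AND ReadFlag = 0
)
SELECT MessageFrom, Message
FROM PartitionedMessages
WHERE RowNum = 1
ORDER BY MessageFrom
"""

def latest_unread_per_sender():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

print(latest_unread_per_sender())
```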
If you consider those rows the same, I assume you don't care about the Message field.
In this case you can use the DISTINCT clause:
SELECT DISTINCT
UserMessageConversations.MessageFrom, UserMessageConversations.MessageFromUserName,
UserMessages.MessageTo, UserMessageConversations.IsGroupChat,
UserMessageConversations.IsLocationChat,
UserMessages.UserGroupID,UserMessages.LocationID
FROM
UserMessageConversations
LEFT OUTER JOIN
UserMessages ON UserMessageConversations.UserMessageID = UserMessages.UserMessageID
WHERE
UserMessageConversations.MessageTo = 743
AND UserMessageConversations.ReadFlag = 0
In general, DISTINCT returns one row for every distinct combination of the selected column values.
If your requirement is instead to show a single field for all the messages (for example, every message folded into a single message with a separator between them), you can use an aggregate function, but in SQL Server that isn't easy before SQL Server 2017, which added STRING_AGG.
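For the folding variant, SQLite's group_concat is a close analogue of the aggregate described here (MySQL has GROUP_CONCAT, which takes a SEPARATOR clause instead of a second argument). A minimal SQLite sketch with invented rows might look like this; SQLite does not guarantee the order of the concatenated messages in this form.

```python
import sqlite3

# Invented rows for illustration.
SETUP = """
CREATE TABLE UserMessageConversations (
    UserMessageID INTEGER, MessageFrom INTEGER, MessageTo INTEGER,
    ReadFlag INTEGER, Message TEXT);
INSERT INTO UserMessageConversations VALUES
    (1, 101, 743, 0, 'hi'),
    (2, 101, 743, 0, 'are you there?'),
    (3, 102, 743, 0, 'meeting at 3'),
    (4, 102, 743, 1, 'old read message');
"""

QUERY = """
SELECT MessageFrom,
       group_concat(Message, ' | ') AS AllMessages  -- one folded string per sender
FROM UserMessageConversations
WHERE MessageTo = 743 AND ReadFlag = 0
GROUP BY MessageFrom
ORDER BY MessageFrom
"""

def folded_unread_per_sender():
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SETUP)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()

print(folded_unread_per_sender())
```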