MySQL Select Latest Row of Specific Value - mysql

I'm battling to wrap my head around producing a single MySQL query that would heed the correct results.
I've got a table that is structured as followed:
workflow_status_history:
id reference status
1 308ffn3oneb Lead Received
2 308ffn3oneb Quoted
3 308ffn3oneb Invoiced
4 853442ec2fc Lead Received
As you can see, the workflow_status_history table keeps a history of all the statuses of each workflow on our system, rather than replacing or overwriting the previous status with the new status. This helps with in-depth reporting and auditing. A workflow will always have a starting status of Lead Received.
The problem however is that I need to select the reference field of each row in the table who's latest or only status is Lead Received. So in the example above, field number 4 would return, however fields 1, 2 and 3 would not return because the latest status for that workflow reference is Invoiced. But if 853442ec2fc (field number 4) gets a new status other than Lead Received, it also should not return the next time the query runs.
My current query is as followed:
SELECT *, MAX(id) FROM workflow_status_history WHERE 'status' = 'Lead Received' GROUP BY reference LIMIT 20
This, of course, doesn't return the desired result because the WHERE clause ensures that it returns all the rows that have a Lead Received status, irrespective of it being the latest status or not. So it will always return the first 20 grouped workflow references in the table.
How would I go about producing the correct query to return the desired results?
Thanks for your help.

This is a case for a left join with itself. The idea in this query is:
select all references with status 'Lead Received' which do not have a row with the same reference and a higher ID. I assume you only use the id for determining what is the 'newer' status, no timestamp etc.
SELECT
DISTINCT h1.reference
FROM
workflow_status_history h1 LEFT JOIN workflow_status_history h2 ON
h1.reference = h2.reference AND
h1.id < h2.id
WHERE
h1.status = 'Lead Received' AND
h2.id IS NULL

Although #Martin Schneider answer is correct, Below are 2 other ways to achieve expected output
Using inner join on same table
select a.*
from workflow_status_history a
join (
select reference,max(id) id
from workflow_status_history
group by reference
) b using(reference,id)
where a.status = 'Lead Received';
Using correlated sub query
select a.*
from workflow_status_history a
where a.status = 'Lead Received'
and a.id = (select max(id)
from workflow_status_history
where reference = a.reference)
DEMO

Related

Cannot query a sum and compare the sum with another total in a mysql query

I want to check 2 databases to see if the money-payments are the same as the total. That is possible, but I get a very long table:
select
transaction_id
,total_low+total_high a
, sum(money_received) b
from
archive_transaction inner join archive_transaction_payment
on archive_transaction.id=archive_transaction_payment.transaction_id
group by transaction_id;
Actually I only want the transactions where the total is wrong!!
So now I want to add a!=b and that gives an invalid query. How to proceed?
Table archive_transaction has 1 row per transaction, but archive_transaction_payment can have multiple payments for one transaction. This makes it complicated for me.
select
transaction_id
,total_low+total_high a
, sum(money_received) b
from archive_transaction inner join archive_transaction_payment
on archive_transaction.id=archive_transaction_payment.transaction_id
where
a!=b
group by transaction_id;
Joins are still problematic for me, but I found an answer without join to find faults in the database.
SELECT id
FROM archive_transaction a
WHERE total_low + total_high != (SELECT Sum(money_received)
FROM archive_transaction_payment b
WHERE a.id = b.transaction_id);
Now I get a short list of problems in my database. Thanks for helping me out.

Getting count from joined table

Here's my problem: I need to get the amount of test cases and issues associated to a project that meet certain conditions (test cases that are successful, and issues that are flaws of the application), but for some reason the amount doesn't add up. I have 10 test cases in a project, of which 6 are successful; and 8 issues, of which only 4 are flaws. However, the respective results for COUNT each show 24, which makes no sense. I did notice, though, that 24 happens to be 6 times 4, but I don't see how the query would multiply them.
Anyway... Can someone help me find which part of my query is wrong? How can I get the correct result? Thanks in advance.
Here's the query:
SELECT
p.codigo_proyecto,
p.nombre,
IFNULL(COUNT(iep.id_incidencia_etapa_proyecto), 0) AS cantidad_defectos,
IFNULL(COUNT(tc.id_test_case), 0) AS test_cases_exitosos,
CASE IFNULL(COUNT(tc.id_test_case), 0) WHEN 0 THEN 'No aplica'
ELSE CONCAT((IFNULL(COUNT(tc.id_test_case), 0) / IFNULL(COUNT(tc.id_test_case), 0)) * 100, '%') END AS tasa_defectos
FROM proyecto p
INNER JOIN etapa_proyecto ep ON p.codigo_proyecto = ep.codigo_proyecto
INNER JOIN incidencia_etapa_proyecto iep ON ep.id_etapa_proyecto = iep.id_etapa_proyecto
INNER JOIN incidencia i ON iep.id_incidencia = i.id_incidencia
INNER JOIN test_case tc ON ep.id_etapa_proyecto = tc.id_etapa_proyecto
INNER JOIN etapa_proyecto ep_ultima ON ep_ultima.id_etapa_proyecto =
(SELECT ep_ultima2.id_etapa_proyecto FROM etapa_proyecto ep_ultima2
WHERE p.codigo_proyecto = ep_ultima2.codigo_proyecto ORDER BY ep_ultima2.fecha_termino_real DESC LIMIT 1)
WHERE p.esta_cerrado = 1
AND i.es_defecto = 1
AND tc.resultado = 'Exitoso'
AND ep_ultima.fecha_termino_real BETWEEN '2015-01-01' AND '2016-12-31';
I would have thought it obvious that you're not going to get the expected output from an aggregate query without a GROUP BY (which suggests you're not really in a position to evaluate any advice given here effectively).
You've not said how the states of your data are represented in the database - so I'm having to make a lot of guesses based on SQL which is clearly very wrong. And I don't speak spanish/portugese or whatever your native language is.
It looks like you are inferring that a defect exists if the primary key of the defects table is null. Primary keys cannot be null. The only way this would make any sort of sense (BTW it still won't give you the answer you're looking for) is to do a LEFT JOIN rather than an INNER JOIN.
But even then a simple COUNT() will consider null cases (no record in source table) as 1 record in the output set.
Then you've got the problem that you will have the product of defects and test cases in your output - consider the case where you have no defects, but 2 tests cases (1,2) - the result of an outer joiun will be:
defect test
------ ----
null 1
null 2
If you just count the rows, you'll get 2 defects in your output.
Taking a simpler schema, this demonstrates the 2 methods for getting the values - note that they have very different performance characteristics.
SELECT project.id
, dilv.defects
, (SELECT COUNT(*)
FROM test_cases) AS tests
FROM project
LEFT JOIN ( SELECT project_id, COUNT(*) AS defects
FROM defect_table
GROUP BY project_id) AS dilv
ON project.id=dilv.project_id

MySQL Statement with multi nested JOINs and Distinct Limited Ordering

I'm attempting to build a list of results based on three joins
I have created a table of leads, as my sales team takes action on the leads they attach event note records to the leads. 1 lead can have many notes. each note has a timestamp and also a date/time field where they can set a future date in order to schedule call backs and appointments.
I have no trouble building the list, with all my leads associated with their respective event notes, but what I want to do in this particular case is query a smaller list of leads that are associated with only the event note containing the "newest"/highest value in the date_time column.
I've been digging about especially here on stack for the last couple days attempting to get the desired result from my statements. I get either all of the lead records with all of their associated event note records or I get 1, no matter what I utilize ( GROUP BY date_time ASC LIMIT 1) or (ORDER BY date_time ASC LIMIT 1) I've even tried to build a view with only the highest scheduled record for each lead.id.
SELECT
rr_leads.id AS 'Lead',
rr_leads.first,
rr_leads.last,
rr_leads.company,
rr_leads.phone,
rr_leads.email,
rr_leads.city,
rr_leads.zip,
rr_leads.status,
z.noteid,
z.taskid,
z.scheduled,
z.event
FROM rr_leads
LEFT JOIN
(
SELECT
rr_lead_notes.lead_id,
rr_lead_notes.id AS 'noteid',
rr_lead_tasks.id AS 'taskid',
rr_lead_notes.date_time AS 'scheduled',
rr_lead_notes.task_note,
rr_lead_tasks.task_step AS 'event'
FROM rr_lead_notes
LEFT JOIN rr_lead_tasks
ON rr_lead_notes.task_note = rr_lead_tasks.task_step
AND rr_lead_notes.id IS NOT NULL
AND rr_lead_notes.task_note IS NOT NULL
GROUP BY rr_lead_notes.id DESC
) z
ON rr_leads.id = z.lead_id
WHERE rr_leads.id IS NOT NULL
AND z.noteid IS NOT NULL
ORDER BY rr_leads.id DESC
Here is the general idea of getting data associated with a most recent event. You can adjust for your particular situation.
select yourfields
from table1 join othertables etc
join
(select id, max(time_stamp) maxts
from table1
where whatever
group by id) temp on table1.id = temp.id
and table1.time_stamp = maxts
where whatever
Make sure the where clauses in your main query and subquery are the same.

SQL select with inner join, sub select and limit

I've been working with this SQL problem for about 2 days now and suspect I'm very close to resolving the issue but just can't seem to find a solution that completely works.
What I'm attempting to do is a selective join on two tables called application_info and application_status that are used to store information about open access journal article funding requests.
application_info has general information about the applicant and uses an auto indexing field called Application_ID as a key field. application_status is used to track the ongoing information about the status of the application (received, under review, funded, denied, withdrawn, etc.) as well as status of the journal article (submitted, accepted, resubmitted, published or rejected) and contains both an Application_ID field and an auto indexing field called Status_ID along with a status text and status date field.
Because we want to keep a running log of application, article, and funding status changes we don't want to overwrite existing rows in the application_status with updated values, but instead want to only show the most recent status values. Because an application will eventually have more than one status change this creates a need to apply some sort of limit on the inner join of the status data to the application data so that only one row is returned for each application ID.
Here's an example of what I am attempting to do in a query that currently throws an error:
-- simplified example
SELECT
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
SELECT
Status_ID,
Application_ID,
Status_State_Text,
Status_State_Date,
Status_State_InitiatedBy,
Status_State_ChangebBy,
Status_State_Notes
FROM application_status
WHERE Status_State_Text LIKE 'Article Status%'
AND Application_ID = application_info.Application_ID -- how to pass the current application_info.Application_ID from the ON clause to here?
-- and Application_ID = 29 -- this would be an option for specific IDs, but not an option for getting a complete list of application IDs with status
-- GROUP BY Application_ID -- reduces the sub query to 1 row (Yeah!) but returns the first row encountered before the ORDER BY comes into play
ORDER BY Status_ID DESC
-- a GROUP BY after the ORDER BY might resolve the issue if we could do a sort first
LIMIT 1 -- only want to get the first (most recent) row, only works correctly if passing an Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- need to get all IDs with statu values as well as for specific ID requests
;
Eliminating the AND Application_ID = application_info.Application_ID and portion of the sub query along with the LIMIT causes the select to work, but returns a row for every status for a given application ID. I've tried messing with using MIN/MAX operators but have noticed that they return unpredictable rows from the application_status table when they work.
I've also attempted to do sub selects in the ON section of the join, but don't know how to make that work because the end result would always need to return an Application_ID (can both Application_ID and Status_ID be returned and used?).
Any hints on how to get this to work as I'm intending? Can this even be done?
Further edit: working query below. The key was to move the sub query in the join one level deeper and then return just a single status ID.
-- simplified example (now working)
SELECT
application_info.*,
artstatus.Status_ID AS Article_Status_ID,
artstatus.Application_ID AS Article_Application_ID,
artstatus.Status_State_Date AS Article_Status_State_Date,
artstatus.Status_State_Text AS Article_Status_State_Text
FROM application_info
LEFT JOIN (
SELECT
Status_ID,
Application_ID,
Status_State_Text,
Status_State_Date,
Status_State_InitiatedBy,
Status_State_ChangebBy,
Status_State_Notes
FROM application_status AS artstatus_int
WHERE
-- sub query moved one level deeper so current join Application_ID can be passed
-- order by and limit can now be used
Status_ID = (
SELECT status_ID FROM application_status WHERE Application_ID = artstatus_int.Application_ID
AND status_State_Text LIKE 'Article Status%'
ORDER BY Status_ID DESC
LIMIT 1
)
ORDER BY Application_ID, Status_ID DESC
-- no need for GROUP BY or LIMIT here because only one row is returned per Application_ID
) AS artstatus
ON application_info.Application_ID = artstatus.Application_ID
-- WHERE application_info.Application_ID = 29 -- works for specific application ID as well
-- more LEFT JOINS follow
;
You can't have a correlated subquery in the from clause.
Try this idea instead:
select <whatever>
from (select a.*,
(select max(status_id) as maxstatusid
from application_status aps
where aps.application_id = a.application_id
) as maxstatusid
from application
) left outer join
application_status aps
on aps.status_id = a.maxstatusid
. . .
That is, put the correlated subquery in the select clause to get the most recent status. Then join this in to the status table to get other information. And, finish the query with other details.
You seem pretty adept at your SQL skills, so it doesn't seem necessary to rewrite the whole query for you.

How to select distinct rows from a table without a primary key

I need to show a Notification on user login if there is any unread messages. So if multiple users send (5 messages each) while the user is in offline these messages should be shown on login. Means have to show the last messages from each user.
I use joining to find records.
In this scenario Message from User is not a primary key.
This is my query
SELECT
UserMessageConversations.MessageFrom, UserMessageConversations.MessageFromUserName,
UserMessages.MessageTo, UserMessageConversations.IsGroupChat,
UserMessageConversations.IsLocationChat,
UserMessageConversations.Message, UserMessages.UserGroupID,UserMessages.LocationID
FROM
UserMessageConversations
LEFT OUTER JOIN
UserMessages ON UserMessageConversations.UserMessageID = UserMessages.UserMessageID
WHERE
UserMessageConversations.MessageTo = 743
AND UserMessageConversations.ReadFlag = 0
This is the output obtained from above query.
MessageFrom -582 appears twice. I need only one record of this user.
How is it possible
I'm not entirely sure I totally understand your question - but one approach would be to use a CTE (Common Table Expression).
With this CTE, you can partition your data by some criteria - i.e. your MessageFrom - and have SQL Server number all your rows starting at 1 for each of those partitions, ordered by some other criteria - this is the point that's entirely unclear from your question, whether you even care what the rows for each MessageFrom number are sorted on (do you have some kind of a MessageDate or something that you could order by?) ...
So try something like this:
;WITH PartitionedMessages AS
(
SELECT
umc.MessageFrom, umc.MessageFromUserName,
um.MessageTo, umc.IsGroupChat,
umc.IsLocationChat,
umc.Message, um.UserGroupID, um.LocationID ,
ROW_NUMBER() OVER(PARTITION BY umc.MessageFrom
ORDER BY MessageDate DESC) AS 'RowNum' <=== totally unclear yet
FROM
dbo.UserMessageConversations umc
LEFT OUTER JOIN
dbo.UserMessages um ON umc.UserMessageID = um.UserMessageID
WHERE
umc.MessageTo = 743
AND umc.ReadFlag = 0
)
SELECT
MessageFrom, MessageFromUserName, MessageTo,
IsGroupChat, IsLocationChat,
Message, UserGroupID, LocationID
FROM
PartitionedMessages
WHERE
RowNum = 1
Here, I am selecting only the "first" entry for each "partition" (i.e. for each MessageFrom) - ordered by a "imagined" MessageDate column so that the most recent (the newest) message would be selected.
Does that approach what you're looking for??
If you think of them as same rows, I assume you don't care about the message field.
In this case you can use the DISTINCT clause:
SELECT DISTINCT
UserMessageConversations.MessageFrom, UserMessageConversations.MessageFromUserName,
UserMessages.MessageTo, UserMessageConversations.IsGroupChat,
UserMessageConversations.IsLocationChat,
UserMessages.UserGroupID,UserMessages.LocationID
FROM
UserMessageConversations
LEFT OUTER JOIN
UserMessages ON UserMessageConversations.UserMessageID = UserMessages.UserMessageID
WHERE
UserMessageConversations.MessageTo = 743
AND UserMessageConversations.ReadFlag = 0
In general with distinct clause you have a row for every distinct group of row attributes.
If your requirement instead is to show a single field for all the messages (example: every message folded in a single message with a separator between them) you can use an aggregate function, but in SQL Server it seems is not that easy.