How can I find the correct prior status row in this table with a SQL query? - mysql

Imagine a workflow for data entry. Some forms come in, they are typed into a system, reviewed, and hopefully approved. However, they can be rejected by a manager and will have to be entered again.
So, an ideal workflow would go like this:
received > entered > approved
But this COULD happen:
received > entered > rejected > entered > rejected > approved
At each stage, we record who updated the form to its current status - who entered it, who rejected it, or who approved it. So the forms status table looks like this:
form_id status updated_by updated_at
1 received Bob (timestamp)
1 entered Bob (timestamp)
1 approved Susan (timestamp)
2 received Bob (timestamp)
2 entered Bob (timestamp)
2 rejected Susan (timestamp)
2 entered Carla (timestamp)
2 rejected Susan (timestamp)
2 entered Sam (timestamp)
2 approved Susan (timestamp)
Here's what I'm trying to do: write a rejection report. I want a row for each rejection, and joined to that row, I want to see who did the work that got rejected.
As a human, I can see that, for a given status row with status 'rejected', the row that will tell me who did the faulty work will be the one that
shares the same form_id and
has a prior timestamp closest to the rejection.
But I'm having trouble telling MySQL that.
Can anybody see how to construct this query?

A subselect ended up working for me.
SELECT
`s1`.`form_id`,
(
SELECT
`s2`.`updated_by`
FROM
statuses s2
WHERE
`s2`.`form_id` = `s1`.`form_id`
AND
`s2`.`updated_at` < `s1`.`updated_at`
ORDER BY
`s2`.`updated_at` DESC
LIMIT 1
) AS 'made_rejected_change'
FROM
statuses s1
WHERE
`s1`.`status` = 'rejected'
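For anyone who wants to try the correlated subquery without a MySQL server, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The timestamps are made up (the question only shows "(timestamp)" placeholders), and the extra rejected_by column is added purely for readability.

```python
import sqlite3

# Hypothetical timestamps: the question only shows "(timestamp)" placeholders.
rows = [
    (1, "received", "Bob",   "2020-01-01 09:00:00"),
    (1, "entered",  "Bob",   "2020-01-01 10:00:00"),
    (1, "approved", "Susan", "2020-01-01 11:00:00"),
    (2, "received", "Bob",   "2020-01-02 09:00:00"),
    (2, "entered",  "Bob",   "2020-01-02 10:00:00"),
    (2, "rejected", "Susan", "2020-01-02 11:00:00"),
    (2, "entered",  "Carla", "2020-01-02 12:00:00"),
    (2, "rejected", "Susan", "2020-01-02 13:00:00"),
    (2, "entered",  "Sam",   "2020-01-02 14:00:00"),
    (2, "approved", "Susan", "2020-01-02 15:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statuses "
    "(form_id INTEGER, status TEXT, updated_by TEXT, updated_at TEXT)"
)
conn.executemany("INSERT INTO statuses VALUES (?, ?, ?, ?)", rows)

# For each rejection, look up the closest earlier row of the same form.
report = conn.execute("""
    SELECT s1.form_id,
           s1.updated_by AS rejected_by,
           (SELECT s2.updated_by
              FROM statuses s2
             WHERE s2.form_id = s1.form_id
               AND s2.updated_at < s1.updated_at
             ORDER BY s2.updated_at DESC
             LIMIT 1) AS made_rejected_change
      FROM statuses s1
     WHERE s1.status = 'rejected'
     ORDER BY s1.updated_at
""").fetchall()

for row in report:
    print(row)
```

With these assumed timestamps, the two rejections of form 2 are attributed to Bob and then Carla, which matches what a human reading the table would expect.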

Another solution that uses subselect (this time not a correlated subquery):
SELECT
w1.*,
w2.updated_by AS entered_by
FROM (
SELECT
wr.form_id,
wr.updated_at AS rejected_at,
wr.updated_by AS rejected_by,
MAX(we.updated_at) AS entered_at
FROM workflow wr
INNER JOIN workflow we ON we.status = 'entered'
AND wr.form_id = we.form_id
AND wr.updated_at > we.updated_at
WHERE wr.status = 'rejected'
GROUP BY
wr.form_id,
wr.updated_at,
wr.updated_by
) w1
INNER JOIN workflow w2 ON w1.form_id = w2.form_id
AND w1.entered_at = w2.updated_at
The subselect lists all the rejecters and the immediately preceding entered timestamps. Then the table is joined once again to extract the names corresponding to the entered_at timestamps.

You want to get the rejected timestamp and then figure out the entry that appeared right before it based on the timestamp. I'm assuming that timestamp actually holds a date/time and isn't an SQL server timestamp field (completely different).
declare @rejectedTimestamp datetime
select @rejectedTimestamp = timestamp
from table
where status = 'rejected'
select top 1 *
from table
where timestamp < @rejectedTimestamp
order by timestamp desc

Related

SQL select with dynamic count of "BETWEEN" conditions based on joined table

I want to add a messenger to my pet project, but I am having difficulty writing the database queries. I use MySQL for this service with Hibernate as the ORM. Almost all queries were written in HQL, but in principle I can use native queries.
The messenger can contain group conversations. In addition to writing messages, a user can enter a conversation, leave it, and clear their personal message history. A user sees all messages sent while they were in a conversation, but they can also clear the history and then see only messages sent after the last clearing.
Below I described the simplified structure of two tables important for this task.
Message table:
ID text timestamp
1 first_msg 1609459200
2 second_msg 1609545600
Member_event table:
id user_id type timestamp
1 1 1 1609459100
2 1 3 1609459300
3 1 2 1609459400
4 1 1 1609545500
where type is:
1 - user entered the chat,
2 - user left the chat,
3 - user cleared his own history of messages in the chat
Is it possible to read all chat messages available to the user with one query?
I have no idea how to check the conditions dynamically: WHERE the message's timestamp falls within any "entered-left" cycle, or after the last "entered" event if the user has not left, BUT only after the last history clearing, if one exists.
I think you could proceed with these steps:
Take the union of both tables and consider the records in timestamp order.
Use window functions to determine whether the most recent type 1 or type 2 event was a 1. We can use a running sum where type 1 adds one and type 2 subtracts one (and type 3 does nothing to it). With another window function you can determine whether a type 3 event still follows. The combination of these two pieces of information translates to a 1 when the row belongs to an interval that must be collected, and to a 0 otherwise.
Filter the previous result to keep just the message records, and only those where the calculation gave 1.
Here is the query:
with unified as (
select id, text, timestamp, null as type
from message
union
select id, null, timestamp, type
from member_event
where user_id = 1),
validated as (
select unified.*,
sum(case type when 1 then 1 when 2 then -1 else 0 end)
over (order by timestamp
rows unbounded preceding) *
min(case type when 3 then 0 else 1 end)
over (order by timestamp
rows between current row and unbounded following) valid
from unified
order by timestamp)
select id, text, timestamp
from validated
where type is null and valid = 1
order by timestamp
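To check the logic against the sample rows from the question, here is a small sketch that runs essentially the same query in an in-memory SQLite database from Python. It assumes a SQLite build with window functions (3.25 or later); the visible_messages helper name is made up for the example, and UNION ALL is used instead of UNION since no de-duplication is needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (id INTEGER, text TEXT, timestamp INTEGER);
    CREATE TABLE member_event (id INTEGER, user_id INTEGER, type INTEGER, timestamp INTEGER);
    INSERT INTO message VALUES
        (1, 'first_msg', 1609459200),
        (2, 'second_msg', 1609545600);
    INSERT INTO member_event VALUES
        (1, 1, 1, 1609459100),   -- entered
        (2, 1, 3, 1609459300),   -- cleared history
        (3, 1, 2, 1609459400),   -- left
        (4, 1, 1, 1609545500);   -- entered again
""")

QUERY = """
WITH unified AS (
    SELECT id, text, timestamp, NULL AS type FROM message
    UNION ALL
    SELECT id, NULL, timestamp, type FROM member_event WHERE user_id = ?
),
validated AS (
    SELECT unified.*,
           -- 1 while the user is inside the chat, 0 otherwise
           SUM(CASE type WHEN 1 THEN 1 WHEN 2 THEN -1 ELSE 0 END)
               OVER (ORDER BY timestamp ROWS UNBOUNDED PRECEDING)
           -- 0 if a history clearing (type 3) still follows this row
         * MIN(CASE type WHEN 3 THEN 0 ELSE 1 END)
               OVER (ORDER BY timestamp
                     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS valid
    FROM unified
)
SELECT id, text, timestamp
  FROM validated
 WHERE type IS NULL AND valid = 1
 ORDER BY timestamp
"""

def visible_messages(connection, user_id):
    """Return the messages the given user is still allowed to see."""
    return connection.execute(QUERY, (user_id,)).fetchall()

print(visible_messages(conn, 1))
```

For user 1 only second_msg comes back: first_msg was sent while the user was in the chat, but a history clearing follows it.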
I do not see how you could match the Member_event table to the Message table without an additional foreign key. Are you trying to assign the messages available to the user via the timestamp?
If so, try this:
SELECT * FROM MESSAGE_TABLE m
WHERE m.TIMESTAMP BETWEEN
(SELECT TIMESTAMP FROM MEMBER_EVENT_TABLE WHERE type = 1 ORDER BY TIMESTAMP DESC LIMIT 1)
AND (SELECT TIMESTAMP FROM MEMBER_EVENT_TABLE WHERE type != 1 ORDER BY TIMESTAMP DESC LIMIT 1)
This should at least show the last messages between the join and the clear/leave.

MySQL - SQL select query with two tables using where, count and having

There are two tables: client and contract.
client table:
client_code INT pk
status VARCHAR
A client can have 1 or more contracts. The client has a status column which specifies if it has valid contracts - the values are 'active' or 'inactive'. The contract is specified for a client with active status.
contract table:
contract_code INT pk
client_code INT pk
end_date DATE
A contract has an end date. A contract end date before today is an expired contract.
REQUIREMENT: A report requires all active clients with contracts, but with all (not some) contracts having expired date. Some example data is shown below:
Client data:
client_code status
----------------------------------
1 active
2 inactive
3 active
4 active
Contract data:
contract_code client_code end_date
-------------------------------------------------------------
11 1 08-12-2018
12 1 09-12-2018
13 1 10-12-2018
31 3 11-31-2018
32 3 10-30-2018
41 4 01-31-2019
42 4 12-31-2018
Expected result:
client_code
-------------
1
RESULT: This client (client_code = 1) has all contracts with expired dates: 08-12-2018, 09-12-2018 and 10-12-2018.
I need some help to write a SQL query to get this result. I am not sure what constructs I have to use - one can point out what I can try. The database is MySQL 5.5.
One approach uses aggregation. We can join together the client and contract tables, then aggregate by client, checking that, for an active client, there exist no contract end dates which occur in the future.
SELECT
c.client_code
FROM client c
INNER JOIN contract co
ON c.client_code = co.client_code
WHERE
c.status = 'active'
GROUP BY
c.client_code
HAVING
SUM(CASE WHEN co.end_date > CURDATE() THEN 1 ELSE 0 END) = 0;
Note: I am assuming that your dates are appearing in M-D-Y format simply due to the particular formatting, and that end_date is actually a proper date column. If instead you are storing your dates as text, then we might have to make a call to STR_TO_DATE to convert them to dates first.
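As a quick sanity check of the HAVING approach, here is a sketch that runs the query against the sample data in an in-memory SQLite database from Python. Several things are assumptions made for the example: the dates are rewritten in ISO format, the impossible date 11-31-2018 is replaced by 2018-11-30, "today" is passed in as a parameter instead of calling CURDATE(), and the clients_with_only_expired_contracts helper name is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (client_code INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE contract (contract_code INTEGER PRIMARY KEY,
                           client_code INTEGER, end_date TEXT);
    INSERT INTO client VALUES (1, 'active'), (2, 'inactive'), (3, 'active'), (4, 'active');
    -- ISO dates; 2018-11-30 stands in for the invalid 11-31-2018 in the question.
    INSERT INTO contract VALUES
        (11, 1, '2018-08-12'), (12, 1, '2018-09-12'), (13, 1, '2018-10-12'),
        (31, 3, '2018-11-30'), (32, 3, '2018-10-30'),
        (41, 4, '2019-01-31'), (42, 4, '2018-12-31');
""")

def clients_with_only_expired_contracts(connection, today):
    """Active clients that have contracts, none of which ends after `today`."""
    rows = connection.execute("""
        SELECT c.client_code
          FROM client c
          JOIN contract co ON co.client_code = c.client_code
         WHERE c.status = 'active'
         GROUP BY c.client_code
        HAVING SUM(CASE WHEN co.end_date > ? THEN 1 ELSE 0 END) = 0
         ORDER BY c.client_code
    """, (today,)).fetchall()
    return [code for (code,) in rows]

# "Today" is assumed to be mid-November 2018 for this example.
print(clients_with_only_expired_contracts(conn, "2018-11-15"))
```

Passing the date in makes the result reproducible; with a date far enough in the future, clients 3 and 4 would show up as well.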
Is that what you're looking for?
select client.client_code
from client
join contract
on contract.client_code = client.client_code
where client.status = 'active'
group by client.client_code
having max(contract.end_date) < curdate()

mysql highly selective query

I have a data set like this:
User Date Status
Eric 1/1/2015 4
Eric 2/1/2015 2
Eric 3/1/2015 4
Mike 1/1/2015 4
Mike 2/1/2015 4
Mike 3/1/2015 2
I'm trying to write a query in which I will retrieve users whose MOST RECENT transaction status is a 4. If it's not a 4 I don't want to see that user in the results. This dataset could have 2 potential results, one for Eric and one for Mike. However, Mike's most recent transaction was not a 4, therefore:
The return result would be:
User Date Status
Eric 3/1/2015 4
As this record is the only record for Eric that has a 4 as his latest transaction date.
Here's what I've tried so far:
SELECT
user, MAX(date) as dates, status
FROM
orders
GROUP BY
status,
user
This would get me to a unique record for every user for every status type. This would be a subquery, and the parent query would look like:
SELECT
user, dates, status
WHERE
status = 4
GROUP BY
user
However, this is clearly flawed as I don't want status = 4 records IF their most recent record is not a 4. I only want status = 4 when the latest date is a 4. Any thoughts?
SELECT user, date
, actualOrders.status
FROM (
SELECT user, MAX(date) as date
FROM orders
GROUP BY user) AS lastOrderDates
INNER JOIN orders AS actualOrders USING (user, date)
WHERE actualOrders.status = 4
;
-- Since USING is being used, there is not a need to specify source of the
-- user and date fields in the SELECT clause; however, if an ON clause was
-- used instead, either table could be used as the source of those fields.
Also, you may want to rethink the field names if it is not too late: user and date both appear in MySQL's list of keywords.
SELECT user, date, status FROM
(
SELECT o.user, o.date, o.status FROM orders o
WHERE o.date = (SELECT MAX(o2.date) FROM orders o2 WHERE o2.user = o.user)
) AS latest
WHERE status = 4
The easiest way is to include your order table a second time in a subquery in your from clause in order to retrieve the last date for each user. Then you can add a where clause to match the most recent date per user, and finally filter on the status.
select orders.*
from orders,
(
select ord_user, max(ord_date) ord_date
from orders
group by ord_user
) latestdate
where orders.ord_status = 4
and orders.ord_user = latestdate.ord_user
and orders.ord_date = latestdate.ord_date
Another option is to use the over partition clause:
Oracle SQL query: Retrieve latest values per group based on time
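A minimal sketch of that window-function idea, run in an in-memory SQLite database from Python (it needs SQLite 3.25 or later; in MySQL the same query shape needs version 8.0 or later). The sample dates are rewritten in ISO format, assuming the question's 1/1/2015, 2/1/2015 and 3/1/2015 are consecutive months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user TEXT, date TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Eric", "2015-01-01", 4),
        ("Eric", "2015-02-01", 2),
        ("Eric", "2015-03-01", 4),
        ("Mike", "2015-01-01", 4),
        ("Mike", "2015-02-01", 4),
        ("Mike", "2015-03-01", 2),
    ],
)

# Number each user's rows from newest to oldest, keep the newest one,
# and only then filter on the status.
latest_with_status_4 = conn.execute("""
    SELECT user, date, status
      FROM (SELECT user, date, status,
                   ROW_NUMBER() OVER (PARTITION BY user ORDER BY date DESC) AS rn
              FROM orders) AS ranked
     WHERE rn = 1 AND status = 4
""").fetchall()

print(latest_with_status_4)
```

The status filter has to sit outside the ranked subquery; applying it inside would rank only the status 4 rows and bring Mike back.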

Count number of action and show last date of action

I am querying an audit database to try and find out how many actions each user has completed and when their last action was.
The query I am using is :
SELECT user_id,
count(id) as actions,
datetime
from auditing
WHERE datetime>='2014-03-01 00:00:00'
GROUP BY user_id
ORDER BY `auditing`.`datetime` DESC
This correctly shows me the total number of items but it does not show the correct last date - the date it does show me is quite random, i.e. not at the top or bottom of the list but taken from somewhere in the middle. I checked this for a number of entries produced and they are all wrong and do not reflect the latest action.
How can I get it to show me the last (most recent) event in the above query?
Example:
user_id | actions | datetime
1 | 10 | 2014-07-04 16:10:14
2 | 55 | 2014-07-05 11:15:08
3 | 8 | 2014-07-04 22:19:43
Thanks
You should only SELECT columns that are part of your GROUP BY clause or are a result of an aggregate function. You can and probably should configure your database server so that it would complain about your query. It would say something like:
ERROR 1055 (42000): 'datetime' isn't in GROUP BY
The reason is that you don't tell the database server which datetime value you want (the earliest, the average, the latest?). So, to get the latest value, try this query:
SELECT user_id, count(id) as actions, max(datetime)
FROM auditing
WHERE datetime>='2014-03-01 00:00:00'
GROUP BY user_id
ORDER BY user_id
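To see the difference MAX() makes, here is a small sketch using an in-memory SQLite database from Python. The audit rows are invented for the example (the question does not show the raw data), and the last_action alias plus the ordering by it are additions to match the "most recent first" intent of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auditing (id INTEGER PRIMARY KEY, user_id INTEGER, datetime TEXT)"
)
# Hypothetical audit rows; the question does not show the raw data.
conn.executemany(
    "INSERT INTO auditing (user_id, datetime) VALUES (?, ?)",
    [
        (1, "2014-07-01 08:00:00"),
        (1, "2014-07-04 16:10:14"),
        (1, "2014-02-11 12:00:00"),  # before the cutoff, so not counted
        (2, "2014-07-05 11:15:08"),
        (2, "2014-06-30 09:30:00"),
        (3, "2014-07-04 22:19:43"),
    ],
)

# Every selected column is either grouped or aggregated, so the result
# no longer depends on which row the server happens to pick.
summary = conn.execute("""
    SELECT user_id, COUNT(id) AS actions, MAX(datetime) AS last_action
      FROM auditing
     WHERE datetime >= '2014-03-01 00:00:00'
     GROUP BY user_id
     ORDER BY last_action DESC
""").fetchall()

for row in summary:
    print(row)
```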
You can try with this:
SELECT user_id, COUNT(id) AS actions, MAX(datetime)
FROM auditing
WHERE datetime>='2014-03-01 00:00:00'
GROUP BY user_id

How can I find days between different paired rows?

I've been racking my brain about how to do this in one query without PHP code.
In a nutshell, I have a table that records email activity. For the sake of this example, here is the data:
recipient_id activity date
1 delivered 2011-08-30
1 open 2011-08-31
2 delivered 2011-08-30
3 delivered 2011-08-24
3 open 2011-08-30
3 open 2011-08-31
The goal: I want to display to users a single number that tells how many recipients open their email within 24 hours.
E.G. "Users that open their email within 24 hours: 13 Readers"
In the case of the sample data, above, the value would be "1". (Recipient one was delivered an email and opened it the next day. Recipient 2 never opened it and recipient 3 waited 5 days.)
Can anyone think of a way to express the goal in a single query?
Reminder: In order to count, the person must have a 'delivered' tag and at least one 'open' tag. Each 'open' tag only counts once per recipient.
** EDIT ** Sorry, I'm using MySQL
Here is a version in mysql.
select count(distinct recipient_id)
from email e1
where e1.activity = 'delivered'
and exists
(select * from email e2
where e1.recipient_id = e2.recipient_id
and e2.activity = 'open'
and datediff(e2.action_date,e1.action_date) <= 1)
The basic principle is that you want to find a delivered row for a recipient that also has an open within 24 hours.
The datediff() is a good way to do the date arithmetic in mysql -- other dbs will vary on exact methods for this step. The rest of the sql will work anywhere.
SQLFiddle here: http://sqlfiddle.com/#!2/c9116/4
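In case the fiddle link is unavailable, here is a sketch of the same EXISTS query against the sample data, run in an in-memory SQLite database from Python. SQLite has no DATEDIFF(), so the difference of two julianday() values stands in for it here; that substitution, and the email/action_date names taken from the query above, are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email (recipient_id INTEGER, activity TEXT, action_date TEXT)"
)
conn.executemany(
    "INSERT INTO email VALUES (?, ?, ?)",
    [
        (1, "delivered", "2011-08-30"),
        (1, "open",      "2011-08-31"),
        (2, "delivered", "2011-08-30"),
        (3, "delivered", "2011-08-24"),
        (3, "open",      "2011-08-30"),
        (3, "open",      "2011-08-31"),
    ],
)

# julianday(a) - julianday(b) gives the gap in days, standing in for
# MySQL's DATEDIFF(a, b).
(readers,) = conn.execute("""
    SELECT COUNT(DISTINCT e1.recipient_id)
      FROM email e1
     WHERE e1.activity = 'delivered'
       AND EXISTS (SELECT 1
                     FROM email e2
                    WHERE e2.recipient_id = e1.recipient_id
                      AND e2.activity = 'open'
                      AND julianday(e2.action_date) - julianday(e1.action_date) <= 1)
""").fetchone()

print(f"Users that open their email within 24 hours: {readers} Readers")
```

Only recipient 1 qualifies: recipient 2 never opened the email and recipient 3 waited six days.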
Untested, but should work ;) Don't know which SQL dialect you use, so I've used TSQL DATEDIFF function.
select distinct opened.recipient_id -- or count(distinct opened.recipient_id) if you want to know number
from actions as opened
inner join actions as delivered
on opened.recipient_id = delivered.recipient_id and delivered.activity = 'delivered'
where opened.activity = 'open' and DATEDIFF(day, delivered.date, opened.date) <= 1
Edit: I'd confused opened with delivered - now replaced.
Assumptions: MySql, table is called "TABLE"
Ok, I am not 100% on this, because I don't have a copy of the table to run it against, but I think that you could do something like this:
SELECT COUNT(DISTINCT t1.recipient_id) FROM `TABLE` t1
INNER JOIN `TABLE` t2 ON t1.recipient_id = t2.recipient_id AND t1.activity != t2.activity
WHERE t1.activity in ('delivered', 'open') AND t2.activity in ('delivered', 'open')
AND ABS(DATEDIFF(t1.date, t2.date)) = 1
Basically, you are joining a table onto itself, where the activities don't match, but recipient_ids do, and the status is either 'delivered' or 'open'. What you would end up getting, is a result that looks like this:
1 delivered 2011-08-30 1 open 2011-08-31
You are then doing a diff between the two dates (with an absolute value, because we don't know which order they will be in) and making sure that it is equal to 1 (or 24 hours).