mysql highly selective query - mysql

I have a data set like this:
User Date Status
Eric 1/1/2015 4
Eric 2/1/2015 2
Eric 3/1/2015 4
Mike 1/1/2015 4
Mike 2/1/2015 4
Mike 3/1/2015 2
I'm trying to write a query in which I will retrieve users whose MOST RECENT transaction status is a 4. If it's not a 4 I don't want to see that user in the results. This dataset could have 2 potential results, one for Eric and one for Mike. However, Mike's most recent transaction was not a 4, therefore:
The return result would be:
User Date Status
Eric 3/1/2015 4
As this record is the only record for Eric that has a 4 as his latest transaction date.
Here's what I've tried so far:
SELECT
user, MAX(date) as dates, status
FROM
orders
GROUP BY
status,
user
This would get me to a unqiue record for every user for every status type. This would be a subquery, and the parent query would look like:
SELECT
user, dates, status
WHERE
status = 4
GROUP BY
user
However, this is clearly flawed as I don't want status = 4 records IF their most recent record is not a 4. I only want status = 4 when the latest date is a 4. Any thoughts?

SELECT user, date
, actualOrders.status
FROM (
SELECT user, MAX(date) as date
FROM orders
GROUP BY user) AS lastOrderDates
INNER JOIN orders AS actualOrders USING (user, date)
WHERE actualOrders.status = 4
;
-- Since USING is being used, there is not a need to specify source of the
-- user and date fields in the SELECT clause; however, if an ON clause was
-- used instead, either table could be used as the source of those fields.
Also, you may want to rethink the field names used if it is not too late and user and date are both found here.

SELECT user, date, status FROM
(
SELECT user, MAX(date) as date, status FROM orders GROUP BY user
)
WHERE status = 4

The easiest way is to include your order table a second time in a subquery in your from clause in order to retrieve the last date for each user. Then you can add a where clause to match the most recent date per user, and finally filter on the status.
select orders.*
from orders,
(
select ord_user, max(ord_date) ord_date
from orders
group by ord_user
) latestdate
where orders.ord_status = 4
and orders.ord_user = latestdate.ord_user
and orders.ord_date = latestdate.ord_date
Another option is to use the over partition clause:
Oracle SQL query: Retrieve latest values per group based on time
Regards,

Related

Efficiently get latest appointment for every person sorted by oldest first

I already asked this question earlier but forgot a few (important) details or got them wrong.
My table in MySQL 8.0.29 looks like this
UserID
Appointment
Description
Bob
2022-06-01
Cleaning
Bob
2022-06-03
Toothache
John
2022-06-02
Braces
I'm trying to get the latest appointment for every person sorted by oldest first.
The query should return
UserID
Appointment
Description
John
2022-06-02
Braces
Bob
2022-06-03
Toothache
Using one of the previous answers I get
SELECT Name, Appointment, Description
FROM (
SELECT Name, Appointment, Description, ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Appointment DESC) rn) t1
WHERE rn = 1
The problem is the database currently has 3 million rows and it'll continue to grow so this query ends up being pretty slow.
My plan is to consume the data in chunks so I'd prefer the query having "pagination". Something like a LIMIT 0, 5000 to get 5000 records at a time.
I'm open to even re-architecting the database if it comes to that.
For now i've resorted to creating a new table that just keeps the latest appointment for each user.
You are halfway there. Use that query as a 'derived table' instead of making it permanent:
SELECT b.*
FROM ( SELECT user_id, MAX(appointment) AS last_date)
FROM tbl
GROUP BY user_id ) AS x
JOIN tbl AS b ON b.user_id = x.user_id
AND b.appointment = x.last_date
And be sure to have INDEX(user_id, appointment)
I would be interested to see if this and the "OVER" approach both give the same results and which is faster.

Count number of action and show last date of action

I am querying an audit database to try and find out how many actions each user has completed and when their last action was.
The query I am using is :
SELECT user_id,
count(id) as actions,
datetime
from auditing
WHERE datetime>='2014-03-01 00:00:00'
GROUP BY user_id
ORDER BY `auditing`.`datetime` DESC
This correctly shows me the total number of items but it does not show the correct last date - the date it does show me it quite random i.e. not at the top or bottom of the list but taken from somewhere in the middle. I checked this for a number of entries produced and they are all wrong and do not reflect the latest action.
How can I get it to show me the last (most recent) event in the above query?
Example:
user_id | actions | datetime
1 | 10 | 2014-07-04 16:10:14
2 | 55 | 2014-07-05 11:15:08
3 | 8 | 2014-07-04 22:19:43
Thanks
You should only SELECT columns that are part of your GROUP BY clause or are a result of an aggregate function. You can and probably should configure your database server so that it would complain about your query. It would say something like:
ERROR 1055 (42000): 'datetime' isn't in GROUP BY
The reason behind it is, that you don't tell the database server which datetime value you want (the earliest, the average, the latest?). So in order to get the last value, try this query:
SELECT user_id, count(id) as actions, max(datetime)
FROM auditing
WHERE datetime>='2014-03-01 00:00:00'
GROUP BY user_id
ORDER BY user_id
You can try with this:
SELECT user_id, COUNT(actions), MAX(datetime)
FROM auditing
WHERE datetime>='2014-03-01 00:00:00'
GROUP BY user_id

Multiple distinct counts with where

I am having an issue creating most efficient query for multiple distinct counts of a column with different where clauses. My MYSQL table looks like this:
id client_id result timestamp
---------------------------------------------------
1 1234566 escalated 2014-01-02 00:00:00
2 1233344 approved 2014-02-03 00:00:00
3 1234566 escalated 2014-01-02 01:00:00
What I am trying to achieve is to build the following data in the return:
Total number of unique client IDs processed from the beginning of time.
Total number of unique client IDs processed escalated from the beginning of time.
Total number of unique client IDs processed approved from the beginning of time.
Count of unique client IDs approved within specified timeframe using between statement on timestamp.
Count of unique client IDs escalated within specified timeframe using between statement on timestamp.
I have thought about running multiple selects, but I think it would be a waste of resources, and possibly if this could be done with a single query it would the best way to handle it, unfortunately my experience is lacking in this area. What I would like would the return to simple contain an alias and the count.
Any help would be appreciated.
You want conditional aggregation, something like:
select count(distinct ClientId) as NumClients,
count(distinct case when result = 'Approved' then ClientId end) as NumApproved,
count(distinct case when result = 'Escalated' then ClientId end) as NumEscalated,
count(distinct case when result = 'Approved' and timestamp between #Time1 and #Time2
then ClientId end) as NumApproved,
count(distinct case when result = 'Escalated' and timestamp between #Time1 and #Time2
then ClientId end) as NumEscalated,
from table t;

Selecting most recent as part of group by (or other solution ...)

I've got a table where the columns that matter look like this:
username
source
description
My goal is to get the 10 most recent records where a user/source combination is unique. From the following data:
1 katie facebook loved it!
2 katie facebook it could have been better.
3 tom twitter less then 140
4 katie twitter Wowzers!
The query should return records 2,3 and 4 (assume higher IDs are more recent - the actual table uses a timestamp column).
My current solution 'works' but requires 1 select to generate the 10 records, then 1 select to get the proper description per row (so 11 selects to generate 10 records) ... I have to imagine there's a better way to go. That solution is:
SELECT max(id) as MAX_ID, username, source, topic
FROM events
GROUP BY source, username
ORDER BY MAX_ID desc;
It returns the proper ids, but the wrong descriptions so I can then select the proper descriptions by the record ID.
Untested, but you should be able to handle this with a join:
SELECT
fullEvent.id,
fullEvent.username,
fullEvent.source,
fullEvent.topic
FROM
events fullEvent JOIN
(
SELECT max(id) as MAX_ID, username, source
FROM events
GROUP BY source, username
) maxEvent ON maxEvent.MAX_ID = fullEvent.id
ORDER BY fullEvent.id desc;

mysql first record retrieval

While very easy to do in Perl or PHP, I cannot figure how to use mysql only to extract the first unique occurence of a record.
For example, given the following table:
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-08-12 09:22:37 2000
John 2010-09-13 10:22:22 500
Sue 2010-09-01 09:07:21 1000
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
I would like to extract the first record for each individual in asc time order.
After sorting the table is ascending time order, I need to extract the first unique record by name.
So the output would be :
Name Date Time Sale
John 2010-09-12 10:22:22 500
Bill 2010-07-25 11:23:23 2000
Sue 2010-06-24 13:23:45 1000
Is this doable in an easy fashion with mySQL?
I think that something along the lines of
select name, date, time, sale from mytable order by date, time group by name;
will get you what you're looking for
you need to perform a groupwise max or groupwise min
see below or http://pastie.org/973117 for an example
select
u.user_id,
u.username,
latest.comment_id
from
users u
left outer join
(
select
max(comment_id) as comment_id,
user_id
from
user_comment
group by
user_id
) latest on u.user_id = latest.user_id;
In databases, there really is no "first" or "last" record; think of each record as its own, non-positional entity in the table. The only positions they have are when you give them one, say, using ORDER BY.
This will give you what you want. It might not be efficient, but it works.
select Name, Date, Time, Sale from
(select Name, Date, Time, Sale from MyTable
order by Date asc, Time asc) MyTable_subquery_name
group by Name
Note: MyTable_subquery_name is just a dummy name for the subquery. MySQL will give the error ERROR 1248 (42000): Every derived table must have its own alias without it.
If only GROUP BY and ORDER BY were communicative operations, then this wouldn't have to be a subquery.