This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 2 years ago.
I came across this interesting problem. I have a table named email_track that tracks email status per category (for example invitation or newsletter).
This is how my table data looks:
With the following queries I'm able to get the most recent record for each to_email:
with `et2` as (
select `et1`.`category`, `et1`.`to_email`, `et1`.`subject`, `et1`.`status`, ROW_NUMBER() OVER (partition by `to_email` order by `id` desc) as `rn`
from `email_track` `et1`
)
select * from `et2` where `rn` = 1;
select `et1`.`category`, `et1`.`to_email`, `et1`.`subject`, `et1`.`status`, `et2`.`id`
from `email_track` `et1`
left join `email_track` `et2` on (`et1`.`to_email` = `et2`.`to_email` and `et1`.`id` < `et2`.`id`)
where `et2`.`id` is null;
What I'm expecting is that for the email john#example.com I get two records, one for the category invitation and the other for newsletter. Right now we don't get that result because we partition by to_email.
Adding the category to the partition by clause of the window function should be enough to give you the result that you want:
with et2 as (
select et1.category, et1.to_email, et1.subject, et1.status,
row_number() over(partition by to_email, category order by id desc) as rn
from email_track et1
)
select * from et2 where rn = 1;
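The same idea carries over to the self-join form from the question: the join condition needs to match on category as well as to_email. Below is a minimal sketch of that, run through Python's built-in sqlite3 as a stand-in for MySQL. The sample rows and email addresses are made up, since the original table data is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_track ("
    "id INTEGER PRIMARY KEY, category TEXT, to_email TEXT, subject TEXT, status TEXT)"
)
# Hypothetical sample rows; ids are assigned 1..5 in insertion order.
conn.executemany(
    "INSERT INTO email_track (category, to_email, subject, status) VALUES (?, ?, ?, ?)",
    [
        ("invitation", "john@example.com", "Invite 1", "sent"),
        ("invitation", "john@example.com", "Invite 2", "opened"),
        ("newsletter", "john@example.com", "News 1", "sent"),
        ("newsletter", "jane@example.com", "News 1", "sent"),
        ("newsletter", "jane@example.com", "News 2", "bounced"),
    ],
)

# Anti-join: keep a row only if no newer row exists for the same
# (to_email, category) pair.
LATEST_PER_EMAIL_AND_CATEGORY = """
SELECT et1.category, et1.to_email, et1.subject, et1.status
FROM email_track et1
LEFT JOIN email_track et2
  ON et1.to_email = et2.to_email
 AND et1.category = et2.category
 AND et1.id < et2.id
WHERE et2.id IS NULL
ORDER BY et1.to_email, et1.category
"""

latest_rows = conn.execute(LATEST_PER_EMAIL_AND_CATEGORY).fetchall()
for row in latest_rows:
    print(row)
```

With this sample data, john@example.com comes back twice (once per category) and jane@example.com once.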
This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 2 years ago.
I'm trying to run a distinct on four columns in the query below:
select
full_records.id,
full_records.domain_id,
subdomains.name as subdomain_name,
types.name as type_name,
changelog.content as content,
changelog.changed_on
from full_records
inner join subdomains on full_records.subdomain_id = subdomains.id
inner join types on full_records.type_id = types.id
inner join changelog on full_records.id = changelog.full_record_id
where
full_records.domain_id = 2
order by changelog.changed_on desc
and this returns the following:
I'm not sure how to go about altering the query so that it only returns the records that are unique across these four fields.
full_records.domain_id,
subdomains.name as subdomain_name,
types.name as type_name,
changelog.content as content
So if they were unique across those four fields, the rows 2, 3, 4 and 7 would not be in the results. It's basically to identify the latest change for a domain record. Any help would be really appreciated. Thanks.
One pretty simple method is row_number():
with cte as (
select fr.id, fr.domain_id, sd.name as subdomain_name,
t.name as type_name, cl.content, cl.changed_on
from full_records fr join
subdomains sd
on fr.subdomain_id = sd.id join
types t
on fr.type_id = t.id join
changelog cl
on fr.id = cl.full_record_id
where fr.domain_id = 2
)
select cte.*
from (select cte.*,
row_number() over (partition by domain_id, subdomain_name, type_name, content
order by changed_on desc
) as seqnum
from cte
) cte
where seqnum = 1;
Note that I added table aliases so the query is easier to write and to read.
I have the below query:
SELECT users_service.id, name
FROM users_service
LEFT JOIN
(SELECT * FROM activity)
activity ON (users_service.id = activity.user_service_id)
WHERE admin_id = 1
However, this returns as many rows as exist in the activity table, i.e. multiple activity results for each admin_id entry.
I want to return only the latest row from the activity table for each admin_id.
"Latest" could be determined by entry_date or id.
I tried using DISTINCT, MAX and LIMIT 1, but these all produced strange behavior.
Use ROW_NUMBER():
SELECT us.id, a.name
FROM users_service us LEFT JOIN
(SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY a.user_service_id ORDER BY ? DESC) as seqnum
FROM activity a
) a
ON us.id = a.user_service_id AND a.seqnum = 1
WHERE us.admin_id = 1;
The ? is for the column that specifies the "most recent", which your question doesn't clarify.
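As a rough illustration of the ROW_NUMBER() approach with the placeholder filled in, here is a sketch that assumes entry_date is the "most recent" column (the question mentions it as one option). It runs on Python's built-in sqlite3 rather than MySQL, and the column layout (name and admin_id on users_service, a note column on activity) and the sample rows are assumptions for the demo only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the question does not show the real table definitions.
conn.executescript("""
CREATE TABLE users_service (id INTEGER PRIMARY KEY, admin_id INTEGER, name TEXT);
CREATE TABLE activity (
    id INTEGER PRIMARY KEY, user_service_id INTEGER, entry_date TEXT, note TEXT
);
INSERT INTO users_service VALUES (1, 1, 'alpha'), (2, 1, 'beta'), (3, 2, 'gamma');
INSERT INTO activity VALUES
    (1, 1, '2020-01-01', 'old'),
    (2, 1, '2020-03-01', 'new'),
    (3, 3, '2020-02-01', 'other admin');
""")

# Number each service's activity rows newest-first, then join only row 1.
# The LEFT JOIN keeps services that have no activity at all.
LATEST_ACTIVITY = """
SELECT us.id, us.name, a.entry_date, a.note
FROM users_service us
LEFT JOIN (
    SELECT act.*,
           ROW_NUMBER() OVER (PARTITION BY act.user_service_id
                              ORDER BY act.entry_date DESC) AS seqnum
    FROM activity act
) a ON us.id = a.user_service_id AND a.seqnum = 1
WHERE us.admin_id = 1
ORDER BY us.id
"""

latest_activity = conn.execute(LATEST_ACTIVITY).fetchall()
for row in latest_activity:
    print(row)
```

Putting seqnum = 1 in the ON clause rather than the WHERE clause is what keeps services without any activity in the result, with NULLs in the activity columns.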
You did not specify the column by which you determine the most recent activity. I call it datetime_col in the solution below:
SELECT usv.id
, name
FROM users_service usv
LEFT
JOIN activity act
on act.user_service_id = usv.id
and act.datetime_col = (select max(datetime_col)
from activity act_
WHERE act_.user_service_id = act.user_service_id)
WHERE admin_id = 1
Let's say we have a person table and survey table. survey is a set of attributes collected from a person at some point in time. Let's say survey has columns address and marriage_status
How do I select all persons whose address or marriage status has changed in the last survey?
Here's how I would write it if MySQL were able to magically interpret my intention:
SELECT *
FROM person
JOIN
(SELECT *
FROM survey
GROUP BY survey.person_id
ORDER BY survey.timestamp DESC
LIMIT 2 EACH) -- of course this part doesn't actually work. Trying to get last 2 records per person
surveys
ON surveys.person_id = person.id
WHERE surveys[0].address != surveys[1].address
OR surveys[0].marriage_status != surveys[1].marriage_status;
OR
SELECT *
FROM person
JOIN
(SELECT MOST RECENT survey FOR EACH person) latest_survey
ON latest_survey.person_id = person.id
JOIN
(SELECT SECOND MOST RECENT survey FOR EACH person) previous_survey
ON previous_survey.person_id = person.id
WHERE latest_survey.address != previous_survey.address
OR latest_survey.marriage_status != previous_survey.marriage_status;
This seems like a relatively straightforward query, but it's driving me crazy. I suspect I have tunnel vision and I'm not approaching this the right way.
EDIT: I am on MySQL v5. Based on the first couple answers, it seems like this might be the time to migrate to v8 (among other reasons)
So here's how I ended up doing it. It's a little long, but I think it's pretty straightforward? This felt amazing to get working.
(Note that underscores are used as prefixes in table aliases to help keep track of subquery depth)
SELECT person.*
FROM person
JOIN (
-- Join full survey data against each 'most recent' survey timestamp
SELECT s1.*
FROM survey s1
JOIN (
-- get most recent timestamp for each person
SELECT _s1.person_id, MAX(_s1.timestamp) timestamp
FROM survey _s1
GROUP BY person_id
) latest_surveys
ON latest_surveys.person_id = s1.person_id and latest_surveys.timestamp = s1.timestamp
) latest
ON latest.person_id = person.id
JOIN (
-- Join full survey data against each 'SECOND most recent' survey timestamp
select s2.*
from survey s2
JOIN (
-- to get SECOND most recent survey timestamp, do similar query, but exclude latest timestamp
SELECT _s2.person_id, MAX(_s2.timestamp) timestamp
FROM survey _s2
JOIN (
-- get most recent timestamp for each person (again)
SELECT __s2.person_id, MAX(__s2.timestamp) timestamp
FROM survey __s2
GROUP BY person_id
) _latest_surveys
-- Note the *NOT* equal here
ON _latest_surveys.person_id = _s2.person_id and _latest_surveys.timestamp != _s2.timestamp
GROUP BY _s2.person_id
) previous_surveys
ON previous_surveys.person_id = s2.person_id and previous_surveys.timestamp = s2.timestamp
) previous
ON previous.person_id = person.id
WHERE latest.address != previous.address
OR latest.marriage_status != previous.marriage_status;
Analytic functions make your question much more tractable. If you are not yet using MySQL 8+, then now would be a good time to upgrade. Assuming you are using MySQL 8+, we can try:
WITH cte AS (
SELECT p.id, s.address, s.marriage_status,
ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY s.timestamp DESC) rn
FROM person p
INNER JOIN survey s ON p.id = s.person_id
)
SELECT id
FROM cte
GROUP BY id
HAVING
MAX(CASE WHEN rn = 1 THEN address END) <> MAX(CASE WHEN rn = 2 THEN address END) OR
MAX(CASE WHEN rn = 1 THEN marriage_status END) <> MAX(CASE WHEN rn = 2 THEN marriage_status END);
The above query uses a pivot trick to isolate the latest, and second latest, addresses and marriage statuses for each person. It retains person id values for those whose latest and second latest addresses or marriage statuses are not identical.
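To make the pivot trick concrete, here is a small sketch run on Python's built-in sqlite3 instead of MySQL 8. The sample surveys are invented, and the query reads only the survey table for brevity. One thing it shows: a person with a single survey has no rn = 2 row, so the comparison evaluates to NULL and that person is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; only the column names come from the question.
conn.executescript("""
CREATE TABLE survey (
    id INTEGER PRIMARY KEY, person_id INTEGER, timestamp TEXT,
    address TEXT, marriage_status TEXT
);
INSERT INTO survey (person_id, timestamp, address, marriage_status) VALUES
    (1, '2020-01-01', 'A St', 'single'),
    (1, '2021-01-01', 'B St', 'single'),   -- address changed
    (2, '2020-01-01', 'C St', 'single'),
    (2, '2021-01-01', 'C St', 'married'),  -- status changed
    (3, '2019-01-01', 'X St', 'single'),   -- older change, not in last two
    (3, '2020-01-01', 'D St', 'married'),
    (3, '2021-01-01', 'D St', 'married'),
    (4, '2021-01-01', 'E St', 'single');   -- only one survey
""")

# rn = 1 is the latest survey, rn = 2 the one before it. The CASE
# expressions pivot those two rows into one comparison per person.
CHANGED_IN_LAST_SURVEY = """
WITH ranked AS (
    SELECT s.person_id, s.address, s.marriage_status,
           ROW_NUMBER() OVER (PARTITION BY s.person_id
                              ORDER BY s.timestamp DESC) AS rn
    FROM survey s
)
SELECT person_id
FROM ranked
GROUP BY person_id
HAVING MAX(CASE WHEN rn = 1 THEN address END)
         <> MAX(CASE WHEN rn = 2 THEN address END)
    OR MAX(CASE WHEN rn = 1 THEN marriage_status END)
         <> MAX(CASE WHEN rn = 2 THEN marriage_status END)
ORDER BY person_id
"""

changed_ids = [row[0] for row in conn.execute(CHANGED_IN_LAST_SURVEY)]
print(changed_ids)
```

Persons 1 and 2 are returned. Person 3 changed only between older surveys and person 4 has nothing to compare against, so both are excluded.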
This might be how you can achieve that:
SELECT *
FROM person
JOIN (
SELECT *,
MAX(survey_date) latest_survey,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(survey_date ORDER BY person_id, survey_date ASC),',',-2),',',1) previous_survey,
SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-1) curadd,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-2),',',1) prevadd,
SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-1) curms,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-2),',',1) prevms
FROM survey GROUP BY person_id
HAVING curadd != prevadd OR curms != prevms) A
ON person.id=A.person_id;
This uses GROUP_CONCAT to combine the values per person into one string and SUBSTRING_INDEX to pull the last and second-to-last values back out, which are then compared in the HAVING clause. There are other ways to achieve this; your second example can probably be done, but I think it would turn into a very long query. Since you're not on MySQL 8+, this query is much shorter, but its performance is a concern, especially on a large table. Note also that it splits on commas, so it breaks if an address itself contains a comma.
You did not say which version you are on, but I hope you have at least MySQL 8 (or something comparable) so that you can use Common Table Expressions. They can simplify a complex query.
The tricky part is getting survey records #1 and #2 for each person. I would do it this way (see the cte1 and cte2 definitions):
WITH
cte1 AS (
SELECT MAX(x1.id) AS id, x1.person_id
FROM survey x1
GROUP BY x1.person_id),
cte2 AS (
SELECT MAX(x2.id) AS id, x2.person_id
FROM survey x2
JOIN cte1 ON cte1.person_id = x2.person_id
AND cte1.id > x2.id
GROUP BY x2.person_id)
SELECT
p.*,
s1.address, s2.address address2,
s1.marriage_status, s2.marriage_status marriage_status2
FROM person AS p
JOIN (
cte1 JOIN survey s1 ON s1.id = cte1.id
) ON cte1.person_id = p.id
JOIN (
cte2 JOIN survey s2 ON s2.id = cte2.id
) ON cte2.person_id = p.id
WHERE
(s1.address <> s2.address)
OR (s1.marriage_status <> s2.marriage_status)
https://www.db-fiddle.com/f/hLwdHiZin4MkdUZ4aBz67H/2
Update: Thanks to Ian, I replaced MIN with MAX to get the most recent records.
This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 4 years ago.
This one is driving me to drink, so I would love some help.
I've got a table with:
act_Address, act_OrderID, act_Date
I'm trying to get the first act_Date for each address we shipped to.
Here's what I've tried but it's been running now for well over an hour so I'm thinking this isn't going to work...
SELECT c.act_Address,
(SELECT o.act_OrderID
FROM tbl_Activity o
WHERE c.act_Address = o.act_Address
ORDER BY o.act_Date
LIMIT 1) AS order_id,
(SELECT d.act_Date
FROM tbl_Activity d
WHERE c.act_Address = d.act_Address
ORDER BY d.act_Date
LIMIT 1) as order_date
FROM tbl_Activity c
I've got to be doing something very wrong, doesn't seem like getting the first date for an address would be that hard, but I'm not that smart.
Your query uses two correlated subqueries to get act_Date and act_OrderID values. Each subquery is executed once for every record of tbl_Activity.
You can use:
SELECT act_Address, MIN(act_Date) AS first_Date
FROM tbl_Activity
GROUP BY act_Address
to get the first date per address. Then you can use the above query as a derived table and join back to the original table to get the rest of the fields:
SELECT t1.act_Address, t1.act_OrderID, t1.act_date
FROM tbl_Activity AS t1
JOIN (
SELECT act_Address, MIN(act_Date) AS first_Date
FROM tbl_Activity
GROUP BY act_Address
) AS t2 ON t1.act_Address = t2.act_Address AND t1.act_Date = t2.first_Date
I also propose placing a composite index on (act_Address, act_Date).
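Here is a minimal sketch of the derived-table join together with the composite index, run on Python's built-in sqlite3 as a stand-in for MySQL. The index name idx_activity_address_date and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_Activity (act_Address TEXT, act_OrderID INTEGER, act_Date TEXT)"
)
# Composite index so the per-address MIN and the join back can use it.
# The index name is hypothetical.
conn.execute(
    "CREATE INDEX idx_activity_address_date ON tbl_Activity (act_Address, act_Date)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO tbl_Activity VALUES (?, ?, ?)",
    [
        ("1 Main St", 101, "2020-05-01"),
        ("1 Main St", 102, "2020-01-15"),
        ("2 Oak Ave", 103, "2020-03-10"),
        ("2 Oak Ave", 104, "2020-04-01"),
    ],
)

# Find the first date per address once, then join back for the order id.
FIRST_ORDER_PER_ADDRESS = """
SELECT t1.act_Address, t1.act_OrderID, t1.act_Date
FROM tbl_Activity AS t1
JOIN (
    SELECT act_Address, MIN(act_Date) AS first_Date
    FROM tbl_Activity
    GROUP BY act_Address
) AS t2 ON t1.act_Address = t2.act_Address AND t1.act_Date = t2.first_Date
ORDER BY t1.act_Address
"""

first_orders = conn.execute(FIRST_ORDER_PER_ADDRESS).fetchall()
for row in first_orders:
    print(row)
```

If two orders for the same address share the earliest date, this form returns both of them.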
You can do this by GROUP BY in a subselect:
SELECT a.act_Address, a.act_OrderID, a.act_Date
FROM (
SELECT a2.act_Address addr, MIN(a2.act_Date) mindate FROM tbl_Activity a2
GROUP BY a2.act_Address
) g, tbl_Activity a
WHERE a.act_Address = g.addr AND a.act_Date = g.mindate;
I use this MySQL query to get an item's rank by points. I need to get the previous and next items by rank.
For example: an item's rank is 99. On the item page, I want to show the 100th, 101st, 98th and 97th items.
http://erincfirtina.com/apps/urdemo/track.php?tid=10
I need this to build a related tracks list.
Here is my MySQL query which gets the rank:
SELECT
uo.*,
( SELECT COUNT(*) FROM tracks ui WHERE (ui.point, ui.id) >= (uo.point, uo.id) ) AS rank
FROM tracks uo WHERE id = 10
You never asked a question.
One thing I am observing is that you are using a table (uo) inside the subselect that isn't part of the subselect.
Maybe you are looking for:
SELECT uo.*, COUNT(*) AS rank
FROM tracks ui, tracks uo
WHERE (ui.point, ui.id) >= (uo.point, uo.id)
AND uo.id = 10;
Hard for me to test its accuracy with no idea of what your table looks like, or what your question actually is.
This query should work, although it will scale very badly.
SELECT *
FROM (
SELECT
uo.*,
( SELECT COUNT(*) FROM tracks ui WHERE (ui.point, ui.id) >= (uo.point, uo.id) ) AS rank
FROM tracks uo WHERE id = 10
) t
ORDER BY t.rank DESC
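For the neighbouring items themselves, one possible approach (not taken from the answers above) is to skip the rank computation and fetch the nearest rows on either side of the track in (point, id) order, using two ORDER BY ... LIMIT queries. A sketch on Python's built-in sqlite3 follows; the neighbours helper, the title column and the sample data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, point INTEGER)")
# Hypothetical data. Ranked by (point, id) descending the order is
# ids 1, 3, 2, 4, 5, 6.
conn.executemany(
    "INSERT INTO tracks (id, title, point) VALUES (?, ?, ?)",
    [(1, "a", 50), (2, "b", 40), (3, "c", 40), (4, "d", 30), (5, "e", 20), (6, "f", 10)],
)

def neighbours(conn, track_id, count=2):
    """Return (better_ids, worse_ids), each ordered nearest-first.

    Assumes track_id exists in the table.
    """
    (point,) = conn.execute(
        "SELECT point FROM tracks WHERE id = ?", (track_id,)
    ).fetchone()
    params = {"p": point, "i": track_id, "n": count}
    # Rows ranked just above: larger (point, id), closest first.
    better = conn.execute(
        "SELECT id FROM tracks "
        "WHERE point > :p OR (point = :p AND id > :i) "
        "ORDER BY point ASC, id ASC LIMIT :n",
        params,
    ).fetchall()
    # Rows ranked just below: smaller (point, id), closest first.
    worse = conn.execute(
        "SELECT id FROM tracks "
        "WHERE point < :p OR (point = :p AND id < :i) "
        "ORDER BY point DESC, id DESC LIMIT :n",
        params,
    ).fetchall()
    return [r[0] for r in better], [r[0] for r in worse]

print(neighbours(conn, 4))
```

The (point, id) comparison is written out with OR so it does not depend on row-value syntax, and an index on (point, id) should let each query read only a few rows instead of counting the whole table.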