MySQL - Run distinct command on certain columns [duplicate] - mysql

This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 2 years ago.
I'm trying to run a distinct on four columns in the query below:
select
full_records.id,
full_records.domain_id,
subdomains.name as subdomain_name,
types.name as type_name,
changelog.content as content,
changelog.changed_on
from full_records
inner join subdomains on full_records.subdomain_id = subdomains.id
inner join types on full_records.type_id = types.id
inner join changelog on full_records.id = changelog.full_record_id
where
full_records.domain_id = 2
order by changelog.changed_on desc
and this returns the following:
I'm not sure how to go about altering the query so that it only returns the records that are unique across these four fields.
full_records.domain_id,
subdomains.name as subdomain_name,
types.name as type_name,
changelog.content as content
So if they were unique across those four fields, the rows 2, 3, 4 and 7 would not be in the results. It's basically to identify the latest change for a domain record. Any help would be really appreciated. Thanks.

One pretty simple method is row_number():
with cte as (
select fr.id, fr.domain_id, sd.name as subdomain_name,
t.name as type_name, cl.content, cl.changed_on
from full_records fr join
subdomains sd
on fr.subdomain_id = sd.id join
types t
on fr.type_id = t.id join
changelog cl
on fr.id = cl.full_record_id
where fr.domain_id = 2
)
select cte.*
from (select cte.*,
row_number() over (partition by domain_id, subdomain_name, type_name, content
order by changed_on desc
) as seqnum
from cte
) cte
where seqnum = 1;
Note that I added table aliases so the query is easier to write and to read.
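To see the row_number() idea in isolation, here is a minimal sketch. It assumes SQLite (via Python's sqlite3, version 3.25+ for window functions) as a stand-in for MySQL 8, and the `changes` table with its rows is hypothetical data that flattens the joined result from the question.

```python
import sqlite3

# Hypothetical, flattened stand-in for the joined result in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE changes (
    id INTEGER, domain_id INTEGER, subdomain_name TEXT,
    type_name TEXT, content TEXT, changed_on TEXT
);
INSERT INTO changes VALUES
    (1, 2, 'www',  'A',  '10.0.0.1', '2020-03-01'),
    (1, 2, 'www',  'A',  '10.0.0.1', '2020-01-01'),
    (2, 2, 'mail', 'MX', 'mx1',      '2020-02-01'),
    (2, 2, 'mail', 'MX', 'mx2',      '2020-02-15');
""")

# Number the rows inside each group of the four fields, newest first,
# and keep only the newest row of every group.
latest_rows = conn.execute("""
SELECT id, subdomain_name, type_name, content, changed_on
FROM (
    SELECT c.*,
           ROW_NUMBER() OVER (
               PARTITION BY domain_id, subdomain_name, type_name, content
               ORDER BY changed_on DESC
           ) AS seqnum
    FROM changes c
) ranked
WHERE seqnum = 1
ORDER BY changed_on DESC
""").fetchall()

for row in latest_rows:
    print(row)
```

The two identical 'www' rows collapse into the newer one, while 'mx1' and 'mx2' both survive because their content differs.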

Related

SQL: Select records based on comparison of two most recent associated records

Let's say we have a person table and a survey table. A survey is a set of attributes collected from a person at some point in time. Let's say survey has the columns address and marriage_status.
How do I select all persons whose address or marriage status has changed in the last survey?
Here's how I would write it if MySQL were able to magically interpret my intention:
SELECT *
FROM person
JOIN
(SELECT *
FROM survey
GROUP BY survey.person_id
ORDER BY survey.timestamp DESC
LIMIT 2 EACH) -- of course this part doesn't actually work. Trying to get last 2 records per person
surveys
ON surveys.person_id = person.id
WHERE surveys[0].address != surveys[1].address
OR surveys[0].marriage_status != surveys[1].marriage_status;
OR
SELECT *
FROM person
JOIN
(SELECT MOST RECENT survey FOR EACH person) latest_survey
ON latest_survey.person_id = person.id
JOIN
(SELECT SECOND MOST RECENT survey FOR EACH person) previous_survey
ON previous_survey.person_id = person.id
WHERE latest_survey.address != previous_survey.address
OR latest_survey.marriage_status != previous_survey.marriage_status;
This seems like a relatively straightforward query, but it's driving me crazy. I suspect I have tunnel vision and I'm not approaching this the right way.
EDIT: I am on MySQL v5. Based on the first couple answers, it seems like this might be the time to migrate to v8 (among other reasons)
So here's how I ended up doing it. It's a little long, but I think it's pretty straightforward? This felt amazing to get working.
(Note that underscores are used as prefixes in table aliases to help keep track of subquery depth)
SELECT person.*
FROM person
JOIN (
-- Join full survey data against each 'most recent' survey timestamp
SELECT s1.*
FROM survey s1
JOIN (
-- get most recent timestamp for each person
SELECT _s1.person_id, MAX(_s1.timestamp) timestamp
FROM survey _s1
GROUP BY person_id
) latest_surveys
ON latest_surveys.person_id = s1.person_id and latest_surveys.timestamp = s1.timestamp
) latest
ON latest.person_id = person.id
JOIN (
-- Join full survey data against each 'SECOND most recent' survey timestamp
select s2.*
from survey s2
JOIN (
-- to get SECOND most recent survey timestamp, do similar query, but exclude latest timestamp
SELECT _s2.person_id, MAX(_s2.timestamp) timestamp
FROM survey _s2
JOIN (
-- get most recent timestamp for each person (again)
SELECT __s2.person_id, MAX(__s2.timestamp) timestamp
FROM survey __s2
GROUP BY person_id
) _latest_surveys
-- Note the *NOT* equal here
ON _latest_surveys.person_id = _s2.person_id and _latest_surveys.timestamp != _s2.timestamp
GROUP BY _s2.person_id
) previous_surveys
ON previous_surveys.person_id = s2.person_id and previous_surveys.timestamp = s2.timestamp
) previous
ON previous.person_id = person.id
WHERE latest.address != previous.address
OR latest.marriage_status != previous.marriage_status;
Analytic functions make your question much more tractable. If you are not yet using MySQL 8+, then now would be a good time to upgrade. Assuming you are using MySQL 8+, we can try:
WITH cte AS (
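-- List columns explicitly: SELECT * over the join could yield duplicate column names, which a CTE rejects
SELECT p.id, s.address, s.marriage_status, ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY s.timestamp DESC) rn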
FROM person p
INNER JOIN survey s ON p.id = s.person_id
)
SELECT id
FROM cte
GROUP BY id
HAVING
MAX(CASE WHEN rn = 1 THEN address END) <> MAX(CASE WHEN rn = 2 THEN address END) OR
MAX(CASE WHEN rn = 1 THEN marriage_status END) <> MAX(CASE WHEN rn = 2 THEN marriage_status END);
The above query uses a pivot trick to isolate the latest, and second latest, addresses and marriage statuses for each person. It retains person id values for those whose latest and second latest addresses or marriage statuses are not identical.
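The pivot trick can be tried on a toy data set. The sketch below assumes SQLite (via Python's sqlite3, 3.25+ for window functions) in place of MySQL 8; the rows and the `name` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE survey (person_id INTEGER, address TEXT,
                     marriage_status TEXT, timestamp TEXT);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO survey VALUES
    (1, 'Oak St',  'single',  '2020-01-01'),
    (1, 'Elm St',  'single',  '2021-01-01'),  -- Ann moved
    (2, 'Pine St', 'married', '2020-01-01'),
    (2, 'Pine St', 'married', '2021-01-01'),  -- Bob is unchanged
    (3, 'Fir St',  'single',  '2021-01-01');  -- Cy has a single survey
""")

# rn = 1 is the latest survey per person, rn = 2 the one before it.
# MAX(CASE ...) pivots those two rows into one row per person.
changed_ids = [row[0] for row in conn.execute("""
WITH cte AS (
    SELECT p.id, s.address, s.marriage_status,
           ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY s.timestamp DESC) AS rn
    FROM person p
    JOIN survey s ON s.person_id = p.id
)
SELECT id
FROM cte
GROUP BY id
HAVING MAX(CASE WHEN rn = 1 THEN address END) <> MAX(CASE WHEN rn = 2 THEN address END)
    OR MAX(CASE WHEN rn = 1 THEN marriage_status END) <> MAX(CASE WHEN rn = 2 THEN marriage_status END)
""")]

print(changed_ids)
```

A person with only one survey has no rn = 2 row, so the comparison evaluates to NULL and that person is left out.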
This might be how you can achieve that:
SELECT *
FROM person
JOIN (
SELECT *,
MAX(survey_date) latest_survey,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(survey_date ORDER BY person_id, survey_date ASC),',',-2),',',1) previous_survey,
SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-1) curadd,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-2),',',1) prevadd,
SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-1) curms,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-2),',',1) prevms
FROM survey GROUP BY person_id
HAVING curadd != prevadd OR curms != prevms) A
ON person.id=A.person_id;
This uses GROUP_CONCAT and SUBSTRING_INDEX to combine each person's values into one string, split that string apart again, and compare the last two entries at the end. There are other ways to achieve this without these functions; your second example can probably be made to work, but I think it would be a very long query. Since you're not using MySQL 8+, this query is much shorter, but its performance is a concern, especially on a large table. Note that it also assumes the values contain no commas, since the comma is GROUP_CONCAT's default separator.
It is not stated, but I hope you have at least MySQL 8 or similar so that you can use Common Table Expressions, which can simplify a complex query.
The tricky part is getting survey records #1 and #2 for each user. I would do it this way (see the cte1 and cte2 definitions):
WITH
cte1 AS (
SELECT MAX(x1.id) AS id, x1.person_id
FROM survey x1
GROUP BY x1.person_id),
cte2 AS (
SELECT MAX(x2.id) AS id, x2.person_id
FROM survey x2
JOIN cte1 ON cte1.person_id = x2.person_id
AND cte1.id > x2.id
GROUP BY x2.person_id)
SELECT
p.*,
s1.address, s2.address address2,
s1.marriage_status, s2.marriage_status marriage_status2
FROM person AS p
JOIN (
cte1 JOIN survey s1 ON s1.id = cte1.id
) ON cte1.person_id = p.id
JOIN (
cte2 JOIN survey s2 ON s2.id = cte2.id
) ON cte2.person_id = p.id
WHERE
(s1.address <> s2.address)
OR (s1.marriage_status <> s2.marriage_status)
https://www.db-fiddle.com/f/hLwdHiZin4MkdUZ4aBz67H/2
Update: Thanks to Ian, I replaced MIN with MAX to get the most recent records
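As a rough illustration of the cte1/cte2 step on its own, the sketch below finds the latest and second-latest survey id per person. It assumes SQLite (via Python's sqlite3) instead of MySQL 8, hypothetical sample rows, CTE names `latest` and `previous` chosen for readability, and, like the answer, that a higher survey id means a more recent survey.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey (id INTEGER PRIMARY KEY, person_id INTEGER, address TEXT);
INSERT INTO survey VALUES
    (1, 10, 'Oak St'), (2, 10, 'Elm St'), (3, 10, 'Ash St'),
    (4, 20, 'Pine St');
""")

# latest:   highest survey id per person
# previous: highest survey id per person that is below the latest one
pairs = conn.execute("""
WITH
latest AS (
    SELECT person_id, MAX(id) AS id
    FROM survey
    GROUP BY person_id
),
previous AS (
    SELECT s.person_id, MAX(s.id) AS id
    FROM survey s
    JOIN latest ON latest.person_id = s.person_id AND latest.id > s.id
    GROUP BY s.person_id
)
SELECT latest.person_id, latest.id, previous.id
FROM latest
JOIN previous ON previous.person_id = latest.person_id
""").fetchall()

print(pairs)
```

Person 20 has only one survey, so there is no previous row to pair with and the inner join drops that person.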

Sorting and grouping a join result? [duplicate]

This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 4 years ago.
I have the following query
SELECT DISTINCT XCS_TASK.WORKFLOW_ID,
XCS_TASK.COMPLETED_BY,
XCS_WORKFLOW.OBJECT_KEY,
XCS_WORKFLOW.OBJECT_TYPE_ID,
XCS_WORKFLOW.END_DATE_TIME,
XCS_WORKFLOW.START_DATE_TIME
FROM `XCS_TASK`
inner JOIN XCS_WORKFLOW ON
XCS_TASK.WORKFLOW_ID = XCS_WORKFLOW.WORKFLOW_ID
WHERE TASK_TYPE_ID = 124
GROUP BY XCS_WORKFLOW.OBJECT_KEY
ORDER BY XCS_WORKFLOW.START_DATE_TIME DESC
The problem is that I want to get the latest record for each OBJECT_KEY. I know the above query is wrong because it groups first and then sorts the grouped result. I looked into using the MAX(DATE) function but I couldn't get it to work in this scenario. Any help or pointers would be appreciated.
You could try joining against the aggregated result of the max date (e.g. START_DATE_TIME) per OBJECT_KEY:
SELECT
XCS_TASK.WORKFLOW_ID,
XCS_TASK.COMPLETED_BY,
XCS_WORKFLOW.OBJECT_KEY,
XCS_WORKFLOW.OBJECT_TYPE_ID,
XCS_WORKFLOW.END_DATE_TIME,
XCS_WORKFLOW.START_DATE_TIME
FROM `XCS_TASK`
INNER JOIN XCS_WORKFLOW ON XCS_TASK.WORKFLOW_ID = XCS_WORKFLOW.WORKFLOW_ID
INNER JOIN (
SELECT
XCS_WORKFLOW.OBJECT_KEY,
MAX( XCS_WORKFLOW.START_DATE_TIME ) max_date
FROM XCS_WORKFLOW
GROUP BY OBJECT_KEY
) t ON t.OBJECT_KEY = XCS_WORKFLOW.OBJECT_KEY
AND XCS_WORKFLOW.START_DATE_TIME = t.max_date
WHERE TASK_TYPE_ID = 124
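A small sketch of this join against the per-key maximum, assuming SQLite (via Python's sqlite3) as a stand-in for MySQL. The shortened table names `workflow` and `task` and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workflow (workflow_id INTEGER PRIMARY KEY,
                       object_key TEXT, start_date_time TEXT);
CREATE TABLE task (task_id INTEGER PRIMARY KEY, workflow_id INTEGER,
                   task_type_id INTEGER, completed_by TEXT);
INSERT INTO workflow VALUES
    (1, 'A', '2020-01-01'), (2, 'A', '2020-06-01'), (3, 'B', '2020-03-01');
INSERT INTO task VALUES
    (1, 1, 124, 'ann'), (2, 2, 124, 'bob'), (3, 3, 124, 'cy');
""")

# The derived table m holds the newest start date per object_key; joining on
# both the key and the date keeps only the workflow row that has that date.
latest_per_key = conn.execute("""
SELECT w.object_key, w.start_date_time, t.completed_by
FROM task t
JOIN workflow w ON w.workflow_id = t.workflow_id
JOIN (
    SELECT object_key, MAX(start_date_time) AS max_date
    FROM workflow
    GROUP BY object_key
) m ON m.object_key = w.object_key
   AND m.max_date = w.start_date_time
WHERE t.task_type_id = 124
ORDER BY w.object_key
""").fetchall()

print(latest_per_key)
```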

Getting First Order Date for Each Customer [duplicate]

This question already has answers here:
SQL select only rows with max value on a column [duplicate]
(27 answers)
Closed 4 years ago.
this one is driving me to drink so I would love some help.
I've got a table with:
act_Address, act_OrderID, act_Date
I'm trying to get the first act_Date for each address we shipped to.
Here's what I've tried but it's been running now for well over an hour so I'm thinking this isn't going to work...
SELECT c.act_Address,
(SELECT o.act_OrderID
FROM tbl_Activity o
WHERE c.act_Address = o.act_Address
ORDER BY o.act_Date
LIMIT 1) AS order_id,
(SELECT d.act_Date
FROM tbl_Activity d
WHERE c.act_Address = d.act_Address
ORDER BY d.act_Date
LIMIT 1) as order_date
FROM tbl_Activity c
I've got to be doing something very wrong, doesn't seem like getting the first date for an address would be that hard, but I'm not that smart.
Your query uses two correlated subqueries to get act_Date and act_OrderID values. Each subquery is executed once for every record of tbl_Activity.
You can use:
SELECT act_Address, MIN(act_Date) AS first_Date
FROM tbl_Activity
GROUP BY act_Address
to get the first date per address. Then you can use the above query as a derived table and join back to the original table to get the rest of the fields:
SELECT t1.act_Address, t1.act_OrderID, t1.act_date
FROM tbl_Activity AS t1
JOIN (
SELECT act_Address, MIN(act_Date) AS first_Date
FROM tbl_Activity
GROUP BY act_Address
) AS t2 ON t1.act_Address = t2.act_Address AND t1.act_Date = t2.first_Date
I also propose placing a composite index on (act_Address, act_Date).
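The derived-table join and the suggested index can be sketched together as follows. This assumes SQLite (via Python's sqlite3) rather than MySQL; the index name `idx_activity_address_date` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Activity (act_Address TEXT, act_OrderID INTEGER, act_Date TEXT);
-- Composite index covering the grouping column and the date (hypothetical name).
CREATE INDEX idx_activity_address_date ON tbl_Activity (act_Address, act_Date);
INSERT INTO tbl_Activity VALUES
    ('1 Main St', 101, '2019-05-01'),
    ('1 Main St', 102, '2019-07-01'),
    ('9 High St', 103, '2019-06-01'),
    ('9 High St', 104, '2019-06-01');  -- same earliest date as order 103
""")

# Find the earliest date per address once, then join back for the other columns.
first_orders = conn.execute("""
SELECT t1.act_Address, t1.act_OrderID, t1.act_Date
FROM tbl_Activity AS t1
JOIN (
    SELECT act_Address, MIN(act_Date) AS first_Date
    FROM tbl_Activity
    GROUP BY act_Address
) AS t2 ON t1.act_Address = t2.act_Address AND t1.act_Date = t2.first_Date
ORDER BY t1.act_Address, t1.act_OrderID
""").fetchall()

for row in first_orders:
    print(row)
```

When two orders share the earliest date for an address, the join returns both of them, so a tie-breaker is needed if exactly one row per address is required.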
You can do this by GROUP BY in a subselect:
SELECT a.act_Address, a.act_OrderID, a.act_Date
FROM (
SELECT a2.act_Address addr, MIN(a2.act_Date) mindate FROM tbl_Activity a2
GROUP BY a2.act_Address
) g, tbl_Activity a
WHERE a.act_Address = g.addr AND a.act_Date = g.mindate;

Mysql query with group by and order by involving several tables

I'm having a problem with a query: I don't get all the records and I don't know why.
This is the query
SELECT `ebspma_paad_ebspma`.`semana_dias`.`dia`,`ebspma_paad_ebspma`.`req_material_sala`.`sala`, `ebspma_paad_ebspma`.`req_material_tempo`.`inicio`, `ebspma_paad_ebspma`.`sala_ocupacao`.`id_ocup`, `ebspma_paad_ebspma`.`turmas`.`turma`
FROM `ebspma_paad_ebspma`.`sala_ocupacao`
INNER JOIN `ebspma_paad_ebspma`.`semana_dias`
ON (`sala_ocupacao`.`id_dia` = `semana_dias`.`id_dia`)
INNER JOIN `ebspma_paad_ebspma`.`req_material_sala`
ON (`sala_ocupacao`.`id_sala` = `req_material_sala`.`idsala`)
LEFT JOIN `ebspma_paad_ebspma`.`req_material_tempo`
ON (`sala_ocupacao`.`id_tempo` = `req_material_tempo`.`idtempo`)
LEFT JOIN `ebspma_paad_ebspma`.`turmas`
ON (`sala_ocupacao`.`id_turma` = `turmas`.`id_turma`)
WHERE `ebspma_paad_ebspma`.`sala_ocupacao`.`id_turma` = '$turma'
GROUP BY `ebspma_paad_ebspma`.`sala_ocupacao`.`id_dia` , `ebspma_paad_ebspma`.`req_material_tempo`.`inicio` ASC
Running this query I get almost all the records, but this is a school timetable, and when a class is divided into 2 groups there are two classrooms for that class. With this query I only get one group.
For example, a class starts at 1 PM in two classrooms (27 and 31). The query should show that at 1 PM class X is in classrooms 27 and 31, but I only get the first one.
Image to check http://postimg.org/image/u24r35fkz/
And my database image http://postimg.org/image/hyvpb1qz1/ce7a7320/
So what's wrong with my query?
Thanks
UPDATE 1
I have simplified my query to
SELECT t2.`dia` , t3.`sala` , t4.`inicio` , t1.`id_ocup` , t5.`turma`
FROM `ebspma_paad_ebspma`.`sala_ocupacao` AS t1
INNER JOIN `ebspma_paad_ebspma`.`semana_dias` AS t2 ON ( t1.`id_dia` = t2.`id_dia` )
INNER JOIN `ebspma_paad_ebspma`.`req_material_sala` AS t3 ON ( t1.`id_sala` = t3.`idsala` )
LEFT JOIN `ebspma_paad_ebspma`.`req_material_tempo` AS t4 ON ( t1.`id_tempo` = t4.`idtempo` )
LEFT JOIN `ebspma_paad_ebspma`.`turmas` AS t5 ON ( t1.`id_turma` = t5.`id_turma` )
WHERE t1.`id_turma` =12
GROUP BY t1.`id_dia` , t3.`idsala` , t4.`inicio`
Now I can see all the classes, but not in the right order; the order should be given by t4.inicio and by day (id_dia).
You are not grouping by sala, so MySQL will behave badly and give you an arbitrary row that fits the other requirements. Better-behaved database engines (and MySQL itself when ONLY_FULL_GROUP_BY is enabled) would give you an error saying you haven't aggregated or grouped all result columns.
If you add sala to GROUP BY you should see the difference.
For the ordering: you're not asking the database to ORDER BY anything, so the rows will be in whatever order they happen to come out. You probably want to add ORDER BY t4.inicio, t1.id_dia to handle that.
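The effect of the missing GROUP BY column can be reproduced on a tiny table. The sketch assumes SQLite (via Python's sqlite3), which, like MySQL without ONLY_FULL_GROUP_BY, accepts ungrouped columns in the select list; the single `ocupacao` table and its rows are a hypothetical simplification of the joined timetable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ocupacao (id_ocup INTEGER PRIMARY KEY, id_dia INTEGER,
                       inicio TEXT, sala TEXT);
INSERT INTO ocupacao VALUES
    (1, 1, '13:00', '27'),
    (2, 1, '13:00', '31'),  -- same day and start time, second classroom
    (3, 2, '09:00', '12'),
    (4, 1, '08:00', '5');
""")

# Grouping only by day and start time merges the two 13:00 rows into one,
# and the sala shown for that group is whichever row the engine picks.
coarse = conn.execute("""
SELECT id_dia, inicio, sala
FROM ocupacao
GROUP BY id_dia, inicio
""").fetchall()

# Adding sala to GROUP BY keeps both classrooms; ORDER BY fixes the row order.
fine = conn.execute("""
SELECT id_dia, inicio, sala
FROM ocupacao
GROUP BY id_dia, inicio, sala
ORDER BY inicio, id_dia, sala
""").fetchall()

print(len(coarse), "rows when grouping by day and start time only")
for row in fine:
    print(row)
```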

MySQL select distinct records [duplicate]

This question already has answers here:
MySQL query, MAX() + GROUP BY
(7 answers)
Closed 9 years ago.
I'm trying to write a query that will pull out the "best" record from a list of values:
SELECT s.swimmerName, r.resultTimeText, r.resultAgeGroup, r.resultEventID, v.venueName
FROM tblResults r
JOIN tblEvents e ON e.eventID = r.resultEventID
JOIN tblSwimmers s ON r.resultSwimmerID = s.swimmerID
JOIN tblVenues v ON e.resultVenueID = v.venueID
WHERE s.swimmerGender = %s
AND r.resultStroke = %s
GROUP BY s.swimmerName
This selects all of my records but people are listed twice with different times (a consequence of the DISTINCT I know). What would be the best way to select the best time for each person?
You can use MySQL's quirky grouping to help you:
SELECT * FROM (
SELECT s.swimmerName, r.resultTimeText, r.resultAgeGroup, r.resultEventID, v.venueName
FROM tblResults r
JOIN tblEvents e ON e.eventID = r.resultEventID
JOIN tblSwimmers s ON r.resultSwimmerID = s.swimmerID
JOIN tblVenues v ON e.resultVenueID = v.venueID
WHERE s.swimmerGender = %s
AND r.resultStroke = %s
ORDER BY 2) x
GROUP BY swimmerName
ORDER BY 2
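This works by first ordering the data fastest to slowest, then grouping by name. In MySQL (only), grouping while leaving some non-aggregated columns out of the GROUP BY has the effect of keeping only the first row encountered for each unique combination of grouped-by columns. Be aware that this relies on undocumented behaviour: the MySQL manual describes the values chosen for such columns as nondeterministic, and the query is rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5).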
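For comparison, here is a sketch of a different technique that does not depend on the grouping quirk: compute MIN per swimmer and join back. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, a hypothetical flattened `results` table, and time strings that are zero-padded to a fixed width so that text comparison matches the real ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (swimmer TEXT, time_text TEXT, venue TEXT);
INSERT INTO results VALUES
    ('Ann', '00:31.20', 'Leeds'),
    ('Ann', '00:29.85', 'York'),
    ('Bob', '00:33.10', 'Leeds');
""")

# best holds the fastest time per swimmer; the join pulls in the rest of
# the row (here the venue) that belongs to exactly that time.
best_times = conn.execute("""
SELECT r.swimmer, r.time_text, r.venue
FROM results r
JOIN (
    SELECT swimmer, MIN(time_text) AS best_time
    FROM results
    GROUP BY swimmer
) best ON best.swimmer = r.swimmer AND best.best_time = r.time_text
ORDER BY r.time_text
""").fetchall()

for row in best_times:
    print(row)
```

Because every selected column comes from a row that actually matched the minimum, the result is deterministic regardless of the server's sql_mode.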