Finding unique users from linked values - mysql

I have values in my Table of this form.
id | val1 | val2
--------------------
1 | e1 | m1
2 | e1 | m2
3 | e2 | m2
4 | e3 | m1
5 | e4 | m3
6 | e5 | m3
7 | e5 | m4
8 | e4 | m5
From this, I have to recover the unique users as shown below and give each of them a unique id.
User1 -> (val1 : e1, e2, e3 | val2: m1, m2)
e1 <-> m1, e1 <-> m2, m1 <-> e3, e2 <-> m2 ( <-> means linked).
e1 is connected to m1.
e1 is connected to m2.
m2 is connected to e2.
So e1,m1 are connected to e2.
Similarly, we find e1, e2, e3, m1, m2 all are linked. We need to identify these chains.
User2 -> (val1 : e4, e5 | val2: m3, m4, m5)
I have written two queries, one grouping by val1 and one grouping by val2, and I join their results in code (Java).
I want to do this directly in a MySQL/BigQuery query, as we are building some reports on this.
Is this possible in a single query? Please help.
Thank you.
Update :
Desired output -
[
{
id : user1,
val1 : [e1, e2, e3],
val2 : [m1, m2]
},
{
id : user2,
val1 : [e4, e5],
val2 : [m3, m4, m5]
}
]
or
id | val1 | val2 | UUID
------------------------
1 | e1 | m1 | u1
2 | e1 | m2 | u1
3 | e2 | m2 | u1
4 | e3 | m1 | u1
5 | e4 | m3 | u2
6 | e5 | m3 | u2
7 | e5 | m4 | u2
8 | e4 | m5 | u2
To put it simply, assume the values of val1 and val2 are nodes, and two nodes are connected if they appear in the same row.
The rows of the table then form graphs (user1, user2), and we need to identify these graphs.
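For reference, the grouping described here is a connected-components problem. Below is a minimal Python sketch of it using a union-find structure, run on the sample rows from the question. It works outside SQL and is only meant to pin down the expected result; the names rows, find, union and result are illustrative, and the ("val1", ...) / ("val2", ...) tagging is an assumption added so that equal strings in the two columns are not confused.

```python
from collections import defaultdict

# Sample rows from the question: (id, val1, val2).
rows = [
    (1, "e1", "m1"), (2, "e1", "m2"), (3, "e2", "m2"), (4, "e3", "m1"),
    (5, "e4", "m3"), (6, "e5", "m3"), (7, "e5", "m4"), (8, "e4", "m5"),
]

parent = {}

def find(node):
    """Return the representative of node's set, shortening the path as we go."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving
        node = parent[node]
    return node

def union(a, b):
    """Merge the sets that contain a and b."""
    parent[find(a)] = find(b)

# Every row links its val1 node to its val2 node.
for _, val1, val2 in rows:
    union(("val1", val1), ("val2", val2))

# Collect the val1 and val2 values of each connected component.
users = defaultdict(lambda: {"val1": set(), "val2": set()})
for _, val1, val2 in rows:
    root = find(("val1", val1))
    users[root]["val1"].add(val1)
    users[root]["val2"].add(val2)

# Number the components in order of first appearance in the table.
result = [
    {"id": f"user{n}", "val1": sorted(u["val1"]), "val2": sorted(u["val2"])}
    for n, u in enumerate(users.values(), start=1)
]
print(result)
```

On the sample data this yields user1 with val1 e1, e2, e3 and val2 m1, m2, and user2 with val1 e4, e5 and val2 m3, m4, m5.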

I wanted to jump in with an option for solving your task in pure BigQuery (Standard SQL).
Pre-requisites / assumptions: the source data is in sandbox.temp.id1_id2_pairs.
Replace this with your own table or, if you want to test with the dummy data from your question, create this table as below (replacing sandbox.temp with your own project.dataset).
Make sure you set the respective destination table.
Note: you can find all respective Queries (as text) at the bottom of this answer, but for now I am illustrating my answer with screenshots - so all is presented - query, result and used options
So, there will be three steps:
Step 1 - Initialization
Here we do the initial grouping of id1 based on connections through id2.
This creates the list of all id1 values with their respective connections, based on a simple one-level connection through id2.
Output table is sandbox.temp.groups
Step 2 - Grouping Iterations
In each iteration we enrich the grouping based on the groups already established.
The source of the query is the output table of the previous step (sandbox.temp.groups) and the destination is the same table (sandbox.temp.groups) with Overwrite.
We continue iterating until the count of groups found is the same as in the previous iteration.
Note: you can keep two BigQuery Web UI tabs open and, without changing any code, run Grouping and then Check again and again until the iterations converge.
(For the specific data I used in the pre-requisites section I had three iterations: the first produced 5 users, the second produced 3 users and the third again produced 3 users, which indicated that we were done iterating.)
Of course, in a real-life case the number of iterations could be more than three, so we need some sort of automation (see the respective section at the bottom of the answer).
Step 3 – Final Grouping
Once the id1 grouping is complete, we can add the final grouping for id2.
The final result is now in the sandbox.temp.users table.
Used Queries (do not forget to set respective destination tables and overwrites when needed as per above described logic and screenshots):
Pre-requisites:
#standardSQL
SELECT 1 id, 'e1' id1, 'm1' id2 UNION ALL
SELECT 2, 'e1', 'm2' UNION ALL
SELECT 3, 'e2', 'm2' UNION ALL
SELECT 4, 'e3', 'm1' UNION ALL
SELECT 5, 'e4', 'm3' UNION ALL
SELECT 6, 'e5', 'm3' UNION ALL
SELECT 7, 'e5', 'm4' UNION ALL
SELECT 8, 'e4', 'm5' UNION ALL
SELECT 9, 'e6', 'm6' UNION ALL
SELECT 9, 'e7', 'm7' UNION ALL
SELECT 9, 'e2', 'm6' UNION ALL
SELECT 888, 'e4', 'm55'
Step 1
#standardSQL
WITH `yourTable` AS (select * from `sandbox.temp.id1_id2_pairs`
), x1 AS (SELECT id1, STRING_AGG(id2) id2s FROM `yourTable` GROUP BY id1
), x2 AS (SELECT id2, STRING_AGG(id1) id1s FROM `yourTable` GROUP BY id2
), x3 AS (
SELECT id, (SELECT STRING_AGG(i ORDER BY i) FROM (
SELECT DISTINCT i FROM UNNEST(SPLIT(id1s)) i)) grp
FROM (
SELECT x1.id1 id, STRING_AGG((id1s)) id1s FROM x1 CROSS JOIN x2
WHERE EXISTS (SELECT y FROM UNNEST(SPLIT(id1s)) y WHERE x1.id1 = y)
GROUP BY id1)
)
SELECT * FROM x3
Step 2 - Grouping
#standardSQL
WITH x3 AS (select * from `sandbox.temp.groups`)
SELECT id, (SELECT STRING_AGG(i ORDER BY i) FROM (
SELECT DISTINCT i FROM UNNEST(SPLIT(grp)) i)) grp
FROM (
SELECT a.id, STRING_AGG(b.grp) grp FROM x3 a CROSS JOIN x3 b
WHERE EXISTS (SELECT y FROM UNNEST(SPLIT(b.grp)) y WHERE a.id = y)
GROUP BY a.id )
Step 2 - Check
#standardSQL
SELECT COUNT(DISTINCT grp) users FROM `sandbox.temp.groups`
Step 3
#standardSQL
WITH `yourTable` AS (select * from `sandbox.temp.id1_id2_pairs`
), x1 AS (SELECT id1, STRING_AGG(id2) id2s FROM `yourTable` GROUP BY id1
), x3 as (select * from `sandbox.temp.groups`
), f AS (SELECT DISTINCT grp FROM x3 ORDER BY grp
)
SELECT ROW_NUMBER() OVER() id, grp id1,
(SELECT STRING_AGG(i ORDER BY i) FROM (SELECT DISTINCT i FROM UNNEST(SPLIT(id2)) i)) id2
FROM (
SELECT grp, STRING_AGG(id2s) id2 FROM f
CROSS JOIN x1 WHERE EXISTS (SELECT y FROM UNNEST(SPLIT(f.grp)) y WHERE id1 = y)
GROUP BY grp)
Automation:
Of course, the above process can be executed manually if the iterations converge fast, so that you end up with 10-20 runs. In more realistic cases you can easily automate this with any client of your choice.
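A minimal Python sketch of such an automation loop follows. The function iterate_until_stable and its two callables are hypothetical: in a real setup run_grouping would execute the "Step 2 - Grouping" query with the destination table set to overwrite, and count_groups would execute the "Step 2 - Check" query. So that the sketch runs on its own, both are replaced here by in-memory stand-ins, and the starting groups are the Step 1 output worked out by hand from the pre-requisites data.

```python
def iterate_until_stable(run_grouping, count_groups, max_iterations=100):
    """Repeat the grouping step until the group count stops changing.

    Both callables are hypothetical hooks for whatever client runs the queries.
    Returns (number of iterations, final group count).
    """
    previous = count_groups()
    for iteration in range(1, max_iterations + 1):
        run_grouping()
        current = count_groups()
        if current == previous:
            return iteration, current
        previous = current
    raise RuntimeError("grouping did not converge")

# In-memory stand-in for sandbox.temp.groups after Step 1 (id -> grp).
groups = {
    "e1": {"e1", "e2", "e3"}, "e2": {"e1", "e2", "e6"}, "e3": {"e1", "e3"},
    "e4": {"e4", "e5"}, "e5": {"e4", "e5"}, "e6": {"e2", "e6"}, "e7": {"e7"},
}
history = []

def run_grouping():
    """Mimic 'Step 2 - Grouping': merge every group that contains the id."""
    merged = {
        a: set().union(*(grp for grp in groups.values() if a in grp))
        for a in groups
    }
    groups.update(merged)

def count_groups():
    """Mimic 'Step 2 - Check': count the distinct groups."""
    count = len({frozenset(grp) for grp in groups.values()})
    history.append(count)
    return count

iterations, users = iterate_until_stable(run_grouping, count_groups)
print(iterations, users, history)
```

With these stand-ins the loop stops after three iterations with 3 users, and the counts seen after each grouping run are 5, 3 and 3.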

Related

Group overlapping ranges of data in MySQL

Is there an easy way avoiding the usage of cursors to convert this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 3 |
+-------+------+-------+
| X | 2 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 12 |
+-------+------+-------+
| Y | 12 | 13 |
+-------+------+-------+
Into this:
+-------+------+-------+
| Group | From | Until |
+-------+------+-------+
| X | 1 | 4 |
+-------+------+-------+
| Y | 5 | 7 |
+-------+------+-------+
| X | 8 | 10 |
+-------+------+-------+
| Y | 11 | 13 |
+-------+------+-------+
So far I've tried to assign an ID to each row and GROUP BY that ID, but I can't get any closer without using cursors.
SELECT `Group`, `From`, `Until`
FROM ( SELECT `Group`, `From`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`From` > t2.`From`
AND t1.`From` <= t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t3
JOIN ( SELECT `Group`, `Until`, ROW_NUMBER() OVER (PARTITION BY `Group` ORDER BY `From`) rn
FROM test t1
WHERE NOT EXISTS ( SELECT NULL
FROM test t2
WHERE t1.`Until` >= t2.`From`
AND t1.`Until` < t2.`Until`
AND t1.`Group` = t2.`Group` ) ) t4 USING (`Group`, rn)
fiddle
This should work for any type of overlap (partially overlapping, adjacent, fully contained).
It will not work if From and/or Until is NULL.
Could you add an explanation in English? – ysth
The 1st subquery finds the starts of the merged ranges (see the fiddle, where it is executed separately): it looks for From values in a group that are not in the middle or at the end of any other range (equal start points are allowed).
The 2nd subquery does the same for the Until values of the merged ranges.
Both additionally number the values they find in ascending order.
The outer query simply joins each range start with its end into one row.
If you are using MySQL version 8+, you can use row_number to get the desired result:
Demo
SELECT MIN(`FROM`) START,
MAX(`UNTIL`) END,
`GROUP` FROM (
SELECT A.*,
ROW_NUMBER() OVER(ORDER BY `FROM`) RN_FROM,
ROW_NUMBER() OVER(PARTITION BY `GROUP` ORDER BY `UNTIL`) RN_UNTIL
FROM Table_lag A) X
GROUP BY `GROUP`, (RN_FROM - RN_UNTIL)
ORDER BY START;
You can do this with window functions only, using a gaps-and-islands technique.
The idea is to build groups of consecutive records that have the same group and overlapping ranges, using lag() and a window sum(). You can then aggregate the groups:
select grp, min(c_from) c_from, max(c_until) c_until
from (
select
t.*,
sum(lag_c_until < c_from) over(partition by grp order by c_from) mygrp
from (
select
t.*,
lag(c_until, 1, c_until) over(partition by grp order by c_from) lag_c_until
from mytable t
) t
) t
group by grp, mygrp
The column names you chose conflict with SQL keywords (group, from), so I renamed them to grp, c_from and c_until.
Demo on DB Fiddle - with credits to ysth for creating the fiddle in the first place:
grp | c_from | c_until
:-- | -----: | ------:
X | 1 | 4
Y | 5 | 7
X | 8 | 10
Y | 11 | 13
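The same gaps-and-islands idea can be sketched outside SQL. The Python below is an illustration only, not a translation of the query: it sorts the rows by group and start, and opens a new island whenever a start lies beyond the largest end seen so far in the current island. Tracking that running maximum instead of only the previous row's end is a choice made here so that fully contained ranges stay in one island; the name merge_ranges is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (grp, c_from, c_until).
rows = [
    ("X", 1, 3), ("X", 2, 4), ("Y", 5, 7),
    ("X", 8, 10), ("Y", 11, 12), ("Y", 12, 13),
]

def merge_ranges(rows):
    """Merge overlapping or touching ranges within each group."""
    merged = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for grp, members in groupby(ordered, key=itemgetter(0)):
        start = end = None
        for _, c_from, c_until in members:
            if start is None:
                start, end = c_from, c_until
            elif c_from > end:
                # Gap found: close the current island and start a new one.
                merged.append((grp, start, end))
                start, end = c_from, c_until
            else:
                # Overlap or shared endpoint: extend the current island.
                end = max(end, c_until)
        merged.append((grp, start, end))
    return sorted(merged, key=itemgetter(1))

print(merge_ranges(rows))
```

For the sample rows this gives the same four ranges as the table above.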
I would use a recursive CTE for this:
with recursive intervals (`Group`, `From`, `Until`) as (
select distinct t1.Group, t1.From, t1.Until
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.Group=t2.Group
and t1.From between t2.From and t2.Until+1
and (t1.From,t1.Until) <> (t2.From,t2.Until)
)
union all
select t1.Group, t1.From, t2.Until
from intervals t1
join Table_lag t2
on t2.Group=t1.Group
and t2.From between t1.From and t1.Until+1
and t2.Until > t1.Until
)
select `Group`, `From`, max(`Until`) as Until
from intervals
group by `Group`, `From`
order by `From`, `Group`;
The anchor expression (select .. where not exists (...)) finds every group & from that won't combine with some earlier from (so it has one row for each row in our eventual output).
Then the recursive query adds rows for merged intervals for each of our rows.
Then just group by group and from (those are awful column names) to get the biggest interval for each starting group/from.
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9efa508504b80e44b73c952572394b76
Alternatively, you can do it with a straightforward set of joins and subqueries, with no CTE or window functions needed:
select
interval_start_range.grp,
interval_start_range.start,
max(merged.finish) finish
from (
select
interval_start.grp,
interval_start.start,
min(later_interval_start.start) next_start
from (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) interval_start
left join (
select distinct t1.grp, t1.start, t1.finish
from Table_lag t1
where not exists (
select 1
from Table_lag t2
where t1.grp=t2.grp
and t1.start between t2.start and t2.finish+1
and (t1.start,t1.finish) <> (t2.start,t2.finish)
)
) later_interval_start
on interval_start.grp=later_interval_start.grp
and interval_start.start < later_interval_start.start
group by interval_start.grp, interval_start.start
) as interval_start_range
join Table_lag merged
on merged.grp=interval_start_range.grp
and merged.start >= interval_start_range.start
and (interval_start_range.next_start is null or merged.start < interval_start_range.next_start)
group by interval_start_range.grp, interval_start_range.start
order by interval_start_range.start, interval_start_range.grp
(I have renamed the columns here to not need backticks.)
Here one select gets the starts of all the intervals we will report; it is joined to another similar select (you could use a CTE to avoid the redundancy) to find the next reportable interval start for the same group, if there is one. That's wrapped in a subquery that yields the group, the start value, and the start value of the following reportable interval. Then the outer query just needs to join all the records that start within that range and pick the maximum ending value.
https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=151cc933489c299f7beefa99e1959549

How to show count of records

I have the following data
ReasonId  Team  Division  Location
2         A               L1
3         B     D1        L2
2         A     D2        L1
2         A     D3        L3
I want to show the count grouped by ReasonId for each team, division and location. Division can be null in some rows.
I am trying something like this,
SELECT
COUNT(*) AS TotalRequests, Reason, team
FROM
reports
GROUP BY Reason , team
UNION SELECT
COUNT(*) AS TotalRequests, Reason, location
FROM
reports
GROUP BY Reason , location
UNION SELECT
COUNT(*) AS TotalRequests, Reason, division
FROM
reports
WHERE
ISNULL(division) = 0
GROUP BY Reason , division
;
The output I am getting for the above is,
TotalRequests  Reason  team
1              2
3              2       A
1              3       B
1              3       D1
1              2       D2
1              2       D3
2              2       L1
1              3       L2
1              2       L3
Is it possible to get an output that looks like this,
ReasonId  Team  TotalByTeam  Location  TotalByLocation  Division  TotalByDivision
2         A     3            L1        2                          0
2         A     3            L3        1                D2        1
2         A     3            L3        1                D3        1
3         B     1            L2        1                D1        1
I am using MySQL 8.0.17. Here's a sample schema (also available as a dbfiddle):
CREATE TABLE `reports` (
`Reason` int(11) DEFAULT NULL,
`Team` text,
`Division` text,
`Location` text
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO reports (Reason,Team,Division,Location) values (2, 'A',null,'L1');
INSERT INTO reports (Reason,Team,Division,Location) values (3, 'B','D1','L2');
INSERT INTO reports (Reason,Team,Division,Location) values (2, 'A','D2','L1');
INSERT INTO reports (Reason,Team,Division,Location) values (2, 'A','D3','L3');
You should use analytic functions COUNT(...) OVER (...) for this. They are available in MySQL since version 8.0.
select
reasonid,
team,
count(team) over (partition by team) as total_by_team,
location,
count(location) over (partition by location) as total_by_location,
division,
count(division) over (partition by division) as total_by_division
from reports;
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=79891554331e8222041ec34eea3fc4ee
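To make the semantics of COUNT(col) OVER (PARTITION BY col) concrete, here is a small Python sketch that mimics it on the sample data from the question. It is an illustration, not a replacement for the query; the helper add_partition_counts and the dict keys are hypothetical names. As in SQL, COUNT(col) skips NULLs, so a row whose division is NULL gets a count of 0.

```python
from collections import Counter

# Sample data from the question; None stands for SQL NULL.
rows = [
    {"reason": 2, "team": "A", "division": None, "location": "L1"},
    {"reason": 3, "team": "B", "division": "D1", "location": "L2"},
    {"reason": 2, "team": "A", "division": "D2", "location": "L1"},
    {"reason": 2, "team": "A", "division": "D3", "location": "L3"},
]

def add_partition_counts(rows, columns):
    """Attach to every row the number of non-NULL values equal to its own."""
    out = [dict(row) for row in rows]
    for col in columns:
        # Count each non-NULL value once per occurrence, like COUNT(col).
        counts = Counter(row[col] for row in rows if row[col] is not None)
        for row in out:
            # A Counter returns 0 for missing keys, which covers the NULL rows.
            row[f"total_by_{col}"] = counts[row[col]]
    return out

result = add_partition_counts(rows, ["team", "location", "division"])
for row in result:
    print(row)
```

Every input row is kept, which is the main difference from GROUP BY: the counts are attached to the rows instead of collapsing them.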
Try the script below:
Demo Here
SELECT A.ReasonId,
A.Team,
(SELECT COUNT(*) FROM your_table B WHERE B.ReasonId = A.ReasonId AND B.Team = A.Team) TotalByTeam,
A.Division,
(SELECT COUNT(*) FROM your_table B WHERE B.ReasonId = A.ReasonId AND B.Division = A.Division) TotalByDivision,
A.Location,
(SELECT COUNT(*) FROM your_table B WHERE B.ReasonId = A.ReasonId AND B.Location = A.Location) TotalByLocation
FROM your_table A

Select duplicates while concatenating every one except the first

I am trying to write a query that selects all of the numbers in my table, but for numbers that have duplicates I want to append something to the end that marks the value as a duplicate. However, I am not sure how to do this.
Here is an example of the table
TableA
ID Number
1 1
2 2
3 2
4 3
5 4
SELECT statement output would be like this.
Number
1
2
2-dup
3
4
Any insight on this would be appreciated.
If your MySQL version doesn't support window functions, you can write a correlated subquery that produces a row number, then use CASE WHEN to check for rn > 1 and mark the row as a duplicate.
create table T (ID int, Number int);
INSERT INTO T VALUES (1,1);
INSERT INTO T VALUES (2,2);
INSERT INTO T VALUES (3,2);
INSERT INTO T VALUES (4,3);
INSERT INTO T VALUES (5,4);
Query 1:
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,(SELECT COUNT(*)
FROM T tt
where tt.Number = t1.Number and tt.id <= t1.id
) rn
FROM T t1
)t1
Results:
| id | Number |
|----|--------|
| 1 | 1 |
| 2 | 2 |
| 3 | 2-dup |
| 4 | 3 |
| 5 | 4 |
If you can use window functions, you can use row_number() partitioned by Number to generate the row number.
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,row_number() over(partition by Number order by id) rn
FROM T t1
)t1
sqlfiddle
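The rule both queries implement (the first occurrence of a Number by id stays as is, every later occurrence gets a suffix) can be sketched in Python as follows. This is only an illustration of the logic on the sample rows; mark_duplicates is a hypothetical helper name.

```python
from collections import defaultdict

# Sample rows from the question: (ID, Number).
rows = [(1, 1), (2, 2), (3, 2), (4, 3), (5, 4)]

def mark_duplicates(rows):
    """Label every occurrence of a number after the first one with '-dup'."""
    seen = defaultdict(int)  # plays the role of rn per Number
    labelled = []
    for row_id, number in sorted(rows):  # order by id
        seen[number] += 1
        suffix = "-dup" if seen[number] > 1 else ""
        labelled.append((row_id, f"{number}{suffix}"))
    return labelled

print(mark_duplicates(rows))
```

The counter per number corresponds to rn in the queries above, and the suffix test corresponds to the CASE WHEN rn > 1 branch.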
I made a list of all the IDs that weren't dups (left join select) and then compared them to the entire list(case when):
select
case when a.id <> b.min_id then cast(a.Number as varchar(6)) + '-dup' else cast(a.Number as varchar(6)) end as Number
from table_a a
left join (select MIN(b.id) min_id, Number from table_a b group by b.number)b on b.number = a.number
I did this in MS SQL 2016, hope it works for you.
This populates the table used:
insert into table_a (ID, Number)
select 1,1
union all
select 2,2
union all
select 3,2
union all
select 4,3
union all
select 5,4

Using 'GROUP BY' while preferring rows associated in another table

I have a table tbl_entries with the following structure:
+----+------+------+------+
| id | col1 | col2 | col3 |
+----+------+------+------+
| 11 | a | b | c |
| 12 | d | e | a |
| 13 | a | b | c |
| 14 | X | e | 2 |
| 15 | a | b | c |
+----+------+------+------+
And another table tbl_reviewlist with the following structure:
+----+-------+------+------+------+
| id | entid | cola | colb | colc |
+----+-------+------+------+------+
| 1 | 12 | N | Y | Y |
| 2 | 13 | Y | N | Y |
| 3 | 14 | Y | N | N |
+----+-------+------+------+------+
Basically, tbl_reviewlist contains reviews about the entries in tbl_entries. However, for some known reason, the entries in tbl_entries are duplicated. I am extracting the unique records by the following query:
SELECT * FROM `tbl_entries` GROUP BY `col1`, `col2`, `col3`;
However, this returns an arbitrary one of the duplicate rows from tbl_entries, regardless of whether it has been reviewed or not. I want the query to prefer those rows which have been reviewed. How can I do that?
EDIT: I want to prefer rows which have been reviewed but if there are rows which have not been reviewed yet it should return those as well.
Thanks in advance!
Have you actually tried anything?
A hint: The SQL standard requires that every column in the result set of a query with a group by clause must be either
a grouping column
an aggregate function — sum(), count(), etc.,
a constant value/literal, or
an expression derived solely from the above.
Some broken implementations (and I believe MySQL is one of them) allow other columns to be included and offer their own...creative...behavior. If you think about it, group by essentially says to do the following:
Order this table by the grouping expressions
Partition it into subsets based on the group by sequence
Collapse each such partition into a single row computing the aggregate expressions as you go.
Once you've done that, what does it mean to ask for something that isn't uniform across the collapsed group partition?
If you have a table foo containing columns A, B, C, D and E and say something like
select A,B,C,D,E from foo group by A,B,C
per the standard, you should get a compile error. Deviant implementations [usually] treat this sort of query as the [rough] equivalent of
select *
from foo t
join ( select A,B,C
from foo
group by A,B,C
) x on x.A = t.A
and x.B = t.B
and x.C = t.C
But I wouldn't necessarily count on that without reviewing the documentation for the specific implementation that you are using.
If you want to find just reviewed entries, then something like this:
select *
from tbl_entries t
where exists ( select *
from tbl_reviewlist x
where x.entid = t.id
)
will do you. If, however, you want to find reviewed entries that are duplicated on col1, col2 and col3 then something like this should do you:
select *
from tbl_entries t
join ( select col1,col2,col3
from tbl_entries x
group by col1,col2,col3
having count(*) > 1
) d on d.col1 = t.col1
and d.col2 = t.col2
and d.col3 = t.col3
where exists ( select *
from tbl_reviewlist x
where x.entid = t.id
)
Since your problem statement is rather unclear, another take might be something along these lines:
select t.col1 ,
t.col2 ,
t.col3 ,
t.duplicate_count ,
coalesce(x.review_count,0) as review_count
from ( select col1 ,
col2 ,
col3 ,
count(*) as duplicate_count
from tbl_entries
group by col1 ,
col2 ,
col3
) t
left join ( select cola, colb, colc , count(*) as review_count
from tbl_reviewList
group by cola, colb, colc
having count(*) > 1
) x on x.cola = t.col1
and x.colb = t.col2
and x.colc = t.col3
order by sign(coalesce(x.review_count,0)) desc ,
t.col1 ,
t.col2 ,
t.col3
This query
summarizes the entries table, developing a count of how many times each col1/2/3 combination exists.
summarizes the review table, developing a count of reviews for each cola/b/c combination
joins them together matching cols a:1, b:2 c:3
orders them
preferring reviewed items to non-reviewed items by placing them first,
then by the col1/2/3 values.
I think there's a way with less repetition, but this should be a start:
select
tbl_entries.ID,
col1,
col2,
col3,
cola -- ... you get the idea ...
from (
select coalesce(min(entid), min(tbl_entries.ID)) as favID
from tbl_entries left join tbl_reviewlist on entid = tbl_entries.ID
group by col1, col2, col3
) as A join tbl_entries on tbl_entries.ID = favID
left join tbl_reviewlist on entid = tbl_entries.ID
Basically you distill the desired output to a list of core ID's and then re-map back to the data...
SELECT e.col1, e.col2, e.col3,
COALESCE(MIN(r.entid), MIN(e.id)) AS id
FROM tbl_entries AS e
LEFT JOIN tbl_reviewlist AS r
ON r.entid = e.id
GROUP BY e.col1, e.col2, e.col3 ;
Tested at SQL-Fiddle
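The selection rule behind COALESCE(MIN(r.entid), MIN(e.id)) is: within each (col1, col2, col3) group take the lowest reviewed id if there is one, otherwise the lowest id. A minimal Python sketch of that rule on the sample tables from the question, with preferred_ids as a hypothetical helper name:

```python
# Sample data from the question: (id, col1, col2, col3) and the reviewed entids.
entries = [
    (11, "a", "b", "c"),
    (12, "d", "e", "a"),
    (13, "a", "b", "c"),
    (14, "X", "e", "2"),
    (15, "a", "b", "c"),
]
reviewed_ids = {12, 13, 14}

def preferred_ids(entries, reviewed_ids):
    """Pick one id per (col1, col2, col3): reviewed rows first, then lowest id."""
    best = {}
    for entry_id, *cols in entries:
        key = tuple(cols)
        # False sorts before True, so reviewed rows win; ties go to the lowest id.
        rank = (entry_id not in reviewed_ids, entry_id)
        if key not in best or rank < best[key]:
            best[key] = rank
    return {key: rank[1] for key, rank in best.items()}

print(preferred_ids(entries, reviewed_ids))
```

For the sample data the (a, b, c) group resolves to id 13, the only reviewed row among 11, 13 and 15.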

SQL statement for querying with multiple conditions including 3 most recent dates

I need help in finding the rows that correspond to the most recent date, the next most recent and the one after that, where some condition ABC is "Y" and group it by a column name XYZ ASC but XYZ can appear multiple times. So, say XYZ is 50, then for the rows in the three years, the XYZ will be 50. I have the following code that executes but returns only two rows out of thousands which is impossible. I tried executing just the date condition but it returned dates that were less than or equal to MAX(DATE)-3 as well. Don't know where I am going wrong.
select * from money.cash where DATE =(
select
MAX(DATE)
from
money.cash
where
DATE > (select MAX(DATE)-3 from money.cash)
)
GROUP BY XYZ ASC
having ABC = "Y";
The structure of the table is as follows (only a schematic, not the real thing).
Comp_ID DATE XYZ ABC $$$$ ....
1 2012-1-1 10 Y SOME-AMOUNT
2 2011-1-1 10 Y
3 2006-1-1 10 Y
4 2011-1-1 20 Y
5 2002-1-1 20 Y
6 2000-1-1 20 Y
7 1998-1-1 20 Y
The desired o/p would be the first three rows for XYZ=10 in ascending order and the most recent 3 dates for XYZ=20.
Last and important: this table's values keep changing as new data comes in. So the output (which will be in a new table) must reflect the changes in the original table above.
MySQL doesn't have functionality that is friendly to greatest-n-per-group queries (at least before version 8.0, which added window functions).
One option would be...
- Find the MAX(Date) per group (XYZ)
- Then use that result to find the MAX(Date) of all records before that date
- Then do it again for all records before that date
It's really inefficient, but MySQL hasn't got the functionality required to do this efficiently. Sorry...
CREATE TABLE yourTable
(
comp_id INT,
myDate DATE,
xyz INT,
abc VARCHAR(1)
)
;
INSERT INTO yourTable SELECT 1, '2012-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 2, '2011-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 3, '2006-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 4, '2011-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 5, '2002-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 6, '2000-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 7, '1998-01-01', 20, 'Y';
SELECT
yourTable.*
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
yourTable.XYZ,
MAX(yourTable.myDate) AS MaxDate
FROM
yourTable
WHERE
yourTable.ABC = 'Y'
GROUP BY
yourTable.XYZ
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
INNER JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate >= lookup.MaxDate
WHERE
yourTable.ABC = 'Y'
ORDER BY
yourTable.comp_id
;
DROP TABLE yourTable;
There are other options, but they're all a bit hacky. Search SO for greatest-n-per-group mysql.
My results using your example data:
Comp_ID | DATE | XYZ | ABC
------------------------------
1 | 2012-1-1 | 10 | Y
2 | 2011-1-1 | 10 | Y
3 | 2006-1-1 | 10 | Y
4 | 2011-1-1 | 20 | Y
5 | 2002-1-1 | 20 | Y
6 | 2000-1-1 | 20 | Y
Here's another way, hopefully more efficient than Dems' answer.
Test it with an index on (abc, xyz, date):
SELECT m.xyz, m.date -- for all columns: SELECT m.*
FROM
( SELECT DISTINCT xyz
FROM money.cash
WHERE abc = 'Y'
) AS dm
JOIN
money.cash AS m
ON m.abc = 'Y'
AND m.xyz = dm.xyz
AND m.date >= COALESCE(
( SELECT im.date
FROM money.cash AS im
WHERE im.abc = 'Y'
AND im.xyz = dm.xyz
ORDER BY im.date DESC
LIMIT 1
OFFSET 2 -- to get 3 latest rows per xyz
), DATE('1000-01-01') ) ;
If more than one row has the same (abc, xyz, date), the query may return more than 3 rows per xyz (all rows tied for 3rd place will be shown).
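The cutoff logic of this query (find the n-th latest date per xyz, then keep every row on or after it) can be sketched in Python as below. It is an illustration on the sample data from the question, with latest_n_per_group as a hypothetical helper name; date.min plays the role of the COALESCE fallback for groups with fewer than n rows, and ties on the cutoff date are kept, as in the SQL.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question: (comp_id, date, xyz, abc).
rows = [
    (1, date(2012, 1, 1), 10, "Y"),
    (2, date(2011, 1, 1), 10, "Y"),
    (3, date(2006, 1, 1), 10, "Y"),
    (4, date(2011, 1, 1), 20, "Y"),
    (5, date(2002, 1, 1), 20, "Y"),
    (6, date(2000, 1, 1), 20, "Y"),
    (7, date(1998, 1, 1), 20, "Y"),
]

def latest_n_per_group(rows, n=3):
    """Keep, per xyz, the rows with abc = 'Y' on or after the n-th latest date."""
    dates = defaultdict(list)
    for _, day, xyz, abc in rows:
        if abc == "Y":
            dates[xyz].append(day)
    cutoff = {}
    for xyz, days in dates.items():
        days.sort(reverse=True)
        # Same role as ORDER BY date DESC LIMIT 1 OFFSET n-1, with a fallback.
        cutoff[xyz] = days[n - 1] if len(days) >= n else date.min
    return [r for r in rows if r[3] == "Y" and r[1] >= cutoff[r[2]]]

print([r[0] for r in latest_n_per_group(rows)])
```

On the sample data this keeps comp_id 1 to 6 and drops comp_id 7, the oldest row for xyz = 20.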