Select Matched Pairs from Two Tables - mysql

I need to select matched pairs from two tables containing similarly structured data. "Matched pair" here means two rows that reference each other in the 'matchid' column.
A single-table matched pair example:
TABLE
----
id | matchid
1 | 2
2 | 1
ID 1 and 2 are a matched pair because each has a match entry for the other.
Now the real question: what is the best (fastest) way to select the matched pairs that appear in both tables:
Table ONE (id, matchid)
Table TWO (id, matchid)
Example data:
ONE             TWO
----            ----
id | matchid    id | matchid
1  | 2          2  | 3
2  | 3          3  | 2
3  | 2
4  | 5
5  | 4
The desired result is a single row with IDs 2 and 3.
RESULT
----
id | id
2 | 3
This is because 2 and 3 are a matched pair in table ONE and in table TWO. 4 and 5 are a matched pair in table ONE but not in TWO, so we don't select them. 1 and 2 are not a matched pair at all, since 2 does not have a matching entry for 1.
I can get the matched pairs from one table with this:
SELECT a.id, b.id
FROM ONE a JOIN ONE b
ON a.id = b.matchid AND a.matchid = b.id
WHERE a.id < b.id
How should I build a query that selects only the matching pairs that appear in both tables?
Should I:
Select the query above for each table and WHERE EXISTS them together?
Select the query above for each table and JOIN them together?
Select the query above then JOIN table TWO twice, once for 'id' and once for 'matchid'?
Select the query above for each table and loop through to compare them back in php?
Somehow filter table TWO down so we only have to look at the IDs in matched pairs in table ONE?
Do something totally different?
(Since this is a question of efficiency, it is worth noting that the matches will be quite sparse, maybe 1/1000 or less, and each table will have 100,000+ rows.)

I think I get your point. You want to keep only the records whose pairs exist in both tables.
SELECT LEAST(a.ID, a.MatchID) ID, GREATEST(a.ID, a.MatchID) MatchID
FROM One a
INNER JOIN Two b
ON a.ID = b.ID AND
a.matchID = b.matchID
GROUP BY LEAST(a.ID, a.MatchID), GREATEST(a.ID, a.MatchID)
HAVING COUNT(*) > 1

Try this query:
-- CONCAT builds the composite key; in MySQL, + would add the values numerically
select
O.id,
O.matchid
from
ONE O
where
CONCAT(O.id, '~', O.matchid)
in (select CONCAT(T.id, '~', T.matchid) from TWO T)
Edited query:
select distinct
Least(O.id,O.matchid) ID,
Greatest(O.id,O.matchid) MatchID
from
ONE O
where
CONCAT(O.id, '~', O.matchid)
in (select CONCAT(T.id, '~', T.matchid) from TWO T)
and CONCAT(O.matchid, '~', O.id)
in (select CONCAT(T.id, '~', T.matchid) from TWO T)
SQL Fiddle
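A caveat on composite string keys like the ones above: they need real string concatenation. In MySQL the + operator is arithmetic, so string operands are converted to numbers and added, and different pairs can end up with the same key. The sketch below illustrates the collision. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen so the example runs without a server): SQLite's + is likewise arithmetic, and its || operator plays the role of MySQL's CONCAT. The helper name pair_keys is hypothetical.

```python
import sqlite3


def pair_keys(id_, matchid):
    """Return (key built with +, key built with ||) for one (id, matchid) pair."""
    conn = sqlite3.connect(":memory:")
    row = conn.execute(
        """
        SELECT CAST(? AS CHAR(50)) + '~' + CAST(? AS CHAR(50)),   -- arithmetic
               CAST(? AS CHAR(50)) || '~' || CAST(? AS CHAR(50))  -- concatenation
        """,
        (id_, matchid, id_, matchid),
    ).fetchone()
    conn.close()
    return row


plus_a, concat_a = pair_keys(1, 4)
plus_b, concat_b = pair_keys(2, 3)
print(plus_a == plus_b)      # the + keys collide because both pairs sum to 5
print(concat_a, concat_b)    # the concatenated keys stay distinct
```

Comparing the two columns directly, for example with a join on id and matchid, avoids building a key at all.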

Naive version, which checks all the four rows that need to exist:
-- EXPLAIN ANALYZE
WITH both_one AS (
SELECT o.id, o.matchid
FROM one o
WHERE o.id < o.matchid
AND EXISTS ( SELECT * FROM one x WHERE x.id = o.matchid AND x.matchid = o.id)
)
, both_two AS (
SELECT t.id, t.matchid
FROM two t
WHERE t.id < t.matchid
AND EXISTS ( SELECT * FROM two x WHERE x.id = t.matchid AND x.matchid = t.id)
)
SELECT *
FROM both_one oo
WHERE EXISTS (
SELECT *
FROM both_two tt
WHERE tt.id = oo.id AND tt.matchid = oo.matchid
);
This one is simpler:
-- EXPLAIN ANALYZE
WITH pair AS (
SELECT o.id, o.matchid
FROM one o
WHERE EXISTS ( SELECT * FROM two x WHERE x.id = o.id AND x.matchid = o.matchid)
)
SELECT *
FROM pair pp
WHERE EXISTS (
SELECT *
FROM pair xx
WHERE xx.id = pp.matchid AND xx.matchid = pp.id
)
AND pp.id < pp.matchid
;
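The "four rows must exist" idea from the naive version can also be written as plain joins, which corresponds to the option in the question of joining table TWO twice. The following is only a sketch run against the question's sample data, not a benchmark: it uses Python's built-in sqlite3 module in place of MySQL, and the function name and the two composite indexes are assumptions added for illustration.

```python
import sqlite3


def matched_pairs_in_both(one_rows, two_rows):
    """Return pairs (id, matchid) with id < matchid that are matched in ONE and in TWO."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE one (id INTEGER, matchid INTEGER);
        CREATE TABLE two (id INTEGER, matchid INTEGER);
        -- Assumed indexes: each join below is a lookup on (id, matchid).
        CREATE INDEX idx_one ON one (id, matchid);
        CREATE INDEX idx_two ON two (id, matchid);
        """
    )
    conn.executemany("INSERT INTO one VALUES (?, ?)", one_rows)
    conn.executemany("INSERT INTO two VALUES (?, ?)", two_rows)
    rows = conn.execute(
        """
        SELECT DISTINCT a.id, a.matchid
        FROM one a
        JOIN one b ON b.id = a.matchid AND b.matchid = a.id  -- reverse row in ONE
        JOIN two c ON c.id = a.id AND c.matchid = a.matchid  -- same row in TWO
        JOIN two d ON d.id = a.matchid AND d.matchid = a.id  -- reverse row in TWO
        WHERE a.id < a.matchid                               -- report each pair once
        ORDER BY a.id
        """
    ).fetchall()
    conn.close()
    return rows


one = [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)]
two = [(2, 3), (3, 2)]
print(matched_pairs_in_both(one, two))
```

On the sample data only the pair 2 and 3 passes all four checks. Whether this form beats the EXISTS versions on 100,000+ rows depends on the optimizer and indexes, so it is worth comparing the plans with EXPLAIN.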

Related

mysql query that does not return an id with multiple entries

I wasn't sure how to word the title, but here is what I am trying to do. I have a table where the id can have multiple entries
id | number
___________
1 | 90
1 | 88
2 | 88
3 | 88
I want a query that will return all ids that don't contain the number 90, so only 2 and 3 in this example. I have tried the below, but it still returns the id of 1 since it also has a number of 88.
SELECT DISTINCT id FROM table WHERE number NOT IN (90)
One way of getting the result is NOT EXISTS. The inner query looks for a row with the same ID and the number 90, and NOT EXISTS keeps only the rows whose ID has no such row.
SELECT A.*
FROM TableName a
WHERE NOT EXISTS (SELECT NULL
FROM TableName B
WHERE a.ID = b.ID
AND b.number = 90)
An alternative is by using LEFT JOIN which yields the same result as above.
SELECT a.*
FROM TableName a
LEFT JOIN TableName b
ON a.ID = b.ID
AND b.number = 90
WHERE b.id IS NULL
You can use a subquery:
SELECT DISTINCT id
FROM TableName
WHERE id NOT IN (SELECT id FROM TableName WHERE number = 90)
You can also use conditional aggregation, which needs no self-join or subquery:
SELECT ID
FROM YourTable
GROUP BY ID
-- number = 90 evaluates to 1 or 0, so the sum counts the rows equal to 90
HAVING SUM(`number` = 90) = 0;
Demo on SQL Fiddle.
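The aggregation approach can be checked on the sample data with a small script. This is a sketch under assumptions: it uses Python's built-in sqlite3 module instead of MySQL, the table name t and the function name are made up, and an extra row (4, 190) is added to show why an exact comparison is safer than a substring search over concatenated numbers, which would also reject 190.

```python
import sqlite3


def ids_without_number(rows, excluded):
    """Return the ids that have no row whose number equals `excluded`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, number INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = [
        row[0]
        for row in conn.execute(
            # (number = ?) is 1 or 0 per row, so SUM counts the matching rows per id.
            "SELECT id FROM t GROUP BY id HAVING SUM(number = ?) = 0 ORDER BY id",
            (excluded,),
        )
    ]
    conn.close()
    return result


rows = [(1, 90), (1, 88), (2, 88), (3, 88), (4, 190)]
print(ids_without_number(rows, 90))
```

Id 1 is excluded because one of its rows is 90, while id 4 is kept because 190 is a different number.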

How to join two tables, with distinct columns on either side?

I have two tables I'm trying to join to produce a unique set of data for a third table, but having trouble doing this properly.
The left table has an id field, as well as a common join field (a).
The right table has the common join field (a), and another distinct field (b)
I'm trying to extract a result-set of id and b, where neither id nor b are duplicated.
I have an SQL fiddle set up: http://www.sqlfiddle.com/#!9/208de/3/0
The ideal results should be:
id | b
---+---
1 | 1
2 | 2
3 | 3
Each id and b value appears only once (it's only coincidence they match here, that can't be assumed always).
Thanks
What about a CTE along with DISTINCT? Would that work?
WITH
cte1 (ID)
AS
(
SELECT DISTINCT Table1.ID
FROM Table1
WHERE Table1.ID IS NOT NULL
)
SELECT DISTINCT
ts.ID,
sp.b
FROM Table2 AS sp
INNER JOIN cte1 AS ts
ON sp.b <> ts.ID
ORDER BY ts.ID DESC

Replace category names with ids saving hierarchy

I have a data table like this:
category_name | subcategory_name | other data
---------------------------------------------
fruits        | apples           | ...
fruits        | oranges          | ...
What is the best way to replace the category and subcategory names with their ids, moving the names to another table? The result should be:
category table:
id | name    | parent_id
------------------------
1  | fruits  | 0
2  | apples  | 1
3  | oranges | 1
data table:
category_id | subcategory_id | other data
---------------------------------------------
1           | 2              | ...
1           | 3              | ...
I can do it all manually with some SELECT DISTINCT and JOIN queries, but is there a better way?
There is likely an easier way to do this, but MySQL (before 8.0) doesn't have the window functions I'd like to use here, nor common table expressions, nor views with user variables... So I'm left with a bit of a mess, but it seems to work.
I'm assuming your current model could have N levels.
I'm also assuming a subcategory doesn't exist under multiple categories.
What this does:
Generate a data set containing all the names: find the names of the categories without parents and union those with the names that have parents.
Assign a new ID to each item with a user variable (@). Two different variables are used so the numbering doesn't carry over from one subquery to the next.
Copy that query into two separate subqueries (A and B in my example) and join them to get the parent IDs.
-- distinct used to get 1 record for each parent and an outer wrapper to return just desired results
Select Distinct A.ID as ID, A.SubCategory_name name, B.ID as Parent_ID from (
-- this select assigns a row number for each named value
Select @rn:=@rn+1 ID, t1.*
from (
-- get just parents without any parent
Select NULL as Category_Name, F1.category_name as SubCategory_name
FROM (Select distinct category_Name from foo) F1
LEFT JOIN Foo F2
on F1.Category_name = F2.SubCategory_name
where F2.SubCategory_name is null
UNION ALL
-- get just children of parents
SELECT category_name, subcategory_Name from foo) T1
-- used to get a row number assigned
CROSS JOIN (SELECT @rn:=0) t2
-- used to ensure same order applied to both queries so numbers match
-- though now that I think about it I don't think we need numbers in 2nd query
order by Category_name, SubCategory_Name) A
LEFT JOIN (
Select @r:=@r+1 ID, t1.*
from (
Select NULL as Category_Name, F1.category_name as SubCategory_name
FROM (Select distinct category_Name from foo) F1
LEFT JOIN Foo F2
on F1.Category_name = F2.SubCategory_name
where F2.SubCategory_name is null
UNION ALL
SELECT category_name, subcategory_Name from foo) T1
CROSS JOIN (SELECT @r:=0) t2
order by Category_name, SubCategory_Name) B
on B.SubCategory_Name = A.Category_name
From the above select you could create a table or populate an existing one.
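For the two-level case shown in the question, a different and simpler technique than the user-variable query is to let an auto-increment key generate the ids and to insert parents before children. The sketch below shows that idea only. It assumes exactly two levels (not the N levels the answer allows for), uses Python's built-in sqlite3 module in place of MySQL (where the column would be declared AUTO_INCREMENT), and borrows the table name foo from the answer. The function name is hypothetical.

```python
import sqlite3


def build_categories(rows):
    """rows: (category_name, subcategory_name) pairs. Returns the category table rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE foo (category_name TEXT, subcategory_name TEXT);
        CREATE TABLE category (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT,
            parent_id INTEGER
        );
        """
    )
    conn.executemany("INSERT INTO foo VALUES (?, ?)", rows)
    # Top-level categories first, with parent_id 0 as in the question.
    conn.execute(
        """
        INSERT INTO category (name, parent_id)
        SELECT DISTINCT category_name, 0 FROM foo ORDER BY category_name
        """
    )
    # Then the subcategories, looking up the id just generated for each parent.
    conn.execute(
        """
        INSERT INTO category (name, parent_id)
        SELECT DISTINCT f.subcategory_name, c.id
        FROM foo f
        JOIN category c ON c.name = f.category_name AND c.parent_id = 0
        ORDER BY f.subcategory_name
        """
    )
    result = conn.execute(
        "SELECT id, name, parent_id FROM category ORDER BY id"
    ).fetchall()
    conn.close()
    return result


for row in build_categories([("fruits", "apples"), ("fruits", "oranges")]):
    print(row)
```

The data table can then be rewritten by joining it to category twice, once on the category name and once on the subcategory name, to pick up both ids.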

Select all rows that have same ID

I have this table:
ID | Part
1 | A
1 | B
1 | C
2 | B
2 | C
3 | A
3 | D
3 | E
4 | B
4 | D
and want a query that will grab all IDs that have a given part, and return a list of all other parts with those IDs, together with their counts.
e.g. parts related to B:
Part | Count
A | 1
C | 2
D | 1
What I have currently:
SELECT * FROM tble WHERE ID IN (SELECT DISTINCT ID FROM tble t WHERE Part = ?)
GROUP BY Part ORDER BY COUNT(Part) DESC
This works, but it is quite slow. I'm looking to improve it but having difficulty.
Your query is not unreasonable, although the DISTINCT is unnecessary and I would use EXISTS rather than IN. Also, the outer SELECT needs to be fixed for the aggregation:
SELECT t.part, COUNT(*)
FROM tble t
WHERE EXISTS (SELECT 1 FROM tble t2 WHERE t2.ID = t.ID AND t2.Part = ?)
GROUP BY t.Part
ORDER BY COUNT(*) DESC;
Then, to optimize this query, you want an index:
create index idx_tble_id_part on tble(id, part);
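The EXISTS form and the index can be tried on the question's sample data with the sketch below. It runs on Python's built-in sqlite3 module rather than MySQL. Two things in it are my additions and not part of the answer: the condition t.part <> ? that leaves the searched part itself out of the result, as the expected output in the question does, and the secondary sort on part that makes the order of ties deterministic. The function name is hypothetical.

```python
import sqlite3


def related_parts(rows, part):
    """Count the other parts that share an ID with `part`, most frequent first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tble (id INTEGER, part TEXT)")
    conn.execute("CREATE INDEX idx_tble_id_part ON tble (id, part)")
    conn.executemany("INSERT INTO tble VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.part, COUNT(*)
        FROM tble t
        WHERE t.part <> ?  -- addition: exclude the searched part itself
          AND EXISTS (SELECT 1 FROM tble t2 WHERE t2.id = t.id AND t2.part = ?)
        GROUP BY t.part
        ORDER BY COUNT(*) DESC, t.part
        """,
        (part, part),
    ).fetchall()
    conn.close()
    return result


rows = [(1, "A"), (1, "B"), (1, "C"), (2, "B"), (2, "C"),
        (3, "A"), (3, "D"), (3, "E"), (4, "B"), (4, "D")]
print(related_parts(rows, "B"))  # [('C', 2), ('A', 1), ('D', 1)]
```

The EXISTS subquery looks up (id, part), which is exactly the column order of the index, so each probe can be answered from the index alone.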
Simplify this first. Once you have the logic down, add the SELECT * FROM ... filter back in.
SELECT Part, COUNT(Part) AS Count_of_Part
FROM tble
GROUP BY Part ORDER BY COUNT(Part) DESC
Do a join from the table back to itself on ID, and then count the distinct IDs for each other part:
SELECT b.part, COUNT(DISTINCT b.id)
FROM
tble AS a
INNER JOIN tble AS b ON
a.id = b.id AND
a.part <> b.part
WHERE
a.part = 'B'
GROUP BY b.part
This can be simply done by joining back to the table:
SELECT t1.part
,count(*)
FROM tble t1
INNER JOIN tble t ON t.id = t1.id
AND t.part = 'B'
AND t1.part <> t.part
GROUP BY t1.part
You should be able to do this by grouping the data.
Try something like this:
SELECT part, COUNT(id) AS TotalCountId
FROM TABLE_NAME
GROUP BY part

Select distinct records in mysql

My table
ANONYMOUS
ONE TWO
1 2
2 1
1 2
3 1
Now I want to select the distinct sets of ONE and TWO.
My selected list should be
ANONYMOUS
ONE TWO
1 2
3 1
Your question isn't very clear, but I guess you mean this:
SELECT DISTINCT one, two
FROM yourtable AS T1
WHERE one <= two
OR NOT EXISTS
(
SELECT *
FROM yourtable AS T2
WHERE T1.one = T2.two
AND T1.two = T2.one
)
It finds rows with (one, two) where the reversed pair (two, one) does not exist. If both exist, it chooses the pair such that one < two. It also selects rows where the values are equal.
If you would prefer to use a JOIN instead of NOT EXISTS you can do that:
SELECT DISTINCT T1.one, T1.two
FROM yourtable AS T1
LEFT JOIN yourtable AS T2
ON T1.one = T2.two
AND T1.two = T2.one
WHERE T1.one <= T1.two
OR T2.one IS NULL
SELECT DISTINCT a.*
FROM `ANONYMOUS` a
LEFT JOIN `ANONYMOUS` b ON (a.one=b.two and a.two=b.one)
-- <= rather than < so that rows with equal values (one = two) are kept
WHERE b.one is null or a.one<=b.one
ORDER BY 1,2
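The NOT EXISTS variant from the first answer can be checked against the question's data with a short script. This is a sketch that uses Python's built-in sqlite3 module in place of MySQL. The function name is hypothetical, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3


def distinct_pairs(rows):
    """Keep one row per unordered pair, preferring the orientation with one <= two."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (one INTEGER, two INTEGER)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT DISTINCT one, two
        FROM yourtable AS t1
        WHERE one <= two
           OR NOT EXISTS (
                SELECT 1
                FROM yourtable AS t2
                WHERE t1.one = t2.two AND t1.two = t2.one  -- the reversed pair
           )
        ORDER BY one, two
        """
    ).fetchall()
    conn.close()
    return result


print(distinct_pairs([(1, 2), (2, 1), (1, 2), (3, 1)]))  # [(1, 2), (3, 1)]
```

The row (2, 1) is dropped because its reverse (1, 2) exists, while (3, 1) is kept in its original orientation because (1, 3) does not appear in the table.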