Find records which cross-reference each other - mysql

I want to extract all the rows from a database table, where the rows cross-reference each other.
My table contains 2 columns: ref1 & ref2
Table example:
ID ref1 ref2
01 23 83
02 77 55
03 83 23
04 13 45
In this case, I want my query to return only rows 01 and 03, because they cross-reference each other.
Is this possible using a single query, or will I need to iterate the entire table manually?
I'm using MySQL.

A simple JOIN can do that in a straightforward manner:
SELECT DISTINCT a.*
FROM mytable a
JOIN mytable b
ON a.ref1 = b.ref2 AND a.ref2 = b.ref1;
An SQLfiddle to test with.
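To try the self-join without a MySQL server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite standing in for MySQL and the helper name cross_referencing_ids are assumptions made for illustration; the data is the sample table from the question.

```python
import sqlite3


def cross_referencing_ids(rows):
    """Return the IDs of rows whose (ref1, ref2) pair is mirrored by another row."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
    conn.execute("CREATE TABLE mytable (id TEXT, ref1 INTEGER, ref2 INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    found = conn.execute(
        """
        SELECT DISTINCT a.id
        FROM mytable a
        JOIN mytable b
          ON a.ref1 = b.ref2 AND a.ref2 = b.ref1
        ORDER BY a.id
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in found]


# Sample data from the question.
sample = [("01", 23, 83), ("02", 77, 55), ("03", 83, 23), ("04", 13, 45)]
print(cross_referencing_ids(sample))  # ['01', '03']
```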

select
*
from
tbl t1
where
exists (
select
'x'
from
tbl t2
where
t1.ref1 = t2.ref2 and
t1.ref2 = t2.ref1
)
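One practical difference between the EXISTS form and a plain JOIN is how they treat a row that has several cross-referencing partners. The sketch below assumes SQLite (via Python's sqlite3) in place of MySQL and uses made-up data in which rows 2 and 3 both mirror row 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute("CREATE TABLE tbl (id INTEGER, ref1 INTEGER, ref2 INTEGER)")
# Hypothetical data: rows 2 and 3 are both the mirror image of row 1.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 23, 83), (2, 83, 23), (3, 83, 23), (4, 13, 45)],
)

# A plain JOIN (no DISTINCT) repeats row 1 once per partner.
join_ids = [row[0] for row in conn.execute(
    """
    SELECT a.id
    FROM tbl a
    JOIN tbl b ON a.ref1 = b.ref2 AND a.ref2 = b.ref1
    ORDER BY a.id
    """
)]

# EXISTS only asks whether at least one partner exists, so each row appears once.
exists_ids = [row[0] for row in conn.execute(
    """
    SELECT t1.id
    FROM tbl t1
    WHERE EXISTS (
        SELECT 'x' FROM tbl t2
        WHERE t1.ref1 = t2.ref2 AND t1.ref2 = t2.ref1
    )
    ORDER BY t1.id
    """
)]

print(join_ids)    # [1, 1, 2, 3]
print(exists_ids)  # [1, 2, 3]
```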

Let me first present the solution in the context of a table that is meant to represent trees but has this problem (a cross-reference that means the records are no longer part of the tree).
Note: if your root tree record(s) reference themselves, you need to filter them out (a.id != b.id), as below, to keep only cross-referencing records.
-- case of a tree(s)
select
a.id currentId,
b.id parentId,
b.parent_id grandParentId
from my_table a
join my_table b on
a.parent_id=b.id
and b.parent_id=a.id
and a.id!=b.id;
Now in your case, the above query can be written as follows (again, if records are allowed to reference themselves, we add a.ref1 != b.ref1 to exclude them):
-- case of a graph(s)
select
a.ref1 theCurrent,
b.ref1 theRef,
b.ref2 theRefsRef
from my_table a
join my_table b on
a.ref2=b.ref1
and b.ref2=a.ref1
-- comparing a.ref1 with b.ref2 here would contradict the line above and return nothing
and a.ref1!=b.ref1;
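The choice of self-reference filter matters: a filter that compares a.ref1 with b.ref2 contradicts the join condition b.ref2 = a.ref1 and can never match, whereas comparing a.ref1 with b.ref1 removes only rows that point at themselves. The sketch below checks this under the assumption that SQLite (via Python's sqlite3) behaves like MySQL here; the row (50, 50) is a hypothetical self-referencing record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute("CREATE TABLE my_table (ref1 INTEGER, ref2 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(23, 83), (83, 23), (50, 50), (13, 45)],  # (50, 50) references itself
)

BASE = """
    SELECT a.ref1
    FROM my_table a
    JOIN my_table b
      ON a.ref2 = b.ref1 AND b.ref2 = a.ref1 {extra}
    ORDER BY a.ref1
"""


def run(extra=""):
    """Run the cross-reference join with an optional extra join condition."""
    return [row[0] for row in conn.execute(BASE.format(extra=extra))]


no_filter = run()                            # self-reference is included
contradictory = run("AND a.ref1 != b.ref2")  # can never be true alongside b.ref2 = a.ref1
working = run("AND a.ref1 != b.ref1")        # drops only the self-reference

print(no_filter)      # [23, 50, 83]
print(contradictory)  # []
print(working)        # [23, 83]
```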

Related

SQL Query Help Conditional Join

Say I have three tables as such:
Group Table
OGID OGC OGCD
56 300 TAS
81 TA CAL
Structure Table
OSID L1D L2D
44 56 81
Contract
ContractID
44
I'm given the ContractID and I want to create a table that has the following:
ContractID | Structure L1D | Structure L2D | Group OGID | Group OGC | Group OGCD | Group OGID | Group OGC | Group OGCD
44 | 56 | 81 | 56 | 300 | TAS | 81 | TA | CAL
What would be the best way to go about this in SQL?
There is also the problem that L2D can be NULL, and any time I try to join the tables with INNER JOIN statements, the rows with NULL are ignored.
SELECT
Contract.ContractID, Structure.L1D, Structure L2D, Group.OGID, Group.OGC, Group.OGID, Group2.OGID, Group2.OGC, Group2.OGID,
FROM
(
SELECT Structure.OSID, Structure.L1d, Group.OGID, Group.OGC, Group.OGCD
FROM Structure
INNER JOIN Group
ON (Structure.L1D = Group.OGID)
) T1
INNER JOIN
(
SELECT Structure.OSID, Structure.L1D, Group2.OGID, Group2.OGC, Group2.OGCD
FROM Structure
INNER JOIN Group2
ON (Structure.L2D = Group2.OGID)
) T2
ON (T1.OSID = T2.OSID OR T2.OSID = NULL)
See DBFIDDLE
SELECT
s.OSID,
s.L1D,
s.L2D,
g1.OGID,
g1.OGC,
g1.OGCD,
g2.OGID,
g2.OGC,
g2.OGCD
FROM structure s
LEFT JOIN `group` g1 ON g1.OGID = s.L1D
LEFT JOIN `group` g2 ON g2.OGID = s.L2D
The DBFIDDLE also shows what happens when L2D has the value NULL.
P.S. Generally you should avoid using reserved words, like GROUP, as table names.
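The effect of a NULL L2D on the two join types can be reproduced with a small sketch. It assumes SQLite (via Python's sqlite3) in place of MySQL, and the second structure row (OSID 45, with L2D set to NULL) is hypothetical data added to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.executescript("""
    CREATE TABLE `group` (OGID INTEGER, OGC TEXT, OGCD TEXT);
    CREATE TABLE structure (OSID INTEGER, L1D INTEGER, L2D INTEGER);
    INSERT INTO `group` VALUES (56, '300', 'TAS'), (81, 'TA', 'CAL');
    -- OSID 45 is hypothetical: it has no second-level group.
    INSERT INTO structure VALUES (44, 56, 81), (45, 56, NULL);
""")

QUERY = """
    SELECT s.OSID, g1.OGC, g1.OGCD, g2.OGC, g2.OGCD
    FROM structure s
    {join} `group` g1 ON g1.OGID = s.L1D
    {join} `group` g2 ON g2.OGID = s.L2D
    ORDER BY s.OSID
"""

left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()

print(left_rows)   # both structures; the g2 columns are None for OSID 45
print(inner_rows)  # only OSID 44, because NULL never equals any OGID
```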

Find errors in a sequence

I have an activity changelog of officers that become active/inactive.
OfficerID ChangeTo ChangeDate
1 active 2017-05-01
1 active 2017-05-02
1 inactive 2017-05-04
6 active 2013-09-09
6 inactive 2016-04-14
6 recruit 2016-06-22
6 active 2016-06-23
6 inactive 2017-04-30
In the case above, officer id 1 is active from 1st of May until the 4th of May.
This is essentially a 'housekeeping' task. The second row is not required and should be deleted. I would like to do this within a MySQL procedure that is linked to a scheduled event. I need a query that can identify these rows, but I'm not sure how.
In a previous system, I had looped through an ordered list and compared the current value against the previous row's value. I have read that loops in MySQL are not encouraged, so I'm trying to figure out how to do this with queries alone.
I tried the following:
SELECT
a.ActivityID, a.OfficerID, a.ChangeTo, a.ChangeDate
FROM
tbl_Officers_Activity as a
INNER JOIN tbl_Officers_Activity AS b
ON a.OfficerID = b.OfficerID
AND a.ChangeDate > b.ChangeDate
AND a.ChangeTo = b.ChangeTo
INNER JOIN tbl_Officers_Activity AS c
ON a.OfficerID = c.OfficerID
AND a.ChangeDate < c.ChangeDate
AND a.ChangeTo <> c.ChangeTo
ORDER BY
OfficerID,
ChangeDate;
I was hoping I could somehow embed the criteria I need into the joins, but I'm at a loss. Any help would be greatly appreciated.
This is what you need
SQLFIddle Demo
select a1.officerid,a1.changedate,a1.changeto_a as changeto
From
(select a.officerid,a.changedate,max(a.changeto) as changeto_a,count(*) as rnk
from tbl_Officers_Activity a
inner join tbl_Officers_Activity b
on a.OfficerID=b.OfficerID
and a.ChangeDate>=b.ChangeDate
group by a.officerid,a.changedate) a1
left join
(select a.officerid,a.changedate,max(a.changeto) as changeto_b,count(*) +1 as rnk
from tbl_Officers_Activity a
inner join tbl_Officers_Activity b
on a.OfficerID=b.OfficerID
and a.ChangeDate>=b.ChangeDate
group by a.officerid,a.changedate) b1
on a1.officerid=b1.officerid
and a1.rnk=b1.rnk
where changeto_a = changeto_b
Explanation:
MySQL before version 8.0 doesn't have a ROW_NUMBER function, so first I had to derive it. I used this query to get the row number, which is named rnk in the query. Call the table a1.
(select a.officerid,a.changedate,max(a.changeto) as changeto_a,count(*) as rnk
from tbl_Officers_Activity a
inner join tbl_Officers_Activity b
on a.OfficerID=b.OfficerID
and a.ChangeDate>=b.ChangeDate
group by a.officerid,a.changedate)
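MySQL before version 8.0 also lacks the LEAD and LAG functions, so I derived the neighbouring-row lookup by using the above query again and changing rnk to rnk+1, calling it b1.
To replicate it, I left joined a1 with b1 on officerid and rnk, so each row in a1 is paired with the row that precedes it.
Finally, a where clause keeps the rows whose changeto is the same as the neighbouring row's, which gives your output.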
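The rank-by-counting idea can be checked against the sample changelog with the sketch below. It assumes SQLite (via Python's sqlite3) in place of MySQL, assumes that an officer never has two changes on the same date, and simplifies the final step to an inner join between the ranked set and itself; the names RANKED, cur and prev are illustrative only.

```python
import sqlite3

# Derived row number: count how many rows of the same officer are not later.
RANKED = """
    SELECT a.OfficerID, a.ChangeDate, MAX(a.ChangeTo) AS ChangeTo, COUNT(*) AS rnk
    FROM tbl_Officers_Activity a
    JOIN tbl_Officers_Activity b
      ON a.OfficerID = b.OfficerID AND a.ChangeDate >= b.ChangeDate
    GROUP BY a.OfficerID, a.ChangeDate
"""

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute(
    "CREATE TABLE tbl_Officers_Activity (OfficerID INTEGER, ChangeTo TEXT, ChangeDate TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_Officers_Activity VALUES (?, ?, ?)",
    [
        (1, "active", "2017-05-01"), (1, "active", "2017-05-02"),
        (1, "inactive", "2017-05-04"), (6, "active", "2013-09-09"),
        (6, "inactive", "2016-04-14"), (6, "recruit", "2016-06-22"),
        (6, "active", "2016-06-23"), (6, "inactive", "2017-04-30"),
    ],
)

# Pair every row with the previous row of the same officer and keep repeats.
repeated = conn.execute(
    f"""
    SELECT cur.OfficerID, cur.ChangeDate, cur.ChangeTo
    FROM ({RANKED}) cur
    JOIN ({RANKED}) prev
      ON prev.OfficerID = cur.OfficerID AND prev.rnk + 1 = cur.rnk
    WHERE prev.ChangeTo = cur.ChangeTo
    ORDER BY cur.OfficerID, cur.ChangeDate
    """
).fetchall()

print(repeated)  # [(1, '2017-05-02', 'active')]
```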
My solution in the end was similar to what Utsav posted; however, after comparing his solution with mine, I found mine to be accurate and his inaccurate.
For my solution, I create a temporary table for the initial ordered list, with an auto-increment primary key. I then clone the temp table and join the two copies where the primary key of one equals the primary key of the other plus 1 and the status change is the same.
My procedure ends up like this:
-- create temp table
DROP TABLE IF EXISTS tmp_act;
CREATE TEMPORARY TABLE IF NOT EXISTS tmp_act (
AutoID INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
ActivityID INT(11),
OfficerID INT(11),
ChangeTo text
);
-- temp data
insert into tmp_act (ActivityID,OfficerID,ChangeTo)
SELECT
a.ActivityID, a.OfficerID, a.ChangeTo
FROM
tbl_Officers_Activity as a
ORDER BY
OfficerID,
ChangeDate;
-- housekeeping
DROP TABLE IF EXISTS tmp_act1;
DROP TABLE IF EXISTS tmp_act2;
CREATE TEMPORARY TABLE IF NOT EXISTS tmp_act1 AS (SELECT * FROM tmp_act);
CREATE TEMPORARY TABLE IF NOT EXISTS tmp_act2 AS (SELECT * FROM tmp_act);
SELECT
a.AutoID, a.ActivityID, a.OfficerID, a.ChangeTo
FROM
tmp_act1 as a
INNER JOIN tmp_act2 AS b
ON a.OfficerID = b.OfficerID
AND a.ChangeTo = b.ChangeTo
AND a.AutoID = b.AutoID+1;
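The same sequence-then-compare idea can be sketched in a few lines. This assumes SQLite (via Python's sqlite3) in place of MySQL and invents ActivityID values 1 to 8 for the sample changelog. SQLite lets a temporary table be joined to itself, so the sketch skips the two cloned copies; MySQL documents that a TEMPORARY table cannot be referred to more than once in the same query, which is presumably why the procedure above clones it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute(
    """CREATE TABLE tbl_Officers_Activity (
           ActivityID INTEGER PRIMARY KEY, OfficerID INTEGER,
           ChangeTo TEXT, ChangeDate TEXT)"""
)
# ActivityID values are hypothetical; the rest is the sample changelog.
conn.executemany(
    "INSERT INTO tbl_Officers_Activity VALUES (?, ?, ?, ?)",
    [
        (1, 1, "active", "2017-05-01"), (2, 1, "active", "2017-05-02"),
        (3, 1, "inactive", "2017-05-04"), (4, 6, "active", "2013-09-09"),
        (5, 6, "inactive", "2016-04-14"), (6, 6, "recruit", "2016-06-22"),
        (7, 6, "active", "2016-06-23"), (8, 6, "inactive", "2017-04-30"),
    ],
)

# Number the rows in (OfficerID, ChangeDate) order with an auto-increment key.
conn.execute(
    """CREATE TEMP TABLE tmp_act (
           AutoID INTEGER PRIMARY KEY AUTOINCREMENT,
           ActivityID INTEGER, OfficerID INTEGER, ChangeTo TEXT)"""
)
conn.execute(
    """INSERT INTO tmp_act (ActivityID, OfficerID, ChangeTo)
       SELECT ActivityID, OfficerID, ChangeTo
       FROM tbl_Officers_Activity
       ORDER BY OfficerID, ChangeDate"""
)

# A row is redundant when the row just before it (same officer) has the same status.
redundant = [row[0] for row in conn.execute(
    """SELECT a.ActivityID
       FROM tmp_act a
       JOIN tmp_act b
         ON a.OfficerID = b.OfficerID
        AND a.ChangeTo = b.ChangeTo
        AND a.AutoID = b.AutoID + 1
       ORDER BY a.ActivityID"""
)]

print(redundant)  # [2]
```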

SQL: Can't understand how to select from my tables

I need help with a data extraction. I'm an SQL noob and I think I have a serious issue with my data design skills. The DB system is MySQL running on Linux.
Table A is structured like this one:
TYPE SUBTYPE ID
-------------------
xyz aaa 0001
xyz aab 0001
xyz aac 0001
xyz aad 0001
xyz aaa 0002
xyz aaj 0002
xyz aac 0002
xyz aav 0002
Table B is:
TYPE1 SUBTYPE1 TYPE2 SUBTYPE2
-------------------------------------
xyz aaa xyz aab
xyz aac xyz aad
Looking at whole table A, I need to extract all rows where both type and subtype are present as columns in a single table B row. Of course this condition is never met since A.subtype can't be at same time equal to B.subtype1 AND B.subtype2 ...
In the example the result set for id should be:
xyz aaa 0001
xyz aab 0001
xyz aac 0001
xyz aad 0001
I'm trying to use a join with 2 AND conditions, but of course I get an empty set.
EDIT:
@Barmar, thank you for your support. It seems that I'm really near the final solution. Just to keep things clear, I opened this thread with a shortened and simplified data structure, just to highlight the point where I was stuck.
I thought about your solution, and it is acceptable to have both results on a single row. Now I need to reduce execution time.
The first join takes about 2 minutes to complete and produces around 23 million rows. The second join (with table B) is probably taking longer.
In fact, I need 3 hours to get the final set of 10 million rows. How can we improve things a bit? I noticed that the MySQL engine is not threaded, and the query is only using a single CPU. I indexed all fields used by the joins, but I'm not sure it's the right thing to do, since I'm not a DBA.
I also suppose that relying on VARCHAR comparison for such a big join is not the best solution. I should probably rewrite things using numerical IDs, which should be faster.
Splitting things into different queries would probably help parallelism. Thanks for any feedback.
You can join Table A with itself to find all combinations of types and subtypes with the same ID, then compare them with the values in Table B.
SELECT t1.type AS type1, t1.subtype AS subtype1, t2.type AS type2, t2.subtype AS subtype2, t1.id
FROM TableA AS t1
JOIN TableA AS t2 ON t1.id = t2.id AND NOT (t1.type = t2.type AND t1.subtype = t2.subtype)
JOIN TableB AS b ON t1.type = b.type1 AND t1.subtype = b.subtype1 AND t2.type = b.type2 AND t2.subtype = b.subtype2
This returns the two rows from Table A as a single row in the result, rather than as separate rows. I hope that's OK. If you need to split them up, you can move this into a subquery and join it back with the original table A to return each row.
SELECT a.*
FROM TableA AS a
JOIN (the above query) AS x
ON a.id = x.id AND
((a.type = x.type1 AND a.subtype = x.subtype1)
OR
(a.type = x.type2 AND a.subtype = x.subtype2))
DEMO
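Putting the pair-finding join and the join back to Table A together, the following sketch runs both steps on the sample data from the question. SQLite (via Python's sqlite3) standing in for MySQL is an assumption, and the alias x for the inner query is only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute("CREATE TABLE TableA (type TEXT, subtype TEXT, id TEXT)")
conn.execute(
    "CREATE TABLE TableB (type1 TEXT, subtype1 TEXT, type2 TEXT, subtype2 TEXT)"
)
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [
        ("xyz", "aaa", "0001"), ("xyz", "aab", "0001"),
        ("xyz", "aac", "0001"), ("xyz", "aad", "0001"),
        ("xyz", "aaa", "0002"), ("xyz", "aaj", "0002"),
        ("xyz", "aac", "0002"), ("xyz", "aav", "0002"),
    ],
)
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?, ?, ?)",
    [("xyz", "aaa", "xyz", "aab"), ("xyz", "aac", "xyz", "aad")],
)

matched = conn.execute(
    """
    SELECT DISTINCT a.type, a.subtype, a.id
    FROM TableA AS a
    JOIN (
        -- pairs of Table A rows with the same id that form a Table B row
        SELECT t1.type AS type1, t1.subtype AS subtype1,
               t2.type AS type2, t2.subtype AS subtype2, t1.id
        FROM TableA AS t1
        JOIN TableA AS t2
          ON t1.id = t2.id
         AND NOT (t1.type = t2.type AND t1.subtype = t2.subtype)
        JOIN TableB AS b
          ON t1.type = b.type1 AND t1.subtype = b.subtype1
         AND t2.type = b.type2 AND t2.subtype = b.subtype2
    ) AS x
      ON a.id = x.id
     AND ((a.type = x.type1 AND a.subtype = x.subtype1)
          OR (a.type = x.type2 AND a.subtype = x.subtype2))
    ORDER BY a.id, a.subtype
    """
).fetchall()

for row in matched:
    print(row)  # the four rows of id 0001; id 0002 has no complete pair
```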
You can use EXISTS:
SELECT a.*
FROM TableA a
WHERE EXISTS(
SELECT 1
FROM TableB b
WHERE
(b.Type1 = a.Type AND b.SubType1 = a.SubType)
OR (b.Type2 = a.Type AND b.SubType2 = a.SubType)
)
AND a.ID = '0001'
ONLINE DEMO
You can use a JOIN like this:
Select A.Type, A.SubType, A.ID from a_table A JOIN b_table B
ON (A.Type = B.Type1 AND A.SubType = B.SubType1) OR
(A.Type = B.Type2 AND A.SubType = B.SubType2)
But I think there is a problem in your design: the same values appear in Table A with different IDs, and there is no condition on ID.
Instead of storing Type and SubType in Table B, you could store the unique ID of each Table A record in Table B; then you can think about better ways to get the results you want.
Edit :
With UNION of two joins you can get that result :
Select A.Type, A.SubType, A.ID from A_table A
JOIN b_table B1 ON A.Type = B1.Type1 AND A.SubType = B1.SubType1
WHERE (B1.Type2, B1.SubType2) IN (SELECT Type, SubType FROM A_table) AND ID = '0001'
UNION
Select A.Type, A.SubType, A.ID from A_table A
JOIN b_table B2 ON A.Type = B2.Type2 AND A.SubType = B2.SubType2
WHERE (B2.Type1, B2.SubType1) IN (SELECT Type, SubType FROM A_table) AND ID = '0001'
But as I said, I think there is a design problem. It seems better for each type and subtype to have a unique ID in Table A and to work with this ID in Table B.

How to left join or inner join a table itself

I have this data in a table, for instance,
id name parent parent_id
1 add self 100
2 manage null 100
3 add 10 200
4 manage null 200
5 add 20 300
6 manage null 300
How can I left join or inner join this table itself so I get this result below?
id name parent
2 manage self
4 manage 10
6 manage 20
As you can see, I just want to query the rows with the keyword 'manage', but I want the value of the parent column from the matching 'add' row to appear in the 'manage' row of the result.
Is it possible?
EDIT:
the simplified version of my actual table - system,
system_id parent_id type function_name name main_parent make_accessible sort
31 30 left main Main NULL 0 1
32 31 left page_main_add Add self 0 1
33 31 left page_main_manage Manage NULL 0 2
my actual query and it is quite messy already...
SELECT
a.system_id,
a.main_parent,
b.name,
b.make_accessible,
b.sort
FROM system AS a
INNER JOIN -- self --
(
SELECT system_id, name, make_accessible, sort
FROM system AS s2
LEFT JOIN -- search --
(
SELECT system_id AS parent_id
FROM system AS s1
WHERE s1.function_name = 'page'
) AS s1
ON s1.parent_id = s2.parent_id
WHERE s2.parent_id = s1.parent_id
AND s2.system_id != s1.parent_id
ORDER BY s2.sort ASC
) b
ON b.system_id = a.parent_id
WHERE a.function_name LIKE '%manage%'
ORDER BY b.sort ASC
result I get currently,
system_id main_parent name make_accessible sort
33 NULL Main 0 1
but I am after this,
system_id main_parent name make_accessible sort
33 self Main 0 1
You just need to reference the table twice:
select t1.id, t1.name, t2.id, t2.name
from TableA t1
inner join TableA t2
on t1.parent_id = t2.Id
Replace inner with left join if you want to see roots in the list.
UPDATE:
I misread your question. It seems to me that you always have two rows, manage one and add one. To get to "Add" from manage:
select system.*, (select parent
from system s2
where s2.parent_id = system.parent_id
and s2.name = 'add')
AS parent
from system
where name = 'manage'
Or, you might split the table into two derived tables and join them by parent_id:
select *
from system
inner join
(
select * from system where name = 'add'
) s2
on system.parent_id = s2.parent_id
where system.name = 'manage'
This will allow you to use all the columns from s2.
Your data does not abide by a child-parent hierarchical structure. For example, your column parent holds the value 10, which is not the value of any id, so a child-parent association is not possible.
In other words, there's nothing that relates the record 2,manage,null to the record 1,add,self, or the record 4,manage,null to 3,add,10, as you intend to do in your query.
To represent hierarchical data, you usually need a table that has a foreign key referencing its own primary key. So your column parent must reference the column id; then you can express a child-parent relationship between manage and add. Currently, that's not possible.
UPDATED: Joining by parent_id, try:
select m.id, m.name, a.parent
from myTable m
join myTable a on m.parent_id = a.parent_id and a.name = 'add'
where m.name = 'manage'
Change the inner join to a left join if there may not be a corresponding add row.
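The join on parent_id can be verified against the sample rows with the sketch below, which assumes SQLite (via Python's sqlite3) in place of MySQL. Row 7 is hypothetical: a 'manage' row without an 'add' sibling, added to show what the left join changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.executescript("""
    CREATE TABLE myTable (id INTEGER, name TEXT, parent TEXT, parent_id INTEGER);
    -- Rows 1 to 6 come from the question; row 7 is hypothetical.
    INSERT INTO myTable VALUES
        (1, 'add', 'self', 100), (2, 'manage', NULL, 100),
        (3, 'add', '10', 200),   (4, 'manage', NULL, 200),
        (5, 'add', '20', 300),   (6, 'manage', NULL, 300),
        (7, 'manage', NULL, 400);
""")

QUERY = """
    SELECT m.id, m.name, a.parent
    FROM myTable m
    {join} myTable a ON m.parent_id = a.parent_id AND a.name = 'add'
    WHERE m.name = 'manage'
    ORDER BY m.id
"""

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # [(2, 'manage', 'self'), (4, 'manage', '10'), (6, 'manage', '20')]
print(left_rows)   # same three rows plus (7, 'manage', None)
```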

Mysql scenario - Get all tasks even if there is no entry?

I have three tables
Tasks with columns Taskid, Taskname
TaskAllocations with columns Taskid, EmpNum
TaskEntries with columns TaskId, EmpNum, WorkedDate, Hoursspent
Now I want to get all the task entries for a particular week. My problem is that even if there is no task entry for a particular task, I should get at least one row with that TaskId and TaskName, with Hoursspent as NULL, in the query's result set. I have been trying to get this with the query below.
SELECT A.TaskId,
B.TaskName,
SUM( C.HoursSpent ) as TotalHours ,
C.WorkedDate, C.Comments
FROM TaskAllocations A
LEFT OUTER JOIN TaskEntries C
ON A.TaskId = C.TaskId
AND A.EmpNum = C.EmpNum
INNER JOIN Tasks B
ON A.TaskId = B.TaskId
WHERE A.EmpNum =123456
AND C.WorkedDate
IN ('2010-01-17','2010-01-18','2010-01-19',
'2010-01-20','2010-01-21','2010-01-22','2010-01-23' )
GROUP BY A.TaskId, C.WorkedDate
ORDER BY A.TaskId, C.WorkedDate ASC;
With this SQL, I get a row for a task only if there is an entry for that particular task id. What I want is at least one row for each and every task that is allocated to an EmpNum. Getting one row for each TaskId and WorkedDate combination is also fine. Please help me with this. The actual intention is to build a two-dimensional HTML table with each task entry against date and task, as shown below.
---------------------------------------------------------
TaskId TaskName Sun Mon Tue Wed Thu Fri Sat
---------------------------------------------------------
18 name1 2 3 4:30 3:30
19 name2
20 name3 4 2:30
22 name4 2:30
23 name5
24 name6 1:30 6
---------------------------------------------------------
So that this can be updated by the user for each year week. First I thought of group_concat but because of performance I am using normal group by query.
Note: for a particular taskid and workeddate there will be only one entry of hoursspent.
I have almost built the frontend. Please help me to get all task ids as above even if there is no entry. Do I need to use a subquery?
Don't use an inner join; use a left or right join, depending on which values from which tables you want.
So:
SELECT *
FROM tasks t
LEFT JOIN taskentries te
ON t.TaskId = te.TaskId
which is the same statement as:
SELECT *
FROM taskentries te
RIGHT JOIN tasks t
ON te.TaskId = t.TaskId
will get you all tasks, even if there is no task entry.
An inner join only selects rows when there are matching rows in both tables. A left join selects all rows from the left (first) table and the matching rows from the other table (if there is no such row, all of its columns will be NULL). A right join does the opposite: it selects all rows from the right (second) table and the matching rows from the left.
a LEFT JOIN b is the same as b RIGHT JOIN a
After rigorous testing of different options I came up with the below solution which will give the required results for me.
SELECT Final.TaskId,
Final.TaskName,
Tmp.HoursSpent AS TotalHours,
Tmp.WorkedDate
FROM (
SELECT A.TaskId, B.TaskName, A.EmpNum
FROM TaskAllocations A
INNER JOIN
Tasks B
ON ( A.TaskId = B.TaskId )
WHERE a.empnum = "333"
)Final
LEFT OUTER JOIN (
SELECT New.TaskId, New.EmpNum, New.WorkedDate, New.HoursSpent
FROM TaskEntries New
WHERE (New.WorkedDate
IN
('2010-01-17','2010-01-18','2010-01-19',
'2010-01-20','2010-01-21','2010-01-22','2010-01-23' )
OR New.WorkedDate IS NULL)
AND New.EmpNum = "333"
)Tmp
ON Tmp.TaskId = Final.TaskId
AND Tmp.EmpNum = Final.EmpNum
ORDER BY Final.TaskId, Tmp.WorkedDate ASC ;
My first query in the question did not work because I was putting a condition on the right table's column in the WHERE clause while doing a LEFT OUTER JOIN, which discards the rows where that column is NULL. Thanks to all for the support.
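The difference between filtering the right-hand table in the WHERE clause and filtering it in the ON clause can be seen in the sketch below. It assumes SQLite (via Python's sqlite3) in place of MySQL, uses made-up tasks and entries, and replaces the IN list with an equivalent BETWEEN on ISO-formatted dates for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.executescript("""
    CREATE TABLE Tasks (TaskId INTEGER, TaskName TEXT);
    CREATE TABLE TaskAllocations (TaskId INTEGER, EmpNum INTEGER);
    CREATE TABLE TaskEntries (TaskId INTEGER, EmpNum INTEGER,
                              WorkedDate TEXT, HoursSpent REAL);
    -- Hypothetical data: task 19 only has an entry outside the requested week.
    INSERT INTO Tasks VALUES (18, 'name1'), (19, 'name2');
    INSERT INTO TaskAllocations VALUES (18, 123456), (19, 123456);
    INSERT INTO TaskEntries VALUES
        (18, 123456, '2010-01-18', 2.0),
        (19, 123456, '2010-01-10', 3.0);
""")

BASE = """
    SELECT A.TaskId, B.TaskName, C.WorkedDate, C.HoursSpent
    FROM TaskAllocations A
    INNER JOIN Tasks B ON A.TaskId = B.TaskId
    LEFT OUTER JOIN TaskEntries C
      ON A.TaskId = C.TaskId AND A.EmpNum = C.EmpNum {on_filter}
    WHERE A.EmpNum = 123456 {where_filter}
    ORDER BY A.TaskId, C.WorkedDate
"""
WEEK = "C.WorkedDate BETWEEN '2010-01-17' AND '2010-01-23'"

# Date filter in WHERE: rows without a matching entry in the week are discarded.
filtered_in_where = conn.execute(
    BASE.format(on_filter="", where_filter="AND " + WEEK)
).fetchall()

# Date filter in ON: every allocated task survives, with NULLs when nothing matches.
filtered_in_on = conn.execute(
    BASE.format(on_filter="AND " + WEEK, where_filter="")
).fetchall()

print(filtered_in_where)  # [(18, 'name1', '2010-01-18', 2.0)]
print(filtered_in_on)     # [(18, 'name1', '2010-01-18', 2.0), (19, 'name2', None, None)]
```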