Remove duplicate entries from the original table when joining - MySQL

I've got two tables:
addresses (adress) and a log of the files I've sent (send).
For a given file, I want to get all addresses and whether or not they received the file.
What I've got so far is this:
SELECT *
, CASE
WHEN send.fileid = 1 THEN 1
ELSE NULL
END as file1
FROM send
RIGHT OUTER JOIN `adress`
ON `send`.adressid = `adress`.`id`
The problem is that when an address has received two different files, it gets listed twice. How can I alter the statement to get around this?
Example data
adress:
1 Adrian
2 Christian
3 Max
4 Alex
file:
1 music
2 video
3 document
send (the rows marked "-" are the ones for file 2):
adress:1 file:1
adress:1 file:2 -
adress:3 file:1
adress:4 file:2 -
adress:4 file:3
When I look up file 2, I want to see:
X Adrian
X Alex
Christian
Max
TL;DR: I want every address listed once, with either the specific file ID or NULL.
Thanks in advance.

One way of going about it is to put the condition in a subquery and let the outer join do all the heavy lifting:
SELECT a.*, s.fileid
FROM adress a
LEFT JOIN (SELECT fileid, adressid
FROM send
WHERE fileid = 1) s ON s.adressid = a.id

Surely you can do this just by using GROUP BY?
I put together a quick example, then realised this is MySQL. I think the fundamental approach is the same, but the example won't work as-is, because it uses SQL Server syntax:
DECLARE @address TABLE (address_id INT, address_name VARCHAR(50));
INSERT INTO @address SELECT 1, 'Adrian';
INSERT INTO @address SELECT 2, 'Christian';
INSERT INTO @address SELECT 3, 'Max';
INSERT INTO @address SELECT 4, 'Alex';
DECLARE @file TABLE ([file_id] INT, [file_name] VARCHAR(50));
INSERT INTO @file SELECT 1, 'music';
INSERT INTO @file SELECT 2, 'video';
INSERT INTO @file SELECT 3, 'document';
DECLARE @send TABLE (address_id INT, [file_id] INT);
INSERT INTO @send SELECT 1, 1;
INSERT INTO @send SELECT 1, 2;
INSERT INTO @send SELECT 3, 1;
INSERT INTO @send SELECT 4, 2;
INSERT INTO @send SELECT 4, 3;
SELECT
a.address_id,
a.address_name,
MAX(CASE WHEN f.[file_id] = 1 THEN 'X' END) AS file_1,
MAX(CASE WHEN f.[file_id] = 2 THEN 'X' END) AS file_2,
MAX(CASE WHEN f.[file_id] = 3 THEN 'X' END) AS file_3
FROM
@address a
LEFT JOIN @send s ON s.address_id = a.address_id
LEFT JOIN @file f ON f.[file_id] = s.[file_id]
GROUP BY
a.address_id,
a.address_name
ORDER BY
a.address_id;
This gives a matrix of addresses and files:
address_id | address_name | file_1 | file_2 | file_3
---------- | ------------ | ------ | ------ | ------
1 | Adrian | X | X | NULL
2 | Christian | NULL | NULL | NULL
3 | Max | X | NULL | NULL
4 | Alex | NULL | X | X
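Because the example above uses SQL Server syntax, here is a rough equivalent of the same conditional-aggregation pivot, run against SQLite through Python's built-in sqlite3 module so it can be executed directly. The MAX(CASE ...) query itself uses only standard SQL, so it should also carry over to MySQL, though that is not tested here. Table and column names follow the question (adress, send.adressid, send.fileid); the adress.name column is an assumption, since the question only shows the values. The join to the file table is dropped because send.fileid already carries the ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adress (id INTEGER PRIMARY KEY, name TEXT);  -- "name" is an assumed column
    CREATE TABLE send (adressid INTEGER, fileid INTEGER);
    INSERT INTO adress VALUES (1, 'Adrian'), (2, 'Christian'), (3, 'Max'), (4, 'Alex');
    INSERT INTO send VALUES (1, 1), (1, 2), (3, 1), (4, 2), (4, 3);
""")

# The LEFT JOIN can yield several rows per address; GROUP BY collapses them
# into one, and MAX(CASE ...) keeps 'X' if any of those rows matched the file.
matrix = conn.execute("""
    SELECT a.id,
           a.name,
           MAX(CASE WHEN s.fileid = 1 THEN 'X' END) AS file_1,
           MAX(CASE WHEN s.fileid = 2 THEN 'X' END) AS file_2,
           MAX(CASE WHEN s.fileid = 3 THEN 'X' END) AS file_3
    FROM adress AS a
    LEFT JOIN send AS s ON s.adressid = a.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

for row in matrix:
    print(row)
```

GROUP BY is what removes the duplicates the question complains about: each address yields exactly one row, however many files it was sent.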

SELECT *
FROM adress
LEFT JOIN send ON send.adressid = adress.id
AND send.fileid = 1
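That seems to be it: with the fileid condition in the ON clause rather than in a WHERE clause, every address is returned once, with NULLs where the file was not sent (assuming each file is sent to a given address at most once).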
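To see why putting the condition in the ON clause fixes the duplicates, here is a small sketch that runs the same join against SQLite via Python's sqlite3 module with the question's sample data. The adress.name column and the helper addresses_with_file are hypothetical; only the join pattern comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE adress (id INTEGER PRIMARY KEY, name TEXT);  -- "name" is assumed
    CREATE TABLE send (adressid INTEGER, fileid INTEGER);
    INSERT INTO adress VALUES (1, 'Adrian'), (2, 'Christian'), (3, 'Max'), (4, 'Alex');
    INSERT INTO send VALUES (1, 1), (1, 2), (3, 1), (4, 2), (4, 3);
""")

def addresses_with_file(conn, file_id):
    """Every address once, with file_id if that file was sent to it, else None."""
    # The fileid filter is part of the join condition, so addresses without a
    # matching send row are kept (with NULL) instead of being filtered out.
    sql = """
        SELECT adress.id, adress.name, send.fileid
        FROM adress
        LEFT JOIN send
          ON send.adressid = adress.id
         AND send.fileid = ?
        ORDER BY adress.id
    """
    return conn.execute(sql, (file_id,)).fetchall()

for row in addresses_with_file(conn, 2):
    print(row)
```

If `send.fileid = ?` were moved into a WHERE clause instead, the NULL rows produced by the outer join would be filtered out, and Christian and Max would disappear from the result.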

Related

SQL query to find potential duplicate records

I am working with employer data to find duplicate employers based on their names.
The data looks like this:
Employer ID | Legal Name | Operating Name
------------- | ---------------| --------------------
1 | AA | AA
2 | BB | AA
3 | CC | BB
4 | DD | DD
5 | ZZ | ZZ
Now, if I search for all duplicates of employer AA, the query should return the following result:
Employer ID | Legal Name | Operating Name
------------- | ---------------| --------------------
1 | AA | AA
2 | BB | AA
3 | CC | BB
Employer 1's legal name and employer 2's operating name match the search directly.
The catch is employer 3, which is not directly related to the search string, but employer 2's legal name matches employer 3's operating name.
I need the search results up to the nth level. I am not sure whether that can be achieved with a recursive query or something like that.
Please help.
I tried to achieve this with a recursive CTE, but then I realized that it goes into infinite recursion. Here is the code:
DECLARE @SearchName VARCHAR(50)
SET @SearchName = 'AA'
;With CTE_EmployerNames
AS
(
-- Anchor Member definition
select *
from [dbo].[Name_Table]
where Leg_Name = @SearchName
OR Op_Name = @SearchName
UNION ALL
-- Recursive Member definition
select N.*
from [dbo].[Name_Table] N
JOIN CTE_EmployerNames C
ON N.ID <> C.ID
AND (N.Leg_Name = C.Leg_Name
OR N.Leg_Name = C.Op_Name
OR N.Op_Name = C.Leg_Name
OR N.Op_Name = C.Op_Name)
)
select *
from CTE_EmployerNames
Update:
I created a stored procedure to achieve what I want, but it is a bit slow because of the loop and the cursor. For now it solves my problem at some cost in execution time. Any suggestion to optimize it, or another way to do this, will be highly appreciated. Thanks, guys. Here is the code:
CREATE PROCEDURE [dbo].[Get_Similar_Name_Employers]
@P_BaseName VARCHAR(100)
AS
BEGIN
DECLARE @ID INT
DECLARE @Leg_Name VARCHAR(50)
DECLARE @Op_Name VARCHAR(50)
-- Create a temp table to hold data temporarily
CREATE TABLE #Temp_Employers
(
[ID] [int] NULL,
[Leg_Name] [varchar](50) NULL,
[Op_Name] [varchar](50) NULL,
[Status] [bit] null -- To keep track of whether the record has been processed
)
-- Insert all records which directly match the search criteria
INSERT INTO #Temp_Employers
SELECT NT.ID, NT.Leg_Name, NT.Op_Name, 0
FROM dbo.Name_Table NT
WHERE NT.Leg_Name = @P_BaseName
OR NT.Op_Name = @P_BaseName
while EXISTS (SELECT 1 from #Temp_Employers where Status = 0) -- until all rows are processed
BEGIN
DECLARE @EmployerCursor CURSOR
SET @EmployerCursor = CURSOR FAST_FORWARD
FOR
SELECT ID, Leg_Name, Op_Name
from #Temp_Employers
where Status = 0
OPEN @EmployerCursor
FETCH NEXT
FROM @EmployerCursor
INTO @ID, @Leg_Name, @Op_Name
WHILE @@FETCH_STATUS = 0
BEGIN
-- For every unprocessed record in the temp table, check whether there is any possible duplicate
-- and insert all possible duplicate records into the same table so that their own duplicates are found later
INSERT INTO #Temp_Employers
select ID, Leg_Name, Op_Name, 0
from dbo.Name_Table
WHERE (Leg_Name = @Leg_Name
OR Op_Name = @Op_Name
OR Leg_Name = @Op_Name
OR Op_Name = @Leg_Name)
AND ID NOT IN ( select ID
FROM #Temp_Employers)
-- Update the status of the just-processed record so it is not processed again
UPDATE #Temp_Employers
SET Status = 1
WHERE ID = @ID
FETCH NEXT
FROM @EmployerCursor
INTO @ID, @Leg_Name, @Op_Name
END
-- Close the cursor and release its resources
CLOSE @EmployerCursor
DEALLOCATE @EmployerCursor
END
select ID,
Leg_Name,
Op_Name
from #Temp_Employers
Order By ID
DROP TABLE #Temp_Employers
END
You are basically trying to build a directed graph in which the nodes are names, and you want to find all the names that lead to your employer.
There is an introductory tutorial, "Oracle Tip: Solving directed graph problems with SQL, part 1", and a related Stack Overflow question, "Directed graph SQL".
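The graph traversal described above can be sketched as a recursive CTE that terminates even on cyclic data. This is a minimal sketch in SQLite, driven from Python's sqlite3 module, assuming the Name_Table columns from the question; the helper related_employers is hypothetical. It relies on SQLite-specific behaviour: when the anchor and recursive parts are combined with UNION rather than UNION ALL, SQLite discards rows it has already produced, so revisiting an employer adds nothing and the recursion stops. SQL Server only accepts UNION ALL in that position, which is why the question's CTE loops forever; one common workaround there is to carry a path of visited IDs and exclude them explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Name_Table (ID INTEGER PRIMARY KEY, Leg_Name TEXT, Op_Name TEXT);
    INSERT INTO Name_Table VALUES
        (1, 'AA', 'AA'), (2, 'BB', 'AA'), (3, 'CC', 'BB'), (4, 'DD', 'DD'), (5, 'ZZ', 'ZZ');
""")

def related_employers(conn, search_name):
    """Every employer reachable from search_name through shared legal/operating names."""
    sql = """
        WITH RECURSIVE related(ID, Leg_Name, Op_Name) AS (
            -- Anchor: employers whose legal or operating name matches directly
            SELECT ID, Leg_Name, Op_Name
            FROM Name_Table
            WHERE Leg_Name = :name OR Op_Name = :name
            -- UNION, not UNION ALL: SQLite skips rows it has already produced,
            -- so following a cycle such as 1 -> 2 -> 1 adds nothing and stops
            UNION
            SELECT n.ID, n.Leg_Name, n.Op_Name
            FROM Name_Table AS n
            JOIN related AS r
              ON n.ID <> r.ID
             AND (n.Leg_Name IN (r.Leg_Name, r.Op_Name)
                  OR n.Op_Name IN (r.Leg_Name, r.Op_Name))
        )
        SELECT ID, Leg_Name, Op_Name FROM related ORDER BY ID
    """
    return conn.execute(sql, {"name": search_name}).fetchall()

print(related_employers(conn, "AA"))
```

Because deduplication is done on whole rows, this walks to any depth (the "nth level") without the cursor loop, though on large tables indexes on Leg_Name and Op_Name would probably matter.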
You can do this with two self-joins. I used DISTINCT to be safe; you don't need it for your example, but you probably will for your actual data:
SELECT DISTINCT T2.EMPID, T2.LEGAL_NAME, T.LEGAL_NAME
FROM EMPLOYERS T
INNER JOIN EMPLOYERS T2 ON T.LEGAL_NAME = T2.OPERATING_NAME
INNER JOIN EMPLOYERS T3 ON T2.OPERATING_NAME = T3.OPERATING_NAME
WHERE T.LEGAL_NAME <> T3.LEGAL_NAME
EMPLOYERS is a placeholder table name (TABLE itself is a reserved word); rename and alias tables and columns as you like.
SQL Fiddle Example
Edit: if you also want records whose operating name simply differs from the legal name, UNION those in:
SELECT DISTINCT T2.EMPID, T2.LEGAL_NAME, T.LEGAL_NAME
FROM EMPLOYERS T
INNER JOIN EMPLOYERS T2 ON T.LEGAL_NAME = T2.OPERATING_NAME
INNER JOIN EMPLOYERS T3 ON T2.OPERATING_NAME = T3.OPERATING_NAME
WHERE T.LEGAL_NAME <> T3.LEGAL_NAME
UNION
SELECT EMPID, LEGAL_NAME, OPERATING_NAME
FROM EMPLOYERS
WHERE LEGAL_NAME <> OPERATING_NAME
SQL Fiddle Example 2
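As a quick check, here is the first two-self-join query run against the question's sample data in SQLite via Python's sqlite3 module. EMPLOYERS, EMPID, LEGAL_NAME and OPERATING_NAME are placeholder names (not the asker's real schema), and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYERS (EMPID INTEGER PRIMARY KEY, LEGAL_NAME TEXT, OPERATING_NAME TEXT);
    INSERT INTO EMPLOYERS VALUES
        (1, 'AA', 'AA'), (2, 'BB', 'AA'), (3, 'CC', 'BB'), (4, 'DD', 'DD'), (5, 'ZZ', 'ZZ');
""")

# T -> T2: T2 operates under T's legal name; T2 -> T3: T3 shares T2's
# operating name but has a different legal name from T.
rows = conn.execute("""
    SELECT DISTINCT T2.EMPID, T2.LEGAL_NAME, T.LEGAL_NAME
    FROM EMPLOYERS AS T
    INNER JOIN EMPLOYERS AS T2 ON T.LEGAL_NAME = T2.OPERATING_NAME
    INNER JOIN EMPLOYERS AS T3 ON T2.OPERATING_NAME = T3.OPERATING_NAME
    WHERE T.LEGAL_NAME <> T3.LEGAL_NAME
    ORDER BY T2.EMPID
""").fetchall()
print(rows)
```

Each extra level of indirection needs another self-join, so this fixed-depth query covers the sample data but does not by itself reach an arbitrary nth level.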

Check whether a particular order of names is present in my table

I have the following stops table. How can I check whether the stop names GHI, JKL, MNO appear in that order in my stops table?
stops table:
CREATE TABLE IF NOT EXISTS stops
(
stop_id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
name varchar(30) NOT NULL,
lat double(10,6),
longi double(10,6)
);
Sample:
1 ABC
2 DEF
3 GHI
4 JKL
5 MNO
6 PQR
7 SDU
8 VWX
This query returns 1 when 'GHI', 'JKL', 'MNO' appear in that order on consecutive stop_id values:
SELECT 1
FROM stops s1
JOIN stops s2 ON s1.stop_id = s2.stop_id - 1
JOIN stops s3 ON s2.stop_id = s3.stop_id - 1
WHERE s1.name = 'GHI'
AND s2.name = 'JKL'
AND s3.name = 'MNO'
SQL Fiddle Demo
This is a variation of the well-known "find equal sets" task.
You need to insert the searched route into a table with a sequenced stop_id:
create table my_stops(stop_id INT NOT NULL,
name varchar(30) NOT NULL);
insert into my_stops (stop_id, name)
values (1, 'GHI'),(2, 'JKL'),(3, 'MNO');
Then you join and calculate the difference between the two sequences. The number itself is meaningless, but it is the same for every row of a consecutive run:
select s.*, s.stop_id - ms.stop_id
from stops as s join my_stops as ms
on s.name = ms.name
order by s.stop_id;
Now group by that meaningless number and search for a count equal to the number of searched steps:
select min(s.stop_id), max(s.stop_id)
from stops as s join my_stops as ms
on s.name = ms.name
group by s.stop_id - ms.stop_id
having count(*) = (select count(*) from my_stops)
See Fiddle
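The gaps-and-islands idea above can be wrapped in a small helper. This is a sketch using SQLite from Python's sqlite3 module; find_route is a hypothetical name, the stops table is trimmed to the two columns the query needs, and my_stops is a temporary table refilled on every call. Like the answer, it assumes stop_id values in stops are consecutive along the route.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stops (stop_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO stops (name) VALUES
        ('ABC'), ('DEF'), ('GHI'), ('JKL'), ('MNO'), ('PQR'), ('SDU'), ('VWX');
""")

def find_route(conn, route):
    """Return (first stop_id, last stop_id) for each run of stops matching route in order."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS my_stops "
                 "(stop_id INTEGER NOT NULL, name TEXT NOT NULL)")
    conn.execute("DELETE FROM my_stops")
    # Number the searched names 1, 2, 3, ... as the sequenced my_stops table
    conn.executemany("INSERT INTO my_stops (stop_id, name) VALUES (?, ?)",
                     list(enumerate(route, start=1)))
    # Consecutive matches share the same stop_id difference; a full match has
    # as many rows in its group as there are searched names.
    return conn.execute("""
        SELECT MIN(s.stop_id), MAX(s.stop_id)
        FROM stops AS s
        JOIN my_stops AS ms ON s.name = ms.name
        GROUP BY s.stop_id - ms.stop_id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM my_stops)
        ORDER BY 1
    """).fetchall()

print(find_route(conn, ["GHI", "JKL", "MNO"]))
```

An empty list means the names do not occur consecutively in that order.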
Another alternative:
select 1
from stops x
where x.name = 'GHI'
and (select GROUP_CONCAT(name order by y.stop_id)
from stops y where y.stop_id between x.stop_id + 1
and x.stop_id + 2
) = 'JKL,MNO';

SQLite "split" row into multiple rows

Is there a way to "split" one row into multiple rows?
My problem is that I have a table of edges, where patternID is the ID of the edge and sourceStationID and targetStationID are the IDs of the stations that the edge connects:
Patterns:
patternID | sourceStationID | targetStationID
1|1|2
2|1|6
3|1|3
4|1|4
5|4|6
6|5|6
I also have a table of hubs where I can transfer:
Hubs:
hubID
4
5
From this data, I need the patternIDs that connect stations 1 -> 6 via exactly one hub. So the result of the query should be:
4
5
I did that by joining the patterns table with the hubs table and then with the patterns table again, so I get:
patternID | sourceStationID | targetStationID | patternID | sourceStationID | targetStationID
4 | 1 | 4 | 5 | 4 | 6
How can I split this row into two rows?
Edit:
Here is the code that I use so far:
select t2a.patternID from
(
select * from `patterns`
join `hubs` on `targetStationID` = `hubID`
where `sourceStationID` = 1
) as t1a
join
(
select * from `patterns`
join `hubs` on `sourceStationID` = `hubID`
where `targetStationID` = 6
) as t2a
on t1a.hubID = t2a.hubID
union
select t1b.patternID from
(
select * from `patterns`
join `hubs` on `targetStationID` = `hubID`
where `sourceStationID` = 1
) as t1b
join
(
select * from `patterns`
join `hubs` on `sourceStationID` = `hubID`
where `targetStationID` = 6
) as t2b
on t1b.hubID = t2b.hubID;
It's working but I'm using the same select twice.
Updating my answer to hopefully be closer to what you wanted. As far as I understand your question, this seems to give what you requested. The view could be avoided, with its join done in the CTE instead:
C:\Users\DDevienne>sqlite3
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table edges (id integer primary key autoincrement, beg int, end int);
sqlite> insert into edges (beg, end) values (1, 2), (1, 6), (1, 3), (1, 4), (4, 6), (5, 6);
sqlite> create view edges2 as
...> select start.id as beg_id, start.beg as beg, start.end as hub, finish.id as end_id, finish.end as end
...> from edges start, edges finish
...> where start.end = finish.beg;
sqlite> .headers on
sqlite> select * from edges2;
beg_id|beg|hub|end_id|end
4|1|4|5|6
sqlite> create table hubs (id integer primary key);
sqlite> insert into hubs values (4), (5);
sqlite> with hub_edges(beg_id, beg, hub, end_id, end) as (
...> select edges2.* from edges2, hubs where edges2.hub = hubs.id
...> )
...> select beg_id as id, beg, hub as end from hub_edges
...> union all
...> select end_id as id, hub as beg, end from hub_edges;
id|beg|end
4|1|4
5|4|6
sqlite>
Warning: this requires a recent SQLite with CTE support (it was tested with 3.8.4.3, the latest version at the time of writing) and won't work with 3.8.3.x.
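The answer notes that the view could be avoided by doing the join in the CTE. Here is a sketch of that, using the question's own patterns and hubs tables in SQLite via Python's sqlite3 module: the two-leg join is written once in the CTE and read twice, which also removes the duplicated subquery the asker was unhappy with. The helper name patterns_via_one_hub is hypothetical, and UNION (rather than the answer's UNION ALL) is used so that a pattern shared by several paths is listed only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patterns (patternID INTEGER PRIMARY KEY,
                           sourceStationID INTEGER, targetStationID INTEGER);
    CREATE TABLE hubs (hubID INTEGER PRIMARY KEY);
    INSERT INTO patterns VALUES (1, 1, 2), (2, 1, 6), (3, 1, 3), (4, 1, 4), (5, 4, 6), (6, 5, 6);
    INSERT INTO hubs VALUES (4), (5);
""")

def patterns_via_one_hub(conn, source, target):
    """Pattern IDs on any source -> hub -> target path with exactly one hub."""
    sql = """
        WITH legs AS (
            -- one row per two-leg path: source -> hub -> target
            SELECT p1.patternID AS first_leg, p2.patternID AS second_leg
            FROM patterns AS p1
            JOIN hubs AS h ON p1.targetStationID = h.hubID
            JOIN patterns AS p2 ON p2.sourceStationID = h.hubID
            WHERE p1.sourceStationID = :source
              AND p2.targetStationID = :target
        )
        -- "split" each path row into one row per leg
        SELECT first_leg FROM legs
        UNION
        SELECT second_leg FROM legs
        ORDER BY 1
    """
    return [row[0] for row in conn.execute(sql, {"source": source, "target": target})]

print(patterns_via_one_hub(conn, 1, 6))
```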

MySQL: Find all posts between certain users

I have two tables:
table message (holds the creator of a message and the message)
id - creatorId - msg
and a table message_viewers (tells who can read the message msgId)
msgId - userId
If I create a message as user 1 and send it to user 2 and user 3, the tables will look like this:
tbl_message:
1 - 1 - 'message'
tbl_message_viewers:
1 - 2
1 - 3
What I want to do is to fetch the messages that are between the users x1...xN (any number of users) AND ONLY the messages between them.
(For example, if the users are 1, 2 and 3, I want the messages whose creator is 1, 2 or 3 and whose viewers are 2 and 3 when the creator is 1, 1 and 3 when the creator is 2, and 1 and 2 when the creator is 3.)
I am not interested in messages between only 1 and 2, 2 and 3, or 1 and 3, but only in messages among all three people.
I tried different approaches, such as joining the two tables on the message ID, selecting the messages where creatorId IN (X, Y) and then keeping only the rows where userId IN (X, Y) as well. Maybe the solution involves grouping and counting the rows, but I could not figure out a way of doing this that works.
EDIT: SQL Fiddle here
http://sqlfiddle.com/#!2/963c0/1
I think this might do what you want:
SELECT m.*
FROM message m
INNER JOIN message_viewers mv ON m.id = mv.msgId
WHERE m.creatorId IN (1, 2, 3)
AND mv.userId IN (1, 2, 3)
AND NOT EXISTS (
SELECT 1
FROM message_viewers mv2
WHERE mv2.msgId = mv.msgId
AND mv2.userId NOT IN (1, 2, 3)
)
AND mv.userId != m.creatorId;
The IN clauses restrict the creator and the viewers to the given users, and mv.userId != m.creatorId excludes the creator from the message_viewers rows (as in your requirements).
Edit:
With the requirement that messages must be exchanged only among those three IDs, I came up with the following:
SELECT m.id,m.creatorId,m.msg
FROM message m
INNER JOIN message_viewers mv ON m.id = mv.msgId
WHERE m.creatorId IN (1, 2, 3)
AND mv.userId IN (1, 2, 3)
AND mv.userId != m.creatorId
AND NOT EXISTS (
SELECT 1
FROM message_viewers mv2
WHERE mv2.msgId = mv.msgId
AND mv2.userId NOT IN (1, 2, 3)
)
GROUP BY 1,2,3
HAVING COUNT(*) = 2;
sqlfiddle demo
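The HAVING COUNT(*) = 2 above is specific to three users; in general it becomes N - 1, since every participant except the creator must be a viewer. Below is a hedged sketch of the same query generalized to any list of distinct user IDs, run on SQLite via Python's sqlite3 module so the IN lists can be built from placeholders. The helper messages_among and the sample rows are made up for illustration, and the text column is called msg as in the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (id INTEGER PRIMARY KEY, creatorId INTEGER, msg TEXT);
    CREATE TABLE message_viewers (msgId INTEGER, userId INTEGER);
    INSERT INTO message VALUES
        (1, 1, 'from 1 to 2 and 3'), (2, 1, 'from 1 to 2 only'),
        (3, 2, 'from 2 to 1, 3 and 4'), (4, 3, 'from 3 to 1 and 2');
    INSERT INTO message_viewers VALUES
        (1, 2), (1, 3), (2, 2), (3, 1), (3, 3), (3, 4), (4, 1), (4, 2);
""")

def messages_among(conn, users):
    """Messages whose creator plus viewers are exactly `users` (distinct IDs)."""
    marks = ",".join("?" * len(users))  # e.g. "?,?,?" for three users
    sql = f"""
        SELECT m.id, m.creatorId, m.msg
        FROM message AS m
        JOIN message_viewers AS mv ON mv.msgId = m.id
        WHERE m.creatorId IN ({marks})
          AND mv.userId IN ({marks})
          AND mv.userId <> m.creatorId
          AND NOT EXISTS (SELECT 1
                          FROM message_viewers AS mv2
                          WHERE mv2.msgId = m.id
                            AND mv2.userId NOT IN ({marks}))
        GROUP BY m.id, m.creatorId, m.msg
        -- every participant except the creator must be a viewer
        HAVING COUNT(DISTINCT mv.userId) = ?
        ORDER BY m.id
    """
    params = [*users, *users, *users, len(users) - 1]
    return conn.execute(sql, params).fetchall()

print(messages_among(conn, [1, 2, 3]))
```

Building the placeholder string from the list keeps the user IDs as bound parameters rather than pasting them into the SQL text.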
Try this, using a JOIN and IN() clauses:
SELECT * FROM
tbl_message m
JOIN tbl_message_viewers mv ON m.id = mv.msgId
WHERE m.creatorId IN(1,2,3) AND mv.userId IN(1,2,3)
Sounds like you might want the BETWEEN operator:
SELECT * FROM tablename WHERE fieldname BETWEEN 1 AND 10;
-- returns fieldname 1-10
Note, however, that BETWEEN is inclusive, so you'll need to exclude the endpoints explicitly as well:
SELECT * FROM tablename WHERE fieldname BETWEEN 1 AND 10 AND fieldname NOT IN (1, 10)
-- returns fieldname 2-9
http://www.w3schools.com/sql/sql_between.asp
This worked on Oracle.
The first join gets the row count for people we are not interested in.
The second join gets the row count for people we are interested in.
The IN clauses will need to be generated with some sort of dynamic SQL, and :number_of_people also needs to be supplied somehow.
select tm.id, count_1, count_2
from message tm
left join ( select ty.msgId as ty_msgId,
count(ty.msgId) as count_1
from message_viewers ty
where ty.userId not in (:a,:b,:c)
group by ty.msgId)
on tm.id = ty_msgId
join (select tz.msgId as tz_msgId,
count(tz.msgId) as count_2
from message_viewers tz
where tz.userId in (:a,:b,:c)
group by tz.msgId)
on tm.id = tz_msgId
where tm.creatorId in(:a,:b,:c)
and (count_1 = 0 or count_1 is null)
and count_2 = :number_of_people -1;
MySQL requires every derived table to have an alias, so there it looks like this:
select tm.id, count_1, count_2
from message tm
left join ( select ty.msgId as ty_msgId,
count(ty.msgId) as count_1
from message_viewers ty
where ty.userId not in (:a,:b,:c)
group by ty.msgId) as X
on tm.id = ty_msgId
left join (select tz.msgId as tz_msgId,
count(tz.msgId) as count_2
from message_viewers tz
where tz.userId in (:a,:b,:c)
group by tz.msgId) as Y
on tm.id = tz_msgId
where tm.creatorId in(:a,:b,:c)
and (count_1 = 0 or count_1 is null)
and count_2 = :number_of_people -1;

MySQL - loop through rows

I have the following query:
select count(*)
from (select Annotations.user_id
from Annotations, Users
where Users.gender = 'Female'
and Users.user_id = Annotations.user_id
and image_id = 1
group by Annotations.user_id
having sum(case when stem = 'taxi' then 1 else 0 end) > 0 and
sum(case when stem = 'zebra crossing' then 1 else 0 end) > 0
) Annotations
It counts how many female users gave both the stems 'taxi' and 'zebra crossing' for image 1.
Sample data
user id, image id, stem
1 1 image
1 1 taxi
1 1 zebra crossing
2 1 person
2 1 zebra crossing
2 1 taxi
3 1 person
3 1 zebra crossing
Expected result (or similar)
stem1, stem2, count
taxi , zebra crossing 2
person, zebra crossing 2
However, as there are over 2000 stems, I cannot specify them all.
How would I go about looping through the stems for image_id = 1 and gender = 'Female' instead of specifying each stem string?
Thank you
As I understand it, you need to fetch female users who have two or more stems.
Update: it seems you need to display the users who have a stem that another user also used, so I have updated the query accordingly:
SELECT
distinct a.user_id,
group_concat(DISTINCT a.stem ORDER BY a.stem)
FROM
Annotations a
JOIN Users u ON ( a.user_id = u.user_id AND u.gender = 'Female' )
JOIN
(
SELECT
b.user_id,
b.stem
FROM
Annotations b
) AS b ON ( a.user_id <> b.user_id AND b.stem = a.stem )
WHERE
a.image_id = 1
GROUP BY
a.user_id
UPDATE: As I understand it, you want to select all combinations of 2 stems, and get a count of how many users have that combination of stems. Here is my solution:
SELECT stem1, stem2, count(*) as count FROM
(
SELECT a.user_id,a.image_id,a.stem as stem1,b.stem as stem2
FROM Annotations a JOIN Annotations b
ON a.user_id=b.user_id && b.image_id=a.image_id && a.stem!=b.stem
JOIN Users ON Users.user_id = a.user_id
WHERE Users.gender = "Female" AND a.image_id = 1
) as stems GROUP BY stem1, stem2 having count > 1;
The caveat here is that it returns two rows for each combination of stems (the second occurrence has the stems in reverse order).
Here's my attempt to solve your problem:
SELECT COUNT(*) AS Count, a1.stem AS Stem1, a2.Stem AS Stem2
FROM Annotations AS a1
INNER JOIN Annotations AS a2 ON a1.user_id = a2.user_id AND a1.image_id = a2.image_id
AND a1.stem < a2.stem
WHERE a1.image_id = 1
GROUP BY a1.stem, a2.Stem
HAVING COUNT(*) > 1;
Note that this version filters on image_id but does not include the gender logic (there is no join to Users).
Please see my SQL Fiddle here: http://sqlfiddle.com/#!2/4ee69/33
Based on the following data (copied from yours) I get the result posted underneath it.
CREATE TABLE Annotations
(`user_id` int, `image_id` int, `stem` varchar(14))
;
INSERT INTO Annotations
(`user_id`, `image_id`, `stem`)
VALUES
(1, 1, 'image'),
(1, 1, 'taxi'),
(1, 1, 'zebra crossing'),
(2, 1, 'person'),
(2, 1, 'zebra crossing'),
(2, 1, 'taxi'),
(3, 1, 'person'),
(3, 1, 'zebra crossing')
;
COUNT STEM1 STEM2
2 person zebra crossing
2 taxi zebra crossing
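For reference, here is a sketch that combines the a1.stem < a2.stem pairing (one row per unordered pair) with the gender filter from the question, run on SQLite via Python's sqlite3 module. The Users rows and their genders are hypothetical, since the sample data does not include them; user 4 is an extra male user added only to show the filter working. The helper stem_pairs is also hypothetical, and COUNT(DISTINCT a1.user_id) counts users rather than rows, which only makes a difference if a user annotated the same stem twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, gender TEXT);
    CREATE TABLE Annotations (user_id INTEGER, image_id INTEGER, stem TEXT);
    -- Hypothetical genders: the question's sample data does not include them
    INSERT INTO Users VALUES (1, 'Female'), (2, 'Female'), (3, 'Female'), (4, 'Male');
    INSERT INTO Annotations VALUES
        (1, 1, 'image'), (1, 1, 'taxi'), (1, 1, 'zebra crossing'),
        (2, 1, 'person'), (2, 1, 'zebra crossing'), (2, 1, 'taxi'),
        (3, 1, 'person'), (3, 1, 'zebra crossing'),
        (4, 1, 'taxi'), (4, 1, 'zebra crossing');
""")

def stem_pairs(conn, image_id, gender, min_users=2):
    """(stem1, stem2, users) for stem pairs given together by at least min_users users."""
    sql = """
        SELECT a1.stem AS stem1, a2.stem AS stem2, COUNT(DISTINCT a1.user_id) AS users
        FROM Annotations AS a1
        JOIN Annotations AS a2
          ON a1.user_id = a2.user_id
         AND a1.image_id = a2.image_id
         AND a1.stem < a2.stem              -- each unordered pair only once
        JOIN Users AS u ON u.user_id = a1.user_id
        WHERE a1.image_id = ? AND u.gender = ?
        GROUP BY a1.stem, a2.stem
        HAVING COUNT(DISTINCT a1.user_id) >= ?
        ORDER BY stem1, stem2
    """
    return conn.execute(sql, (image_id, gender, min_users)).fetchall()

print(stem_pairs(conn, 1, "Female"))
```

No stem has to be named in the query: the self-join generates every pair a user gave, and GROUP BY counts them, which is the "loop" over the 2000+ stems that the question asks for.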