Let's say I have the following table called Person, where Id is the primary key:
+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | anne#example.com |
| 2  | cat#example.com  |
| 3  | anne#example.com |
+----+------------------+
I'm trying to delete all duplicate occurrences except the first, so in this case the desired output would be:
+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | anne#example.com |
| 2  | cat#example.com  |
+----+------------------+
After asking a friend, I found this solution works:
DELETE t1 FROM Person t1 INNER JOIN Person t2
WHERE t1.Email = t2.Email AND t1.Id > t2.Id;
My question is: why does this work? In particular, when t1 is inner joined with t2 on the Email field, how does the database know which row of anne#example.com should be matched with which, given that this value occurs several times with different Ids?
Consider this SELECT statement, which filters only on equality of the email columns:
SELECT t1.*, t2.*
FROM Person t1
INNER JOIN Person t2
WHERE t1.Email=t2.Email
ORDER BY t1.Id, t2.Id;
It returns (1,1), (1,3), (3,1), (3,3) as the (t1.Id, t2.Id) pairs for the email anne#example.com, and only (2,2) for cat#example.com. Then, if you add the other filter AND t1.Id > t2.Id,
SELECT t1.*, t2.*
FROM Person t1
INNER JOIN Person t2
WHERE t1.Email=t2.Email
AND t1.id > t2.id
ORDER BY t1.Id, t2.Id;
then you'll have only one tuple, (3,1), since t1.Id > t2.Id is satisfied only for this pair of ids. If you replace SELECT t1.*, t2.* with DELETE t1 (and remove the ORDER BY part as well), you'll delete the row with id = 3 and keep the rows with ids 1 and 2. Conversely, if you replace SELECT t1.*, t2.* with DELETE t2, you'll delete id = 1 and keep the rows with ids 2 and 3.
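The walkthrough above can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL; because SQLite has no multi-table DELETE t1 FROM ... JOIN syntax, the final step is rewritten as a correlated EXISTS that removes the same t1 rows (those with a same-email partner that has a smaller Id).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, so the multi-table DELETE is
# rewritten as a correlated EXISTS that targets the same t1 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany(
    "INSERT INTO Person (Id, Email) VALUES (?, ?)",
    [(1, "anne#example.com"), (2, "cat#example.com"), (3, "anne#example.com")],
)

# Every (t1.Id, t2.Id) pair produced by the email-only self-join.
all_pairs = conn.execute(
    """SELECT t1.Id, t2.Id
       FROM Person t1 JOIN Person t2 ON t1.Email = t2.Email
       ORDER BY t1.Id, t2.Id"""
).fetchall()

# Adding t1.Id > t2.Id keeps only t1 rows that have a smaller-Id twin.
filtered_pairs = conn.execute(
    """SELECT t1.Id, t2.Id
       FROM Person t1 JOIN Person t2
         ON t1.Email = t2.Email AND t1.Id > t2.Id
       ORDER BY t1.Id, t2.Id"""
).fetchall()

# Delete exactly the t1 side of those pairs. Inside the subquery, Person
# refers to the outer row because the inner table is aliased as t2.
conn.execute(
    """DELETE FROM Person
       WHERE EXISTS (SELECT 1 FROM Person t2
                     WHERE t2.Email = Person.Email AND Person.Id > t2.Id)"""
)
remaining = conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()

print(all_pairs)       # [(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)]
print(filtered_pairs)  # [(3, 1)]
print(remaining)       # [(1, 'anne#example.com'), (2, 'cat#example.com')]
```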
First, this is more commonly written using aggregation:
DELETE p
FROM Person p INNER JOIN
(SELECT p2.email, MIN(p2.id) as min_id
FROM Person p2
GROUP BY p2.email
) p2
ON p.email = p2.email and p.id > p2.min_id;
Why does your version work? Well, it works because a join not only matches data but also filters it.
So, the condition
t1.Email = t2.Email and t1.Id > t2.Id
says that for each record in t1, find matching records in t2 with the same email where t1.id > t2.id. That is, find records in t1 that have a matching record with a smaller id.
All records have this property -- except for one for each email. That would be the record with the smallest id.
I do not recommend this method for identifying the smallest record, because the join multiplies the number of records. If one email has five records, then the record with the largest id matches the other four, so it is selected for deletion four times. MySQL needs to figure out what to do when you say to delete a single record four times. (It does the right thing, of course, but there is extra work.)
The aggregation method doesn't have any issues like this.
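A sketch of the row multiplication and of the aggregation approach, assuming SQLite instead of MySQL and hypothetical data with five rows sharing one email. SQLite lacks DELETE ... JOIN, so the MIN(id)-per-email comparison is expressed as a correlated scalar subquery rather than the derived-table join shown above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the data below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
rows = [(i, "dup#example.com") for i in range(1, 6)] + [(6, "solo#example.com")]
conn.executemany("INSERT INTO Person (Id, Email) VALUES (?, ?)", rows)

# How many times each row is matched by the self-join with t1.Id > t2.Id.
match_counts = dict(conn.execute(
    """SELECT t1.Id, COUNT(*)
       FROM Person t1 JOIN Person t2
         ON t1.Email = t2.Email AND t1.Id > t2.Id
       GROUP BY t1.Id
       ORDER BY t1.Id"""
).fetchall())

# Aggregation version: each row is compared against one MIN(Id) per email.
conn.execute(
    """DELETE FROM Person
       WHERE Id > (SELECT MIN(p2.Id) FROM Person p2
                   WHERE p2.Email = Person.Email)"""
)
remaining = conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()

print(match_counts)  # {2: 1, 3: 2, 4: 3, 5: 4}
print(remaining)     # [(1, 'dup#example.com'), (6, 'solo#example.com')]
```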
You compare the table with itself and look at every pair of rows whose email addresses are identical.
If the ids are the same, the pair is ignored.
If the ids are different and the id in t1 is bigger than the id in t2 (i.e., a row with the same email and a smaller id exists), the t1 row gets deleted.
Related
I've got two tables T1 and T2, both with a single field (id).
T1.id has values:
1
2
4
T2.id has values:
1
3
4
I need to join these tables.
Desired result:
T1 | T2
------|------
1 | 1
2 | null
null | 3
4 | 4
With JOIN I'd do it easily:
Query 1
SELECT * FROM T1 FULL JOIN T2 ON T1.id=T2.id
But for certain reasons I can't use JOIN here. So, with a simple query like this
Query 2
SELECT * FROM T1, T2 WHERE T1.id=T2.id
I would get only two rows of data
T1 | T2
------|------
1 | 1
4 | 4
as the other two rows would be omitted because they have no match in the other table.
It doesn't matter what the missing matches are filled with. It could be NULL or any other value, really anything, but I need to get those omitted rows.
Is there a way to modify Query 2 to get the desired result without using any JOIN?
PS: Real tables are different in structure, so UNION is not allowed either.
PPS: I've just given a model to point out the problem. In reality it's a "megaquery" involving many tables each having dozens of columns.
Standard way to implement FULL OUTER JOIN when only implicit joins are supported.
select t1.id t1id, t2.id t2id
from t1, t2 where t1.id = t2.id
union all
select id, null from t1
where not exists (select 1 from t2 where t2.id = t1.id)
union all
select null, id from t2
where not exists (select 1 from t1 where t1.id = t2.id)
order by coalesce(t1id, t2id)
The first SELECT produces the INNER JOIN part of the result.
The second SELECT adds the additional LEFT OUTER JOIN rows to the result.
The third SELECT adds the additional RIGHT OUTER JOIN rows to the result.
Altogether, a FULL OUTER JOIN is performed!
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ec154ad243efdff2162816205fdd42b5
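The three-branch query can be checked against the question's data with a sketch like the following, assuming SQLite in place of MySQL. SQLite only lets ORDER BY on a compound SELECT refer to result columns, so the UNION ALL is wrapped in an outer SELECT before ordering by COALESCE(t1id, t2id).

```python
import sqlite3

# Assumption: SQLite replaces MySQL. The three branches follow the answer;
# the outer SELECT exists only so the ORDER BY expression is accepted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER)")
conn.execute("CREATE TABLE t2 (id INTEGER)")
conn.executemany("INSERT INTO t1 (id) VALUES (?)", [(1,), (2,), (4,)])
conn.executemany("INSERT INTO t2 (id) VALUES (?)", [(1,), (3,), (4,)])

full_join = conn.execute(
    """SELECT t1id, t2id FROM (
           -- inner-join part: ids present in both tables (implicit join)
           SELECT t1.id AS t1id, t2.id AS t2id FROM t1, t2 WHERE t1.id = t2.id
           UNION ALL
           -- left-only part: t1 ids with no partner in t2
           SELECT id, NULL FROM t1
           WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)
           UNION ALL
           -- right-only part: t2 ids with no partner in t1
           SELECT NULL, id FROM t2
           WHERE NOT EXISTS (SELECT 1 FROM t1 WHERE t1.id = t2.id)
       )
       ORDER BY COALESCE(t1id, t2id)"""
).fetchall()

print(full_join)  # [(1, 1), (2, None), (None, 3), (4, 4)]
```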
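Alternatively, collect every id from both tables with UNION and then NATURAL LEFT JOIN each table to that list:
SELECT t1.id t1_id, t2.id t2_id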
FROM ( SELECT id FROM table1
UNION DISTINCT
SELECT id FROM table2 ) t0
NATURAL LEFT JOIN table1 t1
NATURAL LEFT JOIN table2 t2
I have two tables that look like this:
table 1
|id|value1|value2|value3
table 2
|id|value4|value5|value6
The id in each table is unique, but an id can appear in table 1 and not in table 2. (value1 is equal to value4, but if the id doesn't appear in table 2, value4 would be null...)
Then I have a list of ids, and I want to get something like this (supposing one id appears in table 1 but not in table 2, and another vice versa):
resultgrid
| id | value1| value2| value3|value4|value5|value6
|838383|result1|result2|result3|null |null |null
|548438|null |null |null |result4|result5|result6
Hope you guys can help me, thanks!
EDIT: the query I've been trying (it's actually pieced together from answers I've seen on Stack Overflow):
SELECT t1.*, t2.value4, t2.value5, t2.value6
FROM table1 as t1
JOIN table2 AS t2 ON t2.id = t1.id
Where t1.id = t2.id = 838383
This returns 0 rows.
I want to make it general so it works with the list of <2000 ids.
You want a full outer join, which MySQL does not support. In your case, you can emulate it with left joins:
select t1.*, t2.value4, t2.value5, t2.value6
from (select 838383 as id
) i left join
table1 t1
on t1.id = i.id left join
table2 t2
on t2.id = i.id;
The list of ids you want to keep goes in the i subquery.
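A sketch of the id-list approach, assuming SQLite in place of MySQL and reusing the ids and column names from the question (the resultN strings are placeholders). It selects i.id rather than t1.* so the id is still shown for rows that exist only in table2.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; ids and column names follow the
# question, the value strings are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value1 TEXT, value2 TEXT, value3 TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, value4 TEXT, value5 TEXT, value6 TEXT)")
conn.execute("INSERT INTO table1 VALUES (838383, 'result1', 'result2', 'result3')")
conn.execute("INSERT INTO table2 VALUES (548438, 'result4', 'result5', 'result6')")

# The id list drives the query; each table is LEFT JOINed to it, so an id
# missing from one table still yields a row with NULLs for that table.
rows = conn.execute(
    """SELECT i.id, t1.value1, t1.value2, t1.value3,
              t2.value4, t2.value5, t2.value6
       FROM (SELECT 838383 AS id UNION ALL SELECT 548438) i
       LEFT JOIN table1 t1 ON t1.id = i.id
       LEFT JOIN table2 t2 ON t2.id = i.id
       ORDER BY i.id DESC"""
).fetchall()

for row in rows:
    print(row)
# (838383, 'result1', 'result2', 'result3', None, None, None)
# (548438, None, None, None, 'result4', 'result5', 'result6')
```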
You can use two different SELECT queries, each using a LEFT JOIN between the two tables. In the first query, treat table1 as the leftmost table; in the second query, treat table2 as the leftmost table.
The first query keeps every row of table1, with NULLs in the table2 columns where there is no match. In the second query, use WHERE t1.id IS NULL to keep only the table2 rows that have no matching entry in table1, since the matched ones are already in the first result.
Use UNION to combine the result sets. Since the two queries can never return the same id, we can use UNION ALL.
Try the following:
SELECT t1.id, t1.value1, t1.value2, t1.value3,
t2.value4, t2.value5, t2.value6
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t2.id = t1.id
UNION ALL
SELECT t2.id, t1.value1, t1.value2, t1.value3,
t2.value4, t2.value5, t2.value6
FROM table2 AS t2
LEFT JOIN table1 AS t1 ON t1.id = t2.id
WHERE t1.id IS NULL
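A sketch of the two-query pattern, assuming SQLite and hypothetical data in which id 111 exists in both tables. The full version keeps every table1 row in the first SELECT and restricts only the second SELECT with IS NULL; adding WHERE t2.id IS NULL to the first SELECT as well leaves only the ids found in exactly one table, as the second result shows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value1 TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, value4 TEXT)")
# 111 is in both tables; the other ids are in only one.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(111, "a1"), (838383, "result1")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(111, "b4"), (548438, "result4")])

# Fixed SQL fragments (no user input), so plain string building is safe here.
LEFT_BRANCH = """SELECT t1.id, t1.value1, t2.value4
                 FROM table1 AS t1 LEFT JOIN table2 AS t2 ON t2.id = t1.id"""
RIGHT_BRANCH = """SELECT t2.id, t1.value1, t2.value4
                  FROM table2 AS t2 LEFT JOIN table1 AS t1 ON t1.id = t2.id
                  WHERE t1.id IS NULL"""

# Full outer join emulation: all of table1, plus table2 rows with no partner.
full = conn.execute(f"{LEFT_BRANCH} UNION ALL {RIGHT_BRANCH} ORDER BY 1").fetchall()

# Same query with the left branch also restricted to unmatched rows.
only_unmatched = conn.execute(
    f"{LEFT_BRANCH} WHERE t2.id IS NULL UNION ALL {RIGHT_BRANCH} ORDER BY 1"
).fetchall()

print(full)            # [(111, 'a1', 'b4'), (548438, None, 'result4'), (838383, 'result1', None)]
print(only_unmatched)  # [(548438, None, 'result4'), (838383, 'result1', None)]
```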
I have two tables:
Table 1 contains the User ID
Table 2 contains the user ID and other data I would like
The relationship is on the ID in both tables so what I would like to do is the following:
Pull all data from table 2 where a record exists in the id field in table 2 that matches an id in table 1.
Table 1 has other copies, so to speak, that are specific to other accounts, while table 2 contains all the ids for all the other tables. That is why (I think) I need a JOIN statement, but I'm open to suggestions.
Table 1:
id
123456
Table 2:
id | name | age
123456 | John | 23
651123 | Mary | 22
811561 | Sarah | 21
You can use a subquery:
SELECT *
FROM table2
WHERE ID IN (SELECT ID
FROM table1)
If you need fields from table1 as well then use an inner join like:
SELECT t1.*, t2.name, t2.age
FROM table1 t1 INNER JOIN table2 t2
ON t1.id = t2.id
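A sketch comparing the two queries on the question's sample data, assuming SQLite in place of MySQL. The second 123456 row in table1 is hypothetical and only there to show the difference: IN tests membership, so each table2 row comes back at most once, while INNER JOIN returns one row per matching pair.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table2 rows come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER)")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(123456, "John", 23), (651123, "Mary", 22), (811561, "Sarah", 21)])
# Hypothetical: table1 lists 123456 twice, e.g. once per account.
conn.executemany("INSERT INTO table1 (id) VALUES (?)", [(123456,), (123456,)])

# IN only tests membership, so each table2 row appears at most once.
via_in = conn.execute(
    "SELECT * FROM table2 WHERE id IN (SELECT id FROM table1)"
).fetchall()

# INNER JOIN returns one row per matching pair, so table1 duplicates repeat.
via_join = conn.execute(
    """SELECT t1.*, t2.name, t2.age
       FROM table1 t1 INNER JOIN table2 t2 ON t1.id = t2.id"""
).fetchall()

print(via_in)    # [(123456, 'John', 23)]
print(via_join)  # [(123456, 'John', 23), (123456, 'John', 23)]
```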
Your assumption is correct; you need a join:
Select * from table1 inner join table2 on table1.id = table2.id
The only question here is whether you want only the ids that appear in both tables (then use an inner join) or also the ids that appear only in the first table (then use a left join).
You should choose an inner join here, because table 2 always contains the records for table 1.
You can use an INNER JOIN here because there is a relation between both tables; the query should be:
SELECT t1.id, t2.name, t2.age
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.id
If I have two tables that I'm joining and I write the simplest query possible, like this:
SELECT *
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
There are a few records that have multiple rows per ID because they have multiple employers, so t1 looks like this:
ID     Name    Employer
12345  Jerry   Comedy Cellar
12345  Jerry   NBC
12348  Elaine  Pendant Publishing
12346  George  Real Estate
12346  George  Yankees
12346  George  NBC
12347  Kramer  Kramerica Industries
t2 is linked by the same IDs but holds some activities that I'd like to see, hence the SELECT * above. However, I don't want the rows where the Employer column is "NBC" to be returned; everything else is fine.
The only other thing that matters here is that t2 is smaller than t1, because t1 is everybody and t2 only has the people who did particular activities. So some of the t1 rows won't match anything in t2, but I would still like them to be returned, hence the LEFT JOIN.
If I write the query like this:
SELECT *
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
WHERE Employer <> "NBC"
Then it removes Jerry and George completely -- when really all I want is for the NBC row to not be returned, but to return any other rows that are associated with them.
How can I write the query, while joining t1 with t2, to return every row except the NBC ones? The ideal output would be all of the rows from t1, regardless of whether they match anything in t2, except for the rows with "NBC" as the employer. Basically, the ideal here is to return the joined data where it fits, but to remove the entire row for anybody with "NBC" as employer without removing their other rows.
The more I write about it, the more it seems like I should just run a query before my JOIN to delete all the rows in t1 that have "NBC" as their employer, and then run the normal query.
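The idea in the last paragraph does not require deleting anything: t1 can be filtered in a derived table before the LEFT JOIN. A sketch, assuming SQLite in place of MySQL, the t1 rows from the question, and a hypothetical t2 with an Activity column:

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; t2's Activity column is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (ID INTEGER, Name TEXT, Employer TEXT)")
conn.execute("CREATE TABLE t2 (ID INTEGER, Activity TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [
    (12345, "Jerry", "Comedy Cellar"), (12345, "Jerry", "NBC"),
    (12348, "Elaine", "Pendant Publishing"), (12346, "George", "Real Estate"),
    (12346, "George", "Yankees"), (12346, "George", "NBC"),
    (12347, "Kramer", "Kramerica Industries"),
])
# Only two people have activities, so the others get NULLs from the LEFT JOIN.
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(12345, "stand-up"), (12346, "golf")])

# Remove the NBC rows from t1 in a derived table, then LEFT JOIN as usual.
rows = conn.execute(
    """SELECT f.ID, f.Name, f.Employer, t2.Activity
       FROM (SELECT * FROM t1 WHERE Employer <> 'NBC') AS f
       LEFT JOIN t2 ON f.ID = t2.ID
       ORDER BY f.ID, f.Employer"""
).fetchall()

for row in rows:
    print(row)
```

Jerry and George keep their non-NBC rows, and Elaine and Kramer still appear with NULL activities. Note that Employer <> 'NBC' would also drop rows whose Employer is NULL.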
Basic subset filtering
You can filter either of the two merged (joined) subsets by extending the ON clause.
SELECT *
FROM t1
LEFT JOIN t2
ON t1.ID = t2.ID
AND t2.Employer != 'NBC'
If you get null values now, and you don't want them, you'd add:
WHERE t2.Employer IS NOT NULL
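A sketch of the difference between filtering in ON and in WHERE, assuming SQLite and, as in this answer's queries, an Employer column in t2; the rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; Employer lives in t2 here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (ID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE t2 (ID INTEGER, Employer TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(12345, "Jerry"), (12347, "Kramer")])
conn.executemany("INSERT INTO t2 VALUES (?, ?)",
                 [(12345, "Comedy Cellar"), (12345, "NBC")])

# Filter in ON: only t2 rows are filtered; every t1 row survives.
in_on = conn.execute(
    """SELECT t1.Name, t2.Employer FROM t1
       LEFT JOIN t2 ON t1.ID = t2.ID AND t2.Employer != 'NBC'
       ORDER BY t1.Name"""
).fetchall()

# Same filter in WHERE: Kramer's NULL-extended row fails the test (NULL != 'NBC'
# is NULL, not true) and is dropped.
in_where = conn.execute(
    """SELECT t1.Name, t2.Employer FROM t1
       LEFT JOIN t2 ON t1.ID = t2.ID
       WHERE t2.Employer != 'NBC'
       ORDER BY t1.Name"""
).fetchall()

print(in_on)     # [('Jerry', 'Comedy Cellar'), ('Kramer', None)]
print(in_where)  # [('Jerry', 'Comedy Cellar')]
```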
Extended logic:
SELECT *
FROM t1
LEFT JOIN t2
ON (t1.ID = t2.ID AND t2.Employer != 'NBC')
OR (t1.ID = t2.ID AND t2.Employer IS NULL)
Using UNION
Basically, JOIN is for horizontal linking and UNION does vertical linking of datasets.
It merges two result sets: the first without NBC, and the second (which is basically an OUTER JOIN) adds everyone in t1 who is not part of t2.
SELECT *
FROM t1
LEFT JOIN t2
ON t1.ID = t2.ID
AND t2.Employer != 'NBC'
UNION
SELECT *
FROM t1
LEFT JOIN t2
ON t1.ID = t2.ID
AND t2.Employer IS NULL
String manipulation in the resultset
If you just want to remove NBC as a string, here is a workaround:
SELECT
t1.*,
IF (t2.Employer = 'NBC', NULL, t2.Employer) AS Employer
FROM t1
LEFT JOIN t2
ON t1.id = t2.id
This basically replaces "NBC" with NULL.
I have two database tables, where entities of the first table may or may not have associated entries in the second table:
Table 1 Table 2
+-----+-----+ +-----+-------+-------+
| ID | ... | | ID | T1_ID | NAME |
+-----+-----+ +-----+-------+-------+
| 1 | ... | | 1 | 1 | p1 |
| 2 | ... | | 2 | 1 | p2 |
| 3 | ... | | 3 | 2 | p1 |
| 4 | ... | +-----+-------+-------+
+-----+-----+
I need to run the following queries:
1. Get all entities of Table_1 with a specific entry of Table_2. That's easy, a simple join will do.
2. Get all entities of Table_1 that don't have a specific entry of Table_2 associated. Not so easy, but I also managed to query this with a join.
3. Get all entities of Table_1 that have a specific entry (A) and don't have another specific entry (B) associated, i.e. get all entities of Table_1 that have an entity of Table_2 with name=p1 and don't have an entity of Table_2 with name=p2 associated.
Is it possible to accomplish the kind of query from (3) in a single sql-statement without a sub-query?
Get all entities of Table_1, which have a specific entry (A) and don't have another specific entry (B) associated, i.e. get all entities of Table_1 that have an entity of Table_2 with name=p1 and don't have an entity of Table_2 with name=p2 associated.
I'm having a bit of trouble understanding your criteria, but I think this is what you want:
SELECT *
FROM Table1 t1
JOIN Table2 t2 ON t1.ID = t2.t1_id
WHERE t2.name = 'p1'
AND NOT EXISTS(SELECT 'x' FROM Table2 t2_2 WHERE t1.ID = t2_2.t1_id AND t2_2.name = 'p2')
That will give you everything from Table1 that has a matching record in Table2 with name = 'p1' and DOESN'T have a matching record in Table2 with name = 'p2'. Is that what you need?
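A sketch running this query on the sample tables from the question, assuming SQLite in place of the original database:

```python
import sqlite3

# Assumption: SQLite stands in for the original database; rows are from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, t1_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO Table1 (ID) VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)",
                 [(1, 1, "p1"), (2, 1, "p2"), (3, 2, "p1")])

# The join finds parents with a 'p1' child; NOT EXISTS removes those with a 'p2' child.
rows = conn.execute(
    """SELECT t1.ID, t2.name
       FROM Table1 t1
       JOIN Table2 t2 ON t1.ID = t2.t1_id
       WHERE t2.name = 'p1'
         AND NOT EXISTS (SELECT 'x' FROM Table2 t2_2
                         WHERE t1.ID = t2_2.t1_id AND t2_2.name = 'p2')"""
).fetchall()

print(rows)  # [(2, 'p1')]
```

Entity 1 has both p1 and p2, so it is excluded; entity 2 has only p1.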
EDIT AGAIN:
I thought of a smarter way to do this that involves a static (non-correlated) subquery. This subquery will only be executed once, rather than once for every parent row in Table1. I didn't put this code through a query analyzer, but it should be significantly faster than the queries using EXISTS(...).
SELECT *
FROM Table1 t1
JOIN Table2 t2 ON t1.ID = t2.t1_id
WHERE t2.name = 'p1'
AND t1.id NOT IN(SELECT t1_id FROM Table2 WHERE name = 'p2')
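One caveat worth checking before switching to NOT IN: if Table2.t1_id can be NULL, NOT IN and NOT EXISTS are not equivalent. A sketch, assuming SQLite and a hypothetical p2 row whose t1_id is NULL:

```python
import sqlite3

# Assumption: SQLite stands in for the original database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, t1_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO Table1 (ID) VALUES (?)", [(1,), (2,), (3,), (4,)])
# The question's rows plus a hypothetical orphan 'p2' row whose t1_id is NULL.
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)",
                 [(1, 1, "p1"), (2, 1, "p2"), (3, 2, "p1"), (4, None, "p2")])

BASE = """SELECT t1.ID FROM Table1 t1
          JOIN Table2 t2 ON t1.ID = t2.t1_id
          WHERE t2.name = 'p1' AND """

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, not true, so entity 2 is lost.
not_in = conn.execute(
    BASE + "t1.ID NOT IN (SELECT t1_id FROM Table2 WHERE name = 'p2')"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs don't interfere.
not_exists = conn.execute(
    BASE + """NOT EXISTS (SELECT 1 FROM Table2 t2_2
                          WHERE t2_2.t1_id = t1.ID AND t2_2.name = 'p2')"""
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,)]
```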
You can use an EXISTS subquery (effectively the same as doing two joins).
SELECT * FROM Table_1 AS t1
WHERE EXISTS (SELECT * FROM Table_2 AS t2 WHERE t1.ID = t2.T1_ID AND t2.NAME = 'p1')
AND NOT EXISTS (SELECT * FROM Table_2 AS t2 WHERE t1.ID = t2.T1_ID AND t2.NAME = 'p2')
To get all occurrences where t2 matches t1.id but not some other field do
SELECT t1.id, t2.id FROM table2 t2
INNER JOIN table1 t1 ON (t2.t1_id = t1.id AND not(t2.fieldx <=> t1.fieldx))
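Note that this will also exclude rows where both fieldx are NULL (NULL <=> NULL is true), while rows where only one side is NULL are kept.
Substituting = for <=> does not bring the both-NULL rows back: NOT (NULL = x) is NULL, so with = every row where either fieldx is NULL is excluded.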
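A sketch of how the two comparisons treat NULLs, assuming SQLite, whose IS operator is a null-safe equality like MySQL's <=>; the pairs table and its values are hypothetical.

```python
import sqlite3

# Assumption: SQLite's IS plays the role of MySQL's <=>; data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO pairs VALUES (?, ?)",
                 [(1, 1), (1, 2), (None, None), (None, 3)])

# Null-safe: NULL IS NULL is true (row excluded), NULL IS 3 is false (row kept).
null_safe = conn.execute(
    "SELECT a, b FROM pairs WHERE NOT (a IS b) ORDER BY b"
).fetchall()

# Plain =: any comparison with NULL is NULL, so NOT (...) is NULL and the row is dropped.
plain_eq = conn.execute(
    "SELECT a, b FROM pairs WHERE NOT (a = b) ORDER BY b"
).fetchall()

print(null_safe)  # [(1, 2), (None, 3)]
print(plain_eq)   # [(1, 2)]
```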
To make the variation of solutions more complete:
SELECT t1.*
FROM Table_1 t1
INNER JOIN Table_2 it2 ON t1.ID = it2.T1_ID AND it2.NAME = 'p1'
LEFT JOIN Table_2 lt2 ON t1.ID = lt2.T1_ID AND lt2.NAME = 'p2'
WHERE lt2.ID IS NULL
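This version needs no subquery: the inner join requires a p1 entry, the left join looks for a p2 entry, and WHERE lt2.ID IS NULL keeps only the parents for which none was found. A sketch on the question's sample data, assuming SQLite:

```python
import sqlite3

# Assumption: SQLite stands in for the original database; rows are from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (ID INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Table_2 (ID INTEGER PRIMARY KEY, T1_ID INTEGER, NAME TEXT)")
conn.executemany("INSERT INTO Table_1 (ID) VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO Table_2 VALUES (?, ?, ?)",
                 [(1, 1, "p1"), (2, 1, "p2"), (3, 2, "p1")])

rows = conn.execute(
    """SELECT t1.ID
       FROM Table_1 t1
       -- must have a 'p1' child
       INNER JOIN Table_2 it2 ON t1.ID = it2.T1_ID AND it2.NAME = 'p1'
       -- look for a 'p2' child; NULL-extended when there is none
       LEFT JOIN Table_2 lt2 ON t1.ID = lt2.T1_ID AND lt2.NAME = 'p2'
       WHERE lt2.ID IS NULL"""
).fetchall()

print(rows)  # [(2,)]
```

If a parent can have several p1 rows, it appears once per p1 row; SELECT DISTINCT t1.* avoids that.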