Good evening,
I have two tables t1 and t2
In t1, I have two variables: ID (which uniquely identifies each row) and DOC (which can be shared by several IDs).
In t2, I have three variables: ID (which does not necessarily uniquely identify the rows here), AUTH, and TYPE. Each ID has a maximum of 1 distinct AUTH.
Sample data:
What I would like to do is to select the DOCs that have an ID with AUTH='EP', and that also have an ID with AUTH='US'. They could have additional IDs with other AUTH, but they have to have at least these two.
Thus, I would have a final table with DOC, ID, and AUTH (there should be at least 2 IDs per DOC, but there can be more if the DOC has additional AUTH values besides US and EP).
The desired results:
This should work:
SELECT DISTINCT (T1.ID), T1.DOC, T2.AUTH FROM T1
LEFT JOIN T2 ON T2.ID = T1.ID
WHERE T1.DOC IN( SELECT T1.DOC FROM T2
LEFT JOIN T1 ON T1.ID = T2.ID
WHERE T2.AUTH IN('EP','US')
GROUP BY T1.DOC HAVING COUNT(DISTINCT T2.AUTH) = 2) ;
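To check the GROUP BY / HAVING COUNT(DISTINCT ...) approach, here is a minimal sketch that runs the same query against SQLite through Python's sqlite3 module. The sample rows are hypothetical (the original sample data was not preserved), and SQLite stands in for whatever engine the question uses.

```python
import sqlite3

# Hypothetical sample data: doc A has EP, US and JP; doc B has only EP; doc C has EP and CN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, doc TEXT);
CREATE TABLE t2 (id INTEGER, auth TEXT, type TEXT);
INSERT INTO t1 VALUES (1,'A'),(2,'A'),(3,'A'),(4,'B'),(5,'B'),(6,'C'),(7,'C');
INSERT INTO t2 VALUES (1,'EP','x'),(2,'US','x'),(3,'JP','x'),
                      (4,'EP','x'),(5,'EP','y'),(6,'EP','x'),(7,'CN','x');
""")

query = """
SELECT DISTINCT t1.id, t1.doc, t2.auth
FROM t1
LEFT JOIN t2 ON t2.id = t1.id
WHERE t1.doc IN (
    SELECT t1.doc
    FROM t2
    LEFT JOIN t1 ON t1.id = t2.id
    WHERE t2.auth IN ('EP', 'US')
    GROUP BY t1.doc
    HAVING COUNT(DISTINCT t2.auth) = 2  -- both EP and US must be present
)
ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only doc A qualifies here, and all three of its IDs come back, including the one with the extra AUTH.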
If I understood correctly, the query would be something like this:
select t1.doc, t1.id, t2.auth from t1
left join t2 on t2.id = t1.id
where t1.doc in( select t1.doc from t2
left join t1 on t1.id = t2.id
where t2.auth in('EP','US') );
However, the result set is basically going to be the first sample data table, because ID 6 has AUTH = 'EP' and, consequently, ID 7 is included too since it has the same DOC as ID 6.
I have read here that MySQL processes ordering before applying limits. However, I receive different results when applying a LIMIT parameter in conjunction with a JOIN subquery. Here is my query:
SELECT
t1.id,
(t2.counts / c.matches)
FROM
table_one t1
JOIN
table_two t2 ON t1.id = t2.id
JOIN
(
SELECT
t1.id, COUNT(DISTINCT t1.id) AS matches
FROM
table_one t1
JOIN table_two t2 ON t1.id = t2.id
WHERE
t1.id IN (3390 , 3236, 148, 2811, 829, 137)
AND t2.value_one <= 30
AND t2.value_two < 2
GROUP BY t1.id
ORDER BY (t2.counts / matches)
LIMIT 0, 50 -- PROBLEM IS HERE (I think)
) c ON c.id = t1.id
ORDER BY (t2.counts / c.matches), t1.id;
Here is a rough description of what I think is happening:
The sub-query selects a bunch of ids from table_one that meet the criteria
These are ordered by (t2.counts / matches)
The top 50 (in ascending order) are fashioned into a table
This resulting table is then joined on the id column
Results are returned from the top level JOIN - without a GROUP BY clause this time. table_one is a reference table so this will return many rows with the same ID.
I appreciate that some of these joins don't make a lot of sense but I have stripped down my query for readability - it's normally quite chunky.
The problem is that when I include the LIMIT parameter, I get a different set of results and not just the top 50. What I want to do is get the top results from the subquery and use these to join onto a bunch of other tables based on the reference table.
Here is what I have tried so far:
LIMIT on the outer query (this is undesirable as this cuts off important information).
Trying different LIMIT tables and values.
Any idea what is going wrong, or what else I could try?
I have found a solution to my problem. It seems that my matches column alias can't be used in my ORDER BY clause, which is weird since I don't get an error. Either way, this solves the problem:
SELECT
t1.id,
(t2.counts / c.matches)
FROM
table_one t1
JOIN
table_two t2 ON t1.id = t2.id
JOIN
(
SELECT
t1.id, COUNT(DISTINCT t1.id) AS matches
FROM
table_one t1
JOIN table_two t2 ON t1.id = t2.id
WHERE
t1.id IN (3390 , 3236, 148, 2811, 829, 137)
AND t2.value_one <= 30
AND t2.value_two < 2
GROUP BY t1.id
ORDER BY (t2.counts / COUNT(DISTINCT t1.id)) -- This line is changed
LIMIT 0, 50
) c ON c.id = t1.id
ORDER BY (t2.counts / c.matches), t1.id;
If I have two tables that I'm joining and I write the most simple query possible like this:
SELECT *
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
There are a few records who have multiple rows per ID because they have multiple employers, so t1 looks like this:
ID Name Employer
12345 Jerry Comedy Cellar
12345 Jerry NBC
12348 Elaine Pendant Publishing
12346 George Real Estate
12346 George Yankees
12346 George NBC
12347 Kramer Kramerica Industries
t2 is keyed by the same IDs and holds some activities that I'd like to see, hence the SELECT * above. However, I don't want rows to be returned where the Employer column is "NBC"; everything else is fine.
The only other thing that matters here is that t2 is smaller than t1, because t1 is everybody and t2 are only from people who did particular activities -- so some of the matches won't return anything from t2, but I would still like them to be returned, hence the LEFT JOIN.
If I write the query like this:
SELECT *
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
WHERE Employer <> "NBC"
Then it removes Jerry and George completely -- when really all I want is for the NBC row to not be returned, but to return any other rows that are associated with them.
How can I write the query while joining t1 with t2 to return each row except for the NBC ones? The ideal output would be all of the rows from t1 regardless if they match up with all of t2 except removing all of the rows with "NBC" as the employer in the return file. Basically the ideal here is to return the JOINs where they fit, but regardless remove the entire row for anybody with "NBC" as employer without removing their other rows.
The more I write about it, it seems like I should potentially just run a query prior to my JOIN to delete all the rows in t1 who have "NBC" as their employer and then run the normal query.
Basic subset filtering
You can filter either of the two merged (joined) subsets by extending the ON clause.
SELECT *
FROM t1
LEFT JOIN t2
ON t1.ID = t2.ID
AND t2.Employer != 'NBC'
If you get null values now, and you don't want them, you'd add:
WHERE t2.Employer IS NOT NULL
Extended logic:
SELECT *
FROM t1
LEFT JOIN t2
ON (t1.ID = t2.ID AND t2.Employer != 'NBC')
OR (t1.ID = t2.ID AND t2.Employer IS NULL)
Using UNION
Basically, JOIN is for horizontal linking and UNION does vertical linking of datasets.
This merges two result sets: the first without NBC, and the second (which is basically an OUTER JOIN) adds everyone in t1 who is not part of t2.
SELECT *
FROM t1
LEFT JOIN t2
ON t1.ID = t2.ID
AND t2.Employer != 'NBC'
UNION
SELECT *
FROM t1
LEFT JOIN t2
ON t1.ID = t2.ID
AND t2.Employer IS NULL
String manipulation in the resultset
If you just want to remove NBC as a string, here is a workaround:
SELECT
t1.*,
IF (t2.Employer = 'NBC', NULL, t2.Employer) AS Employer
FROM t1
LEFT JOIN t2
ON t1.id = t2.id
This basically replaces "NBC" with NULL.
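In the question's sample data, Employer is a column of t1 rather than t2. Under that assumption, a filter on the left-hand table can sit in the WHERE clause without discarding people who have no match in t2. The sketch below checks this with SQLite through Python's sqlite3 module; the activity column and its values are hypothetical, since the question does not show t2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER, name TEXT, employer TEXT);
CREATE TABLE t2 (id INTEGER, activity TEXT);  -- 'activity' is a hypothetical column
INSERT INTO t1 VALUES
  (12345,'Jerry','Comedy Cellar'), (12345,'Jerry','NBC'),
  (12348,'Elaine','Pendant Publishing'),
  (12346,'George','Real Estate'), (12346,'George','Yankees'), (12346,'George','NBC'),
  (12347,'Kramer','Kramerica Industries');
INSERT INTO t2 VALUES (12345,'standup'), (12346,'napping');
""")

# The filter is on t1 (the preserved side of the LEFT JOIN), so unmatched
# t1 rows still survive with NULLs for the t2 columns.
rows = conn.execute("""
SELECT t1.id, t1.name, t1.employer, t2.activity
FROM t1
LEFT JOIN t2 ON t1.id = t2.id
WHERE t1.employer <> 'NBC'
ORDER BY t1.id, t1.employer
""").fetchall()
for row in rows:
    print(row)
```

Jerry and George keep their non-NBC rows, and Kramer and Elaine still appear with a NULL activity. Note that `<>` also drops rows whose employer is NULL, which may or may not be wanted.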
I have three tables: two that always contain data, and one whose values may or may not be present.
This is an example schema
Table 1
id, username
Table 2
id, street
Table 3
id, phone_number (this can be not present)
Please help me with the query.
SELECT t1.ID, t1.Username, t2.street, t3.phone_number
FROM Table1 as t1
INNER JOIN Table2 as t2 on t1.id = t2.id
LEFT OUTER JOIN Table3 as t3 on t1.id = t3.id
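A quick way to see the effect of mixing INNER JOIN and LEFT OUTER JOIN is to run the query on a tiny data set. This is a sketch using SQLite through Python's sqlite3 module; the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE Table2 (id INTEGER PRIMARY KEY, street TEXT);
CREATE TABLE Table3 (id INTEGER PRIMARY KEY, phone_number TEXT);
INSERT INTO Table1 VALUES (1,'alice'), (2,'bob');
INSERT INTO Table2 VALUES (1,'Main St'), (2,'Oak Ave');
INSERT INTO Table3 VALUES (1,'555-0100');  -- bob has no phone number
""")

rows = conn.execute("""
SELECT t1.id, t1.username, t2.street, t3.phone_number
FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.id = t2.id
LEFT OUTER JOIN Table3 AS t3 ON t1.id = t3.id
ORDER BY t1.id
""").fetchall()
for row in rows:
    print(row)
```

The user without a row in Table3 is still returned, with NULL (None in Python) for phone_number.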
First a quick explanation: I am actually dealing with four tables and mining data from different places but my problem comes down to this seemingly simple concept and yes I am very new to this...
I have two tables (one and two) that both have ID columns in them. I want to query only the ID columns that are in table two only, not in both. As in..
Select ID
From dbo.one, dbo.two
Where dbo.two != dbo.one
I actually thought this would work but I'm getting odd results. Can anyone help?
SELECT t2.ID
FROM dbo.two t2
WHERE NOT EXISTS(SELECT NULL
FROM dbo.one t1
WHERE t2.ID = t1.ID)
This could also be done with a LEFT JOIN:
SELECT t2.ID
FROM dbo.two t2
LEFT JOIN dbo.one t1
ON t2.ID = t1.ID
WHERE t1.ID IS NULL
Completing the other 2 options after Joe's answer...
SELECT id
FROM dbo.two
EXCEPT
SELECT id
FROM dbo.one
SELECT t2.ID
FROM dbo.two t2
WHERE t2.ID NOT IN (SELECT t1.ID FROM dbo.one t1)
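Note: LEFT JOIN will usually be slower than the other three, which should all give the same plan when the ID columns are not nullable.
That's because LEFT JOIN is a join followed by a filter, whereas the other three are (anti) semi-joins. Also be aware that NOT IN returns no rows at all if the subquery yields a NULL.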
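The four formulations can be compared side by side on a small data set. This is a sketch using SQLite through Python's sqlite3 module, so the dbo schema prefix is dropped and nothing here says anything about SQL Server's query plans; the IDs are made up, and the columns are declared NOT NULL so that NOT IN behaves like the others.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE one (id INTEGER NOT NULL);
CREATE TABLE two (id INTEGER NOT NULL);
INSERT INTO one VALUES (1), (2), (3);
INSERT INTO two VALUES (2), (3), (4), (5);
""")

queries = {
    "not_exists": """SELECT t2.id FROM two t2
                     WHERE NOT EXISTS (SELECT NULL FROM one t1 WHERE t2.id = t1.id)""",
    "left_join":  """SELECT t2.id FROM two t2
                     LEFT JOIN one t1 ON t2.id = t1.id
                     WHERE t1.id IS NULL""",
    "except":     "SELECT id FROM two EXCEPT SELECT id FROM one",
    "not_in":     "SELECT t2.id FROM two t2 WHERE t2.id NOT IN (SELECT t1.id FROM one t1)",
}

# Run each variant and collect the IDs that exist only in table two.
results = {name: sorted(r[0] for r in conn.execute(sql)) for name, sql in queries.items()}
for name, ids in results.items():
    print(name, ids)
```

One difference to keep in mind: EXCEPT removes duplicates from its result, while the other three keep duplicate IDs from table two.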
I have the following two tables:
Table1 {T1ID, Name}
Table2 {T2ID, T1ID, Date, Value}
Date is of type DATE.
and I am looking for a SQL query to fetch only the latest value (by Date) for each T1ID for which the Name matches a specific string.
SELECT `Table2`.`T1ID`,
`Table2`.`Value`,
`Table2`.`Date`,
`Table1`.`Name`
FROM `Table1`
INNER JOIN `Table2` ON `Table2`.`T1ID` = `Table1`.`T1ID`
WHERE `Table1`.`Name` LIKE 'Smith'
but this returns the value for several dates for the same T1ID.
How do I get only the latest value by Date?
Edit:
I am using MySQL 5.5.8
If I've understood the question correctly:
Assuming MySQL:
SELECT `Table2`.`T1ID`,
`Table2`.`Value`,
`Table2`.`Date`,
`Table1`.`Name`
FROM `Table1`
INNER JOIN `Table2` ON `Table2`.`T1ID` = `Table1`.`T1ID`
INNER JOIN (SELECT T1ID, MAX(Date) AS `Date` FROM Table2 GROUP BY T1ID) Table3
ON `Table3`.`T1ID` = `Table2`.`T1ID`
AND `Table3`.`Date` = `Table2`.`Date`
WHERE `Table1`.`Name` LIKE 'Smith'
EDIT: Updated the code to bring back the correct result set. Removed MSSQL answer as it wasn't relevant
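The idea of joining back to a per-T1ID MAX(Date) subquery can be tried out on a few rows. This is a sketch using SQLite through Python's sqlite3 module instead of MySQL; the data, the aliases (t1, t2, latest) and the MaxDate column name are hypothetical, and dates are stored as ISO strings so that MAX compares them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (T1ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Table2 (T2ID INTEGER PRIMARY KEY, T1ID INTEGER, Date TEXT, Value INTEGER);
INSERT INTO Table1 VALUES (1,'Smith'), (2,'Jones'), (3,'Smith');
INSERT INTO Table2 VALUES
  (1, 1, '2011-01-01', 10), (2, 1, '2011-03-01', 30), (3, 1, '2011-02-01', 20),
  (4, 2, '2011-05-01', 99), (5, 3, '2011-04-01', 40);
""")

rows = conn.execute("""
SELECT t2.T1ID, t2.Value, t2.Date, t1.Name
FROM Table1 AS t1
JOIN Table2 AS t2 ON t2.T1ID = t1.T1ID
JOIN (SELECT T1ID, MAX(Date) AS MaxDate   -- latest date per T1ID
      FROM Table2
      GROUP BY T1ID) AS latest
  ON latest.T1ID = t2.T1ID AND latest.MaxDate = t2.Date
WHERE t1.Name LIKE 'Smith'
ORDER BY t2.T1ID
""").fetchall()
for row in rows:
    print(row)
```

Each matching T1ID yields exactly one row as long as it has no two Table2 rows sharing the same latest Date; ties would return several rows.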
You have two options.
select t1.t1id, max(t1.Name) Name, max(t2.date) Date,
(select Value from table2 t22
where t22.date = max(t2.date) and t22.t1id = t1.t1id) Value
from table1 t1 left join table2 t2 on t1.t1id = t2.t1id
where t1.Name like '%Smith%'
group by t1.t1id order by 2
OR
select mx.t1id, mx.Name, mx.Date, t2.Value
from
(
select t1.t1id, max(t1.Name) Name, max(t2.date) Date
from table1 t1 left join table2 t2 on t1.t1id = t2.t1id
where t1.Name like '%Smith%'
group by t1.t1id
) mx left join table2 t2 on (t2.t1id = mx.t1id and t2.date = mx.date)
order by 2
Both will produce the same result. The first one takes less code but you might have performance issues with a huge set of data. The second one takes a little more code, but it is also a little more optimized.
Notes on the JOIN option:
If you go LEFT JOIN (as the example shows), items in Table1 with no corresponding records in Table2 will be displayed in the result, but the values in columns Date and Value will be NULL.
If you go INNER JOIN, items in Table1 with no corresponding records in Table2 will not be displayed.
EDIT
I missed one of the requirements, which was the Name matching a specific string. The code is now updated. The '%' acts like a wildcard, so it will match names like 'Will Smith' and 'Wail Smithers'. If you want an exact match, remove the wildcards ('%').
If only one T1ID matches the name, add this to your SQL (it returns a single row overall, not one row per T1ID):
ORDER BY `Date` DESC LIMIT 1