Check if table a primary key is exist in table b - mysql

Table A:
ID, Name, etc.
Table B:
ID, TableA-ID.
SELECT * FROM A;
and I want to return a boolean value in the same result for this condition ( if A.ID Exists in Table B).

There are several ways of achieving what you need. Below are three possibilities. These all differ in execution plans and how database actually wants to execute them so depending on your record count one may be more efficient than the other. It's better if you see it for yourself.
1) Use LEFT JOIN and check if a non-null field from B is not null to ensure the record exists. Then apply DISTINCT clause if relationship is 1:N to only show rows from A without duplicates.
select distinct a.*, b.id is not null as exists_b
from a
left join b on
a.id = b.tablea-id
2) Use exists() function, which will be evaluated for each row being returned from table A.
select a.*, exists(select 1 from b where a.id = b.tablea-id) as exists_b
from a
3) Use a combination of subquery expression EXISTS and it's contradiction in two queries to check if a record has or has not a match within table B. Then UNION ALL to combine both results into one.
select *, true as exists_b
from a
where exists (
select 1
from b
where a.id = b.tablea-id
)
union all
select *, false as exists_b
from a
where not exists (
select 1
from b
where a.id = b.tablea-id
)

select A.*, IFNULL((select 1 from B where B.TableA-ID = A.ID limit 1),0) as `exists` from A;
The above statement will result in a 1, if the key exists, and a 0 if that key does not exist. Limit 1 is important if there are multiple records in B

Related

Hive SQL: How to create flag occurrence while join with other table

I want to check whether my member from table A present in table B or not? Here is the problem is Both Table A and Table B has millions of records and table B have duplicate records. So that i cann't do left join. it takes hours to run.
Table A
Table B
Output
use this :
select member,
case when EXISTS (select 1 from TableB where TableB.member = tableA.member) then 1 else 0 end as Flag
from tableA
Not a very good solution but you can try this.
So, we use not in or not exists to get one set of data and then use in or exists to get another set. And then union them all together to get complete set.
select
a.* , 0 flag
from tableA a where member not in ( select member from tableB)
union all
select
a.* , 1 flag
from tableA a where member in ( select member from tableB)
The trick may be, you can run 2 separate SQL for this and will get perf benefit instead of union all.
Not exist will work same way but can give you better performance.
SELECT a.*, 0 flag
FROM tableA a
WHERE NOT EXISTS(
SELECT 1 FROM tableB b WHERE (a.member=b.member))
union all
SELECT a.*, 1 flag
FROM tableA a
WHERE EXISTS(
SELECT 1 FROM tableB b WHERE (a.member=b.member))

How to filter out duplicates from a union query?

I have two similar SELECT queries that retrieve data from the same table "my_table".
-- 1st select
SELECT
my_table.id,
a,
b
FROM my_table
JOIN table2 ON u = v
JOIN table3 ON x = y
UNION ALL
-- 2st select
SELECT
my_table.id,
a,
b
FROM my_table
JOIN table2 ON r = s
JOIN table3 ON t = u
Duplicates are to be filtered out under the following conditions:
If the second select returns an id that is already present in the 1st select, it should be discarded.
Is there an easy solution without using a common table expression?
Note: The SQL does not have to be a UNION and can also be changed.
UNION filters out duplicate rows by default. UNION ALL does not remove duplicates.
But the duplicates are based on all columns being identical, not just the id column. If a given id value occurs in both queries, but any of the other two columns are different, then it counts as a distinct row.
If you want to reduce the result to a single row per id, the use a GROUP BY:
SELECT id, ...aggregate expressions...
FROM (
SELECT my_table.id, a, b ...
UNION
SELECT my_table.id, a, b ...
) AS t
GROUP BY id;
When you GROUP BY id, then any other expressions of the outer select-list must be in aggregate functions like MAX() or SUM(), etc.
The reason it is important to use an aggregate function is that when there are multiple rows with the same id value which you want to reduce to one row, what value should be displayed for a and b?
Example:
id
a
b
4
12
24
4
18
28
If you group by id, you would get one row for id=4, but what value for the other two columns?
id
a
b
4
?
?
Read https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html for more details on this. Or my answer to Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
You must use an aggregate function, which includes GROUP_CONCAT() to append all the values from that column in a comma-separated list. Or you can use ANY_VALUE() which picks one of the values from that column arbitrarily.
I think this should do it:
-- 1st select
SELECT
my_table.id,
a,
b
FROM my_table
JOIN table2 ON u = v
JOIN table3 ON x = y
WHERE id NOT IN (
SELECT
my_table.id,
FROM my_table
JOIN table2 ON r = s
JOIN table3 ON t = u
)
UNION ALL
-- 2st select
SELECT
my_table.id,
a,
b
FROM my_table
JOIN table2 ON r = s
JOIN table3 ON t = u

Speeding up select where column condition exists in another table without duplicates

If I have the following two tables:
Table "a" with 2 columns: id (int) [Primary Index], column1 [Indexed]
Table "b" with 3 columns: id_table_a (int),condition1 (int),condition2 (int) [all columns as Primary Index]
I can run the following query to select rows from Table a where Table b condition1 is 1
SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id_table_a=a.id && condition1=1 LIMIT 1) ORDER BY a.column1 LIMIT 50
With a couple hundred million rows in both tables this query is very slow. If I do:
SELECT a.id FROM a INNER JOIN b ON a.id=b.id_table_a && b.condition1=1 ORDER BY a.column1 LIMIT 50
It is pretty much instant but if there are multiple matching rows in table b that match id_table_a then duplicates are returned. If I do a SELECT DISTINCT or GROUP BY a.id to remove duplicates the query becomes extremely slow.
Here is an SQLFiddle showing the example queries: http://sqlfiddle.com/#!9/35eb9e/10
Is there a way to make a join without duplicates fast in this case?
*Edited to show that INNER instead of LEFT join didn't make much of a difference
*Edited to show moving condition to join did not make much of a difference
*Edited to add LIMIT
*Edited to add ORDER BY
You can try with inner join and distinct
SELECT distinct a.id
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
but using distinct on select * be sure you don't distinct id that return wrong result in this case use
SELECT distinct col1, col2, col3 ....
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
You could also add a composite index with use also condtition1 eg: key(id, condition1)
if you can you could also perform a
ANALYZE TABLE table_name;
on both the table ..
and another technique is try to reverting the lead table
SELECT distinct a.id
FROM b INNER JOIN a ON a.id=b.id_table_a AND b.condition1=1
Using the most selective table for lead the query
Using this seem different the use of index http://sqlfiddle.com/#!9/35eb9e/15 (the last add a using where)
# USING DISTINCT TO REMOVE DUPLICATES without col and order
EXPLAIN
SELECT DISTINCT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
;
It looks like I found the answer.
SELECT a.id FROM a
INNER JOIN b ON
b.id_table_a=a.id &&
b.condition1=1 &&
b.condition2=(select b.condition2 from b WHERE b.id_table_a=a.id && b.condition1=1 LIMIT 1)
ORDER BY a.column1
LIMIT 5;
I don't know if there is a flaw in this or not, please let me know if so. If anyone has a way to compress this somehow I will gladly accept your answer.
SELECT id FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
Take the condition into the ON clause of the join, that way the index of table b can get used to filter. Also use INNER JOIN over LEFT JOIN
Then you should have less results which have to be grouped.
Wrap the fast version in a query that handles de-duping and limit:
SELECT DISTINCT * FROM (
SELECT a.id
FROM a
JOIN b ON a.id = b.id_table_a && b.condition1 = 1
) x
ORDER BY column1
LIMIT 50
We know the inner query is fast. The de-duping and ordering has to happen somewhere. This way it happens on the smallest rowset possible.
See SQLFiddle.
Option 2:
Try the following:
Create indexes as follows:
create index a_id_column1 on a(id, column1)
create index b_id_table_a_condition1 on b(a_table_a, condition1)
These are covering indexes - ones that contain all the columns you need for the query, which in turn means that index-only access to data can achieve the result.
Then try this:
SELECT * FROM (
SELECT a.id, MIN(a.column1) column1
FROM a
JOIN b ON a.id = b.id_table_a
AND b.condition1 = 1
GROUP BY a.id) x
ORDER BY column1
LIMIT 50
Use your fast query in a subselect and remove the duplicates in the outer select:
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
WHERE b.id_table_a > :offset
ORDER BY a.column1
LIMIT 50
) sub
Because of removing duplicates you might get less than 50 rows. Just repeat the query until you get anough rows. Start with :offset = 0. Use the last ID from last result as :offset in the following queries.
If you know your statistics, you can also use two limits. The limit in the inner query should be high enough to return 50 distinct rows with a probability which is high enough for you.
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
ORDER BY a.column1
LIMIT 1000
) sub
LIMIT 50
For example: If you have an average of 10 duplicates per ID, LIMIT 1000 in the inner query will return an average of 100 distinct rows. Its very unlikely that you get less than 50 rows.
If the condition2 column is a boolean, you know that you can have a maximum of two duplicates. In this case LIMIT 100 in the inner query would be enough.

Find values missing in a column from a set (mysql)

I am using mysql.
I have a table that has a column id.
Let us say I have an input set of ids. I want to know which all ids are missing in the table.
If the set is "ida", "idb", "idc" and the table only contains "idb", then the returned value should be "ida", "idc".
Is this possible with a single sql query? If not, what is the most efficient way to execute this.
Note that I am not allowed to use stored procedure.
MySQL will only return rows that exist. To return missing rows you must have two tables.
The first table can be temporary (session/connection specific) so that multiple instances can run simultaneously.
create temporary table tmpMustExist (text id);
insert into tmpMustExist select "ida";
insert into tmpMustExist select "idb";
-- etc
select a.id from tmpMustExist as a
left join table b on b.id=a.id
where b.id is null; -- returns results from a table that are missing from b table.
Is this possible with a single sql query?
Well, yes it is. Let me work my way to that, first with a union all to combine the select statements.
create temporary table tmpMustExist (text id);
insert into tmpMustExist select "ida" union all select "idb" union all select "etc...";
select a.id from tmpMustExist as a left join table as b on b.id=a.id where b.id is null;
Note that I use union all which is a bit faster than union because it skips over deduplication.
You can use create table...select. I do this frequently and really like it. (It is a great way to copy a table as well, but it will drop indexes.)
create temporary table tmpMustExist as select "ida" union all select "idb" union all select "etc...";
select a.id from tmpMustExist as a left join table as b on b.id=a.id where b.id is null;
And finally you can use what's called a "derived" table to bring the whole thing into a single, portable select statement.
select a.id from (select "ida" union all select "idb" union all select "etc...") as a left join table as b on b.id=a.id where b.id is null;
Note: the as keyword is optional, but clarifies what I'm doing with a and b. I'm simply creating short names to be used in the join and select field lists
There's a trick. You can either create a table with expected values or you can use union of multiple select for each value.
Then you need to find all the values that are in the etalon, but not in the tested table.
CREATE TABLE IF NOT EXISTS `single` (
`id` varchar(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `single` (`id`) VALUES
('idb');
SELECT a.id FROM (
SELECT 'ida' as id
UNION
SELECT 'idb' as id
UNION
SELECT 'idc' AS id
) a WHERE a.id NOT IN (SELECT id FROM single)
//you can pass each set string to query
//pro-grammatically you can put quoted string
//columns must be utf8 collation
select * from
(SELECT 'ida' as col
union
SELECT 'idb' as col
union
SELECT 'idc' as col ) as setresult where col not in (SELECT value FROM `tbl`)

Eliminating duplicates from SQL query

What would be the best way to return one item from each id instead of all of the other items within the table. Currently the query below returns all manufacturers
SELECT m.name
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
I have solved my question by using the DISTINCT value in my query:
SELECT DISTINCT m.name, m.id
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
ORDER BY m.name
there are 4 main ways I can think of to delete duplicate rows
method 1
delete all rows bigger than smallest or less than greatest rowid value. Example
delete from tableName a where rowid> (select min(rowid) from tableName b where a.key=b.key and a.key2=b.key2)
method 2
usually faster but you must recreate all indexes, constraints and triggers afterward..
pull all as distinct to new table then drop 1st table and rename new table to old table name
example.
create table t1 as select distinct * from t2; drop table t1; rename t2 to t1;
method 3
delete uing where exists based on rowid. example
delete from tableName a where exists(select 'x' from tableName b where a.key1=b.key1 and a.key2=b.key2 and b.rowid >a.rowid) Note if nulls are on column use nvl on column name.
method 4
collect first row for each key value and delete rows not in this set. Example
delete from tableName a where rowid not in(select min(rowid) from tableName b group by key1, key2)
note that you don't have to use nvl for method 4
Using DISTINCT often is a bad practice. It may be a sing that there is something wrong with your SELECT statement, or your data structure is not normalized.
In your case I would use this (in assumption that default_ps_products_manufacturers has unique records).
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE EXISTS (SELECT 1 FROM default_ps_products p WHERE p.manufacturer_id = m.id)
Or an equivalent query with IN:
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE m.id IN (SELECT p.manufacturer_id FROM default_ps_products p)
The only thing - between all possible queries it is better to select the one with the better execution plan. Which may depend on your vendor and/or physical structure, statistics, etc... of your data base.
I think in most cases EXISTS will work better.