What's the most efficient way to select rows that must satisfy two conditions met in the same column?
name | title
------------------
John | Associate
John | Manager
Fran | Manager
Fran | President
I'd like to do something like
select name
from table
where title = 'Associate'
and name in ( select *
from table
where title = 'Manager')
which should return
John
but that seems woefully inefficient, especially if the table itself is super big. You could also do
select name
from table a,
table b
where a.title = 'Associate'
and b.title = 'Manager'
and a.name = b.name
Is that the best one can do?
Your first query is not syntactically correct. It should be:
select name
from table
where title = 'Associate' and
name in (select name from table where title = 'Manager');
The second is better written as a join:
select a.name
from table a join
table b
on a.title = 'Associate' and b.title = 'Manager' and a.name = b.name;
The second is probably better in terms of taking advantage of indexes on the table.
You can also do this with a group by:
select name
from table a
group by name
having sum(title = 'Associate') > 0 and sum(title = 'Manager') > 0;
MySQL is not very good at optimizing group by. But if there are no indexes on the table, it might be faster than the join methods.
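For what it's worth, here is a minimal sketch that runs the three forms above against the sample data and checks that they agree. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL and a hypothetical table name `staff`, since `table` is a reserved word; it says nothing about relative speed on a real MySQL server.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "staff" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("John", "Associate"), ("John", "Manager"),
     ("Fran", "Manager"), ("Fran", "President")],
)

queries = {
    "in": """
        SELECT name FROM staff
        WHERE title = 'Associate'
          AND name IN (SELECT name FROM staff WHERE title = 'Manager')
    """,
    "join": """
        SELECT a.name
        FROM staff a
        JOIN staff b ON a.name = b.name
        WHERE a.title = 'Associate' AND b.title = 'Manager'
    """,
    "group_by": """
        SELECT name FROM staff
        GROUP BY name
        HAVING SUM(title = 'Associate') > 0 AND SUM(title = 'Manager') > 0
    """,
}

# Run every variant and collect its rows under its label.
results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
for label, rows in results.items():
    print(label, rows)
```

All three variants select only John on this data; which one is fastest still has to be measured on your own tables.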
I would put an index on your table on ( title, name ), then do a self-join. Here I am making the less common title the outer (driving) condition of the query, so the manager records are considered first. For example, a company may have 5 managers and 100 associates, so it is cheaper to start from the 5 managers and look up their associate rows than to start from 100 associates and check each one against the 5 managers.
select
t.name
from
table t
join table t2
on t2.title = 'Associate'
AND t.name = t2.name
where
t.title = 'Manager'
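As a rough sketch of how to try the index idea, the snippet below creates a ( title, name ) index and prints the plan the engine picks for the self-join. It assumes SQLite via Python's sqlite3 module rather than MySQL, and the names `staff` and `idx_staff_title_name` are hypothetical; on MySQL you would run EXPLAIN on the same query instead, and the plan output will look different.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("John", "Associate"), ("John", "Manager"),
     ("Fran", "Manager"), ("Fran", "President")],
)
conn.execute("CREATE INDEX idx_staff_title_name ON staff (title, name)")

sql = """
    SELECT t.name
    FROM staff t
    JOIN staff t2 ON t2.title = 'Associate' AND t2.name = t.name
    WHERE t.title = 'Manager'
"""
rows = conn.execute(sql).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
print(rows)
for step in plan:
    print(step)
```

Check the printed plan to see whether the index is actually used for both sides of the join before relying on it.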
There's not a whole lot of data given as an example, but I'm assuming both Johns here are the same person with multiple titles? If that is the case, you would be better off making titles a child table of the employees table (if that's what this table is).
So instead you could have:
Employee
----
id
name
titles
----
id
titleName
employeeTitles
----
employeeId
titleId
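A minimal sketch of that three-table layout, with the "Associate and Manager" lookup written against it. The table and column names come from the outline above; the key constraints, the sample ids and the use of SQLite through Python's sqlite3 module are my own assumptions.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; keys and sample ids are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE titles (id INTEGER PRIMARY KEY, titleName TEXT NOT NULL UNIQUE);
    CREATE TABLE employeeTitles (
        employeeId INTEGER NOT NULL REFERENCES employee(id),
        titleId    INTEGER NOT NULL REFERENCES titles(id),
        PRIMARY KEY (employeeId, titleId)  -- one row per employee/title pair
    );
    INSERT INTO employee VALUES (1, 'John'), (2, 'Fran');
    INSERT INTO titles VALUES (1, 'Associate'), (2, 'Manager'), (3, 'President');
    INSERT INTO employeeTitles VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

# Employees holding both titles: the composite key rules out duplicate pairs,
# so a count of 2 means both titles are present.
rows = conn.execute("""
    SELECT e.name
    FROM employee e
    JOIN employeeTitles et ON et.employeeId = e.id
    JOIN titles t ON t.id = et.titleId
    WHERE t.titleName IN ('Associate', 'Manager')
    GROUP BY e.id, e.name
    HAVING COUNT(*) = 2
""").fetchall()
print(rows)
```

Grouping by the employee id rather than the name also keeps two different people called John apart, which the single-table version cannot do.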
If you can't do it this way, I would think another way to write your original query would be:
select t1.name
from table t1
inner join (
select distinct name
from table
where title = 'Manager'
) t2 on t1.name = t2.name
where t1.title = 'Associate'
You could also use group by name rather than distinct. Still, I think the normalized design above would be better all around (assuming my assumptions about your data are correct).
Using WHERE EXISTS (or NOT EXISTS) is a very effective alternative for this
select
name
from table1
where title = 'Associate'
and exists (
select 1 /* could be "select NULL", as nothing actually needs to be returned */
from table1 as t2
where t2.title = 'Manager'
and t2.name = table1.name /* correlated here */
)
;
Like IN(), this requires a subquery, but the subquery is correlated with the outer query. Unlike IN(), it does NOT need to move any data; it only has to report whether a matching row exists (IN can require the subquery's values to be produced).
Also, similar to IN() using EXISTS has no impact on the number of result rows. Joins can create unwanted row repetition (but of course sometimes the extra rows are needed).
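To illustrate the row-repetition point, here is a small sketch in which John has two Manager rows (say, in two departments). It assumes SQLite via Python's sqlite3 module and a hypothetical table name `staff`; the duplicate Manager row is made-up data added only for the demonstration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "staff" and the duplicate row are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("John", "Associate"), ("John", "Manager"),
     ("John", "Manager"), ("Fran", "Manager")],
)

# The join pairs the Associate row with every matching Manager row.
join_rows = conn.execute("""
    SELECT a.name
    FROM staff a
    JOIN staff b ON a.name = b.name
    WHERE a.title = 'Associate' AND b.title = 'Manager'
""").fetchall()

# EXISTS only tests for a match, so each outer row appears at most once.
exists_rows = conn.execute("""
    SELECT a.name
    FROM staff a
    WHERE a.title = 'Associate'
      AND EXISTS (SELECT 1 FROM staff b
                  WHERE b.title = 'Manager' AND b.name = a.name)
""").fetchall()

print(join_rows)
print(exists_rows)
```

The join needs a DISTINCT to collapse the repeated John, while the EXISTS form does not.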
This reference is for SQL Server, but it compares several relevant methods (ignore Outer Apply - mssql specific), including potential issues dealing with NULLs when using IN() that do not affect EXISTS
[NOT] EXISTS () should be one of the first methods to consider.
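The NULL issue is easiest to see with the negated forms. The sketch below looks for associates who are not managers when one Manager row has a NULL name. It assumes SQLite via Python's sqlite3 module, a hypothetical table name `staff` and made-up rows for Ann and the NULL name; the three-valued logic involved is standard SQL, so MySQL should behave the same way, but do verify it there.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "staff", Ann and the NULL row are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("John", "Associate"), ("John", "Manager"),
     ("Ann", "Associate"), (None, "Manager")],
)

# 'Ann' NOT IN ('John', NULL) evaluates to NULL, not TRUE, so Ann is filtered out.
not_in_rows = conn.execute("""
    SELECT a.name
    FROM staff a
    WHERE a.title = 'Associate'
      AND a.name NOT IN (SELECT name FROM staff WHERE title = 'Manager')
""").fetchall()

# NOT EXISTS only asks whether a matching row was found; the NULL name never matches.
not_exists_rows = conn.execute("""
    SELECT a.name
    FROM staff a
    WHERE a.title = 'Associate'
      AND NOT EXISTS (SELECT 1 FROM staff b
                      WHERE b.title = 'Manager' AND b.name = a.name)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```

With the NULL present, NOT IN silently returns nothing while NOT EXISTS still returns Ann.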
It depends on the MySQL version (MySQL 5.6 can rewrite IN() subqueries as semi-joins, which improves them) and on the table relationships.
There are at least 3 ways to get the result you're expecting. In my experience, INNER JOIN is faster than the others in general.
Try it yourself with your data.
IN () - what you wrote first.
Please note that in MySQL, IN() can produce a dependent subquery plan.
SELECT DISTINCT name
FROM table
WHERE title = 'Associate'
AND name IN (SELECT name FROM table WHERE title = 'Manager')
SELF JOIN - your second query.
SELECT DISTINCT t1.name
FROM table t1 INNER JOIN table t2
ON t1.name = t2.name
WHERE t1.title = 'Associate' AND t2.title = 'Manager'
EXISTS - Semi JOIN
EXISTS is fast when the tables have a 1:n relationship. It requires no DISTINCT or GROUP BY.
SELECT name
FROM table t1
WHERE t1.title = 'Associate'
AND EXISTS (SELECT 1 FROM table t2
WHERE t2.name = t1.name AND t2.title = 'Manager')
Related
What would be the best way to return one item for each id instead of all of the other items within the table? Currently the query below returns all manufacturers.
SELECT m.name
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
I have solved my question by using the DISTINCT value in my query:
SELECT DISTINCT m.name, m.id
FROM `default_ps_products` p
INNER JOIN `default_ps_products_manufacturers` m ON p.manufacturer_id = m.id
ORDER BY m.name
There are 4 main ways I can think of to delete duplicate rows.
Method 1
Delete all rows with a rowid greater than the smallest (or less than the greatest) rowid for the same key. Example:
delete from tableName a where rowid > (select min(rowid) from tableName b where a.key1 = b.key1 and a.key2 = b.key2)
Method 2
Usually faster, but you must recreate all indexes, constraints and triggers afterward.
Copy the distinct rows into a new table, then drop the original table and rename the new table to the original name.
Example (t1 is the original table, t2 the new one):
create table t2 as select distinct * from t1; drop table t1; rename t2 to t1;
Method 3
Delete using where exists based on rowid. Example:
delete from tableName a where exists (select 'x' from tableName b where a.key1 = b.key1 and a.key2 = b.key2 and b.rowid > a.rowid)
Note: if the key columns can contain nulls, wrap them in nvl.
Method 4
Collect the first row for each key value and delete the rows not in this set. Example:
delete from tableName a where rowid not in (select min(rowid) from tableName b group by key1, key2)
Note that you don't have to use nvl for method 4.
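Method 4 can be tried out quickly in SQLite, which also exposes an implicit rowid on ordinary tables. This is only a sketch under that assumption: the answer above reads like Oracle (rowid, nvl), SQLite's rowid is a different mechanism, and the table name `items` and its rows are made up.

```python
import sqlite3

# Assumption: SQLite's implicit rowid stands in for the rowid used above;
# the table name "items" and its contents are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (key1 TEXT, key2 TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x"), ("b", "z")],
)

# Keep the first row of every (key1, key2) group and delete the rest.
conn.execute("""
    DELETE FROM items
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM items GROUP BY key1, key2)
""")

remaining = conn.execute(
    "SELECT key1, key2 FROM items ORDER BY key1, key2"
).fetchall()
print(remaining)
```

After the delete, each key pair is left with exactly one row.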
Using DISTINCT is often a bad practice. It may be a sign that there is something wrong with your SELECT statement, or that your data structure is not normalized.
In your case I would use this (assuming that default_ps_products_manufacturers has unique records):
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE EXISTS (SELECT 1 FROM default_ps_products p WHERE p.manufacturer_id = m.id)
Or an equivalent query with IN:
SELECT m.id, m.name
FROM default_ps_products_manufacturers m
WHERE m.id IN (SELECT p.manufacturer_id FROM default_ps_products p)
The only caveat: among the possible queries, pick the one with the better execution plan, which may depend on your vendor and on the physical structure, statistics, etc. of your database.
I think in most cases EXISTS will work better.
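Here is a small sketch of the difference between the plain join and the EXISTS form. The table names are taken from the question; the column types, the sample manufacturers and products, and the use of SQLite through Python's sqlite3 module are assumptions.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column types and sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE default_ps_products_manufacturers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE default_ps_products (id INTEGER PRIMARY KEY, manufacturer_id INTEGER);
    INSERT INTO default_ps_products_manufacturers VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Core');
    INSERT INTO default_ps_products VALUES (1, 1), (2, 1), (3, 2);
""")

# The plain join yields one row per product, so Acme shows up twice.
join_rows = conn.execute("""
    SELECT m.name
    FROM default_ps_products p
    INNER JOIN default_ps_products_manufacturers m ON p.manufacturer_id = m.id
    ORDER BY m.name
""").fetchall()

# EXISTS yields one row per manufacturer that has at least one product.
exists_rows = conn.execute("""
    SELECT m.id, m.name
    FROM default_ps_products_manufacturers m
    WHERE EXISTS (SELECT 1 FROM default_ps_products p
                  WHERE p.manufacturer_id = m.id)
    ORDER BY m.name
""").fetchall()

print(join_rows)
print(exists_rows)
```

Core has no products, so it is left out of both results; only the join repeats Acme.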
I have a situation where I need to pull data from one table but exclude some rows based on the rows in another table. That is, I need to pull studentid(s) from one table but exclude those studentid(s) that appear in another table.
First query:
$sql = "select studentid from table 2 where iarsid = '12'";
As I'll get an array result from this query, I want to use that result in NOT conditions in the next query, simply excluding those rows from the result of the other query.
Second query:
$sql2 = "select studentid from table 2, table 3 where iarsid = '12' // and a lot of joins";
Basically the students who are in the first table are not needed while pulling out students based on the second query.
If I am using the wrong logic, please guide me on how to achieve this.
You can do the general idea at least 3 ways, using a LEFT JOIN, and also using NOT IN and NOT EXISTS.
Via LEFT JOINS.
SELECT student_name
FROM table_A a
LEFT JOIN table_B b ON a.student_id = b.student_id
WHERE b.student_id IS NULL
This gets the names of all students in table_A who are not in table_B.
And here it is via NOT EXISTS:
SELECT student_name
FROM table_A a
WHERE NOT EXISTS (SELECT student_id FROM table_B b WHERE b.student_id = a.student_id)
And via NOT IN:
SELECT student_name
FROM table_A a
WHERE a.student_id NOT IN (SELECT student_id FROM table_B b)
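A quick sketch that runs the three variants side by side and confirms they agree when table_B has no NULL ids. It assumes SQLite via Python's sqlite3 module instead of MySQL, and the column types and sample students are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column types and sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_A (student_id INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE table_B (student_id INTEGER NOT NULL);
    INSERT INTO table_A VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO table_B VALUES (2);
""")

variants = {
    "left_join": """
        SELECT a.student_name
        FROM table_A a
        LEFT JOIN table_B b ON a.student_id = b.student_id
        WHERE b.student_id IS NULL
        ORDER BY a.student_name
    """,
    "not_exists": """
        SELECT a.student_name
        FROM table_A a
        WHERE NOT EXISTS (SELECT 1 FROM table_B b
                          WHERE b.student_id = a.student_id)
        ORDER BY a.student_name
    """,
    "not_in": """
        SELECT a.student_name
        FROM table_A a
        WHERE a.student_id NOT IN (SELECT student_id FROM table_B)
        ORDER BY a.student_name
    """,
}

# Run each variant and keep its rows for comparison.
results = {label: conn.execute(sql).fetchall() for label, sql in variants.items()}
for label, rows in results.items():
    print(label, rows)
```

Each variant excludes Bob and keeps Ann and Cy; the plans, and so the speed, can still differ between engines.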
Do you mean a second query that uses the first query as a condition with NOT?
"select studentid from table 2, table 3 where iarsid = '12' // and a lot of joins"
+ " AND studentid NOT IN (select studentid from table 2 where iarsid = '12')"
I can see that you have accepted an answer, but you can also do this. The best way to check which query is faster is to look at the explain plan.
SELECT student_name
FROM table_A a
WHERE NOT EXISTS (SELECT student_id FROM table_B b WHERE b.student_id = a.student_id)
Since this is a correlated subquery using EXISTS, it will tend to be faster for a larger table, while IN tends to be faster for a small table. The reason is that EXISTS can stop as soon as it finds a match, whereas IN may have to scan the whole subquery result.
Also this one:
SELECT student_name
FROM table_A a
WHERE NOT EXISTS (SELECT null
FROM table_B b
WHERE a.student_id = b.student_id);
I'm doing an order on all fields of a table (depending on the user's choice), however one column of that table contains a category (stored as an abbreviation) and those abbreviations are defined in a second table. How can I sort by category name? Example of the table structures below:
Table 1
title | amount | category_abbreviation
Table 2 -> for category
category_name | category_abbreviation
Just join the tables and order on a field from the second table.
SELECT * from table1
INNER JOIN table2
ON table1.category_abbreviation = table2.category_abbreviation
ORDER BY table2.category_name
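A minimal sketch of that join, sorted by the full category name rather than the abbreviation. The table and column names follow the question; the sample rows and the use of SQLite through Python's sqlite3 module are assumptions.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (title TEXT, amount INTEGER, category_abbreviation TEXT);
    CREATE TABLE table2 (category_name TEXT, category_abbreviation TEXT);
    INSERT INTO table1 VALUES ('Rent', 900, 'HS'), ('Bus pass', 60, 'TR'), ('Apples', 5, 'FD');
    INSERT INTO table2 VALUES ('Housing', 'HS'), ('Transport', 'TR'), ('Food', 'FD');
""")

rows = conn.execute("""
    SELECT table1.title, table2.category_name
    FROM table1
    INNER JOIN table2
        ON table1.category_abbreviation = table2.category_abbreviation
    ORDER BY table2.category_name
""").fetchall()

# Titles in the order produced by sorting on the joined category name.
titles = [title for title, _ in rows]
print(rows)
```

The rows come back in Food, Housing, Transport order, even though the abbreviations FD, HS, TR are what table1 stores.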
Create a view and run your query from that:
SELECT
Table1.title, Table1.amount, Table1.category_abbreviation, Table2.category_name
FROM
Table1
INNER JOIN Table2 ON Table1.category_abbreviation = Table2.category_abbreviation
and use that as your datasource. Or just use the SQL as your datasource, however you are doing it.
You don't have to select the Table2.category_name though if you didn't want to
Try:
SELECT T1.*
FROM Table_1 T1
JOIN Table_2 T2
ON T1.category_abbreviation=T2.category_abbreviation
ORDER BY T2.category_name
Let's say I have 2 tables. The first table, table_1, contains each posted content item, including
table_1
title,
author name,
email,
city_name, etc.
The second table provides a lookup for table_1. It has 2 columns,
table_2
city_id and
city_name.
For instance, city_id =1 corresponds to New York, city_id =2 corresponds to Chicago... and so on. Under the 'city' column in table1, the city_id is listed which can easily be joined with table 2, producing a readable city name.
Would the following statement be as efficient as using a WHERE with city_id? The reason is that I would be filtering results by city name, which is a string, and I don't want (or need?) to map each input to its matching ID number in table2.
SELECT table1.city, table2.city_name
FROM table1
WHERE table2.city_name=(input city name)
JOIN table2.city_name ON table2.city_id = table1.city
Because the join is an inner join the following should lead to equivalent execution plans. (That is, they should exhibit the same performance characteristics -- write the SQL clearly and let the SQL engine do the dirty optimization work.)
As presented in the other answers:
SELECT table1.*, table2.city_name
FROM table1
JOIN table2 ON table1.city_id = table2.city_id
WHERE table2.city_name = (city_input);
And, as what I believe is the "optimized form" presented in the question:
SELECT table1.*, t2.city_name
FROM table1
JOIN (SELECT * FROM table2
WHERE table2.city_name = (city_input)) AS t2
ON table1.city_id = t2.city_id
This is because of the relational algebra model that SQL follows; at least under RA, the equality ("selection") here can be moved across the join (a "natural join" in RA) while keeping the same semantics. Of course, to be certain, run a query analysis. The most basic form is using EXPLAIN.
Happy coding.
I'm not exactly following what the question is, but I'll say that the proper way to handle a query in which you need to filter on a specific city name, rather than ID, would be like this:
SELECT table1.*, table2.city_name
FROM table1
JOIN table2 ON table1.city_id = table2.city_id
WHERE table2.city_name = (city_input);
The problem is as follows. I have a product that can be in one of three categories (defined by category_id). Each category table has a category_id field related to category_id in the product table. So I have 3 cases. I check whether my product.category_id is in table one. If yes, I take some values. If not, I check the remaining tables. What can I write in the ELSE section? Can anyone correct my query?
CASE
WHEN IF EXISTS(SELECT * FROM table1 WHERE category_id='category_id') THEN SELECT type_id FROM table1 WHERE category_id='category_id';
WHEN IF EXISTS(SELECT * FROM table2 WHERE category_id='category_id') THEN SELECT value_id FROM table2 WHERE category_id='category_id';
WHEN IF EXISTS(SELECT * FROM table3 WHERE category_id='category_id') THEN SELECT group_id FROM table3 WHERE category_id='category_id';
ELSE "dont know what here";
END;
In the else you would put whatever you want as default value, for example null.
I think that it would be much more efficient to make three left joins instead of several subqueries for each product in the result, and use coalesce to get the first existing value. Example:
select coalesce(t1.type_id, t2.value_id, t3.group_id)
from product p
left join table1 t1 on t1.category_id = p.category_id
left join table2 t2 on t2.category_id = p.category_id
left join table3 t3 on t3.category_id = p.category_id
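A small sketch of the left-join-plus-coalesce idea, including a product that matches none of the three tables. The table and column names follow the question; the sample ids and the use of SQLite through Python's sqlite3 module are assumptions.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all ids below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE table1 (category_id INTEGER, type_id INTEGER);
    CREATE TABLE table2 (category_id INTEGER, value_id INTEGER);
    CREATE TABLE table3 (category_id INTEGER, group_id INTEGER);
    INSERT INTO product VALUES (1, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO table1 VALUES (10, 100);
    INSERT INTO table2 VALUES (20, 200);
    INSERT INTO table3 VALUES (30, 300);
""")

# COALESCE returns the first non-NULL argument, so the first table that
# matches wins; a product with no match in any table gets NULL.
rows = conn.execute("""
    SELECT p.id, COALESCE(t1.type_id, t2.value_id, t3.group_id) AS resolved_id
    FROM product p
    LEFT JOIN table1 t1 ON t1.category_id = p.category_id
    LEFT JOIN table2 t2 ON t2.category_id = p.category_id
    LEFT JOIN table3 t3 ON t3.category_id = p.category_id
    ORDER BY p.id
""").fetchall()
print(rows)
```

Product 4 comes back with NULL (None in Python), which plays the role of the ELSE default.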
example
SELECT CompanyName,
Fax,
CASE WHEN IF(Fax='', 'No Fax', 'Got Fax')='No Fax' THEN NULL
ELSE IF(Fax='', 'No Fax', 'Got Fax')
END AS Note
FROM Customers;
You can possibly include this...
SELECT "Unknown type" FROM table1;
You do not need to use ELSE if there is nothing left to do.
or something like this
CASE
WHEN IF EXISTS(SELECT * FROM table1 WHERE category_id='category_id') THEN SELECT type_id FROM table1 WHERE category_id='category_id';
WHEN IF EXISTS(SELECT * FROM table2 WHERE category_id='category_id') THEN SELECT value_id FROM table2 WHERE category_id='category_id';
ELSE SELECT group_id FROM table3 WHERE category_id='category_id';
END;
In addition to Guffa's answer, here is another approach. Assuming @category_id is set with
SET @category_id = 'some_category_id_value'
then
SELECT t1.type_id
FROM table1 t1
WHERE t1.category_id = @category_id
UNION ALL
SELECT t2.value_id
FROM table2 t2
WHERE t2.category_id = @category_id
UNION ALL
SELECT t3.group_id
FROM table3 t3
WHERE t3.category_id = @category_id
should return what you ask for (and performance is not bad either).
If a given category_id appears in more than one table, you will get multiple records (you can get around that by limiting the number of results to 1; you might need to make the whole union a subquery and order it, but I'm not sure, consult the docs).
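One way to do the "limit to 1" part is sketched below: wrap the union in a subquery, tag each branch with a priority, and order by it. The `priority` and `val` columns, the alias `u` and the helper `first_match` are hypothetical names of mine; the sample ids are made up, and SQLite (via Python's sqlite3 module) stands in for MySQL, so check the exact syntax against the MySQL docs.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; ids, "priority", "val" and "u" are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (category_id INTEGER, type_id INTEGER);
    CREATE TABLE table2 (category_id INTEGER, value_id INTEGER);
    CREATE TABLE table3 (category_id INTEGER, group_id INTEGER);
    INSERT INTO table1 VALUES (10, 100);
    INSERT INTO table2 VALUES (10, 200);
    INSERT INTO table3 VALUES (30, 300);
""")

sql = """
    SELECT u.val
    FROM (
        SELECT 1 AS priority, type_id AS val FROM table1 WHERE category_id = :cid
        UNION ALL
        SELECT 2, value_id FROM table2 WHERE category_id = :cid
        UNION ALL
        SELECT 3, group_id FROM table3 WHERE category_id = :cid
    ) AS u
    ORDER BY u.priority
    LIMIT 1
"""

def first_match(category_id):
    """Return the value from the highest-priority table, or None if no table matches."""
    row = conn.execute(sql, {"cid": category_id}).fetchone()
    return row[0] if row else None

print(first_match(10), first_match(30), first_match(99))
```

Category 10 exists in both table1 and table2 here, and the priority column makes table1 win.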
However, your question looks like you have a problem with the design of your tables:
Why do you keep three category tables and not one?
What is the relationship between type_id, value_id and group_id, and why does it make sense to select them as if they were the same thing (what is the meaning/semantics of each table/column)?
How do you guarantee that you don't have entries in multiple tables that correspond to one product (and implement other business rules that you might have)?
These questions could have valid answers, but you should know them :)