I have MySQL queries with a WHERE ... IN clause.
SELECT * FROM table1 WHERE id IN (1, 2, 15, 17, 150 ....)
How will it perform with hundreds of ids in the IN clause? Is it designed to work with many arguments? (My table will have hundreds of thousands of rows, and id is the primary key.)
Is there a better way to do it?
EDIT: I am getting the Ids from the result set of a search server query. So not from the database. I guess a join statement wouldn't work.
I am not sure how WHERE ... IN performs, but to me a JOIN or maybe a subselect sounds like the better choice here.
See also: MYSQL OR vs IN performance and http://www.slideshare.net/techdude/how-to-kill-mysql-performance
You should put the IN clause "arguments" into a second table, table2 for instance.
Afterwards, run this:
SELECT t1.* FROM table1 t1
INNER JOIN table2 t2 ON t1.Id = t2.Id
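Since the ids come from a search server rather than from the database, they have to be loaded from application code. Here is a minimal sketch of both options (a parameterized IN list, and a temporary table2 that is joined against), assuming Python with the standard sqlite3 module as a stand-in for MySQL. The `name` column and the sample ids are hypothetical, and a MySQL driver would typically use `%s` rather than `?` as the placeholder. It only shows that the two forms return the same rows; it says nothing about MySQL timings.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO table1 (Id, name) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 201)],
)

# Hypothetical ids, standing in for what the search server returned.
search_ids = [1, 2, 15, 17, 150]

# Option A: parameterized IN list, one placeholder per id.
placeholders = ", ".join("?" for _ in search_ids)
in_rows = conn.execute(
    f"SELECT * FROM table1 WHERE Id IN ({placeholders}) ORDER BY Id",
    search_ids,
).fetchall()

# Option B: load the ids into a temporary table2 and join against it.
conn.execute("CREATE TEMPORARY TABLE table2 (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table2 (Id) VALUES (?)", [(i,) for i in search_ids])
join_rows = conn.execute(
    "SELECT t1.* FROM table1 t1 "
    "INNER JOIN table2 t2 ON t1.Id = t2.Id ORDER BY t1.Id"
).fetchall()

# Both approaches select the same rows.
print(in_rows == join_rows)
```

For a few hundred ids the plain IN list is usually the simpler choice; the temporary table mainly pays off when the list gets very long or is reused across several queries.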
Related
I'm trying to understand this: my query became more than 10x faster when I stopped using a subquery. Table2 has about 5000 rows, while table1 is pretty huge, a few hundred thousand rows.
Original Statement
SELECT *
FROM table1
WHERE indexedCol IN (
SELECT indexedCol
FROM table2
WHERE iCol2 = "somevalue"
)
Somehow this version is far more efficient:
SELECT *
FROM table1
WHERE indexedCol IN
(*comma separated result of SELECT FROM table2*)
Is there something I am missing here? Or is a subquery never a good idea?
The real issue is whether the sub-query is correlated. What do I mean by that? A sub-query is correlated if it references table1. If it doesn't, then the answer is simple: if you have two queries
SELECT *
FROM table1
and
SELECT indexedCol
FROM table2
WHERE iCol2 = "somevalue"
The time it takes to run one of them is less than the time it takes to run both. It could be even worse (as suggested in the comments) if one of them is run for every row.
This query could be rewritten to use a join like this:
SELECT *
FROM TABLE1
JOIN TABLE2 ON TABLE1.indexedCol = TABLE2.indexedCol AND TABLE2.iCol2 = 'somevalue'
Which will probably solve your problem.
I am using MySQL Workbench and MySQL Server to query a database. I have two tables, t1 and t2, each with one column: t1_name and t2_name. t2 has 3 million records and t1 has 1 million.
I need to select all t2_name values that are neither equal to a t1_name nor a substring of one. When I try the query below:
SELECT DISTINCT `t2_name`
FROM `t2`, `t1`
`t2`.`t2_name` NOT LIKE CONCAT('%',`t1`.`t1_name`,'%'));
I get this error:
mysql Error Code: 1066. Not unique table/alias: 't2'
Can you explain and correct my query please? Previously I have made this post and tried this query:
SELECT DISTINCT `t2_name`
FROM `t2`
WHERE NOT EXISTS (SELECT * FROM `t1`
WHERE `t2_name` LIKE CONCAT('%',`t2_name`,'%'));
but it takes forever and never ends.
Start by qualifying all column names. Does this still cause an error?
SELECT DISTINCT t2.t2_name
FROM t2 JOIN
t1
ON t2.t2_name NOT LIKE CONCAT('%', t1.t1_name, '%');
If your issue is performance, NOT EXISTS is going to be better without the DISTINCT:
SELECT t2_name
FROM t2
WHERE NOT EXISTS (SELECT 1
FROM t1
WHERE t2.t2_name LIKE CONCAT('%', t1.t1_name, '%')
);
However, this is not going to be much of an improvement. Unfortunately, LIKE patterns that start with a wildcard cannot use an ordinary index, so these comparisons are highly inefficient. Often, you can restructure the data model so that you can write a more efficient query.
You're missing the WHERE keyword. The parser thinks the second t2 should be an alias for t1 because it follows t1, but the name t2 is already taken by the table t2.
Insert WHERE (and remove the last closing )):
SELECT DISTINCT `t2_name`
FROM `t2`, `t1`
WHERE `t2`.`t2_name` NOT LIKE CONCAT('%',`t1`.`t1_name`,'%');
Side note: I'm afraid your attempt at building the Cartesian product won't perform any better than the NOT EXISTS. More likely it performs much, much worse...
I think you have mistyped the second where clause and it should say
SELECT DISTINCT `t2_name`
FROM `t2`
WHERE NOT EXISTS (SELECT * FROM `t1`
WHERE `t1_name` LIKE CONCAT('%',`t2_name`,'%'));
At the moment you are effectively comparing t2_name with itself.
It's going to be jolly slow anyway because MySQL is going to do a table scan for that. Have a look at your data structure and content and see whether you might be better off doing some data cleansing/restructuring before you start trying to use it for analysis.
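To check which direction of the LIKE comparison matches the stated goal (keep only the t2 names that are neither equal to nor a substring of any t1 name), here is a small sketch on toy data. It assumes Python's standard sqlite3 module as a stand-in for MySQL, so string concatenation uses `||` instead of `CONCAT`; the sample names are made up, and the result only illustrates the semantics, not the performance on millions of rows.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (t1_name TEXT)")
conn.execute("CREATE TABLE t2 (t2_name TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("alphabet",), ("gamma",)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("alpha",), ("gamma",), ("delta",)])

# Keep t2 names that do not occur inside any t1 name.
# SQLite concatenates with ||; in MySQL this would be CONCAT('%', t2.t2_name, '%').
rows = conn.execute(
    "SELECT DISTINCT t2.t2_name FROM t2 "
    "WHERE NOT EXISTS (SELECT 1 FROM t1 "
    "WHERE t1.t1_name LIKE '%' || t2.t2_name || '%') "
    "ORDER BY t2.t2_name"
).fetchall()
kept = [r[0] for r in rows]

# 'alpha' is inside 'alphabet' and 'gamma' equals 'gamma', so only 'delta' remains.
print(kept)
```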
When I execute a MySQL query like
select * from t1 where colomn1 in (select colomn1 from t2) ,
what really happens?
Does it execute the inner statement for every row?
PS: I have 300,000 rows in t1 and 50,000 rows in t2 and it is taking a hell of a time.
I'm flabbergasted to see that everyone suggests using a JOIN as if it were the same thing. It is not, at least not with the information given here. For example, what if t2.column1 contains duplicates?
=> Assuming there are no duplicates in t2.column1, then yes, put a UNIQUE INDEX on that column and use a JOIN, as it is more readable and easier to maintain. Whether it will be faster depends on what the query engine makes of it. In MSSQL the query optimizer would (probably) consider them the same thing; maybe MySQL is 'not so eager' to recognize this, I don't know.
=> Assuming there can be duplicates in t2.column1, put a (non-unique) INDEX on that column and rewrite the WHERE IN (SELECT ..) as WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1 = t1.column1). Again, this is mostly for readability and ease of maintenance; most likely the query engine will treat them the same.
The things to remember are
Always make sure you have proper indexing (but don't go overboard)
Always realize that what really happens will be an interpretation of your sql-code; not a 'direct translation'. You can write the same functionality in different ways to achieve the same goal. And some of these are indeed more resilient to different scenarios.
If you only have 10 rows, pretty much everything works. If you have 10M rows it could be worth examining the query plan... which most-likely will be different from the one with 10 rows.
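The point about duplicate values ("doubles") in t2.column1 can be seen on a tiny example. This is a sketch that assumes Python's standard sqlite3 module as a stand-in for MySQL, with made-up data; it compares only the rows returned, not the speed.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (column1 INTEGER)")
conn.execute("CREATE TABLE t2 (column1 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
# The value 1 appears twice in t2.
conn.executemany("INSERT INTO t2 VALUES (?)", [(1,), (1,), (2,)])


def fetch(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


in_rows = fetch(
    "SELECT t1.column1 FROM t1 "
    "WHERE t1.column1 IN (SELECT column1 FROM t2) ORDER BY t1.column1"
)
exists_rows = fetch(
    "SELECT t1.column1 FROM t1 "
    "WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1 = t1.column1) "
    "ORDER BY t1.column1"
)
join_rows = fetch(
    "SELECT t1.column1 FROM t1 "
    "INNER JOIN t2 ON t1.column1 = t2.column1 ORDER BY t1.column1"
)

# IN and EXISTS return each t1 row once; the JOIN repeats the row for value 1.
print(in_rows, exists_rows, join_rows)
```

So the JOIN is only a drop-in replacement when t2.column1 is unique, or when the extra rows are removed again (for example with DISTINCT).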
A join would be quicker, viz:
select t1.* from t1 INNER JOIN t2 on t1.colomn1=t2.colomn1
Try with INNER JOIN
SELECT t1.*
FROM t1
INNER JOIN t2 ON t1.column1=t2.column1
You should index the join column in both tables, and then you can use an inner join.
To create the indexes:
CREATE INDEX index1 ON t1 (colomn1);
CREATE INDEX index2 ON t2 (colomn1);
select t1.* from t1 INNER JOIN t2 on t1.colomn1=t2.colomn1
I have a big database, about 1 million rows. I need to do something like this:
select * from t1 WHERE id1 NOT IN (SELECT id2 FROM t2)
But it runs very slowly. I know that I can do it using "JOIN" syntax, but I can't work out how.
Try this way:
select *
from t1
left join t2 on t1.id1 = t2.id2
where t2.id2 is null
First of all, you should optimize the indexes on both tables; after that, use a join.
There are different ways a dbms can deal with this task:
It can select id2 from t2 and then select all t1 where id1 is not in that set. You suggest this using the IN clause.
It can select record by record from t1 and look for each record if it finds a match in t2. You would suggest this using the EXISTS clause.
You can outer join the table then throw away all matches and stay with the non-matching entries. This may look like a bad way, especially when there are many matches, because you would get big intermediate data and then throw most of it away. However, depending on how the dbms works, it can be rather fast, for example when it applies hash join techniques.
It all depends on table sizes, number of matches, indexes, etc. and on what the dbms makes of your query. There are dbms that are able to completely re-write your query to find the best execution plan.
Having said all this, you can just try different things:
the IN clause with (SELECT DISTINCT id2 FROM t2). DISTINCT can reduce the intermediate result significantly and really speed up your query. (But maybe your dbms does that anyhow to get a good execution plan.)
use an EXISTS clause and see if that is faster
the outer join suggested by Parado
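The three variants listed above can be compared side by side. This is a sketch assuming Python's standard sqlite3 module as a stand-in for MySQL, with made-up data. It also shows one thing worth knowing before swapping them: they stop being equivalent as soon as t2.id2 contains a NULL, because `NOT IN` then never evaluates to true.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id1 INTEGER)")
conn.execute("CREATE TABLE t2 (id2 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (4,)])

QUERIES = {
    "not_in": "SELECT id1 FROM t1 "
              "WHERE id1 NOT IN (SELECT id2 FROM t2) ORDER BY id1",
    "not_exists": "SELECT id1 FROM t1 "
                  "WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id1) "
                  "ORDER BY id1",
    "left_join": "SELECT t1.id1 FROM t1 "
                 "LEFT JOIN t2 ON t1.id1 = t2.id2 "
                 "WHERE t2.id2 IS NULL ORDER BY t1.id1",
}


def run_all():
    """Run every variant and return {variant name: list of id1 values}."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in QUERIES.items()}


# Without NULLs in t2.id2, all three variants agree.
before = run_all()

# After adding a NULL to t2.id2, NOT IN returns no rows at all.
conn.execute("INSERT INTO t2 VALUES (NULL)")
after = run_all()

print(before)
print(after)
```

If id2 is declared NOT NULL this difference cannot occur, and the choice comes down to which form the optimizer handles best for your data.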
I have a query like:
select * from (select ... ) t1 join (select ... ) t2 on t1._ = t2._
where the join subselects are identical. Is there an easy way to name this select so that I can use it both times? I tried this:
select * from (select ... ) t1 join t1 t2 on t1._ = t2._
but it gave an error. Any ideas?
If the cost of acquiring the rows in your subselect is significant, you may consider storing the intermediate result in a temporary table and then referencing that twice in your select.
But you had better measure this, because storing the intermediate result also has a cost...
Can you share your query? Maybe you don't need to reference it twice after all?
CREATE VIEW MyCommonSelect (Col1, Col2. . .) AS
SELECT Col1, Col2. . .
Depending on exactly what your query looks like, you may be able to name the subqueries internally, but something like this tends to indicate that the subquery represents database logic that (in my opinion — others disagree) deserves its own name.
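One way to name a subquery internally, where the database supports it, is a common table expression (a WITH clause); MySQL has these from version 8.0. Below is a minimal sketch, assuming Python's standard sqlite3 module as a stand-in database. The `orders` table, its columns, and the `totals` name are hypothetical, since the original subselect was not shown.

```python
import sqlite3

# In-memory SQLite database as a stand-in (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("ann", 5), ("bob", 15), ("cy", 7)],
)

# Name the subselect once with WITH, then reference it under two aliases.
sql = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT a.customer, b.customer
FROM totals a
JOIN totals b ON a.total = b.total AND a.customer < b.customer
ORDER BY a.customer
"""
pairs = conn.execute(sql).fetchall()

# ann and bob both total 15, so they form the only matching pair.
print(pairs)
```

Unlike a view, the name `totals` exists only for the duration of this one statement, so nothing has to be created or dropped in the schema.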