What is the best way to query a database with a large set of up to 500 comparisons per row?
SELECT * FROM table WHERE column = x OR column = y OR column = z OR ...(x500)
I estimate that the table could grow to thousands of entries in the short term.
Thanks
Use WHERE column IN (x, y, z, ...)
If you are only doing a few, say less than 20 or so, you could use an IN (...) clause. However, if you will regularly be dealing with hundreds or thousands of values, I would use a temp table of just the "columnX" values, insert all the possible values into it, and then query with a join using this temp table against the other:
select YT.*
from JustValuesTable JVT
join YourTable YT
  on JVT.ColumnX = YT.ColumnX
-- rest of query...
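For illustration, a minimal sketch of creating and filling such a values table (assuming MySQL-style temporary tables and an integer key column; the names match the join above):

-- Hypothetical setup: a temp table holding only the lookup values.
CREATE TEMPORARY TABLE JustValuesTable (
  ColumnX INT NOT NULL,
  PRIMARY KEY (ColumnX)
);

-- Insert every value that would otherwise appear in the OR / IN list.
INSERT INTO JustValuesTable (ColumnX) VALUES (1), (2), (3);  -- ... up to 500 values

The join shown above then drives the lookup through the temp table's index instead of a 500-term OR list.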
Maybe you can use the IN clause:
select * from table where column in ('asdf', 'aqwer' .....)
Additionally, you may want to create a view containing your allowed values, and then:
select * from table where column in (select your_field_name from your_view)
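For completeness, a sketch of how such a view could be defined (allowed_values_table is a placeholder name, not from the question; your_field_name matches the query above):

-- Hypothetical view exposing the list of allowed values.
CREATE VIEW your_view AS
SELECT your_field_name
FROM allowed_values_table;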
You may be better off using an IN query:
SELECT * FROM table WHERE column IN (x, y, z)
That would make it a little more readable at least and I think should improve performance.
How about using the IN operator?
I am completely new to database coding, and I've tried Googling but cannot seem to figure this out. I imagine there's a simple solution. I have a very large table with MemberIDs and a few other relevant variables that I want to pull records from (table1). I also have a second large table of distinct MemberIDs (table2). I want to pull rows from table 1 where the MemberID exists in table2.
Here's how I tried to do it; I suspect this isn't working correctly, or there may be a much better way to do it.
proc sql;
  create table tablewant as
  select MemberID, var1, var2, var3
  from table1
  where exists (select MemberID from table2);
quit;
Is there anything wrong with the way I’m doing this? What's the best way to solve this when working with extremely large tables (over 100 million records)? Would doing some sort of join be better? Also, do I need to change
where exists (select MemberID from table2)
to
where exists (select MemberID from table2 where table1.MemberID = table2.MemberID)
?
You want to implement a "semi-join". Your second solution is correct:
select MemberID, var1, var2, var3
from table1
where exists (
select 1 from table2 where table1.MemberID = table2.MemberID
)
Notes:
There's no need to select anything special in the subquery since it's not checking for values, but for row existence instead. For example, 1 will do, as well as *, or even null. I tend to use 1 for clarity.
The query needs to access table2, and this access should be optimized, especially for such large tables. You should consider adding the index below, if you haven't created it already:
create index ix1 on table2 (MemberID);
The query does not have any filtering criteria. That means the engine will read 100 million rows and check each one of them for matching rows in the secondary table, which will unavoidably take a long time. Are you sure you want to read them all? Maybe you need to add a filtering condition, but I don't know your requirements in this respect.
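Folded back into the PROC SQL step from the question, the corrected version would look roughly like this (same dataset and variable names as in the question):

proc sql;
  create table tablewant as
  select MemberID, var1, var2, var3
  from table1
  where exists (
    select 1
    from table2
    where table1.MemberID = table2.MemberID
  );
quit;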
select *
from AllUK
where exists (
  select *
  from AllCompanies
  where replace(AllUK.mobile, ' ', '') = replace(AllCompanies.mobile, ' ', '')
)
I need to include the columns from the AllCompanies table in my first select. How can I do that?
select *
from AllUK a
join AllCompanies b
on a.mobile = b.mobile
exists is a boolean operation, so the clause you have above will always return all the results if there are any records that can be joined across the 2 tables. It's hard to tell what you're really trying to achieve.
Also, putting string operations on columns within exists and joins is not best practice, because the engine has to perform the operation on every row at run time. It might be better to create a temp table to hold the replaced values and then join on that, as sketched below.
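A rough sketch of that temp-table idea, assuming MySQL-style syntax; the CompaniesClean name and mobile_clean column are made up for illustration:

-- Hypothetical staging table with the spaces already stripped from the mobile numbers.
CREATE TEMPORARY TABLE CompaniesClean AS
SELECT *, REPLACE(mobile, ' ', '') AS mobile_clean
FROM AllCompanies;

-- Join against the pre-cleaned values; only AllUK.mobile still needs REPLACE at run time.
SELECT uk.*, c.*
FROM AllUK AS uk
JOIN CompaniesClean AS c
  ON REPLACE(uk.mobile, ' ', '') = c.mobile_clean;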
Avoid using IN(...) when selecting on indexed fields, It will kill the performance of SELECT query.
I found this here: https://wikis.oracle.com/pages/viewpage.action?pageId=27263381
Can you explain it? Why would that kill performance? And what should I use instead of IN? An "OR" statement, maybe?
To tell the truth, that statement contradicts many hints that I have read in books and articles on MySQL.
Here is an example: http://www.mysqlperformanceblog.com/2010/01/09/getting-around-optimizer-limitations-with-an-in-list/
Moreover, expr IN(value, ...) itself has additional enhancements for dealing with large value lists, since it is supposed to be used as a useful alternative to certain range queries:
If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants.
Still, overusing IN may result in slow queries. Some cases are noted in the article.
Because MySQL can't optimize it.
Here is an example:
explain select * from keywordmaster where id in (1, 567899);
(The EXPLAIN plan was posted as an external link and is not reproduced here.)
here is another query:
explain
select * from keywordmaster where id = 1
union
select * from keywordmaster where id = 567899
(Again, the EXPLAIN plan was posted as an external link and is not reproduced here.)
As you can see in the second query we get ref as const and type is const instead of range. MySQL can't optimize range scans.
Prior to MySQL 5.0, it seems that MySQL would only use a single index per table. So, if you had SELECT * FROM tbl WHERE (a = 6 OR b = 33), it could choose to use either the a index or the b index, but not both. Note that it says fields, plural. I suspect the advice comes from that time, and the work-around was to union the OR results, like so:
SELECT * FROM tbl WHERE (a = 6)
UNION
SELECT * FROM tbl WHERE (b = 33)
I believe IN is treated the same as a group of ORs, so using ORs won't help.
An alternative is to create a temporary table to hold the values of your IN-clause and then join with that temporary table in your SELECT.
For example:
CREATE TEMPORARY TABLE temp_table (v VARCHAR(255));
INSERT INTO temp_table VALUES ('foo');
INSERT INTO temp_table VALUES ('bar');
SELECT * FROM temp_table tmp, orig_table orig
WHERE tmp.v = orig.value;
DROP TEMPORARY TABLE temp_table;
I am running a complicated and costly query to find the MIN() values of a function grouped by another attribute. But I don't just need the value, I need the entry that produces it + the value.
My current pseudoquery goes something like this:
SELECT MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) FROM (prefiltering) as a GROUP BY a.group_att;
but I want a.* and MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) as my result.
The only way I can think of is using this ugly beast:
SELECT a1.*, COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2)
FROM (prefiltering) as a1
WHERE COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2) =
(SELECT MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) FROM (prefiltering) as a GROUP BY a.group_att)
But now I am executing the prefiltering query twice and have to run the costly function twice. This is ridiculous, and I hope that I am doing something seriously wrong here.
Possible solution?:
Just now I realize that I could create a temporary table containing:
(SELECT a1.*, COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2) as complex FROM (prefiltering) as a1)
and then run the MIN() as subquery and compare it at greatly reduced cost. Is that the way to go?
A problem with your temporary table solution is that I can't see any way to avoid using it twice in the same query (MySQL does not allow a TEMPORARY table to be referred to more than once in the same statement).
However, if you're willing to use an actual permanent table (perhaps with ENGINE = MEMORY), it should work.
You can also move the subquery into the FROM clause, where it might be more efficient:
CREATE TABLE temptable ENGINE = MEMORY
SELECT a1.*,
COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2) AS complex
FROM prefiltering AS a1;
CREATE INDEX group_att_complex USING BTREE
ON temptable (group_att, complex);
SELECT a2.*
FROM temptable AS a2
NATURAL JOIN (
SELECT group_att, MIN(complex) AS complex
FROM temptable GROUP BY group_att
) AS a3;
DROP TABLE temptable;
(You can try it without the index too, but I suspect it'll be faster with it.)
Edit: Of course, if one temporary table won't do, you could always use two:
CREATE TEMPORARY TABLE temp1
SELECT *, COSTLY_FUNCTION(att1,att2,$v1,$v2) AS complex
FROM prefiltering;
CREATE INDEX group_att_complex ON temp1 (group_att, complex);
CREATE TEMPORARY TABLE temp2
SELECT group_att, MIN(complex) AS complex
FROM temp1 GROUP BY group_att;
SELECT temp1.* FROM temp1 NATURAL JOIN temp2;
(Again, you may want to try it with or without the index; when I ran EXPLAIN on it, MySQL didn't seem to want to use the index for the final query at all, although that might be just because my test data set was so small. Anyway, here's a link to SQLize if you want to play with it; I used CONCAT() to stand in for your expensive function.)
You can use the HAVING clause to get columns in addition to that MIN value. For example:
SELECT a.*, COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)
FROM (prefiltering) AS a
GROUP BY a.group_att
HAVING MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) = COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2);
How can I get the minimum values (plural!) from a table without using a subquery? The table contains the following data (sorry for the mouse):
As you can see, I always want to select the minimum values. If there are equal values (tables 2 & 3), the query should return all of those rows, because there is no single minimum. I'm using MySQL. I don't want to use a subquery if possible, for performance reasons. A min(value) with group by id doesn't work either, because of the unique ids.
Thanks in advance
ninsky
As far as I know, this cannot be done without a subquery in MySQL. For example:
select *
from YourTable
where value =
(
select min(value)
from YourTable
)
If you do not trust MySQL's performance here, you can split the query proposed by Andomar into two separate statements, as sketched below.
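For example, one way to split it (a sketch assuming MySQL session variables and the YourTable/value names from the answer above):

-- Step 1: compute the minimum once and keep it in a session variable.
SELECT MIN(value) INTO @min_value FROM YourTable;

-- Step 2: return every row that carries that minimum.
SELECT * FROM YourTable WHERE value = @min_value;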