Suppose I have a table A with a primary key
PRIMARY KEY(c1,c2)
and the cardinality of c1 is very low, whereas c2 is very high
When executing the following queries,
select *
from A
where (c1, c2) in (('001', 'aaa'))
select *
from A
where c1 = '001' and c2 = 'aaa'
the optimizer uses the index.
However, for the following cases,
Case 1:
select *
from A
where (c1, c2) in (('001', 'aaa'), ('002', 'bbb'), ('003', 'ccc'))
Case 2:
select *
from A
where (c1 = '001' and c2 = 'aaa') or
(c1 = '002' and c2 = 'bbb') or
(c1 = '003' and c2 = 'ccc')
the optimizer stops using the index for Case 1 but still uses it for Case 2.
What makes the optimizer stop using the index for Case 1?
*MySQL Version: 5.6.10
Although those expressions are semantically equivalent, MySQL could not execute them the same way until MySQL 5.7.3 added the Range Optimization of Row Constructor Expressions. Quoting the documentation:
The optimizer now is able to apply the range scan access method to queries of this form:
SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));
Previously, for range scans to be used it was necessary for the query to be written as:
SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' )
OR ( col_1 = 'c' AND col_2 = 'd' );
For the optimizer to use a range scan, queries must satisfy these conditions:
Only IN() predicates are used, not NOT IN().
On the left side of the IN() predicate, the row constructor contains only column references.
On the right side of the IN() predicate, row constructors contain only runtime constants, which are either literals or local column references that are bound to constants during execution.
On the right side of the IN() predicate, there is more than one row constructor.
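On versions before 5.7.3, one workaround is to generate the OR-of-AND form programmatically. The following is a minimal sketch, not an official recipe: the helper name `row_in_to_or` is hypothetical, and the `%s` placeholder style is an assumption that matches common MySQL drivers for Python.

```python
def row_in_to_or(columns, rows):
    """Expand (c1, c2) IN ((...), (...)) into (c1 = %s AND c2 = %s) OR ... plus bound parameters."""
    if not rows:
        raise ValueError("need at least one row")
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row must have one value per column")
    # One parenthesised AND group per row; the values are passed separately as parameters.
    group = "(" + " AND ".join(f"{col} = %s" for col in columns) + ")"
    where = " OR ".join([group] * len(rows))
    params = [value for row in rows for value in row]
    return where, params


where, params = row_in_to_or(
    ["c1", "c2"],
    [("001", "aaa"), ("002", "bbb"), ("003", "ccc")],
)
sql = f"SELECT * FROM A WHERE {where}"
```

The column names are interpolated into the SQL text, so they must come from trusted code, never from user input; only the values are bound as parameters.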
I am using mysql and my table is as follows:
col_a: varchar(50)
col_b: varchar(50)
created_at: timestamp
I would like to run the following query but by using tuples:
SELECT * FROM tbl
WHERE col_a = x AND col_b = y AND NOW()-created_at >= z
I am aware of the following query but it only allows direct comparison (=) and not other operators (>=) which are required in the third argument:
SELECT * FROM tbl
WHERE (col_a, col_b, col_c) IN <tuple_list_goes_here>
Can I use different operators with an IN query?
An IN() predicate only evaluates equality, not any other comparison operation.
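In practice this means keeping the equality columns inside the IN() and writing the range test as a separate predicate joined with AND. A minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL (assumptions: SQLite 3.15 or later, which needs the `VALUES` form for row-value lists; `created_at` stored as integer seconds; `now` and `z` are made-up sample numbers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col_a TEXT, col_b TEXT, created_at INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", [
    ("x1", "y1", 100),
    ("x1", "y1", 900),
    ("x2", "y2", 200),
    ("x3", "y3", 100),
])

now, z = 1000, 500                      # hypothetical "current time" and minimum age
pairs = [("x1", "y1"), ("x2", "y2")]    # the tuple list for the equality part
placeholders = ", ".join(["(?, ?)"] * len(pairs))

sql = (
    "SELECT col_a, col_b, created_at FROM tbl "
    f"WHERE (col_a, col_b) IN (VALUES {placeholders}) "   # equality only
    "AND ? - created_at >= ? "                            # range test stays outside the IN
    "ORDER BY created_at"
)
params = [value for pair in pairs for value in pair] + [now, z]
rows = conn.execute(sql, params).fetchall()
```

In MySQL the same shape would be written as `(col_a, col_b) IN (('x1','y1'), ('x2','y2')) AND ...` with the age condition as an ordinary comparison.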
I've always used the IN (val1, val2, ...) syntax quite easily when testing for a bunch of values. However, I'm wondering what type of data structure it actually evaluates to, is this a table function? For example:
-- to populate data
CREATE TABLE main_territory (
name varchar NOT NULL,
is_fake_territory integer NOT NULL,
code varchar NOT NULL
);
INSERT INTO main_territory (name, is_fake_territory, code) VALUES ('Afghanistan', 0, 'AF'), ('Albania', 0, 'AL'), ('Algeria', 0, 'DZ');
select '1' as "query#", * from main_territory where code in ('AF', 'AL') union all
select '2' as "query#", * from main_territory where code in (select 'AF' UNION ALL select 'AL') UNION ALL
select '3' as "query#", * from main_territory where code in (select code from main_territory where name ='Albania' or name = 'Afghanistan')
The second and third queries return a one-column table (is this called a scalar-table?), and so I would imagine doing (expr1, expr2, ...) does the same thing -- it evaluates to a one-column table. Is that accurate, or what actual data type is this?
I would not call the IN ( ) predicate tuple comparison. An example of tuple comparison (aka row constructor comparison) is:
WHERE (col1, col2) = ('abc', 123)
Or you can even do multivalued row constructor comparison:
WHERE (col1, col2) IN (('abc', 123), ('xyz', 456))
The examples you show are simply the IN ( ) predicate, which compares a single value to a list of values. If the value matches any of those in the list, the predicate is satisfied. The list can either be a fixed list of expressions or literals:
WHERE code IN ('AF', 'AL')
Or it can be the result of a subquery:
WHERE code IN (SELECT code FROM ...)
How this is implemented depends on the code of the respective RDBMS. It might have to materialize the result of the subquery and store it as a list internally. In some software, they may use a temporary table with one column as the data structure to store the result of the subquery. Then the IN ( ) predicate can be executed as a join against that temporary table. If there's one thing an SQL engine ought to be able to do efficiently, it's a join. :-)
But this might be expensive if the result of the subquery is millions of rows. In that case, a clever optimizer would "factor out" the IN ( ) predicate and just do a join. That is, it would read each value of code and do an index lookup into the second table's code column. This means there's no data structure per se, it's just the evaluation of a join.
The real answer would be implementation-dependent. Both MySQL and PostgreSQL are open-source, so you can try downloading and reading the code yourself if you want to know the implementation.
The confusion is understandable, since these are actually two different kinds of IN:
WHERE expr IN (2, 3, 4, ...)
WHERE expr IN (SELECT ...)
The first will be converted to an array like this:
QUERY PLAN
═══════════════════════════════════════════════════
Seq Scan on tab
Filter: (expr = ANY ('{2,3,4,...}'::integer[]))
or, if the list has only one element, to
QUERY PLAN
══════════════════════
Seq Scan on tab
Filter: (expr = 2)
The second will be executed as a join, for example:
QUERY PLAN
═══════════════════════════════════
Hash Join
Hash Cond: (tab.expr = sub.col)
-> Seq Scan on tab
-> Hash
-> Seq Scan on sub
So, to answer your question: A plain IN list will be converted to an array, and IN becomes = ANY.
I have used FIND_IN_SET multiple times before but this case is a bit different.
Earlier I was searching a single value in the table like
SELECT * FROM tbl_name where find_in_set('1212121212', sku)
But now I have the list of SKUs which I want to search in the table. E.g
'3698520147','088586004490','868332000057','081308003405','088394000028','089541300893','0732511000148','009191711092','752830528161'
I have two columns in the table: SKU (e.g. 081308003405) and SKU Variation.
In the SKU column I save a single value, but in the variation column I save the values in comma-separated format, e.g. 081308003405,088394000028,089541300893
SELECT * FROM tbl_name
WHERE 1
AND upc IN ('3698520147','088586004490','868332000057','081308003405','088394000028',
'089541300893','0732511000148','009191711092','752830528161')
I am using the IN operator to search by UPC value; now I want to search the variation column as well. My question is how to search the variation column using the SKU list.
For now, I check each UPC variation in a loop, which takes too much time. Below is the query:
SELECT id FROM products
WHERE 1 AND upcVariation AND FIND_IN_SET('88076164444',upc_variation) > 0
First of all consider to store the data in a normalized way. Here is a good read: Is storing a delimited list in a database column really that bad?
Now - Assuming the following schema and data:
create table products (
id int auto_increment,
upc varchar(50),
upc_variation text,
primary key (id),
index (upc)
);
insert into products (upc, upc_variation) values
('01234', '01234,12345,23456'),
('56789', '45678,34567'),
('056789', '045678,034567');
We want to find products with variations '12345' and '34567'. The expected result is the 1st and the 2nd rows.
Normalized schema - many-to-many relation
Instead of storing the values in a comma separated list, create a new table, which maps product IDs with variations:
create table products_upc_variations (
product_id int,
upc_variation varchar(50),
primary key (product_id, upc_variation),
index (upc_variation, product_id)
);
insert into products_upc_variations (product_id, upc_variation) values
(1, '01234'),
(1, '12345'),
(1, '23456'),
(2, '45678'),
(2, '34567'),
(3, '045678'),
(3, '034567');
The select query would be:
select distinct p.*
from products p
join products_upc_variations v on v.product_id = p.id
where v.upc_variation in ('12345', '34567');
As you see - With a normalized schema the problem can be solved with a quite basic query. And we can effectively use indices.
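The normalized approach can be tried end to end with Python's built-in sqlite3. This is a sketch under stated assumptions: SQLite stands in for MySQL, so the column types and the auto-increment syntax differ slightly from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        upc TEXT
    );
    CREATE TABLE products_upc_variations (
        product_id INTEGER,
        upc_variation TEXT,
        PRIMARY KEY (product_id, upc_variation)
    );
    -- Secondary index so a lookup by variation does not scan the whole table.
    CREATE INDEX idx_variation ON products_upc_variations (upc_variation, product_id);
""")
conn.executemany("INSERT INTO products (upc) VALUES (?)",
                 [("01234",), ("56789",), ("056789",)])
conn.executemany("INSERT INTO products_upc_variations VALUES (?, ?)", [
    (1, "01234"), (1, "12345"), (1, "23456"),
    (2, "45678"), (2, "34567"),
    (3, "045678"), (3, "034567"),
])

wanted = ["12345", "34567"]
placeholders = ", ".join("?" * len(wanted))
ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT p.id FROM products p "
    "JOIN products_upc_variations v ON v.product_id = p.id "
    f"WHERE v.upc_variation IN ({placeholders}) ORDER BY p.id",
    wanted,
)]
```

With the sample data, `ids` holds the first and second products, matching the expected result stated earlier.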
"Exploiting" a FULLTEXT INDEX
With a FULLTEXT INDEX on (upc_variation) you can use:
select p.*
from products p
where match (upc_variation) against ('12345 34567');
This looks quite "pretty" and is probably efficient. But though it works for this example, I wouldn't feel comfortable with this solution, because I can't say exactly, when it doesn't work.
Using JSON_OVERLAPS()
Since MySQL 8.0.17 you can use JSON_OVERLAPS(). You should either store the values as a JSON array, or convert the list to JSON "on the fly":
select p.*
from products p
where json_overlaps(
'["12345","34567"]',
concat('["', replace(upc_variation, ',', '","'), '"]')
);
No index can be used for this. But neither can one be used for FIND_IN_SET().
Using JSON_TABLE()
Since MySQL 8.0.4 you can use JSON_TABLE() to generate a normalized representation of the data "on the fly". Here again you would either store the data in a JSON array, or convert the list to JSON in the query:
select distinct p.*
from products p
join json_table(
concat('["', replace(p.upc_variation, ',', '","'), '"]'),
'$[*]' columns (upcv text path '$')
) v
where v.upcv in ('12345', '34567');
No index can be used here. And this is probably the slowest solution of all presented in this answer.
RLIKE / REGEXP
You can also use a regular expression:
select p.*
from products p
where p.upc_variation rlike '(^|,)(12345|34567)(,|$)'
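The anchoring in that pattern, `(^|,)` before and `(,|$)` after, is what stops '34567' from matching inside '034567'. A small sketch with Python's `re` module shows the same pattern on the sample data (an assumption: Python's regex engine treats this simple pattern the same way MySQL's does; `build_pattern` is a hypothetical helper):

```python
import re


def build_pattern(values):
    """Build a pattern that matches any of the values as a whole comma-separated item."""
    alternatives = "|".join(re.escape(value) for value in values)
    return re.compile(rf"(^|,)({alternatives})(,|$)")


pattern = build_pattern(["12345", "34567"])

rows = {
    1: "01234,12345,23456",
    2: "45678,34567",
    3: "045678,034567",   # contains '34567' only as part of '034567'
}
matching_ids = [pid for pid, variations in rows.items() if pattern.search(variations)]
```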
See demo of all queries on dbfiddle.uk
You can try with below example:
SELECT * FROM TABLENAME
WHERE 1 AND ( FIND_IN_SET('3698520147', SKU)
OR UPC IN ('3698520147') )
I have a solution for you, you can consider this solution:
1: Create a temporary table example here: Sql Fiddle
select
tablename.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(tablename.name, ',', numbers.n), ',', -1) sku_variation
from
numbers inner join tablename
on CHAR_LENGTH(tablename.sku_split)
-CHAR_LENGTH(REPLACE(tablename.sku_split, ',', ''))>=numbers.n-1
order by id, n
2: Use the temporary table to filter. find in set with your data
Performance considerations. The main thing that matters for performance is whether some index can be used. The complexity of the expression has only a minuscule impact on overall performance.
Step 1 is to learn what can be optimized, and in what way:
Equal: WHERE x = 1 -- can use index
IN/1: WHERE x IN (1) -- Turned into the Equal case by Optimizer
IN/many: WHERE x IN (22,33,44) -- Usually worse than Equal and better than "range"
Easy OR: WHERE (x = 22 OR x = 33) -- Turned into IN if possible
General OR: WHERE (sku = 22 OR upc = 33) -- not sargable (cf UNION)
Easy LIKE: WHERE x LIKE 'abc' -- turned into Equal
Range LIKE: WHERE x LIKE 'abc%' -- equivalent to "range" test
Wild LIKE: WHERE x LIKE '%abc%' -- not sargable
REGEXP: WHERE x RLIKE 'aaa|bbb|ccc' -- not sargable
FIND_IN_SET: WHERE FIND_IN_SET(x, '22,33,44') -- not sargable, even for single item
JSON: -- not sargable
FULLTEXT: WHERE MATCH(x) AGAINST('aaa bbb ccc') -- fast, but not equivalent
NOT: WHERE NOT ((any of the above)) -- usually poor performance
"Sargable" -- able to use index. Phrased differently "Hiding the column in a function call" prevents using an index.
FULLTEXT: There are many restrictions: "word-oriented", min word size, stopwords, etc. But it is very fast when it applies. Note: When used with outer tests, MATCH comes first (if possible), then further filtering will be done without the benefit of indexes, but on a smaller set of rows.
Even when an expression "can" use an index, it "may not". Whether a WHERE clause makes good use of an index is a much longer discussion than can be put here.
Step 2 Learn how to build composite indexes when you have multiple tests (WHERE ... AND ...):
When constructing a composite (multi-column) index, include columns in this order:
'Equal' -- any number of such columns.
'IN/many' column(s)
One range test (BETWEEN, <, etc)
(A couple of side notes.) The Optimizer is smart enough to clean up WHERE 1 AND .... But there are not many things that the Optimizer will handle. In particular, this is not sargable: AND DATE(x) = '2020-02-20', but this does optimize as a "range":
AND x >= '2020-02-20'
AND x < '2020-02-20' + INTERVAL 1 DAY
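The effect of hiding a column in a function call can be observed in a query plan. The sketch below uses SQLite through Python's sqlite3 purely as a stand-in; MySQL's EXPLAIN output looks different, and the table `t` and index `idx_x` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x TEXT)")
conn.execute("CREATE INDEX idx_x ON t (x)")


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Column wrapped in a function: the index cannot be used to narrow the rows.
wrapped = plan("SELECT * FROM t WHERE date(x) = ?", ("2020-02-20",))

# Same day expressed as a half-open range on the bare column.
ranged = plan("SELECT * FROM t WHERE x >= ? AND x < ?", ("2020-02-20", "2020-02-21"))

print(wrapped)
print(ranged)
```

In SQLite the first plan is reported as a SCAN and the second as a SEARCH on the index, which mirrors the "not sargable" versus "range" distinction described above.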
Reading
Building indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Sargable: https://en.wikipedia.org/wiki/Sargable
Tips on Many-to-many: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
This depends on how you use it. In MySQL I found that find_in_set is way faster than using JSON when tested on the following commands, so much faster it wasn't even a competition (to be clear, the speed test did not include the set command line):
Fastest
set @ids = (select group_concat(`ID`) from `table`);
select count(*) from `table` where find_in_set(`ID`, @ids);
10 x slower
set @ids = (select json_arrayagg(`ID`) from `table`);
select count(*) from `table` where `ID` member of( @ids );
34 x slower
set @ids = (select json_arrayagg(`ID`) from `table`);
select count(*) from `table` where JSON_CONTAINS(@ids, convert(`ID`, char));
34 x slower
set @ids = (select json_arrayagg(`ID`) from `table`);
select count(*) from `table` where json_overlaps(@ids, json_array(`ID`));
SELECT * FROM tbl_name t1,(select
group_concat('3698520147',',','088586004490',',','868332000057',',',
'081308003405',',','088394000028',',','089541300893',',','0732511000148',',','009191711092',
',','752830528161') as skuid)t
WHERE FIND_IN_SET(t1.sku,t.skuid)>0
I recently had to write a query to filter some specific data that looked like the following:
Let's suppose that I have 3 distinct values that I want to search in 3 different fields of one of my tables on my database, they must be searched in all possible orders without repetition.
Here is an example (to make it easy to understand, I will use named queries notation to show where the values must be placed):
val1 = "a", val2 = "b", val3 = "c"
This is the query I've generated:
SELECT * FROM table WHERE
(fieldA = :val1 AND fieldB = :val2 AND fieldC = :val3) OR
(fieldA = :val1 AND fieldB = :val3 AND fieldC = :val2) OR
(fieldA = :val2 AND fieldB = :val1 AND fieldC = :val3) OR
(fieldA = :val2 AND fieldB = :val3 AND fieldC = :val1) OR
(fieldA = :val3 AND fieldB = :val1 AND fieldC = :val2) OR
(fieldA = :val3 AND fieldB = :val2 AND fieldC = :val1)
What I had to do is generate a query that simulates a permutation without repetition. Is there a better way to do this type of query?
This is OK for 3x3 but if I need to do the same with something bigger like 9x9 then generating the query will be a huge mess.
I'm using MariaDB, but I'm okay accepting answers that can run on PostgreSQL.
(I want to learn if there is a smart way of writing this type of queries without "brute force")
There isn't a much better way, but you can use in:
SELECT *
FROM table
WHERE :val1 in (fieldA, fieldB, fieldC) and
:val2 in (fieldA, fieldB, fieldC) and
:val3 in (fieldA, fieldB, fieldC)
It is shorter at least. And, this is standard SQL, so it should work in any database.
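This form matches the six-way OR only because the three search values are distinct, as the question states. A minimal sketch with Python's sqlite3 (table name `t` and the sample rows are made up) shows both the intended behaviour and what happens if a value is repeated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, fieldA TEXT, fieldB TEXT, fieldC TEXT)"
)
conn.executemany("INSERT INTO t (fieldA, fieldB, fieldC) VALUES (?, ?, ?)", [
    ("a", "b", "c"),   # id 1: a permutation of a, b, c
    ("c", "a", "b"),   # id 2: another permutation
    ("a", "a", "b"),   # id 3: 'c' is missing
    ("a", "b", "d"),   # id 4: 'c' is missing
])

sql = """
    SELECT id FROM t
    WHERE :val1 IN (fieldA, fieldB, fieldC)
      AND :val2 IN (fieldA, fieldB, fieldC)
      AND :val3 IN (fieldA, fieldB, fieldC)
    ORDER BY id
"""

# Distinct values: only true permutations match.
ids = [r[0] for r in conn.execute(sql, {"val1": "a", "val2": "b", "val3": "c"})]

# Repeated value: every row containing both 'a' and 'b' matches, permutation or not.
dup_ids = [r[0] for r in conn.execute(sql, {"val1": "a", "val2": "a", "val3": "b"})]
```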
... I'm okay accepting answers that can run on PostgreSQL. (I want to
learn if there is a smart way of writing this type of queries without "brute force")
There is a "smart way" in Postgres, with sorted arrays.
Integer
For integer values use sort_asc() of the additional module intarray.
SELECT * FROM tbl
WHERE sort_asc(ARRAY[id1, id2, id3]) = '{1,2,3}' -- compare sorted arrays
Works for any number of elements.
Other types
As clarified in a comment, we are dealing with strings.
Create a variant of sort_asc() that works for any type that can be sorted:
CREATE OR REPLACE FUNCTION sort_asc(anyarray)
RETURNS anyarray LANGUAGE sql IMMUTABLE AS
'SELECT array_agg(x ORDER BY x COLLATE "C") FROM unnest($1) AS x';
Not as fast as the sibling from intarray, but fast enough.
Make it IMMUTABLE to allow its use in indexes.
Use COLLATE "C" to ignore sorting rules of the current locale: faster, immutable.
To make the function work for any type that can be sorted, use a polymorphic parameter.
Query is the same:
SELECT * FROM tbl
WHERE sort_asc(ARRAY[val1, val2, val3]) = '{bar,baz,foo}';
Or, if you are not sure about the sort order in "C" locale ...
SELECT * FROM tbl
WHERE sort_asc(ARRAY[val1, val2, val3]) = sort_asc('{bar,baz,foo}'::text[]);
Index
For best read performance create a functional index (at some cost to write performance):
CREATE INDEX tbl_arr_idx ON tbl (sort_asc(ARRAY[val1, val2, val3]));
SQL Fiddle demonstrating all.
My answer assumes there is a Key column that we can single out. The output should be all the keys that meet all 3 values and each field and value being used:
This "should" get you a list of Keys that meet the criteria
SELECT F.KEY
FROM (
SELECT DISTINCT L.Key, L.POS
FROM (
SELECT Key, 'A' AS POS, FieldA AS FIELD FROM table AS A
UNION ALL
SELECT Key, 'B' AS POS, FieldB AS FIELD FROM table AS A
UNION ALL
SELECT Key, 'C' AS POS, FieldC AS FIELD FROM table AS A ) AS L
WHERE L.FIELD IN(:VAL1, :VAL2, :VAL3)
) AS F
GROUP BY F.KEY
HAVING COUNT(*) = 3
Although Gordon's answer is definitely shorter and almost certainly faster as well, I was toying with the idea of how to minimize the code change when the number of combinations increases.
What I came up with is something for Postgres which is by no means shorter, but more "change-friendly":
with recursive params (val) as (
values (1),(2),(3) -- these are the input values
), all_combinations as (
select array[val] as elements
from params
union all
select ac.elements||p.val
from params p
join all_combinations ac
on array_length(ac.elements,1) < (select count(*) from params)
)
select *
from the_table
where array[id1,id2,id3] = any (select elements from all_combinations);
What does it do?
First we create a CTE holding the values we are looking for; the recursive CTE then builds a list of all possible permutations from those values. This list will include too many elements because it will also hold arrays with 1 or two elements.
The final select puts the columns that should be compared into an array and compares that with the permutations generated by the CTE.
Here is a SQLFiddle example: http://sqlfiddle.com/#!15/43066/1
When the number of values (and columns) increases you only need to add the new value to the values row constructor and add the additional column to the array of columns in the where condition.
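For comparison, the same "change-friendly" idea can be done on the client side. This is a sketch, not part of the answer above: `permutation_in_clause` is a hypothetical helper, the `%s` placeholder style is an assumption about the driver, and it relies on the database supporting row-constructor IN lists. `itertools.permutations` yields only full-length arrangements without repetition.

```python
from itertools import permutations


def permutation_in_clause(columns, values):
    """Build '(c1, c2, ...) IN ((%s, ...), ...)' covering every ordering of values."""
    if len(columns) != len(values):
        raise ValueError("need exactly one value per column")
    perms = list(permutations(values))
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    clause = f"({', '.join(columns)}) IN ({', '.join([row] * len(perms))})"
    # Flatten the permutations into one parameter list, in the same order as the rows.
    params = [value for perm in perms for value in perm]
    return clause, params


clause, params = permutation_in_clause(["id1", "id2", "id3"], [1, 2, 3])
```

The number of rows grows factorially (9 values give 362,880 orderings), so this is only practical for small inputs.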
Using a naive approach, I would use the in clause for this job, and since there should not be any repetition, exclude when the fields repeat.
There are also some optimisations you could do.
First you can exclude the last field, since:
A <> B, A <> C
A <> B, B <> C,
Also means that:
C <> B, C <> A
And also, the later comparisons don't need a field that was already compared, since:
A <> B == B <> A
The query would be written as:
SELECT * FROM table
WHERE :val1 in (fieldA, fieldB, fieldC) and
:val2 in (fieldA, fieldB, fieldC) and
:val3 in (fieldA, fieldB, fieldC) and
fieldA not in (fieldB, fieldC) and
fieldB <> fieldC
This is a naive approach; there are probably others which use the MySQL API, but this one does the job.
How can I run MySQL AND and OR queries together instead of as separate queries?
e.g.:
And query:
select * from tablename where name='A' and password="A" and id='A';
Or query:
select * from tablename where name='A' or password="A" or id='A';
These are 2 different queries. Can I combine them into one? What is the syntax?
Use parentheses to group the conditions?
SELECT * FROM table WHERE (X and Y or Z) AND (P and Q or F)
Well, you can just union them but, since one is a subset of the other, it's not strictly necessary:
select * from tablename
where name = 'A' and password = 'A' and id = 'A'
union select * from tablename
where name = 'A' or password = 'A' or id = 'A'
That will give you exactly the same results as if you had just run the second query on its own. That will make sense once you realise that every single row from the first query has a name equal to 'A', so it will match the first part of the where clause in the second query.
If you want duplicate rows for those returned in both queries, just use union all instead of union.
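The subset relationship is easy to check on a few rows. A minimal sketch with Python's sqlite3 (the four sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (name TEXT, password TEXT, id TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", [
    ("A", "A", "A"),   # matches both the AND query and the OR query
    ("A", "x", "y"),   # matches only the OR query
    ("p", "A", "q"),   # matches only the OR query
    ("p", "q", "r"),   # matches neither
])

and_q = "SELECT * FROM tablename WHERE name = 'A' AND password = 'A' AND id = 'A'"
or_q = "SELECT * FROM tablename WHERE name = 'A' OR password = 'A' OR id = 'A'"

# UNION removes duplicates, so the result equals the OR query alone.
union_rows = sorted(conn.execute(f"{and_q} UNION {or_q}").fetchall())
or_rows = sorted(conn.execute(or_q).fetchall())

# UNION ALL keeps the row that both queries return, so it appears twice.
union_all_rows = conn.execute(f"{and_q} UNION ALL {or_q}").fetchall()
```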
If you were using 'A' as just a placeholder and its values are different in the two queries, then you have two approaches. Use a construct like:
... where (name = 'A' and password = 'B' and id = 'C')
or name = 'D' or password = 'E' or id = 'F'
or use the union solution I gave above, something like:
select * from tablename
where name = 'A' and password = 'B' and id = 'C'
union select * from tablename
where name = 'D' or password = 'E' or id = 'F'
(use union all when you know there is no possibility of duplicates between the two queries, - it will save the DBMS the trouble of removing non-existent duplicates - that's not the case with these queries).
The union may give better performance on a DBMS that can hive off the two selects more easily to separate query engines (something that would be more difficult with a single query with a complex where clause). Of course, as with all optimisations, measure, don't guess.
It is not clear what you expect as the result, but my guess is you want a UNION:
SELECT 1 `query`, `name`, `password`, `id`
FROM `tablename` WHERE `name`='A' and `password`='A' and `id`='A'
UNION
SELECT 2 `query`, `name`, `password`, `id`
FROM `tablename` WHERE `name`='A' or `password`='A' or `id`='A'
Note that the first column query in result is required to separate results from the two queries because union of (X and Y) and (X or Y) is always (X or Y).
Use () for such type of conditions
select * from tablename
where name='A' OR password="A" OR id='A' OR
(name='A' AND password="A" AND id='A')
If you want to check for the same string 'A' in every condition, as here, then you will get the same output using the following query:
select * from tablename
where name='A' OR password="A" OR id='A'
Just combine the conditions with WHERE
SELECT * FROM tablename WHERE (name='A' AND password='A' AND id='A') OR name='A' OR password='A' OR id='A'
The parentheses ensure that the whole AND expression matches only if ALL the conditions it contains are true, while the rest matches the OR conditions.