Given 3 possible values for X, is it faster to do WHERE (X = B OR X = C), or to do WHERE X != A? - mysql

Given 3 possible values for X (A, B, C), is it faster to do:
WHERE (X = 'B' OR X = 'C'), or
WHERE X != 'A'
Or does it depend? If so, then what does it depend on?

Option 1:
WHERE (X = 'B' OR X = 'C')
and
WHERE X IN ('B', 'C')
are equivalent and may use an index on (X).
Option 2:
WHERE X != 'A'
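may not use an index on (X); check with EXPLAIN, since newer MySQL versions can treat != and <> on a B-tree index as a range condition. See a comment by Henrik Grubbström on the MySQL docs page How MySQL Optimizes WHERE Clauses: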
Indexes are ignored for the <> operator:
So, if the use of index makes the query faster (for example, if 99% of the table has X = 'A'), use the first option.
Note: The != operator is a synonym (in MySQL) of the SQL-standard <> inequality operator.
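One way to settle this for a particular server is to compare the plans directly. The following is only a sketch: the table demo_x, its payload column, and the index name idx_x are made up for illustration, and the plan you see will depend on the MySQL version and on how the values of X are distributed, so load representative data before trusting the output.

```sql
-- Hypothetical table; all names here are assumptions for illustration.
CREATE TABLE demo_x (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  X       CHAR(1) NOT NULL,
  payload VARCHAR(100),
  INDEX idx_x (X)
);

-- Option 1: the OR form and the IN form are expected to get the same plan.
EXPLAIN SELECT * FROM demo_x WHERE (X = 'B' OR X = 'C');
EXPLAIN SELECT * FROM demo_x WHERE X IN ('B', 'C');

-- Option 2: compare the "type" and "key" columns with the plans above.
EXPLAIN SELECT * FROM demo_x WHERE X != 'A';
```

If the key column is NULL for a query, the optimizer chose not to use idx_x for it.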

Your second operation should be faster because it requires one fewer logical check. When scanning a value, it only has to check that the value is not A, whereas your first operation has to try to match B and then, if there is no match, C. Regarding the use of an index, it depends on what your index looks like and how it's being used. If you have a composite index on columns (W, X) and you only filter on X, the index will not be used, because composite indexes are matched left to right.
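The left-to-right point can be checked with EXPLAIN. This is a rough sketch with a made-up table demo_wx and index idx_w_x (both names are assumptions); the payload column is there so that the index does not cover the whole row.

```sql
-- Hypothetical table with a composite index on (W, X).
CREATE TABLE demo_wx (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  W       INT NOT NULL,
  X       CHAR(1) NOT NULL,
  payload VARCHAR(100),
  INDEX idx_w_x (W, X)
);

-- Filters on W and X: both parts of the index can be used.
EXPLAIN SELECT * FROM demo_wx WHERE W = 1 AND X = 'B';

-- Filters on W only: the leftmost part of the index can still be used.
EXPLAIN SELECT * FROM demo_wx WHERE W = 1;

-- Filters on X only: X is not a leftmost prefix of (W, X).
EXPLAIN SELECT * FROM demo_wx WHERE X = 'B';
```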

Direct equality (=) and inequality (!=) take the same time. Best case, your queries will run in the same time; worst case, case 1 could be slower because the OR adds another condition to check.
Of course, whether there are indexes and how the values of X are distributed can also affect performance.

In my opinion the second item is better because it is always only one comparison. In the first item, if the value being tested is 'C' or 'A', you need 2 comparisons: the first (X = 'B') fails, and then the second comparison gives the final result.

In general, if case 1 uses an index (which in my view it should if there is an index on X), it will be faster than case 2 if case 2 doesn't use an index (which in my view it won't). It also depends on the actual distribution of values: if it is significantly skewed, results will vary accordingly.

Related

Need a way to make a MySQL query factor in or exclude condition in Where clause based on what conditions are specified or not specified

I use Chartio to create dashboards. I'm able to use variables with Chartio that can fill in sections of a MySQL query and then pump out a cool looking graph. I have a situation where I need a query that can have any combination of 3 variables X, Y, Z as shown below.
SELECT orderid
FROM orders
WHERE productcode IN (X) AND
status IN (Y) AND
date IN (Z);
I need to have the ability for the query to "determine" that if I only give it X, ignore Y and Z as a condition, for example. Or if I give it X and Y, ignore Z. I could give it any combinations of those three. By "ignore" I mean not use it as a condition in the WHERE clause.
Is this possible using OR? REGEXP? Wildcards? ...? I'm not very well versed in MySQL. Thanks in advance
If Chartio sets the variable to an empty string when the user leaves the field out, you can write:
SELECT orderid
FROM orders
WHERE (X = '' OR productcode = X) AND
(Y = '' OR status = Y) AND
(Z = '' OR date = Z);
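To see how this pattern behaves, here is a small sketch of what the query might look like after the variables have been substituted as plain text. It assumes a made-up table demo_orders with only two of the three filters, and it assumes the tool quotes the substituted values as string literals; neither is confirmed for Chartio.

```sql
-- Hypothetical data; table and values are for illustration only.
CREATE TABLE demo_orders (
  orderid     INT PRIMARY KEY,
  productcode VARCHAR(20) NOT NULL,
  status      VARCHAR(20) NOT NULL
);
INSERT INTO demo_orders VALUES
  (1, 'P1', 'open'),
  (2, 'P1', 'closed'),
  (3, 'P2', 'open');

-- X = 'P1', Y left empty: the status condition is switched off.
SELECT orderid
FROM demo_orders
WHERE ('P1' = '' OR productcode = 'P1') AND
      (''   = '' OR status = '')
ORDER BY orderid;            -- returns 1 and 2

-- X left empty, Y = 'open': the productcode condition is switched off.
SELECT orderid
FROM demo_orders
WHERE (''     = '' OR productcode = '') AND
      ('open' = '' OR status = 'open')
ORDER BY orderid;            -- returns 1 and 3
```

Each pair of parentheses is true either because the variable is empty or because the column matches it, so an empty variable never filters anything out.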

MySQL - Indexing for String Comparison

This is for InnoDB with MySQL 5.7.
If I have a query like:
SELECT A, B, C FROM TABLE WHERE STRCMP(D, 'somestring') > 0
Is it possible to have an index on D which can be used by the query? i.e. is MySQL smart enough to use the btree index for STRCMP function?
If not, how might I be able to redesign the query (and/or table) such that I can do string comparison on D, and there can be some form of pruning so that it does not have to hit every single row?
Amazing how many wrong answers so far. Let me see if I can not join them.
STRCMP returns -1, 0, or 1, depending on how the arguments compare.
STRCMP(D, 'somestring') > 0 is identical to D > 'somestring'. (Not >=, not =, not LIKE)
Actually, there may be collation differences, but that can be handled if necessary.
Any function, and certain operators, 'hide' columns from use with INDEXes. D > 'somestring' can benefit from an index starting with D; the STRCMP version cannot.
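A quick way to see the difference is to run EXPLAIN on both forms. This is a sketch only: the table demo_str and the index idx_d are made-up names, and the exact plan depends on the data and the server version.

```sql
-- Hypothetical table with a B-tree index on D.
CREATE TABLE demo_str (
  A INT,
  B INT,
  C INT,
  D VARCHAR(50) NOT NULL,
  INDEX idx_d (D)
);

-- D is wrapped in a function call, so idx_d cannot be used to narrow the rows.
EXPLAIN SELECT A, B, C FROM demo_str WHERE STRCMP(D, 'somestring') > 0;

-- The bare column comparison is a range condition that idx_d can serve.
EXPLAIN SELECT A, B, C FROM demo_str WHERE D > 'somestring';
```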
Why not just
WHERE D > 'somestring'
This would leverage a typical B-tree index on column D.
If you perform comparisons only by passing column values to a function such as STRCMP(), there is no value in indexing it.
Reference: https://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators.
So, you may use
SELECT A, B, C FROM TABLE WHERE D > 'somestring'
instead, while LIKE is more strict and could be what you expect.
Also, you can use the COLLATE operator to convert the column to a case-sensitive collation, to make sure it's comparing the case as well.
SELECT A, B, C FROM TABLE WHERE D > 'somestring' COLLATE utf8_bin
MySQL will not use indexes for function calls.
Use the query below and create an index on column D:
SELECT A, B, C FROM TABLE WHERE D ='somestring'
To create the index, use the query below:
create index index1 on TABLE(D);

Trying to figure out the best index possible for this SQL Query

I have a table that contains 300,000 rows of test data, and I'm trying to come up with an appropriate index for this select statement, however I do believe nothing will work very efficiently and I might need to adjust my approach.
x and y can be anywhere from 1 to 9, and p can be literally any value at or above x * y (so an x of 4 and a y of 4 gives 16, but p could be any value above that; there is no limit. An x of 1 and a y of 1 might have a p of 1000, or might just be 1. Another x of 9 and y of 9 might have a p of 81, might be 100, might be 10000; it has no limit.)
SELECT `x`,
`y`
FROM `table`
WHERE `x` <= '9'
AND `y` <= '9'
AND `used` = '0'
ORDER BY `p` DESC
LIMIT 1
I've created an index of...
x, y, used, price
... which works brilliantly well for specific values of x and y, but when asking for a range, this obviously takes a lot more work.
Can anyone see an efficient way of doing this?
For this query:
SELECT `x`, `y`
FROM `table`
WHERE `x` <= '9' AND `y` <= '9' AND `used` = '0'
ORDER BY `p` DESC
LIMIT 1;
Note: If any of the values are numbers, then drop the single quotes for the comparison. Under some circumstances, comparing strings and numbers can make it harder for the optimizer to do its work.
There is no great index for this. The best is on (used, x, y, p). The query will still require a file sort. One problem is the inequalities in the WHERE clause. The second is that MySQL (before 8.0) does not implement the DESC option for columns in an index.
There are 3 possible indexes, but the optimizer is unlikely to consistently pick the best. All start by filtering on the one useful thing, used = 0:
INDEX(used, price) -- This scans from top price down; it wins if
-- a good x&y are found soon
INDEX(used, x) -- This wins if there are few rows with x<9
INDEX(used, y) -- This wins if there are few rows with y<9
If you have simplified the query, then all bets are off. Especially in the following:
By having a "covering index", you get an added boost:
INDEX(used, price, x, y)
INDEX(used, x, y, price)
INDEX(used, y, price, x)
That is, if you really have SELECT x,y,z FROM ..., then these are no longer "covering", and (because of being bulkier) are worse.
Add the 3 indexes and hope that the Optimizer guesses right.
Note: Is used a true/false flag? Do not use INT for flags; it takes 4 bytes. Use TINYINT or ENUM instead; they take only 1 byte.
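Putting the suggestion together, a sketch might look like the following. The table name demo_grid and the index names are made up, and it assumes the p column from the question is the same thing as the price column named in the indexes above.

```sql
-- Hypothetical table; "price" is assumed to be the question's p column.
CREATE TABLE demo_grid (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  x     TINYINT NOT NULL,
  y     TINYINT NOT NULL,
  used  TINYINT NOT NULL DEFAULT 0,
  price INT NOT NULL,
  INDEX idx_used_price (used, price),
  INDEX idx_used_x     (used, x),
  INDEX idx_used_y     (used, y)
);

-- Numeric literals are unquoted, as recommended earlier in this thread.
-- The "key" column of the output shows which index the optimizer picked.
EXPLAIN
SELECT x, y
FROM demo_grid
WHERE x <= 9 AND y <= 9 AND used = 0
ORDER BY price DESC
LIMIT 1;
```

Running the EXPLAIN again after loading realistic data is worthwhile, since the choice between the three indexes depends on the row counts the optimizer estimates.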

SQL Query: boolean processing

I have no idea if this is the right forum or not. Let's say I have the following:
SELECT *
FROM MyTable m
WHERE ((A OR B) AND (C OR D))
Assume that A, B, C, D are proper boolean clauses that each need to be evaluated on a row-level basis. Let's also assume no indexes.
This is logically equivalent to:
SELECT *
FROM MyTable m
WHERE (A AND C)
OR (A AND D)
OR (B AND C)
OR (B AND D)
Is there a performance advantage to either one? We're on MSSql-2008.
My understanding is that your first case is more efficient, because:
in this clause:
WHERE ((A OR B) AND (C OR D))
the entire statement fails if neither A nor B is true; the second part of the statement, (C OR D), is not evaluated. Even if A OR B is true, there is only one more pair to check: C OR D. Worst case is that four criteria are checked before the statement as a whole can be evaluated (if A = False, B = True, and C = False, so that D decides the result). Best case is that the statement becomes false after checking only A and B: if neither is true, then the entire statement is false.
In your second case, all four AND pairs may have to be evaluated before the statement as a whole can be evaluated.
Nesting the OR conditionals inside the AND means that if the first pair fails, you move along; nothing more of interest here. You improve things even more if you place the pair most likely to be false first.
I will be interested to hear from others on this . . .
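The logical equivalence of the two forms (not their relative speed) can be checked by brute force over all 16 combinations of A, B, C and D. The sketch below uses MySQL syntax, where 0 and 1 act as false and true; SQL Server would need the conditions rewritten, for example with CASE expressions. The aliases a, b, c, d and the column name v are arbitrary.

```sql
-- Count the truth assignments on which the two forms disagree.
SELECT COUNT(*) AS mismatches
FROM       (SELECT 0 AS v UNION ALL SELECT 1) AS a
CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 1) AS b
CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 1) AS c
CROSS JOIN (SELECT 0 AS v UNION ALL SELECT 1) AS d
WHERE ((a.v OR b.v) AND (c.v OR d.v))
   <> ((a.v AND c.v) OR (a.v AND d.v) OR (b.v AND c.v) OR (b.v AND d.v));
```

By the distributive law the count is 0, so any performance difference comes from how the engine evaluates the conditions, not from what they mean.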

How do I explain a query with parameters in MySQL

I have a query
SELECT foo FROM bar WHERE some_column = ?
Can I get an explain plan from MySQL without filling in a value for the parameter?
So long as you're doing only an equals (and not a LIKE, which can have short-circuit effects), simply replace it with a value:
EXPLAIN SELECT foo FROM bar WHERE some_column = 'foo';
Since it's not actually executing the query, the results shouldn't differ from the actual. There are some cases where this isn't true (I mentioned LIKE already). Here's an example of the different cases of LIKE:
SELECT * FROM a WHERE a.foo LIKE ?
Param 1 == Foo - Can use an index scan if an index exists.
Param 1 == %Foo - Requires a full table scan, even if an index exists
Param 1 == Foo% - May use an index scan, depending on the cardinality of the index and other factors
Another case is a join where the WHERE clause yields an impossible combination (and hence short-circuits). For example:
SELECT * FROM a JOIN b ON a.id = b.id WHERE a.id = ? AND b.id = ?
If the first and second parameters are the same, it has one execution plan, and if they are different, it will short circuit (and return 0 rows without hitting any data)...
There are others, but those are all I can think of off the top of my head right now...
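The LIKE cases above can be compared by running EXPLAIN with each kind of value in place of the parameter. This is a sketch with a made-up table demo_a and index idx_foo (both assumptions); the payload column keeps the index from covering the whole row, and the plans will vary with the data.

```sql
-- Hypothetical table with an index on foo.
CREATE TABLE demo_a (
  id      INT PRIMARY KEY,
  foo     VARCHAR(50),
  payload VARCHAR(100),
  INDEX idx_foo (foo)
);

-- No wildcard: behaves like an equality lookup.
EXPLAIN SELECT * FROM demo_a WHERE foo LIKE 'Foo';

-- Leading wildcard: the index cannot be used to narrow the search.
EXPLAIN SELECT * FROM demo_a WHERE foo LIKE '%Foo';

-- Trailing wildcard: a range on the index is possible.
EXPLAIN SELECT * FROM demo_a WHERE foo LIKE 'Foo%';
```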
The explain plan may be different depending on what you put in. I think explain plans without real parameter don't mean anything.
I don't think it's possible.
WHERE some_column ='value', WHERE some_column = other_column and WHERE some_column = (SELECT .. FROM a JOIN b JOIN c ... WHERE ... ORDER BY ... LIMIT 1 ) return different execution plans.