MySQL - Indexing for String Comparison - mysql

This is for InnoDB with MySQL 5.7.
If I have a query like:
SELECT A, B, C FROM TABLE WHERE STRCMP(D, 'somestring') > 0
Is it possible to have an index on D which can be used by the query? i.e. is MySQL smart enough to use the btree index for STRCMP function?
If not, how might I be able to redesign the query (and/or table) such that I can do string comparison on D, and there can be some form of pruning so that it does not have to hit every single row?

Amazing how many wrong answers so far. Let me see if I can not join them.
STRCMP returns -1, 0, or 1, depending on how the arguments compare.
STRCMP(D, 'somestring') > 0 is identical to D > 'somestring'. (Not >=, not =, not LIKE)
Actually, there may be collation differences, but that can be handled if necessary.
Any function, and certain operators, 'hide' columns from use with INDEXes. D > 'somestring' can benefit from an index starting with D; the STRCMP version cannot.
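One way to see the difference for yourself is to run EXPLAIN on both forms. The following is only a sketch: the table name t, the index name idx_d, and the column types are assumptions made up for illustration, not taken from the question.

```sql
-- Hypothetical table; names and types are assumptions.
CREATE TABLE t (
  id INT AUTO_INCREMENT PRIMARY KEY,
  A INT,
  B INT,
  C INT,
  D VARCHAR(100),
  INDEX idx_d (D)
) ENGINE=InnoDB;

-- Column hidden inside a function: the optimizer cannot build a range on idx_d.
EXPLAIN SELECT A, B, C FROM t WHERE STRCMP(D, 'somestring') > 0;

-- Bare column compared to a constant: idx_d is a candidate for a range scan.
EXPLAIN SELECT A, B, C FROM t WHERE D > 'somestring';
```

With a realistically sized table, the first plan would be expected to show a full scan, while the second can show a range access on idx_d. Whether the optimizer actually picks the index still depends on how many rows match.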

Why not just
WHERE D > 'somestring'
This would leverage a typical B-tree index on column D.

If you perform comparisons only by passing column values to a function such as STRCMP(), there is no value in indexing it.
Reference: https://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators.
So, you may use
SELECT A, B, C FROM TABLE WHERE D > 'somestring'
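instead. LIKE does pattern matching rather than ordering, so use it only if that is what you actually want.
Also, you can use the COLLATE clause to force a case-sensitive (binary) comparison, so that case is compared as well. Note that an index on D may not be usable if the forced collation differs from the column's own collation.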
SELECT A, B, C FROM TABLE WHERE D > 'somestring' COLLATE utf8_bin
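If case-sensitive ordering is always wanted, another option is to give the column itself a binary collation, so the index on D is stored in the same order the comparison uses. This is a sketch only, assuming a hypothetical table t whose column D is a VARCHAR(100) in the utf8 character set:

```sql
-- Hypothetical table t; MODIFY redefines the whole column, so restate
-- any other attributes (NOT NULL, DEFAULT, ...) the real column has.
ALTER TABLE t
  MODIFY D VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin;

-- No COLLATE needed in the query now; the comparison uses the column's collation.
SELECT A, B, C FROM t WHERE D > 'somestring';
```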

MySQL will not use an index on a column that is wrapped in a function call.
Use the query below and create an index on column D:
SELECT A, B, C FROM TABLE WHERE D > 'somestring'
To create the index, use:
CREATE INDEX index1 ON TABLE(D);

Related

Does NULL instead of an integer/string in a MySQL WHERE clause have any impact on performance?

For the following query:
SELECT some_value
FROM some_table
WHERE param_one='62627'
AND param_two='1'
AND param_three=QUESTIONABLE_VALUE
Does it have any impact on the performance if QUESTIONABLE_VALUE is a null or an integer/string?
IS NULL Optimization. MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL.
For this to help, you need a composite index such as:
`idx` (`param_one`,`param_two`,`param_three`)
Use EXPLAIN to check how the optimizer behaves with different data types.
If you have a parameter that may be null or may be non-null, and you want to match data that is the same, you would use the <=> operator (null-safe comparison).
SELECT some_value
FROM some_table
WHERE param_one='62627'
AND param_two='1'
AND param_three <=> QUESTIONABLE_VALUE
With =, the result will be null if either operand is null.
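The difference between the two operators can be checked directly with constant expressions (the column aliases are just labels chosen for this example):

```sql
SELECT
  NULL =   NULL AS eq_null_null,       -- NULL
  NULL <=> NULL AS nullsafe_null_null, -- 1
  1    =   NULL AS eq_one_null,        -- NULL
  1    <=> NULL AS nullsafe_one_null;  -- 0
```

A WHERE clause only keeps rows for which the condition is true, so a NULL result filters the row out just as 0 does.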
Your Title, plus the two answers so far, point out issues with testing against NULL. I want to point out an issue with "integer/string".
AND param_three = 123
can use an index if param_three is some numeric type, but not if it is a VARCHAR. In that case the column value in every row has to be converted to a number to perform the test.
The other way works fine -- these work equally well if param_two is numeric because the string literal is converted to numeric.
AND param_two='1'
AND param_two=1
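A quick way to check this is to compare the two plans. The table and index names below are hypothetical, chosen only for this sketch:

```sql
-- Hypothetical table with an indexed string column.
CREATE TABLE some_table_demo (
  param_three VARCHAR(20),
  some_value  INT,
  INDEX idx_p3 (param_three)
);

-- String column compared to a number: each row's value must be converted,
-- so a lookup through idx_p3 is not possible.
EXPLAIN SELECT some_value FROM some_table_demo WHERE param_three = 123;

-- String column compared to a string: idx_p3 can be used for the lookup.
EXPLAIN SELECT some_value FROM some_table_demo WHERE param_three = '123';
```

Running SHOW WARNINGS after the first EXPLAIN may also report that the index could not be used because of the type conversion.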

Did `<> ##` ever match `NULL` for some older version of MySQL

There's a WHERE foo <> ## clause in a MySQL query that currently is excluding rows where foo IS NULL. The contention is that a listing based off this query used to include such rows. Did the <> ## operator ever include NULL rows for some past versions of MySQL?
Comparing anything to NULL with an operator like =, <, >, !=, <>, LIKE, IN(), etc. returns NULL, signifying an unknown boolean state.
This has always been the case in MySQL, as it should be, because ANSI SQL defines NULL semantics that way.
MySQL has an operator <=> which can compare NULLs. See https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_equal-to
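As a small illustration, assuming a throwaway table null_demo and using 5 as a stand-in for the ## in the question:

```sql
CREATE TABLE null_demo (foo INT);
INSERT INTO null_demo VALUES (1), (5), (NULL);

-- NULL <> 5 evaluates to NULL, so the NULL row is filtered out: returns 1 only.
SELECT foo FROM null_demo WHERE foo <> 5;

-- Two ways to keep the NULL rows: both return 1 and NULL.
SELECT foo FROM null_demo WHERE foo <> 5 OR foo IS NULL;
SELECT foo FROM null_demo WHERE NOT (foo <=> 5);
```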

Select all records after the CHAR c

I have a column(char) with values between A and Z
I only want to select the records where the char is >= 'C'
Can anyone help me with this?
I tried >= 'C' but this didn't work. Also I couldn't find anything about this on the internet. So I thought it's a good question to ask.
You can use the ascii value for comparison.
select * from tablename where ascii(colname) >= ascii('C')
Here is another method, though note that it returns the part of each value after the last 'c' rather than filtering rows.
SELECT SUBSTRING_INDEX(YourColumn,'c',-1) FROM Yourtable;
Strings can be compared in MySQL with regular comparison operators, so this should work:
SELECT * FROM tablename WHERE col >= 'C'
Do note that the exact sort order (mainly case sensitivity) for strings depends on the collation of your character set. Maybe that is the reason why it didn't work for you.
You can also use the ASCII() function, which returns the numeric code of the leftmost character of a string, and compare those:
SELECT * FROM tablename WHERE ASCII(col) >= ASCII('C')
Note that this only works for single-byte characters. For multi-byte characters you must use ORD() instead of ASCII(). It also compares character codes, so it is case-sensitive regardless of collation.
Yet another way is to use STRCMP(), which compares two strings (again, using the sort order of your collation) and returns 0 if the strings are the same, -1 if the first argument is smaller than the second, and 1 otherwise.
SELECT * FROM tablename WHERE STRCMP(col, 'C') >= 0
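To see how much the collation matters, the same comparison can be evaluated under a case-insensitive and a binary collation. This sketch uses the utf8mb4 character set with explicit introducers so it does not depend on the connection settings. The aliases are just labels.

```sql
SELECT
  _utf8mb4'a' COLLATE utf8mb4_general_ci >= _utf8mb4'C' AS case_insensitive, -- 0: 'a' sorts before 'C'
  _utf8mb4'a' COLLATE utf8mb4_bin        >= _utf8mb4'C' AS binary_order;     -- 1: code 0x61 >= 0x43
```

So with a binary collation, lowercase 'a' and 'b' also satisfy >= 'C', which may or may not be what is wanted.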

With which clauses and keywords can the 'NOT' keyword be used in MySQL?

There is a keyword 'NOT' in MySQL.
For example, consider the following SQL query in MySQL:
SELECT * FROM Customers WHERE Country NOT LIKE '%land%';
In the above query, the 'NOT' keyword is used along with the LIKE keyword.
I want to know along with which clauses and keywords the 'NOT' keyword is used specifically in MySQL database queries.
Can someone please provide me help in this regard with example queries?
Thank you.
The NOT keyword can be used with any boolean expression (and hence actually with any numeric expression but let's not go there). For example:
where not (a = b)
where not (a is null)
where not (a in ('x', 'y', 'z'))
where not (a = 1 or b = 2 or c = 3)
That is the keyword not. There are at least three other infix operators that contain not: not in, not like, and is not null. For these, the not is part of the operator name, not a separate keyword.
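A few more forms, sketched against the Customers table from the question. The columns CustomerID and the Orders table are assumptions added for illustration.

```sql
-- NOT as part of an operator name
SELECT * FROM Customers WHERE Country NOT IN ('Finland', 'Poland');
SELECT * FROM Customers WHERE CustomerID NOT BETWEEN 10 AND 20;
SELECT * FROM Customers WHERE Country NOT REGEXP '^A';
SELECT * FROM Customers WHERE Country IS NOT NULL;

-- NOT in front of a subquery predicate
SELECT *
FROM Customers c
WHERE NOT EXISTS (
  SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID
);
```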

Given 3 possible values for X, is it faster to do WHERE (X = B OR X = C), or to do WHERE X != A?

Given 3 possible values for X (A, B, C), is it faster to do:
WHERE (X = 'B' OR X = 'C'), or
WHERE X != 'A'
Or does it depend? If so, then what does it depend on?
Option 1:
WHERE (X = 'B' OR X = 'C')
and
WHERE X IN ('B', 'C')
are equivalent and may use an index on (X).
Option 2:
WHERE X != 'A'
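will generally not use an index on (X) when most rows match, although the optimizer can treat <> as a range condition, so check with EXPLAIN. See a comment by Henrik Grubbström at the MySQL docs, How MySQL Optimizes WHERE Clauses page: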
Indexes are ignored for the <> operator:
So, if the use of index makes the query faster (for example, if 99% of the table has X = 'A'), use the first option.
Note: The != operator is a synonym (in MySQL) of the SQL-standard <> inequality operator.
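Rather than guessing, the two options can be compared with EXPLAIN on your own data. The table below is a hypothetical stand-in. The payload column is there only so that the index on X is not a covering index.

```sql
-- Hypothetical table; names and types are assumptions.
CREATE TABLE x_demo (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  X       CHAR(1),
  payload VARCHAR(100),
  INDEX idx_x (X)
);

EXPLAIN SELECT * FROM x_demo WHERE X IN ('B', 'C'); -- Option 1
EXPLAIN SELECT * FROM x_demo WHERE X != 'A';        -- Option 2
```

The plans depend on the table's statistics, so the result on an empty or tiny table says little. Load data with a representative distribution of X before drawing conclusions.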
Your second operation should be faster because it requires one less logical check. If it's scanning a value, it only has to check to make sure it's not A, where your first operation would need to match B and then if there is no match, C. Regarding the use of an index, it depends on what your index looks like and how it's being called. If you have an index on columns W, X and you only filter X, the index will not be used as indexes work left-to-right.
Direct equality (=) and inequality (!=) take the same time. In the best case your queries will run in the same time, but in the worst case, case 1 could be slower as you're adding another condition to check with the OR.
Of course, whether there are indexes and how the values of X are distributed can also affect performance...
In my opinion the second option is better because it is always only one comparison; in the first option, if the value to be tested is 'C' or 'A', you have to do 2 comparisons: the first (X = 'B') will fail and then the second comparison gives the final result.
If case 1 uses an index, which in my view it should if there is an index on X, it will be faster than case 2 if case 2 doesn't use an index, which in my view it won't. In general. It also depends on the actual distribution of values: if significantly skewed, results will vary accordingly.