Using LIKE vs. = for exact string match - mysql

First off, I recognize the differences between the two:
- LIKE makes the wildcards % and _ available
- trailing whitespace is significant
- collation issues
All other things being equal, for an exact string match which is more efficient:
SELECT field WHERE 'a' = 'a';
Or:
SELECT field WHERE 'a' LIKE 'a';
Or: Is the difference so insignificant that it doesn't matter?

I would say that the = comparator would be faster. With =, the engine doesn't have to hand the comparison to a separate pattern-matching routine to do general matches. Instead it is able to just match or move on. Our db at work has millions of rows, and = is always faster there.

In a decent DBMS, the DB engine would recognise that there were no wildcard characters in the string and implicitly turn it into a pure equality (not necessarily the same as =). So, you'd only get a small performance hit at the start, usually negligible for any decent-sized query.
However, the MySQL = operator doesn't necessarily act the way you'd expect (as a pure equality check). Specifically, under PAD SPACE collations (every collation before MySQL 8.0) it doesn't take trailing spaces into account for CHAR and VARCHAR data, meaning that:
SELECT age FROM people WHERE name = 'pax';
will give you rows for 'pax', 'pax<one space>' and 'pax<a hundred spaces>'.
If you want to do a proper equality check, you use the binary keyword:
SELECT age FROM people WHERE name = BINARY 'pax';
You can test this with something like:
mysql> create table people (name varchar(10));
mysql> insert into people value ('pax');
mysql> insert into people value ('pax ');
mysql> insert into people value ('pax  ');
mysql> insert into people value ('pax   ');
mysql> insert into people value ('notpax');
mysql> select count(*) from people where name like 'pax';
1
mysql> select count(*) from people where name = 'pax';
4
mysql> select count(*) from people where name = binary 'pax';
1
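The counts above come down to whether trailing spaces take part in the comparison. As a rough model only (not how MySQL is implemented), the three predicates can be imitated in Python. The function names are made up for illustration, and the sketch assumes a PAD SPACE collation and ignores case-insensitivity:

```python
# Toy model of the three comparisons; these names are hypothetical, not a MySQL API.

def pad_space_equal(a: str, b: str) -> bool:
    """Model of `=` under a PAD SPACE collation: trailing spaces are ignored."""
    return a.rstrip(" ") == b.rstrip(" ")


def like_no_wildcards(value: str, pattern: str) -> bool:
    """Model of LIKE with a wildcard-free pattern: trailing spaces count."""
    return value == pattern


def binary_equal(a: str, b: str) -> bool:
    """Model of `= BINARY`: byte-for-byte comparison."""
    return a.encode("utf-8") == b.encode("utf-8")


# Same rows as the people table: 'pax' with zero to three trailing spaces, plus 'notpax'.
names = ["pax", "pax ", "pax  ", "pax   ", "notpax"]

like_count = sum(like_no_wildcards(n, "pax") for n in names)
equal_count = sum(pad_space_equal(n, "pax") for n in names)
binary_count = sum(binary_equal(n, "pax") for n in names)
print(like_count, equal_count, binary_count)
```

Under a NO PAD collation the model for = would be plain equality, so all three counts would agree.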

Lookup against MYSQL TEXT type column

My table/model has a TEXT column. When I filter records on the model itself, the Active Record where produces the correct SQL and returns the correct results. Here is what I mean:
MyNamespace::MyValue.where(value: 'Good Quality')
Produces this SQL :
SELECT `my_namespace_my_values`.*
FROM `my_namespace_my_values`
WHERE `my_namespace_my_values`.`value` = '\\\"Good Quality\\\"'
Take another example where I'm joining MyNamespace::MyValue and filtering on the same value column, but from the other model (which has a relation to my_values). See this (query #2):
OtherModel.joins(:my_values).where(my_values: { value: 'Good Quality' })
This does not produce the correct query. It filters on the value column as if it were a String column and not Text, and therefore produces incorrect results (only pasting the relevant WHERE):
WHERE `my_namespace_my_values`.`value` = 'Good Quality'
Now I can get past this by using LIKE inside my AR where, which produces the correct result but a slightly different query. This is what I mean:
OtherModel.joins(:my_values).where('my_values.value LIKE ?', '%Good Quality%')
Finally, arriving at my questions. What is this, and how is it being generated for where on the model (for text column type)?
WHERE `my_namespace_my_values`.`value` = '\\\"Good Quality\\\"'
Maybe the most important question: what is the difference in terms of performance between this:
WHERE `my_namespace_my_values`.`value` = '\\\"Good Quality\\\"'
and this :
(my_namespace_my_values.value LIKE '%Good Quality%')
And more importantly, how do I get my query with joins (query #2) to produce a where like this:
WHERE `my_namespace_my_values`.`value` = '\\\"Good Quality\\\"'
(Partial answer -- approaching from the MySQL side.)
What will/won't match
Case 1: (I don't know where the extra backslashes and quotes come from.)
WHERE `my_namespace_my_values`.`value` = '\\\"Good Quality\\\"'
\"Good Quality\" -- matches
Good Quality -- does not match
The product has Good Quality. -- does not match
Case 2: (Find Good Quality anywhere in value.)
WHERE my_namespace_my_values.value LIKE '%Good Quality%'
\"Good Quality\" -- matches
Good Quality -- matches
The product has Good Quality. -- matches
Case 3:
WHERE `my_namespace_my_values`.`value` = 'Good Quality'
\"Good Quality\" -- does not match
Good Quality -- matches
The product has Good Quality. -- does not match
Performance:
If value is declared TEXT, all cases are slow.
If value is not indexed, all are slow.
If value is VARCHAR(255) (or smaller) and indexed, Cases 1 and 3 are faster. It can quickly find the one row, versus checking all rows.
Phrased differently:
LIKE with a leading wildcard (%) is slow.
Indexing the column is important for performance, but a TEXT column can only be indexed with a prefix length, for example INDEX(value(50)).
What is this and how is it being generated for where on the model (for text column type)?
That's generated behind the scenes by Active Record's query builder (Arel).
See my answer to your second question below as to why.
What is the difference in terms of performance using...
= compares whole strings, while LIKE matches character by character.
In my projects I have tables with millions of rows, and in my experience it is much faster to use the = comparator or a regexp than a LIKE in a query.
How do I get my query with joins (query #2) produce where like this...
Can you try this,
OtherModel.joins(:my_values).where(MyNamespace::MyValue.arel_table[:value].eq('\\\"Good Quality\\\"'))
I think it might be helpful.
From the MySQL documentation on LIKE:
To search for \n, specify it as \\n. To search for \, specify it as \\\\; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
LIKE and = are different operators.
= is a comparison operator that operates on numbers and strings. When comparing strings, the comparison operator compares whole strings.
LIKE is a string operator that compares character by character.
mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci;
+-----------------------------------------+
| 'ä' LIKE 'ae' COLLATE latin1_german2_ci |
+-----------------------------------------+
|                                       0 |
+-----------------------------------------+
mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;
+--------------------------------------+
| 'ä' = 'ae' COLLATE latin1_german2_ci |
+--------------------------------------+
|                                    1 |
+--------------------------------------+
The = operator looks for an exact match, while LIKE does pattern matching, with '%' playing the role that '*' plays in shell globs (or '.*' in regular expressions).
So if you have entries with
Good Quality
More Good Quality
only LIKE (with a pattern such as '%Good Quality') will get both results.
Regarding the escaped string, I am not sure where it is generated, but it looks like some standardized escaping to make the value valid for SQL.

Use of case in MySQL LIKE query

If I have a table with a column called TITLE which contains text of mixed case, e.g.
Vinashin to Receive Government Loans to Pay Workers
German government concerned over rise in inflation
Is it possible to perform an SQL LIKE query such as:
SELECT * FROM MYTABLE WHERE TITLE LIKE '%Government%'
But which would only return the first row and not the second?
MySQL's LIKE seems to ignore case.
From the documentation:
The following two statements illustrate that string comparisons are not case sensitive unless one of the operands is a binary string:
mysql> SELECT 'abc' LIKE 'ABC';
-> 1
mysql> SELECT 'abc' LIKE BINARY 'ABC';
-> 0
So you can use LIKE BINARY '%Government%' to make a case-sensitive comparison.
You can use:
LIKE BINARY
instead of LIKE, and the match will be case-sensitive.

MySQL Remove characters from column headers

All my column headers in a MySQL database are prefixed with a number, 1_X, 2_X, etc... which makes bringing the data into IDL impossible using just a basic select statement to bring in the entire table. I'm not sure but I see two possible ways:
1) Bring in the table with column name aliases. Can I use TRIM or SUBSTRING_INDEX to remove/replace the first two characters?
2) Create a routine that uses the information schema to go through each table, strip the first two characters of the column headers, create a new table with those headers, and copy the data in.
If there weren't so many different tables (all with 1_X, 2_X, etc...) there'd be no problem manually selecting 1_X AS X but that's not feasible. It would be great to be able to use TRIM/SUBSTRING on column headers in the select statement.
Thanks.
It's not possible to use functions in a SQL statement to alter the identifier assigned to a column being returned. The SQL way of specifying the identifier for the column in a resultset is to use the expr AS alias approach.
Rather than trim off the leading digit characters, you could prepend the identifiers with another valid character. (Trimming off leading characters seems like it would potentially lead to another problem, duplicate and/or zero length column names.)
You could just use a SQL statement to generate the SELECT list for you.
(NOTE: the GROUP_CONCAT function is limited by some system/session variables: group_concat_max_len and max_allowed_packet, it's easy enough to adjust these higher, though changing global max_allowed_packet may require MySQL to be restarted.)
To get the SELECT list back all on one line (assuming you won't overflow the GROUP_CONCAT limits), use something like:
SELECT c.table_schema
, c.table_name
, GROUP_CONCAT(
CONCAT('t.`',c.column_name,'` AS `x',c.column_name,'`')
ORDER BY c.ordinal_position
) AS select_list_expr
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase'
GROUP BY c.table_schema, c.table_name
Or, you could even get back a whole SELECT statement, if you wrapped that GROUP_CONCAT expression (which produces the select list) in another CONCAT
Something like this:
SELECT CONCAT('SELECT '
, GROUP_CONCAT(
<select_list_expr>
)
, ' FROM `',c.table_schema,'`.`',c.table_name,'` t;'
) AS stmt
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase'
GROUP BY c.table_schema, c.table_name
You could use a more clever expression for <select_list_expr>, to check for leading "digit" characters, and assign an alias to just those columns that need it, and leave the other columns unchanged, though that again introduces the potential for returning duplicate column names.
That is, if you already have columns named '1_X' and 'x1_X' in the same table. But a carefully chosen leading character may avoid that problem...
The <select_list_expr> could be more clever by doing a conditional test for leading digit character, something like this:
SELECT CONCAT('SELECT '
, GROUP_CONCAT(
CASE
WHEN c.column_name REGEXP '^[[:digit:]]'
THEN CONCAT('t.`',c.column_name,'` AS `x',c.column_name,'`')
ELSE CONCAT('t.`',c.column_name,'`')
END
)
, ' FROM `',c.table_schema,'`.`',c.table_name,'` t;'
) AS stmt
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase'
GROUP BY c.table_schema, c.table_name
Again, there's a potential for generating "duplicate" column names with this approach. The conditional test "c.column_name REGEXP" could be extended to check for other "invalid" leading characters as well.
As a side note, at some point, someone thought it a "good idea" to name columns with leading digit characters. Just because something is allowed doesn't mean it's a good idea.
Then again, maybe all that rigamarole isn't necessary, and just wrapping the column names in backticks would be sufficient for your application.
I think you can follow option 2. However this will not be quick solution.
Another way around this could be,
Generate schema script for the tables you want to correct.
Open the script in notepad++ or any editor that supports find using regular expression.
Search for the regular expression [0-9]+_ and replace each match with an empty string.
Create the new tables using this script and copy data into them.
This may sound like a manual approach but you will do this once for all of your tables.
Look into a strategy of doing 2 selects, one for the column name, then one for the data with column alias. You might have to revert to some scripting language, like PHP, for help.
First, get the column names :
show columns from tbl_client;
+-------------------------------+-----------------------------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------------+-----------------------------------+------+-----+---------------------+-----------------------------+
| 1_X | int(11) | NO | PRI | NULL | auto_increment |
Then, loop through the results and create a list of column alias
Then create your new select
SELECT 1_X as NEW_COLUMN_NAME_FOR_FIELD_1 FROM tbl_client;
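The loop-and-alias step can be sketched in a scripting language. The Python below is only an illustration under assumptions: the column list would really come from SHOW COLUMNS or information_schema, and the helper names, the `x` prefix, and the `notes` column are made up for the example.

```python
import re


def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_select(table: str, columns: list[str], prefix: str = "x") -> str:
    """Build a SELECT that aliases every column whose name starts with a digit."""
    parts = []
    for col in columns:
        if re.match(r"[0-9]", col):
            # Prepend a letter instead of trimming, to avoid empty or duplicate names.
            parts.append(f"{quote_ident(col)} AS {quote_ident(prefix + col)}")
        else:
            parts.append(quote_ident(col))
    return f"SELECT {', '.join(parts)} FROM {quote_ident(table)};"


# Hypothetical column list; in practice, read it from the first query's result.
stmt = build_select("tbl_client", ["1_X", "2_X", "notes"])
print(stmt)
```

Prepending a prefix rather than stripping the digits follows the earlier answer's warning about duplicate and zero-length column names.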

mysql match string with start of string in table

I realise that it would be a lot easier if I could modify the table when it was created, but assuming I can't, I have a table that is such as:
abcd
abde
abdf
abff
bbsdf
bcggs
... snip large amount
zza
The values in the table are not fixed length.
I have a string to match such as abffagpokejfkjs .
If it was the other way round, I could do
SELECT * from table where value like 'abff%'
but I need to select the value that matches the start of a string that is provided.
Is there a quick way of doing that, or does it need an iteration through the table to find a match?
Try this:
SELECT col1, col2 -- etc...
FROM your_table
WHERE 'abffagpokejfkjs' LIKE CONCAT(value, '%')
Note that this will not use an index effectively so it will be slow if you have a lot of records.
Also note that some characters in value (e.g. % and _) will be interpreted by LIKE as wildcards, which may be undesirable.
LIKE can be avoided, by truncating the comparison string to each value's length:
... WHERE LEFT('abffagpokejfkjs', CHAR_LENGTH(value)) = value
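Both points (the wildcard caveat and the LEFT() alternative) can be illustrated client-side. This is a sketch under assumptions, not a replacement for the SQL: `escape_like` and `prefixes_of` are hypothetical helper names, and the `ab%` row is added only to show why escaping matters.

```python
def escape_like(text: str, escape: str = "\\") -> str:
    """Escape the escape character first, then % and _, so text matches literally in LIKE."""
    out = text.replace(escape, escape + escape)
    out = out.replace("%", escape + "%")
    return out.replace("_", escape + "_")


def prefixes_of(target: str, values: list[str]) -> list[str]:
    """Client-side analogue of LEFT(target, CHAR_LENGTH(value)) = value."""
    return [v for v in values if target[:len(v)] == v]


# Sample rows from the question, plus one hypothetical value containing a wildcard.
values = ["abcd", "abde", "abdf", "abff", "bbsdf", "ab%"]

matches = prefixes_of("abffagpokejfkjs", values)
escaped = escape_like("ab%")
print(matches, escaped)
```

With the LIKE CONCAT(value, '%') form, the unescaped row `ab%` would act as a pattern and match the target string. The truncate-and-compare form treats it as plain text.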

Difference between LIKE and = in MYSQL?

What's the difference between
SELECT foo FROM bar WHERE foobar='$foo'
and
SELECT foo FROM bar WHERE foobar LIKE'$foo'
= in SQL does exact matching.
LIKE does wildcard matching, using '%' as the multi-character match symbol and '_' as the single-character match symbol. '\' is the default escape character.
foobar = '$foo' and foobar LIKE '$foo' will behave the same, because neither string contains a wildcard.
foobar LIKE '%foo' will match anything ending in 'foo'.
LIKE also has an ESCAPE clause so you can set an escape character. This will let you match literal '%' or '_' within the string. You can also do NOT LIKE.
The MySQL site has documentation on the LIKE operator. The syntax is
expression [NOT] LIKE pattern [ESCAPE 'escape']
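To make the wildcard and escape rules concrete, here is a small Python sketch that translates a LIKE pattern into a regular expression. It is a model for illustration only: the function names are invented, and unlike MySQL's default behaviour the match is case-sensitive and ignores collations.

```python
import re


def like_to_regex(pattern: str, escape: str = "\\") -> str:
    """Translate a LIKE pattern into a regular expression (to be matched in full)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # An escaped character is matched literally, wildcard or not.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")  # any run of characters, including none
        elif ch == "_":
            out.append(".")  # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def like(value: str, pattern: str, escape: str = "\\") -> bool:
    """Return True if value matches the LIKE pattern in this simplified model."""
    regex = like_to_regex(pattern, escape)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


results = [
    like("seafoo", "%foo"),    # ends in 'foo'
    like("foobar", "%foo"),    # does not end in 'foo'
    like("foo", "f_o"),        # _ stands for one character
    like("100%", "100\\%"),    # escaped % is a literal percent sign
    like("1000", "100\\%"),    # so it no longer acts as a wildcard
    like("50!%", "50!!!%", escape="!"),  # custom escape character, as with ESCAPE '!'
]
print(results)
```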
LIKE can do wildcard matching:
SELECT foo FROM bar WHERE foobar LIKE "Foo%"
If you don't need pattern matching, then use = instead of LIKE. It's faster and more secure. (You are using parameterized queries, right?)
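For readers unsure what "parameterized" means here, a minimal sketch follows. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs without a server; MySQL drivers for Python typically use `%s` as the placeholder instead of `?`, and SQLite's comparison rules are not identical to MySQL's. The table contents are invented.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real MySQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo TEXT, foobar TEXT)")
conn.executemany(
    "INSERT INTO bar VALUES (?, ?)",
    [("a", "Foo"), ("b", "Foobar"), ("c", "x' OR '1'='1")],
)

# Hostile-looking input is bound as data, never spliced into the SQL text.
user_input = "x' OR '1'='1"
exact_rows = conn.execute(
    "SELECT foo FROM bar WHERE foobar = ?", (user_input,)
).fetchall()

# The same binding works for LIKE; the wildcard belongs to the bound value.
prefix_rows = conn.execute(
    "SELECT foo FROM bar WHERE foobar LIKE ? ORDER BY foo", ("Foo%",)
).fetchall()

print(exact_rows, prefix_rows)
conn.close()
```

The exact-match query returns only the row whose value is literally the quoted string, which is the behaviour string interpolation would not guarantee.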
Please bear in mind as well that MySQL casts operands depending on the situation: LIKE compares its operands as strings, whereas = between a number and a string compares them as numbers. Consider this table:
     (int)   (varchar)
id   field1  field2
1    1       1
2    1       1,2
SELECT *
FROM test AS a
LEFT JOIN test AS b ON a.field1 LIKE b.field2
will produce
id   field1  field2   id   field1  field2
1    1       1        1    1       1
2    1       1,2      1    1       1
whereas
SELECT *
FROM test AS a
LEFT JOIN test AS b ON a.field1 = b.field2
will produce
id   field1  field2   id   field1  field2
1    1       1        1    1       1
1    1       1        2    1       1,2
2    1       1,2      1    1       1
2    1       1,2      2    1       1,2
According to the MySQL Reference page, trailing spaces are significant in LIKE but not in =, and with LIKE you can use wildcards: % for any number of characters and _ for exactly one character.
I think in terms of speed = is faster than LIKE. As stated, = does an exact match and LIKE can use a wildcard if needed.
I always use = sign whenever I know the values of something. For example
select * from state where state='PA'
Then for likes I use things like:
select * from person where first_name like 'blah%' and last_name like 'blah%'
If you use Oracle Developers Tool, you can test it with Explain to determine the impact on the database.
The end result will be the same, but the query engine uses different logic to get to the answer. Generally, LIKE queries burn more cycles than "=" queries. But when no wildcard character is supplied, I'm not certain how the optimizer may treat that.
With the example in your question there is no difference.
But, like Jesse said you can do wildcard matching
SELECT foo FROM bar WHERE foobar LIKE "Foo%"
SELECT foo FROM bar WHERE foobar NOT LIKE "%Foo%"
More info:
http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html
A little bit of Google doesn't hurt...
A WHERE clause with an equals sign (=) works fine if we want to do an exact match. But there may be a requirement to return all the results where 'foobar' contains "foo". This can be handled using the SQL LIKE clause along with the WHERE clause.
If SQL LIKE clause is used along with % characters then it will work like a wildcard.
SELECT foo FROM bar WHERE foobar LIKE'$foo%'
Without a % character, the LIKE clause behaves very much like the equals sign in a WHERE clause.
In your example, they are semantically equal and should return the same output.
However, LIKE will give you the ability of pattern matching with wildcards.
You should also note that = might give you a performance boost on some systems, so if you are, for instance, searching for an exact number, = would be the preferred method.
Looks very much like taken out from a PHP script. The intention was to pattern-match the contents of variable $foo against the foo database field, but I bet it was supposed to be written in double quotes, so the contents of $foo would be fed into the query.
As you put it, there is NO difference.
It could potentially be slower, but I bet MySQL realises there are no wildcard characters in the search string, so it will not do LIKE pattern matching after all. So really, no difference.
In my case I find LIKE to be faster than =.
LIKE fetched a number of rows in 0.203 secs the first time, then in 0.140 secs.
= fetched the same rows in 0.156 secs consistently.
Take your choice
I found an important difference between LIKE and equal sign = !
Example: I have a table with a field "ID" (type: int(20) ) and a record that contains the value "123456789"
If I do:
SELECT ID FROM example WHERE ID = '123456789-100'
The record with ID = '123456789' is found (an incorrect result).
If I do:
SELECT ID FROM example WHERE ID LIKE '123456789-100'
No record is found (this is correct)
So, at least for INTEGER-fields it seems an important difference...
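The ID result above follows from the string being converted to a number before the comparison, with only its leading numeric part used. The Python below is a rough approximation of that rule for illustration, not MySQL's actual conversion code. The function names and the regular expression are assumptions.

```python
import re

# Leading numeric prefix: optional sign, digits with optional fraction, optional exponent.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def string_to_number(text: str) -> float:
    """Approximate string-to-number conversion: use the numeric prefix, else 0."""
    m = _NUMERIC_PREFIX.match(text)
    return float(m.group(0)) if m else 0.0


def int_equals_string(number: int, text: str) -> bool:
    """Model of `int_column = 'string'`: compare as numbers."""
    return float(number) == string_to_number(text)


def int_like_string(number: int, text: str) -> bool:
    """Model of `int_column LIKE 'string'` with no wildcards: compare as strings."""
    return str(number) == text


eq_result = int_equals_string(123456789, "123456789-100")
like_result = int_like_string(123456789, "123456789-100")
print(eq_result, like_result)
```

The same model reproduces the earlier join example, where the integer 1 equals the string '1,2' under = but not under LIKE.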