The MySQL manual contains the following interesting note about mixing quoted and unquoted values in an IN condition:
You should never mix quoted and unquoted values in an IN() list because the comparison rules for quoted values (such as strings) and unquoted values (such as numbers) differ. Mixing types may therefore lead to inconsistent results.
However, it doesn't really explain why this is a problem. It has examples, but it doesn't show either the data being queried or the results, so they only serve as illustrations without giving any explanation about the issue.
I have two questions:
Why does this cause problems in MySQL? Ideally, provide an example where the results are wrong/inconsistent/unintuitive, to demonstrate.
Is this a MySQL-specific quirk or does this apply to other database systems? In particular, I am interested in whether this issue affects SQL Server, but would ideally like the question answered in the general case.
It depends what you consider "non-intuitive". This returns false:
'00' in ('0', '01')
However, this returns true:
'00' in (0, '01')
I think the following lines give an unintuitive example even without mixing:
mysql> SELECT 'a' IN (0), 0 IN ('b');
-> 1, 1
You can extend that:
SELECT 'a' IN (0, 1, '2'), 'a' IN ('0', '1', '2');
-> 1, 0
SELECT 0 IN (0.0, 'b'), 0 IN ('0.0', 'b');
-> 1, 1
There is also this other question:
In MySQL, why does the following query return '----', '0', '000', 'AK3462', 'AL11111', 'C131521', 'TEST', etc.?
select varCharColumn from myTable where varCharColumn in (-1, '');
I get none of these results when I do:
select varCharColumn from myTable where varCharColumn in (-1);
select varCharColumn from myTable where varCharColumn in ('');
Everything is most likely cast to float, according to this link:
[...] In all other cases, the arguments are compared as floating-point (real) numbers. For example, a comparison of string and numeric operands takes place as a comparison of floating-point numbers.
Strings are cast to 0.0 unless they start with digits, in which case the leading numeric part is used. Also from the same link, there can be problems with floating-point accuracy, and with queries not using an index because the type does not match (everything must be cast to float, so no index usage, I guess).
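To make the cast-to-float reading concrete, here is a rough Python model of it. It is only a sketch: the helper names are made up for illustration, the regular expression approximates "use the leading digits" rather than reproducing MySQL's actual parsing routine, and the sample rows are the ones from the question quoted above plus a hypothetical '12'.

```python
import re

# Leading numeric prefix: optional sign, digits with optional fraction, optional exponent.
# This is an approximation chosen for illustration, not MySQL's real parser.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def to_float_like_mysql(value):
    """Read a value in numeric context: numbers pass through, strings use their numeric prefix."""
    if isinstance(value, (int, float)):
        return float(value)
    match = _NUMERIC_PREFIX.match(value)
    return float(match.group()) if match else 0.0


def in_as_floats(expr, candidates):
    """Model an IN() list in which every operand is compared as a float."""
    target = to_float_like_mysql(expr)
    return any(target == to_float_like_mysql(c) for c in candidates)


rows = ["----", "0", "000", "AK3462", "TEST", "12"]
matched = [r for r in rows if in_as_floats(r, [-1, ""])]
print(matched)  # ['----', '0', '000', 'AK3462', 'TEST']
```

Under this model the empty string becomes 0.0, and so does every row without a numeric prefix, which is why they all match.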
I think you might get something similar but not the same with every DBMS because you have to cast things to compare them. It might not be the exact same issue in SQL Server, because the data type precedence is not the same, but you should compare data of the same data type.
According to this link, which gives the data type precedence for SQL Server:
user-defined data types (highest)
sql_variant
xml
datetimeoffset
datetime2
datetime
smalldatetime
date
time
float
real
decimal
money
smallmoney
bigint
int
smallint
tinyint
bit
ntext
text
image
timestamp
uniqueidentifier
nvarchar (including nvarchar(max) )
nchar
varchar (including varchar(max) )
char
varbinary (including varbinary(max) )
binary (lowest)
So an int and a string would be compared as int (not float) in SQL Server.
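As a small illustration of how a precedence list like this decides the comparison type, here is a Python sketch. It uses only a subset of the list above, and the function name is hypothetical; it models the rule "the lower-precedence operand is converted to the higher-precedence type", nothing more.

```python
# Subset of the SQL Server precedence list above, highest precedence first.
PRECEDENCE = [
    "datetime", "float", "decimal", "bigint", "int",
    "smallint", "tinyint", "bit", "nvarchar", "varchar", "char",
]


def winning_type(left, right):
    """Return the type both operands end up compared as: the one listed earlier."""
    return min(left, right, key=PRECEDENCE.index)


print(winning_type("int", "varchar"))  # int
print(winning_type("float", "int"))    # float
```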
Running some simple tests, it seems that comparisons between data types are handled correctly, despite what is written in the MySQL manual.
SELECT 0 IN ('0','00',0,00); -> TRUE
SELECT 0 IN ('0','01',1,01); -> TRUE
SELECT 0 IN ('1','00',1,10); -> TRUE
SELECT 0 IN ('11','10',0,10); -> TRUE
SELECT 0 IN ('1','01',1,00); -> TRUE
SELECT '0' IN ('1','01',1,00); -> TRUE
SELECT '0' IN ('0','00',0,00); -> TRUE
SELECT '0' IN ('0','01',1,01); -> TRUE
SELECT '0' IN ('1','00',1,10); -> FALSE
SELECT '0' IN ('11','10',0,10); -> TRUE
SELECT '1' IN ('11','10',1,10); -> TRUE
SELECT '15.32' IN ('11','10',1,15.32); -> TRUE
SELECT 13.12 IN ('11','10',1,13.12); -> TRUE
SELECT 00 IN ('11','00',1,13.12); -> TRUE
SELECT '00' IN ('11',00,1,13.12); -> TRUE
SELECT '00.0' IN ('11',00.0,1,13.12); -> TRUE
SELECT '00.00' IN ('11',0,1,13.12); -> TRUE
SELECT '00.01' IN ('11',0.01,1,13.12); -> TRUE
The above results can be seen in this SQLFiddle
But the above tests are not even close to covering all of MySQL's data types.
In addition, we should simply think about the cases in which we would use the IN () operator.
MySQL warns that mixed data types sometimes produce surprising results, but then again, is it actually necessary to have different data types inside IN ()?
In short, no. The values inside the parentheses will be checked against a table column that has a specific data type.
For example, doesn't comparing a TEXT column against IN ('Hello','World',13) seem odd? One could object that a TEXT column may hold numerical values. Fine, then just write the list as IN ('Hello','World','13'), since we are talking about a TEXT column.
If we do not know the data type, or if the data type is somehow dynamic and could sometimes change, then we should convert that field to the data type we expect the majority of the values to have.
1. Why does this cause problems in MySQL?
The example below shows the inconsistency of using IN across quoted (x='1a') and unquoted (x=1) types. Note that for the same value x = 1, the same IN expression yields 0 in Query 1 but 1 in Query 2.
SELECT
x, x IN ('1b','a1')
FROM
(
select '1a' as x
union all select 1
) q1;
SELECT
x, x IN ('1b','a1')
FROM
(
select 1 as x
) q1;
Results:
Query 1:
'1a': 0
1: 0
Query 2:
1: 1
So far I cannot observe any inconsistency if I only alter the list inside IN. But the pattern I observed is:
expr IN (...array of values)
For expr with string, against string values: compare as string
For expr without string, against string values: compare as number
For expr with string, against numeric values: compare as number
For expr without string, against numeric values: compare as number
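The four observed cases above boil down to one rule, sketched here in Python. The function name is made up for illustration, and this only restates the pattern observed in this answer; it is not taken from MySQL's implementation.

```python
def comparison_mode(expr, value):
    """Return 'string' only when both operands are strings, otherwise 'number'."""
    both_strings = isinstance(expr, str) and isinstance(value, str)
    return "string" if both_strings else "number"


# The four combinations from the list above.
for expr, value in [("1a", "1b"), (1, "1b"), ("1a", 1), (1, 1)]:
    print(repr(expr), "vs", repr(value), "->", comparison_mode(expr, value))
```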
2. Is this a MySQL-specific quirk or does this apply to other database systems?
Case by case. For MSSQL I would say no, because when comparing a string with a number it gives you an error message like:
Conversion failed when converting the varchar value '1a' to data type int.
1. Why does this cause problems in MySQL?
The engine needs to know how it will make the comparisons.
If you compare an integer column, the column's integer value is compared with the IN list. If the IN list items are strings, the comparison will differ.
https://dev.mysql.com/doc/refman/8.0/en/type-conversion.html
2. Is this a MySQL-specific quirk or does this apply to other database systems?
It is not MySQL-specific. For performance reasons (indexing), it is always better to avoid casting.
Why does this cause problems in MySQL?
It's not a bug, it's a feature. 😬
Basically it's about how the database handles the field comparison. In particular, MySQL automatically converts the string value to a numeric value when comparing numeric with string values. Since MySQL is written in C++, somewhere in the code base the string value is presumably cast to double prior to the field comparison.
There is nothing special about the IN clause, I think. In the MySQL source code, I saw comments similar to this one:
`WHERE a IN (b, c)` can also be rewritten as `WHERE a = b OR a = c`
That makes sense, and IN is (probably) treated the same way in the code base. So based on this, if we have, let's say, something like this:
... WHERE '04.2' IN ('0', 4.2);
That means '04.2' = '0' OR '04.2' = 4.2, which returns true because, in C/C++ terms:
"04.2" = "0" // string value comparison -> false
cast_as_double("04.2") = 4.2 // double value comparison -> true
The same applies for other cases, which resolve as true, e.g. 42 IN ('0042', 0), '3.00' IN (3, '1'), 0 IN (3, '0.00') etc.
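The OR rewrite can be sketched in a few lines of Python to check these cases. This is an illustration under assumptions: the function names are hypothetical, and the plain float() call only handles fully numeric strings like the ones used in the examples above.

```python
def equals(a, b):
    """One `a = b` comparison: string vs string stays textual, anything else goes through float."""
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    # Assumption: strings here are fully numeric; values such as '1a' are not handled.
    return float(a) == float(b)


def in_list(expr, values):
    """`expr IN (v1, v2, ...)` rewritten as `expr = v1 OR expr = v2 OR ...`."""
    return any(equals(expr, v) for v in values)


print(in_list("04.2", ["0", 4.2]))    # True
print(in_list(42, ["0042", 0]))       # True
print(in_list("3.00", [3, "1"]))      # True
print(in_list(0, [3, "0.00"]))        # True
print(in_list("04.2", ["0", "4.2"]))  # False, both comparisons are textual
```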
Is this a MySQL-specific quirk or does this apply to other database systems?
This seems to be the case with other databases as well. If you like, you can test them online
MySQL: https://www.db-fiddle.com
PostgreSQL: https://www.db-fiddle.com
MS SQL Server 2017: http://sqlfiddle.com/#!18/ff6b8/12807
Whilst there have been a lot of answers and comments that provide examples of 'unintuitive' behaviour, most of these examples seem to be explained by the standard casting rules. In other words, the results were entirely consistent with what would be returned from SELECT A = B; for the given A and B.
"Because casting" doesn't seem like a particularly satisfying explanation for the paragraph I quoted in the question. That paragraph comes after a number of paragraphs explaining how type conversion affects the IN() statement, so it seems somewhat repetitive and redundant if that is all it's referring to.
My interpretation of the quoted paragraph is that it is an explicit statement that a IN(b, c) may give different results to a = b OR a = c in situations where b and c are quoted differently.
I was therefore looking to find an example where the result couldn't be explained by the usual casting rules.
I think the reason we haven't seen a good example yet is that most answers focussed on comparing numbers in string and non-string representations. By basing the test around string values instead, I have managed to construct a non-intuitive example that is not explained by simple type conversion rules and is not equivalent to the individual comparisons ORed together: the comparison between 'test' and 23 gives different results depending on what other values are in the IN() list:
SELECT 'test' IN('fish'); --> 0
SELECT 'test' IN(23); --> 0
SELECT 'test' IN('fish', 23); --> 1 !!!
I have yet to come up with a good explanation about what is happening here - is there some rule being followed, or is it just a MySQL quirk? I also haven't got an answer to the second question, as that somewhat depends on the reason for the behaviour (e.g. if it is defined by the standard or is an artefact of an obvious optimisation, vs. just being a MySQL-specific quirk) but I guess this could be figured out by running the above test on other RDBMSs.
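One hypothesis that fits all three results is that, once the operand types are mixed, every comparison in the list is done numerically, in line with the manual's remark quoted earlier on this page that string and numeric operands are compared as floating-point numbers. The Python sketch below contrasts that model with the plain OR rewrite. Both functions are hypothetical models written for this comparison, not verified against MySQL's source, and non-numeric strings are simply treated as 0.0.

```python
def to_number(value):
    """Numeric reading of an operand; non-numeric strings become 0.0 (prefix parsing omitted)."""
    try:
        return float(value)
    except ValueError:
        return 0.0


def in_pairwise(expr, values):
    """expr = v1 OR expr = v2 ..., each pair picking string or numeric comparison on its own."""
    def eq(a, b):
        if isinstance(a, str) and isinstance(b, str):
            return a == b
        return to_number(a) == to_number(b)
    return any(eq(expr, v) for v in values)


def in_all_numeric_when_mixed(expr, values):
    """Hypothetical model: if any operand is not a string, compare everything as numbers."""
    operands = [expr, *values]
    if all(isinstance(v, str) for v in operands):
        return expr in values
    return any(to_number(expr) == to_number(v) for v in values)


for values in (["fish"], [23], ["fish", 23]):
    print(values, in_pairwise("test", values), in_all_numeric_when_mixed("test", values))
```

Only the second model returns true for the mixed list, because 'test' and 'fish' both read as 0.0 once the whole list is compared numerically.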
Any comments to help flesh this out (or answers that cover the missing elements) will be appreciated - I will update this answer with any further details that I manage to deduce and don't plan on accepting any answer (including my own) until I understand what's going on a little bit better.
The query executed should match the story_id with the provided string but when I execute the query it's giving me a wrong result. Please refer to the screenshot.
The story_id column in your case is of INT (or numeric) datatype.
MySQL does automatic typecasting in this case. So 5bff82... gets typecast to 5, and thus you get the row corresponding to story_id = 5.
Type Conversion in Expression Evaluation
When an operator is used with operands of different types, type conversion occurs to make the operands compatible. Some conversions occur implicitly. For example, MySQL automatically converts strings to numbers as necessary, and vice versa.
Now, ideally your application code should be robust enough to handle this input. If you expect the input to be numeric only, then your application code can use validation operations on the data (to ensure that it is only a number, without typecasting) before sending it to MySQL server.
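A minimal sketch of such a validation step in Python, assuming story ids are plain non-negative integers. The function name and the sample inputs are hypothetical.

```python
import re

# Digits only, from start to end of the input; no sign, whitespace or trailing text.
_DIGITS_ONLY = re.compile(r"[0-9]+")


def is_numeric_id(raw):
    """Return True only if the whole input is a run of digits."""
    return _DIGITS_ONLY.fullmatch(raw) is not None


print(is_numeric_id("5"))     # True
print(is_numeric_id("5abc"))  # False, so this input never reaches the query
```

Checking the whole string, instead of converting it, avoids repeating the same lenient typecasting in the application.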
Another way would be to explicitly cast story_id to a string datatype and then perform the comparison. However, this is not a recommended approach, as it would not be able to use indexes.
SELECT * FROM story
WHERE CAST(story_id AS CHAR(12)) = '5bff82...'
If you run the above query, you would get no results.
You can also use something like this:
SELECT * FROM story
WHERE regexp_like(story_id,'^[1-5]{1}(.*)$');
For any story_id starting with a number and followed by any number of characters, it won't match story_id = 5;
AND if you explicitly want to match it with a string;
Can anyone help me understand the following problem with a BIT(64) column in MySQL (5.7.19)?
This simple example works fine and returns the record from the temporary table:
CREATE TEMPORARY TABLE test (v bit(64));
INSERT INTO test values (b'111');
SELECT * FROM test WHERE v = b'111';
-- Returns the record as expected
When using all the 64 bits of the column it no longer works:
CREATE TEMPORARY TABLE test (v bit(64));
INSERT INTO test values (b'1111111111111111111111111111111111111111111111111111111111111111');
SELECT * FROM test WHERE v = b'1111111111111111111111111111111111111111111111111111111111111111';
-- Does NOT return the record
This only happens when using a value with 64 bits. But I would expect that to be possible.
Can anyone explain this to me?
Please do not respond by advising me not to use BIT columns. I am working on a database tool that should be able to handle all the data types of MySQL.
The problem seems to be that the value b'11..11' in the WHERE clause is treated as a SIGNED BIGINT, which is -1, and is compared to the value in your table, which is treated as an UNSIGNED BIGINT, which is 18446744073709551615. This is always an issue when the first of the 64 bits is 1. IMHO this is a bug or a design flaw, because I expect an expression in the WHERE clause to match a row if the same expression has been used in the INSERT statement (at least in this case).
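The two readings of the same 64 bits can be reproduced with plain integer arithmetic. This Python sketch only illustrates the signed versus unsigned interpretation; it says nothing about how MySQL implements the comparison.

```python
bits = "1" * 64

# Unsigned reading: all 64 bits contribute positively.
unsigned_value = int(bits, 2)

# Signed (two's complement) reading: a leading 1 bit means subtract 2**64.
signed_value = unsigned_value - (1 << 64) if bits[0] == "1" else unsigned_value

print(unsigned_value)  # 18446744073709551615
print(signed_value)    # -1
```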
One workaround would be to cast the value to UNSIGNED:
SELECT *
FROM test
WHERE v = CAST(b'1111111111111111111111111111111111111111111111111111111111111111' as UNSIGNED);
Or (if your application language supports it) convert it to something like long uint or decimal:
SELECT * FROM test WHERE v = 18446744073709551615;
Bits are returned as binary, so to display them, either add 0, or use a function such as HEX, OCT or BIN to convert them. https://mariadb.com/kb/en/library/bit/
Or:
Bit values in result sets are returned as binary values, which may not display well. To convert a bit value to printable form, use it in numeric context or use a conversion function such as BIN() or HEX(). High-order 0 digits are not displayed in the converted value. https://dev.mysql.com/doc/refman/8.0/en/bit-value-literals.html
I have the following data in a column b which is part of table x.
Table x Column b
{"op":"&","c":[{"type":"date","d":">=","t":1459756800}],"showc":[true]}
{"op":"&","showc":[true],"c":[{"type":"date","d":">=","t":1460534400}]}
I tried to use the query below to extract my data, but it does not work, as the timestamps are in different positions.
SELECT substring(Column b, 44 , 10)
FROM Table x
How would I go about extracting just the timestamp?
Much Appreciated
This answer is for MySQL < 5.7; it seems 5.7 added JSON support.
Native querying does not support JSON parsing, which leads to all kinds of trouble if you try to parse this column as a string. One example of such an issue is the timestamp's value being placed differently due to different properties, string lengths, etc.
You need to add JSON parsing support either through a script (PHP, ...) or by augmenting MySQL's functionality.
I never got around to using it, but common-schema could help you out. I am sure there are other ways.
https://code.google.com/archive/p/common-schema/
Usage example from http://mechanics.flite.com/blog/2013/04/08/json-parsing-in-mysql-using-common-schema/:
mysql> select common_schema.extract_json_value(f.event_data,'/age') as age,
-> common_schema.extract_json_value(f.event_data,'/gender') as gender,
-> sum(f.event_count) as event_count
-> from json_event_fact f
-> group by age, gender;
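For the script route mentioned above, here is a minimal Python sketch that parses the two sample values from the question and pulls out the timestamps regardless of key order. It assumes the column values have already been fetched as strings; the function name is hypothetical.

```python
import json

# The two column values from the question, as they would arrive in the script.
rows = [
    '{"op":"&","c":[{"type":"date","d":">=","t":1459756800}],"showc":[true]}',
    '{"op":"&","showc":[true],"c":[{"type":"date","d":">=","t":1460534400}]}',
]


def extract_timestamps(raw):
    """Parse one JSON value and return every "t" found in its "c" list."""
    doc = json.loads(raw)
    return [cond["t"] for cond in doc.get("c", []) if "t" in cond]


timestamps = [extract_timestamps(r) for r in rows]
print(timestamps)  # [[1459756800], [1460534400]]
```

Because the value is parsed instead of sliced by position, the order of the keys no longer matters.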
I have a table in my MySQL db named membertable. The table consists of two fields, memberid and membername. The memberid field is of type integer and uses auto_increment starting from 2001. The membername field is of type varchar.
The membertable has two records, with the fields in the order described above. The records look like this:
memberid : 2001
membername : john smith
memberid : 2002
membername : will smith
I found something weird when I ran a SELECT statement against the memberid field. Running the following statement:
SELECT * FROM `membertable` WHERE `memberid` = '2001somecharacter'
It returned the first record.
Why did that happen? There's no record with memberid = 2001somecharacter. It looks like MySQL only searches the first 4 characters (2001), and when it finds matching data, which is the record returned above, it ignores the remaining characters.
How could this happen? And is there any way to turn off this behavior?
--
membertable uses innodb engine
This happens because MySQL tries to convert "2001somecharacter" into a number, which gives 2001.
Since you're comparing a number to a string, you should use
SELECT * FROM `membertable` WHERE CONVERT(`memberid`,CHAR) = '2001somecharacter';
to avoid this behavior.
Or, to do it properly, do NOT put your search variable in quotes, so that it has to be a number (otherwise the query blows up with a syntax error), and make sure in the front end that it is a number before passing it into the query.
sqlfiddle
Your finding is expected MySQL behaviour.
MySQL converts a varchar to an integer starting from the beginning. As long as there are numeric characters which can be converted, they are included in the conversion process. If there's a letter, the conversion stops, returning the integer value of the numeric string read so far...
Here's some description of this behaviour on the MySQL documentation site. Unfortunately, it's not mentioned directly in the text, but there's an example which shows exactly this behaviour.
MySQL is very liberal in converting string values to numeric values when evaluated in numeric context.
As a demonstration, adding 0 causes the string to be evaluated in a numeric context:
SELECT '2001foo' + 0 --> 2001
, '01.2-3E' + 0 --> 1.2
, 'abc567g' + 0 --> 0
When a string is evaluated in a numeric context, MySQL reads the string character by character, until it encounters a character where the string can no longer be interpreted as a numeric value, or until it reaches the end of the string.
I don't know of a way to "turn off" or disable this behavior. (There may be a setting of sql_mode that changes this behavior, but that change will likely impact other SQL statements that are currently working, which may stop working if the change is made.)
Typically, this kind of check of the arguments is done in the application.
But if you need to do this in the SELECT statement, one option would be cast/convert the column as a character string, and then do the comparison.
But that can have some significant performance consequences. If we do a cast or convert (or any function) on a column that's in a condition in the WHERE clause, MySQL will not be able to use a range scan operation on a suitable index. We're forcing MySQL to perform the cast/convert operation on every row in the table, and compare the result to the literal.
So, that's not the best pattern.
If I needed to perform a check like that within the SQL statement, I would do something like this:
WHERE t.memberid = '2001foo' + 0
AND CAST('2001foo' + 0 AS CHAR) = '2001foo'
The first line is doing the same thing as the current query. And that can take advantage of a suitable index.
The second condition is converting the same value to a numeric, then casting that back to character, and then comparing the result to the original. With the values shown here, it will evaluate to FALSE, and the query will not return any rows.
This will also not return a row if the string value has a leading space, ' 2001'. The second condition is going to evaluate as FALSE.
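The round-trip idea behind the second condition can be sketched outside SQL as well. This Python version assumes integer ids only, so it reads just a leading integer rather than a general number, and the helper names are hypothetical.

```python
import re

# Optional leading whitespace, optional sign, then digits; anything after is ignored.
_LEADING_INT = re.compile(r"\s*([+-]?\d+)")


def numeric_prefix(text):
    """Lenient conversion: take the leading integer, or 0 if there is none."""
    match = _LEADING_INT.match(text)
    return int(match.group(1)) if match else 0


def survives_round_trip(text):
    """True only if converting to a number and back reproduces the original string."""
    return str(numeric_prefix(text)) == text


print(survives_round_trip("2001"))     # True
print(survives_round_trip("2001foo"))  # False
print(survives_round_trip(" 2001"))    # False, leading space is lost in the round trip
```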
When comparing an INT to a 'string', the string is converted to a number.
Converting a string to a number takes as many of the leading characters as it can while still forming a number. So '2001character' is treated as the number 2001.
If you want non-numeric characters in member_id, make it VARCHAR.
If you want only numeric ids, then reject '200.1character'