MySQL strange behavior with comma separated list of numbers - mysql

I had a very complicated problem, but I narrowed it down to this. First, let me give you some test data.
Run this:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`value` text NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
INSERT INTO test (value) VALUES
(1),
('1'),
('1,2'),
('3');
Now run this query:
SELECT * FROM test WHERE value = 1;
I would expect in this case to get only the first two rows, where the value is either entered as a numeric 1 or a '1' char, but for some reason this is what I get:
1, 1
2, 1
3, 1,2
My question is, why do I get the third row?
Note: This is my version of mysql: 5.6.28-0ubuntu0.14.04.1
Also, I already solved my original problem by using FIND_IN_SET and I am aware that it's not a very good idea to have this comma separated list type structure, ie, it should probably have been done with a join table in the first place. Unfortunately I'm working within a system that is very large and making that change is not practical at this time.
I'm just interested in why this specific behavior happens.

The reason you get the third row is implicit datatype conversion performed by MySQL. Your query has a predicate (condition) in the WHERE clause
WHERE value = 1
On the right side of the equality comparison operator (the equal sign), we have a numeric literal. On the left side, we have a column that is datatype TEXT.
It's not possible for MySQL to do a comparison of those two different datatypes.
So, MySQL converts one side or the other to a type that is compatible, so a comparison can be performed. In this case, MySQL is converting the value from the column to numeric, so it can be compared to the numeric literal.
As a demonstration of what that looks like, we can add a zero (forcing MySQL to do a conversion), and exhibit the results in a SELECT.
SELECT t.value, t.value + 0 FROM test t
t.value  t.value + 0
-------  -----------
1                  1
1                  1
1,2                1
3                  3
It's documented in the MySQL Reference Manual somewhere, how MySQL does the conversion. At a risk of misstating what the manual says: MySQL reads the string character by character from left to right, until it encounters a character where it can no longer convert to numeric.
In the case of the string '1,2', that happens to be the comma character. That's where MySQL stops. So the conversion returns a numeric value of 1. You would be right to point out that other databases would throw an error attempting to do a conversion of that string to numeric. But MySQL doesn't throw an error; at most it records a "Truncated incorrect DOUBLE value" warning, which you can inspect with SHOW WARNINGS.
Reference: Type Conversion in Expression Evaluation http://dev.mysql.com/doc/refman/5.7/en/type-conversion.html
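To make the left-to-right rule concrete, here is a rough Python model of it, applied to the four test rows. This is only a sketch of the documented behavior, not MySQL's actual implementation; the function name numeric_prefix and the regular expression are assumptions made up for this illustration.

```python
import re

# Hypothetical model of the rule: skip leading whitespace, then take the
# longest leading run that looks like a number (sign, digits, optional
# fraction, optional exponent). Everything after that run is ignored.
_NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")


def numeric_prefix(text):
    """Return the numeric value of the leading part of text, or 0.0 if none."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(1)) if match else 0.0


# The rows from the question, as (id, value) pairs. All values are stored as text.
rows = [(1, "1"), (2, "1"), (3, "1,2"), (4, "3")]

# WHERE value = 1  : the column is converted to a number first.
numeric_matches = [row_id for row_id, value in rows if numeric_prefix(value) == 1]

# WHERE value = '1': plain string comparison, no conversion.
string_matches = [row_id for row_id, value in rows if value == "1"]

print(numeric_matches)  # [1, 2, 3]
print(string_matches)   # [1, 2]
```

The numeric comparison picks up row 3 because '1,2' is cut off at the comma, while the string comparison does not.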
Basically, the predicate in your query is equivalent to specifying:
WHERE value + 0 = 1
Which forces a conversion of the contents of the column value to numeric, and then a comparison to the numeric literal.
That's why the third row is being returned.
To get a different result, consider comparing to a string literal
WHERE value = '1'

Related

Precision loss when performing large number operation in mysql

In MySQL 5.7, a table is defined as follows:
CREATE TABLE `person` (
`person_id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(64) DEFAULT NULL,
PRIMARY KEY (`person_id`),
KEY `ix_name` (`name`)
) ENGINE=InnoDB CHARSET=utf8
Then we prepared two records for testing. The values of the name field (of varchar type) are
123456789123456789
1
respectively.
Case 1
select * from person where name = 123456789123456789-1;
Note that we are using a number instead of a string in the WHERE clause. The record with name 123456789123456789 was returned, and it seemed that the -1 at the end was ignored!
Furthermore, we added another record with name = 123456789123456788, and this time the above select returned two records: both 123456789123456789 and 123456789123456788.
The output looks so strange!
Case 2
select * from person where name = 123456789123456789-123456789123456788;
This returns the record with name 1, so in this case the - seems to act as a minus operator.
Why is the behavior of - so different in the two cases?
I can't immediately tell you what the type of 123456789123456789-1 is but for the comparison operation, we're almost certainly falling through most of the more "normal" data type conversion rules for mysql and ending up at:
In all other cases, the arguments are compared as floating-point (real) numbers.
Because one of the arguments to the comparison (name) is a string type and the other is numeric, nothing else matches. So both get converted to floats, and a DOUBLE carries only about 15 to 17 significant decimal digits. That is fewer than the 18 required to represent 123456789123456789 and 123456789123456788 as two different numbers.
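The precision loss can be reproduced outside MySQL, because a Python float is the same IEEE 754 double that MySQL's DOUBLE uses. The following is a small sketch of both cases from the question; the variable names are made up for illustration.

```python
big = 123456789123456789

# Case 1: the subtraction is done exactly on integers, then both sides of
# the comparison are converted to double.
name_as_double = float("123456789123456789")  # the varchar column value
literal_as_double = float(big - 1)            # 123456789123456788 as a double

print(name_as_double == literal_as_double)  # True: both round to the same double
print(int(name_as_double))                  # 123456789123456784

# Case 2: the integer subtraction yields exactly 1, which a double represents
# exactly, so only the row with name '1' matches.
print(float("1") == float(big - (big - 1)))  # True
```

Near 1.2e17, adjacent doubles are 16 apart, so ...789 and ...788 both round to ...784. The minus operator works in both cases; in case 1 its effect is simply lost in the conversion to double.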
Look here:
SELECT person_id, name, name + 0.0, 123456789123456789-1 + 0.0, name = 123456789123456789-1
FROM person
ORDER BY person_id;
Before comparing name = 123456789123456789-1, MySQL presumably converts both name and 123456789123456789-1 to DOUBLE, as the name + 0.0 and 123456789123456789-1 + 0.0 columns in the SELECT show. So some digits are lost.

How does MySQL treat comparing a non-string value to an indexed VARCHAR column?

Lately I discovered a performance issue in the following use case.
Previously I had a table "MyTable" with an indexed INT column "MyCode".
After a while I needed to change the table structure, converting the "MyCode" column to VARCHAR (the index on the column was preserved):
ALTER TABLE MyTable CHANGE MyCode MyCode VARCHAR(250) DEFAULT NULL
Then I experienced unexpected latency. Queries were being performed like:
SELECT * FROM MyTable where MyCode = 1234
This query was completely ignoring the index on the MyCode VARCHAR column; my impression was that it was doing a full table scan.
After converting the query to
SELECT * FROM MyTable where MyCode = "1234"
performance got back to optimal, making use of the VARCHAR index.
So the question is: how can this be explained, and how does MySQL actually treat indexing here? Or is there some DB setting that can be changed to avoid this?
int_col = 1234 -- no problem; same type
char_col = "1234" -- no problem; same type
int_col = "1234" -- string is converted to number, then no problem
char_col = 1234 -- converting all the strings to numbers -- tedious
In the 4th case, the index is useless, so the Optimizer looks for some other way to perform the query. This is likely to lead to a "full table scan".
The main exception involves a "covering index", which is only slightly faster -- involving a "full index scan".
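One way to see why the index is useless in the 4th case: the index is ordered by string value, but many different strings convert to the same number, and they are scattered through that order. The sketch below uses a sorted Python list as a stand-in for the index. The sample values are made up, and Python's float() is stricter than MySQL's conversion (it rejects trailing garbage), so the list is limited to strings on which both agree.

```python
# Hypothetical contents of an indexed VARCHAR column, kept in string order
# the way a B-tree index would keep them.
indexed_values = sorted([
    "1234", "01234", " 1234", "1234.0", "1.234e3",  # all mean 1234 as numbers
    "0999", "1000", "1235", "2",                    # other values
])

# Positions in the "index" whose numeric value equals 1234.
matching_positions = [
    position
    for position, text in enumerate(indexed_values)
    if float(text) == 1234
]

print(indexed_values)
print(matching_positions)  # [0, 1, 3, 5, 6]
```

The matching entries are not adjacent in string order, so there is no single index range to seek into; every entry has to be converted and checked, which is the scan described above.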
I accepted Rick James's answer because he got the point.
But I'd like to add more info after some testing.
The question is how MySQL actually compares two values when the filtered column is of VARCHAR type and the value to filter by is not a string.
In that case you lose the opportunity to use the index on the VARCHAR column, with a dramatic loss of performance in a query that should be immediate and simple.
The explanation is that MySQL, given a value whose type differs from VARCHAR, will perform a full table scan and, for every record's field, perform a CAST(varcharcol AS providedvaluetype) and compare the result with the provided value.
E.g.
having a VARCHAR column named "code" and filtering
SELECT * FROM table WHERE code=1234
will scan every record, just like doing
SELECT * FROM table WHERE CAST(code as UNSIGNED)=1234
Notice that if you test it against 0
SELECT * FROM table WHERE CAST(code as UNSIGNED)=0
you'll get back ALL records whose string has no numeric meaning for the MySQL CAST function, since such strings are cast to 0.

How does mysql do multi-type comparison?

I was working on a project and, due to a misunderstanding, we ended up comparing a stored int with a string in a MySQL database. I ran a few tests and it seems to work, but I would like to know how MySQL compares different datatypes.
Does it convert one to the other? If it does, does it convert both to strings or to ints?
When you use a string in an integer context, for example in an arithmetic expression or in a comparison to an integer, MySQL takes the numeric value of that string as a DOUBLE data type.
See https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html
Demonstration:
mysql> create table foo as select 1+'1' as x;
mysql> show create table foo\G
CREATE TABLE `foo` (
`x` double NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The numeric value of a string is the numeric value of any leading digit characters or other characters that make a floating-point number, like -+.e.
For example, the numeric value of '123abc' is 123.
Scientific notation is supported.
mysql> select 1 + '5e-2xyz' as n;
+------+
| n |
+------+
| 1.05 |
+------+
If there are no leading characters that form a numeric value, the string's numeric value is 0.
The MySQL manual has a complete section dedicated to this, called Type Conversion in Expression Evaluation.
When an operator is used with operands of different types, type conversion occurs to make the operands compatible. Some conversions occur implicitly. For example, MySQL automatically converts numbers to strings as necessary, and vice versa.
If you compare an int with a string, both values are converted to floating-point numbers and compared as such.

mysql query ignores where clause when query for wrong data type

I have a table that has a STATUS TINYINT(1) column and DEFAULT 0. All the records now have 0 for this column. When I query this with where status = 'active', the results seem to ignore this clause and return all the results. Is this the default behavior of SQL or MySQL?
What is happening is that MySQL is doing silent conversion. When a string is used in a numeric context, it is converted to a number, based on the leading digits.
There are no leading digits in 'active'. So, the value is converted to 0. Hence, your logic becomes (after the conversion):
status = 0
Pay attention to types and to the constants used for comparison.

SQL coalesce(): what type does the combined column have?

Let's say I use coalesce() to combine two columns into one in a select, and subsequently build a view around that select.
Tables:
values_int
    id INTEGER(11) PRIMARY KEY
    value INTEGER(11)
values_varchar
    id INTEGER(11) PRIMARY KEY
    value VARCHAR(255)
vals
    id INTEGER(11) PRIMARY KEY
    value INTEGER(11) //foreign key to both values_int and values_varchar
The primary keys between values_int and values_varchar are unique and that allows me to do:
SELECT vals.id, coalesce(values_int.value, values_varchar.value) AS value
FROM vals
JOIN values_int ON values_int.id = vals.value
JOIN values_varchar ON values_varchar.id = vals.value
This produces nice assembled view with ID column and combined value column that contains actual values from two other tables combined into single column.
What type does this combined column have?
When turned into view and then queried with a WHERE clause using this combined "value" column, how is that actually handled type-wise? I.e. WHERE value > 10
Some rambling thoughts (most likely wrong):
The reason I am asking this is that the alternative to this design has all three tables merged into one, with INT values in one column and VARCHAR in the other. That would of course produce a lot of NULL values in both columns but save me the JOINs. For some reason I do not like that solution because it would require additional type checking to choose the right column and deal with the NULL values, but maybe the presented design would require the same too (if the resulting column is actually VARCHAR). I would hope that it actually passes the WHERE clause down the view to the source (so that the column does NOT have a type per se) but I am likely wrong about that.
Your query should be explicit to be clear. In this case MySQL is using varchar.
I would write this query like this to be clear:
coalesce(values_int.value, cast(values_varchar.value as signed), 0)
or
coalesce(cast(values_int.value as char(20)), values_varchar.value, '0')
You should put in that last value unless you want the column to be null if both columns are null.
Returns the data type of expression with the highest data type precedence. If all expressions are nonnullable, the result is typed as nonnullable.
So in your case the type will be VARCHAR(255)
Lets say I use coalesce() to combine two columns into one
NO, that's not the use of COALESCE function. It's used for choosing a provided default value if the column value is null. So in your case, if values_int.value IS NULL then it will select the value in values_varchar.value
coalesce(values_int.value, values_varchar.value) AS value
If you want to combine the data, use a concatenation operator or the CONCAT() function instead, like:
concat(values_int.value, values_varchar.value) AS value
Verify it yourself. An easy way to check in MySQL is to DESCRIBE a VIEW you create to capture your dynamic column:
mysql> CREATE VIEW v AS
-> SELECT vals.id, coalesce(values_int.value, values_varchar.value) AS value
-> FROM vals
-> JOIN values_int ON values_int.id = vals.value
-> JOIN values_varchar ON values_varchar.id = vals.value;
Query OK, 0 rows affected (0.01 sec)
Now DESCRIBE v will show you what's what. Note that under MySQL 5.1, I see the column as varbinary(255), but under 5.5 I see varchar(255).