I was doing some system testing and expected empty results from MySQL (5.7.21), but was surprised to get rows back.
My transactions table looks like this:
Column    | Data type
----------------------------
id        | INT
fullnames | VARCHAR(40)
----------------------------
And I have some records
--------------------------------
id | fullnames
--------------------------------
20 | Mutinda Boniface
21 | Boniface M
22 | Some-other Guy
-------------------------------
My sample queries:
select * from transactions where id = "20"; -- gives me 1 record which is fine
select * from transactions where id = 20; -- gives me 1 record - FINE as well
Now it gets interesting when I try with these:
select * from transactions where id = "20xxx"; -- gives me 1 record - what is happening here?
What does MySQL do here??
MySQL plays fast and loose with type conversions. When implicitly converting a string to a number, it takes characters from the beginning of the string as long as they form a valid number and ignores the rest. In your example, xxx isn't numeric, so MySQL only uses the leading "20".
One way around this (which is horrible for performance, since you lose the use of any index you may have on the column) is to explicitly cast the numeric side to a character type, so that the comparison is done between strings:
SELECT * FROM transactions WHERE CAST(id AS CHAR) = '20xxx'; -- no rows, since '20' <> '20xxx'
EDIT:
Referencing the discussion about performance from the comments - performing the cast to a number on the client-side is probably the best approach, as it will allow you to avoid sending queries to the database when you know no rows should be returned (i.e., when your input is not a valid number, such as "20x").
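A minimal sketch of that client-side check, in Python. The helper name parse_id is hypothetical, and it assumes IDs are non-negative decimal integers. It returns None for anything else, so the caller can skip the query entirely.

```python
import re
from typing import Optional

# Only plain ASCII digits count as a valid ID here (an assumption; adjust the
# pattern if your IDs can be negative). fullmatch requires the whole string
# to match, so trailing garbage such as "xxx" is rejected.
_ID_PATTERN = re.compile(r"[0-9]+")


def parse_id(raw: str) -> Optional[int]:
    """Return the ID as an int, or None if the input is not a plain integer."""
    if _ID_PATTERN.fullmatch(raw) is None:
        return None
    return int(raw)


for candidate in ["20", "20xxx", "", " 20"]:
    record_id = parse_id(candidate)
    if record_id is None:
        print(f"{candidate!r}: invalid ID, query skipped")
    else:
        print(f"{candidate!r}: would query with id = {record_id}")
```

Using an explicit [0-9]+ pattern rather than Python's int() alone is deliberate: int() also accepts surrounding whitespace, a leading sign, and underscores, which may be more lenient than you want.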
An alternative hack could be to cast the input to a number and back again to a string, and compare the lengths. If the lengths are the same it means the input string was fully converted into a number and no characters were omitted. This should be OK WRT performance, since this comparison is performed on an inputted string, not on a value from the column, and the column's index can still be used if the condition passes the short-circuit evaluation of the input:
SELECT *
FROM transactions
WHERE LENGTH(:input) = LENGTH(CAST(:input AS SIGNED)) AND id = :input;
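To get a feel for which inputs this length check lets through, here is a rough Python model of it. This is only an approximation of MySQL's behavior, not a substitute for testing on the server: cast_as_signed and passes_length_check are made-up names, overflow is ignored, and len() counts characters whereas LENGTH() counts bytes (the same thing for ASCII input).

```python
import re


def cast_as_signed(text: str) -> int:
    """Rough model of CAST(text AS SIGNED): keep the leading integer, else 0."""
    match = re.match(r"\s*([+-]?[0-9]+)", text)
    return int(match.group(1)) if match else 0


def passes_length_check(text: str) -> bool:
    """Model of LENGTH(:input) = LENGTH(CAST(:input AS SIGNED))."""
    return len(text) == len(str(cast_as_signed(text)))


for candidate in ["20", "20xxx", "020", "x"]:
    print(repr(candidate), passes_length_check(candidate))
```

In this model "20" passes and "20xxx" is rejected, as intended. Two edge cases are worth checking against a real server: an input with leading zeros such as "020" is rejected even though it is a valid number, and a single non-digit character such as "x" passes, because it converts to 0, which is also one character long.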
Related
I have a table named users in my MySQL database, and I have been using this DB with a Ruby on Rails application (through its ORM) for years. The table has an id field, which is a BIGINT configured as AI (auto-increment).
Example of my users table:
+----+---------+
| id | name |
+----+---------+
| 1 | John |
| 2 | Tommy |
| 3 | ... |
| 4 | ... |
| 5 | ... |
| 6 | ... |
+----+---------+
The problem I am facing is when I execute the following query I get unexpected rows.
SELECT * FROM users WHERE id = '1AW3F4SEFR';
This query returns exactly the same result as the following query:
SELECT * FROM users WHERE id = 1;
I do not know why SQL lets me use a string in a WHERE clause on an INT column. As the example shows, my DB converts the string I gave it to the integer found at position 0. I search for 1AW3F4SEFR and expect to get no result, but the statement returns the row for id = 1.
In Oracle SQL, the behavior of this exact same query is completely different. So, I believe there is something different on MySQL. But I am not sure about what causes this.
As has been explained in the request comments, MySQL has a weird way of converting strings to numbers. It simply takes as much of a string from the left as is numeric and ignores the rest. If the string doesn't start with a number the conversion defaults to 0.
Examples: '123' => 123, '12.3' => 12.3, '.123' => 0.123, '12A3' => 12, 'A123' => 0, '.1A1.' => 0.1
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=55cd18865fad4738d03bf28082217ca8
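The rule can be modeled in a few lines of Python, which may help when reasoning about what a given input turns into. This is a sketch of the behavior described above, not MySQL's actual implementation: leading_number is a made-up name, and exponent notation such as '1e3' is deliberately left out.

```python
import re

# Optional whitespace and sign, then either digits with an optional fraction
# or a bare fraction like ".123". Everything after that prefix is ignored.
_NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+))")


def leading_number(text: str) -> float:
    """Return the numeric prefix of text as a float, or 0.0 if there is none."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(1)) if match else 0.0


for sample in ["123", "12.3", ".123", "12A3", "A123", ".1A1."]:
    print(f"{sample!r} => {leading_number(sample)}")
```

Run on the six example strings above, the model gives the same values as listed there.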
Because MySQL doesn't raise an error here, as other DBMSs do, this can easily lead to undesired query results that go undetected for a long time.
The solution is easy though: Don't let this happen. Don't compare a numeric column with a string. If the ID '1AW3F4SEFR' is entered in some app, raise an error in the app or even prevent this value from being entered. When running the SQL query, make sure to pass a numeric value, so '1AW3F4SEFR' cannot even make it into the DBMS. (Look up how to use prepared statements and pass parameters of different types to the database system in your programming language.)
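As a rough illustration in Python: validate first, then hand the driver an integer parameter instead of a string. The function name build_user_lookup is hypothetical, and the %s placeholder assumes a DB-API driver with the "format" parameter style, which is what the common Python MySQL drivers use; check your own driver's documentation.

```python
import re
from typing import Tuple

# "%s" is a driver placeholder, not Python string formatting: the value is
# sent as a separate, typed parameter (assumption: "format" paramstyle).
LOOKUP_SQL = "SELECT * FROM users WHERE id = %s"


def build_user_lookup(raw_id: str) -> Tuple[str, Tuple[int]]:
    """Validate raw_id and return (sql, params) with an integer parameter."""
    if re.fullmatch(r"[0-9]+", raw_id) is None:
        # Reject the input in the app, before it reaches the DBMS.
        raise ValueError(f"not a valid user id: {raw_id!r}")
    return LOOKUP_SQL, (int(raw_id),)


sql, params = build_user_lookup("1")
print(sql, params)
```

With a DB-API cursor (assumed here, not shown), this would be executed as cursor.execute(sql, params). An input like '1AW3F4SEFR' raises ValueError instead of silently matching id = 1.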
If for some reason you want to pass a string for the ID instead (I cannot think of any such reason though) and want to make your query fail-safe by not returning any row in case of an ID like '1AW3F4SEFR', check whether the ID string represents an integer value in the query. You can use REGEXP for this.
SELECT * FROM users WHERE id = @id AND @id REGEXP '^[0-9]+$';
Thus you only consider integer ID strings and still enable the DBMS to use an index when looking up the ID.
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=56f8ee902342752933c20b8762f14dbb
I have a table with the following format.
| id | int_col |
----------------
| 1  | 0       |
| 2  | 0       |
----------------
The DDL is defined below:
id - is the primary key - it is also set to auto increment
int_col - is an attribute
I tried the below queries:
Select * from table_name where id='string_value';
Returns 0 rows.
Select * from table_name where int_col = 'string_value';
Returns all rows
I am not sure as to why it has returned all rows. I expected it to return 0 rows for both queries.
To make it short: do not compare strings and integers in MySQL. This can lead to unpredictable results such as those you are seeing.
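As per MySQL's conversion rules, comparing a string with an integer results in both values being converted to floating-point numbers under the hood and then compared. In your case, 'string_value' has no leading numeric part, so it converts to 0: that equals int_col in every row, while no row has id = 0. The documentation warns: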
Comparisons that use floating-point numbers [...] are approximate because such numbers are inexact. This might lead to results that appear inconsistent.
Further on in the docs, another disclaimer relates specifically to integer/string comparison:
Furthermore, the conversion from string to floating-point and from integer to floating-point do not necessarily occur the same way. The integer may be converted to floating-point by the CPU, whereas the string is converted digit by digit in an operation that involves floating-point multiplications.
The results shown will vary on different systems, and can be affected by factors such as computer architecture or the compiler version or optimization level.
Finally, here is an example of (possible) conversion discrepancy, also from MySQL documentation:
mysql> SELECT '18015376320243458' = 18015376320243458;
-> 1
mysql> SELECT '18015376320243459' = 18015376320243459;
-> 0
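The underlying reason is that a double-precision float cannot represent every integer of this size. A quick check in Python (whose floats are doubles on typical platforms) shows the limitation itself. Note this is Python, not MySQL, so it does not reproduce the string-versus-integer discrepancy above, only the loss of precision behind it.

```python
# Two different integers just above 2**54, where doubles are spaced 4 apart.
a = 18015376320243459
b = 18015376320243460

# Both round to the same double, so they compare equal as floats.
print(float(a) == float(b))  # True

# Converting back shows that a was rounded to its nearest representable value.
print(int(float(a)))  # 18015376320243460
```

Once two distinct values can collapse into the same float, the result of an equality comparison depends on exactly how each side was rounded.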
If I have a large table of floating-point numbers, would it help read speed to add a column that holds the integer part of each float? If that integer column were indexed, then when I need to select all the floats that start with a certain integer, could it "filter out" the values that are certainly not needed?
For example, if there are 10,000 numbers, 5000 of which begin with 14 (14.232, 14.666, etc.), is there an SQL statement that would select faster if I add the integer column?
id | number | int_value |
1 | 11.232 | 11 |
2 | 30.114 | 30 |
3 | 14.888 | 14 |
.. | .. | .. |
3005 | 14.332 | 14 |
You can create a nonclustered index on the number column itself, and when selecting data from the table you can filter with the LIKE operator. There is no need for an additional column:
Select * from mytable
where number like '14%'
First of all: Do you have performance issues? If not then why worry?
Then: You need to store decimals, but you are sometimes only interested in the integer part. Yes?
So you have one or more queries of the type
where number >= 14 and number < 15
or
where truncate(number, 0) = 14
Do you already have indexes on the number? E.g.
create index idx on mytable(number);
The first mentioned WHERE clause would probably benefit from it. The second doesn't, because when you invoke a function on the column, the DBMS doesn't see the relation to the index anymore. This shows it can make a difference how you write the query.
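If you want to see this effect for yourself, here is a small sketch using Python's built-in sqlite3 module. Be aware that this is SQLite, not MySQL: CAST(number AS INTEGER) stands in for truncate(number, 0), and the note column is made up so the table has something the index does not cover. The principle is the same, but on MySQL you would run EXPLAIN on your own queries to confirm.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 is the plan text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, number REAL, note TEXT)")
conn.execute("CREATE INDEX idx ON mytable(number)")

# Plain range condition on the indexed column.
range_plan = query_plan(
    conn, "SELECT * FROM mytable WHERE number >= 14 AND number < 15")
# Same intent, but with the column wrapped in an expression.
func_plan = query_plan(
    conn, "SELECT * FROM mytable WHERE CAST(number AS INTEGER) = 14")

print(range_plan)
print(func_plan)
```

The first plan should mention a search using idx, while the second should be a plain scan of the table. The exact wording differs between SQLite versions.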
If the first WHERE clause is still too slow in spite of the index, you can create a computed column (ALTER TABLE mytable ADD numint int GENERATED ALWAYS AS (truncate(number, 0)) STORED), index that, and access it instead of the number column in your query. But I doubt that would speed things up noticeably.
As to your example:
if there are 10,000 numbers, 5000 of which begin with 14
This is not called a large table, but a small one. And as you'd want half of the records anyway, the DBMS would simply read all records sequentially and look at the number. It doesn't make a difference whether it looks at an integer or a decimal number. (Well, some nanoseconds maybe, but nothing you would notice.)
I've stumbled on a previously asked and answered question here:
How to use comparison operator for numeric string in MySQL?
I absolutely agree that the answer there is the best one mentioned. But it left me with a question of my own while I was trying to write my own answer. I was trying to select the first number and convert it to an integer. Next I wanted to compare that integer with a number (3 in the case of the question).
This is the query I've created:
SELECT experience,
CONVERT(SUBSTRING_INDEX(experience,'-',1), UNSIGNED INTEGER) AS num
FROM employee
WHERE @num >= 3;
For the sake of simplicity, assume the data inside experience is: 4-8
The query doesn't return any errors. But it doesn't return the data either. I know it's possible to compare the data inside a column with a user defined variable. But is it possible to compare data (the integer in this case) with the variable like I'm trying to do?
This is purely out of curiosity and to learn something.
Yes, a derived table will do. The inner select block below is a derived table. And every derived table needs a name. In my case, xDerived.
The strategy is to let the derived table cleanse the use of the column name. Coming out of the derived chunk is a clean column named num which the outer select is free to use.
Schema
create table employee
( id int auto_increment primary key,
experience varchar(20) not null
);
-- truncate table employee;
insert employee(experience) values
('4-5'),('7-1'),('4-1'),('6-5'),('8-6'),('5-9'),('10-4');
Query
select id,experience,num
from
( SELECT id,experience,
CONVERT(SUBSTRING_INDEX(experience,'-',1),UNSIGNED INTEGER) AS num
FROM employee
) xDerived
where num>=7;
Results
+----+------------+------+
| id | experience | num |
+----+------------+------+
| 2 | 7-1 | 7 |
| 5 | 8-6 | 8 |
| 7 | 10-4 | 10 |
+----+------------+------+
Note, your @num concept was faulty, but hopefully I interpreted what you meant to do above.
Also, I went with 7 not 3 because all your sample data would have returned, and I wanted to show you it would work.
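If it helps to see the two steps spelled out, here is the same logic as a Python sketch: first compute num for every row (the derived table), then filter on it (the outer WHERE). A column alias from the SELECT list is not visible in the WHERE clause of the same query block, which is why the SQL needs the extra level. The function names are made up for this illustration, and the rows are the sample data from the schema above.

```python
from typing import List, Tuple

# (id, experience) rows, in insertion order, so ids run from 1 to 7.
EMPLOYEES: List[Tuple[int, str]] = [
    (1, "4-5"), (2, "7-1"), (3, "4-1"), (4, "6-5"),
    (5, "8-6"), (6, "5-9"), (7, "10-4"),
]


def first_number(experience: str) -> int:
    """Part before the first '-', as an int (like SUBSTRING_INDEX + CONVERT)."""
    return int(experience.split("-", 1)[0])


def with_min_experience(rows: List[Tuple[int, str]], minimum: int):
    # Step 1: compute num for every row (the derived table).
    derived = [(row_id, exp, first_number(exp)) for row_id, exp in rows]
    # Step 2: filter on the computed column (the outer WHERE).
    return [row for row in derived if row[2] >= minimum]


print(with_min_experience(EMPLOYEES, 7))
# [(2, '7-1', 7), (5, '8-6', 8), (7, '10-4', 10)]
```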
The AS num instruction names the result of convert as num, not a variable named @num.
You could repeat the convert
SELECT experience,CONVERT(SUBSTRING_INDEX(experience,'-',1),UNSIGNED INTEGER)
FROM employee
WHERE CONVERT(SUBSTRING_INDEX(experience,'-',1),UNSIGNED INTEGER) >= 3;
Or use a partial (derived) table (only one convert)
SELECT experience,num
FROM (select experience,
CONVERT(SUBSTRING_INDEX(experience,'-',1),UNSIGNED INTEGER) as num
FROM employee) as partialtable WHERE num>=3;
Much simpler. (Or at least much shorter.) This will work for the data as described, namely "number, -, other stuff".
SELECT experience,
0+experience AS 'FirstPart'
FROM employee
WHERE 0+experience >= 3
Why? 0+string is parsed as "convert the string to a number, then add it to 0". Converting a string will extract the digits up to the first non-digit, then convert that as numeric.
I'm trying to find a way to compare two DNA-like strings with MySQL; stored functions are no problem. The string format may also be changed, but it needs to follow the pattern [code][id]-[value], like C1-4. (The - separator may be changed as well.)
Example of the string:
C1-4,C2-5,C3-9,S5-2,S8-3,L2-4
If a value does not exist in the other string, for example S3-1, it will score 10 (the max value). If the asked string has C1-4 and the given string has C1-5, the score has to be 4 - 5 = -1, and if the asked string has C1-4 and the given string has C1-2, the score has to be 4 - 2 = 2.
The reason for this is that my realtime algorithm is getting slow with 10,000 results (already optimized with stored functions, indexes, and query optimizations), because 10,000 small and quick queries still add up.
And the score has to be calculated before I can order my query and get the right limit.
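To pin down the scoring rules, here is how I read them, written as a small Python sketch rather than MySQL. The names parse_dna and score_dna are made up, and two things are assumptions: only the codes present in the asked string are scored, and the result is one score per code, since how the scores should be combined is not specified.

```python
from typing import Dict

MISSING_SCORE = 10  # max value, used when a code is absent from the other string


def parse_dna(text: str) -> Dict[str, int]:
    """Turn 'C1-4,C2-5' into {'C1': 4, 'C2': 5}."""
    result: Dict[str, int] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input or a trailing comma
        key, _, value = item.partition("-")
        result[key] = int(value)
    return result


def score_dna(asked: str, given: str) -> Dict[str, int]:
    """Score each code of asked against given: asked value minus given value."""
    given_values = parse_dna(given)
    return {
        key: (value - given_values[key]) if key in given_values else MISSING_SCORE
        for key, value in parse_dna(asked).items()
    }


print(score_dna("C1-4,S3-1", "C1-5,C2-5"))
# {'C1': -1, 'S3': 10}
```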
Thanks and if you have any questions let me know by comment.
** EDIT **
I'm thinking that it's also possible to not use a string but a table where the DNA-bits are stored as a 1-n relation table.
ID | CODE | ID | VALUE
----------------------
1  | C    | 2  | 4