mysql int column matching against a string - mysql

I have the following query and result:
mysql> SELECT item_id FROM phppos_items WHERE item_id = '5CG4500RRL';
+---------+
| item_id |
+---------+
| 5 |
+---------+
item_id is an int(11) primary key
How do I prevent this from matching? It looks like it is somehow becoming 5 when matching.
I still want to run this code so I don't have to change a lot of logic so I would prefer to keep it in mysql to do a strict comparison if possible.

I can be done by several methods. For example:
SELECT item_id
FROM phppos_items
WHERE '5CG4500RRL' REGEXP '^[0-9]+$' AND item_id = '5CG4500RRL';
Here we check is input value digits only and it equal to item_id.
Here you can find more options to check input value.

Related

Strange behavior querying an Int field with a String in MySQL

Using MySQL 5.5.60.
I'm running into some peculiar behavior when running select queries in Mysql. I have a table list whose schema looks like this:
+-----------------------+--------------+------+-----+-----------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+--------------+------+-----+-----------------+----------------+
| list_id | int(11) | NO | PRI | NULL | auto_increment |
| vendor_id | int(11) | NO | MUL | NULL | |
| referrer_id | int(11) | NO | | 0 | |
...
If I run this query
mysql> select * from list where list_id = "1946"\G
Everything works as it should and the list with id 1946 is returned. Here is where it gets weird. If I change my query to look like this:
mysql> select * from list where list_id = "1946dhkdf"\G
It still returns list 1946! Clearly MySQL somehow cast off the dhkdf part and uses the 1946 portion only. So does it try to cast that value to an Integer that way? Why then does this query return and empty set?
mysql> select * from list where list_id = "xq1946dhkdf"\G
I can't seem to find any documentation explaining this behavior. Can someone shed some light on it?
You are seeing MySQL's somewhat complex casting rules at work here. When trying to compare an integer column against a string literal, either one has to be cast to integer, or the other to string. In this case, MySQL will try to cast the string literal to an integer, to match the type of the column. But, in this case, it can't cast the entire string literal to an integer, since it contains characters. Therefore, the casting rules kick in, which state that if the first N characters of the string be numeric, then use only that leading number. So, as a result the following query:
select * from list where list_id = "1946dhkdf";
will return the same result set as:
select * from list where list_id = "1946";

Can MySQL FIND_IN_SET or equivalent be made to use indices?

If I compare
explain select * from Foo where find_in_set(id,'2,3');
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | User | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
with this one
explain select * from Foo where id in (2,3);
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | User | range | PRIMARY | PRIMARY | 8 | NULL | 2 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
It is apparent that FIND_IN_SET does not exploit the primary key.
I want to put a query such as the above into a stored procedure, with the comma-separated string as an argument.
Is there any way to make the query behave like the second version, in which the index is used, but without knowing the content of the id set at the time the query is written?
In reference to your comment:
#MarcB the database is normalized, the CSV string comes from the UI.
"Get me data for the following people: 101,202,303"
This answer has a narrow focus on just those numbers separated by a comma. Because, as it turns out, you were not even talking about FIND_IN_SET afterall.
Yes, you can achieve what you want. You create a prepared statement that accepts a string as a parameter like in this Recent Answer of mine. In that answer, look at the second block that shows the CREATE PROCEDURE and its 2nd parameter which accepts a string like (1,2,3). I will get back to this point in a moment.
Not that you need to see it #spraff but others might. The mission is to get the type != ALL, and possible_keys and keys of Explain to not show null, as you showed in your second block. For a general reading on the topic, see the article Understanding EXPLAIN’s Output and the MySQL Manual Page entitled EXPLAIN Extra Information.
Now, back to the (1,2,3) reference above. We know from your comment, and your second Explain output in your question that it hits the following desired conditions:
type = range (and in particular not ALL) . See the docs above on this.
key is not null
These are precisely the conditions you have in your second Explain output, and the output that can be seen with the following query:
explain
select * from ratings where id in (2331425, 430364, 4557546, 2696638, 4510549, 362832, 2382514, 1424071, 4672814, 291859, 1540849, 2128670, 1320803, 218006, 1827619, 3784075, 4037520, 4135373, ... use your imagination ..., ..., 4369522, 3312835);
where I have 999 values in that in clause list. That is an sample from this answer of mine in Appendix D than generates such a random string of csv, surrounded by open and close parentheses.
And note the following Explain output for that 999 element in clause below:
Objective achieved. You achieve this with a stored proc similar to the one I mentioned before in this link using a PREPARED STATEMENT (and those things use concat() followed by an EXECUTE).
The index is used, a Tablescan (meaning bad) is not experienced. Further readings are The range Join Type, any reference you can find on MySQL's Cost-Based Optimizer (CBO), this answer from vladr though dated, with a eye on the ANALYZE TABLE part, in particular after significant data changes. Note that ANALYZE can take a significant amount of time to run on ultra-huge datasets. Sometimes many many hours.
Sql Injection Attacks:
Use of strings passed to Stored Procedures are an attack vector for SQL Injection attacks. Precautions must be in place to prevent them when using user-supplied data. If your routine is applied against your own id's generated by your system, then you are safe. Note, however, that 2nd level SQL Injection attacks occur when data was put in place by routines that did not sanitize that data in a prior insert or update. Attacks put in place prior via data and used later (a sort of time bomb).
So this answer is Finished for the most part.
The below is a view of the same table with a minor modification to it to show what a dreaded Tablescan would look like in the prior query (but against a non-indexed column called thing).
Take a look at our current table definition:
CREATE TABLE `ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;
select min(id), max(id),count(*) as theCount from ratings;
+---------+---------+----------+
| min(id) | max(id) | theCount |
+---------+---------+----------+
| 1 | 5046213 | 4718592 |
+---------+---------+----------+
Note that the column thing was a nullable int column before.
update ratings set thing=id where id<1000000;
update ratings set thing=id where id>=1000000 and id<2000000;
update ratings set thing=id where id>=2000000 and id<3000000;
update ratings set thing=id where id>=3000000 and id<4000000;
update ratings set thing=id where id>=4000000 and id<5100000;
select count(*) from ratings where thing!=id;
-- 0 rows
ALTER TABLE ratings MODIFY COLUMN thing int not null;
-- current table definition (after above ALTER):
CREATE TABLE `ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;
And then the Explain that is a Tablescan (against column thing):
You can use following technique to use primary index.
Prerequisities:
You know the maximum amount of items in comma separated string and it is not large
Description:
we convert comma separated string into temporary table
inner join to the temporary table
select #ids:='1,2,3,5,11,4', #maxCnt:=15;
SELECT *
FROM foo
INNER JOIN (
SELECT * FROM (SELECT #n:=#n+1 AS n FROM foo INNER JOIN (SELECT #n:=0) AS _a) AS _a WHERE _a.n <= #maxCnt
) AS k ON k.n <= LENGTH(#ids) - LENGTH(replace(#ids, ',','')) + 1
AND id = SUBSTRING_INDEX(SUBSTRING_INDEX(#ids, ',', k.n), ',', -1)
This is a trick to extract nth value in comma separated list:
SUBSTRING_INDEX(SUBSTRING_INDEX(#ids, ',', k.n), ',', -1)
Notes: #ids can be anything including other column from other or the same table.

MySQL Query Anomaly

i faced a unique problem by accident
But before that i want to show you a table structure
td_category
|---------------------------------------------------------------------|
| category_id | category_title | category_slug | p_cid |
|---------------------------------------------------------------------|
| 1 | Shirts | 1-Shirts | 0 |
|---------------------------------------------------------------------|
| 2 | Jeans | 2-Jeans | 0 |
|---------------------------------------------------------------------|
Now,
category_id is INT and auto-increment value
category_title is VARCHAR
category_slug is VARCHAR
Now what i amdoing is that, by mistake i wrote a query
SELECT * FROM td_category WHERE category_id = '2-Jeans'
and instead of giving me any error it displayed the 2nd tuple
Isn't it supposed to throw an error??
please can anybody clarify?
mysql performs implicit conversion for int datatype due to which '2-Jeans' is treated as 2-0 (since Jeans is not an int type and is defaulted to 0 for compatibility as described in the docs here)
Hence the final query as the parser interprets is as below:
SELECT * FROM td_category WHERE category_id = 2;
The following query will take id as 2 which is your first character and display second record
SELECT * FROM td_category WHERE category_id = '2-Jeans'
Try this query which will return first record
SELECT * FROM td_category WHERE category_id = '1-Jeans'
2-jeans is treated as 2 so return second record and 1-jeans is treated as 1 so return first record.
Check Manual for auto casting in mysql.

Why my mysql answer that "not using key" when I use rand in where

I have a table that has 4,000,000 records.
The table is created that : (user_id int, partner_id int, PRIMARY_KEY ( user_id )) engine=InnoDB;
I want to test the performance of select 100 records.
Then, I tested following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( 1 );
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | MY_TABLE | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+-------------+----------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set, 1 warning (0.00 sec)
This is OK.
But, this query is buffered by mysql.
So, this test make no after the first test.
Then, I thinked of a sql that select by random value.
I tested following:
mysql> explain select user_id from MY_TABLE use index (PRIMARY) where user_id IN ( select ceil( rand() ) );
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | PRIMARY | MY_TABLE | index | NULL | PRIMARY | 4 | NULL | 3998727 | Using where; Using index |
+----+-------------+----------+-------+---------------+---------+---------+------+---------+--------------------------+
But, it's bad.
Explain shows that possible_keys is NULL.
So, full index scanning is planned, and in fact, it's too slow rather than the one before.
Then, I want to ask you to teach me how do I write random value with index looking up.
Thanks
Using rand() in SQL is usually a sure-fire way to make the query slow. A common theme here is people using it in ORDER BY to get a random sequence. It's slow because not only does it throw away the indexes, but it also reads through the whole table.
However in your case, the fact that the function calls are in a sub-query ought to allow the outer query to still use its indexes. The fact that it isn't seems quite odd (so I've given the question a +1 vote).
My theory is that perhaps MySQL's optimiser is getting it wrong -- it's seeing the functions in the inner query, and deciding incorrectly that it can't use an index.
The only thing I can suggest to work around that is using force index to push MySQL into using the index you want.
See the definition of rand().
If i understand right, you are trying to get a random record from the database. If that is the case, again from the rand() definition:
ORDER BY RAND() combined with LIMIT is useful for selecting a random sample from a set of rows:
SELECT * FROM table1, table2 WHERE a=b AND c<d -> ORDER BY RAND() LIMIT 1000;
It's a limitation of the MySQL optimizer, that it can't tell that the subquery returns exactly one value, it has to assume the subquery returns multiple rows with unpredictable values, potentially even all the values of user_id. Therefore it decides it's just going to do an index scan.
Here's a workaround:
mysql> explain select user_id from MY_TABLE use index (PRIMARY)
where user_id = ( select ceil( rand() ) );
Note that MySQL's RAND() function returns a value in the range 0 <= v < 1.0. If you CEIL() it, you'll likely get the value 1. Therefore you'll virtually always get the row where user_id=1. If you don't have such a row in your table, you'll get an empty set result. You certainly won't get a user chosen randomly among all your users.
To fix that problem, you'd have to multiply the rand() by the number of distinct user_id values. And that brings up the problem that you might have gaps, so a randomly chosen value won't match any existing user_id.
Re your comment:
You'll always see possible keys as NULL when you get an index scan (i.e., "type" is "index").
I tried your explain query on a similar table, and it appears that the optimizer can't figure out that the subquery is a constant expression. You can workaround this limitation by calculating the random number in application code and then using the result as a constant value in your query:
select user_id from MY_TABLE use index (PRIMARY)
where user_id = $random;

WHERE vs HAVING

Why do you need to place columns you create yourself (for example select 1 as "number") after HAVING and not WHERE in MySQL?
And are there any downsides instead of doing WHERE 1 (writing the whole definition instead of a column name)?
All other answers on this question didn't hit upon the key point.
Assume we have a table:
CREATE TABLE `table` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `value` (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
And have 10 rows with both id and value from 1 to 10:
INSERT INTO `table`(`id`, `value`) VALUES (1, 1),(2, 2),(3, 3),(4, 4),(5, 5),(6, 6),(7, 7),(8, 8),(9, 9),(10, 10);
Try the following 2 queries:
SELECT `value` v FROM `table` WHERE `value`>5; -- Get 5 rows
SELECT `value` v FROM `table` HAVING `value`>5; -- Get 5 rows
You will get exactly the same results, you can see the HAVING clause can work without GROUP BY clause.
Here's the difference:
SELECT `value` v FROM `table` WHERE `v`>5;
The above query will raise error: Error #1054 - Unknown column 'v' in 'where clause'
SELECT `value` v FROM `table` HAVING `v`>5; -- Get 5 rows
WHERE clause allows a condition to use any table column, but it cannot use aliases or aggregate functions.
HAVING clause allows a condition to use a selected (!) column, alias or an aggregate function.
This is because WHERE clause filters data before select, but HAVING clause filters resulting data after select.
So put the conditions in WHERE clause will be more efficient if you have many many rows in a table.
Try EXPLAIN to see the key difference:
EXPLAIN SELECT `value` v FROM `table` WHERE `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | table | range | value | value | 4 | NULL | 5 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
EXPLAIN SELECT `value` v FROM `table` having `value`>5;
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
| 1 | SIMPLE | table | index | NULL | value | 4 | NULL | 10 | Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
You can see either WHERE or HAVING uses index, but the rows are different.
Why is it that you need to place columns you create yourself (for example "select 1 as number") after HAVING and not WHERE in MySQL?
WHERE is applied before GROUP BY, HAVING is applied after (and can filter on aggregates).
In general, you can reference aliases in neither of these clauses, but MySQL allows referencing SELECT level aliases in GROUP BY, ORDER BY and HAVING.
And are there any downsides instead of doing "WHERE 1" (writing the whole definition instead of a column name)
If your calculated expression does not contain any aggregates, putting it into the WHERE clause will most probably be more efficient.
The main difference is that WHERE cannot be used on grouped item (such as SUM(number)) whereas HAVING can.
The reason is the WHERE is done before the grouping and HAVING is done after the grouping is done.
HAVING is used to filter on aggregations in your GROUP BY.
For example, to check for duplicate names:
SELECT Name FROM Usernames
GROUP BY Name
HAVING COUNT(*) > 1
These 2 will be feel same as first as both are used to say about a condition to filter data. Though we can use ‘having’ in place of ‘where’ in any case, there are instances when we can’t use ‘where’ instead of ‘having’. This is because in a select query, ‘where’ filters data before ‘select’ while ‘having’ filter data after ‘select’. So, when we use alias names that are not actually in the database, ‘where’ can’t identify them but ‘having’ can.
Ex: let the table Student contain student_id,name, birthday,address.Assume birthday is of type date.
SELECT * FROM Student WHERE YEAR(birthday)>1993; /*this will work as birthday is in database.if we use having in place of where too this will work*/
SELECT student_id,(YEAR(CurDate())-YEAR(birthday)) AS Age FROM Student HAVING Age>20;
/*this will not work if we use ‘where’ here, ‘where’ don’t know about age as age is defined in select part.*/
WHERE filters before data is grouped, and HAVING filters after data is grouped. This is an important distinction; rows that are
eliminated by a WHERE clause will not be included in the group. This
could change the calculated values which, in turn(=as a result) could affect which
groups are filtered based on the use of those values in the HAVING
clause.
And continues,
HAVING is so similar to WHERE that most DBMSs treat them as the same
thing if no GROUP BY is specified. Nevertheless, you should make that
distinction yourself. Use HAVING only in conjunction with GROUP BY
clauses. Use WHERE for standard row-level filtering.
Excerpt From:
Forta, Ben. “Sams Teach Yourself SQL in 10 Minutes (5th
Edition) (Sams Teach Yourself...).”.
Having is only used with aggregation but where with non aggregation statements
If you have where word put it before aggregation (group by)