Finding an exact value in MySQL

I'm trying to find an exact value within a string.
The problem is that when I search column StringB for the value 1, it finds all rows containing 1. If I look for the value 1 in StringB, it should only find rows where that exact value appears.
LIKE is not a good option since it matches every row that contains 1, and = is not an option either since it compares the whole value for equality.
I also tried INSTR, but it works almost the same as LIKE.
The same goes for LOCATE.
These are the formats currently stored:
number (example: 2)
number. (example: 2.)
number.number (example: 2.23.52.12.35)
These formats don't change.
This column stores only numbers (integers): no letters or other characters.
Is there any way to search strictly for a value?
My database uses InnoDB. Thank you for your time.

Try using REGEXP:
SELECT *
FROM yourTable
WHERE CONCAT('.', StringB, '.') REGEXP CONCAT('[.]', '2', '[.]');
We could also use LIKE instead of REGEXP:
SELECT *
FROM yourTable
WHERE CONCAT('.', StringB, '.') LIKE CONCAT('%.', '2', '.%');
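Both queries rely on the same delimiter-wrapping trick, and that trick can be sketched outside the database. The Python illustration below assumes dot-separated values like the ones in the question. `contains_exact` is a hypothetical helper name, not a MySQL function.

```python
def contains_exact(string_b: str, target: str, sep: str = ".") -> bool:
    """Return True if `target` appears as a whole dot-separated token of `string_b`.

    Mirrors CONCAT('.', StringB, '.') LIKE CONCAT('%.', target, '.%'):
    wrapping both sides in the separator means '2' can match '.2.' but
    never the '2' inside '23' or '12'.
    """
    return f"{sep}{target}{sep}" in f"{sep}{string_b}{sep}"


rows = ["2", "2.", "2.23.52.12.35", "12.23", "32"]
print([r for r in rows if contains_exact(r, "2")])  # ['2', '2.', '2.23.52.12.35']
```

Note that the trailing-dot format ("2.") still matches, because wrapping it produces ".2..", which contains ".2.".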

If you do:
where stringB = 1
then MySQL has to figure out which types to use. By MySQL's type-conversion rules, it will convert a string such as '1.00' to a number, and then they match.
If you do:
where stringB = '1'
then the types do what you intend, and the values are compared as strings.
More generally: keep the types consistent. Never depend on implicit conversion.
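A rough Python model of the two comparison paths shows why `stringB = 1` and `stringB = '1'` behave differently. It assumes a simplified version of MySQL's string-to-number conversion: use the longest leading numeric prefix, or 0 if there is none. It ignores collations, warnings and other edge cases. `to_number` and `mysql_equals` are illustrative names only.

```python
import re

# Longest leading numeric prefix, e.g. '1.00' -> '1.00', '2.23.52' -> '2.23', 'abc' -> no match.
# This is a simplification of MySQL's implicit string-to-number conversion (assumption).
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def to_number(value) -> float:
    """Approximate the numeric value MySQL would use for `value`."""
    if not isinstance(value, str):
        return float(value)
    match = _NUMERIC_PREFIX.match(value)
    return float(match.group()) if match else 0.0


def mysql_equals(a, b) -> bool:
    """Rough model of `a = b`: two strings compare as strings, anything else as numbers."""
    if isinstance(a, str) and isinstance(b, str):
        return a == b  # real MySQL would also apply the column's collation here
    return to_number(a) == to_number(b)


print(mysql_equals("1.00", 1))    # True: '1.00' is converted to the number 1
print(mysql_equals("1.00", "1"))  # False: compared as strings
```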

Related

SQL Query on editing a quantity field

I have a dataset where the values are in different formats, and I want to bring them into a single format. The values are stored as varchar.
For example:
1st Case: 1.23.45 should be 123.45
2nd Case: 125.45 should be 125.45
The first one has two decimal points. I want to remove only the first decimal point (if there are two); otherwise, leave the value as it is.
How do I do this?
I tried using replace(Qty,'.',''), but this removes all of them.
I think this will work (although I am not 100% sure about corner cases):
-- yourTable is a placeholder for your actual table name
UPDATE yourTable
SET Qty = CONCAT(SUBSTRING(Qty, 1, LOCATE('.', Qty) - 1), SUBSTRING(Qty, LOCATE('.', Qty) + 1))
WHERE LENGTH(Qty) - LENGTH(REPLACE(Qty, '.', '')) = 2;
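To check expected results before running an UPDATE, the intended transformation can be written in a few lines of Python. This is only a sketch for testing values. `normalize_qty` is a made-up name.

```python
def normalize_qty(qty: str) -> str:
    """Remove the first '.' only when the value contains exactly two of them."""
    # Same condition as the WHERE clause: LENGTH(Qty) - LENGTH(REPLACE(Qty, '.', '')) = 2
    if qty.count(".") != 2:
        return qty
    # Same result as CONCAT(SUBSTRING(Qty, 1, LOCATE('.', Qty) - 1), SUBSTRING(Qty, LOCATE('.', Qty) + 1))
    return qty.replace(".", "", 1)


for value in ["1.23.45", "125.45"]:
    print(value, "->", normalize_qty(value))
# 1.23.45 -> 123.45
# 125.45 -> 125.45
```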
You can use a regular expression to handle this case.
Assuming there are at most two decimal points in your string, the query below should handle both cases. REGEXP_REPLACE() requires MySQL 8.0 or later, backslashes must be doubled inside MySQL string literals, and yourTable is a placeholder:
SELECT REGEXP_REPLACE(value, '^(\\d+)(\\.)?(\\d+\\.\\d+)$', '$1$3') AS a FROM yourTable;
Here we match a regular expression pattern and capture the following:
the digits before the first decimal point, in group one;
the optional first decimal point, in group two;
the digits before and after the last decimal point, including that decimal point, in group three.
We then concatenate groups one and three, dropping group two.
Note that the first decimal point is made optional with the ? character, which is why both types of case are handled.
Even with more than two decimal points, I believe a properly constructed regular expression should be able to handle it.
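The same pattern can be tried in Python's `re` module, which writes replacement backreferences as `\1` rather than MySQL's `$1`. This is a sketch that assumes the inputs are plain digit/dot strings. `drop_first_decimal` is a hypothetical name.

```python
import re

# Group 1: digits before the first '.'; group 2: the optional first '.';
# group 3: the digits around the last '.', including that '.'.
PATTERN = re.compile(r"^(\d+)(\.)?(\d+\.\d+)$")


def drop_first_decimal(value: str) -> str:
    # Keep groups 1 and 3 and drop group 2; values that don't match are returned unchanged.
    return PATTERN.sub(r"\1\3", value)


print(drop_first_decimal("1.23.45"))  # 123.45
print(drop_first_decimal("125.45"))   # 125.45
```

For "125.45" the regex backtracks: group 2 matches nothing, and group 3 takes "5.45". The value therefore comes back unchanged.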

Mixing quoted and unquoted values in IN() condition - MySQL quirk or general issue?

The MySQL manual contains the following interesting note about mixing quoted and unquoted values in an IN condition:
You should never mix quoted and unquoted values in an IN() list because the comparison rules for quoted values (such as strings) and unquoted values (such as numbers) differ. Mixing types may therefore lead to inconsistent results.
However, it doesn't really explain why this is a problem. It has examples, but it doesn't show either the data being queried or the results, so they only serve as illustrations without giving any explanation about the issue.
I have two questions:
Why does this cause problems in MySQL? Ideally, provide an example where the results are wrong/inconsistent/unintuitive, to demonstrate.
Is this a MySQL-specific quirk or does this apply to other database systems? In particular, I am interested in whether this issue affects SQL Server, but would ideally like the question answered in the general case.
It depends what you consider "non-intuitive". This returns false:
'00' in ('0', '01')
However, this returns true:
'00' in (0, '01')
I think the following lines give an unintuitive example without mixing:
mysql> SELECT 'a' IN (0), 0 IN ('b');
-> 1, 1
which you can extend:
SELECT 'a' IN (0, 1, '2'), 'a' IN ('0', '1', '2');
-> 1, 0
SELECT 0 IN (0.0, 'b'), 0 IN ('0.0', 'b');
-> 1, 1
There is also this other question:
In MySQL, why does the following query return '----', '0', '000', 'AK3462', 'AL11111', 'C131521', 'TEST', etc.?
select varCharColumn from myTable where varCharColumn in (-1, '');
I get none of these results when I do:
select varCharColumn from myTable where varCharColumn in (-1);
select varCharColumn from myTable where varCharColumn in ('');
Most likely, everything is cast to float, according to this link:
[...] In all other cases, the arguments are compared as floating-point (real) numbers. For example, a comparison of string and numeric operands takes place as a comparison of floating-point numbers.
And strings are cast to 0.0 unless they start with digits. The same link also mentions possible problems with floating-point accuracy, and queries not using an index because the type is not right (everything must be cast to float, so presumably no index is used).
I think you might get something similar but not the same with every DBMS because you have to cast things to compare them. It might not be the exact same issue in SQL Server, because the data type precedence is not the same, but you should compare data of the same data type.
According to this link, which gives the data type precedence for SQL Server:
user-defined data types (highest)
sql_variant
xml
datetimeoffset
datetime2
datetime
smalldatetime
date
time
float
real
decimal
money
smallmoney
bigint
int
smallint
tinyint
bit
ntext
text
image
timestamp
uniqueidentifier
nvarchar (including nvarchar(max) )
nchar
varchar (including varchar(max) )
char
varbinary (including varbinary(max) )
binary (lowest)
So in SQL Server, when an int is compared with a string, the string would be cast to int (not float).
Running some simple tests, it seems that comparisons between data types are done correctly, despite what is written in the MySQL manual.
SELECT 0 IN ('0','00',0,00); -> TRUE
SELECT 0 IN ('0','01',1,01); -> TRUE
SELECT 0 IN ('1','00',1,10); -> TRUE
SELECT 0 IN ('11','10',0,10); -> TRUE
SELECT 0 IN ('1','01',1,00); -> TRUE
SELECT '0' IN ('1','01',1,00); -> TRUE
SELECT '0' IN ('0','00',0,00); -> TRUE
SELECT '0' IN ('0','01',1,01); -> TRUE
SELECT '0' IN ('1','00',1,10); -> FALSE
SELECT '0' IN ('11','10',0,10); -> TRUE
SELECT '1' IN ('11','10',1,10); -> TRUE
SELECT '15.32' IN ('11','10',1,15.32); -> TRUE
SELECT 13.12 IN ('11','10',1,13.12); -> TRUE
SELECT 00 IN ('11','00',1,13.12); -> TRUE
SELECT '00' IN ('11',00,1,13.12); -> TRUE
SELECT '00.0' IN ('11',00.0,1,13.12); -> TRUE
SELECT '00.00' IN ('11',0,1,13.12); -> TRUE
SELECT '00.01' IN ('11',0.01,1,13.12); -> TRUE
The above results can be seen in this SQLFiddle
But the above tests are not even close to testing all the different data types of MySQL.
In addition, we should think about the cases in which we would actually use the IN() operator.
The MySQL manual says that mixed data types can sometimes give surprising results, but is it ever actually necessary to have different data types inside IN()?
In short, no. What gets checked against the values inside the parentheses will be a table column with a specific data type.
For example, doesn't comparing a TEXT column against IN ('Hello','World',13) seem odd? One could object that a TEXT column may contain numerical values. Fine: then just write IN ('Hello','World','13'), since we are talking about a TEXT column.
If we don't know the data type, or the data type is somehow dynamic and can change, then we should convert that field to the data type we expect most of the values to have.
1. Why does this cause problems in MySQL?
The example below shows the inconsistency of using IN across quoted (x='1a') and unquoted (x=1) types. Note that for the same value x = 1, the same IN expression yields 0 in Query 1 but 1 in Query 2 (presumably because the UNION ALL in Query 1 gives column x a string type, so the 1 becomes '1' and is compared as a string).
SELECT
x, x IN ('1b','a1')
FROM
(
select '1a' as x
union all select 1
) q1;
SELECT
x, x IN ('1b','a1')
FROM
(
select 1 as x
) q1;
Results:
Query 1:
'1a': 0
1: 0
Query 2:
1: 1
So far I cannot observe any inconsistency if I only alter the list inside IN. But I observed the following pattern:
expr IN (...array of values)
For a string expr against string values: compare as strings
For a non-string expr against string values: compare as numbers
For a string expr against numeric values: compare as numbers
For a non-string expr against numeric values: compare as numbers
2. Is this a MySQL-specific quirk or does this apply to other database systems?
It depends on the database. For MSSQL I'd say no, because when you compare a non-numeric string with a number, it gives you an error message like:
Conversion failed when converting the varchar value '1a' to data type int.
1. Why does this cause problems in MySQL?
The engine needs to know how it will make the comparisons.
If you compare a column with integers, the column's integer value will be compared with the IN list. If the IN list items are strings, the comparison will differ.
https://dev.mysql.com/doc/refman/8.0/en/type-conversion.html
2. Is this a MySQL-specific quirk or does this apply to other database systems?
It is not MySQL-specific. For performance reasons (indexing), it is always better to avoid casting.
Why does this cause problems in MySQL?
It's not a bug, it's a feature. 😬
Basically, it's about how the database handles the comparison. In particular, MySQL automatically converts the string value to a numeric value when comparing numeric values with string values. Since MySQL is written in C++, somewhere in the code base the string value must be cast to double before the comparison.
There is nothing special about the IN clause, I think. In the MySQL source code, I saw comments similar to this one:
`WHERE a IN (b, c)` can also be rewritten as `WHERE a = b OR a = c`
That makes sense, and IN is (probably) treated the same way in the code base. So, based on this, if we have something like this:
... WHERE '04.2' IN ('0', 4.2);
this means '04.2' = '0' OR '04.2' = 4.2, which returns true because, in C/C++ terms:
"04.2" = "0" // string value comparison -> false
cast_as_double("04.2") = 4.2 // double value comparison -> true
The same applies to other cases that resolve to true, e.g. 42 IN ('0042', 0), '3.00' IN (3, '1'), 0 IN (3, '0.00') etc.
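Under the answer's assumption that `a IN (b, c)` behaves like `a = b OR a = c`, the examples can be checked with a small Python model. This is a sketch. The string-to-double conversion is a simplified stand-in for MySQL's: take the leading numeric prefix, else 0. `eq` and `in_list` are hypothetical helper names.

```python
import re

_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def cast_as_double(value) -> float:
    """Simplified string-to-double cast: leading numeric prefix, else 0."""
    if not isinstance(value, str):
        return float(value)
    match = _NUMERIC_PREFIX.match(value)
    return float(match.group()) if match else 0.0


def eq(a, b) -> bool:
    # string vs string: string comparison; any number involved: double comparison
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    return cast_as_double(a) == cast_as_double(b)


def in_list(a, values) -> bool:
    # a IN (b, c, ...) rewritten as a = b OR a = c OR ...
    return any(eq(a, v) for v in values)


print(in_list("04.2", ["0", 4.2]))  # True
print(in_list(42, ["0042", 0]))     # True
print(in_list("3.00", [3, "1"]))    # True
print(in_list(0, [3, "0.00"]))      # True
```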
Is this a MySQL-specific quirk or does this apply to other database systems?
This seems to be the case with other databases as well. If you like, you can test them online:
MySQL: https://www.db-fiddle.com
PostgreSQL: https://www.db-fiddle.com
MS SQL Server 2017: http://sqlfiddle.com/#!18/ff6b8/12807
Whilst there have been a lot of answers and comments that provide examples of 'unintuitive' behaviour, most of these examples seem to be explained by the standard casting rules. In other words, the results were entirely consistent with what would be returned from SELECT A = B; for the given A and B.
"Because casting" doesn't seem like a particularly satisfying explanation for the paragraph I quoted in the question. That paragraph comes after a number of paragraphs explaining how type conversion affects the IN() statement, so it seems somewhat repetitive and redundant if that is all it's referring to.
My interpretation of the quoted paragraph is that it is an explicit statement that a IN(b, c) may give different results to a = b OR a = c in situations where b and c are quoted differently.
I was therefore looking to find an example where the result couldn't be explained by the usual casting rules.
I think the reason that we haven't seen a good example yet is because most answers focussed on comparing numbers, in string and non-string representations. However, by basing the test around string values instead, I have managed to construct a non-intuitive example that is not explained by simple type conversion rules and which is not equivalent to the individual comparisons ORed together; the comparison between 'test' and 23 gives different results depending on what other values are in the IN() list:
SELECT 'test' IN('fish'); --> 0
SELECT 'test' IN(23); --> 0
SELECT 'test' IN('fish', 23); --> 1 !!!
I have yet to come up with a good explanation about what is happening here - is there some rule being followed, or is it just a MySQL quirk? I also haven't got an answer to the second question, as that somewhat depends on the reason for the behaviour (e.g. if it is defined by the standard or is an artefact of an obvious optimisation, vs. just being a MySQL-specific quirk) but I guess this could be figured out by running the above test on other RDBMSs.
Any comments to help flesh this out (or answers that cover the missing elements) will be appreciated - I will update this answer with any further details that I manage to deduce and don't plan on accepting any answer (including my own) until I understand what's going on a little bit better.
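The difference described above can be made concrete by putting two simplified models side by side. The first evaluates each pair separately (`a = b OR a = c`). The second, offered only as a hypothesis, picks a single numeric comparison type for the whole list once strings and numbers are mixed. Both models and their helper names are assumptions for illustration, not MySQL's actual implementation. The second model reproduces the `'test' IN('fish', 23)` result, but it does not explain every test result reported in the other answers.

```python
import re

_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def as_double(value) -> float:
    """Simplified string-to-double cast: leading numeric prefix, else 0."""
    if not isinstance(value, str):
        return float(value)
    match = _NUMERIC_PREFIX.match(value)
    return float(match.group()) if match else 0.0


def in_pairwise(a, values) -> bool:
    """Model 1: a IN (...) as a = b OR a = c, each pair typed on its own."""
    def eq(x, y):
        if isinstance(x, str) and isinstance(y, str):
            return x == y
        return as_double(x) == as_double(y)
    return any(eq(a, v) for v in values)


def in_whole_list(a, values) -> bool:
    """Model 2 (hypothesis): unless everything is a string,
    compare every element as a double."""
    if all(isinstance(v, str) for v in [a, *values]):
        return a in values
    return as_double(a) in [as_double(v) for v in values]


for values in (["fish"], [23], ["fish", 23]):
    print(values, in_pairwise("test", values), in_whole_list("test", values))
# ['fish'] False False
# [23] False False
# ['fish', 23] False True
```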

MySQL - search for patterns

I'm trying to figure out if someone has an elegant way to look for patterns in data stored in a varchar field where the value is not known, meaning I can't use LIKE. For example, say a table called test looked like this:
id, str
and the data looked like this:
1, YUUUY
2, DDDMM
3, MMMMT
4, XMXMX
and I want to do a select that will return anything where the value of str has a pattern that matches the pattern ABABA. ABABA here shows a pattern and not literal letters. So the only one that matches this pattern would be id = 4. Is there a regular expression that I can use to pattern match like this? To make sure I'm clear regarding the patterns:
The pattern for id=1 is ABBBA.
The pattern for id=2 is AAABB.
The pattern for id=3 is AAAAB.
When running the query, all I will know is the pattern to search for.
Alternatively, if it makes it easier, I can have the table set up like:
id,c1,c2,c3,c4,c5
and the data would look like this:
1,Y,U,U,U,Y
2,D,D,D,M,M
3,M,M,M,M,T
4,X,M,X,M,X
Not sure if that makes it easier, but I think regexp is out the window if the data is set up like that.
No regular expression support in MySQL to do that kind of pattern matching, no.
SQL wasn't specifically designed for pattern matching of strings (or patterns of values in separate columns.)
But... we could come up with something workable, even if it's not a regular expression and it's not elegant.
Assuming we don't have a custom built user-defined function, and we want to use native MySQL functions and expression...
And assuming that the patterns we are looking for are guaranteed to consist of only two distinct characters...
And assuming that we're looking at exactly five character positions...
And assuming that the pattern string we're matching to will always begin with the letter 'A', and the "other" letter in the pattern will always be 'B'...
It wouldn't be overly ugly to do something like this:
SELECT t.id
, t.str
FROM mytable t
WHERE CONCAT('A'
,IF(MID(t.str,2,1)=MID(t.str,1,1),'A','B')
,IF(MID(t.str,3,1)=MID(t.str,1,1),'A','B')
,IF(MID(t.str,4,1)=MID(t.str,1,1),'A','B')
,IF(MID(t.str,5,1)=MID(t.str,1,1),'A','B')
) = 'ABBBA'
The first character in the string is automatically converted to an 'A'.
If the second character matches the first character, it's also an 'A'; otherwise it's a 'B'.
We do the same thing for the third, fourth and fifth characters.
Concatenate the 'A' and 'B' characters into a single string, and we can now perform an equality comparison to a pattern string, consisting of 'A' and 'B', starting with an 'A'.
But that is going to fall apart if the stated assumptions aren't true, for example if str is less than five characters in length, or if it contains more than two distinct characters. In the latter case, this would see str=XYYZX as matching pattern ABBBA: the first character is automatically an 'A', the fifth character matches the first so it's an 'A', and none of the other characters match the first, so they are all 'B', even though they aren't the same as each other.
And so on.
We could add some additional checks.
For example, to guarantee that str is exactly five characters in length...
AND CHAR_LENGTH(t.str)=5
Note that the default collation in MySQL is case-insensitive. That means a str value of MmmmM would be converted to 'AAAAA', not 'ABBBA'. And a str value of MmmKk would match 'AAABB'.
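A different technique from the IF() expression above computes a canonical "shape" for each string. The first distinct character becomes 'A', the next new character 'B', and so on. This does not assume only two distinct characters, so XYYZX gets the shape ABBCA instead of matching ABBBA. The Python below is only a sketch of the idea, for example to run in application code. `pattern_of` is a hypothetical name, and unlike MySQL's default collation, this comparison is case-sensitive.

```python
def pattern_of(s: str) -> str:
    """Map each distinct character to 'A', 'B', 'C', ... in order of first appearance."""
    labels: dict[str, str] = {}
    shape = []
    for ch in s:
        if ch not in labels:
            # Assumes at most 26 distinct characters per string.
            labels[ch] = chr(ord("A") + len(labels))
        shape.append(labels[ch])
    return "".join(shape)


rows = [(1, "YUUUY"), (2, "DDDMM"), (3, "MMMMT"), (4, "XMXMX")]
print([row_id for row_id, s in rows if pattern_of(s) == "ABABA"])  # [4]
print(pattern_of("XYYZX"))  # ABBCA
```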
Unfortunately, it doesn't look like MySQL supports regex groups. I was hoping you could do something like this to match ABBBA for example:
([A-Z])([A-Z])\2\2\1
Example here: http://regexr.com/3d8gu
It looks like there is a MySQL plugin that might support it:
https://github.com/mysqludf/lib_mysqludf_preg
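Regex engines with backreference support can run the suggested pattern directly. A quick check with Python's `re` module (not MySQL) shows that it works, with one caveat. `([A-Z])([A-Z])\2\2\1` also accepts MMMMM, because nothing forces the two groups to differ. The negative lookahead `(?!\1)` in the second pattern fixes that. It is an addition for illustration, not part of the pattern above.

```python
import re

ORIGINAL = re.compile(r"([A-Z])([A-Z])\2\2\1")        # pattern from the answer above
DISTINCT = re.compile(r"([A-Z])(?!\1)([A-Z])\2\2\1")  # additionally requires A != B

for s in ["YUUUY", "XMXMX", "MMMMM"]:
    print(s, bool(ORIGINAL.fullmatch(s)), bool(DISTINCT.fullmatch(s)))
# YUUUY True True
# XMXMX False False
# MMMMM True False
```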
Here is a real hacky way to do it.
ABBBA (or YUUUY, etc):
SELECT id, str FROM test WHERE
substring(str,1,1) = substring(str,5,1) AND
substring(str,2,1) = substring(str,3,1) AND
substring(str,3,1) = substring(str,4,1);
AAABB (or DDDMM, etc):
SELECT id, str FROM test WHERE
substring(str,1,1) = substring(str,2,1) AND
substring(str,2,1) = substring(str,3,1) AND
substring(str,4,1) = substring(str,5,1);
AAAAB (or MMMMT, etc):
SELECT id, str FROM test WHERE
substring(str,1,1) = substring(str,2,1) AND
substring(str,2,1) = substring(str,3,1) AND
substring(str,3,1) = substring(str,4,1) AND
substring(str,4,1) != substring(str,5,1);
You get the picture...
It would be similar if you separated the data into different columns. Instead of comparing substrings you would just compare the columns.
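The hand-written queries above follow a mechanical rule: for every pair of positions, the pattern says whether the characters must be equal or different. The Python sketch below generates such a WHERE condition from any pattern. It uses the `test` table and `str` column from the question, and the generator itself is hypothetical. Unlike the hand-written versions, it emits every pair, including the `!=` checks, so MMMMM is not accepted for ABBBA. The case-insensitive collation caveat still applies.

```python
from itertools import combinations


def pattern_where(pattern: str, column: str = "str") -> str:
    """Build a WHERE condition requiring `column` to have the same shape as `pattern`."""
    conditions = [f"CHAR_LENGTH({column}) = {len(pattern)}"]
    for i, j in combinations(range(len(pattern)), 2):
        op = "=" if pattern[i] == pattern[j] else "!="
        # MySQL SUBSTRING positions are 1-based
        conditions.append(
            f"SUBSTRING({column}, {i + 1}, 1) {op} SUBSTRING({column}, {j + 1}, 1)"
        )
    return " AND ".join(conditions)


print("SELECT id, str FROM test WHERE " + pattern_where("ABBBA") + ";")
```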

Extract only characters from a string

I have column values like:
lut00006300.txt
sand2a0000300.raw
I need to extract only the character data from the column values above. I tried the query below and was able to get the first three characters.
select filesize,
substring(Filename FROM 1 FOR 3) AS Instrument from Collection;
Is there any approach to extract only the characters from the column value, leaving out the extension?
The results should be:
LUT
SAND2A
I think the query below will help you.
select filesize, Filename from Collection where Filename REGEXP '[[:alpha:]]';
Reference: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
SELECT
filesize,
UPPER(SUBSTRING_INDEX(SUBSTRING_INDEX(Filename, '.', 1), '0', 1)) AS Instrument
FROM Collection;
This is a dirty solution (it cuts the name at the first '0'), needed because you want to keep the 2 in SAND2A.
Read more about the functions here.
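The nested SUBSTRING_INDEX() calls can be mirrored in Python to check what they return. `SUBSTRING_INDEX(s, delim, 1)` keeps everything before the first `delim`, or the whole string if there is none, which is what `str.split(delim, 1)[0]` does. This is a sketch only. `instrument_name` is a made-up name.

```python
def instrument_name(filename: str) -> str:
    """Python equivalent of UPPER(SUBSTRING_INDEX(SUBSTRING_INDEX(Filename, '.', 1), '0', 1))."""
    without_extension = filename.split(".", 1)[0]     # SUBSTRING_INDEX(Filename, '.', 1)
    before_zero = without_extension.split("0", 1)[0]  # SUBSTRING_INDEX(..., '0', 1)
    return before_zero.upper()


for name in ["lut00006300.txt", "sand2a0000300.raw"]:
    print(name, "->", instrument_name(name))
# lut00006300.txt -> LUT
# sand2a0000300.raw -> SAND2A
```

As the answer notes, this relies on the letters being followed by a '0'. A name such as abc123.txt would come back whole, as ABC123.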

Finding the number of occurrences of a specific string in MySQL

Consider the string "55,33,255,66,55"
I am looking for ways to count the number of occurrences of a specific value ("55" in this case) in this string using a MySQL select query.
Currently I am using the logic below to count:
select CAST((LENGTH("55,33,255,66,55") - LENGTH(REPLACE("55,33,255,66,55", "55", ""))) / LENGTH("55") AS UNSIGNED)
But the issue with this one is that it counts every occurrence of 55, including the one inside 255, so the result is 3,
but the desired output is 2.
Is there any way I can make this work correctly? Please suggest.
NOTE: "55" is the input we are giving, and the value "55,33,255,66,55" comes from a database field.
Regards,
Balan
You want to match on ',55,', but there's the first and last position to worry about. You can use the trick of adding commas to the front and back of the input to get around that:
select LENGTH('55,33,255,66,55') + 2 -
LENGTH(REPLACE(CONCAT(',', '55,33,255,66,55', ','), ',55,', 'xxx'))
Returns 2
I've used CONCAT to pre- and post-pend the commas (rather than adding a literal into the text) because I assume you'll be using this on a column not a literal.
Note also these improvements:
Removal of the cast - it is already numeric
By replacing with a string one character shorter (i.e. ',55,' of length 4 with 'xxx' of length 3), the result doesn't need to be divided; it's already the correct result
2 is added to the length because of the two commas added front and back (no need to use CONCAT to calculate the pre-replace length)
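Here is the same comma-wrapping idea sketched in Python, next to the original substring count. Python's `str.count` counts non-overlapping matches, just as REPLACE() replaces them, so it reproduces one limitation of the trick: two adjacent matches such as 55,55 share a comma, and only the first is counted. Splitting on commas avoids that. The function names are hypothetical.

```python
def naive_count(csv: str, item: str) -> int:
    # What the question's LENGTH/REPLACE formula computes: every substring match, including inside '255'
    return csv.count(item)


def wrapped_count(csv: str, item: str) -> int:
    # The answer's trick: search for ',55,' inside ',' + csv + ','
    return f",{csv},".count(f",{item},")


def split_count(csv: str, item: str) -> int:
    # Exact token count; also handles adjacent duplicates like '55,55'
    return csv.split(",").count(item)


data = "55,33,255,66,55"
print(naive_count(data, "55"), wrapped_count(data, "55"), split_count(data, "55"))  # 3 2 2
print(wrapped_count("55,55", "55"), split_count("55,55", "55"))  # 1 2
```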
Try this:
select CAST((LENGTH("55,33,255,66,55") + 2 - LENGTH(REPLACE(concat(",","55,33,255,66,55",","), ",55,", ",,"))) / LENGTH("55") AS UNSIGNED)
I would do a subselect; in it, I would replace every 255 with some other unique characters, and then count the new characters and the remaining 55s.
If(row = '255') then '1337'
for example.