MySQL statement
mysql> select * from field where dflt=' '
appears to match empty values, and behaves differently from the statement
mysql> select * from field where concat('_',dflt,'_') = '_ _';
I couldn't find a description of this behavior in the MySQL reference. How can I make MySQL interpret
the input literally?
EDITED: This indeed won't match NULL values, but it does match empty values.
As mentioned in The CHAR and VARCHAR Types:
All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces.
The definition of the LIKE operator states:
In particular, trailing spaces are significant, which is not true for CHAR or VARCHAR comparisons performed with the = operator:
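A quick way to see the difference is to run both operators on the same pair of literals. This is only a minimal sketch: the _latin1 introducers are my addition, so that the result does not depend on the connection collation (latin1's default collation, latin1_swedish_ci, is PAD SPACE).

```sql
-- "=" pads the shorter operand with spaces; LIKE compares character by character.
SELECT _latin1'a' = _latin1'a '    AS eq_result,    -- 1
       _latin1'a' LIKE _latin1'a ' AS like_result;  -- 0
```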
As mentioned in this answer:
This behavior is specified in SQL-92 and SQL:2008. For the purposes of comparison, the shorter string is padded to the length of the longer string.
From the draft (8.2 <comparison predicate>):
If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD characteristic, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.
In addition to the other excellent solutions:
select binary 'a' = 'a '
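To put that one-liner next to the plain comparison, here is a small sketch. The _latin1 introducers are an assumption on my part, added only to pin a PAD SPACE collation regardless of connection settings. As far as I know, recent 8.0 releases deprecate the BINARY operator in favor of CAST(... AS BINARY), so both spellings are shown.

```sql
SELECT _latin1'a' = _latin1'a '                 AS padded_compare,  -- 1: trailing space ignored
       BINARY _latin1'a' = _latin1'a '          AS binary_compare,  -- 0: compared byte by byte
       CAST(_latin1'a' AS BINARY) = _latin1'a ' AS cast_compare;    -- 0: same idea via CAST
```

Keep in mind that a binary comparison is also case-sensitive, so it changes more than the handling of trailing spaces.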
I couldn't find any documentation, but it is widely known that trailing spaces are ignored when doing a text comparison.
To force a literal match, try this:
select *
from field
where dflt = ' '
and length(dflt) = 1; -- length does not ignore trailing spaces
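Here is a self-contained reproduction of the question, as a sketch. The table name field_demo and the explicit latin1 character set are hypothetical, chosen only so that the column has a PAD SPACE collation on any server version.

```sql
CREATE TABLE field_demo (
  dflt VARCHAR(10) CHARACTER SET latin1   -- latin1_swedish_ci, a PAD SPACE collation
);
INSERT INTO field_demo (dflt) VALUES (''), (' '), ('  ');

-- Padded comparison: the empty string and both space-only strings match.
SELECT COUNT(*) FROM field_demo WHERE dflt = ' ';                           -- 3

-- Adding a length check keeps only the single-space row.
SELECT COUNT(*) FROM field_demo WHERE dflt = ' ' AND CHAR_LENGTH(dflt) = 1; -- 1

-- LIKE treats trailing spaces as significant, so it is literal here too.
SELECT COUNT(*) FROM field_demo WHERE dflt LIKE ' ';                        -- 1
```

CHAR_LENGTH counts characters while LENGTH counts bytes; for single-byte data like this they agree.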
I have a query where changing from LIKE to = results in a match. My understanding was that = and LIKE behave the same unless wildcards are present. Neither _ nor % is in my string. The characters present in my string are:
acdeijknoprtuy4#-.
My query was:
SELECT * FROM UserEmails WHERE email like ?
which returned 0 results, I changed it to:
SELECT * FROM UserEmails WHERE email = ?
and I got the one expected result. This is running on the Percona version of MySQL 5.6.41.
Your string, or the matching database value, most likely has trailing spaces.
See documentation for MySQL 5.6:
All MySQL collations are of type PAD SPACE. This means that all CHAR,
VARCHAR, and TEXT values are compared without regard to any trailing
spaces. “Comparison” in this context does not include the LIKE
pattern-matching operator, for which trailing spaces are significant.
In MySQL 8.0:
Most MySQL collations have a pad attribute of PAD SPACE. The
exceptions are Unicode collations based on UCA 9.0.0 and higher, which
have a pad attribute of NO PAD.
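On MySQL 8.0 you can check which rule a collation follows and see the effect directly. This is a sketch that assumes MySQL 8.0 or later; the PAD_ATTRIBUTE column and the utf8mb4_0900_ai_ci collation do not exist in 5.6.

```sql
-- List the pad attribute of every utf8mb4 collation (MySQL 8.0+).
SELECT COLLATION_NAME, PAD_ATTRIBUTE
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4'
ORDER BY COLLATION_NAME;

-- Same literals, two collations: only the PAD SPACE one ignores the trailing space.
SELECT _utf8mb4'a' COLLATE utf8mb4_general_ci = _utf8mb4'a ' AS pad_space,  -- 1
       _utf8mb4'a' COLLATE utf8mb4_0900_ai_ci = _utf8mb4'a ' AS no_pad;     -- 0
```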
I have a column containing values as strings. I need to keep only those that contain one of the following substrings: |MB1, |MB2, |MB3, |MB4, |MB5 and |MB6.
My starting point is:
select * from table
where column like '%|MB_%';
However, this would keep any other row with values such as |MBa or others. How do I get rid of them?
P.S. I am using MySQL
You can use MySQL's regular expression pattern matching:
WHERE `column` REGEXP '\\|MB[1-6]'
The pattern '\\|MB[1-6]' can be analysed as follows:
\\ is the string-encoding of a literal backslash, which is the regular expression escape character, so that
| is given no special meaning (if not escaped by backslash it would signify alternation, which would lead to an invalid pattern in this case)
MB are literal characters
[1-6] represents a single character within the range 1 through to 6
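The pattern can be tried on a few literals before running it against the table. This is a sketch that assumes the NO_BACKSLASH_ESCAPES SQL mode is off (the default); the last column shows a bracket expression as an alternative that avoids backslashes altogether.

```sql
SELECT '|MB3'   REGEXP '\\|MB[1-6]' AS mb3,           -- 1
       '|MB7'   REGEXP '\\|MB[1-6]' AS mb7,           -- 0: 7 is outside the range
       '|MBa'   REGEXP '\\|MB[1-6]' AS mba,           -- 0: a letter is not a digit
       'x|MB1y' REGEXP '[|]MB[1-6]' AS bracket_form;  -- 1: [|] also matches a literal pipe
```

Because the pattern is not anchored, a value such as '|MB12' matches as well, since it contains '|MB1'. If that matters, the pattern would need to say what may follow the digit.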
I've created the following test table:
CREATE TABLE t (
a VARCHAR(32) BINARY,
b VARBINARY(32)
);
INSERT INTO t (a, b) VALUES ( 'test    ', 'test    ');
INSERT INTO t (a, b) VALUES ( 'test    \0', 'test    \0');
But this query indicated no difference between the two types:
SELECT a, LENGTH(a), HEX(a), b, LENGTH(b), HEX(b) FROM t;
a          LENGTH(a)  HEX(a)              b          LENGTH(b)  HEX(b)
---------  ---------  ------------------  ---------  ---------  ------------------
test       8          7465737420202020    test       8          7465737420202020
test       9          746573742020202000  test       9          746573742020202000
Here are the differences I was able to find by reading the documentation:
VARCHAR BINARY
The BINARY attribute causes the binary collation for the column character set to be used, and the column itself contains nonbinary character strings rather than binary byte strings.
When BINARY values are stored, they are right-padded with the pad value to the specified length.
You should consider the preceding padding and stripping characteristics carefully if you plan to use the BINARY data type for storing binary data and you require that the value retrieved be exactly the same as the value stored.
VARBINARY
If strict SQL mode is not enabled and you try to assign a value that exceeds the column's maximum length, the value is truncated to fit and a warning is generated.
There is no padding on insert, and no bytes are stripped on select. All bytes are significant in comparisons.
VARBINARY is preferable when the value retrieved must be the same as the value specified for storage, with no padding.
As the MySQL manual page on String Data Type Syntax explains, VARBINARY is equivalent to VARCHAR CHARACTER SET binary, while VARCHAR BINARY is equivalent to VARCHAR CHARACTER SET latin1 COLLATE latin1_bin (or some other non-binary character set with the corresponding binary collation; it depends on table settings):
Specifying the CHARACTER SET binary attribute for a character string data type causes the column to be created as the corresponding binary string data type: CHAR becomes BINARY, VARCHAR becomes VARBINARY, and TEXT becomes BLOB.
The BINARY attribute is a nonstandard MySQL extension that is shorthand for specifying the binary (_bin) collation of the column character set (or of the table default character set if no column character set is specified).
So, VARBINARY stores bytes; VARCHAR BINARY stores character codes but sorts them like bytes (almost - see below).
What this means in practice is explained on the manual page The binary Collation Compared to _bin Collations:
VARBINARY sorts by comparing byte by byte; VARCHAR BINARY compares the byte groups that correspond to characters (not much of a difference for most encodings)
VARCHAR BINARY performs a character set conversion when assigning value from another column with a different encoding, or when the value is inserted/updated by a client with a different encoding; VARBINARY just takes the value as a raw byte string.
Case conversion in SQL (i.e. the LOWER / UPPER functions) has no effect on VARBINARY (bytes have no case).
Trailing spaces will usually be ignored in VARCHAR BINARY comparisons (that is, 'x ' = 'x' will be true).
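The last two points can be checked with a small table. This is a sketch; the table name bin_demo is hypothetical, and the VARCHAR BINARY column is spelled out with latin1 / latin1_bin so that the result does not depend on the table's default character set.

```sql
CREATE TABLE bin_demo (
  a VARCHAR(32) CHARACTER SET latin1 COLLATE latin1_bin,  -- what VARCHAR(32) BINARY means under latin1
  b VARBINARY(32)
);
INSERT INTO bin_demo (a, b) VALUES ('x ', 'x ');

SELECT a = 'x'  AS a_eq,     -- 1: latin1_bin is a PAD SPACE collation
       b = 'x'  AS b_eq,     -- 0: every byte is significant
       UPPER(a) AS a_upper,  -- 'X ': characters have case
       UPPER(b) AS b_upper   -- bytes returned unchanged
FROM bin_demo;
```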
If I enter two strings containing only white space, I get this error message:
ERROR 1062: Duplicate entry ' ' for key 'PRIMARY'
How can I turn off this "auto-trim"?
I'm using this charset: utf8 (collation utf8_bin), and this datatype: VARCHAR.
According to the SQL 92 documentation, when two strings are compared they are first made equal in length by padding the shortest string with spaces.
Search for 8.2 <comparison predicate> in the document.
If the length in characters of X is not equal to the length
in characters of Y, then the shorter string is effectively
replaced, for the purposes of comparison, with a copy of
itself that has been extended to the length of the longer
string by concatenation on the right of one or more pad
characters, where the pad character is chosen based on CS. If
CS has the NO PAD attribute, then the pad character is an
implementation-dependent character different from any
character in the character set of X and Y that collates less
than any string under CS. Otherwise, the pad character is a
<space>.
In other words, it's not about how the value is stored (the spaces you entered are kept); it's about the comparison MySQL performs when checking for a duplicate primary key. Two strings that differ only in their number of trailing spaces compare as equal, so they cannot both be primary key values.
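The effect, and one possible way around it, can be sketched as follows. The table names are hypothetical, and I use utf8mb4_bin here rather than the question's utf8_bin; both are PAD SPACE collations. Switching the key to VARBINARY is only one option, and it assumes byte-wise comparison is acceptable for the key.

```sql
-- PAD SPACE collation: keys that differ only in trailing spaces collide.
CREATE TABLE pk_pad (
  k VARCHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin PRIMARY KEY
);
INSERT INTO pk_pad VALUES (' ');
INSERT INTO pk_pad VALUES ('  ');    -- fails with ERROR 1062 (duplicate entry)

-- Binary string key: every byte is significant, so both rows are accepted.
CREATE TABLE pk_bytes (
  k VARBINARY(10) PRIMARY KEY
);
INSERT INTO pk_bytes VALUES (' ');
INSERT INTO pk_bytes VALUES ('  ');  -- succeeds: the two keys differ in length
```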
I have a database table with a primary key called PremiseID.
Its MySQL column definition is CHAR(10).
The data that goes into the column is always 10 characters, which is either a 9-digit number followed by a space, like '113091000 ', or a 9-digit number followed by a letter, like '113091000A'.
I've tried writing one of these values into a table t1 in a test MySQL database. It has three columns:
mainid integer
parentid integer
premiseid char(10)
If I insert a row that has the following values: 1,1,'113091000 ' and try to read the row back, the '113091000 ' value is truncated, so it reads '113091000'; that is, the space is removed. If I insert a number like '113091000A', that value is retained.
How can I get the CHAR(10) field retain the space character?
I have a programmatic way around this problem. It would be to take the len('113091000'), realize it's nine characters, and then realize that a length of 9 implies there is a space suffix for that number.
To quote from the MySQL reference:
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
So there's no way around it. If you're using MySQL 5.0.3 or greater, then using VARCHAR is probably the best way to go (the overhead is only 1 extra byte):
VARCHAR values are not padded when they are stored. Handling of trailing spaces is version-dependent. As of MySQL 5.0.3, trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL. Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values.
If you're using MySQL < 5.0.3, then I think you just have to check returned lengths, or use a character other than a space.
Probably the most portable solution would be to just use CHAR and check the returned length.
Q: How can I get the CHAR(10) field retain the space character?
Actually, that space is retained and stored. It's the retrieval of the value that's removing the spaces. (The removal of the trailing spaces on returned values is a documented "feature".)
One option (as a workaround) is to modify your SQL query to append trailing spaces to the returned value, e.g.
SELECT RPAD(premiseid,10,' ') AS premiseid FROM t1
That will return your value as a character string with a length of 10 characters, padded with spaces if the value is shorter than 10 characters, or truncated to 10 characters if it's longer.
A standard CHAR(10) column will always have trailing spaces to pad out the string to the required length of 10 characters. As such, any deliberately trailing spaces will be blended in and, typically, stripped by your database adapter.
If possible, convert to a VARCHAR(10) column if you want to preserve the trailing spaces. You can do this with the ALTER TABLE statement.
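A minimal sketch of that conversion, using the t1 table from the question. I have not verified what happens to the trailing spaces of existing rows during the conversion, so the UPDATE below is a precaution that re-pads any 9-character value; try it on a copy of the table first.

```sql
-- Change the column type. MODIFY redefines the whole column, so restate
-- any attributes it had (NOT NULL, DEFAULT, ...) if they apply.
ALTER TABLE t1 MODIFY premiseid VARCHAR(10);

-- Re-append the space to values that came out 9 characters long.
UPDATE t1
SET premiseid = RPAD(premiseid, 10, ' ')
WHERE CHAR_LENGTH(premiseid) = 9;

-- Check: every row should now report a length of 10.
SELECT premiseid, CHAR_LENGTH(premiseid) AS len FROM t1;
```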
Though Gordon's answer may still be right in itself, later versions than the one mentioned there do offer a solution.
In your code run SET sql_mode = 'PAD_CHAR_TO_FULL_LENGTH';
With this session setting, CHAR(10) columns are returned padded to their full length, whereas VARCHAR columns are not padded unless the trailing spaces were stored in the first place. If you don't need the spaces, you can always rtrim() them.
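Note that assigning sql_mode a single value replaces whatever modes were already active in the session. A sketch that appends the mode instead, using the t1 table from the question (as far as I know, this mode is deprecated in later 8.0 releases, so check your version's manual):

```sql
-- Append PAD_CHAR_TO_FULL_LENGTH to the session's current modes.
-- NULLIF avoids a leading comma when sql_mode is currently empty.
SET SESSION sql_mode = CONCAT_WS(',',
    NULLIF(@@SESSION.sql_mode, ''),
    'PAD_CHAR_TO_FULL_LENGTH');

-- CHAR(10) values now come back padded, so len is 10 for every row.
SELECT premiseid, CHAR_LENGTH(premiseid) AS len FROM t1;
```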