MySQL compares accented vs unaccented characters as the same?

This does not make sense to me. Can anyone explain it? I think the column values should be different, so
select * from a1 where f1 = f2;
should find no rows. But...
mysql> create table a1 (f1 varchar(63), f2 varchar(63));
Query OK, 0 rows affected (0.00 sec)
mysql> show create table a1 \G
*************************** 1. row ***************************
Table: a1
Create Table: CREATE TABLE `a1` (
`f1` varchar(63) DEFAULT NULL,
`f2` varchar(63) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
mysql>
mysql> insert into a1 values ('EFBBBFD187D0B5D0BBD0BED0B2D0B5D0BA', 'EFBBBFD187D0B5D0BBD0BED0B2D0B5CC81D0BA');
Query OK, 1 row affected (0.02 sec)
mysql> update a1 set f1 = unhex(f1);
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update a1 set f2 = unhex(f2);
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select * from a1;
+-------------------+---------------------+
| f1 | f2 |
+-------------------+---------------------+
| человек | челове́к |
+-------------------+---------------------+
1 row in set (0.00 sec)
mysql>
mysql>
mysql> select * from a1 where f1 = f2;
+-------------------+---------------------+
| f1 | f2 |
+-------------------+---------------------+
| человек | челове́к |
+-------------------+---------------------+
1 row in set (0.00 sec)
mysql> select * from a1 where hex(f1) = hex(f2);
Empty set (0.00 sec)
mysql>

The first 3 bytes, EFBBBF, are a byte order mark (BOM), which signals that the text is UTF-8-encoded.
The rest is the Cyrillic word человек; the only difference in f2 is CC81, the UTF-8 encoding of U+0301 COMBINING ACUTE ACCENT (a nonspacing mark).
Some collations, including utf8mb4_0900_ai_ci, ignore combining accents when comparing; others do not. The "ai" in the name means "accent insensitive".
I would understand this equivalence for a "latin" e. I don't know the rules for a CYRILLIC SMALL LETTER IE, which looks the same е, but is encoded differently.
You might want COLLATE utf8mb4_0900_as_ci, which is "accent sensitive and case insensitive".
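To see what the two values actually contain, here is a minimal Python sketch (Python is only my choice for illustration, it is not what MySQL runs internally). It decodes the two hex strings from the question, names the one extra code point in f2, and shows that dropping nonspacing marks makes the strings equal. The helper strip_marks is a hypothetical name, and this only approximates the effect of an accent-insensitive collation, which really compares collation weights.

```python
import unicodedata

# The two hex strings inserted in the question, decoded as UTF-8.
f1 = bytes.fromhex("EFBBBFD187D0B5D0BBD0BED0B2D0B5D0BA").decode("utf-8")
f2 = bytes.fromhex("EFBBBFD187D0B5D0BBD0BED0B2D0B5CC81D0BA").decode("utf-8")


def strip_marks(s: str) -> str:
    """Drop nonspacing marks (category Mn), a rough stand-in for 'ai'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Code points that appear in f2 but not in f1.
extra = [unicodedata.name(ch) for ch in f2 if ch not in f1]

print(f1 == f2)                            # byte-for-byte comparison, like hex(f1) = hex(f2)
print(extra)                               # the single extra code point
print(strip_marks(f1) == strip_marks(f2))  # accent-insensitive style comparison
```

The first comparison corresponds to the hex() query that returned an empty set, the last one to the f1 = f2 query that returned the row.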

Character equivalence is defined by the collation used by the columns in question. A collation defines every pair of characters as equal, less than, or greater than, and this is used for comparisons and for sorting.
Your table uses utf8mb4_0900_ai_ci as the default collation, and this applies to all the columns, since they do not define a collation to override the table's default.
It's pretty common for collations to treat accented characters as equal to their unaccented versions.
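If you want a different collation, you can declare one on the column, or apply one in the comparison itself, for example where f1 = f2 collate utf8mb4_0900_as_ci.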

Related

What is meant by char(40)?

I have a mysql table which has a data structure as follows,
create table data(
....
name char(40) NULL,
...
)
But I could insert names with more than 40 characters into the name field. Can someone explain the actual meaning of char(40)?
You cannot insert a string of more than 40 characters in a column defined with the type CHAR(40).
If you run MySQL in strict mode, you will get an error if you try to insert a longer string.
mysql> create table mytable ( c char(40) );
Query OK, 0 rows affected (0.01 sec)
mysql> insert into mytable (c) values ('Now is the time for all good men to come to the aid of their country.');
ERROR 1406 (22001): Data too long for column 'c' at row 1
If you run MySQL in non-strict mode, the insert will succeed, but only the first 40 characters of your string are stored in the column. The characters beyond 40 are lost, and you get only a warning, not an error.
mysql> set sql_mode='';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into mytable (c) values ('Now is the time for all good men to come to the aid of their country.');
Query OK, 1 row affected, 1 warning (0.01 sec)
mysql> show warnings;
+---------+------+----------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'c' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)
mysql> select c from mytable;
+------------------------------------------+
| c |
+------------------------------------------+
| Now is the time for all good men to come |
+------------------------------------------+
1 row in set (0.00 sec)
I recommend operating MySQL in strict mode (strict mode is the default since MySQL 5.7). I would prefer to get an error instead of losing data.

MySQL decimal field 'Data truncated for column x at row 1' issue

I have a MySQL table with a decimal(16,2) field. It seems that adding the string form of another decimal(16,2) value to it can cause the "Data truncated for column x at row 1" issue, which raises an exception in my Django project.
I'm aware that multiplying or dividing that field can cause this issue because the result may not fit the decimal(16,2) definition, but is the same true of addition and subtraction?
My MySQL server version is 5.5.37-0ubuntu0.14.04.1. You can reproduce the issue as follows:
mysql> drop database test;
Query OK, 1 row affected (0.10 sec)
mysql> create database test;
Query OK, 1 row affected (0.00 sec)
mysql> use test;
Database changed
mysql> create table t(price decimal(16,2));
Query OK, 0 rows affected (0.16 sec)
mysql> insert into t values('2004.74');
Query OK, 1 row affected (0.03 sec)
mysql> select * from t;
+---------+
| price |
+---------+
| 2004.74 |
+---------+
1 row in set (0.00 sec)
mysql> update t set price = price + '0.09';
Query OK, 1 row affected (0.05 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> update t set price = price + '0.09';
Query OK, 1 row affected, 1 warning (0.03 sec)
Rows matched: 1 Changed: 1 Warnings: 1
mysql> show warnings;
+-------+------+--------------------------------------------+
| Level | Code | Message |
+-------+------+--------------------------------------------+
| Note | 1265 | Data truncated for column 'price' at row 1 |
+-------+------+--------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from t;
+---------+
| price |
+---------+
| 2004.92 |
+---------+
1 row in set (0.00 sec)
There are two problems:
1. You are not adding decimal values; you are adding a string/varchar, which MySQL converts to a double. For example, the following does not give a warning, even when executed several times: update t set price = price + 0.09;
2. This statement, on the other hand, gives the expected warning (at Note level): update t set price = price + 0.091; You can change it to update t set price = price + cast(0.091 as decimal(16,2)); With cast you can of course use string values too: update t set price = price + cast('0.09' as decimal(16,2));
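The difference between double and decimal arithmetic can be checked outside MySQL. This is a small Python sketch, assuming (as the answer says) that the string '0.09' ends up as an IEEE 754 double; Python floats are the same doubles, and Decimal plays the role of a DECIMAL value. It does not reproduce MySQL's own conversion code, only the arithmetic.

```python
from decimal import Decimal

# Double arithmetic: what happens if '0.09' is treated as a double.
first = 2004.74 + 0.09
second = first + 0.09

first_is_clean = first == 2004.83    # True: the sum lands on the double nearest 2004.83
second_is_clean = second == 2004.92  # False: the sum is one step below the double nearest 2004.92

# Exact decimal arithmetic, comparable to adding cast('0.09' as decimal(16,2)).
exact = Decimal("2004.74") + Decimal("0.09") + Decimal("0.09")

print(first_is_clean, second_is_clean)
print(Decimal(second))  # full expansion of the double, with digits beyond 2 places
print(exact)            # 2004.92
```

That matches the session in the question: the first update is silent, while the second has fractional digits to discard when the result is stored back into decimal(16,2), hence the note.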
In my case the problem occurred when I tried to insert a decimal with 3 digits after the dot, like 0.xxx, into a column defined as DECIMAL(10,2).
I either changed the column to DECIMAL(10,3) or used PHP to enter values like 0.xx into the DECIMAL(10,2) column.

Confused about comparing date fields with literals

I am puzzled by the behaviour below. Why does the first SELECT statement return 1 while the second statement returns 0? I expect them both to return 1 as the date is greater than or equal to the literal.
Why does collation affect date comparison? When comparing dates against literals, is it wrong to represent the date (or date time) as a string? If so how should I be doing date vs literal comparisons?
mysql> CREATE DATABASE test;
Query OK, 1 row affected (0.00 sec)
mysql> USE test;
mysql> SET NAMES utf8 COLLATE utf8_general_ci;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE foo (
bar date NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.15 sec)
mysql> INSERT INTO foo (bar) VALUES ('2013-01-01');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT COUNT(*) FROM foo WHERE bar >= '2013-01-01 00:00:00';
+----------+
| COUNT(*) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
mysql> SET NAMES utf8 COLLATE utf8_unicode_ci;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT COUNT(*) FROM foo WHERE bar >= '2013-01-01 00:00:00';
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
Have you tried
SELECT COUNT(*) FROM foo WHERE bar >= _utf8'2013-01-01 00:00:00'
Explanation's here
You probably assume that the db is doing an IMPLICIT type conversion of your literal string into a date and comparing the dates. It looks like the db is instead doing an IMPLICIT type conversion of the date into a string and comparing the strings. The collation affects this conversion and so affects your result.
Try:
SET NAMES utf8 COLLATE utf8_general_ci;
SELECT * FROM foo;
SET NAMES utf8 COLLATE utf8_unicode_ci;
SELECT * FROM foo;
The two queries should give different results that explain the behaviour.
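In any case cha's suggestion should work, although strictly speaking the _utf8 introducer does not convert the literal to a date: it gives the literal the utf8 character set with that character set's default collation (utf8_general_ci), regardless of the connection collation. That is the same setting under which the first query returned 1.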
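The string-versus-date explanation above can be sketched in Python. This is only an illustration under stated assumptions: Python compares strings by code point, not by a MySQL collation, so it shows why a string comparison gives 0 here, not why one collation triggers it and the other does not.

```python
from datetime import date, datetime, time

column_value = "2013-01-01"        # the DATE column rendered as a string
literal = "2013-01-01 00:00:00"    # the literal from the WHERE clause

# Compared as strings: the shorter string is a prefix of the longer one,
# so it sorts before it and ">=" is false.
as_strings = column_value >= literal

# Compared as temporal values: midnight on 2013-01-01 equals the literal.
column_dt = datetime.combine(date.fromisoformat(column_value), time.min)
literal_dt = datetime.fromisoformat(literal)
as_dates = column_dt >= literal_dt

print(as_strings)  # False, like COUNT(*) = 0
print(as_dates)    # True, like COUNT(*) = 1
```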

What happens to content in rows of a column defined as varchar when its length is illegally reduced?

Certainly a noobish question, but I've got to ask: :-) Assume a column of type varchar with length 255, where the longest string stored in any row of this column has length 200. What happens if I alter the column's length to less than 200? Would the strings all get "cut"?
By default, it will allow you to alter the column, it will truncate strings longer than the new length, and it will generate a warning.
mysql> create table t (v varchar(20));
Query OK, 0 rows affected (0.06 sec)
mysql> insert into t values ('12345678901234567890');
Query OK, 1 row affected (0.00 sec)
mysql> alter table t modify column v varchar(10);
Query OK, 1 row affected, 1 warning (0.04 sec)
Records: 1 Duplicates: 0 Warnings: 1
mysql> show warnings;
+---------+------+----------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'v' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)
mysql> select * from t;
+------------+
| v |
+------------+
| 1234567890 |
+------------+
1 row in set (0.00 sec)
If you have the SQL mode STRICT_ALL_TABLES or STRICT_TRANS_TABLES set, the warning becomes an error and the ALTER will fail.
mysql> alter table t modify column v varchar(10);
ERROR 1265 (01000): Data truncated for column 'v' at row 1

MySql FLOAT datatype and problems with more than 7 digit scale

We are using MySql 5.0 on Ubuntu 9.04. The full version is: 5.0.75-0ubuntu10
I created a test database and a test table in it. I see the following output from an insert statement:
mysql> CREATE TABLE test (floaty FLOAT(8,2)) engine=InnoDb;
Query OK, 0 rows affected (0.02 sec)
mysql> insert into test value(858147.11);
Query OK, 1 row affected (0.01 sec)
mysql> SELECT * FROM test;
+-----------+
| floaty |
+-----------+
| 858147.12 |
+-----------+
1 row in set (0.00 sec)
There seems to be a problem with the scale/precision set up in mySql...or did I miss anything?
UPDATE:
Found a boundary for one of the numbers we were inserting, here is the code:
mysql> CREATE TABLE test (floaty FLOAT(8,2)) engine=InnoDb;
Query OK, 0 rows affected (0.03 sec)
mysql> insert into test value(131071.01);
Query OK, 1 row affected (0.01 sec)
mysql> insert into test value(131072.01);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM test;
+-----------+
| floaty |
+-----------+
| 131071.01 |
| 131072.02 |
+-----------+
2 rows in set (0.00 sec)
mysql>
Face Palm!!!!
Floats are 32-bit numbers stored as a sign, a mantissa and an exponent. I am not 100% sure how MySql splits the storage, but taking Java (IEEE 754 single precision) as an example, there is 1 sign bit, 8 bits for a binary exponent and 23 stored mantissa bits, which gives 24 bits of precision with the implicit leading bit. The exponent is a power of 2, not 10, so the largest finite value is about 3.4 * 10^38, but within each power of two the mantissa can only distinguish 2^23 = 8388608 values. This means only about 7 significant decimal digits, and my FLOAT definition used 8.
We are going to switch all of these 8,2 to DOUBLE from FLOAT.
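The boundary at 131072 can be reproduced without MySQL. Here is a minimal Python sketch, assuming FLOAT is stored as an IEEE 754 single-precision value; to_float32 and float32_spacing are hypothetical helper names. It round-trips each test value through 4 bytes and prints the gap between neighbouring floats at that magnitude.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round x to the nearest 32-bit float and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between adjacent 32-bit floats around x (normal range only)."""
    _, exponent = math.frexp(x)  # x = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 24)


for value in (858147.11, 131071.01, 131072.01):
    print(value, "->", to_float32(value), "spacing", float32_spacing(value))
```

At 131072 = 2^17 the spacing doubles from 0.0078125 to 0.015625, so 131072.01 has to be stored as 131072.015625, which displays as 131072.02. Likewise 858147.11 is stored as 858147.125.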
MySQL docs mention "MySQL performs rounding when storing values" and I suspect this is the issue here. I duplicated your issue but changed the storage type to be DOUBLE:
CREATE TABLE test (val DOUBLE);
and the retrieved value matched the test value you provided.
My suggestion, for what it's worth, is use DOUBLE or maybe DECIMAL. I tried the same original test with:
CREATE TABLE test (val DECIMAL(8,2));
and it retrieved the value I gave it: 858147.11.