Why is MySQL treating é the same as e?

I'm storing Unicode strings in a MySQL database with a Django web application. I can store Unicode data fine, but when querying, I find that é and e are treated as if they were the same character:
In [1]: User.objects.filter(last_name='Jildén')
Out[1]: [<User: Anders Jildén>]
In [2]: User.objects.filter(last_name='Jilden')
Out[2]: [<User: Anders Jildén>]
This is also the case when using the MySQL shell directly:
mysql> select last_name from auth_user where last_name = 'Jildén';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.00 sec)
mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.01 sec)
Here are the database charset settings:
mysql> SHOW variables LIKE '%character_set%';
+--------------------------+------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/Cellar/mysql/5.1.54/share/mysql/charsets/ |
+--------------------------+------------------------------------------------------+
Here's the table schema:
CREATE TABLE `auth_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(30) CHARACTER SET utf8 NOT NULL,
`first_name` varchar(30) CHARACTER SET utf8 NOT NULL,
`last_name` varchar(30) CHARACTER SET utf8 NOT NULL,
`email` varchar(200) CHARACTER SET utf8 NOT NULL,
`password` varchar(128) CHARACTER SET utf8 NOT NULL,
`is_staff` tinyint(1) NOT NULL,
`is_active` tinyint(1) NOT NULL,
`is_superuser` tinyint(1) NOT NULL,
`last_login` datetime NOT NULL,
`date_joined` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=7952 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
and here are the options I'm passing via Django's DATABASES setting:
DATABASES = {
'default': {
# ...
'OPTIONS': {
'charset': 'utf8',
'init_command': 'SET storage_engine=INNODB;',
},
},
}
Note that I have tried setting the table collation to utf8_bin, with no effect:
mysql> alter table auth_user collate utf8_bin;
mysql> select last_name from auth_user where last_name = 'Jilden';
+-----------+
| last_name |
+-----------+
| Jildén |
+-----------+
1 row in set (0.00 sec)
How can I get MySQL to treat these as different characters?

You were nearly there when you changed the table collation, but not quite. In MySQL, each column in a table has its own character set and collation. The table has its own character set and collation, but this does not override the column collations; it only determines what the collation will be for new columns that are added for which you don't specify the collation. So you haven't changed the collation of the column that you're interested in.
ALTER TABLE tablename MODIFY columnname
varchar(???) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
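For the schema in the question, the statements could be generated along these lines. This is only a sketch: the helper name build_modify_statement is made up for illustration, and the column names, types, and NOT NULL are copied from the question's CREATE TABLE. Review the generated SQL before running it.

```python
# Hypothetical helper: builds one ALTER TABLE ... MODIFY statement per column.
# Column names and types come from the auth_user schema in the question.
TEXT_COLUMNS = {
    "username": "varchar(30)",
    "first_name": "varchar(30)",
    "last_name": "varchar(30)",
    "email": "varchar(200)",
    "password": "varchar(128)",
}


def build_modify_statement(table, column, column_type, collation="utf8_bin"):
    """Return SQL that sets the collation on a single existing column."""
    return (
        f"ALTER TABLE `{table}` MODIFY `{column}` {column_type} "
        f"CHARACTER SET utf8 COLLATE {collation} NOT NULL;"
    )


statements = [
    build_modify_statement("auth_user", name, column_type)
    for name, column_type in TEXT_COLUMNS.items()
]

for statement in statements:
    print(statement)
```

Keep in mind that utf8_bin also makes comparisons case-sensitive, so you may want to apply it only to the columns where that is acceptable.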

You need to set a collation that treats diacritics as significant. Try using utf8_bin.
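To see the difference between the two kinds of comparison, here is a rough Python analogy. It is not MySQL's actual collation algorithm; the function names are made up, and accent stripping via Unicode decomposition is only an approximation of what an accent-insensitive collation does.

```python
import unicodedata


def strip_accents(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_accent_insensitive(a, b):
    # Rough stand-in for a *_ci collation: ignore accents and case.
    return strip_accents(a).casefold() == strip_accents(b).casefold()


def equal_binary(a, b):
    # Rough stand-in for utf8_bin: compare the encoded bytes.
    return a.encode("utf-8") == b.encode("utf-8")


print(equal_accent_insensitive("Jildén", "Jilden"))  # True
print(equal_binary("Jildén", "Jilden"))              # False
```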

It is important to know the character set and collation of the table and of the column you are querying.
The relevant documentation is here:
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
The column you are querying probably has the utf8_general_ci collation.
Note that switching to utf8_unicode_ci would not help here: it is also accent-insensitive, so é and e still compare as equal. To distinguish them you need a binary collation such as utf8_bin on that column.
Also remember that, as the manual says, comparisons using utf8_unicode_ci are slower than comparisons using utf8_general_ci.

Related

Why can't I store 4 byte emojis in this mysql field?

I have a database in which I require that only two fields in one table allow for 4 byte emojis to be stored. I did this (obviously with the correct table and column names):
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
I know it worked, because when I do show create table chatbots_proposalarea; it shows me this:
CREATE TABLE `chatbots_proposalarea` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(50) COLLATE utf8mb4_bin NOT NULL,
`proposal` varchar(1500) COLLATE utf8mb4_bin DEFAULT NULL,
`candidate_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `chatbots_proposalare_candidate_id_6465160e_fk_chatbots_` (`candidate_id`),
CONSTRAINT `chatbots_proposalare_candidate_id_6465160e_fk_chatbots_` FOREIGN KEY (`candidate_id`) REFERENCES `chatbots_candidate` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
The fields "name" and "proposal" do seem to have the utf8mb4_bin collation, and no error was shown when running the commands. However, when I try to save a value there such as "Seguridad 🍞", I get this error:
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x8D\x9E' for column 'name' at row 1
Any help in discovering what I am missing would be much appreciated.
NOTES
This is on a Django project, and mounted on an Ubuntu server, and the SQL version is mysql Ver 14.14 Distrib 5.7.28, for Linux (x86_64)
I don't see why that would make a difference, but the same happens when I update directly to the database by doing
UPDATE chatbots_proposalarea SET name='Seguridad 🍞' where id=1;
A solution was recommended elsewhere, but it involved triggers, which were the cause of the issue there; that isn't my case.
UPDATE
If it shows any important information, when I run show variables where Variable_name like 'character\_set\_%' or Variable_name like 'collation%';
I get the following:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8mb4_general_ci |
+--------------------------+--------------------+
I changed these values to the same that a coworker has in one of his projects, in which he also requires 4 byte emojis to be stored.
That's not the table's fault; it is the connection. Something like:
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
# ...
'OPTIONS': {
'charset': 'utf8mb4',
'use_unicode': True,
},
},
}
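As a quick check of why the connection character set matters: the bytes in the error message are exactly the UTF-8 encoding of 🍞, which needs 4 bytes, while MySQL's legacy utf8 character set stores at most 3 bytes per character. The helper name below is made up for illustration.

```python
text = "Seguridad 🍞"
emoji_bytes = "🍞".encode("utf-8")

print(emoji_bytes)       # b'\xf0\x9f\x8d\x9e'
print(len(emoji_bytes))  # 4


def fits_in_three_byte_utf8(value):
    """True if every character needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in value)


print(fits_in_three_byte_utf8("Seguridad"))  # True
print(fits_in_three_byte_utf8(text))         # False
```

If the connection is negotiated as utf8 rather than utf8mb4, a value like this can be rejected even though the column itself is utf8mb4.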

Mysql select always returns empty set

For some reason, selecting from my MySQL table always returns an empty set:
mysql> show table status like 'test_table';
+-----------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+------------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-----------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+------------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| test_table | InnoDB | 10 | Compact | 1625218 | 749 | 1218363392 | 0 | 0 | 1234173952 | NULL | 2015-07-25 12:03:40 | NULL | NULL | utf8mb4_unicode_ci | NULL | | |
+-----------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+------------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
1 row in set (0.00 sec)
mysql> select * from test_table;
Empty set (0.00 sec)
mysql>
Any advice on how I can debug this?
Here's the CREATE TABLE statement:
| test_table | CREATE TABLE `test_table` (
`export_date` bigint(20) DEFAULT NULL,
`id` int(11) NOT NULL DEFAULT '0',
`title` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`recommended_age` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`artist_name` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`seller_name` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`company_url` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`support_url` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`view_url` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`artwork_url_large` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`artwork_url_small` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`release_date` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
Turns out it was an import problem. I imported my data with a Python script and didn't have autocommit set to true (the script was written for an older version of MySQL), so the inserted rows were never committed.
It's possible that the MySQL server does not know which database you are querying and is falling back to the default schema, where the table doesn't exist.
Try the USE statement with your database name.
For example, if your database is named db1 and your table is mytable:
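The effect of a missing commit can be reproduced in a small sketch. Note the assumption: sqlite3 is used here only as a stand-in so the example runs without a MySQL server. The principle is the same with Python's MySQL drivers, which follow the DB-API convention of starting with autocommit off: rows inserted in an open transaction are not visible to other connections until commit() is called.

```python
import os
import sqlite3
import tempfile


def count_rows(connection):
    """Return the number of rows another connection can currently see."""
    return connection.execute("SELECT COUNT(*) FROM test_table").fetchall()[0][0]


# sqlite3 is only a stand-in here; the table name mirrors the question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)

    writer.execute("CREATE TABLE test_table (id INTEGER PRIMARY KEY, title TEXT)")
    writer.commit()

    # The INSERT opens a transaction that stays open until commit().
    writer.execute("INSERT INTO test_table (id, title) VALUES (1, 'example')")

    rows_before_commit = count_rows(reader)
    writer.commit()
    rows_after_commit = count_rows(reader)

    writer.close()
    reader.close()

print(rows_before_commit)  # 0
print(rows_after_commit)   # 1
```

In an import script, that means either calling commit() on the connection after the inserts or enabling autocommit on the connection, depending on what your driver offers.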
USE db1;
SELECT * FROM mytable;
# selects from db1.mytable
It is possible that your data has not been flushed properly to the table. Run the following command and then check again:
FLUSH TABLES;

Non-ASCII column name in MySQL

Can I use UTF-8 (non-ASCII) characters in column names in a database? For example:
$zapytaj = mysql_query("SELECT * FROM users WHERE `użytkownicy` = '$nazwaużytkownika' ");
This gives me an error:
Unknown column 'użytkownicy' in 'where clause'
Can someone explain why this is not working?
mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+-------------
| Variable_name | Value
+--------------------------+-------------
| character_set_client | utf8mb4
| character_set_connection | utf8mb4
| character_set_database | utf8mb4
| character_set_filesystem | binary
| character_set_results | utf8mb4
| character_set_server | latin1
| character_set_system | utf8
mysql> SELECT COLUMN_NAME, HEX(COLUMN_NAME)
FROM information_schema.columns WHERE table_name = "so31349641";
+--------------+--------------------------+
| COLUMN_NAME | HEX(COLUMN_NAME) |
+--------------+--------------------------+
| id | 6964 |
| użytkownicy | 75C5BC79746B6F776E696379 | -- Note the C5BC for ż
| hasło | 686173C5826F | -- and the C582 for ł
+--------------+--------------------------+
If I remove the backticks from użytkownicy, I see this error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '�ytkownicy = 'xxx'' at line 1
Maybe the PHP file isn't saved as UTF-8? How can I check the file's encoding in PhpStorm?
SOLUTION
If you get this error, switching from the mysql_* functions to PDO should fix the problem.
To answer your stated question, column names are utf8:
mysql> SHOW CREATE TABLE information_schema.columns\G
*************************** 1. row ***************************
Table: COLUMNS
Create Table: CREATE TEMPORARY TABLE `COLUMNS` (
`TABLE_CATALOG` varchar(512) NOT NULL DEFAULT '',
`TABLE_SCHEMA` varchar(64) NOT NULL DEFAULT '',
`TABLE_NAME` varchar(64) NOT NULL DEFAULT '',
`COLUMN_NAME` varchar(64) NOT NULL DEFAULT '', -- NOTE --
`ORDINAL_POSITION` bigint(21) unsigned NOT NULL DEFAULT '0',
`COLUMN_DEFAULT` longtext,
`IS_NULLABLE` varchar(3) NOT NULL DEFAULT '',
`DATA_TYPE` varchar(64) NOT NULL DEFAULT '',
`CHARACTER_MAXIMUM_LENGTH` bigint(21) unsigned DEFAULT NULL,
`CHARACTER_OCTET_LENGTH` bigint(21) unsigned DEFAULT NULL,
`NUMERIC_PRECISION` bigint(21) unsigned DEFAULT NULL,
`NUMERIC_SCALE` bigint(21) unsigned DEFAULT NULL,
`DATETIME_PRECISION` bigint(21) unsigned DEFAULT NULL,
`CHARACTER_SET_NAME` varchar(32) DEFAULT NULL,
`COLLATION_NAME` varchar(32) DEFAULT NULL,
`COLUMN_TYPE` longtext NOT NULL,
`COLUMN_KEY` varchar(3) NOT NULL DEFAULT '',
`EXTRA` varchar(30) NOT NULL DEFAULT '',
`PRIVILEGES` varchar(80) NOT NULL DEFAULT '',
`COLUMN_COMMENT` varchar(1024) NOT NULL DEFAULT ''
) ENGINE=MyISAM DEFAULT CHARSET=utf8
To get to the root of the implied question ("Why does the query fail"), let's see
SHOW VARIABLES LIKE 'character%';
Edit
Well, something non-obvious is going on. This works for me:
mysql> create table so31349641 (
id int(11) NOT NULL AUTO_INCREMENT,
użytkownicy varchar(24) NOT NULL,
hasło varchar(24) NOT NULL, PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
mysql> INSERT INTO so31349641 VALUES (1, 'a', 'b');
mysql> SELECT * FROM so31349641 WHERE użytkownicy = 'a';
+----+--------------+--------+
| id | użytkownicy | hasło |
+----+--------------+--------+
| 1 | a | b |
+----+--------------+--------+
This seems ordinary:
mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+-------------
| Variable_name | Value
+--------------------------+-------------
| character_set_client | utf8
| character_set_connection | utf8
| character_set_database | latin1
| character_set_filesystem | binary
| character_set_results | utf8
| character_set_server | latin1
| character_set_system | utf8
Looking in information_schema:
mysql> SELECT COLUMN_NAME, HEX(COLUMN_NAME)
FROM information_schema.columns WHERE table_name = "so31349641";
+--------------+--------------------------+
| COLUMN_NAME | HEX(COLUMN_NAME) |
+--------------+--------------------------+
| id | 6964 |
| użytkownicy | 75C5BC79746B6F776E696379 | -- Note the C5BC
| hasło | 686173C5826F | -- and the C582 for ł
+--------------+--------------------------+
That is as I would expect it.
My char% values are different than yours, but I think we are both "OK" for this situation.
Try a SELECT on the information_schema similar to what I did.
Next, what is your client? PHP? Something else? Perhaps the encoding is incorrect in the client.
(Rather than trying to use HTML tags in a Comment, Edit your original question with the added info.)
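One way to check the client-side encoding is to look at the raw bytes. The sketch below compares the UTF-8 bytes of the column name with the HEX() output above, and shows that the same name in a legacy single-byte encoding is not valid UTF-8. ISO-8859-2 is only an assumed example of such an encoding, and is_valid_utf8 is a made-up helper name.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_name = "użytkownicy".encode("utf-8")
latin2_name = "użytkownicy".encode("iso-8859-2")  # assumed legacy encoding

print(utf8_name.hex().upper())     # 75C5BC79746B6F776E696379
print(is_valid_utf8(utf8_name))    # True
print(is_valid_utf8(latin2_name))  # False
```

The same helper can be applied to a source file by reading it in binary mode and passing the bytes in. A pure-ASCII file also passes, so a True result only rules out a non-UTF-8 encoding of the non-ASCII characters.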
It looks like UTF-8 is not the default in MySQL, but tables and databases can be changed to use it.
Some potentially helpful links:
The MySQL documentation on charsets:
https://dev.mysql.com/doc/refman/5.0/en/charset.html
A SO question on determining the charset:
determining the character set of a table / database?
On changing the charset: http://makandracards.com/makandra/2529-show-and-change-mysql-default-character-set
Hope this helps.

I read that 'rlike' is case insensitive in MySQL -- but it's not working like that for me

According to http://dev.mysql.com/doc/refman/5.1/en/regexp.html, "REGEXP is not case sensitive, except when used with binary strings." Well... I'm not using binary strings -- at least, as I understand them (and as they imply in their examples). And yet...
mysql> select hostname from hosts where hostname regexp '17503a';
+-----------------------+
| hostname |
+-----------------------+
| ccdn-ats-tk-17503a-01 |
| ccdn-ats-tk-17503a-02 |
+-----------------------+
2 rows in set (0.08 sec)
mysql> select hostname from hosts where hostname regexp '17503A';
+-------------------+
| hostname |
+-------------------+
| ccdn-ss-17503A-01 |
| ccdn-ss-17503A-02 |
| ccdn-ss-17503A-03 |
| ccdn-ss-17503A-04 |
+-------------------+
4 rows in set (0.08 sec)
That looks an awful lot like a case-sensitive search to me. Any help?
As requested, here's the (abbreviated) schema:
CREATE TABLE `hosts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`hostname` varchar(60) COLLATE utf8_bin DEFAULT NULL,
`status` enum('active','decommissioned','offlined','deploy','down') COLLATE utf8_bin DEFAULT 'deploy',
`onteak` int(10) DEFAULT NULL,
`nagios` enum('monitored','unmonitored') COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `serial_num` (`serial_num`),
UNIQUE KEY `ip` (`ip`),
UNIQUE KEY `hostname` (`hostname`),
KEY `fk_loc` (`loc`),
KEY `hostname_idx` (`hostname`)
) ENGINE=MyISAM AUTO_INCREMENT=43075 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
And I'm going to go out on a limb and guess that the whole "collate=utf8_bin" is what's biting me. Thanks!
Update: spencer7593 nailed a work-around -- very excited:
mysql> SELECT hostname FROM hosts WHERE hostname REGEXP '17503a' COLLATE utf8_general_ci ;
+-----------------------+
| hostname |
+-----------------------+
| ccdn-ats-tk-17503a-01 |
| ccdn-ats-tk-17503a-02 |
| ccdn-ss-17503A-01 |
| ccdn-ss-17503A-02 |
| ccdn-ss-17503A-03 |
| ccdn-ss-17503A-04 |
+-----------------------+
6 rows in set (0.03 sec)
Yep. utf8_bin is a binary collation, which makes comparisons effectively case sensitive.
You could try specifying a case insensitive collation; I've done this with equality comparisons, but never tried it with REGEXP ...
SELECT hostname FROM hosts WHERE hostname REGEXP '17503a' COLLATE utf8_general_ci ;
                                                          ^^^^^^^^^^^^^^^^^^^^^^^
or
SELECT hostname FROM hosts WHERE hostname COLLATE utf8_general_ci REGEXP '17503a' ;
                                          ^^^^^^^^^^^^^^^^^^^^^^^
(Give one of those a whirl and see how big a smoke ball it makes.)
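As an analogy only (this is Python's re module, not MySQL's regex engine), the two behaviours look like this with the hostnames from the question. A plain search behaves like REGEXP under utf8_bin, and re.IGNORECASE behaves like REGEXP under a case-insensitive collation.

```python
import re

hostnames = [
    "ccdn-ats-tk-17503a-01",
    "ccdn-ats-tk-17503a-02",
    "ccdn-ss-17503A-01",
    "ccdn-ss-17503A-02",
    "ccdn-ss-17503A-03",
    "ccdn-ss-17503A-04",
]

# Case-sensitive match, comparable to REGEXP on a utf8_bin column.
case_sensitive = [h for h in hostnames if re.search("17503a", h)]

# Case-insensitive match, comparable to adding COLLATE utf8_general_ci.
case_insensitive = [h for h in hostnames if re.search("17503a", h, re.IGNORECASE)]

print(len(case_sensitive))    # 2
print(len(case_insensitive))  # 6
```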

utf8 and unicode getting warning messages in mysql

I have a mysql table. When I try to insert, I get this:
Warning: Incorrect string value: '\xAE</...' for column 'value' at row 1
mysql> show create table Configurations;
| Configurations | CREATE TABLE `Configurations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`ckey` varchar(255) NOT NULL,
`value` mediumtext,
PRIMARY KEY (`id`),
KEY `ckey` (`ckey`)
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8 |
mysql> SHOW VARIABLES LIKE 'coll%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
I googled the hell out of the error, and it all seemed to boil down to utf8 being set as my default character set. It's been like that for a while. I'm not sure what else to do. Help?
You could try these:
Check the encoding of the string before you insert it into your DB, which expects UTF-8. The byte \xAE in the warning is ® in Latin-1 and is not valid UTF-8 on its own, which suggests the input is Latin-1 encoded.
Another suggestion is to convert each of your table columns to utf8.
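A quick way to check the first suggestion is to look at the offending byte itself. This sketch assumes the input really is Latin-1 (the byte is also ® in Windows-1252); if your data comes from somewhere else, the right source encoding may differ.

```python
raw = b"\xae"  # the byte reported in the warning

# In Latin-1 this byte is the registered-trademark sign (U+00AE).
is_registered_sign = raw.decode("latin-1") == "\u00ae"
print(is_registered_sign)  # True

# On its own, the byte is not valid UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False

# Assumption: the input is Latin-1. Decode it, then re-encode as UTF-8.
fixed = raw.decode("latin-1").encode("utf-8")
print(fixed)  # b'\xc2\xae'
```

Converting the string like this before the insert (or telling the connection which encoding the client actually sends) should make the warning go away.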