MariaDB Character Encoding - mysql

I have just started to port an older MySQL/Spring/Eclipselink project to MariaDB. I am encountering an issue with table creation that can be demonstrated as follows:
MariaDB [spasm]> CREATE TABLE Configuration (ID BIGINT NOT NULL, Attribute VARCHAR(190) NOT NULL UNIQUE, Value VARCHAR(255) NOT NULL, PRIMARY KEY (ID));
Query OK, 0 rows affected (0.07 sec)
MariaDB [spasm]> drop table Configuration;
Query OK, 0 rows affected (0.06 sec)
MariaDB [spasm]> CREATE TABLE Configuration (ID BIGINT NOT NULL, Attribute VARCHAR(255) NOT NULL UNIQUE, Value VARCHAR(255) NOT NULL, PRIMARY KEY (ID));
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
MariaDB [spasm]>
MariaDB [spasm]> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
I understand that this is related to character encoding, but I don't know how to manage or correct it.

The new default CHARACTER SET is utf8mb4. It is complaining about the UNIQUE index:
Attribute VARCHAR(255) NOT NULL UNIQUE
If you are hitting the limit because you are trying to use CHARACTER SET utf8mb4, then do one of the following (each has a drawback) to avoid the error:
⚈ Upgrade to 5.7.7 (MariaDB 10.2.2?) for 3072 byte limit -- your cloud may not provide this;
⚈ Change 255 to 191 on the VARCHAR -- you lose any values longer than 191 characters (unlikely?);
⚈ ALTER .. CONVERT TO utf8 -- you lose Emoji and some of Chinese;
⚈ Use a "prefix" index -- you lose some of the performance benefits.
Or... Stay with 5.6/5.5/10.1 but perform these steps to raise the limit to 3072 bytes:
SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_per_table=1;
SET GLOBAL innodb_large_prefix=1;
logout & login (to get the global values);
ALTER TABLE tbl ROW_FORMAT=DYNAMIC; -- (or COMPRESSED)
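The 191 figure comes from simple arithmetic: InnoDB sizes an index key by the maximum bytes each character could need, and utf8mb4 reserves 4 bytes per character. Here is a minimal sketch of that calculation in Python. The helper names are made up for illustration, and the 767 and 3072 byte limits are the ones quoted above.

```python
# Maximum bytes per character that the server reserves for each charset.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_indexable_chars(key_limit_bytes: int, charset: str) -> int:
    """Longest VARCHAR length (in characters) whose full index fits the limit."""
    return key_limit_bytes // MAX_BYTES_PER_CHAR[charset]


def index_fits(varchar_len: int, charset: str, key_limit_bytes: int = 767) -> bool:
    """True if a full-column index on VARCHAR(varchar_len) stays within the limit."""
    return varchar_len * MAX_BYTES_PER_CHAR[charset] <= key_limit_bytes


if __name__ == "__main__":
    print(max_indexable_chars(767, "utf8mb4"))   # old limit with utf8mb4
    print(max_indexable_chars(767, "utf8"))      # old limit with 3-byte utf8
    print(index_fits(255, "utf8mb4"))            # the failing CREATE TABLE
    print(index_fits(255, "utf8mb4", 3072))      # same column, raised limit
```

This is why VARCHAR(255) worked under the old 3-byte utf8 default (255 * 3 = 765 bytes) but fails under utf8mb4 (255 * 4 = 1020 bytes).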

Related

Why can't I store 4 byte emojis in this mysql field?

I have a database in which I require that only two fields in one table allow for 4 byte emojis to be stored. I did this (obviously with the correct table and column names):
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
I know it worked, because when I do show create table chatbots_proposalarea; it shows me this:
CREATE TABLE `chatbots_proposalarea` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(50) COLLATE utf8mb4_bin NOT NULL,
`proposal` varchar(1500) COLLATE utf8mb4_bin DEFAULT NULL,
`candidate_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `chatbots_proposalare_candidate_id_6465160e_fk_chatbots_` (`candidate_id`),
CONSTRAINT `chatbots_proposalare_candidate_id_6465160e_fk_chatbots_` FOREIGN KEY (`candidate_id`) REFERENCES `chatbots_candidate` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
The fields "name" and "proposal" do seem to have the utf8mb4_bin collation, and no error was shown when running the commands. However, when I try to save a value there such as "Seguridad 🍞", I get this error:
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x8D\x9E' for column 'name' at row 1
Any help in discovering what I am missing would be much appreciated.
NOTES
This is on a Django project, and mounted on an Ubuntu server, and the SQL version is mysql Ver 14.14 Distrib 5.7.28, for Linux (x86_64)
I don't see why that would make a difference, but the same happens when I update directly to the database by doing
UPDATE chatbots_proposalarea SET name='Seguridad 🍞' where id=1;
A solution was recommended, but it depended on the use of triggers, which were the cause of the issue there; that isn't my case.
UPDATE
If it shows any important information, when I run show variables where Variable_name like 'character\_set\_%' or Variable_name like 'collation%';
I get the following:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8mb4_general_ci |
+--------------------------+--------------------+
I changed these values to the same that a coworker has in one of his projects, in which he also requires 4 byte emojis to be stored.
That's not the table's fault; it is the connection. Something like:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        ...
        'OPTIONS': {
            'charset': 'utf8mb4',
            'use_unicode': True,
        },
    },
}
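To see why the connection charset matters, it can help to decode the bytes quoted in the error message. The sketch below is only an illustration with a hypothetical helper name. It shows that '\xF0\x9F\x8D\x9E' is the 4-byte UTF-8 encoding of the bread emoji, which a 3-byte utf8 connection cannot carry.

```python
def mysql_hex_to_text(escaped: str) -> str:
    """Decode a MySQL-style byte dump such as '\\xF0\\x9F\\x8D\\x9E' as UTF-8."""
    # Split on the literal two-character sequence backslash-x and parse each
    # remaining piece as one hexadecimal byte.
    raw = bytes(int(part, 16) for part in escaped.split("\\x") if part)
    return raw.decode("utf-8")


if __name__ == "__main__":
    text = mysql_hex_to_text(r"\xF0\x9F\x8D\x9E")
    print(text, len(text.encode("utf-8")), "bytes")
```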

Inserting 4-byte unicode characters into MySQL/MariaDB

When attempting to insert 💩 (for example, which is a 4-byte unicode char), both MySQL (5.7) and MariaDB (10.2/10.3/10.4) give the same error:
Incorrect string value: '\xF0\x9F\x92\xA9'
The statement:
mysql> insert into bob (test) values ('💩');
Here's my database's charset/collation:
mysql> select @@collation_database;
+----------------------+
| @@collation_database |
+----------------------+
| utf8mb4_unicode_ci |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT @@character_set_database;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8mb4 |
+--------------------------+
1 row in set (0.00 sec)
The server's character set:
mysql> show global variables like '%character_set_server%'\G;
*************************** 1. row ***************************
Variable_name: character_set_server
Value: utf8mb4
The table:
create table bob ( `test` TEXT NOT NULL );
mysql> SHOW FULL COLUMNS FROM bob;
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| test | text | utf8mb4_unicode_ci | NO | | NULL | | select,insert,update,references | |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
Can anyone point me in the right direction?
Yes, as you commented, you need to use SET NAMES utf8mb4.
Your 4-byte character must pass from your client through the database connection and into a table. All of those must support utf8mb4. If any one of them does not support utf8mb4, then 4-byte characters will not be able to get through.
SET NAMES utf8mb4 makes the database session expect clients to send strings using that encoding. The default for character_set_client on MySQL 5.7 is utf8, so you need to set it to utf8mb4.
In MySQL 8.0.1 and later, the default character_set_client is utf8mb4 already, so you won't need to change it.
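The "every link must support utf8mb4" rule can be sketched as a small check. This assumes the session variables have already been read into a dict, for instance from SHOW VARIABLES LIKE 'character_set%'. The helper name and the sample values are hypothetical.

```python
# The three session variables that SET NAMES changes together.
SESSION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def non_utf8mb4_links(variables: dict) -> list:
    """Return the session variables that would block 4-byte characters."""
    return [name for name in SESSION_VARS if variables.get(name) != "utf8mb4"]


if __name__ == "__main__":
    # Assumed sample: a session left at the 3-byte utf8 default.
    session = {
        "character_set_client": "utf8",
        "character_set_connection": "utf8",
        "character_set_results": "utf8",
        "character_set_server": "utf8mb4",
    }
    print(non_utf8mb4_links(session))
```

An empty list means the session side of the chain is ready for utf8mb4; the column's own character set still has to be checked separately.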

Can you store more than N characters in varchar(N) in MySQL?

I have the following table on MySQL. I am using 5.6.32. The table contains about ~40 million records. I am only sharing columns which I feel are necessary to understand the issue.
Table Structure
create table `random` (
`id` bigint(20) not null auto_increment,
`some_id` bigint(20) not null,
`latitude` decimal(20,14) default null,
`longitude` decimal(20,14) default null,
`new_column` varchar(255) collate utf8_unicode_ci default null,
primary key (`id`)
) engine=innodb auto_increment=40878872 default charset=utf8 collate=utf8_unicode_ci;
So, I added a new column in this table called new_column varchar(255). But, when I do length(new_column), there are entries which have more than 255 characters.
The actual value being inserted:
random*GS02,355234054262743,GPS:356728;A;N33.614073;E77.063096;0;0;230118,STT:400;0,ADC:0���&�������������r�������r�������r������*GS02,39233054663793,GPS:173158;A;N33.614057;E77.0263201;0;0;210118,STT:200;0,ADC:0;24.7;1;29.9;2;4.2;3;0.0
On the MySQL Master (say, machine #1), my application was able to insert this value in new_column in the table without an issue. I have a MySQL slave (say, machine #2) using native MySQL replication and it was also able to replicate this record easily. But then I have another slave replicating from machine #2 which is using tungsten replicator. Whenever there is a string which is more than 255 characters, tungsten throws the following error and replication breaks:
pendingError : Event application failed: seqno=2395306016 fragno=0 message=java.sql.SQLDataException: Data too long for column 'new_column' at row 1
pendingExceptionMessage: java.sql.SQLDataException: Data too long for column 'new_column' at row 1
EDIT:
Variables on Master and both Slaves are set to
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
Collation while creating the table is set to utf8_unicode_ci on all instances.
Why is it that MySQL native replication is allowing more characters to be written to the column? And why is it that Tungsten replicator is preventing it?
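One thing worth checking first: in MySQL, LENGTH() counts bytes while CHAR_LENGTH() counts characters, and the N in VARCHAR(N) is a character count. The sketch below mimics that difference in Python under the assumption that the column is stored as UTF-8. The function names are illustrative, and this does not by itself explain the Tungsten behaviour.

```python
def byte_length(value: str, encoding: str = "utf-8") -> int:
    """Mimics LENGTH(): the number of bytes in the stored encoding."""
    return len(value.encode(encoding))


def char_length(value: str) -> int:
    """Mimics CHAR_LENGTH(): the number of characters."""
    return len(value)


if __name__ == "__main__":
    # Assumed sample: 200 two-byte characters fit in VARCHAR(255),
    # yet a byte count of the same value reports 400.
    sample = "\u00e9" * 200
    print(char_length(sample), byte_length(sample))
```

Comparing length(new_column) with char_length(new_column) on the affected rows would show whether those values really exceed 255 characters or only 255 bytes.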

Can't insert unicode into mysql

I am having a problem with MySQL when I try to insert Unicode (Chinese here). For example, I want to insert:
insert into site_parameter(name) values("测试");
However, in the mysql terminal it becomes:
mysql> insert into site_parameter(name) values("");
Query OK, 1 row affected (0.00 sec)
I can't even type Chinese in the mysql terminal.
Here is my.conf:
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
I have checked the collation:
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> show create table site_parameter;
| site_parameter | CREATE TABLE `site_parameter` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`description` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=388 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
1 row in set (0.00 sec)
Could anyone help?
The console window (CMD) has its own charset (code page) setting.
You can check the current one, or set another, with the chcp command:
>chcp
(prints the current code page)
>chcp 65001
(sets the code page to UTF-8)
Some information about this command - Chcp.
As Devart said, the link below does provide the solution:
https://superuser.com/questions/55224/change-the-default-codepage-from-latin1-to-utf8-on-a-linux-machine
Here is the solution detail:
Edit /var/lib/locales/supported.d/local and add your locale to the list of supported locales if it isn't there already, eg:
en_US UTF-8
Regenerate the supported locales on your machine:
sudo dpkg-reconfigure locales
Open /etc/default/locale and check if LANG and LANGUAGE are changed:
LANG="en_US"
LANGUAGE="en_US:UTF-8"
If they are not, you can update them manually now.
Reboot.
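Before and after changing the code page or locale, it can be useful to confirm what encoding the terminal session actually reports. Here is a minimal sketch using only the Python standard library. The helper name is hypothetical, and the check covers only the terminal side, not the MySQL connection.

```python
import codecs
import locale
import sys


def is_utf8(encoding_name) -> bool:
    """True if the given encoding name is an alias of UTF-8."""
    if not encoding_name:
        return False
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown codec names are treated as "not UTF-8".
        return False


if __name__ == "__main__":
    preferred = locale.getpreferredencoding(False)
    print("locale encoding:", preferred, "->", is_utf8(preferred))
    print("stdout encoding:", sys.stdout.encoding, "->", is_utf8(sys.stdout.encoding))
```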

mysql update off by one character

I am using mysql and in the table 'items' updates on the variable image_url 'succeed' with no warnings. But, in reality, the update is failing: it prepends the value with a space and deletes the last character of the value I give it.
Here is the update:
UPDATE items
SET image_url = 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg'
WHERE id=38;
Here is the select:
select * from items\G;
Here is one line of the output:
...
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
...
Notice the missing 'g' at the end and the extra space at the beginning.
How do I stop this?
Here is some system info that may help:
mysql> show variables LIKE '%version%';
+-------------------------+-------------------------+
| Variable_name | Value |
+-------------------------+-------------------------+
| innodb_version | 5.5.46 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.5.46-0ubuntu0.14.04.2 |
| version_comment | (Ubuntu) |
| version_compile_machine | i686 |
| version_compile_os | debian-linux-gnu |
+-------------------------+-------------------------+
7 rows in set (0.00 sec)
EDIT 1 Table description:
mysql> desc items;
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
...
| image_url | varchar(255) | NO | | NULL | |
...
EDIT 2 Checking for triggers:
mysql> show triggers \G
Empty set (0.00 sec)
EDIT 3 Another example:
I am doing all these commands from command line. Another example:
UPDATE items SET image_url = 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg33333333333333' WHERE id=38;
select * from items\G;
...
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg3333333333333
...
EDIT 4 Checking length of inputs and outputs:
mysql> select image_url,length(image_url) from items where id=38\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
1 row in set (0.00 sec)
http://www.lettercount.com/ gives http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg 61 characters as well, which makes sense given that the update is not changing the length of the string, just deleting the last character and adding a space to the beginning.
EDIT 5 Trying encoding:
base64 encoding:
aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw==
mysql> UPDATE items SET image_url = 'aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw==' WHERE id=38;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select image_url,length(image_url) from items where id=38\G;
*************************** 1. row ***************************
image_url: aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw=
length(image_url): 84
1 row in set (0.00 sec)
decoding: aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw=
gives:
http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg
EDIT 6 Checking if Insert fails as well:
mysql> INSERT INTO items (url, image_url) VALUES('http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg', 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg');
Query OK, 1 row affected, 2 warnings (0.03 sec)
The warnings are because this insert did not supply values for all of the NOT NULL columns:
mysql> SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------+
| Warning | 1364 | Field 'created_at' doesn't have a default value |
| Warning | 1364 | Field 'updated_at' doesn't have a default value |
+---------+------+-------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select image_url,length(image_url),url from items where id=39\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
1 row in set (0.00 sec)
So, it also fails on insert.
EDIT 7 create table information
mysql> show create table items\G;
*************************** 1. row ***************************
Table: items
Create Table: CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
...
`image_url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`color` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
...
`store` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_items_on_id` (`id`),
KEY `index_items_on_url` (`url`)
) ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)
ERROR:
No query specified
EDIT 8 More table and column information
select * from information_schema.columns where table_name='items' and column_name='image_url'\G;
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: development_database
TABLE_NAME: items
COLUMN_NAME: image_url
ORDINAL_POSITION: 5
COLUMN_DEFAULT: NULL
IS_NULLABLE: NO
DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 255
CHARACTER_OCTET_LENGTH: 765
NUMERIC_PRECISION: NULL
NUMERIC_SCALE: NULL
CHARACTER_SET_NAME: utf8
COLLATION_NAME: utf8_unicode_ci
COLUMN_TYPE: varchar(255)
COLUMN_KEY:
EXTRA:
PRIVILEGES: select,insert,update,references
COLUMN_COMMENT:
2 rows in set (0.01 sec)
ERROR:
No query specified
EDIT 9 Charlength readouts
mysql> select image_url,length(image_url),char_length(image_url),url from items where id=39\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
char_length(image_url): 61
url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
1 row in set (0.00 sec)
ERROR:
No query specified
EDIT 10 showing variables like character
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
EDIT 11: THE POTENTIAL ISSUE
The error does not appear in the users table, but it does occur in the items table. Here is the difference that I think may be causing the issue. (I do not yet have a solution since the item table has that UTF-8 for a reason: urls can have some funky characters)
show create table users\G;
ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=latin1
show create table items\G;
ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
To be honest I think this should be a Community Answer, as I was a
little later on the scene and others had done some important ground work
establishing what was and was not a factor in this issue.
This link may be relevant, as your table character set is utf8 so the last character in the string may be getting skewed (and not saving correctly, thus disappearing).
All of the rows in EDIT 10 which reference the latin1 or utf8 character sets should be the same, and ideally should be utf8mb4. I would now hazard a guess that saving UTF-8 characters in a character set that is not true UTF-8 means the final character of any string is an incomplete reference and so is not displayed.
So to solve your issue run the command:
ALTER TABLE items CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
For info / background:
utf8mb4 is the full and complete UTF-8 character set and so will show any and every character that can be used in a web address. If there are some obscure characters in the data I suggest you change the column to a BLOB column before then changing it to a utf8mb4 column, because this will preserve the correct character definitions as input rather than as assumed by MySQL on the data already entered.
You do not want the utf8_ character sets; in MySQL they are as good as broken. What you want is utf8mb4. MySQL's utf8 is compromised because it stores at most 3 bytes per character, so 4-byte characters cannot be stored intact.
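To judge whether a given value is affected by the 3-byte limit, one can list the characters whose UTF-8 encoding is longer than 3 bytes. This is a minimal sketch with a hypothetical helper name. It only inspects the string itself and says nothing about the server or connection settings.

```python
def chars_needing_utf8mb4(text: str) -> list:
    """Return the characters that take more than 3 bytes in UTF-8."""
    return [ch for ch in text if len(ch.encode("utf-8")) > 3]


if __name__ == "__main__":
    url = "http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg"
    print(len(url), chars_needing_utf8mb4(url))
    print(chars_needing_utf8mb4("Seguridad \U0001F35E"))
```

For the URL from the question the list comes back empty, since every character in it is plain ASCII. A value containing an emoji returns that emoji.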