MySQL UTF8 varchar column size - mysql

The MySQL documentation says that since 5.0, varchar lengths are measured in characters, not bytes. However, I recently got data-truncation warnings when inserting values that should have fit into their varchar column.
I replicated this issue with a simple table in v5.1:
mysql> show create table test\G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
`string` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
I then inserted several 10-character values containing different numbers of multi-byte UTF-8 characters:
mysql> insert into test (string) values
-> ('abcdefghij'),
-> ('ãáéíçãáéíç'),
-> ('ãáéíç67890'),
-> ('éíç4567890'),
-> ('íç34567890');
Query OK, 5 rows affected, 4 warnings (0.06 sec)
Records: 5 Duplicates: 0 Warnings: 4
mysql> show warnings;
+---------+------+---------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'string' at row 2 |
| Warning | 1265 | Data truncated for column 'string' at row 3 |
| Warning | 1265 | Data truncated for column 'string' at row 4 |
| Warning | 1265 | Data truncated for column 'string' at row 5 |
+---------+------+---------------------------------------------+
mysql> select * from test;
+------------+
| string |
+------------+
| abcdefghij |
| ãáéíç |
| ãáéíç |
| éíç4567 |
| íç345678 |
+------------+
5 rows in set (0.00 sec)
I think that this shows that the varchar size is still defined in bytes or at least, is not accurate in character units.
The question is, am I understanding the documentation correctly and is this a bug? Or am I misinterpreting the documentation?

It's true that VARCHAR and CHAR sizes are considered in characters, not bytes.
I was able to recreate your issue when I set my connection character set to latin1 (single byte).
Ensure that you set your connection character set to UTF8 prior to running the insertion query with the following command:
SET NAMES utf8
If you don't do this, the server reads each byte of a two-byte UTF-8 character as a separate single-byte latin1 character, so every such character counts twice against the column length.
You might consider changing your default client character set.
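The effect can be modelled outside the database. The following is a minimal sketch, not MySQL's actual code path: it assumes the client sends UTF-8 bytes, the connection is latin1, and the column limit is applied per character. The function name store_via_latin1_connection is made up for illustration, and Python's latin-1 codec stands in for MySQL's latin1.

```python
def store_via_latin1_connection(value: str, column_chars: int = 10) -> str:
    """Model a VARCHAR(column_chars) insert when the client sends UTF-8
    bytes but the connection character set is latin1 (an assumption)."""
    raw = value.encode("utf-8")        # bytes the client actually sends
    misread = raw.decode("latin-1")    # server sees one character per byte
    kept = misread[:column_chars]      # VARCHAR limit is applied in characters
    # Reading back over the same latin1 connection returns the kept bytes,
    # which a UTF-8 terminal then shows as the original characters.
    return kept.encode("latin-1").decode("utf-8", errors="replace")


if __name__ == "__main__":
    for s in ["abcdefghij", "ãáéíçãáéíç", "ãáéíç67890", "éíç4567890", "íç34567890"]:
        print(s, "->", store_via_latin1_connection(s))
```

Under these assumptions the model reproduces the truncated rows from the question: each accented character uses up two of the ten available slots.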

Related

AWS MariaDB Statement could not be executed

Seem to have a bit of a character encoding issue with MariaDB on AWS that I can't seem to resolve;
Statement could not be executed (22007 - 1366 - Incorrect string value: '\xA320 Of...'
Initially I presumed this was because the table was set to latin1 but I've since changed the table and the column to utf8mb4_unicode_ci and the error persists.
Since your column definition is utf8, you also need to insert utf8 data.
The byte \xA3 is not a valid UTF-8 sequence on its own (in latin1 it is the pound sign, £), so the server rejects it:
mysql> select convert(X'A320' using utf8mb4);
+--------------------------------+
| convert(X'A320' using utf8mb4) |
+--------------------------------+
| ? |
+--------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+-------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------+
| Warning | 1300 | Invalid utf8mb4 character string: '\xA3 ' |
+---------+------+-------------------------------------------+
1 row in set (0.00 sec)
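The same check can be made on the client side before the data reaches the server. This is a small sketch that assumes the offending text was produced in latin1; the payload is the byte sequence quoted in the error message up to the ellipsis, and is_valid_utf8 is a helper name invented here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes from the error message: 0xA3 followed by the ASCII text "20 Of".
payload = b"\xa320 Of"

# Assumption: the source data is latin1, where 0xA3 is the pound sign.
as_text = payload.decode("latin-1")
reencoded = as_text.encode("utf-8")   # 0xA3 becomes the two bytes 0xC2 0xA3

if __name__ == "__main__":
    print(is_valid_utf8(payload), is_valid_utf8(reencoded))
```

If the assumption holds, either re-encoding the data like this or declaring the connection as latin1 should let the server convert it correctly.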

Inserting 4-byte unicode characters into MySQL/MariaDB

When attempting to insert 💩 (for example, which is a 4-byte unicode char), both MySQL (5.7) and MariaDB (10.2/10.3/10.4) give the same error:
Incorrect string value: '\xF0\x9F\x92\xA9'
The statement:
mysql> insert into bob (test) values ('💩');
Here's my database's charset/collation:
mysql> select @@collation_database;
+----------------------+
| @@collation_database |
+----------------------+
| utf8mb4_unicode_ci |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT @@character_set_database;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8mb4 |
+--------------------------+
1 row in set (0.00 sec)
The server's character set:
mysql> show global variables like '%character_set_server%'\G;
*************************** 1. row ***************************
Variable_name: character_set_server
Value: utf8mb4
The table:
create table bob ( `test` TEXT NOT NULL );
mysql> SHOW FULL COLUMNS FROM bob;
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| test | text | utf8mb4_unicode_ci | NO | | NULL | | select,insert,update,references | |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
Can anyone point me in the right direction?
Yes, as you commented, you need to use SET NAMES utf8mb4.
Your 4-byte character must pass from your client through the database connection and into a table. All of those must support utf8mb4. If any one of them does not support utf8mb4, then 4-byte characters will not be able to get through.
SET NAMES utf8mb4 makes the database session expect clients to send strings in that encoding. The default for character_set_client on MySQL 5.7 is utf8, so you need to set it to utf8mb4.
In MySQL 8.0.1 and later, the default character_set_client is utf8mb4 already, so you won't need to change it.
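To see why this character in particular is affected, it helps to look at its encoded length. A minimal sketch follows; the helper names are invented for illustration, and the 3-byte limit is the per-character maximum of MySQL's utf8 character set.

```python
def max_utf8_bytes_per_char(text: str) -> int:
    """Largest number of UTF-8 bytes needed by any single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_three_byte_utf8(text: str) -> bool:
    """True if every character fits within MySQL's 3-byte utf8 limit."""
    return max_utf8_bytes_per_char(text) <= 3


PILE_OF_POO = "\U0001F4A9"  # the character from the question

if __name__ == "__main__":
    print(PILE_OF_POO.encode("utf-8"))        # the bytes named in the error
    print(fits_three_byte_utf8(PILE_OF_POO))
```

A check like this can flag strings that need utf8mb4 at every hop (client, connection, and column) before they are sent.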

mysql update off by one character

I am using mysql and in the table 'items' updates on the variable image_url 'succeed' with no warnings. But, in reality, the update is failing: it prepends the value with a space and deletes the last character of the value I give it.
Here is the update:
UPDATE items
SET image_url = 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg'
WHERE id=38;
Here is the select:
select * from items\G;
Here is one line of the output:
...
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
...
Notice the missing 'g' at the end and the extra space at the beginning.
How do I stop this?
Here is some system info that may help:
mysql> show variables LIKE '%version%';
+-------------------------+-------------------------+
| Variable_name | Value |
+-------------------------+-------------------------+
| innodb_version | 5.5.46 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.5.46-0ubuntu0.14.04.2 |
| version_comment | (Ubuntu) |
| version_compile_machine | i686 |
| version_compile_os | debian-linux-gnu |
+-------------------------+-------------------------+
7 rows in set (0.00 sec)
EDIT 1 Table description:
mysql> desc items;
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
...
| image_url | varchar(255) | NO | | NULL | |
...
EDIT 2 Checking for triggers:
mysql> show triggers \G
Empty set (0.00 sec)
EDIT 3 Another example:
I am doing all these commands from command line. Another example:
UPDATE items SET image_url = 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg33333333333333' WHERE id=38;
select * from items\G;
...
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg3333333333333
...
EDIT 4 Checking length of inputs and outputs:
mysql> select image_url,length(image_url) from items where id=38\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
1 row in set (0.00 sec)
http://www.lettercount.com/ gives 61 characters for http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg as well, which makes sense given that the update is not changing the length of the string, just deleting the last character and adding a space to the beginning.
EDIT 5 Trying encoding:
base64 encoding:
aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw==
mysql> UPDATE items SET image_url = 'aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw==' WHERE id=38;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select image_url,length(image_url) from items where id=38\G;
*************************** 1. row ***************************
image_url: aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw=
length(image_url): 84
1 row in set (0.00 sec)
decoding: aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw=
gives:
http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg
EDIT 6 Checking if Insert fails as well:
mysql> INSERT INTO items (url, image_url) VALUES('http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg', 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg');
Query OK, 1 row affected, 2 warnings (0.03 sec)
The warnings appear because this insert did not supply values for every NOT NULL column:
mysql> SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------+
| Warning | 1364 | Field 'created_at' doesn't have a default value |
| Warning | 1364 | Field 'updated_at' doesn't have a default value |
+---------+------+-------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select image_url,length(image_url),url from items where id=39\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
1 row in set (0.00 sec)
So, it also fails on insert.
EDIT 7 create table information
mysql> show create table items\G;
*************************** 1. row ***************************
Table: items
Create Table: CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
...
`image_url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`color` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
...
`store` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_items_on_id` (`id`),
KEY `index_items_on_url` (`url`)
) ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)
ERROR:
No query specified
EDIT 8 More table and column information
select * from information_schema.columns where table_name='items' and column_name='image_url'\G;
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: development_database
TABLE_NAME: items
COLUMN_NAME: image_url
ORDINAL_POSITION: 5
COLUMN_DEFAULT: NULL
IS_NULLABLE: NO
DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 255
CHARACTER_OCTET_LENGTH: 765
NUMERIC_PRECISION: NULL
NUMERIC_SCALE: NULL
CHARACTER_SET_NAME: utf8
COLLATION_NAME: utf8_unicode_ci
COLUMN_TYPE: varchar(255)
COLUMN_KEY:
EXTRA:
PRIVILEGES: select,insert,update,references
COLUMN_COMMENT:
2 rows in set (0.01 sec)
ERROR:
No query specified
EDIT 9 Charlength readouts
mysql> select image_url,length(image_url),char_length(image_url),url from items where id=39\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
char_length(image_url): 61
url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
1 row in set (0.00 sec)
ERROR:
No query specified
EDIT 10 showing variables like character
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
EDIT 11: THE POTENTIAL ISSUE
The error does not appear in the users table, but it does occur in the items table. Here is the difference that I think may be causing the issue. (I do not yet have a solution since the item table has that UTF-8 for a reason: urls can have some funky characters)
show create table users\G;
ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=latin1
show create table items\G;
ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
To be honest I think this should be a Community Answer, as I was a
little later on the scene and others had done some important ground work
establishing what was and was not a factor in this issue.
This link may be relevant, as your table character set is utf8 so the last character in the string may be getting skewed (and not saving correctly, thus disappearing).
All of the character set variables in EDIT 10 that show latin1 or utf8 should match, and ideally they should all be utf8mb4. My guess is that saving UTF-8 characters under a character set that is not full UTF-8 leaves the final character of the string as an incomplete sequence, so it is not displayed.
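So to solve your issue, run the command below. Note that this form only changes the table's default character set, which applies to columns added later; to convert the existing columns as well, use ALTER TABLE items CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; instead.
ALTER TABLE items CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;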
For info / background:
utf8mb4 is the full and complete UTF-8 character set and so will show any and every character that can be used in a web address. If there are some obscure characters in the data I suggest you change the column to a BLOB column before then changing it to a utf8mb4 column, because this will preserve the correct character definitions as input rather than as assumed by MySQL on the data already entered.
You do not want the utf8_ character sets; in MySQL they are as good as broken. What you want is utf8mb4. MySQL's utf8 character set stores at most 3 bytes per character, so it cannot hold 4-byte UTF-8 characters: depending on the SQL mode, they are either rejected or the value is truncated at that point.
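The per-character byte limits also explain the CHARACTER_OCTET_LENGTH value in EDIT 8. A short sketch of the arithmetic, with constant and function names chosen here for illustration:

```python
UTF8_MAX_BYTES = 3      # MySQL's utf8: at most 3 bytes per character
UTF8MB4_MAX_BYTES = 4   # utf8mb4: at most 4 bytes per character


def max_octet_length(char_length: int, max_bytes_per_char: int) -> int:
    """Worst-case storage in bytes for a column of char_length characters."""
    return char_length * max_bytes_per_char


if __name__ == "__main__":
    print(max_octet_length(255, UTF8_MAX_BYTES))     # varchar(255) under utf8
    print(max_octet_length(255, UTF8MB4_MAX_BYTES))  # the same column under utf8mb4
```

For a pure ASCII value such as the URL in the question, every character takes one byte in either character set, which is consistent with length() and char_length() both reporting 61 in EDIT 9.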

load data infile partially cuts phone numbers?

I created a table named 'test' in MySQL on my PC, as follows.
create table test(
telnum varchar(20) not null,
reg_date datetime not null default '0000-00-00 00:00:00',
remarks text,
primary key(telnum)
);
Then I loaded a file named 1.txt into the table 'test'.
The contents of 1.txt are as follows:
01011112222
01022223333
01033334444
The 'load data infile' statement is as follows:
load data infile "c:/temp/1.txt"
ignore into table test;
But there is a problem.
Every phone number was cut off, as shown below.
Query OK, 3 rows affected, 3 warnings (0.00 sec)
Records: 3 Deleted: 0 Skipped: 0 Warnings: 3
mysql> select * from test;
+-------------+---------------------+---------+
| telnum | reg_date | remarks |
+-------------+---------------------+---------+
|12222 | 0000-00-00 00:00:00 |
|23333 | 0000-00-00 00:00:00 |
|34444 | 0000-00-00 00:00:00 |
+-------------+---------------------+---------+
3 rows in set (0.00 sec)
Only 5 of the 11 characters remain.
6 characters disappeared.
The second problem is the warnings, which are shown below.
mysql> show warnings;
+---------+------+---------------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------------+
| Warning | 1264 | Out of range value for column 'reg_date' at row 1 |
| Warning | 1264 | Out of range value for column 'reg_date' at row 2 |
| Warning | 1264 | Out of range value for column 'reg_date' at row 3 |
+---------+------+---------------------------------------------------+
3 rows in set (0.00 sec)
I didn't put anything into the reg_date field.
But the message reports an out of range value for column 'reg_date'.
What are the reasons and how can I solve these problems?
Try changing the line delimiter. On Windows, lines usually end with \r\n rather than \n, which is the default if you omit the LINES TERMINATED BY clause:
LOAD DATA INFILE "/tmp/1.txt"
IGNORE INTO TABLE test
LINES TERMINATED BY '\r\n';
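What happens to the line ending can be illustrated without MySQL. This sketch assumes 1.txt was saved with Windows line endings; the split_records helper is hypothetical and only mimics splitting on a terminator.

```python
def split_records(text: str, terminator: str) -> list[str]:
    """Split file contents into records on the given line terminator."""
    parts = text.split(terminator)
    if parts and parts[-1] == "":
        parts.pop()  # drop the empty piece after the final terminator
    return parts


# Assumed contents of 1.txt when saved with CRLF line endings.
data = "01011112222\r\n01022223333\r\n01033334444\r\n"

if __name__ == "__main__":
    print(split_records(data, "\n"))    # each value keeps a trailing \r
    print(split_records(data, "\r\n"))  # clean 11-character values
```

With '\n' as the terminator, every value keeps a trailing carriage return. That stray character likely explains the display: a terminal moves the cursor back to the start of the line when it prints \r, so the value can look cut off even though the digits are stored.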

CPanel/MySql ENUM sets default to ' '?

Hey guys, I created a database column in my regular LAMP stack that seems to work great. The trouble is that after migrating it to cPanel, the default values of my ENUM seem to revert to ' ' (whitespace).
The command I used to create this column was:
`status` ENUM('0','1','2') NOT NULL DEFAULT '0',
But it seems this doesn't actually happen.....
Is there an error in my syntax? A stupidity of CPanel?
What's going on here?
EDIT
It looks like it has something to do with the input button
submitting a blank value? Anyone heard of this before?
MariaDB [test]> create table settest(attrib set('bold','italic','underline') DEFAULT 'bold',color enum('red','green','blue') DEFAULT 'blue');
MariaDB [test]> INSERT INTO settest VALUES('a','s');
Query OK, 1 row affected, 2 warnings (0.14 sec)
MariaDB [test]> SHOW WARNINGS;
+---------+------+---------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'attrib' at row 1 |
| Warning | 1265 | Data truncated for column 'color' at row 1 |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)
MariaDB [test]> SELECT * FROM settest;
+--------+-------+
| attrib | color |
+--------+-------+
| | |
| | |
+--------+-------+
Looks like the answer to get a default is NOT NULL DEFAULT 1 as per 1.3. ENUM
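The transcript above suggests that the column DEFAULT is not what gets stored when an invalid or blank value is submitted explicitly. Below is a rough model of that behaviour, assuming strict SQL mode is off (as the warnings rather than errors suggest). The functions are illustrative only; they ignore details such as case-insensitive matching and numeric ENUM indexes.

```python
def coerce_enum(value: str, members: tuple[str, ...]) -> str:
    """Model of a non-strict ENUM insert: unknown values become ''."""
    return value if value in members else ""


def coerce_set(value: str, members: tuple[str, ...]) -> str:
    """Model of a non-strict SET insert: unknown members are dropped."""
    requested = value.split(",") if value else []
    return ",".join(m for m in members if m in requested)


if __name__ == "__main__":
    print(repr(coerce_set("a", ("bold", "italic", "underline"))))
    print(repr(coerce_enum("s", ("red", "green", "blue"))))
    # A blank form field sent to ENUM('0','1','2') is also not a member.
    print(repr(coerce_enum("", ("0", "1", "2"))))
```

In this model the default never comes into play, because a value was supplied. That matches the question's edit about the form submitting a blank value: omitting the column from the INSERT, or sending a valid member, would be needed for '0' to be stored.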