mysql update off by one character - mysql

I am using mysql and in the table 'items' updates on the variable image_url 'succeed' with no warnings. But, in reality, the update is failing: it prepends the value with a space and deletes the last character of the value I give it.
Here is the update:
UPDATE items
SET image_url = 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg'
WHERE id=38;
Here is the select:
select * from items\G;
Here is one line of the output:
...
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
...
notice the missing 'g' at the end and the extra space at the beginning.
How do I stop this?
Here is some system info that may help:
mysql> show variables LIKE '%version%';
+-------------------------+-------------------------+
| Variable_name | Value |
+-------------------------+-------------------------+
| innodb_version | 5.5.46 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.5.46-0ubuntu0.14.04.2 |
| version_comment | (Ubuntu) |
| version_compile_machine | i686 |
| version_compile_os | debian-linux-gnu |
+-------------------------+-------------------------+
7 rows in set (0.00 sec)
EDIT 1 Table description:
mysql> desc items;
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
...
| image_url | varchar(255) | NO | | NULL | |
...
EDIT 2 Checking for triggers:
mysql> show triggers \G
Empty set (0.00 sec)
EDIT 3 Another example:
I am doing all these commands from command line. Another example:
UPDATE items SET image_url = 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg33333333333333' WHERE id=38;
select * from items\G;
...
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg3333333333333
...
EDIT 4 Checking length of inputs and outputs:
mysql> select image_url,length(image_url) from items where id=38\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
1 row in set (0.00 sec)
http://www.lettercount.com/ gives http://ecx.images-amazon.com/images/I/61Dz5t8wjQL.SX522.jpg 61 characters as well, which makes sense given that the update is not changing the length of the string, just deleting the last characters and adding a space to the beginning,
EDIT 5 Trying encoding:
base64 encoding:
aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw==
mysql> UPDATE items SET image_url = 'aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw==' WHERE id=38;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select image_url,length(image_url) from items where id=38\G;
*************************** 1. row ***************************
image_url: aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw=
length(image_url): 84
1 row in set (0.00 sec)
decoding: aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw=
gives:
http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg
EDIT 6 Checking if Insert fails as well:
mysql> INSERT INTO items (url, image_url) VALUES('http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg', 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg');
Query OK, 1 row affected, 2 warnings (0.03 sec)
the warnings are because I did not give all the values where NULL:NO values in this insert
mysql> SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------+
| Warning | 1364 | Field 'created_at' doesn't have a default value |
| Warning | 1364 | Field 'updated_at' doesn't have a default value |
+---------+------+-------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select image_url,length(image_url),url from items where id=39\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
1 row in set (0.00 sec)
So, it also fails on insert.
EDIT 7 create table information
mysql> show create table items\G;
*************************** 1. row ***************************
Table: items
Create Table: CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
...
`image_url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`color` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
...
`store` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_items_on_id` (`id`),
KEY `index_items_on_url` (`url`)
) ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)
ERROR:
No query specified
EDIT 8 More table and column information
select * from information_schema.columns where table_name='items' and column_name='image_url'\G;
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: development_database
TABLE_NAME: items
COLUMN_NAME: image_url
ORDINAL_POSITION: 5
COLUMN_DEFAULT: NULL
IS_NULLABLE: NO
DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 255
CHARACTER_OCTET_LENGTH: 765
NUMERIC_PRECISION: NULL
NUMERIC_SCALE: NULL
CHARACTER_SET_NAME: utf8
COLLATION_NAME: utf8_unicode_ci
COLUMN_TYPE: varchar(255)
COLUMN_KEY:
EXTRA:
PRIVILEGES: select,insert,update,references
COLUMN_COMMENT:
2 rows in set (0.01 sec)
ERROR:
No query specified
EDIT 9 Charlength readouts
mysql> select image_url,length(image_url),char_length(image_url),url from items where id=39\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
char_length(image_url): 61
url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
1 row in set (0.00 sec)
ERROR:
No query specified
EDIT 10 showing variables like character
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
EDIT 11: THE POTENTIAL ISSUE
The error does not appear in the users table, but it does occur in the items table. Here is the difference that I think may be causing the issue. (I do not yet have a solution since the item table has that UTF-8 for a reason: urls can have some funky characters)
show create table users\G;
ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=latin1
show create table items\G;
ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

To be honest I think this should be a Community Answer, as I was a
little later on the scene and others had done some important ground work
establishing what was and was not a factor in this issue.
This link may be relevant, as your table character set is utf8 so the last character in the string may be getting skewed (and not saving correctly, thus disappearing).
All of the rows in EDIT 10 which reference latin1 or utf8 character set collations should be the same, and ideally should be utf8mb4 . I would now hazard a guess that the saving of UTF-8 characters in a non-true-utf-8 character collation is meaning the final character of any string is an incomplete reference and so not displaying.
So to solve your issue run the command:
ALTER TABLE items CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
For info / background:
utf8mb4 is the full and complete UTF-8 character set and so will show any and every character that can be used in a web address. If there are some obscure characters in the data I suggest you change the column to a BLOB column before then changing it to a utf8mb4 column, because this will preserve the correct character definitions as input rather than as assumed by MySQL on the data already entered.
You do not want utf8_ character sets, in MySQL that is as good as broken, what you want is utf8mb4, the standard UTF8 definition in MySQL is compromised because it saves 4-byte characters in 3-byte blocks and thus corrupts saved character data.

Related

How to change encoding on fly in SELECT statement?

I have a table with a column, which has cp1251_general_ci collation. I don't want to change column collation, but I want to get data in utf8 encoding.
Is there a way to select any data somehow in a way that it looks just like a data with utf8_general_ci collation?
I.e. I need something like this
SELECT CONVERT_TO_UTF8(weirdColumn) FROM weirdTable
Here's a demo table using the cp1251 encoding. I'll insert some Cyrillic characters into it.
mysql> CREATE TABLE weirdTable (weirdColumn text) ENGINE=InnoDB DEFAULT CHARSET=cp1251;
mysql> insert into weirdTable values ('ЂЃЉЌ');
mysql> select * from weirdTable;
+-------------+
| weirdColumn |
+-------------+
| ЂЃЉЌ |
+-------------+
Use MySQL's CONVERT() function to force the characters to a different encoding:
mysql> select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ |
+-----------------+
Here's proof that the result has been converted to utf8. I create a table using metadata from the query result:
mysql> create table w2
as select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
Query OK, 1 row affected (0.07 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> show create table w2\G
*************************** 1. row ***************************
Table: w2
Create Table: CREATE TABLE `w2` (
`weirdColumnUtf8` longtext CHARACTER SET utf8
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
mysql> select * from w2;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ |
+-----------------+
On my MySQL instance, utf8mb4 is the default character encoding. That's okay; it's a superset of utf8, and the utf8 encoding is enough to store these characters. However, I generally recommend if you use utf8, there's no reason not to use utf8mb4.
If you change the character encoding, you cannot keep the cp1251 collation. Collations are specific to encodings. But you can use one of the collations associated with utf8 or utf8mb4. You can see the available collations for a given character encoding:
mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+---------------+
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+--------------------------+---------+-----+---------+----------+---------+---------------+
...
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | PAD SPACE |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 | PAD SPACE |
...

AWS MariaDB Statement could not be executed

Seem to have a bit of a character encoding issue with MariaDB on AWS that I can't seem to resolve;
Statement could not be executed (22007 - 1366 - Incorrect string value: '\xA320 Of...'
Initially I presumed this was because the table was set to latin1 but I've since changed the table and the column to utf8mb4_unicode_ci and the error persists.
Since your column definition is utf8, you also need to insert utf8 data.
\xA320 is not a valid utf8 character:
mysql> select convert(X'A320' using utf8mb4);
+--------------------------------+
| convert(X'A320' using utf8mb4) |
+--------------------------------+
| ? |
+--------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+-------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------+
| Warning | 1300 | Invalid utf8mb4 character string: '\xA3 ' |
+---------+------+-------------------------------------------+
1 row in set (0.00 sec)

Inserting 4-byte unicode characters into MySQL/MariaDB

When attempting to insert 💩 (for example, which is a 4-byte unicode char), both MySQL (5.7) and MariaDB (10.2/10.3/10.4) give the same error:
Incorrect string value: '\xF0\x9F\x92\xA9'
The statement:
mysql> insert into bob (test) values ('💩');
Here's my database's charset/collation:
mysql> select ##collation_database; +----------------------+
| ##collation_database |
+----------------------+
| utf8mb4_unicode_ci |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT ##character_set_database; +--------------------------+
| ##character_set_database |
+--------------------------+
| utf8mb4 |
+--------------------------+
1 row in set (0.00 sec)
The server's character set:
mysql> show global variables like '%character_set_server%'\G; *************************** 1. row ***************************
Variable_name: character_set_server
Value: utf8mb4
The table:
create table bob ( `test` TEXT NOT NULL );
mysql> SHOW FULL COLUMNS FROM bob;
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| test | text | utf8mb4_unicode_ci | NO | | NULL | | select,insert,update,references | |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
Can anyone point me in the right direction?
Yes, as you commented, you need to use SET NAMES utf8mb4.
Your 4-byte character must pass from your client through the database connection and into a table. All of those must support utf8mb4. If any one of them does not support utf8mb4, then 4-byte characters will not be able to get through.
SET NAMES utf8mb4 makes the database session expect clients to send string using that encoding. The default for character_set_client on MySQL 5.7 is utf8, so you need to set it to utf8mb4.
In MySQL 8.0.1 and later, the default character_set_client is utf8mb4 already, so you won't need to change it.

MariaDB Character Encoding

I have just started to port an older MySQL/Spring/Eclipselink project to MariaDB. I am encountering an issue with table creation that can be demonstrated as follows:
MariaDB [spasm]> CREATE TABLE Configuration (ID BIGINT NOT NULL, Attribute VARCHAR(190) NOT NULL UNIQUE, Value VARCHAR(255) NOT NULL, PRIMARY KEY (ID));
Query OK, 0 rows affected (0.07 sec)
MariaDB [spasm]> drop table Configuration;
Query OK, 0 rows affected (0.06 sec)
MariaDB [spasm]> CREATE TABLE Configuration (ID BIGINT NOT NULL, Attribute VARCHAR(255) NOT NULL UNIQUE, Value VARCHAR(255) NOT NULL, PRIMARY KEY (ID));
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
MariaDB [spasm]>
MariaDB [spasm]> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
I understand that this is related to character encoding, however I don't know how to manage/correct it?
The new default CHARACTER SET is utf8mb4. It is complaining about the UNIQUE index:
Attribute VARCHAR(255) NOT NULL UNIQUE
If you are hitting the limit because of trying to use CHARACTER SET utf8mb4. Then do one of the following (each has a drawback) to avoid the error:
⚈ Upgrade to 5.7.7 (MariaDB 10.2.2?) for 3072 byte limit -- your cloud may not provide this;
⚈ Change 255 to 191 on the VARCHAR -- you lose any values longer than 191 characters (unlikely?);
⚈ ALTER .. CONVERT TO utf8 -- you lose Emoji and some of Chinese;
⚈ Use a "prefix" index -- you lose some of the performance benefits.
Or... Stay with 5.6/5.5/10.1 but perform 4 steps to raise the limit to 3072 bytes:
SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_per_table=1;
SET GLOBAL innodb_large_prefix=1;
logout & login (to get the global values);
ALTER TABLE tbl ROW_FORMAT=DYNAMIC; -- (or COMPRESSED)

MySQL UTF8 varchar column size

MySQL documentation says that since 5.0, varchar lengths refer to character units, not bytes. However, I recently came across an issue where I was getting truncated data warnings when inserting values that should have fit into the varchar column it was designated.
I replicated this issue with a simple table in v5.1
mysql> show create table test\G
*************************** 1. row ***************************
Table: test
Create Table: CREATE TABLE `test` (
`string` varchar(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
I then inserted multiple 10 characters values with differing amounts of UTF8 characters
mysql> insert into test (string) values
-> ('abcdefghij'),
-> ('ãáéíçãáéíç'),
-> ('ãáéíç67890'),
-> ('éíç4567890'),
-> ('íç34567890');
Query OK, 5 rows affected, 4 warnings (0.06 sec)
Records: 5 Duplicates: 0 Warnings: 4
mysql> show warnings;
+---------+------+---------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'string' at row 2 |
| Warning | 1265 | Data truncated for column 'string' at row 3 |
| Warning | 1265 | Data truncated for column 'string' at row 4 |
| Warning | 1265 | Data truncated for column 'string' at row 5 |
+---------+------+---------------------------------------------+
mysql> select * from test;
+------------+
| string |
+------------+
| abcdefghij |
| ãáéíç |
| ãáéíç |
| éíç4567 |
| íç345678 |
+------------+
5 rows in set (0.00 sec)
I think that this shows that the varchar size is still defined in bytes or at least, is not accurate in character units.
The question is, am I understanding the documentation correctly and is this a bug? Or am I misinterpreting the documentation?
It's true that VARCHAR and CHAR sizes are considered in characters, not bytes.
I was able to recreate your issue when I set my connection character set to latin1 (single byte).
Ensure that you set your connection character set to UTF8 prior to running the insertion query with the following command:
SET NAMES utf8
If you don't do this, a two-byte UTF8 character will get sent as two single-byte characters.
You might consider changing your default client character set.