I'm using MariaDB ("10.1.20-MariaDB-1~trusty") with utf8mb4. I'm in the process of converting all tables to "row_format = dynamic" and the table collation "utf8mb4_unicode_ci". I've noticed that some rogue tables in my database still have "utf8mb4_general_ci" as their collation, like this one:
use database;
SHOW TABLE STATUS WHERE COLLATION != "utf8mb4_unicode_ci";
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+----------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+--------------------+---------+
| table | InnoDB | 10 | Dynamic | 5 | 3276 | 16384 | 0 | 32768 | 0 | NULL | 2016-12-21 21:12:18 | NULL | NULL | utf8mb4_general_ci | NULL | row_format=DYNAMIC |
Then of course I would run something like this:
ALTER TABLE table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This finishes without error, but checking the table status again afterwards still reads
Collation = utf8mb4_general_ci
for that table.
Dumping and importing that same database into my local 5.6.32-78.0 Percona Server and doing the same there will result in the table collation being converted to utf8mb4_unicode_ci as desired.
Does anyone have an idea what might be the cause for that?
Most likely there are no columns in the table to convert, so the operation is skipped. Try to run
ALTER TABLE table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci, FORCE;
or
ALTER TABLE table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci, ALGORITHM=COPY;
A bug report has been created based on this question:
https://jira.mariadb.org/browse/MDEV-11637
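If several tables are affected, the statements can be generated instead of typed by hand. This is a minimal sketch, assuming the table names have already been collected from the SHOW TABLE STATUS output. The helper name build_convert_statements is made up for illustration, and the generated statements still have to be run through a MariaDB client.

```python
def build_convert_statements(tables, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Return one ALTER TABLE ... CONVERT ..., FORCE statement per table name.

    Hypothetical helper: it only builds strings, it does not talk to the server.
    """
    statements = []
    for name in tables:
        # Backtick-quote the identifier; a literal backtick is escaped by doubling it.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation}, FORCE;"
        )
    return statements


for stmt in build_convert_statements(["table", "other_table"]):
    print(stmt)
```

The FORCE variant is used here because it is the first form suggested above; swapping in ALGORITHM=COPY would only change the trailing clause.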
In my.ini I've changed properties from latin1 to cp1251 (then restarted the server)
[mysql]
default-character-set=cp1251
............................
[mysqld]
default-character-set=cp1251
I create database
CREATE DATABASE library DEFAULT CHARSET=cp1251;
Make request to check out the encoding:
SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| cp1251 | cp1251_general_ci |
+--------------------------+----------------------+
show variables like "char%";
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | cp1251 |
| character_set_connection | cp1251 |
| character_set_database | cp1251 |
| character_set_filesystem | binary |
| character_set_results | cp1251 |
| character_set_server | cp1251 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MySQL\MySQL Server 5.1\share\charsets\ |
+--------------------------+---------------------------------------------------------+
Create a table
CREATE TABLE genres (g_id INT, g_name VARCHAR(150)) ENGINE=InnoDB DEFAULT CHARSET=cp1251;
As I try to insert cyrillic data, the Command Line window gets stuck:
mysql> INSERT INTO genres (g_id, g_name) VALUES (1, 'Поэзия');
'>
'>
'>
'>
Latin strings get inserted ok:
mysql> INSERT INTO genres (g_id, g_name) VALUES (1, 'Poetry');
Query OK, 1 row affected (0.06 sec)
Yesterday, after the whole day of trying and testing, I got it working well. Created some more tables and inserted some Cyrillic strings. But next morning and the whole day long I can't get it working again. The previously inserted data wouldn't display. After firing
set names utf8
the Cyrillic words appeared, but the numeric columns didn't display correctly. What have I missed?
It's not just one change.
Three of those settings (character_set_client, character_set_connection and character_set_results) specify the encoding used by the client; the other two that you changed do not.
The column definitions in the database tables need to have a character set that can handle Cyrillic. One way is to do this to each table:
ALTER TABLE t CONVERT TO CHARACTER SET cp1251;
Have you already stored Cyrillic in latin1 columns?
Check by doing SELECT HEX(col) .... You may need the 2-step Alter as discussed in http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
It would be best to switch to utf8mb4; that way you could handle all character sets throughout the world.
See also Trouble with UTF-8 characters; what I see is not what I stored
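To make the SELECT HEX(col) check easier to read, it helps to know which bytes to expect. The sketch below uses Python's codecs only to compute reference values for the word from the question; it does not query the server, and the variable names are made up for illustration.

```python
word = "Поэзия"

# Bytes a cp1251 column should hold: one byte per Cyrillic letter.
cp1251_hex = word.encode("cp1251").hex().upper()

# Bytes a utf8/utf8mb4 column should hold: two bytes per Cyrillic letter.
utf8_hex = word.encode("utf-8").hex().upper()

# What is left when the text was converted into a character set that has no
# Cyrillic letters (such as latin1): every letter becomes '?' (hex 3F).
lost_hex = word.encode("latin-1", errors="replace").hex().upper()

print(cp1251_hex)
print(utf8_hex)
print(lost_hex)
```

If HEX(col) shows only 3F bytes, the original text is gone and has to be re-inserted; if it shows one of the other two patterns, the data is intact and only the column or connection character set needs correcting.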
I have found a workaround. After starting cmd
C:\Users\nikol>chcp 866
Active code page: 866
Then after starting mysql
mysql> set names cp866;
Query OK, 0 rows affected (0.00 sec)
But when I select the data, there are multiple trailing spaces
mysql> SELECT * FROM genres;
+------+------------------+
| g_id | g_name |
+------+------------------+
| 1 | Поэзия |
| 2 | Программирование |
| 3 | Психология |
| 4 | Наука |
| 5 | Классика |
| 6 | Фантастика |
+------+------------------+
6 rows in set (0.00 sec)
I guess I'll have to TRIM
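The reason the workaround helps is that the Windows console and the server have to agree on what the bytes mean. A small illustration, assuming the console really sends code page 866 bytes as chcp reports; it uses only Python's built-in codecs and does not involve MySQL.

```python
word = "Поэзия"

# What the console sends when its active code page is 866.
console_bytes = word.encode("cp866")

# Server told "SET NAMES cp1251": the same bytes are read as other characters.
misread = console_bytes.decode("cp1251")

# Server told "SET NAMES cp866": the bytes decode back to the original word.
correct = console_bytes.decode("cp866")

print(console_bytes.hex().upper())
print(misread == word)
print(correct == word)
```

The same bytes mean different letters in the two code pages, so declaring the connection as cp866 lets the server convert correctly to the cp1251 columns.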
I have the following table on MySQL. I am using 5.6.32. The table contains about ~40 million records. I am only sharing columns which I feel are necessary to understand the issue.
Table Structure
create table `random` (
`id` bigint(20) not null auto_increment,
`some_id` bigint(20) not null,
`latitude` decimal(20,14) default null,
`longitude` decimal(20,14) default null,
`new_column` varchar(255) collate utf8_unicode_ci default null,
primary key (`id`)
) engine=innodb auto_increment=40878872 default charset=utf8 collate=utf8_unicode_ci;
So, I added a new column in this table called new_column varchar(255). But, when I do length(new_column), there are entries which have more than 255 characters.
The actual value being inserted:
random*GS02,355234054262743,GPS:356728;A;N33.614073;E77.063096;0;0;230118,STT:400;0,ADC:0���&�������������r�������r�������r������*GS02,39233054663793,GPS:173158;A;N33.614057;E77.0263201;0;0;210118,STT:200;0,ADC:0;24.7;1;29.9;2;4.2;3;0.0
On the MySQL master (say, machine #1), my application was able to insert this value into new_column without an issue. I have a MySQL slave (say, machine #2) using native MySQL replication, and it was also able to replicate this record easily. But I have another slave replicating from machine #2 that uses Tungsten Replicator. Whenever a string is longer than 255 characters, Tungsten throws the following error and replication breaks:
pendingError : Event application failed: seqno=2395306016 fragno=0 message=java.sql.SQLDataException: Data too long for column 'new_column' at row 1
pendingExceptionMessage: java.sql.SQLDataException: Data too long for column 'new_column' at row 1
EDIT:
Variables on Master and both Slaves are set to
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
Collation while creating the table is set to utf8_unicode_ci on all instances.
Why is it that MySQL native replication is allowing more characters to be written to the column? And why is it that Tungsten replicator is preventing it?
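One thing worth ruling out first: in MySQL, LENGTH() returns the length in bytes, while the VARCHAR(255) limit is counted in characters, and CHAR_LENGTH() is the function that counts characters. In a utf8 column the two differ as soon as a value contains non-ASCII characters. The sketch below illustrates the difference with a made-up sample string, not the value from the question, and does not by itself explain the difference between the two replication paths.

```python
# Hypothetical sample: 200 ASCII characters followed by 50 Cyrillic letters.
value = "A" * 200 + "я" * 50

char_count = len(value)                  # comparable to CHAR_LENGTH(col)
byte_count = len(value.encode("utf-8"))  # comparable to LENGTH(col) in a utf8 column

print(char_count)  # 250
print(byte_count)  # 300
```

A value like this fits in VARCHAR(255) even though its byte length is above 255, so comparing LENGTH(new_column) with CHAR_LENGTH(new_column) on the affected rows shows whether the limit was really exceeded.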
I'm using MariaDb server (Ver 15.1 Distrib 10.2.7-MariaDB).
When I execute
CREATE TABLE `my_table` (
`id` INT NOT NULL,
`name` NVARCHAR(64) NULL,
PRIMARY KEY (`id`)
);
Describe output:
MariaDB [db]> describe my_table;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | varchar(64) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
Why is there no error, and why is the "name" column's datatype varchar (not nvarchar)?
db schema details:
Default collation: utf8_general_ci
Default characterset: utf8
NVARCHAR is a synonym for VARCHAR in MySQL/MariaDB. But you need to add CHARACTER SET utf8mb4 to the column to be sure that you get full UTF-8 support.
The default you show for that database is 'utf8', which is only a subset of UTF-8 (at most 3 bytes per character). It cannot store emoji or some Chinese characters.
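The 3-byte limit can be checked outside the database. This is a minimal sketch using Python's UTF-8 encoder; the function name fits_in_utf8mb3 is made up for illustration.

```python
def fits_in_utf8mb3(text):
    """True if every character needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_in_utf8mb3("是"))           # common Chinese character: 3 bytes
print(fits_in_utf8mb3("\U0001F600"))   # emoji: 4 bytes
print(fits_in_utf8mb3("\U00020000"))   # CJK Extension B character: 4 bytes
```

Anything for which this returns False needs a utf8mb4 column.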
These queries both give the result I expect:
SELECT sex
FROM ponies
ORDER BY sex COLLATE latin1_swedish_ci ASC
SELECT sex
FROM ponies
ORDER BY CONVERT(sex USING utf8) COLLATE utf8_general_ci ASC
| f |
| f |
| m |
| m |
+---+
But this query gives a different result:
SELECT sex FROM ponies ORDER BY sex ASC
| m |
| m |
| f |
| f |
+---+
Here's the configuration:
SHOW VARIABLES LIKE 'collation\_%'
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
The table collation is latin1_swedish_ci.
MySQL server is 5.5.16.
Table Collations
Collation defaults are stored on a table-by-table basis. There is a server-set default, but that is applied to the table at the time it is created.
To find the collation for a specific table, run this query:
SHOW TABLE STATUS LIKE 'ponies'\G
You should see output like this:
*************************** 1. row ***************************
Name: ponies
Engine: MyISAM
Version: 10
Row_format: Fixed
Rows: 8
Avg_row_length: 20
Data_length: 160
Max_data_length: 5629499534213119
Index_length: 1024
Data_free: 0
Auto_increment: NULL
Create_time: 2012-02-27 10:16:25
Update_time: 2012-02-27 10:17:40
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
And you can see the Collation setting in that result.
Column collations
You can also override collation settings on particular columns within a table. A create table statement like this would create a latin1_swedish_ci table, with a utf8_polish_ci column:
CREATE TABLE ponies (
sex CHAR(1) COLLATE utf8_polish_ci
) CHARACTER SET latin1 COLLATE latin1_swedish_ci;
The best way to view the results of this is like this:
SHOW FULL COLUMNS FROM ponies;
Output:
+-------+---------+----------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+---------+----------------+------+-----+---------+-------+---------------------------------+---------+
| sex | char(1) | utf8_polish_ci | YES | | NULL | | select,insert,update,references | |
+-------+---------+----------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
The documentation says it uses a case insensitive character comparison by default. I don't see why you are not getting that result though.
The documentation also suggests using the binary qualifier for case sensitive comparison. I wonder if that would affect your result?:
SELECT sex FROM ponies ORDER BY BINARY sex ASC
This behaviour can be observed when sex is an ENUM, in which case it is sorted by its numerical position in the ENUM definition. Only when a collation is explicitly given is it sorted in alphabetical order.
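The two orderings can be reproduced outside MySQL. The sketch below assumes the column was declared as ENUM('m', 'f'); that definition order is a guess made for illustration, since the question does not show the table definition.

```python
# Assumed definition: sex ENUM('m', 'f'). The order is hypothetical.
enum_members = ["m", "f"]
rows = ["f", "m", "f", "m"]

# Plain ORDER BY on an ENUM sorts by the 1-based position in the definition.
by_index = sorted(rows, key=lambda v: enum_members.index(v) + 1)

# ORDER BY with an explicit collation sorts by the string value instead.
by_string = sorted(rows)

print(by_index)   # ['m', 'm', 'f', 'f']
print(by_string)  # ['f', 'f', 'm', 'm']
```

With that assumed definition, the first list matches the plain ORDER BY sex result and the second matches the queries that name a collation.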
I've recently dusted off an old Ruby on Rails project of mine. In the past, I've never had any problems getting all the tests to pass, but now there is one test that gives me the following error:
ActiveRecord::StatementInvalid: Mysql::Error: #HY000Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '=': SELECT * FROM cards WHERE (cards.l1_description = '是' AND cards.l2_word = '')
So I go to my test db and ask:
mysql> use flashcard_test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show full columns from cards;
+----------------+--------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+----------------+--------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| l2_word | varchar(255) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | |
| l1_description | text | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | |
| l1_id | int(11) | NULL | YES | | NULL | | select,insert,update,references | |
| l2_id | int(11) | NULL | YES | | NULL | | select,insert,update,references | |
+----------------+--------------+-------------------+------+-----+---------+----------------+---------------------------------+---------+
5 rows in set (0.01 sec)
And as you can see, the collation is latin1_swedish_ci, and presumably if it were "utf8_general_ci", my problems would be solved. Thankfully, my development database is already okay, so I go and
rake db:test:clone_structure
and back to MySql and check again in the test db
mysql> show full columns from cards;
+----------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+----------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| l2_word | varchar(255) | utf8_general_ci | YES | | NULL | | select,insert,update,references | |
| l1_description | text | utf8_general_ci | YES | | NULL | | select,insert,update,references | |
| l1_id | int(11) | NULL | YES | | NULL | | select,insert,update,references | |
| l2_id | int(11) | NULL | YES | | NULL | | select,insert,update,references | |
+----------------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
5 rows in set (0.00 sec)
Ah, so now everything is looking good, so once again I
rake test
But I get the same problem all over again, and when I check my test db, I find that the collation column has been reset to latin1_swedish_ci.
I do not understand very well how rake test works, but my working hypothesis is that it recreates the DB using schema.rb. Now, in one of my migrations, I've got
class CreateCards < ActiveRecord::Migration
  def self.up
    create_table :cards, :options => "DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci" do |t|
      t.column :english_word, :string
      t.column :chinese_description, :text
    end
  end
  def self.down
    drop_table :cards
  end
end
And this apparently has taken care of the collate problem there. (I've got another migration which renames english_word and chinese_description to l2_word and l1_description, respectively.) But this information has not made it into schema.rb. And somehow, apparently, MySql has decided to assume that I want latin1_swedish_ci.
So, to summarize, what I think I need to do is somehow edit something so that I'll be using the utf8_general_ci collation, and then my problems will go away (right?). But I cannot figure out how to make the code that gets run when you "rake test" do this. Can anybody help?
For what it's worth, both the test and development databases were created as
create database flashcard_test default character set utf8 default collate utf8_general_ci;
and
create database flashcard_development default character set utf8 default collate utf8_general_ci;
And my database.yml has
development:
  adapter: mysql
  database: flashcard_development
  username: root
  password:
  encoding: utf8
test:
  adapter: mysql
  database: flashcard_test
  username: root
  password:
  encoding: utf8
  collation: utf8_general_ci
http://nhw.pl/wp/2008/09/16/mysql-collate-setting-in-rails-application seems to suggest that this problem has something to do with the connection between RoR and MySql, but I haven't had any luck with the suggestions there.
Adding collation: utf8_general_ci to your database.yml file, as you have done, should do the trick. Try recreating the test database using "rake RAILS_ENV=test db:migrate:reset db:fixtures:load". Warning: this will clear all data you have there beyond the fixtures.
That worked for me. To verify the collation on the database, tables, and columns, you can execute the following:
-- Database Collations:
SELECT schema_name,default_character_set_name,default_collation_name
FROM information_schema.SCHEMATA
WHERE schema_name not IN ('mysql');
-- Table Collations:
SELECT T.table_schema, T.table_name, T.TABLE_COLLATION, CCSA.CHARACTER_SET_NAME
FROM information_schema.`TABLES` T,
information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
AND T.table_schema not IN ('mysql');
-- Column Collations:
SELECT table_schema, table_name, column_name, collation_name, character_set_name
FROM information_schema.`COLUMNS` C
WHERE C.table_schema not IN ('mysql')
ORDER BY 1,2,4;
Everything in your test database should now have the collation specified in database.yml.