Why does `SET NAMES utf8` change the behaviour of `REPLACE(uuid(),...)` calls? - mysql

While investigating a failing migration I came across the following strange behaviour: running SET NAMES utf8 in my client session changes the behaviour of REPLACE(uuid(),'','') calls.
mysql> select replace(uuid(),'','') from mysql.user;
+--------------------------------------+
| replace(uuid(),'','') |
+--------------------------------------+
| 4b483d57-ecdc-11e8-844f-0242ac120002 |
| 4b483d57-ecdc-11e8-844f-0242ac120002 |
| 4b483d57-ecdc-11e8-844f-0242ac120002 |
+--------------------------------------+
3 rows in set (0.00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0.01 sec)
mysql> select replace(uuid(),'','') from mysql.user;
+--------------------------------------+
| replace(uuid(),'','') |
+--------------------------------------+
| 539c0b5c-ecdc-11e8-844f-0242ac120002 |
| 539c0b79-ecdc-11e8-844f-0242ac120002 |
| 539c0b7f-ecdc-11e8-844f-0242ac120002 |
+--------------------------------------+
3 rows in set (0.01 sec)
As you can see, the generated UUIDs are unique only after setting NAMES to utf8. I found out about SET NAMES utf8 by running the query through MySQL Workbench.
I would greatly appreciate it if someone here could explain how character sets (NAMES) influence the output of REPLACE(UUID(), ...) calls. Thanks in advance.
Update: adding a snippet to show that the problem 1) is not UUID() generating non-unique values and 2) is related to the utf8mb4 charset
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
mysql> select uuid, replace(uuid,'','') from ( select uuid() as uuid from mysql.user) tmp;
+--------------------------------------+--------------------------------------+
| uuid | replace(uuid,'','') |
+--------------------------------------+--------------------------------------+
| e41d2fe4-ed70-11e8-844f-0242ac120002 | e41d2dc2-ed70-11e8-844f-0242ac120002 |
| e41d3042-ed70-11e8-844f-0242ac120002 | e41d2dc2-ed70-11e8-844f-0242ac120002 |
| e41d309c-ed70-11e8-844f-0242ac120002 | e41d2dc2-ed70-11e8-844f-0242ac120002 |
+--------------------------------------+--------------------------------------+
3 rows in set (0.00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> select uuid, replace(uuid,'','') from ( select uuid() as uuid from mysql.user) tmp;
+--------------------------------------+--------------------------------------+
| uuid | replace(uuid,'','') |
+--------------------------------------+--------------------------------------+
| e9059092-ed70-11e8-844f-0242ac120002 | e9059117-ed70-11e8-844f-0242ac120002 |
| e90591a1-ed70-11e8-844f-0242ac120002 | e905923e-ed70-11e8-844f-0242ac120002 |
| e9059380-ed70-11e8-844f-0242ac120002 | e90593e1-ed70-11e8-844f-0242ac120002 |
+--------------------------------------+--------------------------------------+
3 rows in set (0.00 sec)
mysql> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)
mysql> select uuid, replace(uuid,'','') from ( select uuid() as uuid from mysql.user) tmp;
+--------------------------------------+--------------------------------------+
| uuid | replace(uuid,'','') |
+--------------------------------------+--------------------------------------+
| ef564f32-ed70-11e8-844f-0242ac120002 | ef564d0c-ed70-11e8-844f-0242ac120002 |
| ef564fa4-ed70-11e8-844f-0242ac120002 | ef564d0c-ed70-11e8-844f-0242ac120002 |
| ef565019-ed70-11e8-844f-0242ac120002 | ef564d0c-ed70-11e8-844f-0242ac120002 |
+--------------------------------------+--------------------------------------+
3 rows in set (0.00 sec)
Update 2: Adding EXPLAIN output below to trace the actual SQL, as suggested by #qbolec. It reveals that the use of CONVERT(... using ...) is the culprit for the non-unique UUIDs. I still do not understand exactly why, as this is not the behaviour I expect from the CONVERT() function.
mysql> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)
mysql> explain select replace(uuid(),'','') from mysql.user;\W
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | index | NULL | PRIMARY | 276 | NULL | 7 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select replace(convert(uuid() using utf8mb4),'','') AS `replace(uuid(),'','')` from `mysql`.`user`
Show warnings enabled.
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> explain select replace(uuid(),'','') from mysql.user;\W
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | index | NULL | PRIMARY | 276 | NULL | 7 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select replace(uuid(),'','') AS `replace(uuid(),'','')` from `mysql`.`user`
Show warnings enabled.

Related

Full-text index cannot find content with "," in it (MySQL 5.7.20, MyISAM)

I have created a full-text index (MySQL 5.7.20, MyISAM).
I set ft_stopword_file to '', then restarted the server and recreated the table.
But I still cannot search for a word with "," in it, as shown below:
mysql> show create table tmp;
+-------+-----------------------------+
| Table | Create Table |
+-------+------------------------------+
| tmp | CREATE TABLE `tmp` (
`book_name` char(32) NOT NULL,
FULLTEXT KEY `book_name` (`book_name`) /*!50100 WITH PARSER `ngram` */
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 |
+-------+---------------------------------+
1 row in set (0.00 sec)
mysql> select * from tmp;
+-----------------+
| book_name |
+-----------------+
| hi,there |
+-----------------+
1 rows in set (0.00 sec)
mysql> show variables like '%ngram%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| ngram_token_size | 2 |
+------------------+-------+
1 row in set (0.01 sec)
mysql> show variables like '%stopword%';
+---------------------------------+-------+
| Variable_name | Value |
+---------------------------------+-------+
| ft_stopword_file | |
| innodb_ft_enable_stopword | ON |
| innodb_ft_server_stopword_table | |
| innodb_ft_user_stopword_table | |
+---------------------------------+-------+
mysql> select book_name from tmp where match(book_name) against('"hi,there"' in boolean mode);
Empty set (0.00 sec)
Why?

MySQL query needs to be faster

Hi, is it possible to speed up the following query? I have tried using different indexes. Would I be right in thinking that if it consistently takes more than 0.20 seconds, the query cache is not being used?
Also, is forcing an index a good idea in this case?
FYI, all three columns used in the WHERE clause are indexed.
mysql> SELECT count( item_t0.PK ) FROM table item_t0 WHERE ( item_t0.PK <>87984555070 AND item_t0.p_internalurl ='replicated273784712' AND ( item_t0.p_datapk =8987719073822 OR ( item_t0.p_datapk IS NULL AND item_t0.PK =8987719073822))) AND (item_t0.TypePkString IN (N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N) );
+---------------------+
| count( item_t0.PK ) |
+---------------------+
| 0 |
+---------------------+
1 row in set (0.32 sec)
mysql> explain extended SELECT count( item_t0.PK ) FROM medias item_t0 WHERE ( item_t0.PK <>8798340055070 AND item_t0.p_internalurl ='replicated273654712' AND ( item_t0.p_datapk =8987719073822 OR ( item_t0.p_datapk IS NULL AND item_t0.PK =8987717719073822))) AND (item_t0.TypePkString IN (N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N) ); 7961003
+----+-------------+---------+------+--------------------------------------------------------+-----------------+---------+-------+-------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------+--------------------------------------------------------+-----------------+---------+-------+-------+----------+-------------+
| 1 | SIMPLE | item_t0 | ref | PRIMARY,dataPK_idx_30,internalurl_idx,typepkstring_idx | internalurl_idx | 768 | const | 74102 | 100.00 | Using where |
+----+-------------+---------+------+--------------------------------------------------------+-----------------+---------+-------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
I checked whether the query cache needs attention and saw the following values. Can anyone tell me whether these results show anything?
mysql> show global status like "%Qcache_hits%";
+---------------+--------+
| Variable_name | Value |
+---------------+--------+
| Qcache_hits | 352251 |
+---------------+--------+
mysql> show global status like "%com_select%";
+---------------+----------+
| Variable_name | Value |
+---------------+----------+
| Com_select | 33525493 |
+---------------+----------+
1 row in set (0.00 sec)
mysql> show global status like "%prunes%";
+----------------------+----------+
| Variable_name | Value |
+----------------------+----------+
| Qcache_lowmem_prunes | 15729370 |
+----------------------+----------+
1 row in set (0.00 sec)
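One rough way to read these counters: MySQL does not increment Com_select for statements answered from the query cache, so the share of SELECTs served from the cache is Qcache_hits / (Qcache_hits + Com_select). The Python below (not MySQL) just redoes that arithmetic with the values pasted above; the function name is made up for illustration.

```python
def query_cache_hit_ratio(qcache_hits: int, com_select: int) -> float:
    """Fraction of SELECTs answered from the query cache."""
    total_selects = qcache_hits + com_select
    return qcache_hits / total_selects if total_selects else 0.0


# Values copied from the SHOW GLOBAL STATUS output above.
qcache_hits = 352251
com_select = 33525493
lowmem_prunes = 15729370

ratio = query_cache_hit_ratio(qcache_hits, com_select)
print(f"hit ratio: {ratio:.2%}")
print(f"low-memory prunes per cache hit: {lowmem_prunes / qcache_hits:.1f}")
```

On these numbers the ratio comes out at roughly 1%, and there are far more low-memory prunes than hits, which would suggest the cache is doing little for this workload.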

How to solve Chinese character display in MySQL?

After installing MySQL, I created a table and added some records like this:
create table score (
student_id int,
subject varchar(10),
subject_score int
);
insert into score values (001,'数学',50),(002,'数学',60),(003,'数学',70);
and the result is
mysql> select * from score;
+------------+---------+---------------+
| student_id | subject | subject_score |
+------------+---------+---------------+
| 1 | ?? | 50 |
| 2 | ?? | 60 |
| 3 | ?? | 70 |
+------------+---------+---------------+
3 rows in set (0.00 sec)
and then I changed the character set:
mysql> alter table score character set utf8;
mysql> alter table score modify subject varchar(10) character set utf8;
mysql> set character_set_server=utf8;
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.5.47-linux2.6-x86_64/share/charsets/ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)
and my table structure is
mysql> show create table score\G;
*************************** 1. row ***************************
Table: score
Create Table: CREATE TABLE `score` (
`student_id` int(11) DEFAULT NULL,
`subject` varchar(10) DEFAULT NULL,
`subject_score` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
I think everything is fine, but when I query my table the Chinese characters still do not display correctly
mysql> select * from score;
+------------+---------+---------------+
| student_id | subject | subject_score |
+------------+---------+---------------+
| 1 | ?? | 50 |
| 2 | ?? | 60 |
| 3 | ?? | 70 |
+------------+---------+---------------+
3 rows in set (0.00 sec)
but when I insert the same records again, the new records are fine!
mysql> insert into score values (001,'数学',50),(002,'数学',60),(003,'数学',70);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from score;
+------------+---------+---------------+
| student_id | subject | subject_score |
+------------+---------+---------------+
| 1 | ?? | 50 |
| 2 | ?? | 60 |
| 3 | ?? | 70 |
| 1 | 数学 | 50 |
| 2 | 数学 | 60 |
| 3 | 数学 | 70 |
+------------+---------+---------------+
How can I save my old records? I need some help. Thank you very much.
Use UTF-8 when you create the table.
create table table_name (...) CHARACTER SET = utf8;
Use UTF-8 when you insert into the table.
SET NAMES utf8; INSERT INTO table_name VALUES (ABC,VAL);
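A possible reason the old rows stay as ??, sketched in Python rather than MySQL: if the character set in effect at insert time could not represent the Chinese characters, each one was replaced by a literal ? before being stored, so changing the table's character set afterwards has nothing left to convert. latin1 below is an assumption; the question does not show the original character set.

```python
text = "数学"

# Assumed original charset: latin1 cannot represent these characters,
# so a lossy conversion substitutes one '?' per character.
stored = text.encode("latin-1", errors="replace")
print(stored)

# Whatever charset is applied later, the stored bytes are plain question marks.
recovered = stored.decode("utf-8")
print(recovered)

# With UTF-8 end to end, the round trip is lossless.
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip == text)
```

If that is what happened here, the old rows cannot be converted back and would have to be re-inserted from the original data.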

MySQL can select a literal emoji but won't store emoji in a table

I am trying to insert some emoji characters into a table in MySQL, but values are stored as question marks (????).
I made sure to create the database with the proper utf8mb4 encoding:
mysql> describe users;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(191) | YES | | NULL | |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
Then I checked whether MySQL understands emoji at all, so I did this:
mysql> select '🌰';
+------+
| 🌰 |
+------+
| 🌰 |
+------+
1 row in set (0.00 sec)
Then I did this:
mysql> insert into users (name) values ('🌰');
Query OK, 1 row affected, 1 warning (0.05 sec)
mysql> select * from users;
+----+------------+
| id | name |
+----+------------+
| 21 | فاضل |
| 30 | سلاحف |
| 46 | ???? |
| 47 | ???? |
| 48 | ???? |
| 49 | ???? |
+----+------------+
6 rows in set (0.01 sec)
I don't know what to do to fix that.
** EDIT ** : as requested in the comments, I ran the following command:
mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+-------------------------+
| Variable_name | Value |
+--------------------------+-------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /static/share/charsets/ |
+--------------------------+-------------------------+
8 rows in set (0.00 sec)
Your connection is set up for utf8; it needs to be set up for utf8mb4.
How did you set it? Change to whichever of these applies.
SET NAMES utf8mb4
PDO(... charset=utf8mb4)
mysqli::set_charset('utf8mb4')
etc
Emoji are 4-byte utf8 codes, hence the four question marks.
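To see the byte counts behind that last point, here is a quick Python check (not MySQL) of how many bytes UTF-8 needs per character for two of the names in the question. MySQL's utf8 character set stores at most 3 bytes per character, which is why the emoji needs utf8mb4; the helper name is made up for illustration.

```python
UTF8MB3_LIMIT = 3  # MySQL "utf8" (utf8mb3) stores at most 3 bytes per character


def max_bytes_per_char(s: str) -> int:
    """Widest UTF-8 encoding, in bytes, of any single character in s."""
    return max(len(ch.encode("utf-8")) for ch in s)


for name in ["فاضل", "🌰"]:
    widest = max_bytes_per_char(name)
    verdict = "fits in utf8" if widest <= UTF8MB3_LIMIT else "needs utf8mb4"
    print(repr(name), widest, verdict)
```

The Arabic name needs 2 bytes per character and survives a utf8 connection, while the emoji needs 4.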

Why Are These Tables the Same Size?

I was trying to measure the difference between TINYINT and INT when I came across something interesting. For tables with small numbers of columns, the choice of data type does not seem to affect the size of the table.
Server version: 5.1.41-3ubuntu12.10 (Ubuntu)
Example:
mysql> describe tinyint_test;
+----------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| test_int | tinyint(4) | YES | | NULL | |
+----------+------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
mysql> describe tinyint_id_test;
+-------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id | tinyint(4) | YES | | NULL | |
+-------+------------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql> describe int_test;
+--------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| not_id | int(11) | YES | | NULL | |
+--------+---------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql> select * from tinyint_test;
+------+----------+
| id | test_int |
+------+----------+
| 1 | 1 |
| 2 | 2 |
| 3 | 127 |
| 10 | 50 |
+------+----------+
4 rows in set (0.00 sec)
mysql> select * from tinyint_id_test;
+------+
| id |
+------+
| 1 |
| 2 |
| 127 |
| 50 |
+------+
4 rows in set (0.00 sec)
mysql> select * from int_test;
+--------+
| not_id |
+--------+
| 1 |
| 2 |
| 127 |
| 50 |
+--------+
4 rows in set (0.00 sec)
mysql> SELECT TABLE_NAME, DATA_LENGTH FROM INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA like '%test%';
+-----------------+-------------+
| TABLE_NAME | DATA_LENGTH |
+-----------------+-------------+
| int_test | 28 |
| tinyint_id_test | 28 |
| tinyint_test | 28 |
+-----------------+-------------+
3 rows in set (0.00 sec)
I vaguely suspect that there might be an internal column in each row, or that the minimum data size for a given row must be at least the size of a full INT, but neither of these suspicions really accounts for what's happening here. It could also be that DATA_LENGTH is the wrong tool for measuring the true size of the tables, in which case an acceptable answer would point me in the right direction for actually measuring these tables.
EDIT:
I can generate a table of a different size by using two INTs:
mysql> describe int_id_test;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| test_int | int(11) | YES | | NULL | |
+----------+---------+------+-----+---------+-------+
2 rows in set (0.01 sec)
mysql> select * from int_id_test;
+------+----------+
| id | test_int |
+------+----------+
| 1 | 1 |
| 2 | 2 |
| 3 | 127 |
| 10 | 50 |
+------+----------+
4 rows in set (0.00 sec)
mysql> SELECT TABLE_NAME, DATA_LENGTH FROM INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA like '%test%';
+-----------------+-------------+
| TABLE_NAME | DATA_LENGTH |
+-----------------+-------------+
| int_id_test | 36 |
| int_test | 28 |
| tinyint_id_test | 28 |
| tinyint_test | 28 |
+-----------------+-------------+
4 rows in set (0.01 sec)
The data_length column shows how much hard drive space the operating system allocates for a table.
MySQL stores data in pages whose size is configurable (the default is 16KB). The three tables' data may use the same number of pages, so their data_length values are the same.
Edit:
The InnoDB engine's default page size is 16KB; I don't know the size for other engines.
I have found a workaround for this problem as well as something of an explanation.
After looking at the table structure in a hex editor (on my linux machines these were located in /var/lib/mysql/[DATABASE NAME]/[TABLE NAME].MYD), I found that in all cases the records were created using a minimum of 7 bytes for a row, regardless of the actual data types involved. Any extra bytes that were not used by the table were zeroed out.
Here is an example with a smaller data set to illustrate:
mysql> describe int_test_2;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql> select * from int_test_2;
+------+
| id |
+------+
| 1 |
| 2 |
+------+
2 rows in set (0.00 sec)
Looking at this guy in a hex editor, we see:
fd01 0000 0000 00fd 0200 0000 0000
Using information from Neo's link, I was able to decode this row:
fd Record header bits.
01000000 Integer value "1" (little endian)
0000 Wasted Space!
fd Record header bits.
02000000 Integer value "2" (little endian)
0000 Wasted Space!
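The same decoding can be done mechanically. The following Python sketch assumes the layout described above (a 1-byte header, a 4-byte little-endian INT, then 2 bytes of padding, 7 bytes per row); it is only meant for this one-column table, not as a general .MYD parser.

```python
import struct

# Hex dump of the .MYD file as shown above.
dump = bytes.fromhex("fd01 0000 0000 00fd 0200 0000 0000")

# Assumed row layout: header byte, little-endian int32, 2 padding bytes.
ROW = struct.Struct("<Bi2s")

rows = [ROW.unpack_from(dump, offset) for offset in range(0, len(dump), ROW.size)]
for header, value, padding in rows:
    print(f"header=0x{header:02x} value={value} padding={padding.hex()}")
```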
However, notice the following:
mysql> alter table int_test_2 MAX_ROWS=50000000, AVG_ROW_LENGTH=4;
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
Now, the MYD file looks like this:
fd01 0000 00fd 0200 0000
That is, it uses the correct sizes.
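The 7-byte minimum also seems to account for the DATA_LENGTH values in the question. As a back-of-the-envelope check in Python, assuming one header byte per row, 4 bytes per INT, 1 byte per TINYINT and four rows per table (the helper name and the dictionary are made up for illustration):

```python
MIN_ROW_BYTES = 7                     # minimum row size observed in the .MYD files
COLUMN_BYTES = {"int": 4, "tinyint": 1}


def estimated_data_length(columns, n_rows, min_row=MIN_ROW_BYTES):
    """Header byte plus column storage, rounded up to the minimum row size."""
    row_bytes = 1 + sum(COLUMN_BYTES[c] for c in columns)
    return max(row_bytes, min_row) * n_rows


tables = {
    "int_test": ["int"],
    "tinyint_id_test": ["tinyint"],
    "tinyint_test": ["int", "tinyint"],
    "int_id_test": ["int", "int"],
}
estimates = {name: estimated_data_length(cols, 4) for name, cols in tables.items()}
for name, size in estimates.items():
    print(name, size)
```

Under those assumptions only int_id_test (9 bytes per row) exceeds the minimum, which matches it being the only table reported at 36 rather than 28.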
One thing to note is that the number in brackets does not affect the storage size of the column, i.e. an INT(4) takes the same space as an INT(11). The number is only a display width: with ZEROFILL the returned value is left-padded with zeros to that many digits, and otherwise it is just a hint to client applications.
I suspect that if you truly want to work out the size of the tables, you will need to look at the MySQL data files themselves and see how they are stored. All the data is stored in /var/lib/mysql/ - ibdata & ib_logfile are the main files. Open them in a text editor (Caution - these files may be HUGE depending on the sizes of your databases. Also, DO NOT modify them!)
All the tables and cells are stored in here, but they are not delimited, so it's very difficult to see where one column ends and the next begins - it is all based on the data size which you are trying to establish. If you know the data in the table you should be able to work out the structure.
Edit: I think some of the data in these files may be stored in hex, so if it doesn't immediately make sense, try a hex editor.