UPDATE slow when using FULLTEXT-indexed column - mysql

I'm running a simple update query against my notifications table:
UPDATE `notifications`
SET is_unread = 0
WHERE MATCH(`grouping_string`) AGAINST("f89dc707afa38520224d887f897478a9")
The grouping_string column has a FULLTEXT index and the notifications table has 2M+ rows.
Now, the UPDATE above takes over 70 seconds to execute. However, if I run a SELECT using the same WHERE, the result is immediate.
What might be causing this and how can the UPDATE be optimized?
Environment: MySQL 5.6 (InnoDB) on Amazon Aurora engine
UPDATE: Using EXPLAIN on the query shows that the fulltext index is one of the possible ones to use, but it is not used during execution; instead, only the PRIMARY (id) index is used. The number of rows affected equals the number of rows in the table (2M+).
UPDATE 2: Result of SHOW VARIABLES LIKE 'query%':
+------------------------------+-----------+
| Variable_name | Value |
+------------------------------+-----------+
| query_alloc_block_size | 8192 |
| query_cache_limit | 1048576 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 444890112 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |
| query_prealloc_size | 8192 |
+------------------------------+-----------+
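A commonly suggested workaround for this pattern is to let the fast fulltext SELECT resolve the matching rows first and then update by primary key. A sketch, assuming id is the primary key referenced in the EXPLAIN note above:

UPDATE `notifications`
SET is_unread = 0
WHERE id IN (
    SELECT id FROM (
        SELECT id
        FROM `notifications`
        WHERE MATCH(`grouping_string`) AGAINST("f89dc707afa38520224d887f897478a9")
    ) AS matched -- the extra derived-table layer is needed because MySQL
                 -- will not otherwise let an UPDATE select from the table
                 -- it is updating (error 1093)
);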

Related

MySQL 8.0.23 tmp table keeps filling up

I have not had this issue with older versions of MySQL, including up to 8.0.21, which I run in production within AWS RDS. I have a query that gets run only once a year. Here is the relevant code:
create table medicare_fee_history (
year int unsigned,
mac int unsigned,
locality int unsigned,
hcpcs varchar(10),
modifier varchar(10),
index (mac, locality, hcpcs, modifier),
non_facility decimal(17, 4),
facility decimal(17, 4)
) engine = myisam;
load data local infile 'PFALL.csv'
into table medicare_fee_history
fields terminated by ',' enclosed by '"'
(year, mac, locality, hcpcs, modifier, non_facility, facility);
create table medicare_fee_first (
year int unsigned,
hcpcs varchar(10),
modifier varchar(10),
index (hcpcs, modifier),
facility decimal(17, 4),
non_facility decimal(17, 4)
) engine = myisam;
insert into medicare_fee_first (year, hcpcs, modifier, facility, non_facility)
(
select min(year), hcpcs, modifier, avg(facility), avg(non_facility)
from medicare_fee_history group by hcpcs, modifier
);
During the insert select I get the following error:
ERROR 1114 (HY000): The table '/tmp/#sql4984_9_3' is full
Table medicare_fee_history has 16042724 rows. To reproduce this, the dataset can be found at https://drive.google.com/file/d/1p7Yf7wsCnBXl7UaxeFC1AP0youl-KCdZ/view?usp=sharing
The query generally returns 10823 rows. If you eliminate avg(facility) and avg(non_facility) it seems to work. There is plenty of space in /tmp. 92% of 100G is free. I set tmp_table_size to max. Here are the current server settings:
mysql> show variables like '%tmp%';
+---------------------------------+----------------------+
| Variable_name | Value |
+---------------------------------+----------------------+
| default_tmp_storage_engine | InnoDB |
| innodb_tmpdir | |
| internal_tmp_mem_storage_engine | TempTable |
| slave_load_tmpdir | /tmp |
| tmp_table_size | 18446744073709551615 |
| tmpdir | /tmp |
+---------------------------------+----------------------+
6 rows in set (0.00 sec)
mysql> show variables like '%temp%';
+-----------------------------+-----------------------+
| Variable_name | Value |
+-----------------------------+-----------------------+
| avoid_temporal_upgrade | OFF |
| innodb_temp_data_file_path | ibtmp1:12M:autoextend |
| innodb_temp_tablespaces_dir | ./#innodb_temp/ |
| show_old_temporals | OFF |
| temptable_max_mmap | 1073741824 |
| temptable_max_ram | 1073741824 |
| temptable_use_mmap | ON |
+-----------------------------+-----------------------+
7 rows in set (0.00 sec)
Any ideas on how to work around this?
I think the relevant setting for you to adjust is temptable_max_mmap.
See: https://docs.amazonaws.cn/en_us/AmazonRDS/latest/AuroraUserGuide/ams3-temptable-behavior.html
Example 1
You know that your temporary tables grow to a cumulative size of 20
GiB. You want to set in-memory temporary tables to 2 GiB and to grow
to a maximum of 20 GiB on disk.
Set temptable_max_ram to 2,147,483,648 and temptable_max_mmap to
21,474,836,480. These values are in bytes.
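Applied to a live server, that example might look like the following sketch (the values are the ones from the doc excerpt; on RDS/Aurora these parameters are normally changed via the DB parameter group rather than SET GLOBAL):

SET GLOBAL temptable_max_ram = 2147483648;   -- 2 GiB kept in memory
SET GLOBAL temptable_max_mmap = 21474836480; -- then up to 20 GiB in memory-mapped temporary files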

Select from information_schema tables very slow

Querying * from information_schema.tables is very slow.
innodb_stats_on_metadata is OFF, and select table_name from tables is fast; it is only when selecting more fields that the query becomes very slow (12 minutes!).
mysql> select * from tables limit 1;
+---------------+--------------------+----------------+-------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------------+----------------+-------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def | information_schema | CHARACTER_SETS | SYSTEM VIEW | MEMORY | 10 | Fixed | NULL | 384 | 0 | 32869632 | 0 | 0 | NULL | 2016-12-19 23:55:46 | NULL | NULL | utf8_general_ci | NULL | max_rows=87381 | |
+---------------+--------------------+----------------+-------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
1 row in set (12 min 27.02 sec)
Additional information:
mysql> select count(*) from tables;
+----------+
| count(*) |
+----------+
| 194196 |
+----------+
1 row in set (0.57 sec)
mysql> show global variables like '%innodb%metada%';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_stats_on_metadata | OFF |
+--------------------------+-------+
1 row in set (0.00 sec)
Selecting more columns means the server has to do more work -- interrogating the storage engines for all of the tables in all of the schemas to obtain what you requested.
The tables in information_schema are not real tables. They are server internals, exposed via an SQL interface, in some cases allowing you to query information the server doesn't store and must calculate or gather because you asked. The server code knows what columns you ask for, and only gathers that information.
LIMIT 1 doesn't help, because information_schema doesn't handle LIMIT as you would expect -- the entire table is rendered in memory before the first row is returned and the rest are discarded.
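So the practical fix is to ask only for the columns, and ideally only the schemas, you actually need. A sketch, with 'mydb' standing in for one of your schemas:

SELECT TABLE_NAME, ENGINE, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb'; -- narrowing the column list and the schema keeps
                             -- the server from interrogating every storage
                             -- engine for every table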
Even in 5.7, the information about the tables is scattered in files on disk. Reading 200K files takes a lot of time.
That is one reason why 200K tables is not a good design. Other reasons have to do with caching -- there are practical limits on how much can usefully be cached.
You will see variations on timings of I_S queries because of caching.
Advice: Re-think your schema design.
MySQL 8.0 stores all that info in InnoDB data dictionary tables, so it will be enormously faster.

Mysql: Why are these queries being logged as 'not using indexes' when they appear to use indexes?

So, I'm trying to find joins that aren't properly using indexes, but the log is being filled with queries which, as far as I can tell, are using indexes.
I turn on slow_query_log and log_queries_not_using_indexes, and set long_query_time to 10 seconds.
The log starts flooding with lines like this...
Query_time: 0.320889 Lock_time: 0.000030 Rows_sent: 0 Rows_examined: 338336
SET timestamp=1422564398;
select * from fversions where author=155669 order by entryID desc limit 40;
The query time is below 10 seconds, and from this explain, it seems to be using the primary key as the index.
Why is this query being logged? I can't see the problem queries to add indexes to them. Too much noise.
Thanks in advance!
PS. The answer to this related question doesn't seem to apply, as I have a 'where': MySQL why logged as slow query/log-queries-not-using-indexes when have indexes?
mysql> explain select * from fversions where author=155669 order by entryID desc limit 40;
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | fversions | index | NULL          | PRIMARY | 8       | NULL |   40 | Using where |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> show variables like 'slow_query_log';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| slow_query_log | ON    |
+----------------+-------+
mysql> show variables like 'long_query_time';
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+
| long_query_time | 10.000000 |
+-----------------+-----------+
mysql> show variables like 'log_queries_not_using_indexes';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| log_queries_not_using_indexes | ON    |
+-------------------------------+-------+
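Note what the EXPLAIN above actually shows: possible_keys is NULL and type is index, i.e. the statement walks the entire PRIMARY index to satisfy the ORDER BY instead of looking rows up by a key, and a full index scan is exactly what log_queries_not_using_indexes flags. A sketch of an index that would turn the scan into a lookup (column names taken from the query; verify the types before applying):

ALTER TABLE fversions
    ADD INDEX author_entry (author, entryID); -- filter on author via the index,
                                              -- then read in entryID order, so the
                                              -- ORDER BY ... LIMIT 40 needs no scan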

MySQL slow query that uses the index and isn't slow when I run it through the profiler

In my slow query log I am seeing slow queries like
# Time: 121107 16:34:02
# User@Host: web_node[web_node] @ localhost [127.0.0.1]
# Thread_id: 34436186 Schema: test_db Last_errno: 0 Killed: 0
# Query_time: 1.413751 Lock_time: 0.000222 Rows_sent: 203 Rows_examined: 203 Rows_affected: 0 Rows_read: 203
# Bytes_sent: 7553 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
# InnoDB_trx_id: 9B04384
SET timestamp=1352334842;
SELECT id, email FROM test_data WHERE id IN (13089576,3002681,3117763,1622233,2941590,12305279,1732672,2446772,3189510,13084725,4943929,5855071,6572137,2266261,3003496,2024860,3336832,13758671,6477694,1796684,13001771,4690025,1071744,1017876,5175795,795988,1619821,2481819,2941090,4770802,13438250,3254708,2323402,526303,13219855,3313573,3190479,1733761,3300577,2941758,6474118,1733379,11523598,4205064,6521805,2492903,1860388,3337093,5205317,1213970,5442738,12194039,1214203,12970536,3076611,3126152,3677156,5305021,2751587,4954875,875480,2105172,5309382,12981920,5204330,13729768,3254503,5030441,2680750,590661,1338572,7272410,1860386,2567550,5434143,1918035,5329411,1683235,3254119,5175784,1855380,3336834,2102567,4749746,37269,3207031,6464336,2227907,2713471,3937600,2940442,2233821,5619141,5204711,5988803,5050821,10109926,5226877,5050275,1874115,13677832,5338699,2423773,6432937,6443660,1990611,6090667,6527411,6568731,3254846,3414049,2011907,5180984,12178711,8558260,3130655,5864745,2059318,3480233,2104948,2387703,1939395,5356002,2681209,1184622,1184456,10390165,510854,7983305,795991,2622393,4490187,9436477,5356051,2423464,5205318,1600499,13623229,3255205,12200483,6477706,3445661,5226284,1176639,13760962,2101681,6022818,12909371,1732457,2377496,7260091,12191702,2492899,2630691,13047691,1684470,9382108,2233737,13117701,1796698,2535914,4941741,4565958,1100410,2321180,13080467,813342,4563877,4689365,2104756,1102802,2714488,3188947,1599770,1558291,5592740,5233428,5204830,1574452,3188956,13693326,2102349,3704111,1748303,790889,9323280,4741494,2387900,5338213,3583795,2283942,3189482,3002296,4490123,3585020,962926,3481423,1600920,1682364,4693123,6487778,2677582,2377195);
When I run the slow query through the profiler using SQL_NO_CACHE, it says
203 rows in set (0.03 sec)
show profile for query 33;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000187 |
| checking permissions | 0.000012 |
| Opening tables | 0.000034 |
| System lock | 0.000016 |
| init | 0.000087 |
| optimizing | 0.000024 |
| statistics | 0.028694 |
| preparing | 0.000074 |
| executing | 0.000005 |
| Sending data | 0.001596 |
| end | 0.000009 |
| query end | 0.000008 |
| closing tables | 0.000014 |
| freeing items | 0.001600 |
| logging slow query | 0.000007 |
| cleaning up | 0.000011 |
+----------------------+----------+
when I run the query with explain it says
+----+-------------+------------------+-------+------------------+----------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+------------------+----------+---------+------+------+--------------------------+
| 1 | SIMPLE | test_data | range | PRIMARY,id_email | id_email | 4 | NULL | 203 | Using where; Using index |
+----+-------------+------------------+-------+------------------+----------+---------+------+------+--------------------------+
the create table looks like
CREATE TABLE `test_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(254) DEFAULT NULL,
`domain` varchar(254) DEFAULT NULL,
`age` smallint(6) DEFAULT NULL,
`gender` tinyint(1) DEFAULT NULL,
`location_id` int(11) unsigned DEFAULT NULL,
`created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`unistall_date` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`subscription_date` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`active` tinyint(1) DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`),
KEY `domain` (`domain`),
KEY `id_email` (`id`,`email`),
KEY `email_id` (`email`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13848530 DEFAULT CHARSET=utf8
There is another query that regularly selects the id and email for a list of email addresses, hence the (email, id) key; the email addresses need to be unique, hence the unique key. The table only has ~14M rows.
I thought maybe the indexes were getting too big for memory and swapping, but the box has 8 gigs of RAM.
SELECT table_schema "Data Base Name",
       SUM(data_length + index_length) / 1024 / 1024 "Data Base Size in MB",
       SUM(index_length) / 1024 / 1024 "Index Size in MB"
FROM information_schema.TABLES
GROUP BY table_schema;
+--------------------+----------------------+------------------+
| Data Base Name | Data Base Size in MB | Index Size in MB |
+--------------------+----------------------+------------------+
| metrics | 3192.50000000 | 1594.42187500 |
| data | 8096.48437500 | 5639.51562500 |
| raw_data | 6000.35937500 | 745.07812500 |
| information_schema | 0.00878906 | 0.00878906 |
| mysql | 0.04319191 | 0.04101563 |
| performance_schema | 0.00000000 | 0.00000000 |
+--------------------+----------------------+------------------+
Setting innodb_file_per_table=1 in the my.cnf file appears to have solved the issue.
This improved the execution time; my understanding is that having a single file per table means the disk head doesn't have to travel such large distances.
Questions
If the query can be evaluated using the indexes, why does setting innodb_file_per_table=1 improve the performance?
Why isn't the query slow when it is run through the profiler without using the cache?
Should my primary key be (id, email)?
Update
Originally there was no /etc/my.cnf file; I then created one with the following:
[mysqld]
server-id=1
max_connections=1500
key_buffer_size=50M
query_cache_limit=16M
query_cache_size=256M
thread_cache=16
table_open_cache=4096
sort_buffer_size=512K
join_buffer_size=8M
read_buffer_size=8M
skip_name_resolve=1
thread_cache_size=256
innodb_buffer_pool_size=6G
innodb_buffer_pool_instances=1
innodb_thread_concurrency=96
innodb_additional_mem_pool_size=32M
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=256M
innodb_flush_method=O_DIRECT
innodb_file_per_table=1
net_read_timeout=15
net_write_timeout=30
log-bin=mysql-bin
sync_binlog=0
datadir=/var/lib/mysql
You have too much data for your innodb_log_buffer.
What are the values of:
innodb_buffer_pool_size
innodb_log_file_size
All of InnoDB must run in memory. When you split up the files, it runs more efficiently because it swaps pages in and out of memory with fewer disk reads and writes; one larger file takes longer to scan for the data.
It's not swapping, because your innodb_buffer_pool_size is constraining how much data MySQL keeps in memory.
The only way to fix your problem is to get more memory and allocate a large enough innodb_buffer_pool_size for all of your InnoDB tables and indexes.
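To sanity-check that, compare the total InnoDB data-plus-index footprint against the configured buffer pool; a sketch:

SELECT SUM(data_length + index_length) / 1024 / 1024 / 1024 AS innodb_gb
FROM information_schema.TABLES
WHERE engine = 'InnoDB';                       -- total InnoDB data + index size, in GiB
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; -- compare against this value (in bytes)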

more records takes less time

This is almost driving me insane
I do the following query:
SELECT * FROM `photo_person` WHERE photo_person.photo_id IN (SELECT photo_id FROM photo_person WHERE `photo_person`.`person_id` ='1')
When I change the id, I get different processing times, even though it's the same query and tables.
By changing the person_id I get the following:
-- person_id=1 ( 3 total, Query took 0.4523 sec)
-- person_id=2 ( 99 total, Query took 0.1340 sec)
-- person_id=3 ( 470 total, Query took 0.0194 sec)
-- person_id=4 ( 1,869 total, Query took 0.0024 sec)
I do not understand how the query time can go down as the number of records/results goes up.
The table structure is very straightforward:
UPDATE: I have already disabled the MySQL query cache, so every time I run the query I get the same exact value (of course it varies at the millisecond level, but this can be neglected).
UPDATE: table is MyISAM
CREATE TABLE IF NOT EXISTS `photo_person` (
`entry_id` int(11) NOT NULL AUTO_INCREMENT,
`photo_id` int(11) NOT NULL DEFAULT '0',
`person_id` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`entry_id`),
UNIQUE KEY `PhotoID` (`photo_id`,`person_id`),
KEY `photo_id` (`photo_id`),
KEY `person_id` (`person_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=182072 ;
Here are the results of the profiling:
+----------+------------+-----------------------------+
| Query_ID | Duration |Query |
+----------+------------+-----------------------------+
| 1 | 0.45541200 | SELECT ...`person_id` ='1') |
| 2 | 0.44833700 | SELECT ...`person_id` ='2') |
| 3 | 0.45587800 | SELECT ...`person_id` ='3') |
| 4 | 0.45074900 | SELECT ...`person_id` ='4') |
+----------+------------+-----------------------------+
Now, since the numbers are the same, it must be the caching :(
So apparently the caching kicks in at a certain number of records or bytes:
mysql> SHOW VARIABLES LIKE "%cac%";
+------------------------------+------------+
| Variable_name | Value |
+------------------------------+------------+
| binlog_cache_size | 32768 |
| have_query_cache | YES |
| key_cache_age_threshold | 300 |
| key_cache_block_size | 1024 |
| key_cache_division_limit | 100 |
| max_binlog_cache_size | 4294963200 |
| query_cache_limit | 1024 |
| query_cache_min_res_unit | 4096 |
| query_cache_size | 1024 |
| query_cache_type | ON |
| query_cache_wlock_invalidate | OFF |
| table_definition_cache | 256 |
| table_open_cache | 64 |
| thread_cache_size | 8 |
+------------------------------+------------+
14 rows in set (0.00 sec)
How are you testing the query speeds? I suspect it's not an appropriate way. The more you query the table, the more likely MySQL is to do some aggressive pre-fetching on the table, meaning further queries on the table will be faster even though they require scanning more data. The reason is that MySQL will not have to load the pages from disk, since it has already pre-fetched them into memory.
As other people have stated, the query cache could also skew your test results, especially if the tests involved re-running the query several times in a row to get an "average" runtime.
Add SQL_NO_CACHE to your query to see whether it is the cache that is tricking you.
To see what is taking time, try using PROFILING like this:
mysql> SET profiling = 1;
mysql> Your select goes here;
mysql> SHOW PROFILES;
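Putting the two suggestions together, a concrete run against the query from the question might look like this sketch:

SET profiling = 1;
SELECT SQL_NO_CACHE * FROM photo_person WHERE person_id = '1';
SHOW PROFILES;
SHOW PROFILE FOR QUERY 1; -- stage-by-stage breakdown of where the time goes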
Also, try using the simpler query:
SELECT * FROM photo_person WHERE `photo_person`.`person_id` ='1'
I don't know whether MySQL optimises your query this way or not, but yours goes through a subquery, and subqueries are best avoided where possible.
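Strictly speaking, the original subquery form returns the rows of everyone appearing on photos that contain person 1, so the exact subquery-free equivalent is a self-join rather than the simple filter above; a sketch (the UNIQUE (photo_id, person_id) key in the table definition guarantees no duplicate matches):

SELECT pp.*
FROM photo_person AS pp
JOIN photo_person AS p1
  ON p1.photo_id = pp.photo_id -- restrict to photos person 1 appears on
WHERE p1.person_id = '1';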