I have not had this issue with older versions of MySQL, up to and including 8.0.21, which I run in production on AWS RDS. I have a query that is run only once a year. Here is the relevant code:
create table medicare_fee_history (
    year int unsigned,
    mac int unsigned,
    locality int unsigned,
    hcpcs varchar(10),
    modifier varchar(10),
    non_facility decimal(17, 4),
    facility decimal(17, 4),
    index (mac, locality, hcpcs, modifier)
) engine = myisam;
load data local infile 'PFALL.csv'
into table medicare_fee_history
fields terminated by ',' enclosed by '"'
(year, mac, locality, hcpcs, modifier, non_facility, facility);
create table medicare_fee_first (
    year int unsigned,
    hcpcs varchar(10),
    modifier varchar(10),
    facility decimal(17, 4),
    non_facility decimal(17, 4),
    index (hcpcs, modifier)
) engine = myisam;
insert into medicare_fee_first (year, hcpcs, modifier, facility, non_facility)
(
select min(year), hcpcs, modifier, avg(facility), avg(non_facility)
from medicare_fee_history group by hcpcs, modifier
);
During the INSERT ... SELECT I get the following error:
ERROR 1114 (HY000): The table '/tmp/#sql4984_9_3' is full
Table medicare_fee_history has 16042724 rows. To reproduce this, the dataset can be found at https://drive.google.com/file/d/1p7Yf7wsCnBXl7UaxeFC1AP0youl-KCdZ/view?usp=sharing
The query generally returns 10823 rows. If I eliminate avg(facility) and avg(non_facility), it seems to work. There is plenty of space in /tmp: 92% of 100G is free. I set tmp_table_size to its maximum. Here are the current server settings:
mysql> show variables like '%tmp%';
+---------------------------------+----------------------+
| Variable_name                   | Value                |
+---------------------------------+----------------------+
| default_tmp_storage_engine      | InnoDB               |
| innodb_tmpdir                   |                      |
| internal_tmp_mem_storage_engine | TempTable            |
| slave_load_tmpdir               | /tmp                 |
| tmp_table_size                  | 18446744073709551615 |
| tmpdir                          | /tmp                 |
+---------------------------------+----------------------+
6 rows in set (0.00 sec)
mysql> show variables like '%temp%';
+-----------------------------+-----------------------+
| Variable_name               | Value                 |
+-----------------------------+-----------------------+
| avoid_temporal_upgrade      | OFF                   |
| innodb_temp_data_file_path  | ibtmp1:12M:autoextend |
| innodb_temp_tablespaces_dir | ./#innodb_temp/       |
| show_old_temporals          | OFF                   |
| temptable_max_mmap          | 1073741824            |
| temptable_max_ram           | 1073741824            |
| temptable_use_mmap          | ON                    |
+-----------------------------+-----------------------+
7 rows in set (0.00 sec)
Any ideas on how to work around this?
I think the relevant setting for you to adjust is temptable_max_mmap.
See: https://docs.amazonaws.cn/en_us/AmazonRDS/latest/AuroraUserGuide/ams3-temptable-behavior.html
Example 1
You know that your temporary tables grow to a cumulative size of 20
GiB. You want to set in-memory temporary tables to 2 GiB and to grow
to a maximum of 20 GiB on disk.
Set temptable_max_ram to 2,147,483,648 and temptable_max_mmap to
21,474,836,480. These values are in bytes.
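On a self-managed server both variables are dynamic globals (temptable_max_mmap since MySQL 8.0.23), so the 2 GiB / 20 GiB split from the example could be applied as below; on RDS the same values would instead go into the DB parameter group. A sketch:
SET GLOBAL temptable_max_ram = 2147483648;   -- 2 GiB of in-memory TempTable space
SET GLOBAL temptable_max_mmap = 21474836480; -- allow up to 20 GiB of memory-mapped files in tmpdir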
In my slow query log I am seeing slow queries like
# Time: 121107 16:34:02
# User@Host: web_node[web_node] @ localhost [127.0.0.1]
# Thread_id: 34436186 Schema: test_db Last_errno: 0 Killed: 0
# Query_time: 1.413751 Lock_time: 0.000222 Rows_sent: 203 Rows_examined: 203 Rows_affected: 0 Rows_read: 203
# Bytes_sent: 7553 Tmp_tables: 0 Tmp_disk_tables: 0 Tmp_table_sizes: 0
# InnoDB_trx_id: 9B04384
SET timestamp=1352334842;
SELECT id, email FROM test_data WHERE id IN (13089576,3002681,3117763,1622233,2941590,12305279,1732672,2446772,3189510,13084725,4943929,5855071,6572137,2266261,3003496,2024860,3336832,13758671,6477694,1796684,13001771,4690025,1071744,1017876,5175795,795988,1619821,2481819,2941090,4770802,13438250,3254708,2323402,526303,13219855,3313573,3190479,1733761,3300577,2941758,6474118,1733379,11523598,4205064,6521805,2492903,1860388,3337093,5205317,1213970,5442738,12194039,1214203,12970536,3076611,3126152,3677156,5305021,2751587,4954875,875480,2105172,5309382,12981920,5204330,13729768,3254503,5030441,2680750,590661,1338572,7272410,1860386,2567550,5434143,1918035,5329411,1683235,3254119,5175784,1855380,3336834,2102567,4749746,37269,3207031,6464336,2227907,2713471,3937600,2940442,2233821,5619141,5204711,5988803,5050821,10109926,5226877,5050275,1874115,13677832,5338699,2423773,6432937,6443660,1990611,6090667,6527411,6568731,3254846,3414049,2011907,5180984,12178711,8558260,3130655,5864745,2059318,3480233,2104948,2387703,1939395,5356002,2681209,1184622,1184456,10390165,510854,7983305,795991,2622393,4490187,9436477,5356051,2423464,5205318,1600499,13623229,3255205,12200483,6477706,3445661,5226284,1176639,13760962,2101681,6022818,12909371,1732457,2377496,7260091,12191702,2492899,2630691,13047691,1684470,9382108,2233737,13117701,1796698,2535914,4941741,4565958,1100410,2321180,13080467,813342,4563877,4689365,2104756,1102802,2714488,3188947,1599770,1558291,5592740,5233428,5204830,1574452,3188956,13693326,2102349,3704111,1748303,790889,9323280,4741494,2387900,5338213,3583795,2283942,3189482,3002296,4490123,3585020,962926,3481423,1600920,1682364,4693123,6487778,2677582,2377195);
When I run the slow query through the profiler using SQL_NO_CACHE, it says
203 rows in set (0.03 sec)
show profile for query 33;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000187 |
| checking permissions | 0.000012 |
| Opening tables       | 0.000034 |
| System lock          | 0.000016 |
| init                 | 0.000087 |
| optimizing           | 0.000024 |
| statistics           | 0.028694 |
| preparing            | 0.000074 |
| executing            | 0.000005 |
| Sending data         | 0.001596 |
| end                  | 0.000009 |
| query end            | 0.000008 |
| closing tables       | 0.000014 |
| freeing items        | 0.001600 |
| logging slow query   | 0.000007 |
| cleaning up          | 0.000011 |
+----------------------+----------+
When I run the query with EXPLAIN, it says:
+----+-------------+------------------+-------+------------------+----------+---------+------+------+--------------------------+
| id | select_type | table            | type  | possible_keys    | key      | key_len | ref  | rows | Extra                    |
+----+-------------+------------------+-------+------------------+----------+---------+------+------+--------------------------+
|  1 | SIMPLE      | test_data        | range | PRIMARY,id_email | id_email | 4       | NULL |  203 | Using where; Using index |
+----+-------------+------------------+-------+------------------+----------+---------+------+------+--------------------------+
The CREATE TABLE looks like:
CREATE TABLE `test_data` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `email` varchar(254) DEFAULT NULL,
  `domain` varchar(254) DEFAULT NULL,
  `age` smallint(6) DEFAULT NULL,
  `gender` tinyint(1) DEFAULT NULL,
  `location_id` int(11) unsigned DEFAULT NULL,
  `created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `unistall_date` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `subscription_date` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `active` tinyint(1) DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `email` (`email`),
  KEY `domain` (`domain`),
  KEY `id_email` (`id`,`email`),
  KEY `email_id` (`email`,`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13848530 DEFAULT CHARSET=utf8
There is another query that runs regularly, selecting the id and email for a list of email addresses, hence the (email, id) key; the email addresses need to be unique, hence the unique key. The table only has ~14M rows.
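For illustration, a lookup of that shape (with hypothetical addresses) that the (email, id) key can satisfy as a covering index:
SELECT id, email
FROM test_data
WHERE email IN ('a@example.com', 'b@example.com');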
I thought maybe the indexes were getting too big for memory and swapping, but the box has 8 GB of RAM.
SELECT table_schema "Data Base Name",
       SUM(data_length + index_length) / 1024 / 1024 "Data Base Size in MB",
       SUM(index_length) / 1024 / 1024 "Index Size in MB"
FROM information_schema.TABLES
GROUP BY table_schema;
+--------------------+----------------------+------------------+
| Data Base Name     | Data Base Size in MB | Index Size in MB |
+--------------------+----------------------+------------------+
| metrics            |        3192.50000000 |    1594.42187500 |
| data               |        8096.48437500 |    5639.51562500 |
| raw_data           |        6000.35937500 |     745.07812500 |
| information_schema |           0.00878906 |       0.00878906 |
| mysql              |           0.04319191 |       0.04101563 |
| performance_schema |           0.00000000 |       0.00000000 |
+--------------------+----------------------+------------------+
Setting innodb_file_per_table=1 in the my.cnf file appears to have solved the issue.
This improved the execution time; my understanding is that having a single file per table means the disk head doesn't need to move such large distances.
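Note that innodb_file_per_table only applies to tables created after it is enabled; existing tables stay in the shared system tablespace until they are rebuilt, for example:
ALTER TABLE test_data ENGINE=InnoDB; -- rebuilds the table into its own .ibd file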
Questions
If the query can be evaluated using the indexes, why does setting innodb_file_per_table=1 improve the performance?
Why isn't the query slow when it is run through the profiler without using the cache?
Should my primary key be (id, email)?
Update
Originally there was no /etc/my.cnf file, so I created one with the following:
[mysqld]
server-id=1
max_connections=1500
key_buffer_size=50M
query_cache_limit=16M
query_cache_size=256M
thread_cache=16
table_open_cache=4096
sort_buffer_size=512K
join_buffer_size=8M
read_buffer_size=8M
skip_name_resolve=1
thread_cache_size=256
innodb_buffer_pool_size=6G
innodb_buffer_pool_instances=1
innodb_thread_concurrency=96
innodb_additional_mem_pool_size=32M
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=256M
innodb_flush_method=O_DIRECT
innodb_file_per_table=1
net_read_timeout=15
net_write_timeout=30
log-bin=mysql-bin
sync_binlog=0
datadir=/var/lib/mysql
You have too much data for your innodb_log_buffer.
What are the values of:
innodb_buffer_pool_size
innodb_log_file_size
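For example, they can be read at runtime with:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_log_file_size';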
All of InnoDB must fit in memory to run well. When you split up the files, it runs more efficiently because it swaps in and out of memory with fewer disk reads and writes; one larger file takes longer to scan for the data.
It's not swapping, because your innodb_buffer_pool_size is constraining the amount of data that MySQL loads into memory.
The only way to fix your problem is to get more memory and allocate enough innodb_buffer_pool_size for all of your InnoDB tables and indexes.
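To see how far off you are, compare the total InnoDB data and index size against the configured pool, for example:
SELECT SUM(data_length + index_length) / 1024 / 1024 / 1024 AS innodb_size_gb
FROM information_schema.TABLES
WHERE engine = 'InnoDB';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';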
This is almost driving me insane.
I run the following query:
SELECT * FROM `photo_person` WHERE photo_person.photo_id IN (SELECT photo_id FROM photo_person WHERE `photo_person`.`person_id` ='1')
When I change the id, I get a different processing time, although it's the same query and the same tables.
By changing the person_id I get the following:
-- person_id=1 ( 3 total, Query took 0.4523 sec)
-- person_id=2 ( 99 total, Query took 0.1340 sec)
-- person_id=3 ( 470 total, Query took 0.0194 sec)
-- person_id=4 ( 1,869 total, Query took 0.0024 sec)
I do not understand how the query time goes down as the number of matching records/results goes up.
The table structures are very straightforward.
UPDATE: I have already disabled the MySQL query cache, so every time I run the query I get the same exact value (of course it varies at the millisecond level, but this can be neglected).
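For reference, one way to disable it at runtime is to give the cache no memory, something like:
SET GLOBAL query_cache_size = 0; -- a zero-size cache stores no results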
UPDATE: the table is MyISAM
CREATE TABLE IF NOT EXISTS `photo_person` (
  `entry_id` int(11) NOT NULL AUTO_INCREMENT,
  `photo_id` int(11) NOT NULL DEFAULT '0',
  `person_id` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`entry_id`),
  UNIQUE KEY `PhotoID` (`photo_id`,`person_id`),
  KEY `photo_id` (`photo_id`),
  KEY `person_id` (`person_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=182072 ;
Here are the results of the profiling:
+----------+------------+-----------------------------+
| Query_ID | Duration   | Query                       |
+----------+------------+-----------------------------+
|        1 | 0.45541200 | SELECT ...`person_id` ='1') |
|        2 | 0.44833700 | SELECT ...`person_id` ='2') |
|        3 | 0.45587800 | SELECT ...`person_id` ='3') |
|        4 | 0.45074900 | SELECT ...`person_id` ='4') |
+----------+------------+-----------------------------+
Now, since the numbers are the same, it must be the caching :(
So apparently the caching kicks in after a certain number of records or bytes.
mysql> SHOW VARIABLES LIKE "%cac%";
+------------------------------+------------+
| Variable_name                | Value      |
+------------------------------+------------+
| binlog_cache_size            | 32768      |
| have_query_cache             | YES        |
| key_cache_age_threshold      | 300        |
| key_cache_block_size         | 1024       |
| key_cache_division_limit     | 100        |
| max_binlog_cache_size        | 4294963200 |
| query_cache_limit            | 1024       |
| query_cache_min_res_unit     | 4096       |
| query_cache_size             | 1024       |
| query_cache_type             | ON         |
| query_cache_wlock_invalidate | OFF        |
| table_definition_cache       | 256        |
| table_open_cache             | 64         |
| thread_cache_size            | 8          |
+------------------------------+------------+
14 rows in set (0.00 sec)
How are you testing the query speeds? I suspect it's not an appropriate way. The more you query the table, the more likely MySQL is to do some aggressive pre-fetching on it, meaning further queries on the table will be faster even though they require scanning more data. The reason is that MySQL will not have to load the pages from disk, since it has already pre-fetched them into memory.
As other people have stated, the query cache could also mess up your test results, especially if the test involved re-running the query several times in a row to get an "average" runtime.
Add SQL_NO_CACHE to your query to see whether it is the cache that is tricking you.
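Applied to your query, that would look like:
SELECT SQL_NO_CACHE * FROM `photo_person`
WHERE photo_person.photo_id IN
    (SELECT photo_id FROM photo_person WHERE `photo_person`.`person_id` = '1');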
To see what is taking the time, try using PROFILING like this:
mysql> SET profiling = 1;
mysql> Your select goes here;
mysql> SHOW PROFILES;
Also, try to use the simpler query:
SELECT * FROM photo_person WHERE `photo_person`.`person_id` ='1'
I don't know whether MySQL is optimising your query, but note that the two are not quite equivalent: your subquery version also returns the rows for the other people tagged on the photos that include person 1, while this one returns only person 1's own rows. Either way, avoid subqueries where possible.
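If you want the original semantics (all people on the photos that include person 1) without the subquery, a self-join is one option (a sketch, not from the original post):
SELECT p2.*
FROM photo_person p1
JOIN photo_person p2 ON p2.photo_id = p1.photo_id
WHERE p1.person_id = '1';
Because of the UNIQUE KEY on (photo_id, person_id), p1 matches at most one row per photo here, so the join introduces no duplicate rows.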