I recently upgraded from MySQL 5.6.34 to 8.0.16 (on macOS 10.14.5) and I am noticing very strange behavior with the row counts returned by "SHOW TABLE STATUS" as well as the row counts in the "information_schema" tables. Consider this simple schema:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `test` (`id`, `name`) VALUES
(1, 'one'),
(2, 'two'),
(3, 'three'),
(4, 'four'),
(5, 'five');
When I then run the following query, I see the expected output:
SELECT * FROM test;
+----+-------+
| id | name |
+----+-------+
| 1 | one |
| 2 | two |
| 3 | three |
| 4 | four |
| 5 | five |
+----+-------+
Likewise, when I run the following query, I see the expected output:
SELECT COUNT(*) FROM test;
+----------+
| COUNT(*) |
+----------+
| 5 |
+----------+
However, when I run the following query:
SHOW TABLE STATUS \G
*************************** 1. row ***************************
Name: test
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 0
Avg_row_length: 0
Data_length: 0
Max_data_length: 281474976710655
Index_length: 1024
Data_free: 0
Auto_increment: 1
Create_time: 2019-05-30 13:56:46
Update_time: 2019-05-30 16:02:24
Check_time: NULL
Collation: utf8_unicode_ci
Checksum: NULL
Create_options:
Comment:
it appears that there are no rows (even though there are 5). Likewise, I see the same result when I run:
SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema = 'test';
+------------+------------+
| TABLE_NAME | TABLE_ROWS |
+------------+------------+
| test | 0 |
+------------+------------+
No rows? If I add or delete rows in the table, the counts do not change. Only after I run:
ANALYZE TABLE `test`
...do I see the row counts reported correctly. I am only seeing this on MySQL 8; everything worked as expected on MySQL 5. I am aware of the problems with accurate row counts on InnoDB tables, but these are all MyISAM tables, which should always report exact row counts. Any help is appreciated. Thanks.
The information schema tables underwent significant, incompatible changes in MySQL 8 with the introduction of the global data dictionary:
Previously, INFORMATION_SCHEMA queries for table statistics in the STATISTICS and TABLES tables retrieved statistics directly from storage engines. As of MySQL 8.0, cached table statistics are used by default.
The cache is controlled by the system variable information_schema_stats_expiry:
Some INFORMATION_SCHEMA tables contain columns that provide table statistics:
[...] TABLES.TABLE_ROWS [...]
Those columns represent dynamic table metadata; that is, information that changes as table contents change.
By default, MySQL retrieves cached values for those columns from the mysql.index_stats and mysql.table_stats dictionary tables when the columns are queried, which is more efficient than retrieving statistics directly from the storage engine. If cached statistics are not available or have expired, MySQL retrieves the latest statistics from the storage engine and caches them in the mysql.index_stats and mysql.table_stats dictionary tables. Subsequent queries retrieve the cached statistics until the cached statistics expire.
[...]
To update cached values at any time for a given table, use ANALYZE TABLE.
To always retrieve the latest statistics directly from the storage engine and bypass cached values, set information_schema_stats_expiry to 0.
This is consistent with the behaviour you are seeing.
You can set information_schema_stats_expiry globally to 0, or per session whenever you need accurate statistics.
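A minimal sketch of both options, using the table name from the question (SET PERSIST is MySQL 8.0 syntax and makes the setting survive a restart):
-- bypass the statistics cache for the current session only
SET SESSION information_schema_stats_expiry = 0;
-- or make it the server-wide default and persist it across restarts
SET PERSIST information_schema_stats_expiry = 0;
-- alternatively, refresh the cached statistics for a single table on demand
ANALYZE TABLE test;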
Related
I have a very simple query that is running extremely slowly despite being indexed.
My table is as follows:
mysql> show create table mytable
CREATE TABLE `mytable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`start_time` datetime DEFAULT NULL,
`status` varchar(64) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ix_status_user_id_start_time` (`status`,`user_id`,`start_time`),
### other columns and indices, not relevant
) ENGINE=InnoDB AUTO_INCREMENT=115884841 DEFAULT CHARSET=utf8
Then the following query takes more than 10 seconds to run:
select id from mytable USE INDEX (ix_status_user_id_start_time) where status = 'running';
There are about 7 million rows in the table, and approximately 200 rows have status 'running'.
I would expect this query to take less than a tenth of a second. It should find the first row in the index with status 'running', then scan the next 200 or so rows until it finds the first non-running row. It should not need to look outside the index.
When I explain the query I get a very strange result:
mysql> explain select id from mytable USE INDEX (ix_status_user_id_start_time) where status = 'running';
+----+-------------+---------+------------+------+------------------------------+------------------------------+---------+-------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+------------------------------+------------------------------+---------+-------+---------+----------+-------------+
| 1 | SIMPLE | mytable | NULL | ref | ix_status_user_id_start_time | ix_status_user_id_start_time | 195 | const | 2118793 | 100.00 | Using index |
+----+-------------+---------+------------+------+------------------------------+------------------------------+---------+-------+---------+----------+-------------+
It is estimating a scan of more than 2 million rows! Also, the cardinality of the status index does not seem correct. There are only about 5 or 6 different statuses, not 344.
Other info
There are somewhat frequent insertions and updates to this table. About 2 rows inserted per second, and 10 statuses updated per second. I don't know how much impact this has, but I would not expect it to be 30 seconds worth.
If I query by both status and user_id, sometimes it is fast (sub 0.1s) and sometimes it is slow (> 1s), depending on the user_id. This does not seem to depend on the size of the result set (some users with 20 rows are quick, others with 4 are slow)
Can anybody explain what is going on here and how it can be fixed?
I am using MySQL version 5.7.33.
As already mentioned in the comments, you are using many indexes on a big table, so the memory required for those indexes is very high.
You can increase the size of the InnoDB buffer pool (which caches both data and index pages) in my.cnf by setting innodb_buffer_pool_size to a higher value.
But it is probably more efficient to use fewer indexes and to avoid composite indexes unless they are absolutely needed.
My guess is that if you remove all the indexes and create only one on status, this query will run in under 1 second.
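A rough sketch of that suggestion, to be tested before applying it anywhere important (the index name ix_status and the buffer pool value are only illustrative):
-- replace the wide composite index with a narrow single-column one
ALTER TABLE mytable
  DROP INDEX ix_status_user_id_start_time,
  ADD INDEX ix_status (status);
-- optionally enlarge the buffer pool (resizable online in MySQL 5.7+; value is illustrative)
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;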
I need some help figuring out a performance issue. A database containing a single table with a growing number of METARs (aviation weather reports) is slowing down after about 8 million records are present, despite indexes being in use. Performance can be recovered by rebuilding the indexes, but that's really slow and takes the database offline, so I've resorted to simply dropping the table and recreating it (losing the last few weeks of data).
The behaviour is the same whether a query is run trying to retrieve an actual metar, or whether a simple select count(*) is executed.
The table creation syntax is as follows:
CREATE TABLE `metars` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`tstamp` timestamp NULL DEFAULT NULL,
`metar` varchar(255) DEFAULT NULL,
`icao` char(7) DEFAULT NULL,
`qnh` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `timestamp` (`tstamp`),
KEY `icao` (`icao`),
KEY `qnh` (`qnh`),
KEY `metar` (`metar`)
) ENGINE=InnoDB AUTO_INCREMENT=812803050 DEFAULT CHARSET=latin1;
Up to about 8 million records, a SELECT COUNT(*) returns in about 500 ms. Then it gradually increases; currently, again at 14 million records, the count takes between 3 and 30 seconds. I was surprised to see that, when explaining the count query, it uses the timestamp index, not the primary key. Using the primary key, this should be a matter of just a few milliseconds to return the number of records:
mysql> explain select count(*) from metars;
+----+-------------+--------+-------+---------------+-----------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-----------+---------+------+----------+-------------+
| 1 | SIMPLE | metars | index | NULL | timestamp | 5 | NULL | 14693048 | Using index |
+----+-------------+--------+-------+---------------+-----------+---------+------+----------+-------------+
1 row in set (0.00 sec)
Forcing it to use the primary index is even slower:
mysql> select count(*) from metars use index(PRIMARY);
+----------+
| count(*) |
+----------+
| 14572329 |
+----------+
1 row in set (37.87 sec)
Oddly, the typical use-case query, which fetches the weather report for an airport that is nearest to a specific point in time, continues to perform very well, despite being more complex than a simple count:
mysql> SELECT qnh, metar from metars WHERE icao like 'KLAX' ORDER BY ABS(TIMEDIFF(tstamp, STR_TO_DATE('2019-10-10 00:00:00', '%Y-%m-%d %H:%i:%s'))) LIMIT 0,1;
+------+-----------------------------------------------------------------------------------------+
| qnh | metar |
+------+-----------------------------------------------------------------------------------------+
| 2980 | KLAX 092353Z 25012KT 10SM FEW015 20/14 A2980 RMK AO2 SLP091 T02000139 10228 20200 56007 |
+------+-----------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
What am I doing wrong here?
InnoDB performs a plain COUNT(*) by traversing some index. It prefers the smallest index because that requires touching the fewest blocks.
The PRIMARY KEY is clustered with the data, so that index is actually the biggest.
What version are you using? The TIMESTAMP storage format changed at some point (around 5.6); perhaps that explains why tstamp is used instead of qnh.
If you are purging old data by using DELETE, see http://mysql.rjweb.org/doc.php/partitionmaint for a faster way.
I assume the data is static, that is, it is never UPDATEd? Consider building and maintaining a summary table, perhaps indexed by date. It could hold various counts for each day; a fetch from that table would then be much faster than hitting the raw data. More: http://mysql.rjweb.org/doc.php/summarytables
How many rows are there for KLAX? That query must fetch all of them in order to evaluate the TIMEDIFF expression before applying the LIMIT. With INDEX(icao, tstamp) you could find the nearest report before or after a given time even faster.
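A sketch of those two suggestions; the index name, the summary table, and its refresh statement are only illustrative, not a drop-in solution:
-- composite index so the nearest-report lookup can be resolved per airport
ALTER TABLE metars ADD INDEX idx_icao_tstamp (icao, tstamp);
-- a per-day summary table, kept up to date by a daily job
CREATE TABLE metar_daily_counts (
  day DATE NOT NULL PRIMARY KEY,
  row_count INT UNSIGNED NOT NULL
) ENGINE=InnoDB;
INSERT INTO metar_daily_counts (day, row_count)
SELECT DATE(tstamp), COUNT(*)
FROM metars
WHERE tstamp IS NOT NULL
GROUP BY DATE(tstamp)
ON DUPLICATE KEY UPDATE row_count = VALUES(row_count);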
This question already has an answer here:
Mysql inconsistent number of rows count(*) vs table.table_rows in information_schema
(1 answer)
Closed 6 years ago.
The figures reported by MySQL's COUNT(*) and by information_schema.TABLES are totally different.
mysql> SELECT * FROM information_schema.TABLES WHERE TABLE_NAME = 'my_table'\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: my_db
TABLE_NAME: my_table
TABLE_TYPE: BASE TABLE
ENGINE: InnoDB
VERSION: 10
ROW_FORMAT: Compact
TABLE_ROWS: 31016698
AVG_ROW_LENGTH: 399
DATA_LENGTH: 12378439680
MAX_DATA_LENGTH: 0
INDEX_LENGTH: 4863262720
DATA_FREE: 5242880
AUTO_INCREMENT: NULL
CREATE_TIME: 2016-06-14 18:54:24
UPDATE_TIME: NULL
CHECK_TIME: NULL
TABLE_COLLATION: utf8_general_ci
CHECKSUM: NULL
CREATE_OPTIONS:
TABLE_COMMENT:
1 row in set (0.00 sec)
mysql> select count(*) from my_table;
+----------+
| count(*) |
+----------+
| 46406095 |
+----------+
1 row in set (27.45 sec)
Note that there are 31,016,698 rows according to information_schema; COUNT(*), however, reports 46,406,095 rows...
Now, which one can be trusted? And why are these stats different?
I'm using MySQL server v5.6.30.
The count in that metadata, similar to the output of SHOW TABLE STATUS, cannot be trusted. It's often off by a factor of 100 or more, either over or under.
The reason for this is that the engine does not know how many rows are in the table until it actually counts them. Under heavy load you might have a lot of contention on the primary key index, which makes pinning down an exact value an expensive computation.
This approximation is computed based on the total data length divided by the average row length. It's rarely even close to what it should be unless your records are all about the same length and you haven't been deleting a lot of them.
The only value that can be truly trusted is COUNT(*), but that operation can take a long time to complete, so be warned.
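As a rough check of the approximation described above, using the figures from the question (this only illustrates where the estimate comes from; it does not give an exact count):
-- DATA_LENGTH / AVG_ROW_LENGTH = 12378439680 / 399, roughly 31.0 million,
-- which is close to the reported TABLE_ROWS of 31,016,698 but far from the real 46,406,095
SELECT DATA_LENGTH / AVG_ROW_LENGTH AS approx_rows
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'my_db' AND TABLE_NAME = 'my_table';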
I have two tables which are connected by a relationship table.
More details about the tables:
stores (Currently 140,000 rows)
id (index)
store_name
city_id (index)
...
categories (Currently 400 rows)
id (index)
cat_name
store_cat_relation
store_id
cat_id
Every store belongs to one or more categories.
In the store_cat_relation table, I have indexes on (store_id, cat_id) and (cat_id, store_id).
I need to find the total number of, say, supermarkets (cat_id = 1) in Paris (city_id = 1). I have a working query, but it takes too long when the database contains lots of stores in Paris or lots of supermarkets.
This is my query:
SELECT COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id
This query takes about 0.05 s. The database contains about 8000 supermarkets (stores with category 1) and about 8000 stores in Paris (city_id = 1); combined, there are 550 supermarkets in Paris at the moment.
I want to reduce the query time to below 0.01 s because the database is only getting bigger.
The result of EXPLAIN is this:
id: 1
select_type: SIMPLE
table: store_cat_relation
type: ref
possible_keys: cat_id_store_id, store_id_cat_id
key: cat_id_store_id
key_len: 4
ref: const
rows: 8043
Extra: Using index
***************************************
id: 1
select_type: SIMPLE
table: stores
type: eq_ref
possible_keys: PRIMARY, city_id
key: PRIMARY
key_len: 4
ref: store_cat_relation.store_id
rows: 1
Extra: Using index condition; Using where
Anyone an idea why this query takes so long?
EDIT: I also created an SQL Fiddle with 300 rows per table. With a low number of rows it's quite fast, but I need it to be fast with 100,000+ rows.
http://sqlfiddle.com/#!9/675a3/1
I have run some tests, and the best performance comes from using the query cache. You can enable it and use it ON DEMAND, so you decide which queries are inserted into the cache. If you want to keep using it, you must make the changes in /etc/my.cnf so that they are persistent. If you change the tables, you can also run some queries afterwards to warm up the cache again.
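A minimal my.cnf sketch for that; the option names assume a MySQL 5.x or MariaDB server that still ships the query cache, and the size is only illustrative:
[mysqld]
query_cache_type = 2      # 2 = DEMAND: only cache queries that request it with SQL_CACHE
query_cache_size = 16M    # illustrative size; 0 disables the cache entirely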
Here is a sample.
Table size
MariaDB [yourSchema]> select count(*) from stores;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1 min 23.50 sec)
MariaDB [yourSchema]> select count(*) from store_cat_relation;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (2.45 sec)
MariaDB [yourSchema]>
Verify the query cache is available
MariaDB [yourSchema]> SHOW VARIABLES LIKE 'have_query_cache';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| have_query_cache | YES |
+------------------+-------+
1 row in set (0.01 sec)
Set the cache size and switch to DEMAND
MariaDB [yourSchema]> SET GLOBAL query_cache_size = 1000000;
Query OK, 0 rows affected, 1 warning (0.00 sec)
MariaDB [yourSchema]> SET GLOBAL query_cache_type=DEMAND;
Query OK, 0 rows affected (0.00 sec)
Enable Profiling
MariaDB [yourSchema]> set profiling=on;
First, execute your query - it takes 0.68 sec
MariaDB [yourSchema]> SELECT SQL_CACHE COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id;
+----------+
| COUNT(*) |
+----------+
| 192 |
+----------+
1 row in set (0.68 sec)
Now get it from the cache
MariaDB [yourSchema]> SELECT SQL_CACHE COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id;
+----------+
| COUNT(*) |
+----------+
| 192 |
+----------+
1 row in set (0.00 sec)
See the profile, with durations in seconds (microsecond resolution)
MariaDB [yourSchema]> show profile;
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000039 |
| Waiting for query cache lock | 0.000008 |
| init | 0.000005 |
| checking query cache for query | 0.000056 |
| checking privileges on cached | 0.000026 |
| checking permissions | 0.000014 |
| checking permissions | 0.000025 |
| sending cached result to clien | 0.000027 |
| updating status | 0.000048 |
| cleaning up | 0.000025 |
+--------------------------------+----------+
10 rows in set (0.05 sec)
MariaDB [yourSchema]>
What you are looking at is an indexing scenario:
Using the optimizer, a DBMS tries to find the optimal path to the data. Depending on the data itself, this can lead to different access paths for the conditions supplied (WHERE/JOIN/GROUP BY, sometimes ORDER BY). The data distribution can be the difference between fast queries and very slow ones.
So at the moment you have two tables, stores and store_cat_relation. On stores you have two indexes:
id (primary)
city_id
You have a WHERE on city_id and a join on id. The internal execution in the DBMS engine is then as follows:
1) Read index city_id
2) Then read table (ok, primary key index) to find id
3) Join on ID
This can be optimized a bit more with a multi-column index:
CREATE INDEX idx_nn_1 ON stores(city_id, id);
This should result in:
1) Read index idx_nn_1
2) Join using this index idx_nn_1
You do have fairly lopsided data in your current example, with every row having city_id = 1. That kind of distribution in the real data can give you problems, since WHERE city_id = ... then amounts to saying "just select everything from table stores". The histogram information on that column can result in a different plan in such cases; however, if your data distribution is not so lopsided, it should work nicely.
On your second table store_cat_relation you might try an index like this:
CREATE INDEX idx_nn_2 ON store_cat_relation(store_id,cat_id);
That way you can see whether the DBMS decides it leads to a better data access path.
With every join you see, study the join and see if a multi column index can reduce the number of reads.
Do not index all your columns: too many indexes, or too many columns in an index, will lead to slower inserts and updates.
Also, some scenarios may require you to create indexes with the columns in different orders, leading to many indexes on a table (one on columns (1,2,3), the next on (1,3,2), etc.). That is not a happy scenario either; there, single-column indexes, or indexes limited to fewer columns with the remaining columns read from the table, may be preferred.
Indexing requires testing your most common scenarios, which can be a lot of fun, since you will see how a slow query that ran for seconds can suddenly finish within hundredths of a second or even faster.
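One way to test it here, assuming the indexes suggested above have been created (the query is the one from the question, rewritten with explicit JOIN syntax; note the question already has an index on (store_id, cat_id), so idx_nn_2 may be redundant):
EXPLAIN
SELECT COUNT(*)
FROM stores s
JOIN store_cat_relation r ON r.store_id = s.id
WHERE s.city_id = 1
  AND r.cat_id = 1;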
I've partitioned my table horizontally and I'd like to see how the rows are currently distributed. Searching the web didn't bring any relevant results.
Could anyone tell me if this is possible?
You can get the row count of each partition using information_schema.
Here are my sample tests.
mysql> SELECT PARTITION_ORDINAL_POSITION, TABLE_ROWS, PARTITION_METHOD
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'db_name' AND TABLE_NAME = 'tbl_name';
+----------------------------+------------+------------------+
| PARTITION_ORDINAL_POSITION | TABLE_ROWS | PARTITION_METHOD |
+----------------------------+------------+------------------+
| 1 | 2 | HASH |
| 2 | 3 | HASH |
+----------------------------+------------+------------------+
mysql> SHOW CREATE TABLE tbl_name\G
*************************** 1. row ***************************
Table: tbl_name
Create Table: CREATE TABLE `tbl_name` (
`a` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY HASH (a)
PARTITIONS 2 */
1 row in set (0.00 sec)
mysql> SELECT * FROM tbl_name;
+------+
| a |
+------+
| 2 |
| 4 |
| 1 |
| 3 |
| 5 |
+------+
5 rows in set (0.00 sec)
UPDATED
From MySQL Manual:
For partitioned InnoDB tables, the row count given in the TABLE_ROWS column is only an estimated value used in SQL optimization, and may not always be exact.
Thanks to @Constantine.
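Since the row counts above are only estimates for InnoDB, two hedged follow-ups (explicit partition selection in the FROM clause requires MySQL 5.6 or later; p0 is one of the default HASH partition names from the example):
-- refresh the statistics behind information_schema.PARTITIONS
ANALYZE TABLE tbl_name;
-- or count one partition exactly (slower, but precise)
SELECT COUNT(*) FROM tbl_name PARTITION (p0);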
Just to add to Jason's answer: as per the reference manual, you have the following ways to get information about existing partitions on your table -
Using Show Create Table - to view the partitioning clauses used in creating a partitioned table;
Syntax :
show create table table_name;
Sample Output :
CREATE TABLE `trb3` (
`id` int(11) default NULL,
`name` varchar(50) default NULL,
`purchased` date default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
PARTITION BY RANGE (YEAR(purchased)) (
PARTITION p0 VALUES LESS THAN (1990) ENGINE = MyISAM,
PARTITION p1 VALUES LESS THAN (1995) ENGINE = MyISAM,
PARTITION p2 VALUES LESS THAN (2000) ENGINE = MyISAM,
PARTITION p3 VALUES LESS THAN (2005) ENGINE = MyISAM
)
Using Show Table Status - to determine whether a table is partitioned;
Syntax:
show table status in db_name like table_name;
Sample Output: shows lots of information about the table, such as Name, Engine, Version, Data Length, etc. You get the value 'partitioned' for the 'Create_options' parameter in the output.
Querying the INFORMATION_SCHEMA.PARTITIONS table. (Refer to Jason's answer; you can optionally add SUBPARTITION_NAME, SUBPARTITION_ORDINAL_POSITION, SUBPARTITION_METHOD, PARTITION_EXPRESSION, etc. to the SELECT list to get more information. Refer to the MySQL Reference Manual.)
Using the statement EXPLAIN PARTITIONS SELECT - to see which partitions are used by a given SELECT.
Syntax: EXPLAIN PARTITIONS SELECT * FROM trb1
Sample Output:
id: 1
select_type: SIMPLE
table: trb1
partitions: p0,p1,p2,p3
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 10
Extra: Using filesort
Read more in the MySQL Reference Manual.