MySQL information_schema reports fewer rows than count() [duplicate]

Figures reported by MySQL count(*) and on information_schema.TABLES are totally different.
mysql> SELECT * FROM information_schema.TABLES WHERE TABLE_NAME = 'my_table'\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: my_db
TABLE_NAME: my_table
TABLE_TYPE: BASE TABLE
ENGINE: InnoDB
VERSION: 10
ROW_FORMAT: Compact
TABLE_ROWS: 31016698
AVG_ROW_LENGTH: 399
DATA_LENGTH: 12378439680
MAX_DATA_LENGTH: 0
INDEX_LENGTH: 4863262720
DATA_FREE: 5242880
AUTO_INCREMENT: NULL
CREATE_TIME: 2016-06-14 18:54:24
UPDATE_TIME: NULL
CHECK_TIME: NULL
TABLE_COLLATION: utf8_general_ci
CHECKSUM: NULL
CREATE_OPTIONS:
TABLE_COMMENT:
1 row in set (0.00 sec)
mysql> select count(*) from my_table;
+----------+
| count(*) |
+----------+
| 46406095 |
+----------+
1 row in set (27.45 sec)
Note that there are 31,016,698 rows according to information_schema, whereas count() reports 46,406,095 rows.
Which one can be trusted, and why are these figures different?
I'm using MySQL server v5.6.30.

The row count in that metadata, like the one in the output of SHOW TABLE STATUS, cannot be trusted. For InnoDB the manual describes it as an approximation that may vary from the actual value by as much as 40 to 50 percent, and it can be off in either direction.
The reason is that the engine does not know how many rows are in the table until it actually counts them. Under heavy load there can be a lot of contention on the primary key index, which makes pinning down an exact value an expensive computation.
The approximation is based on the total data length divided by the average row length. It is rarely close to the true value unless your records are all about the same length and you haven't been deleting a lot of them.
The only value that can be truly trusted is COUNT(*), but that operation can take a long time to complete, so be warned.
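As a rough sanity check, the Python sketch below redoes the arithmetic on the figures from the question. It uses only the numbers printed in the output above; the variable names are made up for illustration, and it assumes AVG_ROW_LENGTH is reported as a whole number of bytes.

```python
# Figures copied from the information_schema.TABLES output and the
# COUNT(*) result shown in the question.
table_rows = 31_016_698        # TABLE_ROWS (the estimate)
avg_row_length = 399           # AVG_ROW_LENGTH in bytes
data_length = 12_378_439_680   # DATA_LENGTH in bytes
exact_count = 46_406_095       # SELECT COUNT(*)

# The three statistics are consistent with each other: bytes of data
# divided by the estimated row count gives the reported average length.
derived_avg = data_length // table_rows

# How far the estimate falls short, as a fraction of the exact count.
relative_error = (exact_count - table_rows) / exact_count

print(f"derived average row length: {derived_avg} bytes")
print(f"estimate is short by {relative_error:.1%} of the exact count")
```

The point of the check is that the statistics agree with each other but not with the table: they all hang off the same estimate, so none of them can confirm the others.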

Related

mysql 8 MyISAM "SHOW TABLE STATUS" row count

I recently upgraded from MySQL 5.6.34 to 8.0.16 (on macOS 10.14.5) and I am noticing very strange behavior in the row counts returned by "SHOW TABLE STATUS" as well as the row counts in "information_schema". Consider this simple schema:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `test` (`id`, `name`) VALUES
(1, 'one'),
(2, 'two'),
(3, 'three'),
(4, 'four'),
(5, 'five');
When I then run the following query, I see the expected output:
SELECT * FROM test;
+----+-------+
| id | name |
+----+-------+
| 1 | one |
| 2 | two |
| 3 | three |
| 4 | four |
| 5 | five |
+----+-------+
Likewise, when I run the following query, I see the expected output:
SELECT COUNT(*) FROM test;
+----------+
| COUNT(*) |
+----------+
| 5 |
+----------+
However, when I then run the following query:
SHOW TABLE STATUS \G
*************************** 1. row ***************************
Name: test
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 0
Avg_row_length: 0
Data_length: 0
Max_data_length: 281474976710655
Index_length: 1024
Data_free: 0
Auto_increment: 1
Create_time: 2019-05-30 13:56:46
Update_time: 2019-05-30 16:02:24
Check_time: NULL
Collation: utf8_unicode_ci
Checksum: NULL
Create_options:
Comment:
it appears that there are no rows (even though there are 5). I see the same result when I run:
SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema = 'test';
+------------+------------+
| TABLE_NAME | TABLE_ROWS |
+------------+------------+
| test | 0 |
+------------+------------+
No rows? If I add rows to the table or delete rows from it, the counts do not change. Only after I run:
ANALYZE TABLE `test`
...do I see all of the row counts as correct. I am only seeing this on MySQL 8; everything worked as expected on MySQL 5. I am aware of the problems with accurate row counts for InnoDB tables, but these are all MyISAM tables, which should always show the correct row counts. Any help is appreciated. Thanks.
The information schema tables underwent significant, incompatible changes in MySQL 8 with the introduction of the global data dictionary:
Previously, INFORMATION_SCHEMA queries for table statistics in the STATISTICS and TABLES tables retrieved statistics directly from storage engines. As of MySQL 8.0, cached table statistics are used by default.
The cache is controlled by the system variable information_schema_stats_expiry, which defaults to 86400 seconds (24 hours):
Some INFORMATION_SCHEMA tables contain columns that provide table statistics:
[...] TABLES.TABLE_ROWS [...]
Those columns represent dynamic table metadata; that is, information that changes as table contents change.
By default, MySQL retrieves cached values for those columns from the mysql.index_stats and mysql.table_stats dictionary tables when the columns are queried, which is more efficient than retrieving statistics directly from the storage engine. If cached statistics are not available or have expired, MySQL retrieves the latest statistics from the storage engine and caches them in the mysql.index_stats and mysql.table_stats dictionary tables. Subsequent queries retrieve the cached statistics until the cached statistics expire.
[...]
To update cached values at any time for a given table, use ANALYZE TABLE.
To always retrieve the latest statistics directly from the storage engine and bypass cached values, set information_schema_stats_expiry to 0.
This is consistent with the behaviour you are seeing.
You can set information_schema_stats_expiry to 0 globally, or per session whenever you need up-to-date statistics.
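To make the caching behaviour described above concrete, here is a toy model in Python. It is not MySQL code and makes no claim about how the server implements the cache; the class name CachedRowStats and its methods are invented for this sketch. It only mirrors the three rules quoted from the manual: cached values are served until they expire, ANALYZE TABLE refreshes them, and an expiry of 0 bypasses the cache.

```python
import time


class CachedRowStats:
    """Toy stand-in for a row-count statistic cached with an expiry."""

    def __init__(self, fetch_from_engine, expiry_seconds, clock=time.monotonic):
        self._fetch = fetch_from_engine   # callable returning the live row count
        self._expiry = expiry_seconds     # plays the role of information_schema_stats_expiry
        self._clock = clock
        self._value = None
        self._cached_at = None

    def analyze(self):
        """Refresh the cached value, as ANALYZE TABLE does for one table."""
        self._value = self._fetch()
        self._cached_at = self._clock()

    def table_rows(self):
        """Return the cached value, refetching if it is missing or expired."""
        expired = (
            self._cached_at is None
            or self._expiry == 0
            or self._clock() - self._cached_at >= self._expiry
        )
        if expired:
            self.analyze()
        return self._value


rows = []  # stands in for the contents of the `test` table
stats = CachedRowStats(lambda: len(rows), expiry_seconds=86400)

before_insert = stats.table_rows()   # first read caches the empty-table count
rows.extend(["one", "two", "three", "four", "five"])
after_insert = stats.table_rows()    # still the cached value
stats.analyze()
after_analyze = stats.table_rows()   # refreshed

live = CachedRowStats(lambda: len(rows), expiry_seconds=0)
print(before_insert, after_insert, after_analyze, live.table_rows())
```

In this model the stale zero in the question corresponds to a value cached while the table was still empty and then served again after the inserts.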

MySQL many-to-many relation slow on big table

I have 2 tables which are connected with a relationship table.
More details about the tables:
stores (Currently 140.000 rows)
id (index)
store_name
city_id (index)
...
categories (Currently 400 rows)
id (index)
cat_name
store_cat_relation
store_id
cat_id
Every store belongs in one or more categories.
In the store_cat_relation table, I have indexes on (store_id, cat_id) and (cat_id, store_id).
I need to find the total number of, say, supermarkets (cat_id = 1) in Paris (city_id = 1). I have a working query, but it takes too long when the database contains lots of stores in Paris or lots of supermarkets.
This is my query:
SELECT COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id
This query takes about 0.05 s. The database contains about 8000 supermarkets (stores with category 1) and about 8000 stores in Paris (city_id = 1). Combined, there are 550 supermarkets in Paris at the moment.
I want to reduce the query time to below 0.01 s because the database is only getting bigger.
The result of EXPLAIN is this:
id: 1
select_type: SIMPLE
table: store_cat_relation
type: ref
possible_keys: cat_id_store_id, store_id_cat_id
key: cat_id_store_id
key_len: 4
ref: const
rows: 8043
Extra: Using index
***************************************
id: 1
select_type: SIMPLE
table: stores
type: eq_ref
possible_keys: PRIMARY, city_id
key: PRIMARY
key_len: 4
ref: store_cat_relation.store_id
rows: 1
Extra: Using index condition; Using where
Does anyone have an idea why this query takes so long?
EDIT: I also created a SQL fiddle with 300 rows per table. With a small number of rows it is quite fast, but I need it to be fast with 100,000+ rows.
http://sqlfiddle.com/#!9/675a3/1
I ran some tests, and the best performance came from using the query cache. You can enable it and use it on demand, so that you decide which queries are put into the cache. To make the settings persistent, you must add them to /etc/my.cnf. After the tables change, you can also run some queries to warm up the cache again.
Here is a sample.
Table size
MariaDB [yourSchema]> select count(*) from stores;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1 min 23.50 sec)
MariaDB [yourSchema]> select count(*) from store_cat_relation;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (2.45 sec)
MariaDB [yourSchema]>
Verify cache is on
MariaDB [yourSchema]> SHOW VARIABLES LIKE 'have_query_cache';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| have_query_cache | YES |
+------------------+-------+
1 row in set (0.01 sec)
Set the cache size and switch the cache to DEMAND
MariaDB [yourSchema]> SET GLOBAL query_cache_size = 1000000;
Query OK, 0 rows affected, 1 warning (0.00 sec)
MariaDB [yourSchema]> SET GLOBAL query_cache_type=DEMAND;
Query OK, 0 rows affected (0.00 sec)
Enable Profiling
MariaDB [yourSchema]> set profiling=on;
First execute your query - takes 0.68 sec
MariaDB [yourSchema]> SELECT SQL_CACHE COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id;
+----------+
| COUNT(*) |
+----------+
| 192 |
+----------+
1 row in set (0.68 sec)
Now get it from the cache
MariaDB [yourSchema]> SELECT SQL_CACHE COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id;
+----------+
| COUNT(*) |
+----------+
| 192 |
+----------+
1 row in set (0.00 sec)
See the profile (durations are in seconds, so each step takes a few tens of microseconds)
MariaDB [yourSchema]> show profile;
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000039 |
| Waiting for query cache lock | 0.000008 |
| init | 0.000005 |
| checking query cache for query | 0.000056 |
| checking privileges on cached | 0.000026 |
| checking permissions | 0.000014 |
| checking permissions | 0.000025 |
| sending cached result to clien | 0.000027 |
| updating status | 0.000048 |
| cleaning up | 0.000025 |
+--------------------------------+----------+
10 rows in set (0.05 sec)
MariaDB [yourSchema]>
What you are looking at is index scenarios:
Using the optimizer a DBMS tries to find the optimal path to the data. Depending on the data itself, this can lead to different access paths depending on the conditions (WHERE/JOINS/GROUP BY, sometimes ORDER BY) supplied. The data distribution in this can be key to fast queries or very slow queries.
So at this moment you have 2 tables, stores and store_cat_relation. On stores you have 2 indexes:
id (primary)
city_id
You have a where on city_id, and a join on id. The internal execution in the DBMS engine is then as follows:
1) Read index city_id
2) Then read table (ok, primary key index) to find id
3) Join on ID
This can be a bit more optimized with a multi column index:
CREATE INDEX idx_nn_1 ON stores(city_id, id);
This should result in:
1) Read index idx_nn_1
2) Join using this index idx_nn_1
You do have fairly lopsided data in your current example, with city_id = 1 for every row. That kind of distribution in the real data can give you problems, since WHERE city_id = 1 is then similar to saying "just select everything from table stores". The histogram information on that column can result in a different plan in such cases; however, if your data distribution is not so lopsided, it should work nicely.
On your second table, store_cat_relation, you might try an index like this:
CREATE INDEX idx_nn_2 ON store_cat_relation(store_id, cat_id);
Then see whether the DBMS decides that it leads to a better data access path.
With every join you see, study the join and see if a multi column index can reduce the number of reads.
Do not index all your columns: Too many columns in an index will lead to slower inserts and updates.
Some scenarios might also require indexes with the columns in different orders, leading to many indexes on one table (one on columns (1,2,3), the next on (1,3,2), etc.). That is not a happy scenario either; in such cases a single-column index, or an index on fewer columns combined with reading the table for columns 2 and 3, might be preferable.
Indexing requires testing your most common scenarios, which can be a lot of fun, since you will see how a slow query that ran for seconds can suddenly finish within hundredths of a second or even faster.
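The multi-column index idea can be tried out on a small scale with the sketch below. It uses Python's built-in sqlite3 module as a stand-in, so the planner is SQLite's and not MySQL's, and the plan it prints is illustrative only. The table and column names follow the question; the index names idx_city_id and idx_cat_store and the synthetic data (1000 stores spread over 10 cities and 7 categories) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (id INTEGER PRIMARY KEY, store_name TEXT, city_id INTEGER);
    CREATE TABLE store_cat_relation (store_id INTEGER, cat_id INTEGER);
    -- Multi-column indexes: filter column first, join column second.
    CREATE INDEX idx_city_id ON stores (city_id, id);
    CREATE INDEX idx_cat_store ON store_cat_relation (cat_id, store_id);
""")

# Synthetic data: store i lives in city i % 10 and belongs to category i % 7.
conn.executemany(
    "INSERT INTO stores VALUES (?, ?, ?)",
    [(i, f"store {i}", i % 10) for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO store_cat_relation VALUES (?, ?)",
    [(i, i % 7) for i in range(1, 1001)],
)

query = """
    SELECT COUNT(*)
    FROM stores s
    JOIN store_cat_relation r ON s.id = r.store_id
    WHERE s.city_id = 1 AND r.cat_id = 1
"""
count = conn.execute(query).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print("stores in city 1 and category 1:", count)
for step in plan:
    print(step)
```

On the real MySQL tables the equivalent check is to run EXPLAIN on the query before and after adding an index and compare the key and rows columns.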

why is mysql select count(1) taking so long?

When I first started using MySQL, a select count(*) or select count(1) was almost instantaneous. But I'm now using version 5.6.25 hosted at Dreamhost, and it's taking 20-30 seconds, sometimes, to do a select count(1). However, the second time it's fast---like the index is cached---but not super fast, like the data are coming from just the metadata index.
Anybody understand what's going on, and why it has changed?
mysql> select count(1) from times;
+----------+
| count(1) |
+----------+
| 1511553 |
+----------+
1 row in set (22.04 sec)
mysql> select count(1) from times;
+----------+
| count(1) |
+----------+
| 1512007 |
+----------+
1 row in set (0.54 sec)
mysql> select version();
+------------+
| version() |
+------------+
| 5.6.25-log |
+------------+
1 row in set (0.00 sec)
mysql>
I guess when you first started, you used MyISAM, and now you are using InnoDB. InnoDB just doesn't store this information. See documentation: Limits on InnoDB Tables
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of the table, which takes some time if the index is not entirely in the buffer pool. To get a fast count, you have to use a counter table you create yourself and let your application update it according to the inserts and deletes it does. If an approximate row count is sufficient, SHOW TABLE STATUS can be used. See Section 9.5, “Optimizing for InnoDB Tables”.
So when your index is entirely in the buffer pool after the (slower) first query, the second query is fast again.
MyISAM doesn't need to care about problems that concurrent transactions might create, because it doesn't support transactions, and so select count(*) from t will just look up and return a stored value very fast.
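The counter table mentioned in the quoted documentation can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for MySQL, and it keeps the counter up to date with triggers instead of application code, which is a variation on what the manual describes. The row_counts table, the trigger names and the label column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE times (id INTEGER PRIMARY KEY, label TEXT);

    -- One row per tracked table, holding its current row count.
    CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
    INSERT INTO row_counts VALUES ('times', 0);

    CREATE TRIGGER times_after_insert AFTER INSERT ON times
    BEGIN
        UPDATE row_counts SET n = n + 1 WHERE table_name = 'times';
    END;

    CREATE TRIGGER times_after_delete AFTER DELETE ON times
    BEGIN
        UPDATE row_counts SET n = n - 1 WHERE table_name = 'times';
    END;
""")

# Insert 100 rows, then delete the first 10.
conn.executemany(
    "INSERT INTO times (label) VALUES (?)",
    [(f"t{i}",) for i in range(100)],
)
conn.execute("DELETE FROM times WHERE id <= 10")

# Reading the counter is a single-row lookup, whatever the table size.
fast_count = conn.execute(
    "SELECT n FROM row_counts WHERE table_name = 'times'"
).fetchone()[0]
exact_count = conn.execute("SELECT COUNT(*) FROM times").fetchone()[0]

print(fast_count, exact_count)
```

The trade-off is that every insert and delete now also updates the counter row, which can become a point of contention on a write-heavy table.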

Cardinality is reported as "1" even though there are 2 unique values stored for the corresponding column

I am trying to optimize my database by adjusting indices.
SHOW INDEXES FROM my_table
outputs
Table ... Key_name ... Column_name ... Cardinality ...
---------------------------------------------------------------------
my_table ... idx_field1 ... field1 ... 1 ...
while
SELECT field1 FROM my_table PROCEDURE ANALYSE()\G
outputs
*************************** 1. row ***************************
Field_name: my_db.my_table.field1
Min_value: ow
Max_value: rt
Min_length: 2
Max_length: 2
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 2.0000
Std: NULL
Optimal_fieldtype: ENUM('ow','rt') NOT NULL
1 row in set (0.26 sec)
i.e., the reported cardinality (1) is not equal to the number of unique values (2). Why?
PS. I did perform
analyze table my_table
before running the queries.
The "cardinality" in SHOW INDEXES is an approximation. ANALYSE() gets the exact value because it is derived from an exhaustive scan of the table.
The former is used for deciding how to optimize a query. Generally, a low cardinality (whether 1 or 2) implies that an index on that field is not worth using.
Where are you headed with this question?

MySQL show status

I have a table with a row count of 48769914. The problem is the bogus information when querying the database, i.e., the data_length. Any ideas on how to correct this misbehavior?
mysql> show table status like "events"\G
*************************** 1. row ***************************
Name: events
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 0
Avg_row_length: 0
Data_length: 16384
Max_data_length: 0
Index_length: 32768
Data_free: 7405043712
Auto_increment: 59816602
Create_time: 2012-06-05 05:12:37
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.88 sec)
exact count:
mysql> select count(id) from events;
+-----------+
| count(id) |
+-----------+
| 48769914 |
+-----------+
1 row in set (5 min 37.67 sec)
Update: The status information looks as if the table were empty: zero rows, zero row length and basically no data in the table. How can I get MySQL to show correct estimates for that data?
The InnoDB row count is not precise, because InnoDB does not keep track of the number of records internally; it can only estimate it from the amount of space allocated in the tablespace.
See the InnoDB restrictions in the manual for more information.
InnoDB doesn't store the row count in the table status, so that isn't bogus. You solve this by running your SELECT query wherever you need the row count.