Does setting ROW_FORMAT=DYNAMIC compress the index? [duplicate] - mysql

I know that you shouldn't rely on the values returned by InnoDB's SHOW TABLE STATUS, in particular the row count and the average row length.
But I thought they might be accurate values taken at some point, which InnoDB only refreshes during ANALYZE TABLE or some other infrequent event.
Instead, what I'm seeing is that I can run SHOW TABLE STATUS on the same table 5 times in 5 seconds and get completely different numbers each time (even though the table has no insert or delete activity in between).
Where are these values actually coming from? Are they just corrupt in InnoDB?

The official MySQL 5.1 documentation acknowledges that InnoDB does not give accurate statistics with SHOW TABLE STATUS. MyISAM tables keep metadata such as the exact number of rows. The InnoDB engine stores both table data and indexes in /var/lib/mysql/ibdata* and keeps no such stored row count, so there is no quick way to look up the number of rows.
SHOW TABLE STATUS reports inconsistent row counts because InnoDB estimates the 'Rows' value by sampling a small number of random index pages and extrapolating from them to the whole table. With innodb_stats_on_metadata enabled (the default before MySQL 5.6.6), metadata statements such as SHOW TABLE STATUS trigger a fresh sample each time. That is why repeated calls return different numbers even when the table has not changed. The InnoDB documentation accordingly warns that the row count may differ from the actual value by as much as 40 to 50%.
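A toy model in Python shows why an unchanged table can produce a different estimate on every call. It illustrates the idea under stated assumptions and is not InnoDB's code. The table is assumed to be a list of pages holding varying numbers of rows. The estimate averages the row counts of a few randomly chosen pages (8 here, an assumed sample size) and multiplies by the number of pages. The function names are hypothetical.

```python
import random


def make_pages(num_pages: int, seed: int = 1) -> list[int]:
    """Build a hypothetical table: page i holds pages[i] rows."""
    rng = random.Random(seed)
    return [rng.randint(20, 200) for _ in range(num_pages)]


def estimate_rows(pages: list[int], sample_size: int, rng: random.Random) -> int:
    """Estimate the total row count by sampling a few random pages and
    extrapolating their average to the whole table."""
    sample = rng.sample(pages, sample_size)
    avg_rows_per_page = sum(sample) / sample_size
    return round(avg_rows_per_page * len(pages))


if __name__ == "__main__":
    pages = make_pages(10_000)
    rng = random.Random()
    print("exact count:", sum(pages))
    # Five "SHOW TABLE STATUS" calls on the same, unchanged table:
    for _ in range(5):
        print("estimate:", estimate_rows(pages, sample_size=8, rng=rng))
```

Sampling more pages narrows the spread of the estimates but makes each call more expensive. Sampling every page would simply be an exact count.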
The MySQL documentation suggests using the query cache to get consistent row counts, but it does not explain how. Here is a short explanation of how this can be done.
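First, check that the server supports the query cache (the value should be YES):
mysql> SHOW VARIABLES LIKE 'have_query_cache';
have_query_cache is read-only. It only reports whether the server was built with query cache support. Whether the cache is actually in use depends on query_cache_type and query_cache_size, which you can check with:
mysql> SHOW VARIABLES LIKE 'query_cache%';
If query_cache_size is 0 or query_cache_type is OFF, enable the query cache by adding the following lines to /etc/my.cnf and then restarting mysqld:
[mysqld]
query_cache_size = 1048576
query_cache_type = 1
query_cache_limit = 1048576
(For more information, see http://dev.mysql.com/doc/refman/5.1/en/query-cache.html. Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.)
Check the query cache status counters (hits, inserts, free memory and so on) with:
mysql> SHOW STATUS LIKE 'Qcache%';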
Now run an ordinary count query:
SELECT COUNT(*) FROM my_innodb_table;
The first execution scans the table and stores the result in the query cache. Later executions of the identical statement are answered from the cache without a scan; the text must match byte for byte. They keep returning the same exact count until the table changes. Any change to the table (INSERT, UPDATE, DELETE and so on) invalidates the cached result, and the next execution scans the table again.
SQL_CALC_FOUND_ROWS, by contrast, has nothing to do with the query cache. It is a SELECT modifier: a following SELECT FOUND_ROWS(); returns the number of rows the previous SELECT would have returned without its LIMIT clause. Combined with COUNT(*), which always returns a single row, FOUND_ROWS() returns 1, not the table's row count. It is useful when you want a page of results plus the total number of matching rows:
SELECT SQL_CALC_FOUND_ROWS * FROM my_innodb_table LIMIT 10;
SELECT FOUND_ROWS();
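How the query cache keeps a repeated count consistent can be pictured with a toy model in Python. It is a sketch under stated assumptions, not MySQL's implementation. Results are keyed by the exact statement text, and a write to a table throws away every cached result that read from it. The class and method names are hypothetical.

```python
from typing import Callable


class ToyQueryCache:
    """Toy model of query-cache behaviour (not MySQL's implementation):
    results are keyed by the exact statement text, and a write to a table
    discards every cached result that read from that table."""

    def __init__(self) -> None:
        # statement text -> (tables the statement read, cached result)
        self._entries: dict[str, tuple[frozenset[str], object]] = {}

    def select(self, sql: str, tables: set[str], run: Callable[[], object]):
        """Return (result, served_from_cache)."""
        if sql in self._entries:
            return self._entries[sql][1], True
        result = run()  # cache miss: actually execute the query
        self._entries[sql] = (frozenset(tables), result)
        return result, False

    def invalidate(self, table: str) -> None:
        """Call after any INSERT/UPDATE/DELETE on `table`."""
        self._entries = {
            sql: entry for sql, entry in self._entries.items() if table not in entry[0]
        }


if __name__ == "__main__":
    rows = ["a", "b", "c"]
    cache = ToyQueryCache()
    count = lambda: len(rows)
    print(cache.select("SELECT COUNT(*) FROM t", {"t"}, count))  # (3, False): scanned
    print(cache.select("SELECT COUNT(*) FROM t", {"t"}, count))  # (3, True): from cache
    print(cache.select("select count(*) from t", {"t"}, count))  # (3, False): different text
    rows.append("d")
    cache.invalidate("t")
    print(cache.select("SELECT COUNT(*) FROM t", {"t"}, count))  # (4, False): rescanned
```

The lowercase query misses the cache even though it means the same thing, which is why the count query should always be issued with identical text.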

InnoDB does not keep accurate statistics, including the table's row count, because of the row multiversioning it uses to support transactions. The actual number of rows depends on which transaction is asking and at what isolation level, since an uncommitted transaction may have inserted or deleted rows. Different transactions can also run at different isolation levels. So the question 'how many rows are there?' can only be answered unambiguously when no transactions are running. Keeping a single counter of rows or data length is therefore nearly impossible.
Read more in the InnoDB restrictions section of the MySQL manual.
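A toy model in Python makes this concrete. It is a simplification under stated assumptions, not InnoDB's real visibility rules, which use read views and undo logs and depend on the isolation level. Each row version records which transaction inserted it and which deleted it. A reader sees a change only if it is committed or is the reader's own. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    created_by: int                   # id of the transaction that inserted the row
    deleted_by: Optional[int] = None  # id of the transaction that deleted it, if any


def visible_count(rows: list[RowVersion], committed: set[int], reader: int) -> int:
    """Count the rows a transaction can see in this toy model: a change is
    visible to `reader` if it was committed or made by `reader` itself."""

    def applies(txn: Optional[int]) -> bool:
        return txn is not None and (txn in committed or txn == reader)

    return sum(1 for r in rows if applies(r.created_by) and not applies(r.deleted_by))


if __name__ == "__main__":
    rows = [
        RowVersion(created_by=1),
        RowVersion(created_by=1),
        RowVersion(created_by=1, deleted_by=2),  # deleted by uncommitted txn 2
        RowVersion(created_by=2),                # inserted by uncommitted txn 2
        RowVersion(created_by=2),
    ]
    committed = {1}
    print(visible_count(rows, committed, reader=2))  # 4: sees its own insert and delete
    print(visible_count(rows, committed, reader=3))  # 3: sees only committed data
```

Two transactions reading the same table at the same moment get 4 and 3, so no single stored counter could be right for both.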

Related

My exported and re-imported sql database has more rows than original [duplicate]

Without any changes to the database, I get very different row counts for my tables.
What can cause this?
Server version: 5.1.63-cll
Engine: InnoDB
Unlike MyISAM tables, InnoDB tables don't keep track of how many rows the table contains.
Because of this, the only way to know the exact number of rows in an InnoDB table is to inspect each row and accumulate a count. On large InnoDB tables, a SELECT COUNT(*) FROM innodb_table query can therefore be very slow, because it has to scan an entire index to count the rows.
phpMyAdmin uses a query like SHOW TABLE STATUS to get an estimated row count from the engine (InnoDB). Because it is only an estimate, it varies each time you call it. Sometimes the variation is fairly large, and sometimes the values are close.
Here is an informative blog post about COUNT(*) for InnoDB tables by Percona.
The MySQL manual page for SHOW TABLE STATUS states:
The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40 to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.
The page on InnoDB restrictions goes into some more detail:
SHOW TABLE STATUS does not give accurate statistics on InnoDB tables, except for the physical size reserved by the table. The row count is only a rough estimate used in SQL optimization.
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of the table, which takes some time if the index is not entirely in the buffer pool. If your table does not change often, using the MySQL query cache is a good solution. To get a fast count, you have to use a counter table you create yourself and let your application update it according to the inserts and deletes it does. If an approximate row count is sufficient, SHOW TABLE STATUS can be used.
See Section 14.3.14.1, “InnoDB Performance Tuning Tips”.
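phpMyAdmin uses a quick method to get the row count, and for InnoDB tables this method only returns an approximate count.
The $cfg['MaxExactCount'] setting changes this. If the approximate count is below that threshold, phpMyAdmin runs SELECT COUNT(*) to get an exact count instead. Raising it gives exact counts for larger tables, but this could have a serious impact on performance.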
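The counter-table approach from the manual can be sketched as follows. To keep the sketch runnable without a MySQL server, it uses Python's built-in sqlite3 module. It also uses triggers rather than application code to maintain the counter, which is a swapped-in variation of what the manual describes (MySQL also supports AFTER INSERT and AFTER DELETE triggers). The table and trigger names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);

-- One row per counted table, holding the maintained row count.
CREATE TABLE row_counter (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
INSERT INTO row_counter VALUES ('items', 0);

-- Keep the counter in step with every insert and delete on items.
CREATE TRIGGER items_ins AFTER INSERT ON items
BEGIN
    UPDATE row_counter SET n = n + 1 WHERE table_name = 'items';
END;
CREATE TRIGGER items_del AFTER DELETE ON items
BEGIN
    UPDATE row_counter SET n = n - 1 WHERE table_name = 'items';
END;
"""


def fast_count(conn: sqlite3.Connection, table: str) -> int:
    """Read the maintained count instead of scanning the table."""
    row = conn.execute("SELECT n FROM row_counter WHERE table_name = ?", (table,)).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])
    conn.execute("DELETE FROM items WHERE name = 'b'")
    conn.commit()
    print(fast_count(conn, "items"))  # 2, without scanning items
```

The counter is updated inside the same transaction as the insert or delete, so each transaction sees a count consistent with its own view of the table. The likely trade-off in InnoDB is contention: every insert and delete must update, and therefore lock, the same counter row, which serializes concurrent writers.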

MySQL: SELECT(*) results in less rows than COUNT(*) [duplicate]

Phantom MySQL rows? [duplicate]

InnoDB lock exhaustion with batched transactional DELETEs under DBI

I am using Perl and DBI to delete rows in chunks of 1000 from a very large MySQL table, but I am getting this error: DBD::mysql::db do failed: The total number of locks exceeds the lock table size.
Here is the Perl code with the SQL statement that performs the deletes:
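my $q = q{
    DELETE FROM table
    WHERE date_format(date, '%Y-%m') > '2015-01'
    LIMIT 1000
};
my $rc = '';
# do() returns the number of affected rows, or '0E0' ("zero but true")
# when no rows were affected, so the loop ends once nothing is left to delete
until ($rc eq '0E0') {
    $rc = $dbh->do($q) or die $dbh->errstr;  # undef means the statement failed
    $dbh->commit();                          # commit each chunk (AutoCommit off)
}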
In my experience this error only occurs when deleting or inserting a very large number of records at once with a single statement. The only viable solutions I have been able to find are:
1. Increase the InnoDB buffer pool size with the innodb_buffer_pool_size global variable.
2. Perform the delete in chunks.
I have not tried solution 1, for two reasons. First, in my situation it seems it would only delay the point at which the buffer fills up, though I am not sure about that. Second, we are not certain what effect it may have on the application using the database.
I would like to know:
* Why is this error occurring even though I am deleting in chunks?
* Is there a quick, high-level solution to this problem with Perl and/or DBI?
* Is there any other information that could lead to a solution?
Why is this error occurring even though I am deleting in chunks?
InnoDB uses row-level locking:
14.5.8 Locks Set by Different SQL Statements in InnoDB
A locking read, an UPDATE, or a DELETE generally set record locks on every index record that is scanned in the processing of the SQL statement. It does not matter whether there are WHERE conditions in the statement that would exclude the row. InnoDB does not remember the exact WHERE condition, but only knows which index ranges were scanned. The locks are normally next-key locks that also block inserts into the “gap” immediately before the record.
[...]
DELETE FROM ... WHERE ... sets an exclusive next-key lock on every record the search encounters.
This means that your query will lock every row it scans, even rows that don't match the condition in your WHERE clause.
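I don't know the exact execution details of your query, but with a large table it probably wouldn't be difficult for the locks to overrun the default 128 MB of innodb_buffer_pool_size. I believe the buffer pool is shared by all sessions, and other sessions could be locking rows at the same time as your query. This is especially likely if your query doesn't use an index and triggers a full table scan. That is the case here: because the date column is wrapped in date_format(), MySQL cannot use an index on date to find the matching rows. Each chunk therefore scans (and locks) rows from the start of the table until it has found 1000 matches.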
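A toy model in Python shows why chunking alone does not help. It rests on stated assumptions and ignores gap locks, delete-marked rows and the optimizer's actual plan. Rows are assumed to have been inserted in date order (as with an AUTO_INCREMENT key), so old rows come first in primary-key order. Without a usable index, every row visited is locked. With an index range scan, only the matching rows are visited. The function names and data are hypothetical.

```python
from datetime import date
from typing import Callable, Sequence


def locks_full_scan(dates: Sequence[date], matches: Callable[[date], bool], limit: int) -> int:
    """Rows locked by DELETE ... LIMIT when the WHERE clause cannot use an
    index: rows are visited in primary-key order and every visited row is
    locked, matching or not, until `limit` matching rows have been found."""
    locked = found = 0
    for d in dates:
        locked += 1
        if matches(d):
            found += 1
            if found == limit:
                break
    return locked


def locks_index_range(dates: Sequence[date], matches: Callable[[date], bool], limit: int) -> int:
    """Rows locked when an index on the date column lets the scan visit only
    matching rows (ignoring the extra boundary record a real range scan locks)."""
    return min(limit, sum(1 for d in dates if matches(d)))


if __name__ == "__main__":
    # Hypothetical table: 90,000 old rows followed by 10,000 rows to delete.
    dates = [date(2014, 6, 1)] * 90_000 + [date(2015, 6, 1)] * 10_000
    newer = lambda d: d >= date(2015, 2, 1)
    print(locks_full_scan(dates, newer, 1000))    # 91000 locks for one 1000-row chunk
    print(locks_index_range(dates, newer, 1000))  # 1000
```

In this model every chunk walks past the same 90,000 old rows again. The lock count per chunk is therefore driven by the size of the table, not by the LIMIT.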
Is there a quick, high-level solution to this problem?
The MySQL manual describes a simple workaround for exactly this situation:
If you are deleting many rows from a large table, you may exceed the lock table size for an InnoDB table. To avoid this problem, or simply to minimize the time that the table remains locked, the following strategy (which does not use DELETE at all) might be helpful:
1. Select the rows not to be deleted into an empty table that has the same structure as the original table:
INSERT INTO t_copy SELECT * FROM t WHERE ... ;
2. Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:
RENAME TABLE t TO t_old, t_copy TO t;
3. Drop the original table:
DROP TABLE t_old;
No other sessions can access the tables involved while RENAME TABLE executes, so the rename operation is not subject to concurrency problems. See Section 13.1.20, “RENAME TABLE Syntax”.
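Make sure there is an index on the column: INDEX(date).
Make sure date is of type DATETIME, DATE or TIMESTAMP.
Perform the query this way:
DELETE FROM table
WHERE date >= '2015-02-01'
ORDER BY date DESC
LIMIT 1000
Repeat until the DELETE reports 0 rows affected.
Because the condition is on the bare column, InnoDB can use the index to go straight to the newest matching rows. Each chunk then scans (and locks) little more than the 1000 rows it deletes, instead of every row it passes over. The comparison >= '2015-02-01' matches the original date_format(date, '%Y-%m') > '2015-01' exactly. By contrast, > '2015-01-31' would also delete DATETIME or TIMESTAMP values later in the day on January 31.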
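The "repeat until 0 rows affected" loop can be sketched in Python against any DB-API connection. Python is a swapped-in language here; the question's Perl loop does the same with $dbh->do. With a MySQL driver such as mysqlclient or PyMySQL, which use %s placeholders, you would pass the answer's DELETE ... ORDER BY date DESC LIMIT 1000 statement with the cutoff date as a %s parameter. To keep the sketch runnable without a server, the usage example uses sqlite3 with a hypothetical log table. SQLite has no DELETE ... LIMIT by default, so the chunk is chosen with a subquery.

```python
import sqlite3


def delete_in_chunks(conn, sql: str, params=()) -> int:
    """Run a chunked DELETE over and over, committing after each chunk so its
    locks are released, until a run affects no rows. Works with any DB-API
    connection; returns the total number of rows deleted."""
    total = 0
    while True:
        cur = conn.cursor()
        cur.execute(sql, params)
        deleted = cur.rowcount
        conn.commit()
        # Stop on 0 rows affected (also on -1, DB-API's "unknown").
        if deleted <= 0:
            return total
        total += deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, d TEXT)")
    conn.executemany("INSERT INTO log (d) VALUES (?)",
                     [("2014-06-01",)] * 4 + [("2015-03-01",)] * 7)
    conn.commit()
    # SQLite has no DELETE ... LIMIT by default, so the chunk is picked by a subquery.
    sql = "DELETE FROM log WHERE id IN (SELECT id FROM log WHERE d >= ? LIMIT 3)"
    print(delete_in_chunks(conn, sql, ("2015-02-01",)))  # 7
```

Committing after every chunk is what keeps each transaction's lock footprint small. The loop itself cannot fix a WHERE clause that forces a full scan.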

Why is the estimated rows count very different in phpmyadmin results?
