Get *current/actual* size of a table in bytes - mysql

I have been using an oft-seen method of measuring table size with a query, e.g.:
INSERT INTO tableRecords (loadTime, dataFromDate, tableName, rowCount, sizeMB)
SELECT NOW(),
SUBDATE(CURDATE(),1),
'table_name',
COUNT(*),
(SELECT ROUND(((data_length + index_length) / 1024 / 1024), 2)
FROM information_schema.TABLES
WHERE table_schema = 'db_name' AND table_name = 'table_name')
FROM table_name
I've been running this daily for some time.
However, I notice that the number often stays the same for days at a time, regardless of how many rows are added.
Is there a better way to do this so that I can ensure I'm getting the current table size?
I am able to run multiple queries in the script I am using.
I'm aware of the OPTIMIZE TABLE command, but I'm unsure whether it's the right thing to use or exactly how/when to use it. Is it necessary to solve my problem? I could see it taking some time to run all of the OPTIMIZE TABLE commands every day in my case (some of the tables are large).
If that is the right way to go, should I just run this command daily, prior to my INSERT?
Are there any other options?

You should know that the number reported for table size in INFORMATION_SCHEMA.TABLES is only an estimate -- it can be off by +/- 10% or more. It may also change from time to time, when MySQL updates its statistics.
This may be why the number doesn't seem to change even as you insert lots of rows. You can force the statistics to update with ANALYZE TABLE.
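For example, a minimal sketch (db_name and table_name are placeholders, as in your query) that refreshes the statistics right before reading the size:
ANALYZE TABLE db_name.table_name; -- quick: re-samples a few index pages to refresh the statistics
SELECT table_rows, -- estimated row count
ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = 'db_name' AND table_name = 'table_name';
Even after ANALYZE TABLE, treat the result as an estimate, not an exact byte count.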
Using SHOW TABLE STATUS is really just reading the same values in the INFORMATION_SCHEMA, so it has the same issue with precision.
Looking at the physical file size on disk is not accurate either, because the file can normally contain some amount of fragmentation, from past deletions or rows that don't fill up a given space in the file fully. Therefore, the real size of data is quite different from the physical size of the file.
This also means that the "size" of a table could mean different things. Is it the size of real rows of data? The size of the pages that data occupies? The size of the pages if they were defragmented? The size of the physical file on disk?
Why do you need such a precise measure of table size, anyway? Even if you could get a perfectly precise table size measurement, it would become obsolete as soon as you inserted another row. That's why an estimate is sufficient for most purposes.
As for when to use OPTIMIZE TABLE, I recommend using it after you've done some operation that could cause significant fragmentation, like a series of many DELETE statements, or periodically to defragment. How frequently depends on the level of activity on the table. For a table with low traffic, yearly would probably be enough; for a table with a lot of inserts/updates/deletes, maybe every couple of weeks. Doing it daily is likely to be overkill.
You're right that it locks your table and blocks activity for some time; the larger the table, the longer it takes. You can use pt-online-schema-change to let the table restructure happen in the background without blocking activity on the original table.
Also, MySQL 5.6 can do OPTIMIZE TABLE for InnoDB tables as online DDL, without locking.
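To make the distinction concrete (db_name and table_name are placeholders):
ANALYZE TABLE db_name.table_name; -- cheap: only refreshes the statistics
OPTIMIZE TABLE db_name.table_name; -- expensive: rebuilds the table; for InnoDB it is mapped to ALTER TABLE ... FORCE
The first is the one worth running before your daily INSERT; the second is the occasional defragmentation pass.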
Re your comment:
InnoDB updates statistics from time to time, but it's not impossible for it to go days between the auto-refresh of statistics. Here are some references that talk about this in more detail:
Controlling Optimizer Statistics Estimation
When Does InnoDB Update Table Statistics? (And When It Can Bite)
When are InnoDB table index statistics updated?
InnoDB Persistent Statistics at last (for MySQL 5.6)

Related

Should I perform table optimization on MySql InnoDB tables?

I have a MySQL (actually MariaDB) database with some InnoDB tables.
Records in these tables are never deleted, only inserted and updated.
The number of records is already relatively large - around a few million.
These tables, I believe, have never been optimized since creation.
Should I optimize such tables?
How can I verify whether I should optimize them?
Unfortunately, there is nearly no way to tell if you need to optimize an InnoDB table.
As you delete or update rows (I know you said you don't delete), it can leave gaps where the old version of a row was. Optimizing a table copies all the rows to a new table, with no gaps, so it accomplishes "defragmentation" and probably stores the table more compactly.
You can run SHOW TABLE STATUS LIKE 'tablename'\G and get a report that shows the size of the gaps. It's reported as the column data_free in that status result. But here's the thing: data_free only includes gaps of 1MB or larger. Smaller gaps are not counted, and these might end up accounting for a lot of fragmentation. There's literally no way to report the total of the small gaps.
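If you'd rather query it than read the \G output, the same figures are exposed in information_schema (db_name and tablename are placeholders):
SELECT ROUND(data_free / 1024 / 1024, 2) AS free_mb, -- only counts gaps of 1MB or larger
ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = 'db_name' AND table_name = 'tablename';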
So should you optimize such tables? Maybe! It wouldn't be a bad thing to do it, say once a month.
But doesn't it lock the table and block queries while you optimize? Yes, sort of. That's why I use pt-online-schema-change to run a no-op change (use the --alter=force option), so it can be done any time, without blocking clients.

Is it a good idea to distribute records of one table into multiple tables having similar table structure

I have a table with 62 million records.
Table structure: 52 columns
Storage engine: InnoDB
Collation: utf8_general_ci
SELECT - the operation performed most often
INSERT - always in bulk, but not all the time
UPDATE - relatively few operations; sometimes many, sometimes none at all
Since we are almost always fetching in real time, is it a good idea to distribute the records from this one big table, by some logic, into multiple tables with a similar structure, in order to make SELECTs faster?
MYSQL Version: mysql Ver 15.1 Distrib 10.2.33-MariaDB
That technique is almost guaranteed to make things slower.
Provide CREATE TABLE and the important queries.
Often a SELECT can be sped up by a composite index and/or a reformulation (see the sketch after this answer).
62M rows is above average, but not a scary size.
"INSERT - Always in bulk" -- Let's see your technique; there may be a way to speed it up further.
"Archiving" old data -- Actually removing the data may help some. Keeping it around, but using suitable indexes is usually fine. We need to see your queries and schema.
"Sharding" is putting parts of the data in separate servers. This is handy when you have exhausted the write capacity of a single machine.
"Replication" (Primary + Replica) allows shifting reads to another server, thereby spreading the load. With this technique, you system can handle a virtually unlimited number of SELECTs.
"Yes, indexes have been implemented" -- That may mean that you have one index per column. This is almost always not optimal.
"128GB RAM" -- If the entire dataset is much smaller than that, then most of the RAM is going unused.
"Query design" -- I had one example of a CPU that was pegged at 100%. After making the following change, the CPU dropped to 1%:
SELECT ... WHERE DATE(dt) = CURDATE();
-->
SELECT ... WHERE dt >= CURDATE();
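To make the composite-index point above concrete, here's a sketch with invented table and column names (big_table, status and dt are assumptions, not from your question):
-- hypothetical query: today's rows for one status
SELECT id, amount FROM big_table WHERE status = 'active' AND dt >= CURDATE();
-- composite index: equality column first, then the range column
ALTER TABLE big_table ADD INDEX idx_status_dt (status, dt);
With one-index-per-column, MySQL usually picks just one of the indexes; the composite index lets it satisfy both predicates in a single range scan.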

using information_schema to find size of innodb databases

We have about 60-70 databases on an RDS server, and a lot of them can be deleted.
I want to do a benchmark of size before and after, and they are all (to my knowledge) innoDB tables.
So, I'm using the information_schema table per this link: https://www.percona.com/blog/2008/03/17/researching-your-mysql-table-sizes/
and this is great, except the first query listed (and I presume the others) just runs and runs and eventually finishes after EIGHT MINUTES.
I can run this query instantly:
SELECT COUNT(*) FROM information_schema.TABLES;
And get about 12,500 tables.
I also notice - ironically enough - that information_schema.TABLES has no indexes! My instinct is not to mess with that.
My best option at this point is to dump the TABLES table, and run the query on a copy that I actually index.
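Roughly what I have in mind (the my_admin schema and the index name are invented for illustration; the snapshot itself would still pay the slow scan once):
CREATE TABLE my_admin.tables_snapshot AS
SELECT table_schema, table_name, engine AS storage_engine, table_rows, data_length, index_length, data_free
FROM information_schema.TABLES;
ALTER TABLE my_admin.tables_snapshot ADD INDEX idx_schema (table_schema);
-- per-database totals, computed on the indexed copy
SELECT table_schema, ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb
FROM my_admin.tables_snapshot
GROUP BY table_schema
ORDER BY total_gb DESC;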
My questions are:
1. How dynamic is the information_schema.TABLES table, and in fact that entire database?
2. Why is it running so slowly?
3. Would it be advisable to index some key fields to optimize the queries I want to do?
4. If I do do an SQL dump, will I be getting current table size information?
Thanks, I hope this question is instructive.
information_schema is currently a thin layer on top of some older machinery. That older machinery needs to "open" each table to discover its size, etc., which involves reading at least the .frm file. But it does not need to open the tables just to count them. Think of the difference between SHOW TABLES and SHOW TABLE STATUS.
table_open_cache and table_definition_cache probably did not have room for all the tables when you ran the 8-minute query. The values of those VARIABLES may well have been less than 12,500, implying that there would have been churn.
In the future (probably 5.8), all that info will probably be sitting in a single InnoDB table instead of splayed across the OS's file system. At that point, it will be quite fast. (Think of how fast a table scan of 12,500 rows can be done, especially if fully cached in RAM.)
Since the information_schema does not have "real" tables, there is no way to add INDEXes.
mysqldump does not provide the table size info. Even if it did, it would be no faster, since it would go through the same, old, mechanism.
60 is a questionably large number of databases; 12K is a large number of tables. Often this implies a schema design that chooses to create multiple tables instead of putting the data into a single table.

MySQL Database optimization

I have a table which is frequently updated (insert/delete). I also have a script to periodically count how many records are stored in the table. How can I optimize the performance?
1. Do nothing: just use the COUNT function.
2. Create another field to store the number of records: whenever a new record is added, increment that field, and decrement it when a record is deleted.
3. If your database's main function is storing (frequent inserts/updates), switch the storage engine to InnoDB, which is faster for INSERT and UPDATE queries, but slower for reads.
Method #2 is pretty much the standard way of doing it (if your table is incredibly huge and COUNT is giving you performance issues). You could also store the COUNT value in a MEMORY table which would make retrieval exceedingly fast.
Increment/decrement as you see fit.
If you need accurate numbers, I would build this into the app that updates the database, or use triggers to keep the counts up to date. As others have mentioned, the counts could be kept in a MEMORY table, or a Redis instance if you want performance and persistence. There are counts in the INFORMATION_SCHEMA.TABLES table, but they're not precise for InnoDB (+/- 20% or more).
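A rough sketch of the trigger approach (table and column names are invented; adapt them to your schema):
CREATE TABLE row_counts (
table_name VARCHAR(64) PRIMARY KEY,
row_count BIGINT NOT NULL
) ENGINE = InnoDB; -- or ENGINE = MEMORY if losing the counter on restart is acceptable
INSERT INTO row_counts VALUES ('my_table', (SELECT COUNT(*) FROM my_table));
CREATE TRIGGER my_table_ai AFTER INSERT ON my_table FOR EACH ROW
UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'my_table';
CREATE TRIGGER my_table_ad AFTER DELETE ON my_table FOR EACH ROW
UPDATE row_counts SET row_count = row_count - 1 WHERE table_name = 'my_table';
-- reading the count is now a primary-key lookup instead of a scan
SELECT row_count FROM row_counts WHERE table_name = 'my_table';
Keep in mind that every insert/delete now updates the same counter row, so you trade a little write concurrency for a cheap, accurate read.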

mysql - can I determine or limit the amount of memory query is allowed to use

I am running a query that creates a temporary table; however, the limit is 64MB and I am not able to change the limit due to access permissions, etc. When a large date range is selected, the temporary table runs out of space and results in a MySQL error.
Is there any way I can determine the size or amount of memory the query will use before attempting to run it, so I can avoid the above problem gracefully?
There's no way to limit the size of the temp table directly, except by querying for a smaller number of rows in your base SQL query.
Can you be more specific about the error you're seeing? MySQL temporary tables can exist in memory up to the lesser of tmp_table_size and max_heap_table_size. If the temp table is larger, MySQL converts it to an on-disk temp table.
This will make the temp table a lot slower than in-memory storage, but it shouldn't result in an error unless you have no space available in your temp directory.
There's also a lot of ways MySQL uses memory besides temp table storage. You can tune variables for many of these, but it's not the same as placing a limit on the memory a query uses.
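A couple of read-only checks that are usually allowed even on restricted accounts:
-- in-memory temp tables are capped at the smaller of these two
SHOW VARIABLES WHERE Variable_name IN ('tmp_table_size', 'max_heap_table_size');
-- how many temp tables have spilled to disk since the server started
SHOW GLOBAL STATUS WHERE Variable_name IN ('Created_tmp_tables', 'Created_tmp_disk_tables');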
The error 1114 indicates that you've run out of space. If it were an InnoDB table on disk, this probably means you have an ibdata1 file without autoextend defined for the tablespace. For a memory table, it means you're hitting the limit of max_heap_table_size.
Since you can't change max_heap_table_size, your options are to reduce the number of rows you put into the table at a time, or else use an on-disk temp table instead of in memory.
Also be careful to use the most current release of your major version of MySQL. I found bug 18160, which reports MySQL calculating table size incorrectly for heap tables (which are used for in-memory temp tables). So, for example, make certain you're using at least MySQL 5.0.23 or 5.1.10 to get the fix for that bug.
I'm not aware of a direct way to accomplish this, but you could use the information about the tables involved provided by SHOW TABLE STATUS, for example the average row size, and then calculate the number of records returned by your query using SELECT COUNT(*) .... If you need to be really safe, calculate the maximum size of a row from the column types.
Maybe it would be easier to work out how many records can be handled, and then either specify a fixed LIMIT clause or react to the result of SELECT COUNT(*) ....
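A back-of-the-envelope version of that estimate (db_name, table_name, dt and the date range are placeholders, and avg_row_length is itself only approximate):
-- approximate bytes per row of the base table
SELECT avg_row_length FROM information_schema.TABLES
WHERE table_schema = 'db_name' AND table_name = 'table_name';
-- rows the query would pull for the chosen date range
SELECT COUNT(*) FROM table_name WHERE dt BETWEEN '2013-01-01' AND '2013-03-31';
Multiply the two and compare the result with the 64MB cap before running the real query. Keep in mind that MEMORY tables store VARCHAR columns at their full declared length, so the real in-memory footprint can be larger than the estimate.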