Using information_schema to find the size of InnoDB databases - MySQL

We have about 60-70 databases on an RDS server, and a lot of them can be deleted.
I want to do a benchmark of size before and after, and they are all (to my knowledge) InnoDB tables.
So, I'm using the information_schema table per this link: https://www.percona.com/blog/2008/03/17/researching-your-mysql-table-sizes/
and this is great, except the first query listed (and I presume the others) just runs and runs and eventually finishes after EIGHT MINUTES.
I can run this query instantly:
SELECT COUNT(*) FROM information_schema.TABLES;
And get about 12,500 tables.
I also notice - ironically enough - that information_schema.TABLES has no indexes! My instinct is not to mess with that.
My best option at this point is to dump the TABLES table, and run the query on a copy that I actually index.
My questions are:
1. How dynamic is the information_schema.TABLES table, and in fact that entire database?
2. Why is it running so slowly?
3. Would it be advisable to index some key fields to optimize the queries I want to do?
4. If I do an SQL dump, will I be getting current table size information?
Thanks, I hope this question is instructive.

information_schema is currently a thin layer on top of some older stuff. The older stuff needed to "open" each table to discover its size, etc. That involved reading at least the .frm. But it did not need to open in order to count the number of tables. Think of the difference between SHOW TABLES and SHOW TABLE STATUS.
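Roughly, the difference is:

SHOW TABLES;        -- only lists table names; no need to open each table
SHOW TABLE STATUS;  -- has to open every table to report rows, sizes, etc.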
table_open_cache and table_definition_cache probably did not have all the tables in them when you ran the 8-minute query; the values of those VARIABLES may have been less than 12,500, implying that there would have been churn as tables were opened and closed.
In a future version (what eventually shipped as MySQL 8.0), all that info will probably be sitting in InnoDB tables instead of splayed across the OS's file system. At that point, it will be quite fast. (Think of how fast a table scan of 12,500 rows can be done, especially if fully cached in RAM.)
Since the information_schema does not have "real" tables, there is no way to add INDEXes.
mysqldump does not provide the table-size info. Even if it did, it would be no faster, since it would go through the same old mechanism.
60 is a questionably large number of databases; 12K is a large number of tables. Often this implies a schema design that creates many similar tables instead of putting the data into a single table.
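For reference, the kind of per-schema size query from that Percona post looks roughly like this (it reads the same estimated data_length / index_length values discussed above, so on old versions it is just as slow):

SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS total_mb
FROM information_schema.TABLES
GROUP BY table_schema
ORDER BY total_mb DESC;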

Related

Should I perform table optimization on MySql InnoDB tables?

I have MySQL (actually - MariaDB) database with some InnoDB tables.
Records in these tables are never deleted, only inserted and updated.
The number of records is already fairly large - around a few million.
These tables, I believe, have never been optimized since creation.
Should I optimize such tables?
How to verify if I should optimize such tables?
Unfortunately, there is nearly no way to tell if you need to optimize an InnoDB table.
As you delete or update rows (I know you said you don't delete), gaps can be left where the old version of a row was. Optimizing a table copies all the rows to a new table, with no gaps, so it accomplishes "defragmentation" and probably stores the table more compactly.
You can run SHOW TABLE STATUS LIKE 'tablename'\G and get a report that shows the size of the gaps. It's reported as the column data_free in that status result. But here's the thing: data_free only includes gaps of 1MB or larger. Smaller gaps are not counted, and these might end up accounting for a lot of fragmentation. There's literally no way to report the total of the small gaps.
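If you want a rough overview across many tables at once, the same data_free number can be read from information_schema (a sketch; the same 1MB caveat applies):

SELECT table_schema, table_name,
       ROUND(data_free / 1024 / 1024) AS free_mb
FROM information_schema.TABLES
WHERE engine = 'InnoDB'
ORDER BY data_free DESC
LIMIT 20;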
So should you optimize such tables? Maybe! It wouldn't be a bad thing to do it, say once a month.
But doesn't it lock the table and block queries while you optimize? Yes, sort of. That's why I use pt-online-schema-change to run a no-op change (use the --alter=force option), so it can be done any time, without blocking clients.
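A sketch of that invocation (database and table names are placeholders; some people use --alter "ENGINE=InnoDB" as the no-op change instead, so check the docs for your version):

pt-online-schema-change --alter "FORCE" D=mydb,t=mytable --execute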

Is it a good idea to distribute records of one table into multiple tables having a similar table structure?

I have a table with 62 million records.
Table structure: 52 columns
Storage engine: InnoDB
Collation: utf8_general_ci
SELECT - the vast majority of operations.
INSERT - always in bulk, but not all that often.
UPDATE - relatively few operations; sometimes many, sometimes none at all.
We are fetching in (near) real time almost always. Is it a good idea to distribute the records from this one big table, by some logic, into multiple similar tables so that SELECTs are faster?
MySQL version: mysql Ver 15.1 Distrib 10.2.33-MariaDB
Splitting the table that way is almost guaranteed to make things slower.
Provide CREATE TABLE and the important queries.
Often a SELECT can be sped up by a composite index and/or a reformulation.
62M rows is above average, but not a scary size.
"INSERT - Always in bulk" -- Let's see your technique; there may be a way to speed it up further.
"Archiving" old data -- Actually removing the data may help some. Keeping it around, but using suitable indexes is usually fine. We need to see your queries and schema.
"Sharding" is putting parts of the data in separate servers. This is handy when you have exhausted the write capacity of a single machine.
"Replication" (Primary + Replica) allows shifting reads to another server, thereby spreading the load. With this technique, you system can handle a virtually unlimited number of SELECTs.
"Yes, indexes have been implemented" -- That may mean that you have one index per column. This is almost always not optimal.
"128GB RAM" -- If the entire dataset is much smaller than that, then most of the RAM is going unused.
"Query design" -- I had one example of a CPU that was pegged at 100%. After making the following change, the CPU dropped to 1%:
SELECT ... WHERE DATE(dt) = CURDATE();
-->
SELECT ... WHERE dt >= CURDATE();

Updating MySQL Innodb Index Statistics

We have a large MySQL 5.5 database in which many rows are inserted daily and never deleted or updated. There are also users querying the live database. Tables are MyISAM.
But it is effectively impossible to run ANALYZE TABLE because it takes way too long (15 hours, and sometimes it crashes the tables), and so the query optimizer will often pick the wrong index.
We want to try switching everything to InnoDB. Will we need to run ANALYZE TABLE or not?
The MySQL docs say:
The cardinality (the number of different key values) in every index of a table
is calculated when a table is opened, at SHOW TABLE STATUS and ANALYZE TABLE and
on other circumstances (like when the table has changed too much).
But that begs the question: when is a table opened? If that means accessed during a connection then we need do nothing special. But I do not think that that is the case for InnoDB.
So what is the best approach? Run ANALYZE TABLE periodically? Perhaps with an increased dive count?
Or will it all happen automatically?
The users who query use apps to get the data, so each run is a separate connection. They generally do NOT expect the rows to be up-to-date to within just a few minutes.
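Something like this is what I have in mind (using innodb_stats_sample_pages, which controls the dive count on 5.5; the table name is just an example):

SET GLOBAL innodb_stats_sample_pages = 32;
ANALYZE TABLE mydb.mytable;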

Get *current/actual* size of a table in bytes

I have been using a method oft seen for measuring table size using a query, e.g.:
INSERT INTO tableRecords (loadTime, dataFromDate, tableName, rowCount, sizeMB)
SELECT NOW(),
       SUBDATE(CURDATE(), 1),
       'table_name',
       COUNT(*),
       (SELECT ROUND(((data_length + index_length) / 1024 / 1024), 2)
          FROM information_schema.TABLES
         WHERE table_schema = 'db_name' AND table_name = 'table_name')
FROM table_name;
I've been running this daily for some time.
However, I notice that the number often stays the same for days at a time, regardless of how many rows are added.
Is there a better way to do this so that I can ensure I'm getting the current table size?
I am able to run multiple queries with the script I am using.
I'm aware of the OPTIMIZE TABLE command, but I'm unsure whether it's the right thing to use or exactly how/when to use it. Is it necessary to solve my problem? I could see it taking some time to run all of the OPTIMIZE TABLE commands every day in my case (some large tables).
If that is the right way to go, should I just run this command daily prior to my INSERT?
Are there any other options?
You should know that the number reported for table size in INFORMATION_SCHEMA.TABLES is only an estimate -- it can be off by +/- 10% or more. It may also change from time to time, when MySQL updates its statistics.
This may be why the number doesn't seem to change even as you insert lots of rows. You can force the statistics to update with ANALYZE TABLE.
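For example, a sketch using the names from the question (db_name / table_name are placeholders):

ANALYZE TABLE db_name.table_name;
SELECT ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = 'db_name' AND table_name = 'table_name';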
Using SHOW TABLE STATUS is really just reading the same values in the INFORMATION_SCHEMA, so it has the same issue with precision.
Looking at the physical file size on disk is not accurate either, because the file normally contains some amount of fragmentation, from past deletions or from rows that don't completely fill a given space in the file. So the real size of the data can be quite different from the physical size of the file.
This also means that the "size" of a table could mean different things. Is it the size of real rows of data? The size of the pages that data occupies? The size of the pages if they were defragmented? The size of the physical file on disk?
Why do you need such a precise measure of table size, anyway? Even if you could get a perfectly precise table size measurement, it would become obsolete as soon as you inserted another row. That's why an estimate is sufficient for most purposes.
As for when to use OPTIMIZE TABLE, I recommend using it after I've done some operation that could cause significant fragmentation, like a series of many DELETE statements. Or periodically to defragment. How frequently depends on the level of activity on the table. For a table with low traffic, perhaps yearly would be enough; for a table with a lot of inserts/updates/deletes, maybe every couple of weeks. Doing it daily is likely to be overkill.
You're right that it locks your table and blocks activity for some time; the larger the table, the longer it takes. You can use pt-online-schema-change to allow the table restructure to happen in the background without blocking activity on the original table.
Also, MySQL 5.6 can do OPTIMIZE TABLE for InnoDB tables as online DDL, without locking.
Re your comment:
InnoDB updates statistics from time to time, but it's not impossible for it to go days between the auto-refresh of statistics. Here are some references that talk about this in more detail:
Controlling Optimizer Statistics Estimation
When Does InnoDB Update Table Statistics? (And When It Can Bite)
When are InnoDB table index statistics updated?
InnoDB Persistent Statistics at last (for MySQL 5.6)

MySQL ALTER TABLE on very large table - is it safe to run it?

I have a MySQL database with a MyISAM table with 4 million rows. I update this table about once a week with about 2000 new rows. After updating, I then alter the table like this:
ALTER TABLE x ORDER BY PK DESC
I order the table by the primary key field in descending order. This has not given me any problems on my development machine (Windows with 3GB memory). Three times I have tried it successfully on the production Linux server (with 512MB RAM), getting the resulting sorted table in about 6 minutes each time; but the last time I tried it, I had to stop the query after about 30 minutes and rebuild the database from a backup.
Can a 512MB server cope with that alter statement on such a large table? I have read that a temporary table is created to perform the ALTER TABLE command.
Question: Can this alter command be safely run? What should be the expected time for the alteration of the table?
As I have just read, the ALTER TABLE ... ORDER BY ... query is useful to improve performance in certain scenarios. I am surprised that the PK index does not help with this. But, from the MySQL docs, it seems that InnoDB does use the index. However, InnoDB tends to be slower than MyISAM. That said, with InnoDB you wouldn't need to re-order the table, but you would lose the blazing speed of MyISAM. It still may be worth a shot.
The way you explain the problems, it seems that there is too much data being loaded into memory (maybe there is even swapping going on?). You could easily check that by monitoring your memory usage. It's hard to say, as I do not know MySQL all that well.
On the other hand, I think your problem lies somewhere very different: you are using a machine with only 512MB of RAM as a database server, with a table containing more than 4 million rows... and you are performing a very memory-heavy operation on the whole table on that machine. It seems that 512MB will be nowhere near enough for that.
A much more fundamental issue I am seeing here: you are doing development (and quite likely testing as well) in an environment that is very different from the production environment. The kind of problem you are describing is to be expected. Your development machine has six times as much memory as your production machine, and I believe I can safely say that the processor is much faster as well. In that case, I suggest you create a virtual machine that mimics your production site. That way you can easily test your project without disrupting production.
What you're asking it to do is rebuild the entire table and all its indexes; this is an expensive operation, particularly if the data doesn't fit in RAM. It will complete, but it will be vastly slower if the data doesn't fit in RAM, particularly if you have lots of indexes.
I question your judgement when choosing to run a machine with such tiny memory in production. Anyway:
Is this ALTER TABLE really necessary? What specific query are you trying to speed up, and have you tried it without?
Have you considered making your development machine more like production? Using a dev box with MORE memory than production is never a good idea, and using a different OS isn't either.
There is probably also some tuning you can do to help; it largely depends on your schema (indexes in particular). 4M rows is not very many (for a machine with a normal amount of RAM).
Is the primary key auto_increment? If so, doing ALTER TABLE ... ORDER BY isn't going to improve anything, since everything is inserted in order anyway.
(unless you have lots of deletes)
I'd probably create a VIEW instead, ordered by the PK value, so that (for one thing) you don't need to lock up that huge table while the ALTER is being performed.
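A sketch of that idea, reusing the table and key names from the question:

CREATE VIEW x_desc AS
SELECT * FROM x ORDER BY PK DESC;
-- note: the view's ORDER BY is ignored if the query using the view has its own ORDER BY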
If you're using InnoDB, you shouldn't have to explicitly perform the ORDER BY either post-insert or at query time. According to the MySQL 5.0 manual, InnoDB already defaults to primary key ordering for query results:
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html#id4052480
MyISAM tables, by contrast, return records in insertion order by default, which may work as well if you only ever append to the table rather than using UPDATE queries to modify rows in place.