Should I perform table optimization on MySQL InnoDB tables?

I have a MySQL (actually, MariaDB) database with some InnoDB tables.
Records in these tables are never deleted, only inserted and updated.
The number of records is already relatively large - around a few million.
These tables, I believe, have never been optimized since creation.
Should I optimize such tables?
How can I verify whether I should optimize them?

Unfortunately, there is nearly no way to tell if you need to optimize an InnoDB table.
As you delete or update rows (I know you said you don't delete), gaps can be left where the old version of a row was. Optimizing a table copies all the rows to a new table with no gaps, so it accomplishes "defragmentation" and probably stores the table more compactly.
You can run SHOW TABLE STATUS LIKE 'tablename'\G and get a report that shows the size of the gaps. It's reported as the column data_free in that status result. But here's the thing: data_free only includes gaps of 1MB or larger. Smaller gaps are not counted, and these might end up accounting for a lot of fragmentation. There's literally no way to report the total of the small gaps.
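If you prefer a query to eyeballing the \G output, the same figures are exposed in information_schema (a sketch; 'mydb' is a placeholder for your schema name):
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb,
       ROUND(data_free    / 1024 / 1024) AS free_mb   -- same 1MB-granularity caveat applies
FROM information_schema.tables
WHERE table_schema = 'mydb'
ORDER BY data_free DESC;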
So should you optimize such tables? Maybe! It wouldn't be a bad thing to do it, say, once a month.
But doesn't it lock the table and block queries while you optimize? Yes, sort of. That's why I use pt-online-schema-change to run a no-op change (use the --alter=force option), so it can be done any time, without blocking clients.
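For the record, such a no-op rebuild might be invoked roughly like this (a sketch; the database and table names are placeholders, and running with --dry-run first is prudent):
pt-online-schema-change --alter "FORCE" D=mydb,t=mytable --execute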

Duplicating tables in InnoDB, any risks of using "Create table like" to do that? Size is worrying

I have some issues with a table that's been growing over the years and is now eating up free disk space. It only contains about 1.5M rows and yet it takes up almost 20GB. I have deleted unwanted rows a couple of times, but since it's InnoDB I couldn't reclaim any free space by doing so.
Anyway, I have been playing around on a dev site using the following commands to duplicate a table:
CREATE TABLE products_new LIKE products;
INSERT INTO products_new SELECT * FROM products;
So far so good, it took about 200 seconds to perform that one on a dev-site with about 200k rows.
But what's worrying is the size of the new table. It's tiny compared to the original one! Did I miss anything when inserting the data? Sure, I expected to regain some space since I've been deleting rows, but not this much.
[Screenshot: size of new table vs old table]
I have tried this on a couple of tables, but only on dev sites, and I'm worried that I have missed something since the difference in data size is so massive. Is this normal behavior when creating a new InnoDB table with old data?
Worth mentioning that innodb_file_per_table is ON.
It sounds like the table stats have not been updated since the insert. Run:
ANALYZE TABLE products_new;
Read more about it here. Running
OPTIMIZE TABLE products;
on the original table should reclaim the free space. Read more here.
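Putting the two together, a minimal sketch of the whole sequence (table names taken from the question):
ANALYZE TABLE products_new;               -- refresh the statistics on the copy
SHOW TABLE STATUS LIKE 'products_new'\G   -- Data_length should now look sane
OPTIMIZE TABLE products;                  -- rebuilds the original and reclaims the space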

Post-optimization needed after deleting rows in a MySQL database

I have a log table that is currently 10GB. It has a lot of data for the past 2 years, and I really feel at this point I don't need so much in there. Am I wrong to assume it is not good to have years of data in a table (a smaller table is better)?
My tables all use the MyISAM engine.
I would like to delete all data from 2014 and 2015, and soon I'll do 2016, but I'm concerned about what exactly will happen after I run the DELETE statement. I understand that because it's MyISAM, a table lock will occur and no writing can take place? I would probably delete data by the month, late at night, to minimize the impact, as it's a production DB.
My prime interest, specifically, is this: should I take some sort of action after this deletion? Do I need to manually tell MySQL to do anything to my table, or is MySQL going to do all the housekeeping itself, reclaiming everything, reindexing, and ultimately optimizing my table after the 400,000k records I'll be deleting?
Thanks everyone!
Plan A: Use time-series PARTITIONing of the table so that future deletions are 'instantaneous' thanks to DROP PARTITION (see the sketch after these plans). More discussion here. Partitioning only works if you will be deleting all rows older than X.
Plan B: To avoid lengthy locking, chunk the deletes. See here. This is optionally followed by an OPTIMIZE TABLE to reclaim space.
Plan C: Simply copy over what you want to keep, then abandon the rest. This is especially good if you need to preserve only a small proportion of the table.
CREATE TABLE new LIKE real;
INSERT INTO new
SELECT * FROM real
WHERE ... ; -- just the newer rows;
RENAME TABLE real TO old, new TO real; -- instantaneous and atomic
DROP TABLE old; -- after verifying that all went well.
Note: The .MYD file contains the data; it will never shrink. Deletes will leave holes in it. Further inserts (and updates) will use the holes in preference to growing the table. Plans A and C (but not B) will avoid the holes, and truly free up space.
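For Plan A, a minimal sketch of what such a partitioned table could look like (table and column names are made up for illustration):
CREATE TABLE log_entries (
    id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    ts  DATETIME NOT NULL,
    msg VARCHAR(255),
    PRIMARY KEY (id, ts)   -- the partitioning column must be part of every unique key
) ENGINE=MyISAM
PARTITION BY RANGE (TO_DAYS(ts)) (
    PARTITION p201601 VALUES LESS THAN (TO_DAYS('2016-02-01')),
    PARTITION p201602 VALUES LESS THAN (TO_DAYS('2016-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
ALTER TABLE log_entries DROP PARTITION p201601;  -- drops a month of data instantly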
Tim and e4c5 have given some good recommendations and I urge them to add their answers.
You can run OPTIMIZE TABLE after doing the deletes. OPTIMIZE TABLE will help you with a few things (taken from the docs):
If the table has deleted or split rows, repair the table.
If the index pages are not sorted, sort them.
If the table's statistics are not up to date (and the repair could not be accomplished by sorting the index), update them.
According to the docs: http://dev.mysql.com/doc/refman/5.7/en/optimize-table.html
Use OPTIMIZE TABLE in these cases, depending on the type of table:
...
After deleting a large part of a MyISAM or ARCHIVE table, or making many changes to a MyISAM or ARCHIVE table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. After extensive changes to a table, this statement may also improve performance of statements that use the table, sometimes significantly.

Using information_schema to find the size of InnoDB databases

We have about 60-70 databases on an RDS server, and a lot of them can be deleted.
I want to do a benchmark of size before and after, and they are all (to my knowledge) InnoDB tables.
So, I'm using the information_schema table per this link: https://www.percona.com/blog/2008/03/17/researching-your-mysql-table-sizes/
and this is great, except the first query listed (and I presume the others) just runs and runs and eventually finishes after EIGHT MINUTES.
I can run this query instantly:
SELECT COUNT(*) FROM information_schema.TABLES;
And get about 12,500 tables.
I also notice - ironically enough - that information_schema.TABLES has no indexes! My instinct is not to mess with that.
My best option at this point is to dump the TABLES table, and run the query on a copy that I actually index.
My questions are:
1. How dynamic is the information_schema.TABLES table, and in fact that entire database?
2. Why is it running so slowly?
3. Would it be advisable to index some key fields to optimize the queries I want to do?
4. If I do an SQL dump, will I be getting current table size information?
Thanks, I hope this question is instructive.
information_schema is currently a thin layer on top of some older machinery. That older machinery needed to "open" each table to discover its size, etc., which involved reading at least the .frm file. But it did not need to open the tables just to count them. Think of the difference between SHOW TABLES and SHOW TABLE STATUS.
table_open_cache and table_definition_cache probably did not hold all the tables when you ran the 8-minute query; if the values of those VARIABLES were less than 12,500, there would have been churn.
In the future (probably 5.8), all that info will probably be sitting in a single InnoDB table instead of splayed across the OS's file system. At that point, it will be quite fast. (Think of how fast a table scan of 12,500 rows can be done, especially if fully cached in RAM.)
Since the information_schema does not have "real" tables, there is no way to add INDEXes.
mysqldump does not provide the table size info. Even if it did, it would be no faster, since it would go through the same old mechanism.
60 is a questionably large number of databases, and 12K is a large number of tables. This often implies a schema design that creates multiple tables instead of putting the data into a single table.
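For reference, the kind of per-database size query the Percona post describes boils down to something like this (a sketch; it is this full scan of information_schema.tables that takes the eight minutes):
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb
FROM information_schema.tables
GROUP BY table_schema
ORDER BY total_gb DESC;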

Updating MySQL Innodb Index Statistics

We have a large MySQL 5.5 database in which many rows are inserted daily and never deleted or updated. There are also users querying the live database. Tables are MyISAM.
But it is effectively impossible to run ANALYZE TABLE because it takes way too long (15 hours, and it sometimes crashes the tables), and so the query optimizer will often pick the wrong index.
We want to try switching to all InnoDB. Will we need to run ANALYZE TABLES or not?
The MySQL docs say:
The cardinality (the number of different key values) in every index of a table is calculated when a table is opened, at SHOW TABLE STATUS and ANALYZE TABLE and on other circumstances (like when the table has changed too much).
But that raises the question: when is a table opened? If that means accessed during a connection, then we need do nothing special. But I do not think that is the case for InnoDB.
So what is the best approach? Run ANALYZE TABLE periodically? Perhaps with an increased dive count?
Or will it all happen automatically?
The users query the data through apps, so each run is a separate connection. They generally do NOT expect the rows to be up-to-date within just minutes.
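For reference, the persistent-statistics knobs that control this in MySQL 5.6 and later look roughly like this (the question is on 5.5, where only innodb_stats_sample_pages exists, so treat this as a sketch of the later options rather than a 5.5 recipe):
SET GLOBAL innodb_stats_persistent = ON;               -- keep stats across restarts
SET GLOBAL innodb_stats_auto_recalc = ON;              -- recalc after ~10% of the rows change
SET GLOBAL innodb_stats_persistent_sample_pages = 64;  -- the "dive count": more sampled pages, better estimates
ANALYZE TABLE mytable;   -- on InnoDB this samples pages rather than scanning everything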

Will separating an updated column into another table help with optimization (MySQL MyISAM table)?

MySQL MyISAM "Table1" having 70% select , 13% update and 0.67% insert statements approximate.
There is one "count_column(int)" which used to increase count with primary key.(Update statements)
Updating of "count_column" make table select queries in "Waiting for table level lock"
So separating "count_column" in other table will reduce "Waiting for table level lock" or not?
I also need separated column in select statements with join.
Thanks, Yogs
AFAIK your locking problem is the COUNT with INSERT, not the UPDATE itself - but you must have a huge bunch of SELECTs. Your question is lacking quite a few details...
COUNT is heavily optimized on MyISAM tables; if you encounter problems with it, you could consider a count estimate or a memory table holding the value. In fact, MyISAM stores an exact row count that is extremely quick for the storage engine to fetch, so you may even have slowed MySQL down with your solution. "Slow" COUNT applies to engines like InnoDB because of their transactional nature.
One other thing to consider: storing a count in a column of the table itself adds an extra column to every row, which is quite wasteful.
And if you are using triggers to accomplish that you should be aware of http://dev.mysql.com/doc/refman/5.0/en/faqs-triggers.html#qandaitem-B-5-1-12 :)
Moving the frequently updated cells into another table will greatly reduce the number of locks on the main table and speed up SELECTs on it. Converting the table to InnoDB can also help (if you are not using full-text indexes - they are still not supported in MySQL 5.5 InnoDB), since InnoDB uses row-level locks instead of table-level locks. If you have a lot of queries, take a look at this article about implementing efficient counters.
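A minimal sketch of that split-counter idea, assuming InnoDB for the counter table (table and column names are made up for illustration):
CREATE TABLE item_counts (
    item_id INT UNSIGNED NOT NULL PRIMARY KEY,
    cnt     INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=InnoDB;

INSERT INTO item_counts (item_id, cnt) VALUES (42, 1)   -- bump the counter
ON DUPLICATE KEY UPDATE cnt = cnt + 1;                  -- row-level lock, held only on this row

SELECT t.*, c.cnt                                       -- join the counter back in for reads
FROM Table1 t
LEFT JOIN item_counts c ON c.item_id = t.id;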