MariaDB table defragmentation using OPTIMIZE - mysql

We are running MariaDB 10.1.30 and are testing a database maintenance script that defragments tables and rebuilds indexes with the OPTIMIZE TABLE command, using the feature added in 10.1.1 of setting innodb_defragment = 1.
I've tested ALTER TABLE with ALGORITHM=INPLACE and it works fine, but I'm trying to make use of innodb_defragment and use OPTIMIZE to avoid the temporary files created when tables are rebuilt by ALTER TABLE with the INPLACE algorithm.
With OPTIMIZE, no temp tables are created; however, the table gets locked and does not allow concurrent connections, which is not the case with ALTER TABLE using ALGORITHM=INPLACE. The documentation, however, says that OPTIMIZE is done using the INPLACE algorithm.
https://mariadb.org/defragmenting-unused-space-on-innodb-tablespace/
Is this a bug, or am I missing something here? Please advise.
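For reference, the commands the maintenance script runs look roughly like this (a sketch; the table name is a placeholder, and the status counter names may vary by version):
SET GLOBAL innodb_defragment = 1;                 -- enable the MariaDB 10.1 defragmentation code path
OPTIMIZE TABLE mydb.mytable;                      -- with innodb_defragment=1 this defragments index pages instead of rebuilding the table
SHOW GLOBAL STATUS LIKE 'Innodb_defragment%';     -- defragmentation counters reported by MariaDB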

The benefit for speed is virtually nil.
A "point query" (where you have the key and can go directly to the row) depends on the depth of the BTree. For a million rows, the depth will be about 3. For a trillion rows, about 6. Optimizing a table is very unlikely to shrink the depth.
A "range scan" (BETWEEN, >, etc) walks across a block, looking at each row. Then it hops (via a link) to the next block until it has found all the rows needed. Sure, you will touch more blocks in an un-optimized table, but the bulk of the effort is in accessing each row.
The benefit for space is limited.
An INSERT may add to a non-full block or it may split a full block into two half-full blocks. Later, two adjacent, somewhat empty, blocks will be merged together. Hence, a BTree naturally gravitates toward a state where the average block is 69% full. That is, the benefit of OPTIMIZE TABLE for space is limited.
Phrased differently, OPTIMIZE might shrink the disk footprint for a table to only 69% of what it was, but subsequent operations will just grow the table again.
If you are using innodb_file_per_table=OFF, then OPTIMIZE cannot return the free blocks to the Operating system. Such blocks can be reused for future INSERTs.
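If you want to see how much reclaimable space a table actually has before deciding, information_schema gives a rough picture (a sketch; the schema name is a placeholder, and DATA_FREE is only an estimate):
SELECT table_name,
       ROUND(data_length / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb,
       ROUND(data_free / 1024 / 1024) AS free_mb    -- allocated but currently unused space
FROM information_schema.TABLES
WHERE table_schema = 'mydb'
ORDER BY data_free DESC;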
OPTIMIZE TABLE is invasive.
It copies the table over, locking it during the process. This is unacceptable to sites that need 100% uptime.
If you are using replication, subsequent writes may stack up behind the OPTIMIZE, thereby making the Slave not up-to-the-second.
Big DELETEs
After deleting lots of rows, there may be benefit to OPTIMIZE, but check the 69% estimate.
If big deletes are a common occurrence, perhaps there are other things that you should be doing. See http://mysql.rjweb.org/doc.php/deletebig
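If you do need to remove a large fraction of a table, chunking the DELETE (as that page describes) avoids one huge, long-locking transaction; a rough sketch, assuming an indexed created_at column:
-- Repeat until it reports 0 rows affected; each chunk is its own small transaction.
DELETE FROM mytable
WHERE created_at < NOW() - INTERVAL 90 DAY
LIMIT 1000;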
History and internals
Old versions did OPTIMIZE in a straightforward way: create a new table (same schema); copy the rows into it; rename the table; drop the old one. Writes could not be allowed during the copy.
ALGORITHM=INPLACE probably locks a few blocks, combines them to fill up one block, then slides forward. This requires some degree of locking. Based on the Question, it sounds like it simply locks the whole table.
Note that each BTree (the PK+data, or a secondary index) could be 'optimized' independently. But no command allows doing that for just the main BTree (PK+data). Optimizing a single secondary index can be done by DROP INDEX + ADD INDEX, but that leaves you without the index in the meantime. Instead, consider doing a NOCOPY ADD INDEX, then an INSTANT DROP INDEX. Caution: this could impact USE INDEX or FORCE INDEX hints if you are using them.
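A sketch of that secondary-index rebuild (index, column, and table names are placeholders; NOCOPY and INSTANT require MariaDB 10.3+, older versions fall back to the default algorithm):
ALTER TABLE mytable ADD INDEX idx_col_new (col), ALGORITHM=NOCOPY;   -- build a fresh, compact copy of the index
ALTER TABLE mytable DROP INDEX idx_col, ALGORITHM=INSTANT;           -- then drop the old, fragmented one
-- Optionally rename it back (MariaDB 10.5+ / MySQL 5.7+), since USE INDEX / FORCE INDEX hints refer to the name:
ALTER TABLE mytable RENAME INDEX idx_col_new TO idx_col;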
(Caveat: This Answer applies to InnoDB, not MyISAM.)

Related

How to get statistics on time MySQL spent updating indexes during a new row insertion

I'm trying to figure out how multiple indexes are actually affecting insertion performance for MySQL InnoDB tables.
Is it possible to get information about index update times using performance_schema?
It seems like there are no instruments for stages that may reflect such information.
Even if there is something in performance_schema, it would be incomplete.
Non-UNIQUE secondary indexes are handled thus:
An INSERT starts.
Any UNIQUE indexes (including the PRIMARY KEY) are immediately checked for "dup key".
Other index changes are put into the "change buffer".
The INSERT returns to the client.
The Change Buffer is a portion of the buffer_pool (default: 25%) where such index modifications are held. Eventually, they will be batched up for updating the actual blocks of the index's BTree.
In a good situation, many index updates will be combined into very few read-modify-write steps to update a block. In a poor case, each index update requires a separate read and write.
The I/O for the change buffer is done 'in the background' as is the eventual write of any changes to data blocks. These cannot be realistically monitored in any way -- especially if there are different clients with different queries contributing to the same index or data blocks being updated.
Oh, meanwhile, any index lookups need to look both in the on-disk (or cached in buffer_pool) blocks and the change buffer. This makes an index lookup faster or slower, depending on various things unrelated to the operation in hand.
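About the closest you can get to watching this is the change buffer ("insert buffer") counters (a sketch; some counters are only populated when enabled via innodb_monitor_enable, and the exact set varies by version):
SELECT name, `count`
FROM information_schema.INNODB_METRICS
WHERE name LIKE 'ibuf%';
-- Or read the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of:
SHOW ENGINE INNODB STATUS;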

Does a lot of writing/inserting affect database indexes?

Does a database have to rebuild its indexes every time a new row is inserted?
And by that token, wouldn't it mean that if I was inserting a lot, the index would be rebuilt constantly and therefore be less effective, or even useless, for querying?
I'm trying to understand some of this database theory for better database design.
Updates definitely don't require rebuilding the entire index every time you update it (likewise insert and delete).
There's a little bit of overhead to updating entries in an index, but it's reasonably low cost. Most indexes are stored internally as a B+Tree data structure. This data structure was chosen because it allows easy modification.
MySQL also has a further optimization called the Change Buffer. This buffer helps reduce the performance cost of updating indexes by caching changes. That is, you do an INSERT/UPDATE/DELETE that affects an index, and the type of change is recorded in the Change Buffer. The next time you read that index with a query, MySQL reads the Change Buffer as a kind of supplement to the full index.
A good analogy for this might be a published document that periodically publishes "errata" so you need to read both the document and the errata together to understand the current state of the document.
Eventually, the entries in the Change Buffer are gradually merged into the index. This is analogous to the errata being edited into the document for the next time the document is reprinted.
The Change Buffer is used only for secondary indexes. It doesn't do anything for primary key or unique key indexes. Updates to unique indexes can't be deferred, but they still use the B+Tree so they're not so costly.
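The change buffer's behaviour is configurable if you ever need to tune it or rule it out while testing (a sketch; the values shown are the usual defaults):
SET GLOBAL innodb_change_buffering = 'all';        -- which operations may be buffered: none, inserts, deletes, changes, purges, all
SET GLOBAL innodb_change_buffer_max_size = 25;     -- maximum percentage of the buffer pool the change buffer may use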
If you do OPTIMIZE TABLE or some types of ALTER TABLE changes that can't be done in-place, MySQL does rebuild the indexes from scratch. This can be useful to defragment an index after you delete a lot of the table, for example.
Yes, inserting affects them, but it's not as bad as you seem to think. Like most entities in relational databases, indexes are usually created and maintained with an extra amount of space to accommodate growth, and usually set up to increase that extra amount automatically when index space is nearly exhausted.
Rebuilding the index starts from scratch, and is different from adding entries to the index. Inserting a new row does not result in the rebuild of an index. The new entry gets added in the extra space mentioned above, except for clustered indexes which operate a little differently.
Most DB administrators also do a task called "updating statistics," which updates an internal set of statistics used by the query planner to come up with good query strategies. That task, performed as part of maintenance, also helps keep the query optimizer "in tune" with the current state of indexes.
There are enormous numbers of high-quality references on how databases work, both independent sites and those of the publishers of major databases. You literally can make a career out of becoming a database expert. But don't worry too much about your inserts causing troubles. ;) If in doubt, speak to your DBA if you have one.
Does that help address your concerns?

Inserting New Column in MySQL taking too long

We have a huge database and inserting a new column is taking too long. Anyway to speed up things?
Unfortunately, there's probably not much you can do. When adding a new column, MySQL makes a copy of the table and copies the data into it. You may find it faster to do
CREATE TABLE new_table LIKE old_table;
ALTER TABLE new_table ADD COLUMN (column definition);
INSERT INTO new_table(old columns) SELECT * FROM old_table;
RENAME table old_table TO tmp, new_table TO old_table;
DROP TABLE tmp;
This hasn't been my experience, but I've heard others have had success. You could also try disabling indices on new_table before the insert and re-enabling later. Note that in this case, you need to be careful not to lose any data which may be inserted into old_table during the transition.
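If you try the disable/re-enable trick, it looks like this (a hedged sketch; DISABLE KEYS only affects non-unique indexes and is honoured by MyISAM, while InnoDB just issues a note and ignores it):
ALTER TABLE new_table DISABLE KEYS;     -- defer non-unique index maintenance during the bulk insert
-- ... run the INSERT INTO new_table ... SELECT from above ...
ALTER TABLE new_table ENABLE KEYS;      -- rebuild those indexes in one pass afterwards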
Alternatively, if your concern is impacting users during the change, check out pt-online-schema-change which makes clever use of triggers to execute ALTER TABLE statements while keeping the table being modified available. (Note that this won't speed up the process however.)
There are four main things that you can do to make this faster:
If using innodb_file_per_table, the original table may be highly fragmented in the filesystem, so you can try defragmenting it first.
Make the buffer pool as big as sensible, so more of the data, particularly the secondary indexes, fits in it.
Make innodb_io_capacity high enough, perhaps higher than usual, so that insert buffer merging and flushing of modified pages will happen more quickly. Requires MySQL 5.1 with InnoDB plugin or 5.5 and later.
MySQL 5.1 with the InnoDB plugin and MySQL 5.5 and later support fast alter table. One of the things it makes a lot faster is adding or rebuilding indexes that are both not unique and not in a foreign key. So you can do this (see the sketch below):
A. ALTER TABLE ADD your column, DROP your non-unique indexes that aren't in FKs.
B. ALTER TABLE ADD back your non-unique, non-FK indexes.
This should provide these benefits:
a. Less use of the buffer pool during step A because the buffer pool will only need to hold some of the indexes, the ones that are unique or in FKs. Indexes are randomly updated during this step so performance becomes much worse if they don't fully fit in the buffer pool. So more chance of your rebuild staying fast.
b. The fast alter table rebuilds the index by sorting the entries then building the index. This is faster and also produces an index with a higher page fill factor, so it'll be smaller and faster to start with.
The main disadvantage is that this is in two steps and after the first one you won't have some indexes that may be required for good performance. If that is a problem you can try the copy to a new table approach, using just the unique and FK indexes at first for the new table, then adding the non-unique ones later.
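A rough sketch of steps A and B with placeholder table, column, and index names (adjust the definitions to your schema):
-- Step A: add the column and drop the non-unique, non-FK indexes in one ALTER
ALTER TABLE big_table
    ADD COLUMN new_col INT NULL,
    DROP INDEX idx_a,
    DROP INDEX idx_b;
-- Step B: add the dropped indexes back; fast index creation builds them by sorting
ALTER TABLE big_table
    ADD INDEX idx_a (col_a),
    ADD INDEX idx_b (col_b);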
It's only in MySQL 5.6, but the feature request in http://bugs.mysql.com/bug.php?id=59214 increases the speed with which insert buffer changes are flushed to disk and limits how much space they can take in the buffer pool. This can be a performance limit for big jobs. The insert buffer is used to cache changes to secondary index pages.
We know that this is still frustratingly slow sometimes and that a true online alter table is very highly desirable.
This is my personal opinion. For an official Oracle view, contact an Oracle public relations person.
James Day, MySQL Senior Principal Support Engineer, Oracle
Usually, slowness like this means that there are many indexes, so I would suggest reconsidering your indexing.
Michael's solution may speed things up a bit, but perhaps you should have a look at the database and try to break the big table into smaller ones. Take a look at this: link. Normalizing your database tables may save you loads of time in the future.

MySQL: add a field to a large table

I have a table with about 200,000 records. I want to add a field to it:
ALTER TABLE `table` ADD `param_21` BOOL NOT NULL COMMENT 'about the field' AFTER `param_20`
but it seems to be a very heavy query and it takes a very long time, even on my quad-core AMD PC with 4 GB of RAM.
I am running under Windows/XAMPP and phpMyAdmin.
Does MySQL have to do something with every record when adding a field?
Or can I change the query so it makes the change more quickly?
MySQL will, in almost all cases, rebuild the table during an ALTER**. This is because the row-based engines (i.e. all of them) HAVE to do this to retain the data in the right format for querying. It's also because there are many other changes you could make which would also require rebuilding the table (such as changing indexes, primary keys etc)
I don't know what engine you're using, but I will assume MyISAM. MyISAM copies the data file, making any necessary format changes - this is relatively quick and is not likely to take much longer than the IO hardware needs to get the old datafile in and the new one out to disc.
Rebuilding the indexes is really the killer. Depending on how you have it configured, MySQL will do one of two things. In the good case, for each index it puts the indexed columns into a filesort buffer (which may be in memory but is typically on disc), sorts that using its filesort() function (which does a quicksort by recursively copying the data between two files, if it's too big for memory) and then builds the entire index based on the sorted data.
If it can't do the filesort trick, it will just behave as if you did an INSERT on every row, and populate the index blocks with each row's data in turn. This is painfully slow and results in far from optimal indexes.
You can tell which it's doing by using SHOW PROCESSLIST during the process. "Repairing by filesort" is good, "Repairing with keycache" is bad.
All of this will use AT MOST one core, but will sometimes be IO bound as well (especially copying the data file).
** There are some exceptions, such as dropping secondary indexes on innodb plugin tables.
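As mentioned above, you can watch which path the rebuild is taking (a sketch; the exact state strings vary a little between versions):
SHOW PROCESSLIST;   -- check the State column of the ALTER's thread:
                    -- "Repair by sorting" = filesort path (good), "Repair with keycache" = row-by-row path (slow)
-- Raising myisam_sort_buffer_size and myisam_max_sort_file_size makes the filesort path more likely.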
You are adding a NOT NULL column, so the tuples need to be populated. So it will be slow...
This touches each of the 200,000 records, as each record needs to be updated with a new bool value which is not going to be null.
So; yes it's an expensive query... There is nothing you can do to make it faster.

Oracle performance of schema changes as compared to MySQL ALTER TABLE?

When using MySQL MyISAM tables, and issuing an ALTER TABLE statement to add a column, MySQL creates a temporary table and copies all the data into the new table before overwriting the original table.
If that table has a lot of data, this process can be very slow (especially when rebuilding indexes), and requires you to have enough free space on the disk to store 2 copies of the table. This is very annoying.
How does Oracle work when adding columns? Is it fast on large tables?
I'm always interested in being able to do schema changes without having a lot of downtime. We are always adding new features to our software which require schema changes with every release. Any advice is appreciated...
Adding a column with no data to a large table in Oracle is generally very fast. There is no temporary copy of the data and no need to rebuild indexes. Slowness will generally arise when you want to add a column to a large table and backfill data into that new column for all the existing rows, since now you're talking about an UPDATE statement that affects a large number of rows.
Adding columns can lead to row migration over time. If you have a block that is 80% full with 4 rows and you add columns that will grow the size of each row 30% over time, you'll eventually reach a point where Oracle has to move one of the 4 rows to a different block. It does this by leaving a pointer to the new block in the old block, which causes reads on that migrated row to require more I/O. Eliminating migrated rows can be somewhat costly, and though it is generally possible to do without downtime assuming you're using the enterprise edition, it is generally easier if you have a bit of downtime. But row migration is something that you generally only have to worry about well down the road. If you know that certain tables are likely to have their row size increase substantially in the future, you can mitigate problems in advance by specifying a larger PCTFREE setting for the table.
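For example, you can reserve more free space per block for a table you expect to widen later (a sketch; 20 is just an illustrative value and the table definition is a placeholder):
-- Leave 20% of each block free for future row growth instead of the default 10%
CREATE TABLE orders (
    order_id NUMBER PRIMARY KEY,
    status   VARCHAR2(20)
) PCTFREE 20;
-- An existing table's setting can be raised too, though it only affects how blocks are used from now on:
ALTER TABLE orders PCTFREE 20;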
With regard to downtime, altering a table (and a bunch of other DDL operations) takes an exclusive lock. However, Oracle can also perform online redefinition of objects using the DBMS_REDEFINITION package, which can really take a bite out of downtime.
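A heavily abbreviated sketch of the DBMS_REDEFINITION flow (schema and table names are placeholders; the real procedure has more steps and options, e.g. column mappings and error handling):
-- 1. Check the table can be redefined online (requires a primary key by default)
EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE('APP', 'BIG_TABLE');
-- 2. Create an interim table with the new structure, then start the redefinition
EXEC DBMS_REDEFINITION.START_REDEF_TABLE('APP', 'BIG_TABLE', 'BIG_TABLE_INTERIM');
-- 3. Copy indexes, triggers, constraints and grants (DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS)
-- 4. Swap the tables, taking only a brief lock at the end
EXEC DBMS_REDEFINITION.FINISH_REDEF_TABLE('APP', 'BIG_TABLE', 'BIG_TABLE_INTERIM');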