Why does my MySQL table have to be optimized so frequently? - mysql

I have a MySQL table with 12 columns, one primary key and two unique keys. There are roughly 86,000 rows/records in this table.
I use this MySQL code:
INSERT INTO table (col2, col3, ..., col12) VALUES ($val2, $val3, ..., $val12) ON DUPLICATE KEY UPDATE col2=VALUES(col2), col3=VALUES(col3), ..., col12=VALUES(col12)
When I view the structure of this table from cPanel phpMyAdmin, I can see an 'Optimize Table' link just below the table's index information. If I click the link, the table is optimized.
But my question is why I see the 'Optimize Table' link so frequently for this table (it reappears within 3-4 days), while the other tables in this database show the link only once a month, or even once every two months or more.
Since I am not deleting rows from this table, only inserting and, when a duplicate key is found, updating, why is optimization required so frequently?

Short Answer: switch to InnoDB.
The MyISAM storage engine uses B-trees for indexes and stores them in a separate index file. Every time you insert a lot of data, these indexes are changed, and that is why you need to optimize the table: to reorganize the indexes and regain some space.
MyISAM's indexing mechanism also takes much more space compared to InnoDB.
Read the link below
http://www.mysqlperformanceblog.com/2010/12/09/thinking-about-running-optimize-on-your-innodb-table-stop/
There are a lot of other advantages to InnoDB over MyISAM, but that is another topic.
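If you want to see for yourself how much reclaimable space has built up, rather than waiting for phpMyAdmin's link, a minimal sketch (the table name mytable is a placeholder for your own):
-- Data_free reports how many bytes could be reclaimed; phpMyAdmin's
-- 'Optimize Table' link is driven by this overhead figure
SHOW TABLE STATUS LIKE 'mytable';
-- Defragment the data file, rebuild the indexes, and update statistics
OPTIMIZE TABLE mytable;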

I will explain how inserting records affects a MyISAM table and explain what optimizing does, so you'll understand why inserting records has such a large effect.
Data
With MyISAM, when you insert records, data is simply appended to the end of the data file.
Running optimize on a MyISAM table defrags the data, physically reordering it to match the order of the primary key index. This speeds up sequential record reads (and table scans).
Indexes
Inserting records also adds entries to the leaf nodes of the index's B-tree. If a node fills up, it must be split, in effect rebuilding at least that page of the index.
When optimizing a MyISAM table, the indexes are flattened out, allowing room for more expansion (insertion) before having to rebuild an index page. This flatter index also speeds searches.
Statistics
MySQL also stores statistics for each index about key distribution, and the query optimizer uses this information to help develop a good execution plan. Inserting (or deleting) many records causes these statistics to become out of date.
Optimizing the table recalculates these statistics after the data has been defragged and the indexes rebuilt.
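If stale statistics are your only concern, a full optimize is not needed; a small sketch (mytable is again a placeholder):
-- Recompute the key distribution statistics without rebuilding the table
ANALYZE TABLE mytable;
-- Inspect the resulting cardinality estimates for each index
SHOW INDEX FROM mytable;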
vs. Appending
When you are appending data (adding a record with a higher primary key value such as with auto_increment), that data will not need to be later defragged since it will already be in the proper physical order. Also, when appending (inserting sequentially) into an index, the nodes are kept flat, so there's no rebuilding to be done there either.
vs. InnoDB
InnoDB suffers from the same issues when inserting, but since data is kept in primary key order by its clustered index, you take the hit up front (at insert time) for keeping the data in order, rather than having to defrag it later. Still, optimizing InnoDB does optimize the data by flattening out the B-tree nodes and freeing up unused (deleted) keys, which improves sequential reads (table scans). Its secondary indexes are similar to indexes in MyISAM, so they too get rebuilt and flattened out.
Conclusion
I'm not trying to make a case for sticking with MyISAM. InnoDB has superior read performance due to its clustered indexes, and better update and append performance due to row-level locking versus MyISAM's table-level locking (assuming concurrent users). Also, InnoDB is ACID-compliant.
Still, my goal was to answer your direct question and provide some technical details rather than conjecture and hearsay.
Neither database storage engine automatically optimizes itself.
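So if the overhead keeps returning, you have to schedule the maintenance yourself. One possible sketch using MySQL's event scheduler (the event and table names are hypothetical; confirm that OPTIMIZE TABLE is permitted in event bodies on your server version):
-- The event scheduler must be enabled for events to run
SET GLOBAL event_scheduler = ON;
-- Rebuild the table once a week during a quiet period
CREATE EVENT weekly_optimize
ON SCHEDULE EVERY 1 WEEK
DO OPTIMIZE TABLE mytable;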

Related

optimizing mysql table with 1.5m records where most are soft deleted

I've got a MySQL table that has around 1.5 million records, and the table size is 1.3GB.
I am using a soft-delete mechanism in that table, which means I have a column deleted_at that indicates whether the row has been deleted, and when. If the record is not deleted, the deleted_at value is NULL.
Of these 1.5 million records, only 30K are not soft deleted. Those are the records accessed frequently, while the other records are barely accessed, though they are in some cases.
So this table is heavily used and queried for the non-deleted records, and sometimes for the soft-deleted records.
I have a BTREE index on the deleted_at column (with a cardinality of 35K). The table grows heavier with time, and obviously this is not a scalable solution.
The table engine is MyISAM. Most of the other tables are InnoDB, but this table is queried heavily via stored procedures, and when I changed it to InnoDB the queries were way slower.
I am looking for a solution that will not involve hardware changes. The current hardware is sufficient for that table to have good performance, but this will not be the case once the table grows further.
Things I thought of :
Partitioning, but I cannot use partitions because some of the columns have FULLTEXT indexes.
Splitting the data into two tables: one for the deleted rows and one for the non-deleted rows, which are accessed and queried frequently. This change requires a lot of infrastructure changes, so I am not in a hurry to do it.
Creating a new table that syncs with the original table every 10-20 minutes (instead of splitting) and contains only the non-deleted rows. That would require only small infrastructure changes, and the maintenance is much easier and safer. Splitting into two tables could result in missing records due to query failures, since a "DELETE" operation would actually move a row from one table to the other, and thus requires a sophisticated mechanism.
What other options do I have? Can I give priority to some rows in a table with MySQL, memory-wise?
I've got 10.3.20-MariaDB and 32GB of RAM
MyISAM does not cache rows; it only caches indexes. It relies on the filesystem cache for buffering rows.
So you could try to ensure at least the index is fully loaded into the cache:
Increase key_buffer_size so it's at least as large as your MyISAM indexes for this table. Use SHOW TABLE STATUS to find out the index size.
If you have multiple MyISAM tables, you might need to dedicate a key cache specifically for this table. See CACHE INDEX.
Pre-load the index into the key cache at startup. See LOAD INDEX INTO CACHE.
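A minimal sketch of those three steps (the table name mytable and cache name hot_cache are placeholders, and the buffer sizes should come from the Index_length you see in SHOW TABLE STATUS):
-- 1. Check the index size, then size the key buffer to fit it
SHOW TABLE STATUS LIKE 'mytable';
SET GLOBAL key_buffer_size = 268435456; -- e.g. 256MB; size to your indexes
-- 2. Dedicate a named key cache to this table
SET GLOBAL hot_cache.key_buffer_size = 268435456;
CACHE INDEX mytable IN hot_cache;
-- 3. Pre-load the table's indexes into their assigned cache
LOAD INDEX INTO CACHE mytable;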
You may also want to consider multi-column indexes tailored to your queries. For example, if you have a query WHERE user_id = 1234 AND deleted_at IS NULL, you should create an index on (user_id, deleted_at).
Which indexes you need depends on the queries you want to optimize.
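For the hypothetical query above, that index could be created like so (the index name is illustrative):
-- Composite index: seeks on user_id first, then deleted_at within each user
ALTER TABLE mytable ADD INDEX idx_user_deleted (user_id, deleted_at);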
Frankly, I would split the table so the deleted rows are in a second table. That would reduce your table size by 98%, and it might make queries run quickly enough that you don't need to use MyISAM anymore.
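A rough sketch of that split, as one possible approach (table names are hypothetical, and since MyISAM has no transactions you would want to do this during a maintenance window):
-- Archive table with the same structure and indexes as the original
CREATE TABLE mytable_deleted LIKE mytable;
-- Move the soft-deleted rows out of the hot table
INSERT INTO mytable_deleted SELECT * FROM mytable WHERE deleted_at IS NOT NULL;
DELETE FROM mytable WHERE deleted_at IS NOT NULL;
OPTIMIZE TABLE mytable; -- reclaim the freed space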

Create index and then insert or insert and then create index?

I'm inserting a large volume of data into a table in MySQL. I need to create an index to access the data quickly; however, I would like to know whether there is a difference (in performance) between these scenarios:
Create an index and then insert all data
Insert all data and then create an index
thanks in advance!
For the InnoDB storage engine, it will be faster to specify the clustered index (i.e. the PRIMARY KEY) on the table before inserting data.
This is because if a clustered index (PRIMARY KEY) is not defined on the table, InnoDB uses a hidden 6-byte auto-incremented counter for the clustered index. If a PRIMARY KEY is specified later, the entire table will need to be rebuilt.
For secondary indexes (i.e. non-clustered indexes) with InnoDB, it is usually faster to insert the data without secondary indexes defined, and then build the secondary indexes after the data is loaded.
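A sketch of that loading pattern (table, column, and index names are illustrative):
-- Define the PRIMARY KEY up front so InnoDB clusters on it from the start
CREATE TABLE big_load (
  id INT NOT NULL,
  name VARCHAR(100),
  PRIMARY KEY (id)
) ENGINE=InnoDB;
-- ... bulk-load the data here ...
-- Build the secondary index afterwards, in one pass over the data
ALTER TABLE big_load ADD INDEX idx_name (name);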
FOLLOWUP
As far as the speed of loading a table (in particular, a table that is truncated/emptied and then reloaded), dropping and re-creating indexes is a well-known technique for speeding up processing, not just with MySQL but with other RDBMSs such as Oracle.
There isn't a guarantee that the processing will be faster; as with most things database, we need tests to determine which is faster.
For a table containing millions of rows to which we're adding a couple hundred rows, dropping and rebuilding the indexes is likely going to be a lot slower, because of all the extra work to re-index the existing rows. It would be faster to do the index maintenance while the rows are being inserted.
In terms of speeding up a load, the "drop and recreate indexes" technique isn't going to give us the kind of dramatic improvements we get from other changes. For example, it won't be anywhere near the improvement we would see from using LOAD DATA in place of INSERT statements, nor from using multi-row INSERT statements instead of a series of singleton INSERT statements.
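To illustrate that comparison (reusing the hypothetical big_load table from above; the file path is a placeholder, and LOAD DATA is subject to the FILE privilege and secure_file_priv):
-- Singleton inserts: one statement and one round trip per row (slowest)
INSERT INTO big_load (id, name) VALUES (1, 'a');
INSERT INTO big_load (id, name) VALUES (2, 'b');
-- Multi-row insert: many rows per statement (much faster)
INSERT INTO big_load (id, name) VALUES (3, 'c'), (4, 'd'), (5, 'e');
-- LOAD DATA: the bulk path, typically fastest of all
LOAD DATA INFILE '/tmp/data.csv' INTO TABLE big_load
  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';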

Mysql Xtradb build indexes by sort instead of via insertion

This post says:
If you’re running Innodb Plugin on Percona Server with XtraDB you get
benefit of a great new feature – ability to build indexes by sort
instead of via insertion
However, I could not find any info on this. I'd like to have the ability to reorganize how a table is laid out physically, similar to the PostgreSQL CLUSTER command or MyISAM's "alter table ... order by". For example, the table "posts" has millions of rows in random insertion order; most queries use "where userid = ", and I want the rows belonging to one user to be physically stored near each other on disk, so that common queries require little IO. Is it possible with XtraDB?
Clarification concerning the blog post
The feature you are basically looking at is fast index creation. This feature speeds up the creation of secondary indexes on InnoDB tables, but it is only used in very specific cases. For example, the feature is not used during OPTIMIZE TABLE, which can therefore be dramatically sped up by dropping the indexes first, then running OPTIMIZE TABLE, and then recreating the indexes with fast index creation (this is what the post you linked is about).
Percona Server added some automation for the cases that can be improved by using this feature manually as above, in the form of a system variable named expand_fast_index_creation. If activated, the server should use fast index creation not only in the very specific cases, but in all cases where it might help, such as OPTIMIZE TABLE (the problem mentioned in the linked blog article).
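A sketch of both variants against the asker's posts table (the index name is illustrative, and expand_fast_index_creation exists only in Percona Server):
-- Manual variant: drop the secondary index, optimize, recreate it by sort
ALTER TABLE posts DROP INDEX idx_userid;
OPTIMIZE TABLE posts;
ALTER TABLE posts ADD INDEX idx_userid (userid);
-- Percona Server variant: let the server apply fast index creation itself
SET expand_fast_index_creation = ON;
OPTIMIZE TABLE posts;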
Concerning your question
Your question was actually whether it is possible to store InnoDB tables in a custom order, to speed up specific kinds of queries by exploiting locality on the disk.
This is not possible. InnoDB rows are saved in pages, ordered by the clustered index (which is essentially the primary key). The rows/pages can end up in a chaotic order, in which case one can run OPTIMIZE TABLE on the InnoDB table. With this command the table is actually recreated in primary key order, which gathers rows with nearby primary keys onto the same or neighboring pages.
That is all you can force InnoDB to do. You can read the manual about the clustered index, another page in the manual that gives a definitive answer that this is not possible ("ORDER BY does not make sense for InnoDB tables because InnoDB always orders table rows according to the clustered index."), and the same question on dba.stackexchange, whose answers might interest you.
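Since the physical order always follows the clustered index, the one lever you do have is the choice of primary key itself. A hedged sketch (column names follow the question; whether a composite key suits your schema is for you to judge):
-- Rows cluster by (userid, post_id), so one user's posts end up on the
-- same or neighboring pages; note that every secondary index carries a
-- copy of the primary key, so keep the key reasonably small
CREATE TABLE posts (
  userid INT NOT NULL,
  post_id INT NOT NULL,
  body TEXT,
  PRIMARY KEY (userid, post_id)
) ENGINE=InnoDB;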

Do mysql update queries benefit from an index?

I have a table on which I mainly do updates, and I'm wondering whether update queries would benefit from having an index on both the WHERE column and the updated column, or an index on just the WHERE column?
Just on the WHERE column. An index on the updated column will actually slow down your query, because the index has to be updated along with the data. An index on the WHERE column will speed up updates, and selects, but slow down some insertions.
Indexes also cause overhead when you delete rows. In general, though, they are a good thing on columns you use in WHERE clauses a lot, and they are basically necessary on columns you join on or ORDER BY.
Not a straightforward answer for this one, so here goes.
UPDATE table SET ColumnA = 'something'
If an index exists on ColumnA, then you will take a slight performance hit, as there will be two write operations for each row: first the data in the table, and then the write for the index update. You can even have several indexes that each include ColumnA, which means you will have several index writes in addition to the table row. You can see how having more than a few indexes can start to really slow your updates down.
But if ColumnA is not indexed at all then it will be a single write for each row only.
UPDATE table SET ColumnA = 'something' WHERE ColumnB = 'something else'
For this query, if an index exists on ColumnB and not on ColumnA, it will be very fast to locate the record (called a seek), with a single write to update it; and since the index doesn't care about ColumnA, it won't need updating. But if you index ColumnA and not ColumnB, you will read every row in the table first (called a scan, and normally a bad thing); while a read is faster than a write, this is still very slow, and then it will write to the table and then write again for the index. Basically the slowest way of doing things.
DELETE FROM table WHERE ColumnB = 'somethingelse'
Now, if you have an index on any column in this table, there are two writes: the delete from the table and an update/delete of the record in the index. Again, if ColumnB is not indexed, you will scan the table, then delete the row(s) from the table and update the indexes, if any.
INSERT INTO table (ColumnA, ColumnB) VALUES ('something','something else')
If no indexes exist, a single write to the table and it's done.
Again, if indexes do exist, then an extra write for each one.
I haven't mentioned the primary key's unique constraint, because you really can't get around it when you need a primary key, but every record must be checked to see whether something already exists with that key before insert. That will be a fast primary key index seek, but nevertheless, it's another step in the process. The fewer steps, the faster it will be.
Now back to yours. Basically, if you need to update a specific record, an index will help you locate that record faster than scanning the entire table. The time saved locating the record will be much more than the time lost updating the indexes. If you are only inserting and never reading, then indexes will slow you down. It becomes a balancing act. If you need to read specific records, then an index will help immensely. But the more indexes you have, the slower the writes get.
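To see which of these cases your own statement falls into, you can ask MySQL for the execution plan (EXPLAIN works for UPDATE from MySQL 5.6 on; the table name is illustrative):
-- A 'ref' access type with a key listed means an index seek;
-- 'ALL' means a full table scan, the slow case described above
EXPLAIN UPDATE mytable SET ColumnA = 'something' WHERE ColumnB = 'something else';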
Most people here don't know how indexes work in MySQL.
It depends on which storage engine you are using. InnoDB uses indexes completely differently from MyISAM. This is because MySQL implements indexes at the storage engine level, not the MySQL server level.
I'm afraid most people here are giving you answers based on other databases in which indexes work differently from MySQL.
InnoDB
In the case of InnoDB, whenever a row is updated, the index has to be updated as well. InnoDB's indexes are kept in sorted order, so it has to find out which page of the index the entry is supposed to be in and insert it there. At times that particular page may be full, so it has to split the page, both wasting space and increasing the time taken. This happens no matter which column you index, because InnoDB uses a clustered index, where the index stores the data of the entire row.
MyISAM
MyISAM does not have this problem. Its indexes are non-clustered: an index entry stores only the indexed value plus a pointer to the row, not the row data itself. Also, MyISAM's data is not stored in index order, so updates are very quick. Likewise, inserts are quick as well, as MyISAM simply appends them at the end of the data file.
Conclusion
So in regard to your question, you should consider your schema design instead of worrying about whether the query will use the indexes. If you are mostly updating a table, I suggest you not use InnoDB unless you need row-level locking, high concurrency, or transactions. Otherwise MyISAM would be much better for update-heavy tasks. And no, if you are using InnoDB, indexes do not really help with updating, especially if the table is very large.

Mysql - Index Performances

Are there any performance issues if you create an index with multiple columns, or should you create one index per column?
There's nothing inherently wrong with a multi-column index -- it depends completely on how you're going to query the data. If you have an index on colA+colB, it will help for queries like WHERE colA='value' and WHERE colA='value' AND colB='value', but it's not going to help for queries like WHERE colB='value'.
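A small sketch of that leftmost-prefix behavior (table, column, and index names are illustrative):
CREATE INDEX idx_a_b ON mytable (colA, colB);
-- Can use the index: the leftmost column colA is constrained
SELECT * FROM mytable WHERE colA = 'value';
SELECT * FROM mytable WHERE colA = 'value' AND colB = 'value';
-- Cannot use the index: the leftmost column colA is missing
SELECT * FROM mytable WHERE colB = 'value';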
Advantages of MySQL Indexes
Generally speaking, MySQL indexing gives you three advantages:
Query optimization: Indexes make search queries much faster.
Uniqueness: Indexes such as the primary key index and unique indexes help to avoid duplicate row data.
Text searching: With full-text indexes (available since MySQL version 3.23.23), users can optimize searching against even large amounts of text in any column indexed as such.
Disadvantages of MySQL Indexes
When an index is created on one or more columns, MySQL also creates a separate file that is sorted and contains only the field(s) you're interested in sorting on.
Firstly, indexes take up disk space. Usually the space usage isn't significant, but if you create indexes on every column in every possible combination, the index file will grow much more quickly than the data file. When a table is very large, the index file could reach the operating system's maximum file size.
Secondly, indexes slow down write queries such as INSERT, UPDATE, and DELETE. Because MySQL has to internally maintain the "pointers" from the index entries to the rows in the actual data file, there is a performance price to pay: every time a record is changed, the indexes must be updated. However, you may be able to write your queries in such a way that they do not cause very noticeable performance degradation.