My question is about what happens to an index after a partition is dropped.
After the partition is dropped, what happens to the index? And if data is later inserted into the same truncated partition, how will the index work?
I understand that MySQL only has locally created (per-partition) indexes, so only the truncated partition's index is affected.
Any thoughts on this would be greatly appreciated.
The partition's index entries are removed along with its data:
Important
Partitioning applies to all data and indexes of a table; you cannot partition only the data and not the indexes, or vice versa, nor can you partition only a portion of the table.
https://dev.mysql.com/doc/refman/5.5/en/partitioning-overview.html
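As a minimal sketch of what that means in practice (the table, column, and partition names here are made up for illustration, and TRUNCATE PARTITION assumes MySQL 5.5 or later):

```sql
-- Hypothetical table; every index on it is partitioned along with the data.
CREATE TABLE events (
  id      INT NOT NULL,
  created DATE NOT NULL,
  payload VARCHAR(100),
  PRIMARY KEY (id, created),   -- must include the partitioning column
  KEY idx_payload (payload)
)
PARTITION BY RANGE (YEAR(created)) (
  PARTITION p2010 VALUES LESS THAN (2011),
  PARTITION p2011 VALUES LESS THAN (2012),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Empties p2010: its rows and its share of both indexes go away,
-- but the partition definition stays.
ALTER TABLE events TRUNCATE PARTITION p2010;

-- New rows that map to p2010 are indexed as usual on insert.
INSERT INTO events (id, created, payload) VALUES (1, '2010-06-01', 'hello');

-- Removes the partition definition together with its data and index entries.
ALTER TABLE events DROP PARTITION p2010;
```

After the DROP, rows dated 2010 would fall into p2011, since that becomes the first partition whose upper bound they are below.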
I have a table with 5 hash(key_1) partitions. I want to change that so it instead has 5 hash(key_2) partitions, but without losing data.
How do I do this? I have searched, but it's hard to find confirmation that I don't lose data by deleting partitions.
Deleting, truncating, or dropping partitions will definitely lose data. You can change the partitioning with ALTER TABLE, for example ALTER TABLE t PARTITION BY HASH (key_2) PARTITIONS 5. This won't lose data, but (at least with InnoDB) the table will be locked for writes and rebuilt with the new partitioning.
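A sketch of that repartitioning, assuming the table is named t as above and that key_2 is an integer column that is part of every unique key on the table (MySQL requires this, including for the primary key). The information_schema query is only a sanity check on the result:

```sql
-- Rebuilds the whole table under the new scheme; existing rows are
-- redistributed across the 5 new partitions, not discarded.
ALTER TABLE t PARTITION BY HASH (key_2) PARTITIONS 5;

-- Inspect the resulting layout. TABLE_ROWS is only an estimate for InnoDB.
SELECT PARTITION_NAME, PARTITION_METHOD, PARTITION_EXPRESSION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 't';
```

Comparing SELECT COUNT(*) FROM t before and after is a simple way to confirm for yourself that no rows went missing.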
I have partitioned tables in MySQL 5.1.41 which hold a very large amount of data. Recently I deleted a lot of data, which left around 500 GB of fragmentation, although the partitions still hold a lot of data.
To return that space to the OS, I had to defragment the partitions. I referred to the MySQL documentation, https://dev.mysql.com/doc/refman/5.1/en/partitioning-maintenance.html, which confused me with the following statements:
Rebuilding partitions : Rebuilds the partition; this has the same effect as dropping all records stored in the partition, then
reinserting them. This can be useful for purposes of defragmentation.
Optimizing partitions : If you have deleted a large number of rows from a partition or if you have made many changes to a partitioned
table with variable-length rows (that is, having VARCHAR, BLOB, or
TEXT columns), you can use ALTER TABLE ... OPTIMIZE PARTITION to
reclaim any unused space and to defragment the partition data file.
I tried both and observed that sometimes "rebuild" is faster and sometimes "optimize" is. Each partition I run these commands on has anywhere from millions to billions of records. I'm aware of what MySQL does for each of the above statements.
Should the choice depend on the number of rows in the partition? If so, up to how many rows can I use "optimize", and from how many should I use "rebuild"?
Also, which is better to use?
MyISAM or InnoDB? (The answer will be different.)
For MyISAM, REBUILD/REORGANIZE/OPTIMIZE will take about the same effort per partition.
For InnoDB, OPTIMIZE PARTITION rebuilds all partitions. So, don't use this if you want to do the partitions one at a time. REORGANIZE PARTITION of the partition into an identical partition definition should act only on the one partition. I recommend that.
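For reference, the three statements look roughly like this. The table t, the partition p3, and its bound are placeholders; the REORGANIZE form assumes a RANGE-partitioned table and has to repeat the partition's existing definition exactly:

```sql
-- Drop-and-reinsert style rebuild of one partition.
ALTER TABLE t REBUILD PARTITION p3;

-- Reclaim unused space in one partition (on InnoDB this ends up
-- rebuilding the whole table, as noted above).
ALTER TABLE t OPTIMIZE PARTITION p3;

-- Reorganize p3 into an identical p3, so that only this partition is copied.
-- The bound 2012 is an assumed value; use the partition's real definition.
ALTER TABLE t REORGANIZE PARTITION p3 INTO (
  PARTITION p3 VALUES LESS THAN (2012)
);
```

With InnoDB, space only goes back to the OS if the table lives in its own tablespace files (innodb_file_per_table); otherwise it is merely freed for reuse inside the shared tablespace.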
It is generally not worth using partitioning unless you have at least a million rows. Also, BY RANGE is the only form that I have found to have any performance benefit.
Perhaps the main use of partitioning is with a time-series where you want to delete "old" data. PARTITION BY RANGE with weekly or monthly partitions lets you very efficiently DROP PARTITION rather than DELETE. More in my blog.
(My answer applies to all versions through 5.7, not just your antique 5.1.)
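To illustrate the time-series case, here is a sketch with monthly partitions. The metrics table, its columns, and the dates are all invented for the example:

```sql
-- Hypothetical time-series table with one partition per month.
CREATE TABLE metrics (
  ts        DATETIME NOT NULL,
  sensor_id INT NOT NULL,
  reading   DOUBLE,
  KEY idx_sensor_ts (sensor_id, ts)
)
PARTITION BY RANGE (TO_DAYS(ts)) (
  PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
  PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
  PARTITION p2024_03 VALUES LESS THAN (TO_DAYS('2024-04-01')),
  PARTITION future   VALUES LESS THAN MAXVALUE
);

-- Purge January in one cheap metadata-style operation, instead of
-- DELETE FROM metrics WHERE ts < '2024-02-01', which works row by row.
ALTER TABLE metrics DROP PARTITION p2024_01;

-- Carve the next month out of the catch-all partition before data arrives.
ALTER TABLE metrics REORGANIZE PARTITION future INTO (
  PARTITION p2024_04 VALUES LESS THAN (TO_DAYS('2024-05-01')),
  PARTITION future   VALUES LESS THAN MAXVALUE
);
```

Keeping the catch-all partition empty makes the REORGANIZE step nearly free, since there are no rows to copy.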
I have a MySQL table with 12 columns, one primary key, and two unique keys. There are roughly 86,000 rows in this table.
I use this mysql code:
INSERT INTO table (col2,col3,-------col12) VALUES ($val2,$val3,----------$val12) ON DUPLICATE KEY UPDATE col2=VALUES(col2), col3=VALUES(col3),----------------col12=VALUES(col12)
When I view the structure of this table in cPanel's phpMyAdmin, I can see an 'Optimize table' link just below the table's index information. If I click the link, the table is optimized.
My question is why the 'Optimize table' link appears so frequently for this table (it comes back within 3 or 4 days), while the other tables in this database rarely show it (once a month, or even once every two months or more).
I am not deleting rows from this table, just inserting and, if a duplicate key is found, updating. So why is optimization required so frequently?
Short answer: switch to InnoDB.
The MyISAM storage engine uses B-trees for indexes and stores them in separate index files. Every time you insert a lot of data these indexes change, which is why you need to optimize the table to reorganize the indexes and regain some space.
MyISAM's indexing mechanism takes much more space than InnoDB's.
Read the link below
http://www.mysqlperformanceblog.com/2010/12/09/thinking-about-running-optimize-on-your-innodb-table-stop/
There are a lot of other advantages to InnoDB over MyISAM, but that is another topic.
I will explain how inserting records affects a MyISAM table and explain what optimizing does, so you'll understand why inserting records has such a large effect.
Data
With MyISAM, when you insert records, data is simply appended to the end of the data file.
Running optimize on a MyISAM table defrags the data, physically reordering it to match the order of the primary key index. This speeds up sequential record reads (and table scans).
Indexes
Inserting records also adds leaves to the B-Tree nodes in the index. If a node fills up, it must be split, in effect rebuilding at least that page of the index.
When optimizing a MyISAM table, the indexes are flattened out, allowing room for more expansion (insertion) before having to rebuild an index page. This flatter index also speeds searches.
Statistics
MySQL also stores statistics for each index about key distribution, and the query optimizer uses this information to help develop a good execution plan. Inserting (or deleting) many records causes these statistics to become out of date.
Optimizing a table recalculates its statistics after the defragging and rebuilding of the indexes.
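If you want to look at this from SQL rather than through phpMyAdmin, something along these lines works. The name my_table is a placeholder, and the assumption that phpMyAdmin's overhead figure comes from the Data_free column is mine:

```sql
-- Data_free reports allocated-but-unused bytes in the table's files.
SHOW TABLE STATUS LIKE 'my_table';

-- Defragment the data file, sort the index pages, and refresh statistics.
OPTIMIZE TABLE my_table;

-- Cheaper alternative when only the key-distribution statistics are stale.
ANALYZE TABLE my_table;
```

Checking Data_free before and after OPTIMIZE TABLE shows how much space the run actually recovered.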
vs. Appending
When you are appending data (adding a record with a higher primary key value such as with auto_increment), that data will not need to be later defragged since it will already be in the proper physical order. Also, when appending (inserting sequentially) into an index, the nodes are kept flat, so there's no rebuilding to be done there either.
vs. InnoDB
InnoDB suffers from the same issues when inserting, but since data is kept in order by primary key due to its clustered index, you take the hit up front (at the time it's inserted) for keeping the data in order, rather than having to defrag it later. Still, optimizing InnoDB does optimize the data by flattening out the B-tree nodes and freeing up unused (deleted) keys, which improves sequential reads (table scans), and secondary indexes are similar to indexes in MyISAM, so they get rebuilt to flatten them out.
Conclusion
I'm not trying to make a case to stick with MyISAM. InnoDB has superior read performance due to the clustered indexes, and better update and append performance due to the record level locking versus MyISAM's table locking (assuming concurrent users). Also, InnoDB has ACID.
Still, my goal was to answer your direct question and provide some technical details rather than conjecture and hearsay.
Neither database storage engine automatically optimizes itself.
I want to add partitioning to my InnoDB table. I have searched for the syntax, but have not found specifics.
Is this syntax wrong?
ALTER TABLE Product PARTITION BY HASH(catetoryID1) PARTITIONS 6
SUBPARTITION BY KEY(catetoryID2) SUBPARTITIONS 10;
Does SUBPARTITIONS 10 mean each main partition has 10 subpartitions, or does it mean all main partitions have 10 subpartitions divided among them?
It's strange that you didn't find the syntax. The MySQL online documentation lists quite detailed syntax for most common operations.
Look here for the overall syntax for working with partitions in ALTER TABLE:
http://dev.mysql.com/doc/refman/5.5/en/create-table.html
The syntax for partition management remains the same when used with the ALTER TABLE statement, with a few nuances that are listed on the ALTER TABLE syntax pages in the MySQL docs.
To answer your first question, the problem is not your syntax but that you are trying to subpartition a table whose top-level partitioning is by HASH. This is not allowed, at least in MySQL 5.5. Only RANGE or LIST partitions can be subpartitioned.
Look here for a complete list of partitioning types:
http://dev.mysql.com/doc/refman/5.5/en/partitioning-types.html
As for the second question, assuming what you were trying would work, you'd be creating 6 partitions hashed by catetoryID1, and within each of these you'd have 10 subpartitions hashed by catetoryID2. So in all you'd have
6 x 10 = 60 partitions
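For comparison, here is a sketch of a layout that MySQL does accept: RANGE at the top level with KEY subpartitions. The productID column and the range boundaries are invented for the example; only the table name and the two catetoryID columns come from the question:

```sql
-- 6 RANGE partitions, each split into 10 KEY subpartitions.
CREATE TABLE Product (
  productID   INT NOT NULL,
  catetoryID1 INT NOT NULL,
  catetoryID2 INT NOT NULL
)
PARTITION BY RANGE (catetoryID1)
SUBPARTITION BY KEY (catetoryID2)
SUBPARTITIONS 10 (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (200),
  PARTITION p2 VALUES LESS THAN (300),
  PARTITION p3 VALUES LESS THAN (400),
  PARTITION p4 VALUES LESS THAN (500),
  PARTITION p5 VALUES LESS THAN MAXVALUE
);

-- information_schema.PARTITIONS holds one row per subpartition,
-- so this count reflects the 6 x 10 layout.
SELECT COUNT(*) AS subpartitions
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'Product';
```

SUBPARTITIONS 10 applies to each partition individually, which is why the total multiplies rather than being shared out.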
Rules of thumb:
SUBPARTITION is useless. It provides no speed, and nothing else.
Due to various inefficiencies, don't have more than about 50 partitions.
PARTITION BY RANGE is the only useful one.
Often an INDEX can provide better performance than PARTITION; let's see your SELECT.
My blog on partitioning: http://mysql.rjweb.org/doc.php/partitionmaint
I have a table on which I mainly do updates, and I'm wondering whether update queries would benefit from an index on both the WHERE column and the updated column, or from an index on just the WHERE column.
Just on the where column. An index on the update column will actually slow down your query because the index has to be updated along with the data. An index on the where column will speed up updates, and selects, but slow down some insertions.
Indices also cause overhead when you delete rows. In general, though, they are a good thing on columns you use in WHERE a lot, and they are basically necessary on columns you join on or ORDER BY.
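A small sketch of that advice, using a made-up orders table. EXPLAIN on the equivalent SELECT is used to check the lookup, because EXPLAIN for UPDATE statements is only available from MySQL 5.6:

```sql
-- Hypothetical table: customer_id is the WHERE column, status is updated.
CREATE TABLE orders (
  id          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  status      VARCHAR(20) NOT NULL
);

-- Index only the column used to find the rows.
CREATE INDEX idx_customer ON orders (customer_id);

-- The key column of the plan shows whether idx_customer would be used.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- status is not indexed, so this touches the row data but no
-- secondary index needs maintenance.
UPDATE orders SET status = 'shipped' WHERE customer_id = 42;
```

If status were indexed as well, each of these updates would also have to modify that index.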
There is no straightforward answer to this one, so here goes.
UPDATE table SET ColumnA = 'something'
If an index exists on ColumnA, you will take a slight performance hit, as there will be two write operations for each row: first the data in the table, then the index update. You can even have several indexes that each include ColumnA, which means several writes in addition to the table row. You can see how having more than a few indexes can really start to slow your updates down.
But if ColumnA is not indexed at all then it will be a single write for each row only.
UPDATE table SET ColumnA = 'something' WHERE ColumnB = 'something else'
For this query, if an index exists on ColumnB and not on ColumnA, it will be very fast to locate the record (called a seek) and it takes a single write to update it; as the index doesn't involve ColumnA, it won't need updating. But if you index ColumnA and not ColumnB, you will first read every row in the table (called a scan, and normally a bad thing). A read is faster than a write, but scanning is still very slow. Then there is the write to the table and another write for the index. This is basically the slowest way of doing things.
DELETE FROM table WHERE ColumnB = 'somethingelse'
Now, if you have an index on any column in this table, there are two writes: the delete from the table and an update/delete of the record in the index. Again, if ColumnB is not indexed, you will scan the table, then delete the row(s) from the table and update the indexes, if any.
INSERT INTO table (ColumnA, ColumnB) VALUES ('something','something else')
If no indexes exist, a single write to the table and it's done.
Again, if indexes do exist, then an extra write for each one.
I haven't mentioned primary key and unique constraints, because you really can't get around them when you need a primary key, but before every insert the key must be checked to see whether a record with that key already exists. That is a fast primary key index seek, but it's nevertheless another step in the process. The fewer steps, the faster it will be.
Now back to your question. Basically, if you need to update a specific record, an index will help you locate that record faster than scanning the entire table. The time saved locating the record will be much more than the time lost updating the indexes. If you are only inserting and never reading, then indexes will slow you down. It becomes a balancing act. If you need to read specific records, an index will help immensely. But the more indexes, the slower the writes get.
Most people here don't know how indexes work in MySQL.
It depends on which storage engine you are using. InnoDB uses indexes completely differently from MyISAM. This is because MySQL implements indexes at the storage engine level, not at the MySQL server level.
I'm afraid most people here are giving you answers based on other databases in which indexes work differently from MySQL.
InnoDB
In the case of InnoDB, whenever a row is updated, the index has to be updated as well. InnoDB's indexes have to be kept sequential, so it has to find out which page of the index the entry belongs in and insert it there. At times that particular page may be full, so the page has to be split, which wastes space and takes more time. This happens no matter which column you index, because InnoDB uses clustered indexes, where the index stores the data of the entire row.
MyISAM
MyISAM does not have this problem. MyISAM actually uses only 1 column index, even though you can set multiple uniques on more than 1 column. Also, MyISAM's index is not stored sequentially, so updates are very quick. Inserts are quick as well, as MyISAM just appends the new row at the end of the data file.
Conclusion
So in regard to your question, you should consider your schema design instead of worrying about whether the query would use the indexes. If you are mostly updating a table, I suggest you not use InnoDB unless you need row-level locking, high concurrency, and transactions. Otherwise MyISAM would be much better for update tasks. And no, if you are using InnoDB, indexes do not really help with updating, especially if the table is very large.