I'm inserting a big volume of data in a table in Mysql, I need to create an index to access quickly to the data, however, I would like to know if there is a difference (in performance) between these scenarios:
Create an index and then insert all data
Insert all data and then create an index
thanks in advance!
For InnoDB storage engine, for the cluster index, it will be faster to specify the cluster index (i.e. PRIMARY KEY) on the table before inserting data.
This is because if a cluster index (PRIMARY KEY) is not defined on the table, then InnoDB will use a hidden 6-byte auto-incremented counter for the cluster index. If a PRIMARY KEY is later specified, the entire table will need to be rebuilt.
For secondary indexes (i.e. non-cluster indexes) with InnoDB, it is usually faster to insert data without secondary indexes defined, and then build the secondary indexes after the data is loaded.
FOLLOWUP
As far as the speed of loading to a table (in particular, a table that is truncated/emptied, and then reloaded), dropping and re-creating indexes is a well known technique for speeding up processing, not just with MySQL, but with other RDBMS such as Oracle.)
There isn't a guarantee that the processing will be faster; as with most things database, we need tests to determine which is faster.
For a table containing millions of rows, and we're adding a couple dozen hundred rows, then dropping and rebuilding indexes is likely going to be a lot slower, because of all of the extra work to re-index all of the existing rows. It would be faster to do the index maintenance while the rows are being inserted.
In terms of speeding up a load, the "drop and recreate indexes" technique isn't going to give us the kind of dramatic improvements we get from other changes. For example, it won't be anywhere near the improvement we would see by using LOAD DATA in place of INSERT statements, nor using multi-row INSERT statements vs a series of singleton INSERT statements.
Related
I have a table in my MySQL database with round 5M rows. Inserting rows to the table is too slow as MySQL updates index while inserting. How to stop index updating while inserting and do the indexing separately later?
Thanks
Kamrul
Sounds like your table might be over indexed. Maybe post your table definition here so we can have a look.
You have two choices:
Keep current indexes and remove unused indexes. If you have 3 indexes on a table for every single write to the table there will be 3 writes to the indexes. A index is only helpful during reads so you might want to remove unused indexes. During a load indexes will be updated which will slow down your load.
Drop you indexes before load then recreate them after load. You can drop your indexes before data load then insert and rebuild. The rebuild might take longer than the slow inserts. You will have to rebuild all indexes one by one. Also unique indexes can fail if duplicates are loaded during the load process without the indexes.
Now I suggest you take a good look at the indexes on the table and reduce them if they are not used in any queries. Then try both approaches and see what works for you. There is no way I know of in MySQL to disable indexes as they need the values insert to be written to their internal structures.
Another thing you might want to try it to split the IO over multiple drives i.e partition your table over several drives to get some hardware performance in place.
I have a mysql table with 12 columns, one primary key and two unique key. I have more or less 86000 rows/records in this table.
I use this mysql code:
INSERT INTO table (col2,col3,-------col12) VALUES ($val2,$val3,----------$val12) ON DUPLICATE KEY UPDATE col2=VALUES($val2), col3=VALUES($val3),----------------col12=VALUES($val12)
When I view the structure of this table from cpanel phpmyadmin, I can see 'Optimize Table' link just below the index information of the table. If I click the link, the table is optimized.
But my question is why I see the 'optimize table' link so frequently (within 3/4 days, it appears) in this table, while the other tables of this database do not show the optimize table link (They show the link once in a month or even once in every two months or more).
As I am not deleting this table row, just inserting and if duplicate key found, just updating, then why optimization is required so frequently?
Short Answer: switch to Innodb
MyISAM storage engine uses BTree for indexes and creates index files. Every time you insert a lot of data this indexes are changed and that is why you need to optimize your table to reorganize the indexes and regain some space.
MyISAM's indexing mechanism takes much more space compared to Innodb.
Read the link below
http://www.mysqlperformanceblog.com/2010/12/09/thinking-about-running-optimize-on-your-innodb-table-stop/
There are a lot of other advantages to Innodb over MyISAM but that is another topic.
I will explain how inserting records affects a MyISAM table and explain what optimizing does, so you'll understand why inserting records has such a large effect.
Data
With MyISAM, when you insert records, data is simply appended to the end of the data file.
Running optimize on a MyISAM table defrags the data, physically reordering it to match the order of the primary key index. This speeds up sequential record reads (and table scans).
Indexes
Inserting records also adds leaves to the B-Tree nodes in the index. If a node fills up, it must be split, in effect rebuilding at least that page of the index.
When optimizing a MyISAM table, the indexes are flattened out, allowing room for more expansion (insertion) before having to rebuild an index page. This flatter index also speeds searches.
Statistics
MySQL also stores statistics for each index about key distribution, and the query optimizer uses this information to help develop a good execution plan. Inserting (or deleting) many records causes these statistics to become out of date.
Optimizing MySQL recalculates the statistics for the table after the defragging and rebuilding of the indexes.
vs. Appending
When you are appending data (adding a record with a higher primary key value such as with auto_increment), that data will not need to be later defragged since it will already be in the proper physical order. Also, when appending (inserting sequentially) into an index, the nodes are kept flat, so there's no rebuilding to be done there either.
vs. InnoDB
InnoDB suffers from the same issues when inserting, but since data is kept in order by primary key due to its clustered index, you take the hit up front (at the time it's inserted) for keeping the data in order, rather than having to defrag it later. Still, optimizing InnoDB does optimize the data by flattening out the B-tree nodes and freeing up unused (deleted) keys, which improves sequential reads (table scans), and secondary indexes are similar to indexes in MyISAM, so they get rebuilt to flatten them out.
Conclusion
I'm not trying to make a case to stick with MyISAM. InnoDB has superior read performance due to the clustered indexes, and better update and append performance due to the record level locking versus MyISAM's table locking (assuming concurrent users). Also, InnoDB has ACID.
Still, my goal was to answer your direct question and provide some technical details rather than conjecture and hearsay.
Neither database storage engine automatically optimizes itself.
I have a table which I do mainly updates and I'm wondering if update queries would benefit from having an index on the where column and the updated column or an index on just where column?
Just on the where column. An index on the update column will actually slow down your query because the index has to be updated along with the data. An index on the where column will speed up updates, and selects, but slow down some insertions.
Indices also cause overhead when you delete rows. In general they are a good thing though on columns you are using WHERE on a lot, and they are basically necessary on columns you do joins on, or ORDER BY
Not a straight forward answer for this one. So here goes.
UPDATE table SET ColumnA = 'something'
if an index exists on ColumnA then you will have a slight performance hit as there will be two write operations for each row. First the data in the table and then the write for the index update.You can even have several indexes that each have ColumnA as part of the index which mean you will have several writes in addition to the table row. You can see how having more than a few indexes can start to really slow your updates down.
But if ColumnA is not indexed at all then it will be a single write for each row only.
UPDATE table SET ColumnA = 'something' WHERE ColumnB = 'something else'
For this query if an index exists on ColumnB and not on ColumnA, it will be very fast to locate the record (called a seek) and a single write to update, and as the index doesn't care about columnA, it wont need updating.But if you index ColumnA and not ColumnB, You will read every row in the table first (called a scan and normally a bad thing) which while a read is faster than a write it is still very slow, then it will write to the table and then another write for the index. Basically the slowest way of doing things.
DELETE table WHERE ColumnB = 'somethingelse'
Now if you have an index on any column in this table two writes, delete from table and a update/delete of the record in the index. Again if ColumnB is not indexed, you will scan the table then delete the row(s) from the table and update indexes if any.
INSERT INTO table (ColumnA, ColumnB) VALUES ('something','something else')
If no indexes exist, a single write to the table and it's done.
Again, if indexes do exist, then an extra write for each one.
I haven't mentioned the primary key unique constraints, because you really cant get around them when you need a primary key, but every record must be checked to see if something already exists with that key before insert. Which will be a fast primary key index seek, but nevertheless, its another step in the process. The less steps the faster it will be.
Now back to yours, Basically, if you need to update a specific record, an index will help you locate that record faster than scanning the entire table. The the time saved to locate the record will be much more then the time lost updating the indexes. If you are only inserting and never reading, then indexes will slow you down. It becomes a balance thing. If you need to read specific records, then an index will help immensely. But the more indexes, the slower the writes get.
Most people here don't know how indexes work in MySQL.
It depends on with storage engine you are using. InnoDB uses indexes completely different from MyISAM. This is because MySQL implements indexes on the storage engine level not the MySQL server level.
I'm afraid most people here are giving you answers based on other databases in which indexes work differently from MySQL.
InnoDB
In the case of InnoDB. This is because whenever a row is updated in InnoDB, the index has to be updated as well, as InnoDB's indexes have to be sequential, so it has to find out which page node of the index it is supposed to be in and inserted there. At times that particular page maybe full, so it has to split the page, wasting both space and increasing the time. This happens no matter which column you index because InnoDB uses clustered indexes, where the index stores the data of the entire row.
MyISAM
In the case of MyISAM, it does not have this problem. MyISAM actually uses only 1 column index, even though you can set multiple uniques on more than 1 column. Also MyISAM's index is not stored sequentially so updates are very quick. Likewise inserts are quick as well, as MyISAM just inserts it at the end of the row.
Conclusion
So in regard to your question, you should consider your schema design instead of worrying about whether the query would use the indexes. If you are updating mostly on a table, I suggest you not use InnoDB unless if you need row-level locking, high concurrency, and transactions. Otherwise MyISAM would be much better for update tasks. And no if you are using InnoDB indexes do not really help with updating, especially if the table is very large.
Is there any performance issues if you create an index with multiple columns, or should you do 1 index per column?
There's nothing inherently wrong with a multi-column index -- it depends completely on how you're going to query the data. If you have an index on colA+colB, it will help for queries like where colA='value' and colA='value' and colB='value' but it's not going to help for queries like colB='value'.
Advantages of MySQL Indexes
Generally speaking, MySQL indexing into database gives you three advantages:
Query optimization: Indexes make search queries much faster.
Uniqueness: Indexes like primary key index and unique index help to avoid duplicate row data.
Text searching: Full-text indexes in MySQL version 3.23.23, users have the opportunity to optimize searching against even large amounts of text located in any field indexed as such.
Disadvantages of MySQL indexes
When an index is created on the column(s), MySQL also creates a separate file that is sorted, and contains only the field(s) you're interested in sorting on.
Firstly, the indexes take up disk space. Usually the space usage isn’t significant, but because of creating index on every column in every possible combination, the index file would grow much more quickly than the data file. In the case when a table is of large table size, the index file could reach the operating system’s maximum file size.
Secondly, the indexes slow down the speed of writing queries, such as INSERT, UPDATE and DELETE. Because MySQL has to internally maintain the “pointers” to the inserted rows in the actual data file, so there is a performance price to pay in case of above said writing queries because every time a record is changed, the indexes must be updated. However, you may be able to write your queries in such a way that do not cause the very noticeable performance degradation.
I'm doing a lot of INSERTs via LOAD DATA INFILE on MySQL 5.0. After many inserts, say a few hundred millions rows (InnoDB, PK + a non-unique index, 64 bit Linux 4GB RAM, RAID 1), the inserts slow down considerably and appear IO bound. Are partitions in MySQL 5.1 likely to improve performance if the data flows into separate partition tables?
The previous answer is erroneous in his assumptions that this will decrease performance. Quite the contrary.
Here's a lengthy, but informative article and the why and how to do partitioning in MySQL:
http://dev.mysql.com/tech-resources/articles/partitioning.html
Partitioning is typically used, as was mentioned, to group like-data together. That way, when you decided to archive off or flat out destroy a partition, your tables do not become fragmented. This, however, does not hurt performance, it can actually increase it. See, it is not just deletions that fragment, updates and inserts can also do that. By partitioning the data, you are instructing the RDBMS the criteria (indeces) by which the data should be manipulated and queried.
Edit: SiLent SoNG is correct. DISABLE / ENABLE KEYS only works for MyISAM, not InnoDB. I never knew that, but I went and read the docs. http://dev.mysql.com/doc/refman/5.1/en/alter-table.html#id1101502.
Updating any indexes may be whats slowing it down. You can disable indexes while your doing your update and turn them back on so they can be generated once for the whole table.
ALTER TABLE foo DISABLE KEYS;
LOAD DATA INFILE ... ;
ALTER TABLE ENABLE KEYS;
This will cause the indexes to all be updated in one go instead of per-row. This also leads to more balanced BTREE indexes.
No improvement on MySQL 5.6
"MySQL can apply partition pruning to SELECT, DELETE, and UPDATE statements. INSERT statements currently cannot be pruned."
http://dev.mysql.com/doc/refman/5.6/en/partitioning-pruning.html
If the columns INSERT checks (primary keys, for instance) are indexed - then this will only decrease the speed: MySQL will have to additionally decide on partitioning.
All queries are only improved by adding indexes. Partitioning is useful when you have tons of very old data (e.g. year<2000) which is rarely used: then it'll be nice to create a partition for that data.
Cheers!