I have a table that already has a column with a BTREE index on it. Now I want to add a unique key constraint to the same column to avoid a race condition in my Rails app.
All the blogs and articles I found say I have to add a migration that creates a new unique index on that column, like below:
add_index :products, :key, unique: true
I want to understand:
What happens to the BTREE index that is already present? (I need this.)
Is it OK to have both indexes, and will they both work fine?
The table has around 30 million rows. Will adding the UNIQUE index lock the table and take a long time?
You don't need both indexes.
In MySQL's default storage engine InnoDB, a UNIQUE KEY index is also a BTREE. InnoDB only supports BTREE indexes, whether they are unique or not (it also supports fulltext indexes, but that's a different story).
So a unique index is also useful for searching and sorting, just like a non-unique index.
Building an index can lock the table, depending on your MySQL version and the ALTER TABLE options in effect. I suggest using an online schema change tool like pt-online-schema-change or gh-ost. We use the former at my company, and we run hundreds of schema changes per week on production tables without blocking access. Using one of these tools might make the change take longer, but we don't care because access to the table isn't limited while it's running.
What happens to the BTREE index that is already present? (I need this.)
Nothing. Creating a new index does not affect existing indexes.
Is it OK to have both indexes, and will they both work fine?
Two indexes on the same expression that differ only in uniqueness? That makes no sense.
It is recommended to remove the regular index once the unique one is created. This will save a lot of disk space. Additionally, when a regular and a unique index on (literally) the same expression both exist, the server will never use the regular index.
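As a rough sketch of that advice, the statements below first check for existing duplicates, then add the unique index and drop the old one in a single ALTER TABLE. The index names `index_products_on_key` and `index_products_on_key_unique` are assumptions (run `SHOW INDEX FROM products` to find the real name of the existing index), and the `ALGORITHM`/`LOCK` clauses assume InnoDB on MySQL 5.6 or later.

```sql
-- Find values that would violate the new unique index; this should return no rows.
-- NULLs are skipped because a MySQL unique index allows NULL to repeat.
-- `key` is a reserved word in MySQL, so it is quoted with backticks.
SELECT `key`, COUNT(*) AS copies
FROM products
WHERE `key` IS NOT NULL
GROUP BY `key`
HAVING COUNT(*) > 1;

-- Add the unique index and drop the redundant non-unique one in one statement.
-- Index names here are hypothetical.
ALTER TABLE products
  ADD UNIQUE INDEX index_products_on_key_unique (`key`),
  DROP INDEX index_products_on_key,
  ALGORITHM=INPLACE, LOCK=NONE;
```

If the server cannot perform the change with the requested `ALGORITHM` and `LOCK` levels, the statement stops with an error rather than falling back to a blocking rebuild.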
The table has around 30 million rows. Will adding the UNIQUE index lock the table and take a long time?
The table will be locked briefly at the start of the index creation process. But if index creation runs in parallel with create/update/delete (CUD) operations, both will be slower.
The time needed for index creation can be determined only in practice. Sometimes it cannot be even predicted.
Related
I have a not-so-big table, around 2M rows.
Because of a business rule, I had to add a new reference column to this table.
Right now the application is writing values to the column but not using it.
Now I need to update all NULL rows to the correct values, create an FK, and start using the column.
But this table gets a lot of reads, and when I run ALTER TABLE to add the FK, the table is locked and the read queries are blocked.
Is there any way to speed this up?
Does leaving all fields NULL help (since I think there would be no need to check whether the values are valid)?
Does creating an index beforehand help?
In Postgres I could create a NOT VALID FK and then validate it (which took only row locks, not a table lock). Is there anything similar in MySQL?
What's taking time is building the index. A foreign key requires an index. If there is already an index on the appropriate column(s), the FK will use it. If there is no index, then adding the FK constraint implicitly builds a new index. This takes a while, and the table is locked in the meantime.
Starting in MySQL 5.6, building an index should allow concurrent read and write queries. You can try to make this explicit:
ALTER TABLE mytable ADD INDEX (col1, col2), LOCK=NONE;
If this doesn't work (like if it gives an error because it doesn't recognize the LOCK=NONE syntax), then you aren't using a version of MySQL that supports online DDL. See https://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl-operations.html
If you can't build an index or define a foreign key without locking the table, then I suggest trying the free tool pt-online-schema-change. We use this at my job, and we make many schema changes per day in production, without blocking any queries.
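Putting the index-first approach together, here is a minimal sketch, assuming InnoDB on MySQL 5.6 or later. The column `parent_id`, the referenced table `parent`, and the index and constraint names are hypothetical. InnoDB only adds a foreign key in place when `foreign_key_checks` is disabled, and in that mode existing rows are not validated, so the sketch checks for orphaned rows by hand first.

```sql
-- 1. Build the supporting index online, before touching the constraint.
ALTER TABLE mytable
  ADD INDEX idx_mytable_parent_id (parent_id),
  ALGORITHM=INPLACE, LOCK=NONE;

-- 2. Look for rows whose reference has no match; fix these before step 3,
--    because the in-place ADD FOREIGN KEY below will not check them.
SELECT c.parent_id
FROM mytable AS c
LEFT JOIN parent AS p ON p.id = c.parent_id
WHERE c.parent_id IS NOT NULL
  AND p.id IS NULL
LIMIT 10;

-- 3. Add the foreign key in place. With foreign_key_checks enabled,
--    InnoDB would need the table-copying algorithm instead.
SET SESSION foreign_key_checks = 0;
ALTER TABLE mytable
  ADD CONSTRAINT fk_mytable_parent
    FOREIGN KEY (parent_id) REFERENCES parent (id),
  ALGORITHM=INPLACE, LOCK=NONE;
SET SESSION foreign_key_checks = 1;
```

This is loosely comparable to the Postgres NOT VALID approach, except that MySQL has no later validation step: any orphaned rows present when the constraint is added stay unchecked.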
I'm inserting a large volume of data into a MySQL table, and I need an index for fast access to the data. I would like to know whether there is a performance difference between these scenarios:
Create an index and then insert all data
Insert all data and then create an index
thanks in advance!
For the InnoDB storage engine, it is faster to define the clustered index (i.e. the PRIMARY KEY) on the table before inserting data.
This is because if no PRIMARY KEY is defined (and the table has no UNIQUE index on NOT NULL columns to use instead), InnoDB uses a hidden 6-byte auto-incremented row ID as the clustered index. If a PRIMARY KEY is specified later, the entire table has to be rebuilt.
For secondary indexes (i.e. non-clustered indexes) with InnoDB, it is usually faster to insert data without secondary indexes defined, and then build the secondary indexes after the data is loaded.
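The ordering described above might look like the following sketch. The table `events`, its columns, and the file path `/tmp/events.csv` are all hypothetical, and the server must be configured to allow reading files from that location.

```sql
-- Declare the PRIMARY KEY up front so the clustered index exists before the load.
CREATE TABLE events (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id    INT UNSIGNED NOT NULL,
  created_at DATETIME NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Load the data while the table has no secondary indexes to maintain.
LOAD DATA INFILE '/tmp/events.csv'
INTO TABLE events
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(user_id, created_at);

-- Build the secondary indexes once, after the rows are in place.
ALTER TABLE events
  ADD INDEX idx_events_user_id (user_id),
  ADD INDEX idx_events_created_at (created_at);
```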
FOLLOWUP
As for the speed of loading a table (in particular, a table that is truncated or emptied and then reloaded), dropping and re-creating indexes is a well-known technique for speeding up processing, not just with MySQL but with other RDBMSs such as Oracle.
There isn't a guarantee that the processing will be faster; as with most things database, we need tests to determine which is faster.
If a table already contains millions of rows and we're adding only a couple hundred rows, then dropping and rebuilding the indexes is likely to be a lot slower, because of all the extra work to re-index the existing rows. In that case it is faster to do the index maintenance while the rows are being inserted.
In terms of speeding up a load, the "drop and recreate indexes" technique isn't going to give us the kind of dramatic improvements we get from other changes. For example, it won't be anywhere near the improvement we would see by using LOAD DATA in place of INSERT statements, nor using multi-row INSERT statements vs a series of singleton INSERT statements.
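To illustrate the difference between singleton and multi-row INSERT statements, here is a small sketch using a hypothetical `measurements` table. Both forms store the same three rows; the multi-row form does it in one statement and one round trip.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE IF NOT EXISTS measurements (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  sensor_id INT UNSIGNED NOT NULL,
  reading   DECIMAL(8,2) NOT NULL
) ENGINE=InnoDB;

-- A series of singleton INSERT statements: one statement per row.
INSERT INTO measurements (sensor_id, reading) VALUES (1, 20.50);
INSERT INTO measurements (sensor_id, reading) VALUES (1, 20.75);
INSERT INTO measurements (sensor_id, reading) VALUES (2, 19.10);

-- The same rows as a single multi-row INSERT statement.
INSERT INTO measurements (sensor_id, reading) VALUES
  (1, 20.50),
  (1, 20.75),
  (2, 19.10);
```

As with the index question itself, the actual gain depends on the workload, so it is worth timing both forms against your own data.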
I'm in the process of moving a SQL Server database to MariaDB.
As part of that I'm now working on index naming, and I have to modify some names because they are longer than 64 characters.
That got me wondering: in MariaDB, are indexes stored at the table level, or at the database level like on SQL Server?
To put the question another way, do index names need to be unique per database or per table?
The storage engine I'm using is InnoDB.
Index names (in MySQL) are almost useless. About the only use is for DROP INDEX, which is rarely done. So, I recommend spending very little time on naming indexes. The names only need to be unique within the table.
The PRIMARY KEY (which has no other name than that) is "clustered" with the data. That is, the PK and the data are in the same BTree.
Each secondary key is a separate BTree. The BTree is sorted according to the column(s) specified. The leaf node 'records' contain the columns of the PK, thereby providing a way to get to the actual record.
FULLTEXT and SPATIAL indexes work differently.
PARTITIONing... First of all, partitioning is rarely useful. But if you have any partitioned tables, then here are some details about indexes. A Partitioned table is essentially a collection of sub-tables, each identical (including index names). There is no "global index" across the table; each index for a sub-table refers only to the sub-table.
Keys belong to a table, not a database.
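A quick way to see the per-table scope of index names is to reuse one name on two tables. The tables `customers` and `suppliers` and the index name `idx_email` below are made up for the illustration.

```sql
CREATE TABLE customers (
  id    INT NOT NULL PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  INDEX idx_email (email)
) ENGINE=InnoDB;

-- The same index name on a different table is accepted.
CREATE TABLE suppliers (
  id    INT NOT NULL PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  INDEX idx_email (email)
) ENGINE=InnoDB;

-- List the index names per table in the current database.
SELECT TABLE_NAME, INDEX_NAME, COLUMN_NAME
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('customers', 'suppliers');

-- Reusing the name within one table is rejected with a "Duplicate key name" error:
-- ALTER TABLE customers ADD INDEX idx_email (id);
```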
I have a MySQL table with 12 columns, one primary key, and two unique keys. The table has roughly 86,000 rows.
I use this mysql code:
INSERT INTO table (col2,col3,-------col12) VALUES ($val2,$val3,----------$val12) ON DUPLICATE KEY UPDATE col2=VALUES(col2), col3=VALUES(col3),----------------col12=VALUES(col12)
When I view the structure of this table from cpanel phpmyadmin, I can see 'Optimize Table' link just below the index information of the table. If I click the link, the table is optimized.
My question is why the 'Optimize Table' link appears so frequently for this table (every 3 or 4 days), while the other tables in this database rarely show it (once a month, or even once every two months or more).
I am not deleting rows from this table, only inserting and, when a duplicate key is found, updating. So why is optimization required so frequently?
Short answer: switch to InnoDB.
The MyISAM storage engine uses B-trees for indexes and stores them in index files. Every time you insert a lot of data, these indexes change, which is why you need to optimize the table to reorganize the indexes and regain some space.
MyISAM's indexing mechanism takes much more space compared to InnoDB.
Read the link below
http://www.mysqlperformanceblog.com/2010/12/09/thinking-about-running-optimize-on-your-innodb-table-stop/
There are a lot of other advantages of InnoDB over MyISAM, but that is another topic.
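For reference, checking the current engine and converting the table could look like the sketch below. The table name `my_table` is a placeholder, and since the conversion rebuilds the whole table, it is worth trying on a copy first.

```sql
-- See which storage engine the table uses and how much reclaimable
-- space (DATA_FREE) the server currently reports for it.
SELECT TABLE_NAME, ENGINE, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'my_table';

-- Convert the table to InnoDB (this rebuilds the table).
ALTER TABLE my_table ENGINE=InnoDB;
```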
I will explain how inserting records affects a MyISAM table and explain what optimizing does, so you'll understand why inserting records has such a large effect.
Data
With MyISAM, when you insert records, data is simply appended to the end of the data file.
Running OPTIMIZE TABLE on a MyISAM table defragments the data file, repairing deleted or split rows and reclaiming unused space. This speeds up sequential record reads (and table scans).
Indexes
Inserting records also adds leaves to the B-Tree nodes in the index. If a node fills up, it must be split, in effect rebuilding at least that page of the index.
When optimizing a MyISAM table, the indexes are flattened out, allowing room for more expansion (insertion) before having to rebuild an index page. This flatter index also speeds searches.
Statistics
MySQL also stores statistics for each index about key distribution, and the query optimizer uses this information to help develop a good execution plan. Inserting (or deleting) many records causes these statistics to become out of date.
Optimizing a table recalculates its statistics after the data has been defragmented and the indexes rebuilt.
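The maintenance steps described in the sections above map onto a few statements, sketched here with a placeholder table name `my_table`.

```sql
-- Defragment the data file, sort the index pages, and refresh statistics.
OPTIMIZE TABLE my_table;

-- Refresh only the key-distribution statistics, without the full optimize.
ANALYZE TABLE my_table;

-- Inspect the per-index cardinality estimates the optimizer works from.
SHOW INDEX FROM my_table;
```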
vs. Appending
When you are appending data (adding a record with a higher primary key value such as with auto_increment), that data will not need to be later defragged since it will already be in the proper physical order. Also, when appending (inserting sequentially) into an index, the nodes are kept flat, so there's no rebuilding to be done there either.
vs. InnoDB
InnoDB suffers from the same issues when inserting, but since data is kept in order by primary key due to its clustered index, you take the hit up front (at the time it's inserted) for keeping the data in order, rather than having to defrag it later. Still, optimizing InnoDB does optimize the data by flattening out the B-tree nodes and freeing up unused (deleted) keys, which improves sequential reads (table scans), and secondary indexes are similar to indexes in MyISAM, so they get rebuilt to flatten them out.
Conclusion
I'm not trying to make a case to stick with MyISAM. InnoDB has superior read performance due to the clustered indexes, and better update and append performance due to the record level locking versus MyISAM's table locking (assuming concurrent users). Also, InnoDB has ACID.
Still, my goal was to answer your direct question and provide some technical details rather than conjecture and hearsay.
Neither database storage engine automatically optimizes itself.
I am running some MySQL queries on a pretty large table (not on Facebook scale, but around a million rows), and I am finding them very slow. The reason, I suspect, is that I am querying on an id field, but that id has not been declared as primary key, and also no index has been declared.
I cannot make the id field the primary key because it is not unique, although its selectivity is pretty close to 1 (nearly every value is distinct). Under these circumstances, if I run an ALTER TABLE to add an index on the id field, should that speed up the queries, given that it is not a primary key?
And supposing it does, how long will it take for the index to be fully built so that queries start running quickly? Is it ready the moment the prompt returns after the ALTER TABLE, or does index building continue internally for some time after the prompt appears? (I am asking before doing it because I am not sure whether declaring an index on a non-unique field can corrupt the database.)
Any index will speed up queries that match on the corresponding column. There's no significant difference between the primary key and other indexes in this regard.
The index is built while the ALTER TABLE statement runs. When the prompt returns, the index is complete and will be used. Nothing is corrupted while this is happening, and indexing a non-unique column is perfectly safe.
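A minimal sketch of adding the index and confirming that it is used; the table name `my_table`, the index name `idx_my_table_id`, and the lookup value are placeholders.

```sql
-- Add a non-unique index; the statement returns only after the index is built.
ALTER TABLE my_table ADD INDEX idx_my_table_id (id);

-- Check the plan: the "key" column of the EXPLAIN output should name the
-- new index when the optimizer chooses it for this lookup.
EXPLAIN SELECT * FROM my_table WHERE id = 12345;
```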