I am running some MySQL queries on a pretty large table (not Facebook scale, but around a million rows), and I am finding them very slow. I suspect the reason is that I am querying on an id field that has not been declared as the primary key and has no index either.
I cannot make the id field the primary key, because it is not unique, although it is very nearly unique. Under these circumstances, if I run an ALTER TABLE to add an index on the id field, will it speed up the queries, given that it is not a primary key?
And supposing it does, how long will it take for the index to be fully built so that the queries start running quickly? Is it ready the moment the prompt returns after the ALTER TABLE, or does the index building continue internally for quite some time even after the prompt appears? (I am asking before doing it because I am not sure whether declaring an index on a non-unique field could corrupt the database.)
Any index will speed up queries that match on the corresponding column. There's no significant difference between the primary key and other indexes in this regard.
The index is created immediately when you execute the ALTER TABLE query. When the prompt returns, the index is there and will be used. There's no corruption while this is happening.
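You can see the effect on a throwaway copy before touching the real table. The sketch below uses Python's bundled sqlite3 module purely as a stand-in for MySQL. That is an assumption: SQLite's planner and its EXPLAIN QUERY PLAN text differ from MySQL's EXPLAIN, but the principle is the same. The table `records` and the index `idx_records_id` are hypothetical. The sketch shows two things: a plain, non-unique index accepts duplicate ids without complaint, and it turns a full scan into an index search.

```python
import sqlite3

# Hypothetical table standing in for the MySQL one; SQLite is used only
# because it ships with Python, so the plan text differs from MySQL's EXPLAIN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, payload TEXT)")  # id is NOT a primary key

rows = [(i, f"row {i}") for i in range(10_000)]
rows.append((42, "duplicate id"))  # ids are almost, but not quite, unique
conn.executemany("INSERT INTO records VALUES (?, ?)", rows)


def plan(sql, params=()):
    """Join the 'detail' field (last column) of SQLite's EXPLAIN QUERY PLAN rows."""
    steps = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(step[-1]) for step in steps)


query = "SELECT payload FROM records WHERE id = ?"
before = plan(query, (42,))  # no index yet: full table scan

# A plain (non-unique) index: duplicate ids are allowed and nothing is corrupted.
conn.execute("CREATE INDEX idx_records_id ON records (id)")
after = plan(query, (42,))   # now an index search

matches = conn.execute(query, (42,)).fetchall()
print(before)
print(after)
print(len(matches), "rows with id 42")
```

`before` reports a SCAN of `records`, and `after` reports a SEARCH using `idx_records_id`. Both rows with id 42 are still returned.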
I have a table that already has a BTREE index on one column. Now I want to add a unique key constraint to the same column to avoid a race condition in my Rails app.
All the blogs/articles I've found say I have to add a migration that creates a new unique index on that column, like this:
add_index :products, :key, unique: true
I want to understand:
What happens to the BTREE index that is already present? (I need this.)
Is it OK to have both indexes, and will they both work fine?
The table has around 30M rows. Will adding this UNIQUE index lock the table and take a huge amount of time?
You don't need both indexes.
In MySQL's default storage engine InnoDB, a UNIQUE KEY index is also a BTREE. InnoDB only supports BTREE indexes, whether they are unique or not (it also supports fulltext indexes, but that's a different story).
So a unique index is also useful for searching and sorting, just like a non-unique index.
Building an index may lock the table. I suggest using an online schema change tool like pt-online-schema-change or gh-ost. We use the former at my company, and we run hundreds of schema changes per week on production tables without blocking access. Using one of these tools may actually make the change take longer, but we don't care, because access isn't restricted while it runs.
What happens to the BTREE index that is already present? (I need this.)
Nothing. Creating a new index does not affect existing indexes.
Is it OK to have both indexes, and will they both work fine?
Two indexes on the same expression that differ only in uniqueness? That makes no sense.
It is recommended to remove the regular index once the unique one has been created. This will save a lot of disk space. Additionally, when a regular and a unique index on the same expression (literally!) both exist, the server will practically never use the regular one.
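The recommended sequence is: add the unique index, then drop the plain one. Here is a minimal sketch of it, again using Python's sqlite3 as a stand-in for MySQL (an assumption; SQLite's plan output is not MySQL's). The table and index names are hypothetical. The sketch checks that lookups are still served by an index once only the unique one remains, and that the unique index rejects duplicates, which is the guarantee the app needs.

```python
import sqlite3

# SQLite stand-in for the Rails/MySQL setup; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, key TEXT)")
conn.execute("CREATE INDEX idx_products_key ON products (key)")  # the existing plain index

# Add the unique index, then drop the now-redundant plain one.
conn.execute("CREATE UNIQUE INDEX idx_products_key_unique ON products (key)")
conn.execute("DROP INDEX idx_products_key")

conn.executemany("INSERT INTO products (key) VALUES (?)", [("a",), ("b",)])

# Lookups are still served by an index: the unique one.
steps = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM products WHERE key = ?", ("a",)
).fetchall()
plan = " | ".join(str(step[-1]) for step in steps)

# The unique index also enforces the constraint.
try:
    conn.execute("INSERT INTO products (key) VALUES (?)", ("a",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan)
print("duplicate rejected:", duplicate_rejected)
```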
The table has around 30M rows. Will adding this UNIQUE index lock the table and take a huge amount of time?
The table will be locked only briefly at the start of the index creation. But if the index build runs alongside concurrent INSERT, UPDATE and DELETE operations, both the build and those operations will be slower.
The time needed for index creation can only be determined in practice. Sometimes it cannot even be predicted.
I have a not-so-big table, around 2M rows.
Because of a business rule, I had to add a new reference column to this table.
Right now the application writes values to the column but does not use it.
Now I need to update all NULL rows to the correct values, create an FK, and start using the column.
But this table gets a lot of reads, and when I try to ALTER TABLE to add the FK, the table is locked and the read queries are blocked.
Is there any way to speed this up?
Would leaving all the fields as NULL speed it up (since I think there would then be no need to check whether the values are valid)?
Would creating an index first speed it up?
In Postgres I could create a NOT VALID FK and then validate it (which took only row locks, not a table lock). Is there anything similar in MySQL?
What's taking time is building the index. A foreign key requires an index. If there is already an index on the appropriate column(s), the FK will use it. If there is no index, then adding the FK constraint implicitly builds a new index. This takes a while, and the table is locked in the meantime.
Starting in MySQL 5.6, building an index should allow concurrent read and write queries. You can try to make this explicit:
ALTER TABLE mytable ADD INDEX (col1, col2), LOCK=NONE;
If this doesn't work (for example, if it gives an error because it doesn't recognize the LOCK=NONE syntax), then you aren't using a version of MySQL that supports online DDL. See https://dev.mysql.com/doc/refman/5.6/en/innodb-online-ddl-operations.html
If you can't build an index or define a foreign key without locking the table, then I suggest trying the free tool pt-online-schema-change. We use this at my job, and we make many schema changes per day in production, without blocking any queries.
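pt-online-schema-change avoids blocking because it never alters the live table in place. Below is a minimal in-memory sketch of its general copy-and-swap idea. The class and function names are hypothetical, and this is a toy model rather than the tool's code. The steps are:

1. Triggers on the original table mirror every write into a new table that has the changed schema.
2. Existing rows are copied over in small chunks. This works roughly like INSERT IGNORE, so a row the triggers already wrote wins.
3. At the end, the tables are swapped. The real tool does this with an atomic RENAME TABLE.

gh-ost follows the same overall plan but reads changes from the binary log instead of using triggers.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Trigger = Callable[[str, int, Optional[Row]], None]


class ToyTable:
    """In-memory stand-in for a table: primary key -> row, plus AFTER-write triggers."""

    def __init__(self, rows: Optional[Dict[int, Row]] = None) -> None:
        self.rows: Dict[int, Row] = {pk: dict(r) for pk, r in (rows or {}).items()}
        self.triggers: List[Trigger] = []

    def upsert(self, pk: int, row: Row) -> None:
        self.rows[pk] = dict(row)
        for fire in self.triggers:
            fire("upsert", pk, self.rows[pk])

    def delete(self, pk: int) -> None:
        self.rows.pop(pk, None)
        for fire in self.triggers:
            fire("delete", pk, None)


def online_alter(old: ToyTable,
                 transform: Callable[[Row], Row],
                 chunk_size: int,
                 between_chunks: Callable[[ToyTable], None]) -> ToyTable:
    """Copy `old` into a table with the new schema while `old` stays writable."""
    new = ToyTable()

    # 1. Triggers mirror every write on the old table into the new one.
    def mirror(op: str, pk: int, row: Optional[Row]) -> None:
        if op == "upsert" and row is not None:
            new.rows[pk] = transform(row)
        else:
            new.rows.pop(pk, None)

    old.triggers.append(mirror)

    # 2. Copy pre-existing rows in small chunks; the application keeps
    #    writing in between (simulated by `between_chunks`).
    pks = sorted(old.rows)
    for start in range(0, len(pks), chunk_size):
        for pk in pks[start:start + chunk_size]:
            if pk in old.rows:  # skip rows deleted since the key list was taken
                # setdefault ~ INSERT IGNORE: a row the trigger already wrote wins
                new.rows.setdefault(pk, transform(old.rows[pk]))
        between_chunks(old)

    # 3. Cut over: stop mirroring and hand back the new table (the real tool
    #    swaps the two tables with an atomic RENAME TABLE at this point).
    old.triggers.remove(mirror)
    return new
```

Suppose that, between two chunks, the application updates a row that was already copied, deletes one that wasn't, and inserts a brand-new one. The new table still ends up matching the old one. The triggers handle the update and the insert, and the chunk copy skips the deleted row.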
I have a table in a MariaDB database for which no primary key is defined. However, it has an index. I'd like to add a primary key with the same definition as that index. The naïve way might be:
alter table `foo` add primary key (`bar`, `baz`),
drop index `qux`;
...but that will take a very long time and seems wasteful. (The table is tens of gigabytes in size and is running on a machine with less free disk space than the total size of the table.) I realize an index and a primary key aren't the same thing (at the very least, the primary key includes a uniqueness constraint which must be checked during the creation process), but is there any way to use the index to “bootstrap” the primary key?
Assuming the table is ENGINE=InnoDB...
If there is not enough free space on disk for another copy of the table, the task cannot be performed without the help of a second server. Can you drop some tables? Or otherwise free up space?
A PRIMARY KEY is UNIQUE and is an index. If the combination of bar and baz is not unique, you should not turn it into the PK.
Using a PK to look up a single row is faster than using a secondary index. That is because a secondary-index lookup first finds the entry in the secondary index's BTree. There it finds the PRIMARY KEY, which is then used to find the row in the data's BTree.
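Here is a minimal sketch of why the extra hop matters. Sorted Python lists searched with the standard bisect module stand in for B-trees, and each stand-in counts its own searches. The names are hypothetical, and real InnoDB lookups walk pages rather than lists.

- Before the change, the clustered index is keyed on InnoDB's hidden row id. A lookup by (bar, baz) therefore needs one search in index qux and a second in the clustered index.
- With (bar, baz) as the PRIMARY KEY, one search suffices.

```python
import bisect
from typing import Any, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Any]


class SortedIndex:
    """Sorted keys searched with bisect, standing in for one B-tree."""

    def __init__(self, entries: Iterable[Tuple[Any, Any]]) -> None:
        pairs = sorted(entries, key=lambda kv: kv[0])  # sort on the key only
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]
        self.searches = 0  # number of root-to-leaf descents performed

    def find(self, key: Any) -> Optional[Any]:
        self.searches += 1
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


def layout_without_pk(rows: List[Row]) -> Tuple[SortedIndex, SortedIndex]:
    """No declared PK: rows are clustered on a hidden row id; qux maps (bar, baz) -> row id."""
    clustered = SortedIndex((row_id, r) for row_id, r in enumerate(rows, start=1))
    qux = SortedIndex(((r["bar"], r["baz"]), row_id) for row_id, r in enumerate(rows, start=1))
    return clustered, qux


def lookup_without_pk(clustered: SortedIndex, qux: SortedIndex, bar: Any, baz: Any) -> Optional[Row]:
    row_id = qux.find((bar, baz))  # search 1: secondary index yields the row id
    if row_id is None:
        return None
    return clustered.find(row_id)  # search 2: the row id yields the row


def layout_with_pk(rows: List[Row]) -> SortedIndex:
    """(bar, baz) is the PRIMARY KEY: the rows themselves are clustered on it."""
    return SortedIndex(((r["bar"], r["baz"]), r) for r in rows)


def lookup_with_pk(clustered: SortedIndex, bar: Any, baz: Any) -> Optional[Row]:
    return clustered.find((bar, baz))  # a single search reaches the row
```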
If the table is bigger than innodb_buffer_pool_size, your change would also (in many cases) eliminate a disk hit. (Disk hits are the slowest part of database operations.)
Yes, there is currently a PRIMARY KEY on your table. It is a 6-byte hidden 'column'. Your ALTER would throw that away, thereby making the table a little smaller (another small benefit).
Do you have innodb_file_per_table=ON (or =1)? If the table is in its own .ibd file, you will recover the disk space after the operation (assuming it can run at all). With OFF, it will increase the size of the ibdata1 file, but fail to shrink it back. Have it ON when creating tables that will eventually be 'big'.
OK, there may be hope. If you are running with OFF, and there is enough space in ibdata1, then the task may complete. (But that means, as alluded to above, that you have already bloated ibdata1.)
I have a table on which I mainly run updates, and I'm wondering whether update queries would benefit more from an index on both the WHERE column and the updated column, or from an index on just the WHERE column.
Just on the WHERE column. An index on the updated column will actually slow down your query, because the index has to be updated along with the data. An index on the WHERE column will speed up updates and selects, but slow down some insertions.
Indexes also add overhead when you delete rows. In general, though, they are a good thing on columns you use in WHERE clauses a lot, and they are basically necessary on columns you join on or ORDER BY.
There's no straightforward answer for this one, so here goes.
UPDATE table SET ColumnA = 'something'
If an index exists on ColumnA, you will take a slight performance hit, as there will be two write operations for each row: first the data in the table, then the write for the index update. You can even have several indexes that each include ColumnA, which means several writes in addition to the table row. You can see how having more than a few indexes can really slow your updates down.
But if ColumnA is not indexed at all then it will be a single write for each row only.
UPDATE table SET ColumnA = 'something' WHERE ColumnB = 'something else'
For this query, if an index exists on ColumnB but not on ColumnA, locating the record will be very fast (this is called a seek), followed by a single write for the update. Since the index doesn't include ColumnA, it won't need updating. But if you index ColumnA and not ColumnB, you will first read every row in the table (called a scan, and normally a bad thing). A read is faster than a write, but a scan is still very slow, and it is followed by a write to the table and another write for the index. That is basically the slowest way of doing things.
DELETE FROM table WHERE ColumnB = 'something else'
Now, if you have an index on any column in this table, there are two writes: the delete from the table and an update/delete of the record in the index. Again, if ColumnB is not indexed, you will scan the table, then delete the row(s) from the table and update the indexes, if any.
INSERT INTO table (ColumnA, ColumnB) VALUES ('something','something else')
If no indexes exist, a single write to the table and it's done.
Again, if indexes do exist, then an extra write for each one.
I haven't mentioned primary key unique constraints, because you really can't get around them when you need a primary key. But before each insert, the server must check whether a record with that key already exists. That check is a fast primary key index seek, but it's still another step in the process, and the fewer steps, the faster it will be.
Now, back to your question. Basically, if you need to update a specific record, an index will help you locate that record faster than scanning the entire table. The time saved locating the record will be much more than the time lost updating the indexes. If you are only inserting and never reading, indexes will slow you down. It becomes a balancing act: if you need to read specific records, an index will help immensely, but the more indexes you have, the slower the writes get.
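The per-statement write counting above can be condensed into a toy cost model. This is a hypothetical sketch, not how MySQL estimates costs. It ignores pages, caching, locking and the optimizer, and counts only two things:

- rows read: a seek reads only the matching rows, while a scan reads them all;
- writes: one per updated row, plus one per index that contains the updated column.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    rows_read: int  # rows examined to find the target rows
    writes: int     # table-row writes plus index-maintenance writes


def update_cost(n_rows: int, indexes: list[tuple[str, ...]], set_col: str,
                where_col: str | None = None, matching: int = 0) -> Cost:
    """Toy cost of: UPDATE t SET set_col = ... [WHERE where_col = ...].

    `indexes` lists each index's columns, leftmost first. Pages, caching,
    locking and the optimizer are all ignored.
    """
    if where_col is None:
        targets, rows_read = n_rows, n_rows          # every row is updated
    else:
        targets = matching
        can_seek = any(ix[0] == where_col for ix in indexes)
        rows_read = targets if can_seek else n_rows  # seek vs. full scan
    touched = sum(1 for ix in indexes if set_col in ix)  # indexes holding set_col
    return Cost(rows_read=rows_read, writes=targets * (1 + touched))
```

Take a million-row table where one row matches. Updating ColumnA with only ColumnB indexed reads 1 row and does 1 write. With only ColumnA indexed, the same update reads all 1,000,000 rows and does 2 writes.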
Most people here don't know how indexes work in MySQL.
It depends on which storage engine you are using. InnoDB uses indexes completely differently from MyISAM. This is because MySQL implements indexes at the storage engine level, not at the MySQL server level.
I'm afraid most people here are giving you answers based on other databases in which indexes work differently from MySQL.
InnoDB
In the case of InnoDB, indexes slow updates down. Whenever a row is updated in InnoDB, the indexes have to be updated as well. InnoDB's indexes are kept in sorted order, so it has to find out which index page the entry belongs in and insert it there. Sometimes that page is full, so it has to split the page, which both wastes space and takes extra time. This happens no matter which column you index, because InnoDB uses clustered indexes, where the clustered (primary key) index stores the data of the entire row.
MyISAM
In the case of MyISAM, it does not have this problem. MyISAM actually uses only a 1-column index, even though you can define multiple UNIQUE keys on more than one column. Also, MyISAM's index is not stored sequentially, so updates are very quick. Likewise, inserts are quick as well, as MyISAM just appends the row at the end of the table.
Conclusion
So, regarding your question, you should consider your schema design instead of worrying about whether the query will use the indexes. If you are mostly updating a table, I suggest you not use InnoDB unless you need row-level locking, high concurrency, and transactions. Otherwise, MyISAM would be much better for update tasks. And no: if you are using InnoDB, indexes do not really help with updating, especially if the table is very large.
One day I suspect I'll have to learn Hadoop and transfer all this data to a non-structured database, but I'm surprised to see performance degrade so significantly in such a short period of time.
I have a mysql table with just under 6 million rows.
I am doing a very simple query on this table, and believe I have all the correct indexes in place.
The query is:
SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date
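EXPLAIN returns:
+----+-------------+-------------+-------+---------------+----------+---------+------+--------+-------------+
| id | select_type | table       | type  | possible_keys | key      | key_len | ref  | rows   | Extra       |
+----+-------------+-------------+-------+---------------+----------+---------+------+--------+-------------+
| 1  | SIMPLE      | updateshows | range | date_idx      | date_idx | 7       | NULL | 648997 | Using where |
+----+-------------+-------------+-------+---------------+----------+---------+------+--------+-------------+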
So I am using the correct index as far as I can tell, but this query takes 11 seconds to run.
The table uses MyISAM, and phpMyAdmin says it is 1.0 GiB.
Any ideas here?
Edit:
date_idx indexes both the date and venid columns. Should those be two separate indexes?
What you want to ensure is that the query uses ONLY the index, so make sure the index covers all the fields you are selecting. Also, since a range condition is involved, you need venid first in the index, because it is compared against a constant. I would therefore create an index like so:
ALTER TABLE events ADD INDEX indexNameHere (venid, date, time);
With this index, all the information needed to complete the query is in the index. This means that, hopefully, the storage engine can fetch the information without actually seeking inside the table itself. However, MyISAM might not be able to do this, since it doesn't store the data in the leaves of its indexes, so you might not get the speed increase you want. If that's the case, try creating a copy of the table that uses the InnoDB engine, repeat the same steps there, and see whether you get a significant speed increase. InnoDB does store the field values in the index leaves, and it allows covering indexes.
Now, hopefully you'll see the following when you explain the query:
mysql> EXPLAIN SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date;
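+----+-------------+--------+-------+-------------------------+---------------+------+--------------------------+
| id | select_type | table  | type  | possible_keys           | key           | [..] | Extra                    |
+----+-------------+--------+-------+-------------------------+---------------+------+--------------------------+
| 1  | SIMPLE      | events | range | date_idx, indexNameHere | indexNameHere | [..] | Using index, Using where |
+----+-------------+--------+-------+-------------------------+---------------+------+--------------------------+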
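The column-order argument can be checked cheaply. The sketch below uses Python's bundled sqlite3 purely as a stand-in for MySQL. That is an assumption: SQLite's planner is not MySQL's, and its EXPLAIN QUERY PLAN text looks different. The events table and the index name `idx_events` are hypothetical.

- With the range column (date) leading the index, venid cannot be used to narrow the index search.
- With (venid, date, time), the plan searches on venid and date using a covering index, and it needs no separate sort for ORDER BY date.

```python
import sqlite3

QUERY = ("SELECT date, time FROM events "
         "WHERE venid = ? AND date >= ? ORDER BY date")


def plan_with_index(columns):
    """Build a throwaway events table with one index on `columns`; return its query plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, venid TEXT, "
                 "date TEXT, time TEXT, notes TEXT)")
    conn.execute(f"CREATE INDEX idx_events ON events ({', '.join(columns)})")
    steps = conn.execute("EXPLAIN QUERY PLAN " + QUERY,
                         ("47975", "2009-07-11")).fetchall()
    conn.close()
    # The last field of each plan row is the human-readable detail text.
    return " | ".join(str(step[-1]) for step in steps)


range_first = plan_with_index(["date", "venid"])             # range column leads
equality_first = plan_with_index(["venid", "date", "time"])  # the suggested order

print(range_first)
print(equality_first)
```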
Try adding a key that spans venid and date (or the other way around, or both...)
I would imagine that a 6M row table should be able to be optimised with quite normal techniques.
I assume that you have a dedicated database server, and it has a sensible amount of ram (say 8G minimum).
You will want to ensure you've tuned MySQL to use your RAM efficiently. If you're running a 32-bit OS, don't. If you are using MyISAM, tune your key buffer to use a significant proportion, but not too much, of your RAM.
In any case you want to run repeated performance testing on production-grade hardware.
Try putting an index on the venid column.