Promote index to primary key - MySQL

I have a table in a MariaDB database for which no primary key is defined. However, it has an index. I'd like to add a primary key with the same definition as that index. The naïve way might be:
alter table `foo` add primary key (`bar`, `baz`),
drop index `qux`;
...but that will take a very long time and seems wasteful. (The table is tens of gigabytes in size and is running on a machine with less free disk space than the total size of the table.) I realize an index and a primary key aren't the same thing (at the very least, the primary key includes a uniqueness constraint which must be checked during the creation process), but is there any way to use the index to “bootstrap” the primary key?

Assuming the table is ENGINE=InnoDB...
If there is not enough free space on disk for another copy of the table, the task cannot be performed without the help of a second server. Can you drop some tables? Or otherwise free up space?
A PRIMARY KEY is UNIQUE and is an index. If the combination of bar and baz is not unique, you should not turn it into the PK.
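A hedged way to check that first (names are the ones from the question; this is a full table scan, so expect it to be slow on a table this size):
SELECT `bar`, `baz`, COUNT(*) AS cnt
    FROM `foo`
    GROUP BY `bar`, `baz`
    HAVING cnt > 1   -- any rows returned mean (bar, baz) is not unique
    LIMIT 10;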
Using a PK for looking up a single row is faster than using a secondary index. A secondary-index lookup first finds the entry in the secondary index's BTree; there it finds the PRIMARY KEY value, which is then used to find the row in the data's BTree.
If the table is bigger than innodb_buffer_pool_size, your change would also (in many cases) eliminate a disk hit. (Disk hits are the slowest part of database operations.)
Yes, there is currently a PRIMARY KEY on your table. It is a 6-byte hidden 'column'. Your ALTER would throw that away, thereby making the table a little smaller (another small benefit).
Do you have innodb_file_per_table=ON (or =1)? If the table is in its own .ibd file, you will recover the disk space after the operation (assuming it can run at all). With OFF, it will increase the size of the ibdata1 file, but fail to shrink it back. Have it ON when creating tables that will eventually be 'big'.
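A quick way to check that setting (sketch):
SHOW VARIABLES LIKE 'innodb_file_per_table';
-- if ON, also look for a per-table .ibd file in the database's data directory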
OK, there may be hope. If you are running with OFF, and there is enough space in ibdata1, then the task may complete. (But that means, as alluded to above, that you have already bloated ibdata1.)

Related

Multiple indexes on the same column

I have a table which already has a column with a BTREE index on it. Now I want to add a unique constraint to the same column, to avoid race conditions from my Rails app.
All the reference blogs/articles show that I have to add a migration to create a new unique index on that column, like below:
add_index :products, :key, unique: true
I want to understand
What happens to the BTREE index which is already present? (I need it.)
Is it OK to have both indexes, and will they both work fine?
The table has around 30 million entries; will adding this UNIQUE index lock the table and take a huge amount of time?
You don't need both indexes.
In MySQL's default storage engine InnoDB, a UNIQUE KEY index is also a BTREE. InnoDB only supports BTREE indexes, whether they are unique or not (it also supports fulltext indexes, but that's a different story).
So a unique index is also useful for searching and sorting, just like a non-unique index.
Building an index will lock the table. I suggest using an online schema change tool like pt-online-schema-change or gh-ost. We use the former at my company, and we run hundreds of schema changes per week on production tables without blocking access. In fact, using one of these tools might cause the change to take longer, but we don't care because we aren't suffering any limited access while it's running.
What happens to the BTREE index which is already present? (I need it.)
Nothing. Creating a new index does not affect existing indexes.
Is it OK to have both indexes, and will they both work fine?
Two indexes on the same expression, differing only in uniqueness? That makes no sense.
It is recommended to remove the regular index once the unique one is created; this will save a lot of disk space. Additionally, when a regular and a unique index on the same expression (literally!) both exist, the server will never use the regular one.
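A hedged sketch of doing both in one statement (the index names here are placeholders, not taken from the question):
ALTER TABLE products
    ADD UNIQUE KEY uniq_products_key (`key`),  -- `key` is a reserved word, hence the backticks
    DROP KEY index_products_key;               -- the old non-unique index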
The table has around 30 million entries; will adding this UNIQUE index lock the table and take a huge amount of time?
The table will be locked briefly at the start of the index creation process. But if index creation and concurrent insert/update/delete operations run in parallel, both will be slower.
The time needed for index creation can be determined only in practice; sometimes it cannot even be predicted.

MySQL - estimate time to drop index

We have a fairly unoptimized table with the following definition:
CREATE TABLE `Usage` (
`TxnDate` varchar(30) DEFAULT NULL,
`TxnID` decimal(13,2) NOT NULL,
`UserID2015` varchar(20) DEFAULT NULL,
`UserRMN` decimal(13,0) DEFAULT NULL,
`CustomerNo` decimal(13,0) DEFAULT NULL,
`OperatorName` varchar(50) DEFAULT NULL,
`AggregatorName` varchar(30) DEFAULT NULL,
`TransAmount` decimal(10,2) DEFAULT NULL,
`MMPLTxnID` decimal(13,0) DEFAULT NULL,
`ProductType` varchar(30) DEFAULT NULL,
`YearMonthRMN` varchar(50) DEFAULT NULL,
PRIMARY KEY (`TxnID`),
UNIQUE KEY `TxnID` (`TxnID`) USING BTREE,
KEY `TxnDate` (`TxnDate`),
KEY `OperatorName` (`OperatorName`),
KEY `AggregatorName` (`AggregatorName`),
KEY `MMPLTxnID` (`MMPLTxnID`),
KEY `ProductType` (`ProductType`),
KEY `UserRMN` (`UserRMN`),
KEY `YearMonthRMN` (`YearMonthRMN`) USING BTREE,
KEY `CustomerNo` (`CustomerNo`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The table has about 170M records.
I want to drop the primary key and instead add an auto-increment primary key. So far, dropping the index has taken 2 hours.
Why is it taking so long to remove an index? Is there any sorting happening?
How can I estimate the time to drop the index?
When I add the auto-increment column, will I have to budget time for sorting the table, or will that not be necessary with a new auto-increment index?
You're not just dropping an index, you're dropping the primary key.
Normally, InnoDB tables are stored as a clustered index based on the primary key, so by dropping the primary key, it has to create a new table that uses either the secondary unique key or else an auto-generated key for its clustered index.
I've done a fair amount of MySQL consulting, and the question of "how much time will this take?" is a common question.
It takes as long as it takes to build a new clustered index on your server. This is hard to predict. It depends on several factors, like how fast your server's CPUs are, how fast your storage is, and how much other load is going on concurrently, competing for CPU and I/O bandwidth.
In other words, in my experience, it's not possible to predict how long it will take.
Your table will be rebuilt with TxnID as the new clustered index, which is coincidentally the same as the primary key. But apparently MySQL Server doesn't recognize this special case as one that can use the shortcut of doing an inplace alter.
Your table also has eight other secondary indexes, five of which are varchars. It has to build those indexes during the table restructure. That's a lot of I/O to build those indexes in addition to the clustered index. That's likely what's taking so much time.
You'll go through a similar process when you add your new auto-increment primary key. You could have saved some time if you had dropped your old primary key and created the new auto-increment primary key in one ALTER TABLE statement.
(I agree with Bill's answer; here are more comments.)
I would kill the process and rethink whether there is any benefit in an AUTO_INCREMENT.
I try to look beyond the question to the "real" question. In this case it seems to be something as-yet-unspoken that calls for an AUTO_INCREMENT; please elaborate.
Your current PRIMARY KEY is 6 bytes. Your new PK will be 4 bytes if INT or 8 bytes if BIGINT. So, there will be only a trivial savings or loss in disk space utilization.
Any lookups by TxnID will be slowed down because they must now go through the AUTO_INCREMENT PK. And since TxnID is UNIQUE and non-null, it seems like the optimal "natural" PK.
A PK is a Unique key, so UNIQUE(TxnID) is totally redundant; DROPping it would save space without losing anything. That is the main recommendation I would have (just looking at the schema).
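Concretely, a minimal sketch of that recommendation, using the index name from the schema above (best combined with any other ALTER work so the table is rebuilt only once):
ALTER TABLE `Usage` DROP KEY `TxnID`;  -- drops the redundant UNIQUE KEY, not the PK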
When I see a table with essentially every column being NULL, I am suspicious that the designer did not make a conscious decision about the nullness of the columns.
DECIMAL(13,2) would be a lot of dollars or Euros, but as a PK, it is quite unusual. What's up?
latin1? No plans for globalization?
Lots of single-column indexes? WHERE a=1 AND b=2 begs for a composite INDEX(a,b).
Back to estimating time...
If the ALTER rebuilds the 8-9 indexes, then it should do what it can with a disk sort. This involves writing stuff to disk, using an efficient disk-based sort that involves some RAM, then reading the sorted result to recreate the index. A sort is O(N log N), thereby making it non-linear. This makes it hard to predict the time taken. Some newer versions of MariaDB attempt to estimate the remaining time, but I don't trust it.
A secondary index includes the column(s) being indexed, plus any other column(s) of the PK. Each index in that table will occupy about 5-10GB of disk space. This may help you convert to IOPS or whatever. But note that (assuming you don't have much RAM) that 5-10GB will be reread a few (several?) times during the sort that rebuilds the index.
When doing multiple ALTERs, do them in a single ALTER statement. That way, all the work (especially rebuilding of secondary indexes) need be done only once.
You have not said which version you are using. Older versions had one choice, COPY: create a new table; copy the data over; rebuild the indexes; rename. Newer versions can deal with secondary indexes INPLACE. Note: changes to the PRIMARY KEY require the copy method.
For anyone interested:
This is run on Amazon Aurora with 30GB of data stored. I could not find any information on how IOPS is provisioned for this, but I expected that at worst there would be 90 IOPS available consistently. Writing 10GB in and out would take around 4 hours.
I upgraded the instance to db.r3.8xlarge before running the alter table.
Then ran
alter table `Usage` drop primary key, add id bigint auto_increment primary key
it took 1h 21m, which is much better than expected.

Handling huge MyISAM table for optimisation

I have a huge (and growing) MyISAM table (700 million rows = 140GB).
CREATE TABLE `keypairs` (
`ID` char(60) NOT NULL,
`pair` char(60) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=MyISAM
The table option was changed to ROW_FORMAT=FIXED, because both columns are always fixed length at the max (60). And yes, ID is sadly a string, not an INT.
SELECT queries are pretty OK in speed.
The databases and MySQL engine are all on 127.0.0.1/localhost (nothing remote).
Sadly, INSERT is slow as hell. I won't even talk about trying to LOAD DATA with millions of new rows... it takes days.
There won't be any concurrent reads on it. All SELECTs are done one by one, only by my local server. (It is not for clients' use.)
(For info, file sizes: .MYD=88GB, .MYI=53GB, .TMM=400MB.)
How could I speed up inserts into that table?
Would it help to PARTITION that huge table? (If so, how?)
I heard MyISAM uses a "structure cache" in the form of .frm files, and that a line in the config file helps MySQL keep all the .frm files in memory (in the partitioned case). Would that help too? (Actually, my .frm file is only 9KB for 700 million rows.)
String shortening/compression... for the ID string? (Same idea as rainbow tables.) Even if it lowers the maximum number of unique IDs, I will never reach the max of 60 chars anyway, so maybe it's an idea? But before creating a new unique ID I would of course have to check that the shortened string doesn't already exist in the DB.
Same idea as shortening the ID strings: what about using md5() on the ID? Does a shorter string mean faster in that case?
Sort the incoming data before doing the LOAD. This will improve the cacheability of the PRIMARY KEY(id).
PARTITIONing is unlikely to help, unless there is some useful pattern to ID.
PARTITIONing will not help for single-row insert nor for single-row fetch by ID.
If the strings are not a constant width of 60, you are wasting space and speed by saying CHAR instead of VARCHAR. Change that.
MyISAM's FIXED is useful only if there is a lot of 'churn' (deletes+inserts, and/or updates).
Smaller means more cacheable means less I/O means faster.
The .frm is an encoding of the CREATE TABLE; it is not relevant for this discussion.
A simple compress/zip/whatever will almost always compress text strings longer than 10 characters. And they can be uncompressed, losslessly. What do your strings look like? 60-character English text will shrink to 20-25 bytes.
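A minimal illustration in SQL, using MySQL's zlib-based COMPRESS()/UNCOMPRESS(); note these add a 4-byte length header, so measure on your real strings before committing to this:
SELECT LENGTH(COMPRESS(REPEAT('sample english text ', 3))) AS packed_len,  -- 60 chars in
       UNCOMPRESS(COMPRESS('round trip works losslessly'))  AS round_trip; -- original comes back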
MD5 is a "digest", not a "compression". You cannot recover the string from its MD5. Anyway, it would take 16 bytes after converting to BINARY(16).
The PRIMARY KEY is a BTree. If ID is somewhat "random", then the 'next' ID (unless the input is sorted) is likely not to be cached. No, the BTree is not rebalanced all the time.
Turning the PRIMARY KEY into a secondary key (after adding an AUTO_INCREMENT) will not speed things up -- it still has to update the BTree with ID in it!
How much RAM do you have? For your situation, and for this LOAD, set MyISAM's key_buffer_size to about 70% of available RAM, but not bigger than the .MYI file. I recommend a big key_buffer because that is where the random accesses are occurring; the .MYD is only being appended to (assuming you have never deleted any rows).
We do need to see your SELECTs to make sure these changes are not destroying performance somewhere else.
Make sure you are using CHARACTER SET latin1 or ascii; utf8 would waste a lot more space with CHAR.
Switching to InnoDB will double, maybe triple, the disk space for the table (data+index). Therefore, it will probably slow down. But a mitigating factor is that the PK is "clustered" with the data, so you are not updating two things for each row inserted. Note that key_buffer_size should be lowered to 10M and innodb_buffer_pool_size should be set to 70% of available RAM.
(My bullet items apply to InnoDB except where MyISAM is specified.)
If using InnoDB, it would be good to try to insert 1000 rows per transaction. Fewer than that leads to more transaction overhead; more than that leads to overrunning the undo log, causing a different form of slowdown.
Hex ID
Since ID is always 60 hex digits, declare it to be BINARY(30) and pack them via UNHEX(...) and fetch via HEX(ID). Test via WHERE ID = UNHEX(...). That will shrink the data about 25%, and MyISAM's PK by about 40%. (25% overall for InnoDB.)
To do just the conversion to BINARY(30):
CREATE TABLE new (
    ID BINARY(30) NOT NULL,
    `pair` char(60) NOT NULL
    -- adding the PK later is faster for MyISAM
) ENGINE=MyISAM;
INSERT INTO new
    SELECT UNHEX(ID), pair
        FROM keypairs;
ALTER TABLE new
    ADD PRIMARY KEY (`ID`); -- For InnoDB, I would do differently
RENAME TABLE keypairs TO old,
             new TO keypairs;
DROP TABLE old;
Tiny RAM
With only 2GB of RAM, a MyISAM-only dataset should use something like key_buffer_size=300M and innodb_buffer_pool_size=0. For InnoDB-only: key_buffer_size=10M and innodb_buffer_pool_size=500M. Since ID is probably some kind of digest, it will be very random. The small cache and the random key combine to mean that virtually every insert will involve a disk I/O. My first estimate would be more like 30 hours to insert 10M rows. What kind of drives do you have? SSDs would make a big difference if you don't already have them.
The other thing to do to speed up the INSERTs is to sort by ID before starting the LOAD. But that gets tricky with the UNHEX. Here's what I recommend.
1. Create a MyISAM table, tmp, with ID BINARY(30) and pair, but no indexes. (Don't worry about key_buffer_size; it won't be used.)
2. LOAD the data into tmp.
3. ALTER TABLE tmp ORDER BY ID; This will sort the table. There is still no index. I think, without proof, that this will be a filesort, which is much faster than "repair by key buffer" for this case.
4. INSERT INTO keypairs SELECT * FROM tmp; This will maximize the caching by feeding rows to keypairs in ID order.
Again, I have carefully spelled out things so that it works well regardless of which Engine keypairs is. I expect step 3 or 4 to take the longest, but I don't know which.
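A sketch of those four steps as SQL (the LOAD DATA file path and input column layout are assumptions for illustration):
CREATE TABLE tmp (
    ID BINARY(30) NOT NULL,
    pair CHAR(60) NOT NULL
) ENGINE=MyISAM;                             -- step 1: deliberately no indexes
LOAD DATA INFILE '/tmp/pairs.tsv'            -- step 2: path is hypothetical
    INTO TABLE tmp (@hex_id, pair)
    SET ID = UNHEX(@hex_id);                 -- pack the hex ID while loading
ALTER TABLE tmp ORDER BY ID;                 -- step 3: sort the rows; still no index
INSERT INTO keypairs SELECT * FROM tmp;      -- step 4: feed rows in ID order
DROP TABLE tmp;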
Optimizing a table requires that you optimize for specific queries. You can't determine the best optimization strategy unless you have specific queries in mind. Any optimization improves one type of query at the expense of other types of queries.
For example, if your query is SELECT SUM(pair) FROM keypairs (a query that would have to scan the whole table anyway), partitioning won't help, and just adds overhead.
If we assume your typical query is inserting or selecting one keypair at a time by its primary key, then yes, partitioning can help a lot. It all depends on whether the optimizer can tell that your query will find its data in a narrow subset of partitions (ideally one partition).
Also make sure to tune MyISAM. There aren't many tuning options (a sketch of these settings, as SET statements, follows the list):
Allocate key_buffer_size as high as you can spare to cache your indexes. Though I haven't ever tried anything higher than about 10GB, and I can't guarantee that MyISAM key buffers are stable at 53GB (the size of your MYI file).
Pre-load the key buffers: https://dev.mysql.com/doc/refman/5.7/en/cache-index.html
Size read_buffer_size and read_rnd_buffer_size appropriately given the queries you run. I can't give a specific value here, you should test different values with your queries.
Size bulk_insert_buffer_size to something large if you want to speed up LOAD DATA INFILE. It's 8MB by default, I'd try at least 256MB. I haven't experimented with that setting, so I can't speak from experience.
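A hedged sketch of those knobs as server variables (the values are illustrative starting points, not recommendations; size them to your RAM and workload):
SET GLOBAL key_buffer_size         = 8 * 1024 * 1024 * 1024;  -- index cache; as high as you can spare
SET GLOBAL bulk_insert_buffer_size = 256 * 1024 * 1024;       -- helps LOAD DATA INFILE
SET SESSION read_buffer_size       = 2 * 1024 * 1024;         -- test different values
SET SESSION read_rnd_buffer_size   = 8 * 1024 * 1024;         -- test different values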
I try not to use MyISAM at all. MySQL is definitely trying to deprecate its use.
...is there a MySQL command to ALTER TABLE and add an auto-increment INT ID column automatically?
Yes, see my answer to https://stackoverflow.com/a/251630/20860
First, your primary key is not incrementing.
Which means, roughly: on every insert, the index has to be rebalanced.
No wonder it is slow on a table of that size.
And with such an engine...
So, to the second point: what's the point of keeping that old MyISAM junk?
For example, do you not mind losing a row or two (or a dozen) in case of an accident? And so on, even setting aside that the current MySQL maintainer (Oracle Corp) explicitly discourages the use of MyISAM.
So, here are the possible solutions:
1) Switch to InnoDB;
2) If you can't surrender the char ID, then:
Add an auto-increment numeric key and make it the primary key - the index would then be clustered and the cost of inserts would drop significantly (a sketch follows this list);
Turn your current key into a secondary index;
3) If you can surrender it - the solution is obvious.
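A minimal sketch of option 2 as one ALTER, so the table is rebuilt only once (the column name id is an assumption; note the earlier answer disputes that this helps):
ALTER TABLE keypairs
    ENGINE = InnoDB,                                          -- option 1 at the same time
    DROP PRIMARY KEY,
    ADD id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- new clustered PK
    ADD UNIQUE KEY (ID);                                      -- old char key becomes secondary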

MySQL indexing on a non-primary-key column

I am running some MySQL queries on a pretty large table (not Facebook scale, but around a million rows), and I am finding them very slow. The reason, I suspect, is that I am querying on an id field, but that id has not been declared as a primary key, and no index has been declared either.
I cannot make the id field the primary key, because it is not unique, although its cardinality is pretty close to 1. Under these circumstances, if I do an ALTER TABLE to add an index on the id field, should it boost the query speed, given that it is not a primary key?
And supposing it does, how long will it take for the index to be fully built so that queries start executing quickly? Is it ready the moment the prompt returns after executing the ALTER TABLE, or does the index building continue internally for quite some time after the prompt appears? (I am asking before doing it because I am not sure whether declaring an index on a non-unique field corrupts the DB.)
Any index will speed up queries that match on the corresponding column. There's no significant difference between the primary key and other indexes in this regard.
The index is created immediately when you execute the ALTER TABLE query. When the prompt returns, the index is there and will be used. There's no corruption while this is happening.
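For illustration, a minimal sketch (the table name is hypothetical):
ALTER TABLE mytable ADD INDEX idx_id (id);  -- non-unique secondary index on id
-- or, equivalently:
CREATE INDEX idx_id ON mytable (id);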

"CACHE INDEX" and "LOAD INDEX INTO CACHE" in MySQL

The MySQL documentation implies that you can assign one or more of a table's indexes to a named key buffer (and preload them). The syntax definition in the manual is:
CACHE INDEX
tbl_index_list [, tbl_index_list] ...
IN key_cache_name
tbl_index_list:
tbl_name [[INDEX|KEY] (index_name[, index_name] ...)]
which seems to say that you could assign just one of a table's indexes to the named key buffer. For example:
SET GLOBAL my_keys.key_buffer_size=512*1048576;
CACHE INDEX my_table KEY (PRIMARY) IN my_keys;
LOAD INDEX INTO CACHE my_table KEY (PRIMARY);
would load only the PRIMARY index of my_table.
But from what I can tell, it doesn't work like that, at least, not in 5.0.87. Instead, the server appears to load all the table's indexes, effectively ignoring the index list part in parenthesis.
For example, I have a big dictionary table:
CREATE TABLE dict (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(330) NOT NULL,
UNIQUE KEY (name) );
Now, if I attempt to load just the PRIMARY index, mysqld's resident size in memory increases by the size of dict.MYI (733 MB in my example), which is much bigger than the size of the PRIMARY index alone (103 MB).
UPDATE 2011-01-08: The documentation for CACHE INDEX actually provides the answer:
The syntax of CACHE INDEX enables you to specify that only particular indexes from a table should be assigned to the cache. The current implementation assigns all the table's indexes to the cache, so there is no reason to specify anything other than the table name.
If I had properly read the very documentation I referenced in the OP, none of this would ever have happened.
fsb's answer to his own question for those who missed the update.
The syntax of CACHE INDEX enables you to specify that only particular indexes from a table should be assigned to the cache. The current implementation assigns all the table's indexes to the cache, so there is no reason to specify anything other than the table name.
CACHE INDEX
This restriction does not apply in MySQL 5.7.2 or later.
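Putting it together, a minimal hedged example for a pre-5.7.2 server, using the dict table from above (the key cache name dict_cache is illustrative):
SET GLOBAL dict_cache.key_buffer_size = 512 * 1048576;
CACHE INDEX dict IN dict_cache;  -- assigns ALL of dict's indexes pre-5.7.2
LOAD INDEX INTO CACHE dict;      -- preload them into the named cache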