MySQL: MyISAM or InnoDB when a UUID will be used as PK

I am working on a project where I need to use a UUID (16 bytes) as the unique identifier in the database (MySQL). The database has a lot of tables with relations. I have the following questions about using a UUID as PK:
Should I index the unique identifier as PK / FK, or is that not necessary?
If I index it, the index size will increase, but is it really needed?
Here is an example where I have to use a UUID:
Table user with one unique identifier (oid) and foreign key (language).
CREATE TABLE user (
  oid binary(16) NOT NULL,
  username varchar(80),
  f_language_oid binary(16) NOT NULL,
  PRIMARY KEY (oid),
  KEY f_language_oid (f_language_oid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Is it helpful / necessary to define oid as PRIMARY KEY and f_language_oid as FOREIGN KEY, or would it be better if I created the table without any key definitions?
I have read in this article (here) that InnoDB automatically generates a hidden 6-byte integer as primary key. In that case, would it be better to rely on the 6-byte internal PK than on the 16-byte binary key?
If no index is required, should I use MyISAM or InnoDB?
Many thanks in advance.

With MySQL it's often advantageous to use a regular INT as your primary key and have a UUID as a secondary UNIQUE index. This is mostly because I believe MySQL uses the primary key as a row identifier in all secondary indexes, and having large values here can lead to vastly bigger index sizes. Do some testing at scale to see if this impacts you.
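A minimal sketch of that layout (the table and column names here are made up, not from the question):
CREATE TABLE user_account (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  uuid BINARY(16) NOT NULL,
  username VARCHAR(80),
  PRIMARY KEY (id),            -- small clustered key, copied into every secondary index
  UNIQUE KEY uuid_idx (uuid)   -- UUID still enforced unique, but not the row identifier
) ENGINE=InnoDB;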
The one reason to use a UUID as a primary key would be if you're trying to spread data across multiple independent databases and want to avoid primary key conflicts. UUID is a great way to do this.
In either case, you'll probably want to express the UUID as text so it's human readable and easy to manipulate. It's difficult to paste binary data into a query, for example, just to run a simple UPDATE. Text also ensures that you can export to or import from JSON without a whole lot of conversion overhead.
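To illustrate: with a BINARY(16) column, even a simple ad-hoc UPDATE has to round-trip through HEX()/UNHEX() (the hex value below is made up):
SELECT HEX(oid), username FROM user;  -- readable only after conversion
UPDATE user
SET username = 'alice'
WHERE oid = UNHEX('11E8A3D2F4B84C0E9F1A0242AC110002');
With a CHAR(36) text column, the literal could be pasted in directly.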
As for MyISAM vs. InnoDB, it's highly ill-advised to use the old MyISAM engine in a production environment where data integrity and uptime are important. MyISAM can suffer catastrophic data loss if the database becomes corrupted (something as simple as an unanticipated reboot can cause this) and has trouble recovering. InnoDB is a modern, journaled, transactional engine that is significantly more resilient and recovers automatically from most sudden-failure situations, even database crashes.
One more consideration is evaluating whether PostgreSQL is a suitable fit, since it has a native UUID column type.

Related

MySQL - estimate time to drop index

We have a fairly unoptimized table with the following definition:
CREATE TABLE `Usage` (
`TxnDate` varchar(30) DEFAULT NULL,
`TxnID` decimal(13,2) NOT NULL,
`UserID2015` varchar(20) DEFAULT NULL,
`UserRMN` decimal(13,0) DEFAULT NULL,
`CustomerNo` decimal(13,0) DEFAULT NULL,
`OperatorName` varchar(50) DEFAULT NULL,
`AggregatorName` varchar(30) DEFAULT NULL,
`TransAmount` decimal(10,2) DEFAULT NULL,
`MMPLTxnID` decimal(13,0) DEFAULT NULL,
`ProductType` varchar(30) DEFAULT NULL,
`YearMonthRMN` varchar(50) DEFAULT NULL,
PRIMARY KEY (`TxnID`),
UNIQUE KEY `TxnID` (`TxnID`) USING BTREE,
KEY `TxnDate` (`TxnDate`),
KEY `OperatorName` (`OperatorName`),
KEY `AggregatorName` (`AggregatorName`),
KEY `MMPLTxnID` (`MMPLTxnID`),
KEY `ProductType` (`ProductType`),
KEY `UserRMN` (`UserRMN`),
KEY `YearMonthRMN` (`YearMonthRMN`) USING BTREE,
KEY `CustomerNo` (`CustomerNo`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The table has about 170M records.
I want to drop the primary key and instead add an auto-increment primary key. So far the index dropping has taken 2h.
Why is it taking so long to remove an index? Is there any sorting happening?
How can I estimate the time to drop the index?
When I add the auto-increment column, will I have to allow time for sorting the table, or will this not be necessary with a new auto-increment index?
You're not just dropping an index, you're dropping the primary key.
Normally, InnoDB tables are stored as a clustered index based on the primary key, so by dropping the primary key, it has to create a new table that uses either the secondary unique key or else an auto-generated key for its clustered index.
I've done a fair amount of MySQL consulting, and "how much time will this take?" is a common question.
It takes as long as it takes to build a new clustered index on your server. This is hard to predict. It depends on several factors, like how fast your server's CPUs are, how fast your storage is, and how much other load is going on concurrently, competing for CPU and I/O bandwidth.
In other words, in my experience, it's not possible to predict how long it will take.
Your table will be rebuilt with TxnID as the new clustered index, which happens to be the same as the old primary key. But apparently MySQL Server doesn't recognize this special case as one that can use the shortcut of an in-place ALTER.
Your table also has eight other secondary indexes, five of which are varchars. It has to build those indexes during the table restructure. That's a lot of I/O to build those indexes in addition to the clustered index. That's likely what's taking so much time.
You'll go through a similar process when you add your new auto-increment primary key. You could have saved some time if you had dropped your old primary key and created the new auto-increment primary key in one ALTER TABLE statement.
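For example, in a single statement (the new column name id is just a placeholder):
ALTER TABLE `Usage`
  DROP PRIMARY KEY,
  ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;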
(I agree with Bill's answer; here are more comments.)
I would kill the process and rethink whether there is any benefit in an AUTO_INCREMENT.
I try to look beyond the question to the "real" question. In this case it seems to be something as-yet-unspoken that calls for an AUTO_INCREMENT; please elaborate.
Your current PRIMARY KEY is 6 bytes. Your new PK will be 4 bytes if INT or 8 bytes if BIGINT. So, there will be only a trivial savings or loss in disk space utilization.
Any lookups by TxnID will be slowed down by having to go through the AUTO_INCREMENT PK. And since TxnID is UNIQUE and non-null, it seems like the optimal "natural" PK.
A PK is a unique key, so UNIQUE(TxnID) is totally redundant; DROPping it would save space without losing anything. That is the main recommendation I would have (just looking at the schema); see the sketch after these observations.
When I see a table with essentially every column NULLable, I am suspicious that the designer did not make a conscious decision about the nullness of the columns.
DECIMAL(13,2) would be a lot of dollars or Euros, but as a PK, it is quite unusual. What's up?
latin1? No plans for globalization?
Lots of single-column indexes? WHERE a=1 AND b=2 begs for a composite INDEX(a,b), as sketched below.
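Both points as concrete statements (the composite shown is only an illustration; choose columns your actual WHERE clauses combine):
ALTER TABLE `Usage` DROP INDEX TxnID;  -- redundant with the PRIMARY KEY
ALTER TABLE `Usage`
  DROP INDEX OperatorName,
  DROP INDEX ProductType,
  ADD INDEX op_prod (OperatorName, ProductType);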
Back to estimating time...
If the ALTER rebuilds the 8-9 indexes, then it should do what it can with a disk sort. This involves writing stuff to disk, using an efficient disk-based sort that involves some RAM, then reading the sorted result to recreate the index. A sort is O(N log N), thereby making it non-linear. This makes it hard to predict the time taken. Some newer versions of MariaDB attempt to estimate the remaining time, but I don't trust it.
A secondary index includes the column(s) being indexed, plus the column(s) of the PK. Each index in that table will occupy about 5-10GB of disk space. This may help you convert to IOPs or whatever. But note that (assuming you don't have much RAM) that 5-10GB will be reread a few (several?) times during the sort that rebuilds the index.
When doing multiple ALTERs, do them in a single ALTER statement. That way, all the work (especially rebuilding of secondary indexes) need be done only once.
You have not said what version you are using. Older versions had one choice, "COPY": create a new table; copy the data over; rebuild the indexes; rename. Newer versions can handle secondary indexes "INPLACE". Note: changes to the PRIMARY KEY require the copy method.
For anyone interested:
This is run on Amazon Aurora with 30GB of data stored. I could not find any information on how IOPS is provisioned for this, but I expected that in the worst case there would be 90 IOPS available consistently. To write 10GB in and out would take around 4 hours.
I upgraded the instance to db.r3.8xlarge before running the alter table.
Then ran
alter table `Usage` drop primary key, add id bigint auto_increment primary key
It took 1h 21m, which is much better than expected.

MySQL (InnoDB): GUID as Primary Key for a Distributed Database

I come from the MSSQL world and have no expert knowledge in MySQL.
Having a GUID as primary key is possible in both of these RDBMSs. In MSSQL, though, I'd better do a few things to avoid running into a performance nightmare as the row count increases (many millions of rows).
I create the primary key as a non-clustered index, to prevent database pages from being rearranged on every insert. If I didn't do that, the system would insert each new row between existing rows, and the drive would have to seek to the right position of the page on disk. I then create a second column of a numeric type, this time as the clustered index, which guarantees that new rows get appended on insert.
Question
But how do I do this in MySQL? If my information is right, I cannot force MySQL to use a non-clustered primary key. Is this necessary, or does MySQL store the data in a manner that will not result in a performance disaster later?
Update: But why?
The reason I want to do this is that I want to be able to realize a distributed database.
I ended up using sequential GUIDs, as described on
CodeProject: GUIDs as fast primary keys under multiple databases.
Great performance!
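On MySQL 8.0+ a similar effect is available natively: UUID_TO_BIN() with its swap flag moves the timestamp portion of the version-1 UUID() value to the front, so new keys arrive in roughly ascending order (a sketch; the table here is made up):
CREATE TABLE device_sync (
  id BINARY(16) NOT NULL PRIMARY KEY,
  payload VARCHAR(255)
) ENGINE=InnoDB;
INSERT INTO device_sync (id, payload)
VALUES (UUID_TO_BIN(UUID(), 1), 'example');  -- swap flag reorders the time bits first
SELECT BIN_TO_UUID(id, 1) AS id, payload FROM device_sync;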

Which storage structure is the best?

I have a table -
CREATE TABLE `DBMSProject`.`ShoppingCart` (
`Customer_ID` INT NOT NULL,
`Seller_ID` INT NOT NULL,
`Product_ID` INT NOT NULL,
`Quantity` INT NOT NULL,
PRIMARY KEY (`Customer_ID`,`Seller_ID`,`Product_ID`));
This table has far more insert and delete operations than update operations.
Which storage structure is best suited to decreasing overall operation and access time? I'm not sure between a hash structure, a B+ tree structure, and ISAM. PS - the number of records is on the order of 10 million.
For a shopping cart, data integrity is more important than raw speed.
MyISAM doesn't support ACID transactions, and it doesn't enforce foreign key constraints. With all those ID numbers you're showing, I wouldn't proceed with an engine that doesn't enforce foreign key constraints.
In general, I favor testing over speculation. You can build tables, load them with 10 million rows of random(ish) data, index them several different ways, and run timing tests with representative SQL statements in just a couple of hours. Use similar hardware, if possible.
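A sketch of that kind of load on MySQL 8.0 (the recursive CTE is just one way to generate rows; INSERT IGNORE skips the occasional duplicate of the composite PK):
SET SESSION cte_max_recursion_depth = 1000000;
INSERT IGNORE INTO ShoppingCart (Customer_ID, Seller_ID, Product_ID, Quantity)
WITH RECURSIVE seq (n) AS (
  SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT FLOOR(1 + RAND() * 100000),  -- random Customer_ID
       FLOOR(1 + RAND() * 10000),   -- random Seller_ID
       FLOOR(1 + RAND() * 50000),   -- random Product_ID
       1 + FLOOR(RAND() * 5)        -- Quantity 1-5
FROM seq;
Run it about ten times to approach 10 million rows, then time representative SELECT, INSERT, and DELETE statements.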
And when it comes to indexes, you can drop them and add them without having to rewrite any application code. If you can't be bothered to test, just pick one. Later, if you have a performance problem that EXPLAIN suggests might be related to the index, drop it and create a different one. (After you do this a couple of times, you'll probably discover that you can spare the time to test.)

How do I set GUID / UUID as primary (and not clustered) key in MySQL?

Reason I'm using GUID / UUID as primary key: Data syncing across devices. I have a master database on the server, and then each device has its own database with the same structure (although, different engines. MySQL on the server, SQLite on the Android devices, etc).
I've read that if you're going to use GUIDs as your primary key, they should at least not be the clustering key. However, I can't find how to do that with MySQL. All I can find is this reference that says if you have a primary key, InnoDB will use it as the clustering key.
Am I missing something?
The article you linked to is about Microsoft SQL Server, which gives you the option of which index to use as the clustering key.
With MySQL's InnoDB storage engine, you have no option. The primary key is always used as the clustering key. If there is no primary key, then it uses the first non-null unique key. Absent any such unique key, InnoDB generates its own internal 6-byte key as the clustering key.
So you could make a table that uses a GUID as a non-unique secondary key, while in practice treating it as a candidate key:
CREATE TABLE MyTable (
guid CHAR(32) NOT NULL,
/* other columns... */
KEY (guid) -- just a regular secondary index, neither primary nor unique
);
However, there's a legitimate use for the clustering key. If you frequently do lookups based on the GUID, they will be more efficient if you use the GUID as the clustering key.
The concerns about using a GUID as the clustering key are mostly about space. Inserting into the middle of a clustered index can cause a bit of fragmentation, but that's not necessarily a huge problem in MySQL.
The other issue is that in InnoDB, secondary indexes implicitly contain the primary key, so a CHAR(32) or whatever you use to store the GUID is going to be appended to each entry in other indexes. This makes them take more space than if you had used an integer as the primary key.

What could cause very slow performance of single UPDATEs of a InnoDB table?

I have a table in my web app for storing session data. It's performing badly, and I can't figure out why. The slow query log shows that updating a row takes anything from 6 to 60 seconds.
CREATE TABLE `sessions` (
`id` char(40) COLLATE utf8_unicode_ci NOT NULL,
`payload` text COLLATE utf8_unicode_ci NOT NULL,
`last_activity` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `session_id_unique` (`id`) USING HASH
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The PK is a char(40) which stores a unique session hash generated by the framework this project uses (Laravel).
(I'm aware of the redundancy of the PK and unique index, but I've tried all combinations and it doesn't have any impact on performance in my testing. This is the current state of it.)
The table is small - fewer than 200 rows.
A typical query from the slow query log looks like this:
INSERT INTO sessions (id, payload, last_activity)
VALUES ('d195825ddefbc606e9087546d1254e9be97147eb',
'YTo1OntzOjY6Il90b2tlbiI7czo0MDoi...around 700 chars...oiMCI7fX0=',
1405679480)
ON DUPLICATE KEY UPDATE
payload=VALUES(payload), last_activity=VALUES(last_activity);
I've done obvious things like checking the table for corruption. I've tried adding a dedicated PK column as an auto increment int, I've tried without a PK, without the unique index, swapping the text column for a very very large varchar, you name it.
I've tried switching the table to use MyISAM, and it's still slow.
Nothing I do seems to make any difference - the table performs very slowly.
My next thought was the query. It is generated by the framework, but I've tested hacking it out into an UPDATE, falling back to an INSERT if that fails. The slowness continued on the UPDATE statement.
I've read a lot of questions about slow INSERT and UPDATE statements, but those usually relate to bulk transactions. This is just one insert/update per user per request. The site is not remotely busy, and it's on its own VPS with plenty of resources.
What could be causing the slowness?
This is not an answer but SE comment length is too damn short. So.
What happens if you run an identical INSERT ... ON DUPLICATE KEY UPDATE ... statement directly on the command line? Please try both with and without actual usage of the application. The application may be artificially slowing down this UPDATE: for example, in InnoDB a transaction might be opened but only committed after a lot of time has passed. You tested with MyISAM too, which does not support transactions; perhaps in that case an explicit LOCK could account for the same effect. Whether the framework uses this trick, I'm not sure; I don't know Laravel. Try to benchmark to see if there is a concurrency effect.
Another question: is this a single server? Or is it a master that replicates to one or more slaves?
Apart from this question, a few observations:
The values for id are hex strings, but the column is unicode. This means 3*40 bytes are reserved while only 40 are utilized, a waste that makes things inefficient in general. It would be much better to use BINARY or ASCII as the character encoding; better yet, change the id column to the BINARY data type and store the (unhexed) binary value. See the sketch after these observations.
A hash as the PK of an InnoDB table will scatter the data across pages. The idea of using an auto_increment PK, or of not explicitly declaring a PK at all (which will cause InnoDB to create an auto-increment PK of its own internally), is a good one.
It looks like the payload is base64 encoded. Again the character encoding is specified as unicode; ASCII or Binary (the character encoding, not the data type) is much more appropriate.
The HASH keyword in the unique index on id is meaningless: InnoDB does not implement HASH indexes. Unfortunately, MySQL is perfectly silent about this (see http://bugs.mysql.com/bug.php?id=73326).
(While this list does offer angles for improvement, it seems unlikely that the extreme slowness can be fixed with this; there must be something else going on.)
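For reference, the id conversion could look something like this (a sketch only; the column name id_bin is a placeholder, and the application would also have to start writing unhexed values):
ALTER TABLE sessions ADD COLUMN id_bin BINARY(20);  -- 40 hex chars = 20 raw bytes
UPDATE sessions SET id_bin = UNHEX(id);
-- then drop the old column, rename id_bin to id, and rebuild the PRIMARY KEY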
Frustratingly, the answer in this case was a bad disk. One of the disks in the storage array had gone bad, and so writes were taking forever to complete. Simply that.