MyISAM vs InnoDB for quick inserts and a composite unique key - mysql

Context: I'm creating a multi-threaded application that will be inserting/updating rows very frequently.
Originally I had the following table:
#TABLE 1
CREATE TABLE `example` (
`id` BIGINT(20) NOT NULL,
`state` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`, `state`))
ENGINE = MyISAM;
However, after doing some research, I found that MySQL uses table-level locking for MyISAM tables, permitting only one session to update those tables at a time (source). That is not good for a multi-threaded application making frequent changes to the table.
As such, it was suggested that I switch from a composite primary key to an auto-generated primary key with a unique index for id/state. This would allow for quick inserts while still enforcing the unique combination of the id/state.
#TABLE 2
CREATE TABLE `example` (
`key` BIGINT(20) NOT NULL AUTO_INCREMENT,
`id` BIGINT(20) NOT NULL,
`state` VARCHAR(45) NOT NULL,
PRIMARY KEY (`key`),
UNIQUE INDEX `ID_STATE` (`id` ASC, `state` ASC))
ENGINE = MyISAM;
InnoDB, however, avoids table locks and instead uses row-level locking (source), so I thought of switching over to the following:
#TABLE 3
CREATE TABLE `example` (
`key` BIGINT(20) NOT NULL AUTO_INCREMENT,
`id` BIGINT(20) NOT NULL,
`state` VARCHAR(45) NOT NULL,
PRIMARY KEY (`key`),
UNIQUE INDEX `ID_STATE` (`id` ASC, `state` ASC))
ENGINE = InnoDB;
But after reading up about InnoDB, I discovered that InnoDB organizes data using a clustered index, and that reads through a secondary index require multiple lookups: one in the secondary index and another in the primary key (source). As such, I'm debating switching to the following:
#TABLE 4
CREATE TABLE `example` (
`id` BIGINT(20) NOT NULL,
`state` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`, `state`))
ENGINE = InnoDB;
I'm wondering if all my assumptions are correct:
1. MyISAM locks the entire table for INSERTs, UPDATEs, and DELETEs, permitting only one session to update the table at a time.
2. InnoDB handles INSERTs with composite primary keys quicker than MyISAM. This is because InnoDB, unlike MyISAM, does not lock the entire table to scan and reserve a new primary key.
3. When using InnoDB, I should make a composite primary key rather than a composite unique index, because a secondary index requires multiple lookups.
4. I should be using Table 4.

1-yes, 2-yes, 3-yes, 4-yes.
Also...
Do you really need BIGINT? Won't 4 billion values in INT UNSIGNED suffice? (And save half the space.) Presumably id is the PK of some other table? If so, that table would need changing, too.
Can state be normalized? Or turned into an ENUM? Again saving space.
Item 3 is worse than mentioned because of the need to lock on two unique keys.
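The space-saving suggestions above could be sketched on Table 4 roughly as follows. The ENUM values are hypothetical placeholders, since the question does not list the actual states, and INT UNSIGNED assumes the ids stay below roughly 4.29 billion.

```sql
-- Sketch only: Table 4 with narrower column types.
-- The state names are assumed for illustration; substitute the real ones.
CREATE TABLE `example` (
  `id` INT UNSIGNED NOT NULL,                          -- 4 bytes instead of 8
  `state` ENUM('pending', 'active', 'done') NOT NULL,  -- 1 byte instead of a VARCHAR
  PRIMARY KEY (`id`, `state`)
) ENGINE = InnoDB;
```

If id refers to the primary key of another table, that column would need the same type change.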

Related

How do I partition a MySQL table that contains several unique keys?

I have an extremely large MySQL table that I would like to partition. A simplified create of this table is as given below -
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`columnB` varchar(50) NOT NULL ,
`columnC` int(11) DEFAULT NULL,
`columnD` varchar(255) DEFAULT NULL,
`columnE` int(11) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_B` (`columnB`),
UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`),
UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`),
UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`)
)
I want to partition my table either by columnA or id, but the problem is that the MySQL Manual states -
In other words, every unique key on the table must use every column in the table's partitioning expression.
Which means that I cannot partition the table on either of those columns without changing my schema. For example, I have considered adding id to all my unique keys like so -
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`columnA` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`columnB` varchar(50) NOT NULL ,
`columnC` int(11) DEFAULT NULL,
`columnD` varchar(255) DEFAULT NULL,
`columnE` int(11) DEFAULT NULL,
`columnF` varchar(255) DEFAULT NULL,
`columnG` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_B` (`columnB`,`id`),
UNIQUE KEY `UNIQ_B_C` (`columnB`,`columnC`,`id`),
UNIQUE KEY `UNIQ_C_D` (`columnC`,`columnD`,`id`),
UNIQUE KEY `UNIQ_E_F_G` (`columnE`,`columnF`,`columnG`,`id`)
)
Which I do not mind doing except for the fact that it allows for the creation of rows that should not be created. For example, by my original schema, the following row insertion wouldn't have worked twice -
INSERT into myTable (columnC, columnD) VALUES (1.0,2.0)
But it works with the second schema, as columnC and columnD by themselves no longer form a unique key. I have considered getting around this by using triggers to prevent the creation of such rows, but then the trigger cost would reduce (or outweigh) the partitioning performance gain.
Edited:
Some additional information about this table:
The table has more than 1.2 billion records.
We are using MySQL version 5.6.34 with the InnoDB engine, running on AWS RDS.
A few other indexes also exist on this table.
Because of the huge data volume and the multiple indexes, inserting and retrieving data is an expensive process.
There are no unique indexes on timestamp or float columns. The schema above is just a sample for illustration; our actual table has a similar schema.
Other than partitioning, what options do we have to improve the performance of the table without losing any data and while maintaining the integrity constraints?
How do I partition a MySQL table that contains several unique keys?
Sorry to say, you don't.
Also, you shouldn't. Remember that UPDATE and INSERT operations on a table with unique keys must necessarily query the table to ensure the keys stay unique. If it were possible to partition a table so that unique keys weren't built into the partition expression, then every insert or update would require querying every partition. This would be likely to make the partitioning worse than useless.
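As a sketch of the rule quoted from the manual, the hypothetical table below (partition_demo is an invented name, trimmed to two columns) is accepted because id appears in the primary key and in the unique key, while the commented-out variant would be rejected.

```sql
-- Accepted: the partitioning column `id` is part of every unique key
-- (the primary key counts as a unique key).
CREATE TABLE `partition_demo` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `columnB` VARCHAR(50) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UNIQ_B` (`columnB`, `id`)
) ENGINE = InnoDB
PARTITION BY HASH (`id`) PARTITIONS 4;

-- Rejected: UNIQ_B does not include the partitioning column `id`.
-- CREATE TABLE `partition_demo_bad` (
--   `id` INT NOT NULL AUTO_INCREMENT,
--   `columnB` VARCHAR(50) NOT NULL,
--   PRIMARY KEY (`id`),
--   UNIQUE KEY `UNIQ_B` (`columnB`)
-- ) ENGINE = InnoDB
-- PARTITION BY HASH (`id`) PARTITIONS 4;
```

Note that UNIQ_B (columnB, id) no longer guarantees that columnB alone is unique, which is exactly the trade-off described in the question.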

MySql - Better Schema for large table IP pairings?

I'm trying to manage some internet logs. I'm essentially capturing what IPs are reaching out to what other IPs and making reports on it.
Problem is there's a ton of chatter and I'm not sure if I can make my schema any better.
my table schema:
CREATE TABLE `IpChatter` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`SourceIp` bigint(20) NULL,
`DestinationIp` bigint(20) NULL,
`SourcePort` int(11) NULL,
`DestinationPort` int(11) NULL,
`FKToSomeTableWithExtraMetaDataId` bigint(20) NOT NULL,
CONSTRAINT `PK_IpChatter` PRIMARY KEY (`Id` ASC)
) ENGINE=InnoDB;
CREATE INDEX `IX_IpChatter_FKToSomeTableWithExtraMetaDataId` ON `IpChatter` (`FKToSomeTableWithExtraMetaDataId`) using HASH;
CREATE INDEX `IX_IpChatter_Main_Query_SourceIp` ON `IpChatter` (`SourceIp`);
CREATE INDEX `IX_IpChatter_Main_Query_DestinationIp` ON `IpChatter` (`DestinationIp`);
CREATE INDEX `IX_IpChatter_Main_Query_SourcePort` ON `IpChatter` (`SourcePort`);
CREATE INDEX `IX_IpChatter_Main_Query_DestinationPort` ON `IpChatter` (`DestinationPort`);
ALTER TABLE `IpChatter` ADD CONSTRAINT `FK_IpChatter_FKToSomeTableWithExtraMetaData`
FOREIGN KEY (`FKToSomeTableWithExtraMetaDataId`) REFERENCES `FKToSomeTableWithExtraMetaData` (`Id`)
ON DELETE CASCADE;
Right now I've got 2 million rows of data, and the query pulls back the data I need in about 4 seconds. However, this is with relatively light testing data. I'd imagine the data being 30 times larger in the final product, so that 4 seconds will surely become 2 minutes. Is there a better way I could normalize this data, or have I hit a bottleneck where there isn't much I can do? Also, are the indexes I picked OK?
Never mind, I figured it out. I guess I just needed to type out the problem to help me think up a solution.
So after looking at my data, I've noticed that a lot of pairings are repeated, but under a different FKToSomeTableWithExtraMetaDataId value.
This tells me I can normalize the data by creating a table with distinct combinations of SourceIp, DestinationIp, SourcePort, and DestinationPort, and then creating a lookup table to join that table with the FKToSomeTableWithExtraMetaData table.
This reduces my raw IP data by 1700%! This should give a tremendous increase in performance when searching for a range of IPs, since the query now has to go through far fewer rows. Plus, with the lookup table I have greater flexibility in how I can query.
CREATE TABLE `IpChatter` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`SourceIp` bigint(20) NULL,
`DestinationIp` bigint(20) NULL,
`SourcePort` int(11) NULL,
`DestinationPort` int(11) NULL,
`FKToSomeLookupTableId` bigint(20) NOT NULL,
CONSTRAINT `PK_IpChatter` PRIMARY KEY (`Id` ASC)
) ENGINE=InnoDB;
CREATE INDEX `IX_IpChatter_FKToSomeLookupTableId` ON `IpChatter` (`FKToSomeLookupTableId`) using HASH;
CREATE INDEX `IX_IpChatter_Main_Query_SourceIp` ON `IpChatter` (`SourceIp`);
CREATE INDEX `IX_IpChatter_Main_Query_DestinationIp` ON `IpChatter` (`DestinationIp`);
CREATE INDEX `IX_IpChatter_Main_Query_SourcePort` ON `IpChatter` (`SourcePort`);
CREATE INDEX `IX_IpChatter_Main_Query_DestinationPort` ON `IpChatter` (`DestinationPort`);
ALTER TABLE `IpChatter` ADD CONSTRAINT `FK_IpChatter_FKToSomeLookupTable`
FOREIGN KEY (`FKToSomeLookupTableId`) REFERENCES `FKToSomeLookupTable` (`Id`)
ON DELETE CASCADE;
CREATE TABLE `FKToSomeLookupTable` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`FKToSomeTableWithExtraMetaDataId` bigint(20) NOT NULL,
`IpChatterId` bigint(20) NOT NULL,
CONSTRAINT `PK_FKToSomeLookupTable` PRIMARY KEY (`Id` ASC)
) ENGINE=InnoDB;
CREATE INDEX `IX_IpChatter_FKToSomeTableWithExtraMetaDataId` ON `FKToSomeLookupTable` (`FKToSomeTableWithExtraMetaDataId`) using HASH;
CREATE INDEX `IX_IpChatter_IpChatterId` ON `FKToSomeLookupTable` (`IpChatterId`) using HASH;
ALTER TABLE `FKToSomeLookupTable` ADD CONSTRAINT `FK_FKToSomeLookupTable_FKToSomeTableWithExtraMetaData`
FOREIGN KEY (`FKToSomeTableWithExtraMetaDataId`) REFERENCES `FKToSomeTableWithExtraMetaData` (`Id`)
ON DELETE CASCADE;
ALTER TABLE `FKToSomeLookupTable` ADD CONSTRAINT `FK_FKToSomeLookupTable_IpChatter`
FOREIGN KEY (`IpChatterId`) REFERENCES `IpChatter` (`Id`)
ON DELETE CASCADE;
Shrink the table size. A smaller table is one way to help (somewhat) with the speed.
IPv4 can be packed into INT UNSIGNED, which is 4 bytes versus your current 8-byte BIGINT. IPv6, on the other hand, needs BINARY(16); what you have will not work.
Port number, I think, will fit in a 2-byte SMALLINT UNSIGNED.
Are you expecting your tables to be bigger than 4 billion rows? If not, use INT UNSIGNED instead of BIGINT for ids.
Get rid of FOREIGN KEYs, they slow down things; meanwhile, the constraints have never triggered an error, have they? Do you really use the overhead of CASCADE?
Don't index every column. Look at your queries and index the columns or combinations of columns that would benefit SELECTs, UPDATEs, and DELETEs.
Please show the queries; without them, we cannot judge the performance.
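The type suggestions above might look roughly like the sketch below. The table name IpChatterCompact and the sample addresses and ports are invented for illustration, the metadata link is left out, and no secondary indexes are shown because they should follow from the actual queries. INET_ATON() and INET_NTOA() are built-in MySQL functions that convert between dotted IPv4 strings and integers.

```sql
-- Sketch only: narrower types for IPv4 addresses and ports.
CREATE TABLE `IpChatterCompact` (
  `Id` INT UNSIGNED NOT NULL AUTO_INCREMENT,   -- enough for about 4 billion rows
  `SourceIp` INT UNSIGNED NULL,                -- IPv4 packed into 4 bytes
  `DestinationIp` INT UNSIGNED NULL,
  `SourcePort` SMALLINT UNSIGNED NULL,         -- 0..65535 in 2 bytes
  `DestinationPort` SMALLINT UNSIGNED NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB;

-- Store addresses as integers.
INSERT INTO `IpChatterCompact`
  (`SourceIp`, `DestinationIp`, `SourcePort`, `DestinationPort`)
VALUES
  (INET_ATON('192.0.2.10'), INET_ATON('198.51.100.7'), 51515, 443);

-- Range search over a /24, converting back to dotted notation for display.
SELECT INET_NTOA(`SourceIp`) AS source_ip,
       INET_NTOA(`DestinationIp`) AS destination_ip,
       `SourcePort`, `DestinationPort`
FROM `IpChatterCompact`
WHERE `SourceIp` BETWEEN INET_ATON('192.0.2.0') AND INET_ATON('192.0.2.255');
```

This layout only covers IPv4; IPv6 addresses would need a 16-byte binary column instead.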

Implicit column index

Given the following MySQL table (InnoDB type):
CREATE TABLE `table` (
`id` INT NOT NULL,
`foo_id` INT NOT NULL,
`bar_id` INT NOT NULL,
`name` VARCHAR(255) NOT NULL, -- length assumed; the original omitted it
PRIMARY KEY (`id`),
INDEX `on_foo_id` (`foo_id`),
INDEX `on_bar_id` (`bar_id`),
UNIQUE `on_foo_bar_id` (`foo_id`, `bar_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Note: even though I'm not using MySQL FOREIGN KEY constraints (I'm a Rails developer who handles this at the application level), the columns foo_id and bar_id are foreign keys.
Due to the presence of:
bar_id's index
(foo_id, bar_id)'s index
...I'm wondering if the index on foo_id is really relevant. Maybe MySQL already indexes foo_id even without this column being explicitly declared as an index.
In other words, is it possible to remove this line:
INDEX `on_foo_id` (`foo_id`),
without affecting performance?
Thank you for the light.
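One way to check this empirically is sketched below: create the table without the single-column index and ask MySQL which index it would use for a lookup on foo_id alone. Because foo_id is the leftmost column of the composite unique index, that index can generally serve such lookups. The table name, the VARCHAR length, and the literal 42 are assumptions for illustration.

```sql
-- Sketch only: the same table without INDEX `on_foo_id`.
CREATE TABLE `table_without_foo_index` (
  `id` INT NOT NULL,
  `foo_id` INT NOT NULL,
  `bar_id` INT NOT NULL,
  `name` VARCHAR(255) NOT NULL,                  -- length assumed
  PRIMARY KEY (`id`),
  INDEX `on_bar_id` (`bar_id`),
  UNIQUE `on_foo_bar_id` (`foo_id`, `bar_id`)    -- foo_id is the leftmost column
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- The possible_keys and key columns of the EXPLAIN output show
-- which indexes the optimizer considers and which one it picks.
EXPLAIN SELECT * FROM `table_without_foo_index` WHERE `foo_id` = 42;
```

The reverse does not hold: a lookup on bar_id alone cannot use the leftmost prefix of (foo_id, bar_id), so the separate index on bar_id still matters.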

Foreign keys and unique index on the same table

I have this table:
CREATE TABLE `sites_routing` (
`origin_site_id` int(11) NOT NULL,
`destination_site_id` int(11) NOT NULL,
`distance_meters` int(11) DEFAULT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(45) NOT NULL DEFAULT 'walking',
PRIMARY KEY (`id`)
)
I want to create two foreign keys for the columns origin_site_id and destination_site_id, and then another unique index on three columns: origin_site_id, destination_site_id, and type. When I do that, MySQL becomes really slow reading from and writing to the table, and I can see messages like this in the error log:
sites_routing contains 4 indexes inside InnoDB, which is different from the number of indexes 3 defined in the MySQL
Still, if I drop the foreign keys and the indexes and then recreate the foreign keys while the unique key exists, MySQL lets me create only one foreign key using the unique key as its index; for the other one it automatically creates another index.
What's the best way to go here?
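The behavior described above is consistent with InnoDB requiring, for each foreign key, an index in which the foreign key columns come first. A unique index starting with origin_site_id can back that foreign key, but destination_site_id is not its leftmost column, so it needs an index of its own. A minimal sketch of declaring this explicitly follows; the parent table `sites` with an int(11) primary key `id`, and all index and constraint names, are assumptions. It does not address the index-count mismatch reported in the error log.

```sql
-- Step 1: create the indexes explicitly so none is generated implicitly.
ALTER TABLE `sites_routing`
  ADD UNIQUE KEY `uniq_route` (`origin_site_id`, `destination_site_id`, `type`),
  ADD KEY `idx_destination_site` (`destination_site_id`);

-- Step 2: add the foreign keys.
-- fk_routing_origin can use uniq_route (origin_site_id is its leftmost column);
-- fk_routing_destination uses idx_destination_site.
ALTER TABLE `sites_routing`
  ADD CONSTRAINT `fk_routing_origin`
    FOREIGN KEY (`origin_site_id`) REFERENCES `sites` (`id`),
  ADD CONSTRAINT `fk_routing_destination`
    FOREIGN KEY (`destination_site_id`) REFERENCES `sites` (`id`);
```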

Creating a new table that connects to the primary key of previous table in mySQL

Currently I have one table in a MySQL database that I want to connect to another table I need to create. The second table is just going to have two columns: member_id and product. There will be many entries in the product column for the same member_id, and the product column is to be a foreign key to another table.
Here is the SQL code I used to create the initial Login table:
'CREATE TABLE `members`(
`member_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`firstname` varchar(100) DEFAULT NULL,
`lastname` varchar(100) DEFAULT NULL,
`login` varchar(100) NOT NULL DEFAULT \'\',
`passwd` varchar(32) NOT NULL DEFAULT \'\',
PRIMARY KEY (`member_id`))
ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=utf8'
If someone could show me how to create the second table to connect to member_id that would be great.
The MyISAM storage engine does not support foreign keys. If you want to ensure data integrity by using foreign keys, you need to use another storage engine, e.g. InnoDB.
Using MyISAM, you can't define a foreign key, but you can still define the tables as follows:
http://sqlfiddle.com/#!2/4cb9c/1/0
Please note that it is important to create the indexes to ensure decent performance for joins. If you use InnoDB and foreign keys, then explicit index creation is not needed, as adding a foreign key creates an implicit index.
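Since the linked fiddle is not reproduced here, the following is only a sketch of what such a second table might look like under MyISAM. The table name member_products, the type of the product column, and the login value 'alice' are assumptions; member_id matches the type used in members.

```sql
-- Sketch only: link table between members and products (MyISAM, no enforced FKs).
CREATE TABLE `member_products` (
  `member_id` INT(11) UNSIGNED NOT NULL,   -- same type as members.member_id
  `product` INT(11) UNSIGNED NOT NULL,     -- assumed to match the key of the product table
  PRIMARY KEY (`member_id`, `product`),    -- many products per member, no duplicates
  KEY `idx_product` (`product`)            -- helps joins from the product side
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- List the products of one member.
SELECT mp.`product`
FROM `members` AS m
JOIN `member_products` AS mp ON mp.`member_id` = m.`member_id`
WHERE m.`login` = 'alice';
```

With InnoDB on both tables, a FOREIGN KEY (`member_id`) REFERENCES `members` (`member_id`) clause could be added to have the database enforce the relationship.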