I have recently switched my project tables to InnoDB (thinking the relations would be a nice thing to have). I'm using a PHP script to index about 500 products at a time.
A table storing word/id associations:
CREATE TABLE `windex` (
`word` varchar(64) NOT NULL,
`wid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`count` int(11) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`wid`),
UNIQUE KEY `word` (`word`)
) ENGINE=InnoDB AUTO_INCREMENT=324551 DEFAULT CHARSET=latin1
Another table stores product id/word id associations:
CREATE TABLE `indx_0` (
`wid` int(7) unsigned NOT NULL,
`pid` int(7) unsigned NOT NULL,
UNIQUE KEY `wid` (`wid`,`pid`),
KEY `pid` (`pid`),
CONSTRAINT `indx_0_ibfk_1` FOREIGN KEY (`wid`) REFERENCES `windex` (`wid`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `indx_0_ibfk_2` FOREIGN KEY (`pid`) REFERENCES `product` (`ID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The script was tested using MyISAM, and it indexes products relatively quickly (much, much faster than InnoDB). The first time I ran it on InnoDB it was ridiculously slow, but after batching more values together I sped it up by a lot (though not enough).
I would have assumed InnoDB would be much faster for this type of thing because of row-level locking, but that's not the case.
I construct a query that looks something like:
SELECT
title,keywords,upc,...
FROM product
WHERE indexed = 0
LIMIT 500
I create a loop and fill an array with all the words that need to be added to windex and all the word id/product id pairs that need to be added to indx_0.
Because InnoDB keeps incrementing my auto-increment values whenever a "REPLACE INTO" or "INSERT IGNORE INTO" fails because of duplicate values, I need to make sure the values I add don't already exist. To do that I first select all existing values using a query like this:
SELECT wid,word
FROM windex
WHERE
word = "someword1" or word = "someword2" or word = "someword3" ... ...
Then I filter my array against the results that already exist, so all the new words I add are 100% new.
This takes about 20% of overall execution time. The other 80% goes into adding the pair values into indx_0, for which there are many more values.
Here's an example of what I get.
0.4806 seconds to select products. (0.4807 sec total).
0.0319 seconds to gather 500 items. (0.5126 sec total).
5.2396 seconds to select windex values for comparison. (5.7836 sec total).
1.8986 seconds to update count. (7.6822 sec total).
0.0641 seconds to add 832 windex records. (7.7464 sec total).
17.2725 seconds to add index of 3435 pid/wid pairs. (25.7752 sec total).
Operation took 26.07 seconds to index 500 products.
All 3435 pairs are inserted in a single query such as:
INSERT INTO indx_0(pid,wid)
VALUES (1,4),(3,9),(9,2)... ... ...
Why is InnoDB so much slower than MyISAM in my case?
InnoDB maintains a more complex key structure than MyISAM (FOREIGN KEYs), and regenerating keys is really slow in InnoDB. You should enclose all update/insert statements in a single transaction (those are actually quite fast in InnoDB). I once ran about 300,000 INSERT queries against an InnoDB table with 2 indexes and it took around 30 minutes; after enclosing every 10,000 inserts in START TRANSACTION ... COMMIT, it took less than 2 minutes.
I recommend using:
START TRANSACTION;
SELECT ... FROM products;
UPDATE ...;
INSERT INTO ...;
INSERT INTO ...;
INSERT INTO ...;
COMMIT;
This will cause InnoDB to refresh its indexes just once, not a few hundred times.
Let me know if it worked.
I had a similar problem, and it seems InnoDB by default has innodb_flush_log_at_trx_commit = 1, which flushes the log to your hard disk on every insert/update. The write speed of your hard disk is the bottleneck for this process.
So try modifying your MySQL config file:
`innodb_flush_log_at_trx_commit = 0`
Restart the MySQL service.
I experienced roughly a 100x speedup on inserts.
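If you'd rather test before touching the config file: the variable is dynamic, so a rough sketch like this should work from any client (requires the SUPER privilege, or SYSTEM_VARIABLES_ADMIN on MySQL 8.0):
-- Relax flushing for the bulk load. 0 (or 2) trades durability for speed:
-- up to about a second of committed transactions can be lost on a crash.
SET GLOBAL innodb_flush_log_at_trx_commit = 0;
-- ... run the bulk inserts and measure ...
SET GLOBAL innodb_flush_log_at_trx_commit = 1;  -- restore full durability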
MySQL seems to be very slow for updates.
A simple UPDATE statement takes more time than the same update call in MS SQL.
Ex:
UPDATE ValuesTbl SET value1 = #value1,
value2 = #value2
WHERE co_id = #co_id
AND sel_date = #sel_date
I have changed some config settings as below
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=10G
innodb_log_file_size=2G
log-bin="foo-bin"
skip-log-bin
This is the create table query
CREATE TABLE `valuestbl` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sel_date` datetime NOT NULL,
`co_id` int(11) NOT NULL,
`value1` decimal(10,2) NOT NULL,
`value2` decimal(10,2) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=21621889 DEFAULT CHARSET=latin1;
MySQL version: 8.0 on Windows
The update query takes longer than it does in MS SQL; is there anything else I need to do to make it faster?
There are no indexes; the ValuesTbl table has a PK, which isn't used for anything. The id column is a primary key from another table, sel_date is a datetime field, and value1/value2 are decimal columns.
If there are no indexes on ValuesTbl then the update has to scan the entire table which will be slow if the table is large. No amount of server tuning will fix this.
A simple update statement is taking more time than MS SQL for same update call.
The MS SQL server probably has an index on either co_id or sel_date. Or it has fewer rows in the table.
You need to add indexes, like the index of a book, so the database doesn't have to search the whole table. At minimum an index on co_id will vastly help performance. If there are many columns with different sel_date per ID, a compound index on (co_id, sel_date) would help further.
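In MySQL syntax that would be something like the following (the index name is just an example):
-- Compound index matching the WHERE clause of the UPDATE:
ALTER TABLE ValuesTbl ADD INDEX idx_co_id_sel_date (co_id, sel_date);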
See Use The Index, Luke for an extensive tutorial on indexes.
I have a MyISAM table with a few records (about 20):
CREATE TABLE `_cm_dtstd_37` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`NUMBER` int(10) NOT NULL COMMENT 'str',
`DESCRIPTION` char(32) NOT NULL COMMENT 'str',
PRIMARY KEY (`id`),
UNIQUE KEY `PHONE` (`NUMBER`)
) ENGINE=MyISAM AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT='==CORless Numbers=='
Single insert:
INSERT IGNORE INTO _cm_dtstd_37 VALUES(NULL, 55555, '55555')
takes a very long time to execute (about 5 to 7 minutes) and makes the MySQL server put every subsequent query into a 'wait' state. No other query (even those that read/write other tables) is executed until the first INSERT is done.
I have no idea how to debug this or where to look for any clue.
All inserts into other tables work well, and the whole database works great when not inserting into this one table.
That is one big reason for moving from MyISAM to InnoDB.
MyISAM allows multiple simultaneous reads (SELECT), but any type of write locks the entire table, blocking both reads and other writes.
InnoDB uses "row locking", so most simultaneous accesses to a table have no noticeable impact on each other.
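If you decide to switch, the conversion itself is a single statement, roughly like this (it rebuilds the table and all its indexes, so it locks the table while copying):
-- Rebuild the table and all of its indexes under the InnoDB engine.
ALTER TABLE _cm_dtstd_37 ENGINE=InnoDB;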
In a table of 350 million records, the structure is:
CREATE TABLE `table` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`job_id` int(10) unsigned NOT NULL,
`lock` mediumint(6) unsigned DEFAULT '0',
`time` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `job_id` (`job_id`),
KEY `lock` (`lock`),
KEY `time` (`time`)
) ENGINE=MyISAM;
What index should I create to speed up the query:
UPDATE `table` SET `lock` = 1 WHERE `lock` = 0 ORDER BY `time` ASC LIMIT 500;
lock is declared to be NULLable. Does this mean that the value is often NULL? If so, then there is a nasty problem in MyISAM (not InnoDB) that may lead to 500 additional fragmentation hits.
When a MyISAM row is updated and becomes longer, the row will no longer fit where it is. (Now my detailed knowledge gets fuzzy.) The new row will be put somewhere else and/or broken into two parts, with a link between the parts. That implies writes in two places.
As Gordon pointed out, any change to an indexed column, lock in your case, involves a costly index update -- removing a 'row' from one place in the index's BTree and adding a row in another place.
Does lock have only values 0 or 1? Then use TINYINT (1 byte), not MEDIUMINT (3 bytes).
You should check MAX(id). If the table is clean, the max id will be about 350M (not too close to the 4B limit). But if there has been any churn, it may be much closer to the limit.
I, too, advocate switching to InnoDB. However your 10GB (data+indexes) will grow to 20-30GB in the conversion.
Are you "locking the oldest unlocked" thingies? Will you then do a select to see what got locked?
If this is too slow, don't do 500 at once, pick a lower number.
With InnoDB, can you avoid locking? Perhaps transactional locking would suffice?
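As a rough sketch of what I mean (assuming the goal is to claim the 500 oldest unlocked rows for one worker):
-- Claim rows for the duration of a transaction instead of persisting a flag.
START TRANSACTION;
SELECT id FROM `table`
WHERE `lock` = 0
ORDER BY `time` ASC
LIMIT 500
FOR UPDATE;
-- ... process the claimed rows ...
COMMIT;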
I think we need to see the rest of the environment -- other tables, job "flow", etc. There may be other things we can suggest.
And I second the motion for INDEX(lock, time). But when doing so, DROP the index on just lock as being redundant.
And when converting to InnoDB, do all the index changes in the same ALTER. This will run faster than separate passes.
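Putting those pieces together, the combined ALTER might look something like this (the new index name is mine):
-- One pass: convert the engine, shrink the column, drop the redundant
-- index, and add the compound index.
ALTER TABLE `table`
ENGINE = InnoDB,
MODIFY `lock` tinyint unsigned DEFAULT '0',
DROP INDEX `lock`,
ADD INDEX `lock_time` (`lock`, `time`);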
For this query:
UPDATE `table`
SET `lock` = 1
WHERE `lock` = 0
ORDER BY `time` ASC
LIMIT 500;
The best index is table(lock, time). Do note, however, that the update also needs to update the index, so you should test how well this works in practice. Do not make this a clustered index. That will just slow down the process.
Hey guys, so I have been wrestling with a problem in my InnoDB. The database I am designing is being built to house all domains listed under .com and .net. I read these from a file and then insert them into a database each week. As you can guess, there will be a lot of records: I have calculated close to 106 million .coms and 14 million .nets (estimated). To prevent duplicate records for domains, I put a unique constraint across the domain name column and a second TLDid column. Whenever I do the weekly update, the inserts take 5-6 days. On the initial build with no data I got regular insert speeds until the inserts reached 25 million, and then it really started slowing.
I changed my innodb_buffer_pool_size=6000M without much change. I was able to do inserts up to 45 million before it started to slow, at around 3 hours in.
I have read a lot of performance articles and changed more settings:
innodb_thread_concurrency=18
innodb_lock_wait_timeout = 50
innodb_file_per_table = 1
innodb_read_io_threads=3000 (Defaults to 64)
innodb_write_io_threads=7000 (Defaults to 64)
innodb_io_capacity=10000
innodb_flush_log_at_trx_commit = 2
I am still getting slow inserts. Here is what the table looks like:
-- Dumping structure for table domains.domains
CREATE TABLE IF NOT EXISTS `domains` (
`DomainID` bigint(19) unsigned NOT NULL AUTO_INCREMENT,
`DomainName` varchar(100) DEFAULT NULL,
`TLDid` int(5) unsigned DEFAULT '1',
`FirstSeen` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`LastUpdated` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`DomainID`),
UNIQUE KEY `UNIQUE DOMAIN INDEX` (`TLDid`,`DomainName`),
KEY `TIMESTAMP INDEX` (`LastUpdated`,`FirstSeen`),
KEY `TLD INDEX` (`TLDid`),
KEY `DOMAIN NAME INDEX` (`DomainName`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
TLDid is either 1 or 2 and represents the extension of the domain: for example, "Test.com" will be stored as DomainName: Test, TLDid: 1, and "Test.net" as DomainName: Test, TLDid: 2.
My question is: how can I optimize this table, with 130+ million records and 2 unique constraints that need to be checked before each insert, so that updating new and current records doesn't take 14 days?
Thanks Guys
See this previous question for some background. I'm trying to renumber a corrupted MPTT tree using SQL. The script is working fine logically; it is just much too slow.
I repeatedly need to execute these two queries:
UPDATE `tree`
SET `rght` = `rght` + 2
WHERE `rght` > currentLeft;
UPDATE `tree`
SET `lft` = `lft` + 2
WHERE `lft` > currentLeft;
The table is defined as such:
CREATE TABLE `tree` (
`id` char(36) NOT NULL DEFAULT '',
`parent_id` char(36) DEFAULT NULL,
`lft` int(11) unsigned DEFAULT NULL,
`rght` int(11) unsigned DEFAULT NULL,
... (a couple of more columns) ...,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `lft` (`lft`),
KEY `rght` (`rght`),
... (a few more indexes) ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The database is MySQL 5.1.37. There are currently ~120,000 records in the table. Each of the two UPDATE queries takes roughly 15 - 20 seconds to execute. The WHERE condition may apply to a majority of the records, so that almost all records need to be updated each time. In the worst case both queries are executed as many times as there are records in the database.
Is there a way to optimize this query by keeping the values in memory, delaying writing to disk, delaying index updates or something along these lines? The bottleneck seems to be hard disk throughput right now, as MySQL seems to be writing everything back to disk immediately.
Any suggestion appreciated.
I never used it, but if you have enough memory, try a MEMORY table.
Create a table with the same structure as tree, INSERT INTO ... SELECT FROM ..., run your scripts against the MEMORY table, and write it back.
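A minimal sketch of that idea, assuming tree only uses column types the MEMORY engine supports (it cannot store TEXT/BLOB columns):
-- Copy into a MEMORY table, renumber there, then write back.
CREATE TABLE tree_mem LIKE tree;
ALTER TABLE tree_mem ENGINE=MEMORY;
INSERT INTO tree_mem SELECT * FROM tree;
-- ... run the renumbering UPDATEs against tree_mem ...
TRUNCATE tree;
INSERT INTO tree SELECT * FROM tree_mem;
DROP TABLE tree_mem;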
Expanding on some ideas from the comments, as requested:
The default is to flush to disk after every commit. You can wrap multiple updates in a single transaction or change this parameter:
http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
The isolation level is simple to change; just make sure the level fits your design. This probably won't help here because a range update is being used, but it's nice to know about when looking for more concurrency:
http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html
Ultimately, after noticing the range update in the query, your best bet is the MEMORY table that andrem pointed out. You'll probably also gain some performance by using BTREE indexes instead of the default HASH:
http://www.mysqlperformanceblog.com/2008/02/01/performance-gotcha-of-mysql-memory-tables/
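Applied to the tree_mem sketch above, that means requesting BTREE explicitly, since MEMORY defaults to HASH indexes, which cannot serve range predicates like lft > X:
-- Replace the HASH indexes with BTREE so the range scans can use them.
ALTER TABLE tree_mem
DROP INDEX lft, DROP INDEX rght,
ADD INDEX lft USING BTREE (lft),
ADD INDEX rght USING BTREE (rght);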
You're updating indexed columns - indexes negatively impact (read: slow down) INSERT/UPDATEs.
If this is a one time need to get things correct:
Drop/delete the indexes on the columns being updated (lft, rght)
Run the update statements
Re-create the indexes (this can take time, possibly equivalent to what you already experience in total)
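A rough sketch of that sequence (the index names come from the CREATE TABLE above):
-- One-time fix: shed the indexes, do the mass renumbering, rebuild once.
ALTER TABLE tree DROP INDEX lft, DROP INDEX rght;
-- ... run all of the UPDATE statements ...
ALTER TABLE tree ADD INDEX lft (lft), ADD INDEX rght (rght);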