Why TokuDB and InnoDB insert is so slow compared to MyISAM - mysql

I have prepared the following SQL statements to compare the performance behavior of MyISAM, InnoDB, and TokuDB (INSERT is executed for 100000 times):
MyISAM:
CREATE TABLE `testtable_myisam` (`id` bigint(20) NOT NULL AUTO_INCREMENT, `value1` INT DEFAULT NULL, `value2` INT DEFAULT NULL, PRIMARY KEY (`id`), KEY `index1` (`value1`)) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `testtable_myisam` (`value1`, `value2`) VALUES (FLOOR(RAND() * 1000), FLOOR(RAND() * 1000));
InnoDB:
CREATE TABLE `testtable_innodb` (`id` bigint(20) NOT NULL AUTO_INCREMENT, `value1` INT DEFAULT NULL, `value2` INT DEFAULT NULL, PRIMARY KEY (`id`), KEY `index1` (`value1`)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `testtable_innodb` (`value1`, `value2`) VALUES (FLOOR(RAND() * 1000), FLOOR(RAND() * 1000));
TokuDB:
CREATE TABLE `testtable_tokudb` (`id` bigint(20) NOT NULL AUTO_INCREMENT, `value1` INT DEFAULT NULL, `value2` INT DEFAULT NULL, PRIMARY KEY (`id`), KEY `index1` (`value1`)) ENGINE=TokuDB DEFAULT CHARSET=utf8;
INSERT INTO `testtable_tokudb` (`value1`, `value2`) VALUES (FLOOR(RAND() * 1000), FLOOR(RAND() * 1000));
At the beginning, the INSERT performance of InnoDB is almost 50 times slower than MyISAM, and TokuDB is 40 times slower than MyISAM.
Then I figure out the setting of "innodb-flush-log-at-trx-commit=2" on InnoDB, to make its INSERT behavior similar with MyISAM.
The question is, what should I do on the TokuDB? I bet the poor INSERT performance of TokuDB is also caused by some inproper setting, but I cannot figure out the reason.
--------- UPDATE ---------
Thanks to tmcallaghan's comments, I have modified my setting into "tokudb_commit_sync=OFF", now the insert rate of TokuDB on small dataset seems to be meaningful (I will execute them on large dataset once I figure out the following problem):
However, the select performance of TokuDB is still wired compared to MyISAM and InnoDB with following SQL (wherein the ? is replaced by a different Int by my simulator):
SELECT id, value1, value2 FROM testtable_myisam WHERE value1=?;
SELECT id, value1, value2 FROM testtable_innodb WHERE value1=?;
SELECT id, value1, value2 FROM testtable_tokudb WHERE value1=?;
Upon a million dataset, each 10k SELECT statments cost 10 and 15 seconds by MyISAM and InnoDB individually, but TokuDB requires about 40 seconds.
Did I miss some other settings?
Thanks in advance!

This doesn't sound like a very interesting test (100,000 rows is not a lot, and your insertions are not concurrent), but here is the setting you are looking for.
Issuing "set tokudb_commit_sync=0;" will turn off fsync() on commit operations. Note that there are no durability guarantees in this mode.
As I mentioned prior, TokuDB's strength is indexing data that is significantly larger than RAM, and this test is not.

The reason why transactional engines are slower is because they force the hard disk to confirm it wrote the data down. For the HDD to write data down, it has to position the head over the magnetic disk plate and stream the data. Each transaction means that the disk will position the magnetic needle over the head, write the data down and tell the OS that it's there for sure.
The reason transactional engines do that is so they can conform to the D part of ACID. They ensure you that data you wanted to be written down, is, in fact, written down permanently. MyISAM doesn't do that.
Thus, the speed of insert is proportional to the number of Input Output Operations per Second (IOPS) of the hard disk.
That also means, if you wrap several queries in one transaction, you can exploit the write speed bandwith of the mentioned drives.
Also, that implies that drives with high IOPS (SSD for example, have 40+ thousand IOPS and mechanical ones range at about 250 - 300, but don't take my word for exact numbers).
Long story short, if you want really fast inserts using transactional engines - wrap multiple queries in a single transaction.
All the "optimizations" you do are slightly violating the D part of ACID, because the engines will try to exploit various fast memories lying around that can be used as buffers. That means, if something goes wrong, such as you lose power - kiss your data goodbye.
Also, the tests conducted by you are actually bad because they're on small scale. Both InnoDB and especially TokuDB are designed to contain hundreds of gigabytes of data and to offer linear performance.

I have updated my.cnf into something below, now the overall performance looks better.
For 10k times of SELECT from MyISAM, it takes 4 seconds, whereby InnoDB takes 5 seconds, and TokuDB takes 8 seconds. So can I conclude under below configuration, TokuDB is behaving similar (even not necessary better) with MyISAM and InnoDB.
Indeed, I am curious about tons of performance comparison between InnoDB v.s. TokuDB, but not MyISAM v.s. TokuDB, why?
tokudb_commit_sync=0
max_allowed_packet = 1M
table_open_cache = 128
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
thread_concurrency = 8
innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size = 2G
innodb_additional_mem_pool_size = 20M
innodb_log_buffer_size = 8M
innodb_lock_wait_timeout = 50

Related

MySQL: Denormalization on many tables performance

I need to decide which is better for performance:
A) Retrieving data from one of ~100 similar tables LEFT JOIN 1 table.
B) Retrieving data from one of ~100 similar tables after denormalizing 1 table that I joined in A).
I'm curious if denormalizing this one table pays off in SELECT performance, since I'm creating a lot more columns in database - similar tables will have 3-15(let's say 8) columns and the table to denormalize will have ~6columns.
So in variant A) I got 100 tables * 8 columns + 1 table * 6 columns = 806 columns.
In variant B) I got 100 tables * (8 columns + 6 columns) = 1400 columns.
So which is better when we're not looking at disk space, only focusing on performance?
-------------EDIT-----------------
As Rick James asked for SHOW CREATE TABLE - the competing one :
CREATE TABLE `ItemsGeneral` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`description` text COLLATE utf8mb4_unicode_ci NOT NULL,
`datePosted` datetime NOT NULL,
`dateEnds` datetime NOT NULL,
`photos` tinyint(3) unsigned NOT NULL,
`userId` int(10) unsigned NOT NULL,
`locationSimple` point NOT NULL,
`locationPrecise` point NOT NULL,
PRIMARY KEY (`id`),
SPATIAL KEY `locationPrecise` (`locationPrecise`),
SPATIAL KEY `locationSimple` (`locationSimple`),
KEY `userId` (`userId`),
KEY `dateEnds` (`dateEnds`)
) ENGINE=MyISAM AUTO_INCREMENT=10001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
And all the other ~100s tables will have tiny/small/medium ints.
The implication of your question is that you already threw normalization out the window from the start.
In terms of performance, a standard rule of thumb in terms of result speed (I'll use > to mean faster than here)
cached data > data cached query > query on SSD > query on magneto optical disk
Another rule of thumb for database performance is that optimization of the size of the data set is directly related to performance against a given resource.
Now in terms of joins, there is a price to pay for a simple keyed join, but since these types of queries are typically measured in milliseconds, that certainly isn't a reason to denormalize lots of data and in the process blow up your dataset 20%, especially if you might need to update all that data.
Normalization is simply a cost for having atomic accurate data but it also helps keep your dataset to an optimal size.
Just as an example, if you have a mysql server, and you are using InnoDB, AND you have properly allocated memory on the server to your InnoDB cache, you can often see an extremely high cache hit ratio, where the queries are coming straight out of ram. At that point the fact that more of your database can be in cache due to the size of the dataset is more important than the fact that you joined 2 tables together.
I find on the regular, that people who setup mysql but aren't experts in it, aren't aware either that the size of their dataset could fit entirely in cache (were they to allocate it), or that they haven't even changed any of the default values and have essentially almost no allocation to cache.
Just to be clear, this involves configuration of the innodb_buffer_pool_size and innodb_buffer_pool_instances.

Why is myisam slower than Innodb

There are several Q&A for "Why is InnoDB (much) slower than MyISAM", but I could not find any topic for the opposite.
So I had a table defined as InnoDB wherin I stored file contents in a blob field. Because normally for that MyISAM should be used I switched over that table. Here is its structure:
CREATE TABLE `liv_fx_files_files` (
`fid` int(11) NOT NULL AUTO_INCREMENT,
`filedata` longblob NOT NULL,
`filetype` varchar(255) NOT NULL,
`filename` varchar(255) NOT NULL,
`filesize` int(11) NOT NULL,
`context` varchar(1) NOT NULL DEFAULT '',
`saveuser` varchar(32) NOT NULL,
`savetime` int(11) NOT NULL,
`_state` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`fid`),
KEY `_state` (`_state`)
) ENGINE=MyISAM AUTO_INCREMENT=4550 DEFAULT CHARSET=utf8;
There are 4549 records stored in it so far (with filedata going from 0 to 48M. Sum of all files is about 6G.
So whenever I need to know current total filesize I issue the query
SELECT SUM(filesize) FROM liv_fx_files_files;
The problem is that since I switched from InnoDB to MyISAM this simple query lasts really long (about 30sec and longer) whereas on InnoDB it was done in a under one second.
But aggregations are not the only queries which are very slow; it's almost every query.
I guess I could fix it by adopting config (which is currently optimized for InnoDB (only) use), but don't know which settings to adjust. Does anyone have a hint for me please?
current mysql server config (SHOW VARIABLES as csv)
Example for another query fired on both table types (both contain exact same data and have same definition). All other tested queries behave the same, say run much longer against MyISAM table as InnoDB!
SELECT sql_no_cache `fxfilefile`.`filename` AS `filename` FROM `myisamtable`|`innodbtable` AS `fxfilefile` WHERE `fxfilefile`.`filename` LIKE '%foo%';
Executive Summary: Use InnoDB, and change the my.cnf settings accordingly.
Details:
"MyISAM is faster" -- This is an old wives' tale. Today, InnoDB is faster in most situations.
Assuming you have at least 4GB of RAM...
If all-MyISAM, key_buffer_size should be about 20% of RAM; innodb_buffer_pool_size should be 0.
If all-InnoDB, key_buffer_size should be, say, only 20MB; innodb_buffer_pool_size should be about 70% of RAM.
If a mixture, do something in between. More discussion.
Let's look at how things are handled differently by the two Engines.
MyISAM puts the entire BLOB 'inline' with the other columns.
InnoDB puts most or all of each blob in other blocks.
Conclusion:
A table scan in a MyISAM table spends a lot of time stepping over cow paddies; InnoDB is much faster if you don't touch the BLOB.
This makes InnoDB a clear winner for SELECT SUM(x) FROM tbl; when there is no index on x. With INDEX(x), either engine will be fast.
Because of the BLOB being inline, MyISAM has fragmentation issues if you update records in the table; InnoDB has much less fragmentation. This impacts all operations, making InnoDB the winner again.
The order of the columns in the CREATE TABLE has no impact on performance in either engine.
Because the BLOB dominates the size of each row, the tweaks to the other columns will have very little impact on performance.
If you decide to go with MyISAM, I would recommend a 'parallel' table ('vertical partitioning'). Put the BLOB and the id in it a separate table. This would help MyISAM come closer to InnoDB's model and performance, but would add complexity to your code.
For "point queries" (looking up a single row via an index), there won't be much difference in performance between the engines.
Your my.cnf seems antique; set-variable has not been necessary in a long time.
Try to edit your MySQL config file, usually /etc/mysql/my.cnf and use "huge" preset.
# The MySQL server
[mysqld]
port = 3306
socket = /var/run/mysqld/mysqld.sock
skip-locking
set-variable = key_buffer=384M
set-variable = max_allowed_packet=1M
set-variable = table_cache=512
set-variable = sort_buffer=2M
set-variable = record_buffer=2M
set-variable = thread_cache=8
# Try number of CPU's*2 for thread_concurrency
set-variable = thread_concurrency=8
set-variable = myisam_sort_buffer_size=64M
Certainly 30 seconds to read 4500 records is very slow. Assuming there is plenty of room for I/O caching then the first thing I would try is to change the order of the fields; if these are written to the table in the order they are declared the DBMS would need to seek to the end of each record before reading the size value (I'd also recommend capping the size of those vharchar(255) columns, and that varhar(1) NOT NULL should be CHAR).
CREATE TABLE `liv_fx_files_files2` (
`fid` int(11) NOT NULL AUTO_INCREMENT,
`filesize` int(11) NOT NULL,
`context` char(1) NOT NULL DEFAULT '',
`saveuser` varchar(32) NOT NULL,
`savetime` int(11) NOT NULL,
`_state` int(11) NOT NULL DEFAULT '0',
`filetype` varchar(255) NOT NULL,
`filename` varchar(255) NOT NULL,
`filedata` longblob NOT NULL,
PRIMARY KEY (`fid`),
KEY `_state` (`_state`)
) ENGINE=MyISAM AUTO_INCREMENT=4550 DEFAULT CHARSET=utf8;
INSERT INTO liv_fx_files_files2
(fid, filesize, context, saveuser, savetime, _state, filetype, filename, filedata)
SELECT fid, filesize, context, saveuser, savetime, _state, filetype, filename, filedata
FROM liv_fx_files_files;
But ideally I'd split the data and metadata into separate tables.

mysql repartitioned table much larger

I have a very large table on a mysql 5.6.10 instance (roughly 480 million rows).
The storage engine is InnoDB. (Table and DB Default).
The table was partitioned by hash of merchantId (bigint: a kind of client identifier) which helped when queries related to a single merchant. Due to significant performance degradation when queries spanned multiple merchants, I decided to repartition the table by Range on ACTION_DATE (the DATE that an activity occurred). Thinking I was being clever, I decided to add a few (5) new fields for future use (unused_varchar1 varchar(200), etc.), since the table is so large, adding new fields essentially requires a rebuild anyway, so why not...
I created the new table structure as _new, dumped the existing file to a secondary server using mysql dump. I then used an awk script to finesse the name and a few other details to fit the new table (change tableName to tableName_new), and started the load.
The existing table was approximately 430 GB. The text file similarly was about 403 GB. I was surprised therefore that the new table ended up taking about 840 GB!! (Based on the linux fize size of the .ibd files)
So, I have 2 basic questions, which really amount to why and what now...
I imagine that the new table is larger because the dump file was in the order of the previous partition (merchantId) while the load was inserting into the new partitioning (Activity date) creating a semi-random insertion order. The randomness led mysql to leave plenty of space (roughly 50%) in the pages for future insertions. (I'm a little fuzzy on the terminology here, having spent much more time in my career with Sql Server DBs than MySql Dbs...) I'm not able to find any internal statistics in mysql for space free per page. The INFORMATION_SCHEMA.TABLES DATA_FREE stat is an unconvincing 68MB.
If it helps these are the relevant stats from I_S.TABLES:
TABLE_TYPE: BASE TABLE
Engine: InnoDB
VERSION: 10
ROW_FORMAT: Compact
TABLE_ROWS: 488,094,271
AVG_ROW_LENGTH: 1,564
DATA_LENGTH: 763,509,358,592 (711 GB)
INDEX_LENGTH: 100,065,574,912 (93.19 GB)
DATA_FREE: 68,157,440 (0.06 GB)
I realize that that doesn't add up to 840 GB, but as I said, that was the size of the .ibd files which seems to be slightly different than the I_S.TABLES stats. Either way, it is significantly more than the text dump file.
I digress...
My question is whether my theory about whether the repartioning explains the roughly doubled size. Or is there another explanation? I think the extra columns (2 Bigint, 2 Varchar(200), 1 Date) are not the culprit since they are all null. My napkin calculation was that the additional columns would add < 9 GB. Likewise, one additional index on UID should be a relatively small addition.
The follow up question is what can I do now if I want to try to compact the table. (Server now only has about 385 GB free...)
If I repeated the procedure, dump to file, reload, this time in the current partition order, would I end up with a table more like the size of my original table ~430 GB?
Following are relevant parts of DDL.
OLD TABLE:
CREATE TABLE table_name (
`AUTO_SEQ` bigint(20) NOT NULL,
`MERCHANT_ID` bigint(20) NOT NULL,
`AFFILIATE_ID` bigint(20) DEFAULT NULL,
`PROGRAM_ID` bigint(20) NOT NULL,
`ACTION_DATE` date DEFAULT NULL,
`UID` varchar(128) DEFAULT NULL,
... additional columns ...
PRIMARY KEY (`AUTO_SEQ`,`MERCHANT_ID`,`PROGRAM_ID`),
KEY `oc_rpt_mpad_idx` (`MERCHANT_ID`,`PROGRAM_ID`,`ACTION_DATE`,`AFFILIATE_ID`),
KEY `oc_rpt_mapd` (`MERCHANT_ID`,`ACTION_DATE`),
KEY `oc_rpt_apda_idx` (`AFFILIATE_ID`,`PROGRAM_ID`,`ACTION_DATE`,`MERCHANT_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (merchant_id)
PARTITIONS 16 */
NEW TABLE:
CREATE TABLE `tableName_new` (
`AUTO_SEQ` bigint(20) NOT NULL,
`MERCHANT_ID` bigint(20) NOT NULL,
`AFFILIATE_ID` bigint(20) DEFAULT NULL,
`PROGRAM_ID` bigint(20) NOT NULL,
`ACTION_DATE` date NOT NULL DEFAULT '0000-00-00',
`UID` varchar(128) DEFAULT NULL,
... additional columns...
# NEW COLUMNS (ALL NULL)
`UNUSED_BIGINT1` bigint(20) DEFAULT NULL,
`UNUSED_BIGINT2` bigint(20) DEFAULT NULL,
`UNUSED_VARCHAR1` varchar(200) DEFAULT NULL,
`UNUSED_VARCHAR2` varchar(200) DEFAULT NULL,
`UNUSED_DATE1` date DEFAULT NULL,
PRIMARY KEY (`AUTO_SEQ`,`ACTION_DATE`),
KEY `oc_rpt_mpad_idx` (`MERCHANT_ID`,`PROGRAM_ID`,`ACTION_DATE`,`AFFILIATE_ID`),
KEY `oc_rpt_mapd` (`ACTION_DATE`),
KEY `oc_rpt_apda_idx` (`AFFILIATE_ID`,`PROGRAM_ID`,`ACTION_DATE`,`MERCHANT_ID`),
KEY `oc_uid` (`UID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50500 PARTITION BY RANGE COLUMNS(ACTION_DATE)
(PARTITION p01 VALUES LESS THAN ('2012-01-01') ENGINE = InnoDB,
PARTITION p02 VALUES LESS THAN ('2012-04-01') ENGINE = InnoDB,
PARTITION p03 VALUES LESS THAN ('2012-07-01') ENGINE = InnoDB,
PARTITION p04 VALUES LESS THAN ('2012-10-01') ENGINE = InnoDB,
PARTITION p05 VALUES LESS THAN ('2013-01-01') ENGINE = InnoDB,
PARTITION p06 VALUES LESS THAN ('2013-04-01') ENGINE = InnoDB,
PARTITION p07 VALUES LESS THAN ('2013-07-01') ENGINE = InnoDB,
PARTITION p08 VALUES LESS THAN ('2013-10-01') ENGINE = InnoDB,
PARTITION p09 VALUES LESS THAN ('2014-01-01') ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN ('2014-04-01') ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN ('2014-07-01') ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN ('2014-10-01') ENGINE = InnoDB,
PARTITION p13 VALUES LESS THAN ('2015-01-01') ENGINE = InnoDB,
PARTITION p14 VALUES LESS THAN ('2015-04-01') ENGINE = InnoDB,
PARTITION p15 VALUES LESS THAN ('2015-07-01') ENGINE = InnoDB,
PARTITION p16 VALUES LESS THAN ('2015-10-01') ENGINE = InnoDB,
PARTITION p17 VALUES LESS THAN ('2016-01-01') ENGINE = InnoDB,
PARTITION p18 VALUES LESS THAN ('2016-04-01') ENGINE = InnoDB,
PARTITION p19 VALUES LESS THAN ('2016-07-01') ENGINE = InnoDB,
PARTITION p20 VALUES LESS THAN ('2016-10-01') ENGINE = InnoDB,
PARTITION p21 VALUES LESS THAN ('2017-01-01') ENGINE = InnoDB,
PARTITION p22 VALUES LESS THAN ('2017-04-01') ENGINE = InnoDB,
PARTITION p23 VALUES LESS THAN ('2017-07-01') ENGINE = InnoDB,
PARTITION p24 VALUES LESS THAN ('2017-10-01') ENGINE = InnoDB,
PARTITION p25 VALUES LESS THAN ('2018-01-01') ENGINE = InnoDB,
PARTITION p26 VALUES LESS THAN ('2018-04-01') ENGINE = InnoDB,
PARTITION p27 VALUES LESS THAN ('2018-07-01') ENGINE = InnoDB,
PARTITION p28 VALUES LESS THAN ('2018-10-01') ENGINE = InnoDB,
PARTITION p29 VALUES LESS THAN ('2019-01-01') ENGINE = InnoDB,
PARTITION p30 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */
adding new fields essentially requires a rebuild anyway, so why not
I predict you will regret it.
The existing table was approximately 430 GB.
According to size of .ibd? Or SHOW TABLE STATUS? Or the dump size, which would be bogus (see below).
it is significantly more than the text dump file
The lengths in TABLE STATUS include several flavors of overhead (BTree, free space, extra extents, etc), plus the indexes (which are not in the dump file).
Also, think about a BIGINT that contains 1234. The .ibd will 8 bytes plus some overhead; the dump will have 5 ('1234', plus a comma). That leads to my next point...
Are there really more than 4 billion merchants? merchant_id is BIGINT (8 bytes); INT UNSIGNED is only 4 bytes and allows 0..4 billion.
What's in uid? If it is some sort of UUID, it seems awfully long.
Do you happen to have the "stats from I_S.TABLES" from the old table?
So far, I have not addressed "whether the repartioning explains the roughly doubled size".
extra columns (2 Bigint, 2 Varchar(200), 1 Date)
That's about 29 bytes per row (15GB of Data_length), perhaps less since they are NULL.
You seem to be using the default ROW_FORMAT. I suspect this did not change in the conversion.
It is usually unwise to start an index with the "partition key" (merchant_id or action_date). This is because you are already "pruning" on that key; you are better off starting the index with something else. (Caveat: There are exceptions.)
Check the CHARACTER SET and datatype of the "additional columns". If something changed, that could be significant.
would I end up with a table more like the size of my original table ~430 GB?
Alas, until we figure out why it grew, I can't answer that question.
I'm more interested in whether random insertion vs. the partition (ACTION_DATE) would lead to wasted space / half empty pages.
I recommend you try the following experiment. Do not use optimize partition; see http://bugs.mysql.com/bug.php?id=42822 . Instead do this to defragment one partition (such as p02):
ALTER TABLE table_name REBUILD PARTITION p02;
You could do this SELECT before and after in order to see the change(s) to the PARTITIONs:
SELECT *
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'dbname' -- change as needed
AND TABLE_NAME = 'table_name' -- change as needed
ORDER BY PARTITION_ORDINAL_POSITION,
SUBPARTITION_ORDINAL_POSITION;
It's a generic query to get the table-status-like info for the partitions of one table.
If the REBUILD cuts the partition by about 50%, then we have the answer.
Generally, randomly inserting into a BTree should leave you with about 69% (not 50%) of the "full" size. Hence, I'm not 'expecting' this to be the solution/answer.

Does MOD(primarykey) on an InnoDB partitioned table cause fragmentation

I'm exporting a largeish table (1.5 billion rows) between servers. This is the table format.
CREATE TABLE IF NOT EXISTS `partitionedtable` (
`domainid` int(10) unsigned NOT NULL,
`instanceid` int(10) unsigned NOT NULL,
`urlid` int(10) unsigned NOT NULL,
`adjrankid` smallint(5) unsigned NOT NULL,
PRIMARY KEY (`domainid`,`instanceid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (MOD(domainid,8192))
(PARTITION p0 VALUES LESS THAN (1) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (3) ENGINE = InnoDB
...
PARTITION p8191 VALUES LESS THAN (8192) ENGINE = InnoDB)
The data was exported to the new server in PK order and resulted in 8192 text files... which equated to around 200K records per file.
I'm simply iterating from 0 to 8191 importing the files into the new table.
LOAD DATA INFILE '/home/backup/rc/$i.tsv INTO TABLE partitionedtable PARTITION (p$i)
I'm thinking that each of these should only take a second to import, however they take around 6 seconds.
The spec of the server can be seen here.
http://www.ovh.co.uk/dedicated_servers/sp_32g.xml
There isn't much else going on in the server that'd bottleneck the process.
Could it be that partitioning by MOD() causes fragmentation? I was under the impression that there wouldnt be any fragmentation as each partition would be considered a separate table, and since data is inserted in PK order there'd be no fragmentation.
Added - probably useful... these settings were applied at the start of the batch.
SET autocommit=0;
SET foreign_key_checks=0;
SET sql_log_bin=0;
SET unique_checks=0;
A COMMIT is applied after every file.
The thread seems to spend the majority of its time in a System lock state, during LOAD DATA INFILE.
When I set up the server I mistakenly thought the open files limit was higher, though in reality it's sitting at 1024.
I've upped it to 16000 and rebooted the server, and it's running slightly quicker # 3 seconds (I was assuming the file opening/closing was causing the system lock status).
I also purged the bin logs.
Still seems a bit slow though.

Slow MySQL InnoDB Inserts and Updates

I am using magento and having a lot of slowness on the site. There is very, very light load on the server. I have verified cpu, disk i/o, and memory is light- less than 30% of available at all times. APC caching is enabled- I am using new relic to monitor the server and the issue is very clearly insert/updates.
I have isolated the slowness to all insert and update statements. SELECT is fast. Very simple insert / updates into tables take 2-3 seconds whether run from my application or the command line mysql.
Example:
UPDATE `index_process` SET `status` = 'working', `started_at` = '2012-02-10 19:08:31' WHERE (process_id='8');
This table has 9 rows, a primary key, and 1 index on it.
The slowness occurs with all insert / updates. I have run mysqltuner and everything looks good. Also, changed innodb_flush_log_at_trx_commit to 2.
The activity on this server is very light- it's a dv box with 1 GB RAM. I have magento installs that run 100x better with 5x the load on a similar setup.
I started logging all queries over 2 seconds and it seems to be all inserts and full text searches.
Anyone have suggestions?
Here is table structure:
CREATE TABLE IF NOT EXISTS `index_process` (
`process_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`indexer_code` varchar(32) NOT NULL,
`status` enum('pending','working','require_reindex') NOT NULL DEFAULT 'pending',
`started_at` datetime DEFAULT NULL,
`ended_at` datetime DEFAULT NULL,
`mode` enum('real_time','manual') NOT NULL DEFAULT 'real_time',
PRIMARY KEY (`process_id`),
UNIQUE KEY `IDX_CODE` (`indexer_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=10 ;
First: (process_id='8') - '8' is char/varchar, not int, so mysql convert value first.
On my system, I had long times (greater than one second) to update users.last_active_time.
The reason was that I had a few queries that long to perform. As I joined them for the users table. This resulted in blocking of the table to read. Death lock by SELECT.
I rewrote query from: JOIN to: sub-queries and porblem gone.