Slow INSERT .. ON DUPLICATE KEY UPDATE query with InnoDB - mysql

I am monitoring the slowest queries on a website. It turns out they look something like this:
INSERT INTO beststat (bestid,period,rawView) VALUES ( 'idX' , 2012 , 1 )
ON DUPLICATE KEY UPDATE rawView = rawView+1
It's a logging table: if the row already exists, the query increments rawView by 1.
beststat is InnoDB, so I have row-level locking, and considering I do a lot of inserts and updates it should be faster than MyISAM.
In any case that query shouldn't take so long, so maybe something else is wrong. What could it be?
Of course I have a unique index on (bestid, period).
Additional Info
This table (beststat) currently has ~1 million records and its size is 68 MB. I have 4 GB RAM and innodb_buffer_pool_size = 104,857,600 (100 MB). MySQL: 5.1.49-3
CREATE TABLE `beststat` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`bestid` int(11) unsigned NOT NULL,
`period` mediumint(8) unsigned NOT NULL,
`view` mediumint(8) unsigned NOT NULL DEFAULT '0',
`rawView` mediumint(8) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `bestid` (`bestid`,`period`)
) ENGINE=InnoDB AUTO_INCREMENT=2020577 DEFAULT CHARSET=utf8
Note that to speed things up a little I could do something like:
UPDATE beststat SET rawView = rawView + 1 WHERE bestid = idX AND period = 2012;
if (mysql_affected_rows()==0)
INSERT INTO beststat (bestid,period,rawView) VALUES ('idX',2012,1)
So most of the time I would run only the first query, the UPDATE. But I would like to understand why the original, more concise query is slow.
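A minimal sketch of the UPDATE-then-INSERT fallback described above, in Python. It uses the standard-library sqlite3 module purely as a stand-in so the example runs without a MySQL server; the helper name bump_raw_view is hypothetical, and cursor.rowcount plays the role of mysql_affected_rows().

```python
import sqlite3


def bump_raw_view(conn, bestid, period):
    """Increment rawView; insert the row only when the UPDATE matched nothing."""
    cur = conn.execute(
        "UPDATE beststat SET rawView = rawView + 1 WHERE bestid = ? AND period = ?",
        (bestid, period),
    )
    if cur.rowcount == 0:
        # No existing row for (bestid, period): create it with a count of 1.
        conn.execute(
            "INSERT INTO beststat (bestid, period, rawView) VALUES (?, ?, 1)",
            (bestid, period),
        )
    conn.commit()


# In-memory stand-in for the beststat table (SQLite types, not MySQL types).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE beststat ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " bestid INTEGER NOT NULL,"
    " period INTEGER NOT NULL,"
    " rawView INTEGER NOT NULL DEFAULT 0,"
    " UNIQUE (bestid, period))"
)

for _ in range(3):
    bump_raw_view(conn, 42, 2012)

raw_view = conn.execute(
    "SELECT rawView FROM beststat WHERE bestid = 42 AND period = 2012"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM beststat").fetchone()[0]
print(raw_view, row_count)
```

Two clients can both see zero affected rows and both attempt the INSERT, so against a real server the second INSERT can fail with a duplicate-key error on (bestid, period) and would need handling. INSERT ... ON DUPLICATE KEY UPDATE avoids that race in a single statement.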
I found this interesting article... still reading

When dealing with a large number of rows, I suggest using LOAD DATA INFILE to make the inserts faster.
To further improve the query time, you can consider using a MEMORY table as well.

Related

Mysql partitioning effect on DDL and DML

I am using MySQL 5.6 with ~150 million records in the Transaction table (InnoDB). As its size increases, this table is becoming unmanageable (adding a column or index) and slow even with the required indexes. After searching the internet I concluded it is an appropriate time to partition the table. I am confident that partitioning will serve the following purposes for me:
Improve DML statements response time (using partitioning pruning)
Improve archival process
But I am not sure whether (and how) it will improve DDL performance for this table. More specifically, the performance of the following DDL:
ALTER TABLE ADD/DROP COLUMN
ALTER TABLE ADD/DROP INDEX
I went through the MySQL documentation and the internet but was unable to find an answer. Can anyone please help me with this or point to any relevant documentation?
My table structure is as follows:
CREATE TABLE `TRANSACTION` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`parent_uuid` char(36) DEFAULT NULL,
`order_number` varchar(64) DEFAULT NULL,
`order_id` int(11) DEFAULT NULL,
`order_uuid` char(36) DEFAULT NULL,
`order_type` char(1) DEFAULT NULL,
`business_id` int(11) DEFAULT NULL,
`store_id` int(11) DEFAULT NULL,
`store_device_id` int(11) DEFAULT NULL,
`source` char(1) DEFAULT NULL COMMENT 'instore, online, order_ahead, etc',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`flags` int(11) DEFAULT NULL,
`customer_lang` char(2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `business_id` (`business_id`,`store_id`,`store_device_id`),
KEY `parent_uuid` (`parent_uuid`),
KEY `order_uuid` (`order_uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
And I am partitioning using the following statement:
ALTER TABLE TRANSACTION PARTITION BY RANGE (id)
(PARTITION p0 VALUES LESS THAN (5000000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (10000000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
Thanks!
Partitioning is not a performance panacea. Even the items you mentioned will not speed up; they may even slow down.
Instead, I will critique the table to look for ways to speed up some things.
UUIDs are terrible for performance once the index on it becomes too big to be cached. This is because of its randomness. Possible solutions: compact it into BINARY(16); shrink the table other ways; avoid UUIDs.
Why have both parent_id and parent_uuid??
Shrink the 4-byte INTs to smaller datatypes where practical.
Usually CHAR should be CHARACTER SET ascii (1-byte/character), not utf8mb4 (4 bytes/char).
Caution: 150M is getting remotely close to the 2-billion limit of INT SIGNED. Consider 4B limit of INT UNSIGNED. (Each is 4 bytes.)
Do you ever use created_at or updated_at?
MySQL 8.0.12 added a very fast (ALGORITHM=INSTANT) ADD COLUMN, and 8.0.29 extended it to DROP COLUMN (both for limited situations).
5.7.?? has a less-invasive ADD INDEX than previous versions, but I am not sure it applies to partitioned tables.
5.7.4: Online DDL support reduces table rebuild time and permits concurrent DML, which helps reduce user application downtime. For additional information, see Overview of Online DDL.
More importantly, let's see the main queries that are "too slow". There may be composite indexes and/or reformulations of the queries that will speed them up.
There is even a slim chance that partitioning will help but not on the PRIMARY KEY.
I think there are only 4 use cases where partitioning helps performance.
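Two of the critique points above (compacting UUIDs into BINARY(16), and the headroom left in INT) can be checked with a little arithmetic. This is a rough Python sketch, not part of the original answer: the UUID value is an arbitrary example, the packing is done application-side, and the headroom figures assume the largest id is roughly equal to the ~150 million row count (no large gaps in AUTO_INCREMENT).

```python
import uuid

# Arbitrary example value; any UUID behaves the same way.
order_uuid = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
text_form = str(order_uuid)       # the 36-character form a CHAR(36) column holds
packed_form = order_uuid.bytes    # the 16 raw bytes a BINARY(16) column would hold
round_tripped = uuid.UUID(bytes=packed_form)  # packing loses no information

# Upper limits of the two 4-byte integer types.
INT_SIGNED_MAX = 2**31 - 1
INT_UNSIGNED_MAX = 2**32 - 1
current_rows = 150_000_000  # approximate figure from the question

# Fraction of each id range already consumed (assumes max id ~ row count).
signed_used = current_rows / INT_SIGNED_MAX
unsigned_used = current_rows / INT_UNSIGNED_MAX

print(len(text_form), len(packed_form), round_tripped == order_uuid)
print(f"{signed_used:.1%} of INT SIGNED, {unsigned_used:.1%} of INT UNSIGNED")
```

The packed form is less than half the size of the text form per value, and that saving applies to every copy of the value in the table and in the parent_uuid and order_uuid indexes.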

UPDATE by PRIMARY KEY query is too slow on big table

I have a MyISAM table (on MariaDB) with 7 million rows in it.
CREATE TABLE `mytable` (
`id` bigint(100) unsigned NOT NULL AUTO_INCREMENT,
`x` int(5) unsigned NOT NULL DEFAULT '0',
`y` int(5) unsigned NOT NULL DEFAULT '0',
`value` int(5) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=10152508 DEFAULT CHARSET=utf8 PAGE_CHECKSUM=1
When i do
SELECT * FROM mytable WHERE id = 167880;
it takes around 0.272 sec
When i do
UPDATE mytable SET value = 1 WHERE id = 167880;
it takes randomly from 0.200 to 2.5 sec
I was thinking it's because my table has a lot of rows, but still, it shouldn't take that much time to update a row by its primary key.
I did some research before posting; here are the checks I've already done:
No duplicate indexes
No indexes other than the primary key "id"
No triggers
Tried switching to the InnoDB engine; it was worse (around 6 sec for an update)
Tried switching to the Aria engine; it was even worse
Already ran OPTIMIZE TABLE
Config is the default config of the latest version of MariaDB (fresh install)
Made all these checks while the db was not used by anything else, so no heavy reads during the tests
I think the problem is the data type you are using for the id column.
Using INT rather than BIGINT can significantly reduce disk space.
See this article:
http://ronaldbradford.com/blog/bigint-v-int-is-there-a-big-deal-2008-07-18/
Hope it helps

Simple count id in MySQL table is taking too long

I have two tables with 65.5 million rows:
1)
CREATE TABLE RawData1 (
cdasite varchar(45) COLLATE utf8_unicode_ci NOT NULL,
id int(20) NOT NULL DEFAULT '0',
timedate datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
type int(11) NOT NULL DEFAULT '0',
status int(11) NOT NULL DEFAULT '0',
branch_id int(20) DEFAULT NULL,
branch_idString varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (id,cdasite,timedate),
KEY idx_timedate (timedate,cdasite)
) ENGINE=InnoDB;
2)
Same table with partition (call it RawData2)
PARTITION BY RANGE ( TO_DAYS(timedate))
(PARTITION p20140101 VALUES LESS THAN (735599) ENGINE = InnoDB,
PARTITION p20140401 VALUES LESS THAN (735689) ENGINE = InnoDB,
.
.
PARTITION p20201001 VALUES LESS THAN (738064) ENGINE = InnoDB,
PARTITION future VALUES LESS THAN MAXVALUE ENGINE = InnoDB);
I'm using the same query:
SELECT count(id) FROM RawData1
where timedate BETWEEN DATE_FORMAT(date_sub(now(),INTERVAL 2 YEAR),'%Y-%m-01') AND now();
2 problems:
1. Why does the partitioned table run longer than the regular table?
2. The regular table returns 36380217 in 17.094 sec. Is that normal? All the R&D leaders think it is not fast enough; it needs to return in ~2 sec.
What do I need to check / do / change?
Is it realistic to scan 35732495 rows and retrieve 36380217 in less than 3-4 sec?
You have found one example of why PARTITIONing is not a performance panacea.
Where does id come from?
How many different values are there for cdasite? If thousands, not millions, build a table mapping cdasite <=> id and switch from a bulky VARCHAR(45) to a MEDIUMINT UNSIGNED (or whatever is appropriate). This item may help the most, but perhaps not enough.
Ditto for status, but probably using TINYINT UNSIGNED. Or think about ENUM. Either is 1 byte, not 4.
The (20) on INT(20) means nothing. You get a 4-byte integer with a limit of about 2 billion.
Are you sure there are no duplicate timedates?
branch_id and branch_idString -- this smells like a pair that needs to be in another table, leaving only the id here?
Smaller -> faster.
COUNT(*) is the same as COUNT(id) since id is NOT NULL.
Do not include future partitions before they are needed; it slows things down. (And don't use partitioning at all.)
To get that query even faster, build and maintain a Summary Table. It would have at least a DATE in the PRIMARY KEY and at least COUNT(*) as a column. Then the query would fetch from that table. More on Summary tables: http://mysql.rjweb.org/doc.php/summarytables
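A rough sketch of the Summary Table idea, in Python with the standard-library sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table name RawDataDaily, the helper count_between and the sample rows are all hypothetical; the point is only that the range count becomes a SUM over one small row per day instead of a scan over millions of raw rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down stand-in for RawData1 with a few made-up rows.
conn.execute("CREATE TABLE RawData (id INTEGER, cdasite TEXT, timedate TEXT)")
rows = [
    (1, "a", "2020-01-01 10:00:00"),
    (2, "a", "2020-01-01 11:30:00"),
    (3, "b", "2020-01-02 09:15:00"),
    (4, "b", "2020-02-01 08:00:00"),
]
conn.executemany("INSERT INTO RawData VALUES (?, ?, ?)", rows)

# Summary table: a DATE in the PRIMARY KEY and a COUNT(*) column.
conn.execute("CREATE TABLE RawDataDaily (day TEXT PRIMARY KEY, cnt INTEGER NOT NULL)")
conn.execute(
    "INSERT INTO RawDataDaily (day, cnt) "
    "SELECT date(timedate), COUNT(*) FROM RawData GROUP BY date(timedate)"
)


def count_between(first_day, last_day):
    """Count raw rows in an inclusive day range using only the summary table."""
    return conn.execute(
        "SELECT COALESCE(SUM(cnt), 0) FROM RawDataDaily WHERE day BETWEEN ? AND ?",
        (first_day, last_day),
    ).fetchone()[0]


january = count_between("2020-01-01", "2020-01-31")
everything = count_between("2020-01-01", "2020-12-31")
print(january, everything)
```

In practice the summary table has to be kept current, for example by adding each finished day's counts in a periodic job. Day granularity matches the query in the question as long as the range starts on a day boundary (it starts on the first of a month) and no rows are dated in the future.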

Mysql IO write too much

I have a table that uses the MyISAM engine on my server. There are 10 UPDATE statements per second on average. I found that the mysql process writes a lot more to disk than the theoretical value. After experimenting, I suspect that modifying any column rewrites the entire row. The following is an experiment...
My table:
CREATE TABLE `test_update` (
`id` int(11) NOT NULL DEFAULT '0',
`str1` blob,
`str2` blob,
`str3` blob,
`update_time` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `update_time` (`update_time`)
) ENGINE=MyISAM;
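I inserted 100000 rows; each row holds 30k of string data (10k per blob). After that I update the update_time column of a random row, 1 row/sec:
import random
import time

# cur and conn are an already-open DB-API cursor and connection.
end = time.time()
while True:
    now = int(time.time())
    randomid = random.randint(0, 99999)  # assumed id range for the 100000 rows
    sql = "update test_update set update_time=%d where id=%d" % (now, randomid)
    cur.execute(sql)
    conn.commit()
    # Sleep out the rest of the second to hold the rate at about 1 update/sec.
    slp_t = 1 - (time.time() - end)
    if slp_t > 0:
        time.sleep(slp_t)
    end = time.time()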
and iotop shows:
https://i.stack.imgur.com/sJa8y.png
It seems like modifying an int column rewrites the entire row (or even more). Is that true? If so, why was it designed like this, and what should I do to avoid this waste?

Optimize MySQL count query with JOIN

I have a query that takes about 20 seconds, I would like to understand if there is a way to optimize it.
Table 1:
CREATE TABLE IF NOT EXISTS `sessions` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9845765 ;
And table 2:
CREATE TABLE IF NOT EXISTS `access` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`session_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `session_id` (`session_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9467799 ;
Now, what I am trying to do is count all the access rows connected to all sessions of one user, so my query is:
SELECT COUNT(*)
FROM access
INNER JOIN sessions ON access.session_id=sessions.id
WHERE sessions.user_id='6';
It takes almost 20 seconds, and for user_id 6 there are about 3 million sessions stored.
Is there anything I can do to optimize that query?
Change this line from the sessions table:
KEY `user_id` (`user_id`)
To this:
KEY `user_id` (`user_id`, `id`)
What this will do for you is allow you to complete the query from the index, without going back to the raw table. As it is, you need to do an index scan on the sessions table for your user_id, and for each item go back to the table to find the id for the join to the access table. By including the id in the index, you can skip going back to the table.
Sadly, this will make your inserts into that table slower, and it seems like this may be a big deal, given that just one user has 3 million sessions. SQL Server and Oracle would address this by allowing you to include the id column in your index without actually indexing on it, saving a little work at insert time, and also by allowing you to specify a lower fill factor for the index, reducing the need to rebuild or reorder the indexes at insert, but MySQL doesn't support these.