INSERT INTO ... SELECT takes a long time on cluster - mysql

My mysql cluster: Ver 5.6.30-76.3-56 for debian-linux-gnu on x86_64 (Percona XtraDB Cluster (GPL), Release rel76.3, Revision aa929cb, WSREP version 25.16, wsrep_25.16)
I have a complicated SQL query which inserts about 36k rows into a table, with this syntax:
INSERT INTO `sometable` (SELECT ...);
The SELECT is a bit complicated but not slow (0.0023s), yet the INSERT takes about 40-50s. The table is not in use while I'm inserting the rows.
My questions are:
Can I speed it up somehow?
The slow insert causes locking problems on the other tables (because of the SELECT).
Is this workflow good or bad practice? Is there a better one?
Thanks
UPDATE:
The table schema:
CREATE TABLE `sometable` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) unsigned DEFAULT NULL,
`a` varchar(255) DEFAULT NULL,
`b` smallint(6) unsigned DEFAULT NULL,
`c` smallint(6) unsigned DEFAULT NULL,
`d` smallint(6) unsigned DEFAULT NULL,
`e` smallint(6) unsigned DEFAULT NULL,
`f` varchar(255) DEFAULT '',
`country_id` int(10) unsigned DEFAULT NULL,
`city_id` int(10) unsigned DEFAULT NULL,
`g` smallint(6) unsigned DEFAULT NULL,
`h` smallint(6) unsigned DEFAULT NULL,
`i` smallint(6) unsigned DEFAULT NULL,
`j` smallint(6) unsigned DEFAULT NULL,
`k` smallint(6) unsigned DEFAULT NULL,
`l` varchar(3) DEFAULT NULL,
`m` varchar(3) DEFAULT NULL,
`n` text,
`o` varchar(255) DEFAULT NULL,
`p` varchar(32) DEFAULT NULL,
`q` varchar(32) DEFAULT NULL,
`r` varchar(32) DEFAULT NULL,
`s` time DEFAULT NULL,
`t` time DEFAULT NULL,
`u` text,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `country_id` (`country_id`),
KEY `city_id` (`city_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
UPDATE2:
When I try to run the query I get an error in some cases:
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
MY SOLUTION:
Here is my final solution, in case anybody is interested:
gist
The main problem was that while I was filling mytable, the other queries got stuck and the cluster had serious performance problems. In this solution I create a temporary table and fill it with data in "dirty read" mode, then I copy that data to mytable in chunks. It takes a bit more time, but there are no performance problems and the other queries are not stuck.

A SELECT operation that returns a row of the length you describe every 64 nanoseconds is very fast. That's what 36 kilorows in 2.3 milliseconds works out to. It seems likely that your SELECT query timing doesn't account for the transport of the result set to the MySQL client. At any rate, using that performance as a comparison to an INSERT operation sets your expectations unreasonably high.
You might try issuing this command before starting your operation. It will allow your SELECT operation to proceed with less contention from your application's traffic on the SELECT's source tables. See https://dev.mysql.com/doc/refman/5.7/en/set-transaction.html
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
You might try a two step process, involving a temporary table. This will have the advantage of not having to update all the indexes in some_table at the same time as the SELECT operation. That operation will look like this.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
CREATE TEMPORARY TABLE insert_batch AS SELECT ... ;
INSERT INTO some_table SELECT * FROM insert_batch;
DROP TEMPORARY TABLE insert_batch;
You should understand that InnoDB posts your batch of insertions to your table as a single transaction. If you can do this in a way that handles about 500 rows at a time rather than 36K, you'll have more transactions, but they will be smaller. That's generally a way to get higher throughput.
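A rough sketch of that batching idea, building on the temporary table above: give the temp table its own sequential key and copy it in ~500-row slices, each slice being its own (autocommit) transaction. The column list is abbreviated and hypothetical; the original complicated SELECT goes where the ellipsis is.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
CREATE TEMPORARY TABLE insert_batch (
  `seq` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`seq`)
) AS SELECT ... ;

-- Repeat for seq 501-1000, 1001-1500, ... until all ~36K rows are copied;
-- with autocommit on, each INSERT is its own small transaction.
INSERT INTO some_table (user_id, a, b /* ... remaining columns ... */)
SELECT user_id, a, b /* ... remaining columns ... */
FROM insert_batch
WHERE seq BETWEEN 1 AND 500;

DROP TEMPORARY TABLE insert_batch;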

If all else fails, this may be a viable solution. First, see http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks
Load your corrections into a temp table (or non-replicated MyISAM table).
Loop through the temp table (using code similar to that link). Pick 100 rows at a time.
Do the INSERT ... SELECT ... of 100 rows in a separate transaction.
This technique may (or may not) take longer than 40-50s, but at least it is much less likely to time out or deadlock.
In general, avoid running any transaction that lasts longer than a few seconds. That link gives somewhat generic advice on how to "chunk" lengthy (and repetitive) operations to avoid long transactions.
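If it helps, here is a rough sketch of that loop in SQL, following the linked article's pattern of remembering the last primary key value handled; the temp table name (corrections_tmp) and the column list are hypothetical, and the repetition itself would be driven from application code or a stored procedure.
SET @last_id := 0;

-- Repeat the three statements below until @next_id comes back NULL:
SELECT MAX(id) INTO @next_id
FROM (SELECT id FROM corrections_tmp
      WHERE id > @last_id ORDER BY id LIMIT 100) AS chunk;

INSERT INTO sometable (user_id, a, b /* ... remaining columns ... */)
SELECT user_id, a, b /* ... remaining columns ... */
FROM corrections_tmp
WHERE id > @last_id AND id <= @next_id;

SET @last_id := @next_id;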

Related

Deadlocks happened when massive inserts were executed simultaneously in mysql 5.0

I am using mysql 5.0 and I ran into some MySQL deadlock problems when there were lots of inserts from different sessions at the same time (we estimated there would be a maximum of about 900 insert statements executed per second).
Here is the error I got:
1213, Deadlock found when trying to get lock; try restarting transaction
Here is one of my failure insert statement:
INSERT `cj_202203qmoh_prize_log` (`user_id`, `lottery_id`, `create_ip`, `flags`, `created_at`, `create_mac`)VALUES('388','58','???.???.???.???','0','2022-04-01 20:00:33','444937f4bc5d5aa8f4af3d96d31dbf61');
My table definition:
CREATE TABLE `cj_202203qmoh_prize_log` (
`id` int(10) unsigned NOT NULL auto_increment,
`user_id` int(10) unsigned NOT NULL,
`lottery_id` int(10) unsigned default NULL,
`code` int(11) default NULL,
`flags` int(10) unsigned default '0',
`create_ip` varchar(64) NOT NULL,
`create_mac` varchar(255) character set ascii NOT NULL,
`created_at` timestamp NOT NULL default '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
PRIMARY KEY USING BTREE (`id`),
KEY `user_id` USING BTREE (`user_id`,`created_at`),
KEY `user_id_2` USING BTREE (`user_id`,`lottery_id`),
KEY `create_ip` USING BTREE (`create_ip`),
KEY `create_mac` USING BTREE (`create_mac`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
I didn't use transactions at all in my business logic. Apart from the insert statement, the only other SQL requiring an X-lock that could possibly execute at the same time (in fact it is extremely improbable that it executes simultaneously) is:
UPDATE `cj_202203qmoh_prize_log` SET `code` = ? WHERE `id` = ?;
There are several select statements using the index 'user_id' or 'user_id_2' that could be executed simultaneously, but they don't need an S-lock.
And the same user_id can only be inserted from within the same session.
According to my company's policy, I have no privileges to run SHOW ENGINE INNODB STATUS, so I am afraid I could not provide further information.
After I set the transaction isolation level to READ COMMITTED, executed the statement in a transaction, and dropped both the create_ip and create_mac indexes, the problem seems not to have happened again. But I still couldn't figure out what caused the deadlock.
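For reference, a rough sketch of those three changes spelled out in SQL (an illustration of what is described above, not a guaranteed fix):
-- 1. Drop the two secondary indexes that every insert had to maintain and lock.
ALTER TABLE `cj_202203qmoh_prize_log`
  DROP INDEX `create_ip`,
  DROP INDEX `create_mac`;

-- 2. Use READ COMMITTED for the session doing the inserts.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- 3. Wrap the insert in an explicit transaction.
START TRANSACTION;
INSERT INTO `cj_202203qmoh_prize_log`
  (`user_id`, `lottery_id`, `create_ip`, `flags`, `created_at`, `create_mac`)
VALUES ('388', '58', '???.???.???.???', '0', '2022-04-01 20:00:33',
        '444937f4bc5d5aa8f4af3d96d31dbf61');
COMMIT;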

mysql 5.6.21 update statement is very slow when log_bin is on

my update statement is
update ptest set amount = amount - 2000 where id = 2
table ptest is
CREATE TABLE `ptest` (
`id` bigint(19) NOT NULL AUTO_INCREMENT,
`developerId` bigint(19) DEFAULT NULL,
`appId` bigint(19) DEFAULT NULL,
`caller` varchar(20) DEFAULT NULL,
`callerDisplay` varchar(20) DEFAULT NULL,
`called` varchar(20) DEFAULT NULL,
`calledDisplay` varchar(20) DEFAULT NULL,
`startTime` datetime DEFAULT NULL,
`endTime` datetime DEFAULT NULL,
`callTime` int(11) DEFAULT NULL,
`callId` varchar(32) NOT NULL ,
`billingTime` int(11) DEFAULT NULL,
`callResult` varchar(10) DEFAULT NULL,
`amount` bigint(20) DEFAULT NULL ,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=200001 DEFAULT CHARSET=utf8;
When the system variable log_bin is set (log_bin=mysql_bin), the JMeter test result is 237.4 transactions/second.
When log_bin is commented out (#log_bin=mysql_bin), the JMeter test result is 3500.2 transactions/second.
With both settings the insert rate is similar, about 8000 transactions/second.
Why does log_bin have such a terrible performance impact on MySQL?
How can I improve update performance when log_bin is turned on?
Short answer is no. You can't improve the performance while binary logging is on. I think this is one of the biggest tradeoffs in MySQL. The long answer comes from an article by Baron Schwartz, the lead author of the MySQL performance blog:
Enabling the binary log reduces MySQL’s performance dramatically. It
is not the logging itself that’s the problem — writing the log is
usually not much additional work. It’s ensuring consistency and
durability that is expensive. Flushing it to disk adds an fsync call
for every transaction. And the server performs an XA transaction
between InnoDB and the binary log. This adds more fsync calls, and
causes mutex contention, and prevents group commit, and probably other
things that aren’t coming to mind now.
You can read the full article here.

MySQL LEFT OUTER JOIN performance

I am updating an existing web-based inventory system that pulls data from a MySQL database. The main structures for the stored data are "items" and "tags", with a one-to-many relationship (items can have multiple corresponding tags).
The existing front-end system for the data is a Backbone.js app that pulls the entire datastore on login and manipulates that data in memory, committing back to the database when necessary via a RESTful interface. (This is not how I would have designed the system, but it is now a common pattern in Backbone and Spine apps, and how almost all of the tutorials and books teach these frameworks.)
To serve the initial fetch performed by the front-end, in which it captures the entire dataset (about 1000 items and 10,000 item tags at this point), the back-end performs a SELECT query on the items table, and then a subsequent SELECT query on the tags table for each item fetched. Performance sucks, obviously. I thought this could be improved with a JOIN, figuring one select query is better than 1000. The following query fetches the data I need but takes over 15s to execute even on my local development server. What gives? Can we improve this system or query without setting up additional infrastructure like a caching key-value store?
SELECT items.*, itemtags.id as `tag_id`, itemtags.tag, itemtags.type
FROM items LEFT OUTER JOIN
itemtags
ON items.id = itemtags.item_id
ORDER BY items.id;
Here are the table structures:
CREATE TABLE `items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`num` int(11) NOT NULL,
`title` varchar(100) NOT NULL,
`length_inches` int(10) unsigned DEFAULT NULL,
`length_feet` int(10) unsigned DEFAULT NULL,
`width_inches` int(10) unsigned DEFAULT NULL,
`width_feet` int(10) unsigned DEFAULT NULL,
`height_inches` int(10) unsigned DEFAULT NULL,
`height_feet` int(10) unsigned DEFAULT NULL,
`depth_inches` int(10) unsigned DEFAULT NULL,
`depth_feet` int(10) unsigned DEFAULT NULL,
`retail_price` int(10) unsigned DEFAULT NULL,
`discount` int(10) unsigned DEFAULT NULL,
`decorator_price` int(10) unsigned DEFAULT NULL,
`new_price` int(10) unsigned DEFAULT NULL,
`sold` int(10) unsigned NOT NULL,
`push_date` int(10) unsigned DEFAULT NULL,
`updated` int(10) unsigned NOT NULL,
`created` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1747 DEFAULT CHARSET=latin1;
CREATE TABLE `itemtags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`item_id` int(10) unsigned NOT NULL,
`tag` varchar(100) NOT NULL,
`type` varchar(100) NOT NULL,
`created` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=61474 DEFAULT CHARSET=latin1;
I think you could use this:
SELECT *, a.id as `tag_id`, a.tag, a.type
FROM items LEFT OUTER JOIN
(SELECT id, item_id, tag, type from itemtags ORDER BY 1,2,3) a
ON items.id = a.item_id
ORDER BY items.id;
I didn't really change much, just the alias. a doesn't signify anything important.
I didn't fill the tables but your original query took 4ms, mine took 1ms.
http://sqlfiddle.com/#!2/b9551/6
Your application can pull the entire data store, regardless of what you have in your data set, since "data store" and "data set" are not synonymous.
You don't have any indexes either. You should put an index on id and item_id in order to optimize the table and return results quicker. I ordered the rows in my sub-query with the ORDER BY. Hope this helps.
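One reading of that suggestion, as a minimal sketch (the index name is arbitrary): itemtags.id is already the primary key, so the missing piece is an index on the join column item_id.
ALTER TABLE `itemtags` ADD INDEX `idx_item_id` (`item_id`);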
In terms of performance, you are probably not comparing like-to-like.
The SQL query is doing all of the following:
Joining the two tables together
Sorting the results by items.id
Returning all the results
Is the original version doing all three of these and waiting until they are completed?
My guess is that the original code is pulling the items back in the order you want them, and then only pulling the tags for a handful that are actually needed at any given time.
In addition, it is unclear how large the items.* data is. The way the query is formulated, you are pulling this about 10 times for each item -- potentially a much larger return set than the original data.
The real question is why you need all this information in the memory of the application. You have the database; just pull back what you need when you need it. Are you familiar with LIMIT and OFFSET? These may be what you are really looking for.
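A small illustrative example of that idea (page size and offset are arbitrary): fetch one page of items first, then join the tags only for that page.
SELECT items.*, itemtags.id AS `tag_id`, itemtags.tag, itemtags.type
FROM (SELECT * FROM items ORDER BY id LIMIT 50 OFFSET 0) AS items
LEFT OUTER JOIN itemtags ON items.id = itemtags.item_id
ORDER BY items.id;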

Mysql partition indexing

I want to create a table from batch data for data mining purposes. About 25 million rows of data a day will go into this table. There are several indices defined on the table, so the insertion (I do batch insertions) speed is quite slow: with no indices I can insert 40K rows, while with indices it is more like 3-4K, which makes the whole thing infeasible. So the idea is to partition the data by day, disable the keys, do the day's insertions, and then re-enable the indices. Re-enabling the indices on a day's worth of data takes, say, 20 minutes, which is fine.
This takes me to my question: when you re-enable the indices, will it have to recalculate the indices on all partitions, or just for that day? It seems clear that for the index the partitions are on (date in this case), it should be for that day only. But what about the other indices? If it needs to recalculate the indices for all partitions, there is no way it can be done in a reasonable amount of time. Does anyone know?
The SHOW CREATE TABLE output is like this:
CREATE TABLE `sts` (
`userid` int(10) unsigned DEFAULT NULL,
`urlid` int(10) unsigned DEFAULT NULL,
`geoid` mediumint(8) unsigned DEFAULT NULL,
`cid` mediumint(8) unsigned DEFAULT NULL,
`m` smallint(5) unsigned DEFAULT NULL,
`t` smallint(5) unsigned DEFAULT NULL,
`d` tinyint(3) unsigned DEFAULT NULL,
`requested` int(10) unsigned DEFAULT NULL,
`rate` tinyint(4) DEFAULT NULL,
`mode` varchar(12) DEFAULT NULL,
`session` smallint(5) unsigned DEFAULT NULL,
`sins` smallint(5) unsigned DEFAULT NULL,
`tos` mediumint(8) unsigned DEFAULT NULL,
PRIMARY KEY (userid, urlid, requested),
KEY `id_index` (`m`),
KEY `id_index2` (`t`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
It is not currently partitioned.
You disable/enable indexes on a table, which means the indexes will be disabled/enabled on all parts of the table.
Consider this scenario for loading new data (a sketch follows below):
Create a staging table, defining all partitions you will need.
Load the data into the staging table without indexes.
Create the indexes on this table.
Move the partition to the target table, which is partitioned the same way as the staging table.
Drop the indexes on the staging table.
To partition your existing data in a controllable manner, you can use the same logic to move the data to a new partitioned table.
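A rough sketch of that scenario on MySQL 5.6+, using ALTER TABLE ... EXCHANGE PARTITION for the "move" step. Note that EXCHANGE PARTITION swaps a partition with a non-partitioned table of identical definition and the same engine, so the staging table here is non-partitioned; all names below (sts_staging, p20130101) are hypothetical.
-- MyISAM: skip non-unique index maintenance while bulk loading.
ALTER TABLE `sts_staging` DISABLE KEYS;
-- ... bulk-load the day's ~25 million rows into sts_staging ...
ALTER TABLE `sts_staging` ENABLE KEYS;   -- rebuild indexes once, for this day only

-- Swap the loaded table in as that day's partition of the (daily-partitioned) target:
ALTER TABLE `sts` EXCHANGE PARTITION `p20130101` WITH TABLE `sts_staging`;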

Mysql Limit Performance

I have a large table in mysql, about 1 million records.
I'm using a dynamic query with different parameters in the WHERE clause and the ordering, so I can't use something like AND id > 34000 LIMIT 10.
I have indexes on the fields used in WHERE and ORDER BY, but an index alone doesn't help.
I need a better way than LIMIT 34000, 10. Is there any way to solve the offset delay?
I've included my table schema, but I've only copied the more useful fields, without any indexes, because I'm using dynamic queries.
CREATE TABLE IF NOT EXISTS `p_apartmentbuy` (
`property_id` mediumint(8) unsigned NOT NULL,
`dateadd` int(10) unsigned NOT NULL,
`sqm` smallint(5) unsigned NOT NULL,
`sqmland` smallint(5) unsigned NOT NULL,
`age` tinyint(2) unsigned NOT NULL,
`price` bigint(12) unsigned NOT NULL,
`pricemeter` int(11) unsigned NOT NULL,
`floortotal` tinyint(3) unsigned NOT NULL,
`floorno` tinyint(3) unsigned NOT NULL,
`unittotal` smallint(4) unsigned NOT NULL,
`unitthisfloor` tinyint(3) unsigned NOT NULL,
`room` tinyint(1) unsigned NOT NULL,
`parking` tinyint(1) unsigned NOT NULL,
`renovate` tinyint(1) unsigned NOT NULL,
`address` varchar(255) COLLATE utf8_general_ci NOT NULL,
`describe` varchar(500) COLLATE utf8_general_ci NOT NULL,
`featured` tinyint(1) unsigned NOT NULL,
`l_location_id` smallint(5) unsigned NOT NULL,
`l_city_id` smallint(4) unsigned NOT NULL,
`pf_furnished_id` tinyint(2) unsigned NOT NULL,
PRIMARY KEY (`property_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
With a table of 1 million records, the problem won't be the AND id > 34000 LIMIT 10 or the LIMIT 34000, 10 itself; it will come down to the structure and the rest of the query. That is, you need indexes, a PK, and FKs to speed up the query. Beyond that, an ORDER BY will probably slow it down, and a search like '%text%' will make your query SLOW. It also depends on the table's engine.
So don't expect that changing LIMIT 10 will make a huge difference. There are a couple of tools that will help you determine a 'better' query, but not all queries work the same, so don't expect the "best solution", because it doesn't exist.
You can use SHOW CREATE TABLE, DESCRIBE SELECT ..., or EXPLAIN to see what's going on, or use the BENCHMARK command to see the approximate time of a function you are applying, in order to improve it.
EDIT:
Some tools for MySQL
I recommend you take a look at these programs, which will help you with this part of performance:
Mysqlslap (it's like BENCHMARK but you can customize the result more).
SysBench (tests CPU performance, I/O performance, mutex contention, memory speed, database performance).
Mysqltuner (with this you can analyze general statistics, storage engine statistics, and performance metrics).
mk-query-profiler (performs analysis of a SQL statement).
mysqldumpslow (good for knowing which queries are causing problems).
MySQL is able to optimize LIMIT clauses (i.e. only scan / evaluate the rows in the range specified by LIMIT) if it is able to use only indexes to find rows matching the query.
For queries like SELECT * FROM users WHERE active = 1 ORDER BY created_at, adding an index on (active, created_at) is enough.
See http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/
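The linked article's "late row lookup" pattern, adapted to the hypothetical users example above: locate the 10 ids through the (active, created_at) index first, then fetch the full rows only for those ids.
SELECT u.*
FROM users AS u
JOIN (SELECT id
      FROM users
      WHERE active = 1
      ORDER BY created_at
      LIMIT 34000, 10) AS page ON page.id = u.id
ORDER BY u.created_at;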