Slow INSERT INTO SELECT statement – not repeatable – MySQL

Short Version:
An INSERT INTO ... SELECT into a table with a fulltext index sometimes takes 25 seconds and sometimes 2,500 seconds for the same data. We have no idea where this huge gap comes from.
Long Version:
I have a problem with a cronjob that imports new data into an import table and then copies it to the production tables via an INSERT INTO ... SELECT statement. I split the tables because of the time updates take with a MySQL fulltext index – it seems to be faster to fill the fulltext table with one INSERT INTO ... SELECT than with many single INSERT statements.
The cron that imports new data runs every 5 minutes. A function checks whether an instance of the cron is already running, so the script never runs in parallel. Usually there are about 500 new records per cron call.
At night, around 1–2 am, there is a lot more new data (about 5,000–15,000 new records) and the cron runs much longer than 5 minutes.
When the cron was running that long at night and I tracked the performance of these queries, I noticed that the INSERT INTO ... SELECT statement is very (!) slow. Copying about 15,000 new records (a file size of about 30 MB) takes more than 2,500 seconds!
The query is:
INSERT IGNORE INTO mentiondata
SELECT * FROM mentionimport
WHERE id <= 1203780;
I profiled the query, with the following result:
2012-10-31 06:52:06 Queryprofile: {
"starting":"0.000036",
"checking permissions":"0.000003",
"Opening tables":"0.000132",
"System lock":"0.000003",
"Table lock":"0.000007",
"init":"0.000041",
"optimizing":"0.000007",
"statistics":"0.000023",
"preparing":"0.000005",
"executing":"0.000002",
"Sending data":"999.999999",
"end":"0.000017",
"query end":"0.000005",
"freeing items":"1.458159",
"logging slow query":"0.000050",
"cleaning up":"0.000007"}
In the processlist the "Sending data" time was over 2,500 seconds – in the profile it shows just 999.999999. Maybe that is the profiler's display limit – whatever…
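For reference, a profile like the one above can be captured with MySQL's session profiler (MySQL 5.x), roughly as follows:
SET profiling = 1;
INSERT IGNORE INTO mentiondata
SELECT * FROM mentionimport
WHERE id <= 1203780;
SHOW PROFILES;              -- lists recent statements with their query ids
SHOW PROFILE FOR QUERY 1;   -- per-stage timings for the chosen statement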
The really strange thing is: when I try to reproduce the problem by deleting the records from the fulltext table (DELETE FROM mentiondata WHERE id >= 1203780;) and starting the copy process manually, it takes only about 25 seconds!!!
So I don't get it and I really need help! I don't understand why there is such a performance difference between runs of the same query. I checked the MySQL processlist while the cron copy statement was running – there are no other queries locking tables or anything like that. The copy query is the only entry in the processlist, showing "Sending data" for more than 2,500 seconds. There is no other cron or task running that could affect the server's performance. It seems that the MySQL server slows down every night, or that the query takes an extremely long time when the connection was opened a long time before the INSERT statement takes place (the same connection is used earlier to insert the data into the import tables).
Are there any status variables I can check to find out why MySQL is so slow? Is there any way to see why these queries take so long? Here are some server variables for info:
bulk_insert_buffer_size: 268435456
key_buffer_size: 536870912
query_cache_size: 536870912
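For what it's worth, the standard checks that can be run while the statement hangs in "Sending data" look like this (plain MySQL commands, nothing server-specific assumed):
SHOW FULL PROCESSLIST;                   -- what is running and in which state
SHOW GLOBAL STATUS LIKE 'Key%';          -- MyISAM key buffer usage (key_buffer_size is 512 MB here)
SHOW GLOBAL STATUS LIKE 'Created_tmp%';  -- temporary tables spilling to disk
SHOW GLOBAL STATUS LIKE 'Qcache%';       -- query cache activity (query_cache_size is 512 MB here)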
Thanks for any help!
Timo

Related

Speed up select distinct process from very large table

I want to use SELECT DISTINCT on a single column to extract data from a very large MyISAM table with ~300 million rows (~12.3 GiB in size – the SELECT DISTINCT should yield ~100k observations, so much smaller than 1 GiB).
The problem is that this query takes 10+ hours to run. I actually don't know how long it takes, because I've never let the process finish due to impatience.
My query is as follows:
create table codebook (
  symbol varchar(16) not null
);
create index IDXcodebook on codebook(symbol);

insert into codebook
select distinct symbol from bigboytable;
I've tried adding an index on bigboytable(symbol) to speed up the process, but that indexing statement has been running for 15+ hours with no end in sight.
I've also tried:
SELECT symbol FROM bigboytable GROUP BY symbol;
But I get
Error Code: 2013. Lost connection to MySQL server during query
In fact, if any query in this project or in other projects is "too complicated", I get Error Code 2013 after only ~1–6 hours, depending.
Other settings are:
Migration connection timeout: 3600 seconds; DBMS connection read timeout: skipped; DBMS connection keep-alive interval: 5 seconds; SSH BufferSize: 10240 bytes; SSH connect, read/write, and command timeouts: 500 seconds.
Any suggestions? I might work with Python's MySQL packages if that might speed things up; Workbench is very slow. I need this data ASAP for a large project, but don't need the 300+ million observations from bigboytable.
Edit: I have attached my bigboytable definition and EXPLAIN output here.
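One variant that might be worth testing is letting a unique key do the de-duplication instead of DISTINCT – a sketch using the same table names, not a tested solution:
create table codebook (
  symbol varchar(16) not null,
  primary key (symbol)
);
insert ignore into codebook
select symbol from bigboytable;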

MySQL keeps losing connection when trying to make a query

I have a table with the following contents in MySQL:
I am trying to query a DATETIME column called 'trade_time' with a WHERE clause as follows:
SELECT * FROM tick_data.AAPL
WHERE trade_time between '2021-01-01 09:30:00' and '2021-01-01 16:00:00';
What I'm getting is a 2013 error: lost connection to MySQL server after about 30 seconds.
I'm pretty new to SQL, so I might be doing something wrong here – surely such a simple query shouldn't take longer than 30 seconds?
The table has 298M rows, which is huge, but I was under the impression that MySQL should handle this kind of operation.
The table has just 3 columns – trade_time, price and volume – and I just want to query data by date and time in a reasonable time for further processing in Python.
Thanks for any advice.
EDIT: I've raised the timeout limit in MySQL Workbench to 5 minutes. The query described above took 291 seconds to run, just to get 1 day of data – is there some way I can speed up the performance?
298M rows is a lot to go through. I can definitely see that taking more than 30 seconds, but not much more. The first thing I would do is remove your default disconnection time limit. Personally I always set mine to around 300 seconds, or 5 minutes. If you're using MySQL Workbench, that can be done via this method: MySQL Workbench: How to keep the connection alive
Also, check whether the trade_time column has an index on it. Indexing the columns you query often is a good way to make queries faster.
SHOW INDEX FROM tablename;
Look to see if trade_time is in the list. If not, you can create an index like so:
CREATE INDEX dateTime ON tablename (trade_time);
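Once the index exists, EXPLAIN can confirm that the range scan actually uses it:
EXPLAIN SELECT * FROM tick_data.AAPL
WHERE trade_time BETWEEN '2021-01-01 09:30:00' AND '2021-01-01 16:00:00';
The key column of the output should show the new dateTime index, and the estimated row count should drop sharply.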

MySQL long time of insert queries

I have a big SQL dump with ~1.3 million rows.
I am trying to import it through the mysql console this way:
source mysql_dump.sql
It goes well at the start. It creates the new table and so on, but after some time the INSERT queries take longer and longer to process.
E.g. every ~1,700 records the console outputs the results and the time taken for that batch of queries. In the beginning ~1,700 records take ~0.3 seconds; after 5 minutes the same batch takes ~1 minute.
What can be done to make it process the queries as fast as in the beginning?
This is a bit long for a comment.
One possibility is indexes. You should drop all the indexes on the table before inserting records, then add them back after all the data is in the table. Maintaining indexes can slow down inserts.
Second, if you want to get all the data into a table, it is better to load it using LOAD DATA INFILE.
And when you do so many inserts, commit after every 1,000 records or so.
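A rough sketch combining those suggestions, assuming the table already exists and the data is available as a flat file (table and file names are placeholders):
SET autocommit = 0;
ALTER TABLE mytable DISABLE KEYS;    -- MyISAM: postpone non-unique index maintenance
LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
ALTER TABLE mytable ENABLE KEYS;     -- rebuild the indexes in one pass
COMMIT;
SET autocommit = 1;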

MySQL High Insertion Rate for Bulk Inserts (InnoDB)

I recently conducted a mysqlslap stress test to determine which type of insert is best from a performance standpoint. The files containing the queries were stored on a Linux system and read from there. I hope someone can throw some insight on this. Following are the benchmark results obtained through the mysqlslap test:
Time taken for Individual/Separate Insert: 5.24 seconds
Time taken for Batch Insert: 2.77 seconds
Time taken for Bulk Insert: 0.52 seconds
MySQL Version: 5.6.14 (InnoDB)
Total Records inserted on empty table: 5000
The records put together has size < 10kb
Innodb_buffer_pool_pages_flushed is disabled
Query cache turned off
Does the difference have anything to do with parsing, Created_tmp_tables, Innodb_buffer_pool_pages_flushed, or any global variables?
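For context, the three insert styles may have looked roughly like this (illustrative table t(a, b); the exact statements in the query files can differ):
-- separate inserts: one statement and one commit per row
INSERT INTO t (a, b) VALUES (1, 'x');
INSERT INTO t (a, b) VALUES (2, 'y');
-- batch: the same single-row statements wrapped in one transaction
START TRANSACTION;
INSERT INTO t (a, b) VALUES (1, 'x');
INSERT INTO t (a, b) VALUES (2, 'y');
COMMIT;
-- bulk: one statement with many VALUES tuples (one parse, one commit)
INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z');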

Doctrine query optimizations

I have created a query using the Doctrine query builder which inserts almost 65,000 rows (across all 3 tables) into 3 different tables at once when a certain action is performed. This complete process takes almost 2–3 minutes to execute.
What I have done is persist the records in a loop and then flush at the end.
So is there any way to minimize the execution time and insert the data within seconds?
No, unfortunately Doctrine doesn't support grouping inserts into a single statement. If you need to do bulk inserts, one possibility is doing an $em->flush() and $em->clear() after every 100th row or so; see the manual's recommendation:
https://doctrine-orm.readthedocs.org/en/latest/reference/batch-processing.html
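For context, "grouping inserts into a single statement" would mean a multi-row INSERT at the SQL level, something like the sketch below (illustrative table and columns); Doctrine instead emits one INSERT per persisted entity:
INSERT INTO my_table (col_a, col_b) VALUES
  (1, 'first'),
  (2, 'second'),
  (3, 'third');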