Many of us who are working on their home or pet projects and who use databases for storing structured data may encounter performance issues when trying to dump/restore data. It can annoying just to sit and wait for another dump restore operation for dozens of minutes or even for hours.
I have quite typical machine specs - 4 core i5 7300, 8 Gb RAM, quite fast M2 drive and Windows 10/MySQL 5.7.
The problem was that trying to restore ~4.5Gb file it took more than 4 hours. That was ridiculous and I wondered if mysqld process isn't using even a half of system resources - CPU/Memory/Disk I/O
Generally speaking, this post relates to some kind of summary of related issues including credits to many other posts which I put below
I performed a number of experiments with MySQL parameters for better dump restore operations
+--------------------------------+---------+---------+-----------------------+---------------------+
| Parameter | Default | Changed | Performance (minutes) | Perfomance gain (%) |
+--------------------------------+---------+---------+-----------------------+---------------------+
| All default | - | - | 259 min | - |
| innodb_buffer_pool_size | 8M | 4G | 32 min | +709% |
| innodb_buffer_pool_size | 4G | 6G | 32 min | ~0% |
| innodb_log_file_size | 48M | 1G | 11 min | +190% |
| innodb_log_file_size | 1G | 2G | 10 min | +10% |
| max_allowed_packet | 4M | 128M | 10 min | ~0% |
| innodb_flush_log_at_trx_commit | 1 | 0 | 9 min 25 sec | +5% |
| innodb_thread_concurrency | 9 | 0 | 9 min 27 sec | ~0% |
| innodb_double_write | - | off | 8 min 5 sec | +18% |
+--------------------------------+---------+---------+-----------------------+---------------------+
Summary (for best dump restore performance):
Set innodb_buffer_pool_size to half of RAM
Set innodb_log_file_size to 1G
Set innodb_flush_log_at_trx_commit to 0
Disabling innodb_double_write recommended only for fastest performance, it should be enabled on production. I also found, that changing another related parameter innodb_flush_method didn't change performance. But this can be an issue of Windows platform.
If you have complex structure with a lot of foreign keys for example, you can try Bulk Data Loading for InnoDB Tables tricks, link is listed at bottom of page
As you can see, I tried to increase CPU utilization by setting innodb_thread_concurrency to 0 (and also setting innodb_read_io_threads to maximum of 64) but results didn't change - it seems that mysqld process is already quite efficient for multi-core environment.
Restoring only data (without table structure) also didn't affect performance
I also changed a number of other parameters, but those above are most relevant ones for dump restore operation so far.
It may seem obvious, but novice question can be - where I can find and set those settings?
In Windows, my.ini file is located at ProgramData/MySQL/MySQL Server <version>/my.ini. You won't find some settings there (like innodb_double_write) - it's ok, just add to the end of the file.
The best way to change settings is to use MySQL Workbench (Server > Options file > InnoDB).
I pay my credits to following posts (and a lot of similar ones), which I found very useful:
https://www.percona.com/blog/2018/02/22/restore-mysql-logical-backup-maximum-speed/
https://www.percona.com/blog/2014/01/28/10-mysql-performance-tuning-settings-after-installation/
https://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-bulk-data-loading.html
https://dba.stackexchange.com/questions/86636/when-is-it-safe-to-disable-innodb-doublewrite-buffering
https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html
I have a database table, with 300,000 rows and 113.7 MB in size. I have my database running on Ubuntu 13.10 with 8 Cores and 8GB of RAM. As things are now, the MySQL server uses up an average of 750% CPU. and 6.5 %MEM (results obtained by running top in the CLI). Also to note, it runs on the same server as Apache2 Web Server.
Here's what I get on the Mem line:
Mem: 8141292k total, 6938244k used, 1203048k free, 211396k buffers
When I run: show processlist; I get something like this in return:
2098812 | admin | localhost | phpb | Query | 12 | Sending data | SELECT * FROM items WHERE thumb = 'Halloween 2013 Horns/thumbs/Halloween 2013 Horns (Original).png'
2098813 | admin | localhost | phpb | Query | 12 | Sending data | SELECT * FROM items WHERE thumb = 'Halloween 2013 Witch Hat/thumbs/Halloween 2013 Witch Hat (Origina
2098814 | admin | localhost | phpb | Query | 12 | Sending data | SELECT * FROM items WHERE thumb = 'Halloween 2013 Blouse/thumbs/Halloween 2013 Blouse (Original).png
2098818 | admin | localhost | phpb | Query | 11 | Sending data | SELECT * FROM items WHERE parent = 210162 OR auto = 210162
Some queries are taking an excess of 10 seconds to execute, this is not the top of the list, but somewhere in the middle just to give kind of a perspective of how many queries are stacking up in this list. I feel that it may have something to do with my Query Cash configurations. Here are the configurations show from running the SHOW STATUS LIKE 'Qc%';
+-------------------------+----------+
| Variable_name | Value |
+-------------------------+----------+
| Qcache_free_blocks | 434 |
| Qcache_free_memory | 2037880 |
| Qcache_hits | 62580686 |
| Qcache_inserts | 10865474 |
| Qcache_lowmem_prunes | 4157011 |
| Qcache_not_cached | 3140518 |
| Qcache_queries_in_cache | 1260 |
| Qcache_total_blocks | 4440 |
+-------------------------+----------+
I noticed that the Qcache_lowmem_prunes seem a bit high, is this normal?
I've been searching around StackOverflow, but I couldn't find anything that would solve my problem. Any help with this would be greatly appreciated, thank you!
This is probably one for http://dba.stackexchange.com. That said...
Why are your queries running slow? Do they return a large result set, or are they just really complex?
Have you tried running one of these queries using EXPLAIN SELECT column FROM ...?
Are you using indexes correctly?
How have you configured MySQL in your my.cnf file?
What table types are you using?
Are you getting any errors?
Edit: Okay, looking at your query examples. What data type is items.thumb? Varchar, Text? Is it not at all possible to query this table using another method than literal text matching? (e.g. ID number). Does this column have an index?
First, some background, I’m an SSIS newbie and I’ve just completed my second data-import project.
The package is very simple and consists of a dataflow that imports a tab-separated customer values file of ~30,000 records into an ADO recordset variable which in turn is used to power a ForEach Loop Container that executes a piece of SQL passing in values from each row of the recordset.
The import of the first ~21,000 records took 59 hours to accomplish, prior to it failing! The last ~9,000 took a further 8 hours. Yes, 67 hours in total!
The SQL consists of a check to determine if the record already exists, a call to a procedure to generate a new password, and a final call to another procedure to insert the customer data into our system. The final procedure returns a recordset, but I’m disintersted in the result and so I have just ignored it. I don’t know whether SSIS discards the recordset or not. I am aware that this is the slowest possible way of getting the data into the system, but I did not expect it to be this slow, nor to fail two thirds of the way through, and again whilst processing the last ~9,000.
When I tested the a ~3,000 record subset on my local machine the Execute Package Utility reported that each insert was taking approximately 1 second. A bit of quick math and the suggestion was that the total import would take around 8 hours to run. Seemed like a long time, which I had expected given all that I had read about SSIS and RBAR execution. I figured that the final import would be a bit quicker as the server is considerably more powerful. Although I am accessing the server remotely, but I wouldn’t have expected this to be an issue, as I have performed imports in the past, using bespoke c# console applications that use simple ADO connections and have had nothing run anywhere near as slowly.
Initially the destination table wasn’t optimised for the existence check, and I thought this could be the cause of the slow performance. I added an appropriate index to the table to change the test from a scan to a seek, expecting that this would get rid of the performance issue. Bizarrely it seemed to have no visible effect!
The reason we use the sproc to insert the data into our system is for consistency. It represents the same route that the data takes if it is inserted into our system via our web front-end. The insertion of the data also causes a number of triggers to fire and update various other entities in the database.
What’s been occurring during this import though, and has me scratching my head, is that the execution time for the SQL batch, as reported by the output of the Execute Package Utility has been logarithmically increasing during the run. What starts out as a sub-one second execution time, ends up over the course of the import at greater than 20 seconds, and eventually the import package just simply ground to a complete halt.
I've searched all over the web multiple times, thanks Google, as well as StackOverflow, and haven’t found anything that describes these symptoms.
Hopefully someone out there has some clues.
Thanks
In response to ErikE: (I couldn’t fit this into a comment, so I've added it here.)
Erik. as per your request I ran the profiler over the database whilst running the three thousand item test file through it’s paces.
I wasn’t able to easily figure out how to get SSIS to insert a visible difference into the code that would be visible to the profiler, so I just ran the profiler for the whole run. I know there will be some overhead associated with this, but, theoretically, it should be more or less consistent over the run.
The duration on a per item basis remains pretty constant over the whole run.
Below is cropped output from the trace. In the run that I've done here the first 800 overlapped previously entered data, so the system was effectively doing no work (Yay indexes!). As soon as the index stopped being useful and the system was actually inserting new data, you can see the times jump accordingly, but they don’t seem to change much, if at all between the first and last elements, with the number of reads being the largest item.
------------------------------------------
| Item | CPU | Reads | Writes | Duration |
------------------------------------------
| 0001 | 0 | 29 | 0 | 0 |
| 0002 | 0 | 32 | 0 | 0 |
| 0003 | 0 | 27 | 0 | 0 |
|… |
| 0799 | 0 | 32 | 0 | 0 |
| 0800 | 78 | 4073 | 40 | 124 |
| 0801 | 32 | 2122 | 4 | 54 |
| 0802 | 46 | 2128 | 8 | 174 |
| 0803 | 46 | 2128 | 8 | 174 |
| 0804 | 47 | 2131 | 15 | 242 |
|… |
| 1400 | 16 | 2156 | 1 | 54 |
| 1401 | 16 | 2167 | 3 | 72 |
| 1402 | 16 | 2153 | 4 | 84 |
|… |
| 2997 | 31 | 2193 | 2 | 72 |
| 2998 | 31 | 2195 | 2 | 48 |
| 2999 | 31 | 2184 | 2 | 35 |
| 3000 | 31 | 2180 | 2 | 53 |
------------------------------------------
Overnight I've also put the system through a full re-run of the import with the profiler switched on to see how things feared. It managed to get through 1 third of the import in 15.5 hours on my local machine. I exported the trace data to a SQL table so that I could get some statistics from it. Looking at the data in the trace, the delta between inserts increases by ~1 second per thousand records processed, so by the time it’s reached record 10,000 it’s taking 10 seconds per record to perform the insert. The actual code being executed for each record is below. Don’t bother critiquing the procedure, the SQL was written by the self-taught developer who was originally our receptionist long before anyone with actual developer education was employed by the company. We are well aware that it’s not good. The main thing is that I believe it should execute at a constant rate, and it very obviously doesn’t.
if not exists
(
select 1
from [dbo].[tblSubscriber]
where strSubscriberEmail = #EmailAddress
and ProductId = #ProductId
and strTrialSource = #Source
)
begin
declare #ThePassword varchar(20)
select #ThePassword = [dbo].[DefaultPassword]()
exec [dbo].[MemberLookupTransitionCDS5]
#ProductId
,#EmailAddress
,#ThePassword
,NULL --IP Address
,NULL --BrowserName
,NULL --BrowserVersion
,2 --blnUpdate
,#FirstName --strFirstName
,#Surname --strLastName
,#Source --strTrialSource
,#Comments --strTrialComments
,#Phone --strSubscriberPhone
,#TrialType --intTrialType
,NULL --Redundant MonitorGroupID
,NULL --strTrialFirstPage
,NULL --strTrialRefererUrl
,30 --intTrialSubscriptionDaysLength
,0 --SourceCategoryId
end
GO
Results of determining the difference in time between each execution (cropped for brevity).
----------------------
| Row | Delta (ms) |
----------------------
| 500 | 510 |
| 1000 | 976 |
| 1500 | 1436 |
| 2000 | 1916 |
| 2500 | 2336 |
| 3000 | 2816 |
| 3500 | 3263 |
| 4000 | 3726 |
| 4500 | 4163 |
| 5000 | 4633 |
| 5500 | 5223 |
| 6000 | 5563 |
| 6500 | 6053 |
| 7000 | 6510 |
| 7500 | 6926 |
| 8000 | 7393 |
| 8500 | 7846 |
| 9000 | 8503 |
| 9500 | 8820 |
| 10000 | 9296 |
| 10500 | 9750 |
----------------------
Let's take some steps:
Advice: Isolate if it is a server issue or a client one. Run a trace and see how long the first insert takes compared to the 3000th. Include in the SQL statements some difference on the 1st and 3000th iteration that can be filtered for in the trace so it is not capturing the other events. Try to avoid statement completion--use batch or RPC completion.
Response: The recorded CPU, reads, and duration from your profiler trace are not increasing, but the actual elapsed/effective insert time is.
Advice: Assuming that the above pattern holds true through the 10,000th insert (please advise if different), my best guess is that some blocking is occurring, maybe something like a constraint validation that is doing a nested loop join, which would scale logarithmically with the number of rows in the table just as you are seeing. Would you please do the following:
Provide the full execution plan of the INSERT statement using SET SHOWPLAN_TEXT ON.
Run a trace on the Blocked Process Report event and report on anything interesting.
Read Eliminating Deadlocks Caused by Foreign Keys with Large Transactions and let me know if this might be the cause or if I am barking up the wrong tree.
If none of this makes progress on the problem, simply update your question with any new information and comment here, and I'll continue to do my best to help.
I'm the owner of a web-game with a high-traffic database with a lot of queries.
I'm now optimizing the mySQL and tools like mysqltuner keep saying that I have insufficient query cache memory.
+-------------------------+-----------+
| Variable_name | Value |
+-------------------------+-----------+
| Qcache_free_blocks | 218 |
| Qcache_free_memory | 109155712 |
| Qcache_hits | 60955602 |
| Qcache_inserts | 38923475 |
| Qcache_lowmem_prunes | 100051 |
| Qcache_not_cached | 10858099 |
| Qcache_queries_in_cache | 19384 |
| Qcache_total_blocks | 39134 |
+-------------------------+-----------+
That's from about 1,5 day of running the server.
Ideally the lowmem_prunes would be 0 of course, but there are very much queries which are user-based. Like,
SELECT username FROM users WHERE id='1';
SELECT username FROM users WHERE id='12';
SELECT username FROM users WHERE id='12453';
SELECT username FROM users WHERE id='122348';
As I have about 3000 different users a day logging in you will have many queries like this in the memory.
I know about the ON DEMAND rule, but I'm avoiding it because we have a lot of queries which I don't want to check all over.
Increasing the memory wouldn't be a fix either since we're having so many queries.
Then I came up with the idea to make a cronjob to automatically RESET the query cache when the lowmem_prunes is zero. (which will probably be once an hour)
What's the best option?
1. Automatically reset the query cache once an hour
2. Buy more RAM and increase the cache with 1GB for example (which has it's downsides... I rather not do that)
3. Specify per query which should be kept in cache in which not.
4. Just keep it like this.
I have a MySQL query that is copying data from one table to another for processing. For some reason, this query that normally takes a few seconds locked up overnight and ran for several hours. When I logged in this morning, I tried to kill the query, but it is still listed in the process list.
| Id | User | Host | db | Command | Time | State | Info |
+---------+----------+-----------+------+---------+-------+--------------+--------------------------------------------------------------------------------------+
| 1061763 | tb_admin | localhost | dw | Killed | 45299 | Sending data | INSERT INTO email_data_inno_stage SELECT * FROM email_data_test LIMIT 4480000, 10000 |
| 1062614 | tb_admin | localhost | dw | Killed | 863 | Sending data | INSERT INTO email_data_inno_stage SELECT * FROM email_data_test LIMIT 4480000, 10000 |
What could have caused this, and how can I kill this process so I can get on with my work?
If the table email_data_test is MyISAM and it was locked, that would have held up the the INSERT.
If the table email_data_test is InnoDB, then a lot of MVCC data was being written in ib_logfiles, which may not have occurred yet.
In both cases, you had the LIMIT clause scroll through 4,480,000 rows just to get to 10,000 rows you actually needed to INSERT.
Killing the query only causes the InnoDB table email_data_inno_stage to execute a rollback.