I have the following UPDATE statement:
cursor.execute('''UPDATE alg SET size=%(size)s, path=%(path)s,
last_modified=%(last_modified)s, search_terms=%(search_terms)s
WHERE objectID=%(objectID)s''', data)
The table is indexed and the statement usually takes about 1/1000th of a second. However, once every 200 updates or so, it takes a very long time. Here is an example of the timings (loop counter, then elapsed seconds):
453  0.000407934188843
454  0.29783987999  <-- this one
455  0.000342130661011
456  0.000318765640259
457  0.000240087509155
The objectID column is indexed and the table uses InnoDB. Any idea what the issue might be? It also seems random: if I run it over and over, different objects are the slow ones -- it is not tied to any particular object.
Also note that I do not have this issue on MyISAM.
You likely have the default innodb_log_file_size, which is 5MB. Set it in your my.cnf to 25% of your innodb_buffer_pool_size, with a minimum of 128M. You'll want the buffer pool to be as large as possible for your system; if it's a dedicated MySQL server, 70-80% of system RAM would not be unreasonable (leaving some for the OS page cache).
A larger log file spaces out the points at which changes must be flushed to the tables. Setting it too large will increase crash-recovery time on restart.
Also be sure to set innodb_flush_method=O_DIRECT to avoid OS-level page caching.
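As a sketch only (assuming a dedicated server with 8GB of RAM; the sizes below are illustrative, not a recipe), the relevant my.cnf entries would look like:
# illustrative sizing for a dedicated 8GB MySQL server; adjust to your hardware
innodb_buffer_pool_size=6G
innodb_log_file_size=1536M
innodb_flush_method=O_DIRECT
Note that on older MySQL versions changing innodb_log_file_size requires a clean shutdown and removal of the old ib_logfile* files before restarting; recent versions handle the resize automatically on restart.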
Related
I have a weird problem with my Aurora cluster.
The cluster has two instances (a writer and a reader), each with 4 cores and 32GB of RAM.
Currently it is pushing about 2000 INSERTs/second with a 0.5KB row size into a table that is about 100GB, with another 100GB of secondary indexes; the PK is a bigint auto-increment.
The current row count is over 1 billion.
As the table grows I notice a significant amount of InnoDB buffer pool IO, and I don't understand why it happens.
I understand that, given the table and index sizes, not everything can fit in memory, but the amount of disk IO still seems higher than it logically should be.
Is there any reason for reads into the buffer pool from disk when all I am doing is inserting into that table?
I performed a table swap (renamed the big table to _old and recreated it); the problem went away, but after a few days it returned.
Also, I am currently using Go to do the inserts, and for some reason the driver always wraps each one in "START TRANSACTION" and "COMMIT" even though it is a single INSERT statement; I am not sure whether this matters.
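One way to see whether disk reads really are being triggered by the inserts is to watch the buffer pool miss counters while the load is running; this is only a diagnostic sketch, not a fix:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- logical reads, satisfied from memory
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that had to go to disk
If Innodb_buffer_pool_reads climbs steadily under a pure-insert workload, the pages being fetched are most likely secondary-index leaf pages that no longer fit in memory, since every insert must touch the right page in each index.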
I'm bulk-loading a 15GB CSV (30 million rows) into a MySQL 8 database.
Problem: the task takes about 20 minutes, with an approximate throughput of 15-20 MB/s, while the hard drive is capable of transferring files at 150 MB/s.
I have a 20GB RAM disk which holds my CSV. I import it as follows:
mysqlimport --user="root" --password="pass" --local --use-threads=8 mytable /tmp/mydata.csv
This uses LOAD DATA under the hood.
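For reference, mysqlimport treats its first non-option argument as the database name and derives the table name from the file name, so the call above amounts to roughly this statement (delimiter options omitted):
LOAD DATA LOCAL INFILE '/tmp/mydata.csv' INTO TABLE mytable.mydata;
Also note that --use-threads only parallelises across multiple input files; with a single CSV it has no effect.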
My target table does not have any indexes, but approx 100 columns (I cannot change this).
What is strange: I tried tweaking several config parameters as follows in /etc/mysql/my.cnf, but they did not give any significant improvement:
log_bin=OFF
skip-log-bin
innodb_buffer_pool_size=20G
tmp_table_size=20G
max_heap_table_size=20G
innodb_log_buffer_size=4M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_autoinc_lock_mode=2
Question: does LOAD DATA / mysqlimport respect those config changes, or does it bypass them? Did I even use the correct configuration file?
At least querying the variables shows they are correctly loaded by the MySQL server; for example, SHOW VARIABLES LIKE 'innodb_doublewrite' shows OFF.
Anyway, how could I improve import speed further? Or is the database the bottleneck, with no way to get past the 15-20 MB/s ceiling?
Update:
Interestingly, if I import the CSV from the hard drive instead of from the RAM disk, performance is almost the same (just a little better, but never over 25 MB/s). I also tested the same number of rows with only a few (5) columns, and there I get about 80 MB/s. So clearly the number of columns is the bottleneck? But why do more columns slow the process down?
The MySQL/MariaDB engine has little parallelization when making bulk inserts: it can only use one CPU core per LOAD DATA statement. If you monitor CPU utilization during the load, you will probably see one core fully utilized; it can produce only so much output, leaving disk throughput underutilized.
The most recent version of MySQL has a new parallel load feature: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-parallel-table.html. It looks promising but probably hasn't received much feedback yet; I'm not sure it would help in your case.
I saw various checklists on the internet recommending higher values for the following config parameters: log_buffer_size, log_file_size, write_io_threads, bulk_insert_buffer_size. But the benefits were not very pronounced in my comparison tests (maybe 10-20% faster than just having innodb_buffer_pool_size large enough).
This could be normal. Let's walk through what is being done:
The csv file is being read from a RAM disk, so no IOPs are being used.
Are you using InnoDB? If so, the data is going into the buffer_pool. As blocks are being built there, they are being marked 'dirty' for eventual flushing to disk.
Since the buffer_pool is large, but probably not as large as the table will become, some of the blocks will need to be flushed before it finishes reading all the data.
After all the data is read, and the table is finished, the dirty blocks will gradually be flushed to disk.
If you had non-unique secondary indexes, they would similarly be written to disk in a delayed manner (cf. 'change buffering'). The change buffer, by default, occupies 25% of the buffer_pool.
How large is the resulting table? It may be significantly larger, or even smaller, than the 15GB of the csv file.
How much time did it take to bring the csv file into the ram disk? I proffer that that was wasted time and it should have been read from disk while doing the LOAD DATA; that I/O can be overlapped.
Please provide SHOW GLOBAL VARIABLES LIKE 'innodb%'; there are several other settings that may be relevant.
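If you want to watch the delayed flushing described above while the load is running, these counters give a rough picture (purely an observation aid, and approximate if anything else is running):
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';    -- blocks waiting to be flushed
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';  -- cumulative flushes so far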
More
These are terrible:
tmp_table_size=20G
max_heap_table_size=20G
If you have a complex query, 20GB could be allocated in RAM, possibly multiple times! Keep those to under 1% of RAM.
If copying the csv from hard disk to ram disk runs slowly, I would suspect the validity of 150 MB/s.
If you are loading the table once every 6 hours, and it takes 1/3 of an hour to perform, I don't see the urgency of making it faster. OTOH, there may be something worth looking into. If that 20 minutes is downtime due to the table being locked, that can be easily eliminated:
CREATE TABLE t LIKE real_table;
LOAD DATA INFILE '...' INTO TABLE t;    -- not blocking anyone
RENAME TABLE real_table TO old, t TO real_table; -- atomic; fast
DROP TABLE old;
I'm perf tuning a large query, and want to run it from the same baseline before and after, for comparison.
I know about the MySQL query cache, but it's not relevant to me, since the two queries would not be cached anyway.
What is being cached, is the innodb pages, in the buffer pool.
Is there a way to clear the entire buffer pool so I can compare the two queries from the same starting point?
Whilst restarting the MySQL server after running each query would no doubt work, I'd like to avoid this if possible.
WARNING: The following only works for MySQL 5.5 and MySQL 5.1.41+ (InnoDB Plugin).
Tweak the duration of entries in the InnoDB Buffer Pool with these settings:
SET GLOBAL innodb_old_blocks_time=250;  -- 0.25 seconds (the value is in milliseconds)
SET GLOBAL innodb_old_blocks_pct=5;
SET GLOBAL innodb_max_dirty_pages_pct=0;
When you are done testing, set them back to the defaults:
SET GLOBAL innodb_old_blocks_time=0;
SET GLOBAL innodb_old_blocks_pct=37;
SET GLOBAL innodb_max_dirty_pages_pct=90;  -- 75 for MySQL 5.5 / MySQL 5.1 InnoDB Plugin
Check out the definitions of these settings in the MySQL 5.5 and MySQL 5.1.41+ documentation: innodb_old_blocks_time, innodb_old_blocks_pct, innodb_max_dirty_pages_pct.
Much simpler... Run this twice
SELECT SQL_NO_CACHE ...;
And look at the second timing.
The first run warms up the buffer_pool; the second avoids the query cache by having SQL_NO_CACHE. (In MySQL 8.0, leave off SQL_NO_CACHE; the query cache is gone.)
So the second timing is a good indication of how long it takes in a production system with a warm cache.
Further, look at the Handler counts:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
gives a reasonably clear picture of how many rows are touched. That, in turn, gives you a good feel for how much effort the query takes. Note that this can be run quite successfully (and quickly) on small datasets. Then you can (often) extrapolate to larger datasets.
A "Handler_read" might be reading an index row or a data row. It might be the 'next' row (hence probably cached in the block that was read for the previous row), or it might be random (hence possibly subject to another disk hit). That is, the technique fails to help much with "how many blocks are needed".
This Handler technique is impervious to what else is going on; it gives consistent results.
"Handler_write" indicates that a tmp table was needed.
Numbers that approximate the number of rows in the table (or a multiple of it) probably indicate table scans. A number equal to the LIMIT might mean that you built a good enough index that the query could stop after reading only LIMIT rows.
If you do flush the buffer_pool, you could watch for changes in Innodb_buffer_pool_reads to give a precise(?) count of the number of pages read in a cold system. This would include non-leaf index pages, which are almost always cached. If anything else is going on in the system, this STATUS value should not be trusted because it is 'global', not 'session'.
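For example, a before/after check of that counter around the query under test might look like this (a sketch; any concurrent activity will skew the numbers):
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';   -- note the value
SELECT ...;                                           -- the query under test
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';   -- the difference is the number of pages read from disk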
We're running a moderate size (350GB) database with some fairly large tables (a few hundred million rows, 50GB) on a reasonably large server (2 x quad-core Xeons, 24GB RAM, 2.5" 10k disks in RAID10), and are getting some pretty slow inserts (e.g. simple insert of a single row taking 90 seconds!).
Our innodb_buffer_pool_size is set to 400MB, which would normally be way too low for this kind of setup. However, our hosting provider advises that this is irrelevant when running on ZFS. Is he right?
(Apologies for the double post on https://dba.stackexchange.com/questions/1975/is-tuning-the-innodb-buffer-pool-size-important-on-solaris-zfs, but I'm not sure how big the audience is over there!)
Your hosting provider is incorrect. There are various things you should tune differently when running MySQL on ZFS, but reducing the innodb_buffer_pool_size is not one of them. I wrote an article on the subject of running MySQL on ZFS and gave a lecture on it a while back. Specifically regarding innodb_buffer_pool_size, what you should do is set it to whatever would be reasonable on any other file system, and because O_DIRECT doesn't mean "don't cache" on ZFS, you should set primarycache=metadata on your ZFS file system containing your datadir. There are other optimisations to be made, which you can find in the article and the lecture slides.
I would still set the innodb_buffer_pool_size much higher than 400M. The reason? The InnoDB buffer pool will still cache the data and index pages you need for frequently accessed tables.
Run this query to get the recommended innodb_buffer_pool_size in MB:
SELECT CONCAT(ROUND(KBS/POWER(1024,IF(pw<0,0,IF(pw>3,0,pw)))+0.49999),
       SUBSTR(' KMG',IF(pw<0,0,IF(pw>3,0,pw))+1,1)) recommended_innodb_buffer_pool_size
FROM (SELECT SUM(data_length+index_length) KBS
      FROM information_schema.tables
      WHERE engine='InnoDB') A,
     (SELECT 2 pw) B;
Simply use either the result of this query or 80% of installed RAM (in your case 19660M), whichever is smaller.
I would also set innodb_log_file_size to 25% of the InnoDB buffer pool size. Unfortunately, the maximum value of innodb_log_file_size is 2047M (1M short of 2G). Thus, set innodb_log_file_size to 2047M, since 25% of my recommended innodb_buffer_pool_size would be 4915M.
Yet another recommendation is to disable full ACID compliance. Use either 0 or 2 for innodb_flush_log_at_trx_commit (the default is 1, which supports ACID compliance). This produces faster InnoDB writes AT THE RISK of losing up to 1 second's worth of transactions in the event of a crash.
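innodb_flush_log_at_trx_commit is dynamic, so as a quick sketch you can relax it at runtime and make it permanent in my.cnf later; with a value of 2 the log is flushed to disk roughly once per second instead of at every commit:
SET GLOBAL innodb_flush_log_at_trx_commit = 2;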
It may be worth reading slow-mysql-inserts if you haven't already, and also this link to the MySQL docs on the matter - especially with regard to using a transaction if you are doing multiple inserts into a large table.
More relevant is this MySQL article on the performance of InnoDB on ZFS, which specifically considers the buffer pool size.
The headline conclusion is;
With InnoDB, the ZFS performance curve suggests a new strategy of "set the buffer pool size low, and let ZFS handle the data buffering."
You may wish to add some more detail such as the number / complexity of the indexes on the table - this can obviously make a big difference.
Apologies for this being rather generic advice rather than from personal experience; I haven't run ZFS in anger, but I hope some of those links are of use.
I'm running:
MySQL v5.0.67
InnoDB engine
innodb_buffer_pool_size = 70MB
Question: What command can I run to ensure that my entire 50 MB database is stored entirely in RAM?
I am curious about why you want to store the entire table in memory; my guess is that you are not doing so now. The most important things are whether your queries are running well and whether you are tied up on disk access. It is also possible that the OS has cached the disk blocks you need, if memory is available; in that case, even though MySQL might not have them in memory, the OS will. If your queries are not running well and you can do it, I highly recommend adding more memory if you want it all in RAM. If you have slowdowns, it is more likely that you are running into contention.
SHOW TABLE STATUS
will show you some of the information.
If you get the server IO/buffer/cache statistics from
SHOW GLOBAL STATUS
and then run a query that requires every row to be accessed (say, summing the non-empty values of a column that is not indexed), you can check whether any IO occurred.
I doubt you are caching the entire thing in memory though with only 70MB. You have to take out a lot of cache, temp, and index buffers from that total.
If you run SELECT COUNT(*) FROM yourtable USE INDEX (PRIMARY), InnoDB will put every page of the PRIMARY index into the buffer pool (assuming there is enough room for it). If the table has secondary indexes and you want to load them into the buffer pool too, craft a similar query that reads from a secondary index.
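For a secondary index, a warm-up query along these lines should work (the index name idx_name is a placeholder for whichever index your table actually has):
SELECT COUNT(*) FROM yourtable FORCE INDEX (idx_name);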