MySQL InnoDB vs MongoDB write performance

I need to increment a counter each time a web page is rendered.
When I use MongoDB for this, I can do about 16,000 writes per second on a 4-core/8-thread CPU with a regular disk.
When I use a MySQL InnoDB table, I can do only... 30 writes per second on a regular disk, or 200 writes per second on an SSD !!
Because I have only one write per transaction (basically, there is no other write to do after incrementing my counter for the same HTTP request), setting autocommit to False and committing manually does not help.
The difference is that MongoDB flushes writes lazily.
I tried to make MySQL buffer writes before flushing them to disk by setting these parameters in my.cnf, but it did not help:
innodb_buffer_pool_size = 1G
innodb_flush_method=O_DIRECT
innodb_log_file_size=100M
innodb_change_buffering=all
innodb_thread_concurrency=8
Is there a way to get faster MySQL writes?

If all you are doing is "increment a counter each time a web page is rendered" then I suggest you ditch the database for this altogether. Keep the counter purely in memory (for example via memcached), and use a cronjob to dump it to disk every 10 mins to keep a more permanent record.
If you're recording more than just a counter and want to use a database, consider using the MySQL MEMORY storage engine (https://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html).
CREATE TABLE t (i INT) ENGINE = MEMORY;
The table will be kept in memory, so it will be much faster than a disk-based table. You'll just need a script to do a manual 'flush' to disk (e.g. mysqldump) from time to time if you need permanence.
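As a rough sketch of that periodic flush, assuming a MEMORY table named hits and an InnoDB archive table (both names are made up for illustration), a cron-driven SQL script could copy the counts to durable storage and then reset them:
-- hits (MEMORY) collects the live counts; hits_archive (InnoDB) keeps the permanent record
CREATE TABLE IF NOT EXISTS hits_archive (
  page VARCHAR(255) NOT NULL,
  views INT NOT NULL,
  dumped_at DATETIME NOT NULL
) ENGINE=InnoDB;
-- run from cron every 10 minutes, e.g. mysql mydb < flush_hits.sql
INSERT INTO hits_archive (page, views, dumped_at)
  SELECT page, views, NOW() FROM hits;
TRUNCATE TABLE hits;  -- counts arriving between the SELECT and the TRUNCATE are lost, usually acceptable for rough stats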

innodb_flush_log_at_trx_commit = 2
is the winner: 150x faster!!! on a standard disk (I will try on an SSD later)
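For reference, this variable is dynamic, so it can be tried at runtime before making it permanent in my.cnf. Keep in mind that with a value of 2 the log is flushed to disk only about once per second, so an OS crash or power loss can lose roughly the last second of transactions:
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';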

Related

Improve performance of MySQL LOAD DATA / mysqlimport?

I'm batch-loading a 15 GB CSV (30 million rows) into a MySQL 8 database.
Problem: the task takes about 20 minutes, with an approximate throughput of 15-20 MB/s, while the hard drive is capable of transferring files at 150 MB/s.
I have a 20 GB RAM disk, which holds my CSV. I import as follows:
mysqlimport --user="root" --password="pass" --local --use-threads=8 mytable /tmp/mydata.csv
This uses LOAD DATA under the hood.
My target table does not have any indexes, but it does have approximately 100 columns (I cannot change this).
What is strange: I tried tweaking several config parameters as follows in /etc/mysql/my.cnf, but they did not give any significant improvement:
log_bin=OFF
skip-log-bin
innodb_buffer_pool_size=20G
tmp_table_size=20G
max_heap_table_size=20G
innodb_log_buffer_size=4M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_autoinc_lock_mode=2
Question: does LOAD DATA / mysqlimport respect those config changes? Or does it bypass them? Or did I even use the correct configuration file?
At least a SELECT on the variables shows they were correctly loaded by the MySQL server. For example, show variables like 'innodb_doublewrite' shows OFF.
Anyway, how could I improve the import speed further? Or is my database the bottleneck and there is no way to overcome the 15-20 MB/s threshold?
Update:
Interestingly, if I import my CSV from the hard drive into the RAM disk, the performance is almost the same (just a little bit better, but never over 25 MB/s). I also tested with the same number of rows but only a few (5) columns, and there I get about 80 MB/s. So clearly the number of columns is the bottleneck? But why do more columns slow down this process?
The MySQL/MariaDB engine has little parallelization when making bulk inserts: it can only use one CPU core per LOAD DATA statement. If you monitor CPU utilization during the load, you will probably see one core fully utilized; it can only produce so much output data, thus leaving disk throughput underutilized.
The most recent version of MySQL has a new parallel load feature: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-parallel-table.html . It looks promising but probably hasn't received much feedback yet. I'm not sure it would help in your case.
I saw various checklists on the internet that recommended higher values for the following config parameters: log_buffer_size, log_file_size, write_io_threads, bulk_insert_buffer_size. But the benefits were not very pronounced when I ran comparison tests (maybe 10-20% faster than just making innodb_buffer_pool_size large enough).
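If you can split the CSV beforehand, one workaround (my own sketch, not part of the parallel-load utility) is to run one LOAD DATA per chunk from separate client sessions, so each statement gets its own core; the file names are hypothetical and the field terminator must match your file:
-- session 1
LOAD DATA INFILE '/tmp/mydata_part1.csv' INTO TABLE mytable FIELDS TERMINATED BY ',';
-- session 2, run concurrently in another connection
LOAD DATA INFILE '/tmp/mydata_part2.csv' INTO TABLE mytable FIELDS TERMINATED BY ',';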
This could be normal. Let's walk through what is being done:
The csv file is being read from a RAM disk, so no IOPs are being used.
Are you using InnoDB? If so, the data is going into the buffer_pool. As blocks are being built there, they are being marked 'dirty' for eventual flushing to disk.
Since the buffer_pool is large, but probably not as large as the table will become, some of the blocks will need to be flushed before it finishes reading all the data.
After all the data is read, and the table is finished, the dirty blocks will gradually be flushed to disk.
If you had non-unique indexes, they would similarly be written in a delayed manner to disk (cf 'Change buffering'). The change_buffer, by default occupies 25% of the buffer_pool.
How large is the resulting table? It may be significantly larger, or even smaller, than the 15GB of the csv file.
How much time did it take to bring the csv file into the ram disk? I proffer that that was wasted time and it should have been read from disk while doing the LOAD DATA; that I/O can be overlapped.
Please provide SHOW GLOBAL VARIABLES LIKE 'innodb%'; there are several other settings that may be relevant.
More
These are terrible:
tmp_table_size=20G
max_heap_table_size=20G
If you have a complex query, 20GB could be allocated in RAM, possibly multiple times! Keep those under 1% of RAM.
If copying the csv from hard disk to ram disk runs slowly, I would suspect the validity of 150 MB/s.
If you are loading the table once every 6 hours, and it takes 1/3 of an hour to perform, I don't see the urgency of making it faster. OTOH, there may be something worth looking into. If that 20 minutes is downtime due to the table being locked, that can be easily eliminated:
CREATE TABLE t LIKE real_table;
LOAD DATA INFILE ... INTO TABLE t ...; -- not blocking anyone
RENAME TABLE real_table TO old, t TO real_table; -- atomic; fast
DROP TABLE old;

How to improve the speed of InnoDB writes per second of MySQL DB

On my server, doing insert records into MySQL DB is very slow. Regarding the Server Status, InnoDB writes per second is around 20.
I am not an expert; I just graduated from university and don't have much experience with this.
How could I improve the speed of InnoDB writes? Without upgrading my server's hardware, is there any way to do it?
My server is not very good; I installed Microsoft Windows Server 2003 R2 on it. The hardware info is as follows:
CPU: Intel Xeon E5649 2.53GHZ
RAM: 2GB
Any comments? Thank you.
Some hints:
Minimize the number of indexes - there will be less index maintenance. This is obviously a trade-off with SELECT performance.
Maximize the number of INSERTs per transaction - the "durability price" will be lower (i.e. physical writing to disk can be done in the background while the rest of the transaction is still executing, if the transaction is long enough). One large transaction will usually be faster than many small transactions, but this obviously depends on the actual logic you are trying to implement (see the sketch after these hints).
Move the table to faster storage, such as an SSD. Reads can be cached, but a durable transaction must be physically written to disk, so caching alone is not enough.
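A minimal sketch of that batching, with a hypothetical table and columns:
START TRANSACTION;
INSERT INTO yourtable (a, b) VALUES (1, 2);
INSERT INTO yourtable (a, b) VALUES (3, 4);
-- ... many more inserts ...
COMMIT;  -- one log flush for the whole batch instead of one per statement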
Also, it would be helpful if you could show us your exact database structure and the exact INSERT statement you are using.
If you are using the InnoDB engine on a local disk, try benchmarking with innodb_flush_method = O_DSYNC. With O_DSYNC, our bulk inserts (wrapped in a transaction) improved.
Adjust the flush method
In some versions of GNU/Linux and Unix, flushing files to disk with
the Unix fsync() call (which InnoDB uses by default) and similar
methods is surprisingly slow. If database write performance is an
issue, conduct benchmarks with the innodb_flush_method parameter set
to O_DSYNC.
https://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-diskio.html
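Note that innodb_flush_method is not a dynamic variable, so as far as I know it has to be set in my.cnf (innodb_flush_method = O_DSYNC) and needs a server restart. You can confirm the active value afterwards with:
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_method';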
Modify your MySQL server configuration:
innodb_flush_log_at_trx_commit = 0
then restart the MySQL server.
Please set innodb_buffer_pool_size to 512M; it may increase performance. In my.cnf:
innodb_buffer_pool_size = 512M
(On MySQL 5.7+ the variable is also dynamic, but SET GLOBAL does not accept the M suffix, so it would be SET GLOBAL innodb_buffer_pool_size = 536870912;)
Recommendations could vary based on your implementation. Here are some notes copied directly from MySQL documentation:
Bulk Data Loading Tips
When importing data into InnoDB, make sure that MySQL does not have
autocommit mode enabled because that requires a log flush to disk for
every insert. To disable autocommit during your import operation,
surround it with SET autocommit and COMMIT statements.
Use the multiple-row INSERT syntax to reduce communication overhead
between the client and the server if you need to insert many rows:
INSERT INTO yourtable VALUES (1,2), (5,5), ...;
If you are doing a huge batch insert, try avoiding the "select from
last_insert_id" that follows the insert, as it seriously slows down the
insertions (to the order of turning a 6-minute insert into a 13-hour
insert). If you need the number for another insertion (a subtable
perhaps), assign your own numbers to the ids (this obviously only
works if you are sure nobody else is doing inserts at the same time).
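For the first tip, the surrounding pattern looks roughly like this (the middle part stands for whatever your import actually runs):
SET autocommit=0;
-- ... your INSERT / LOAD DATA statements ...
COMMIT;
SET autocommit=1;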
As mentioned already, you can increase the size of the InnoDB buffer pool (the innodb_buffer_pool_size variable). This is generally a good idea because the default size is pretty small and most systems can afford to lend more memory to the pool. It will increase the speed of most queries, especially SELECTs (as more pages are kept in the buffer between queries). The insert (change) buffer is also a section of the buffer pool; it caches changes to secondary index pages so they can be merged lazily, which speeds up inserts into indexed tables. Hope this helps :)

MySQL bulk inserts with LOAD DATA INFILE - MyISAM barely slower than MEMORY engine

We are currently performing several performance tests on MySQL to compare it to an approach we are developing for a database prototype. In short: the database is empty; given a huge CSV file, load the data into memory as fast as possible.
We are testing on a 12-core Westmere server with 48 GB RAM, so memory consumption is not a real issue right now.
The problem is the following. We have chosen MySQL (widespread, open source) for comparison. Since our prototype is an in-memory database, we use the MEMORY engine in MySQL.
We insert this way (files are up to 26 GB):
drop table if exists a.a;
SET @@max_heap_table_size=40000000000;
create table a.a(col_1 int, col_2 int, col_3 int) ENGINE=MEMORY;
LOAD DATA CONCURRENT INFILE "/tmp/input_files/input.csv" INTO TABLE a.a FIELDS TERMINATED BY ";";
Performing this load on a 2.6 GB file takes about 80 s, which is four times slower than a wc -l on the same file. Using MyISAM is only 4 seconds slower, even though it writes to disk.
What am I doing wrong here? I would assume that a data write using the MEMORY engine must be far faster than with MyISAM, and I don't understand why wc -l (both are single-threaded, but writing to memory should not be that slow) is that much faster.
PS: changing read_buffer_size or any other variables I found while googling did not result in significant improvements.
Try setting the following variables as well:
max_heap_table_size=40G
bulk_insert_buffer_size=32M
read_buffer_size=1M
read_rnd_buffer_size=1M
It may reduce query execution time slightly.
Also, CONCURRENT works only with MyISAM tables, and according to the manual it slows down inserts; see the LOAD DATA INFILE documentation.
I don't think you can compare the speed of an insert, which is a write operation, with wc -l, which is a read operation; writes are always slower than reads.
Loading 2.6 GB of data into RAM is going to take a considerable amount of time. It mostly depends on the write speed of your RAM and the I/O configuration of your OS.
Hope this helps.
I think the reason you didn't see a significant difference between the MEMORY engine and the MyISAM engine is due to disk caching. You have 48GB of RAM and are only loading 2.6GB of data.
The MyISAM engine is writing to 'files' but the OS is using its file caching features to make those file writes actually occur in RAM. Then it will 'lazily' make the actual writes to disk. Since you mentioned 'wc', I'll assume you are using Linux. Read up on the dirty_ratio and dirty_background_ratio kernel settings as a starting point to understanding how that works.

MySQL MEMORY table vs InnoDB table (many inserts, few reads)

I run all my sites on InnoDB tables, which has been working really well so far. Now I'd like to know what is going on on my sites in real time, so I store each pageview (page, referrer, IP, hostname, etc.) in an InnoDB table. There are about 100 inserts per second, and this table is only read once in a while when I'm browsing the logs.
I clean out the table every minute with a cron that removes old items. This leaves about 35,000 rows in that table on average, with a size of about 5 MB.
Would it be easier on the server if I were to switch the InnoDB table to a MEMORY table? As far as I can see, this would save a lot of disk I/O, right? Restarting MySQL would result in a loss of data, but that does not matter in my case.
Question: In my case, would you recommend a MEMORY table over an InnoDB table?
Yes I would. The conditions you mention (a lot of writes, periodic purging of data, data persistence not required) make it pretty much an ideal candidate for MEMORY.
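A minimal sketch of such a MEMORY table, with columns guessed from what you listed (adjust names and types to your data), plus the cron cleanup:
CREATE TABLE pageviews_live (
  viewed_at DATETIME NOT NULL,
  page VARCHAR(255) NOT NULL,
  referrer VARCHAR(255),
  ip VARCHAR(45),
  hostname VARCHAR(255)
) ENGINE=MEMORY;
-- cron job, run every minute:
DELETE FROM pageviews_live WHERE viewed_at < NOW() - INTERVAL 10 MINUTE;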
Please optimize your InnoDB settings:
As long as you have configured InnoDB to use enough memory to hold your entire table (with innodb_buffer_pool_size), and there is not excessive pressure from other InnoDB tables on the same server, the data will remain in memory. If you're concerned about write performance (and again barring other uses of the same system) you can reduce durability to drastically increase write performance by setting innodb_flush_log_at_trx_commit = 0 and disabling binary logging.
Using any sort of triggers with temporary tables will be a mess to maintain, and won't give you any benefits of transactionality on the temporary tables.
You can find more details right here:
http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
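In my.cnf terms, these suggestions would look something like this (the buffer pool size is a placeholder to adjust to your table, and the binary-logging line only matters if it is enabled on your server):
innodb_buffer_pool_size = 256M
innodb_flush_log_at_trx_commit = 0
skip-log-bin   # or simply remove any log-bin line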

MySQL - How to determine if my table is stored in RAM?

I'm running:
MySQL v5.0.67
InnoDB engine
innodb_buffer_pool_size = 70MB
Question: What command can I run to ensure that my entire 50 MB database is stored entirely in RAM?
I am curious about why you want to store the entire table in memory. My guess is that you are not. The most important thing for me is whether your queries are running well and whether you are tied up on disk access. It is also possible that the OS has cached the disk blocks you need, if there is memory available; in that case, even though MySQL might not have them in memory, the OS will. If your queries are not running well, and you can do it, I highly recommend adding more memory if you want it all in RAM. If you have slowdowns, it is more likely that you are running into contention.
show table status
will show you some of the information.
Get the server I/O/buffer/cache statistics from
SHOW GLOBAL STATUS
then run a query that requires each row to be accessed (say, sum the non-empty values of a column that is not indexed) and check whether any I/O occurred.
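Concretely, a check along these lines should work; the table and column names are placeholders, and Innodb_buffer_pool_reads counts logical reads that could not be served from the buffer pool and had to hit the disk:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
SELECT SUM(some_unindexed_col) FROM yourtable;
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
-- if the counter did not increase, the table's pages were already in memory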
I doubt you are caching the entire thing in memory, though, with only 70 MB. You have to take a lot of cache, temp, and index buffers out of that total.
If you run SELECT COUNT(*) FROM yourtable USE INDEX (PRIMARY), InnoDB will put every page of the PRIMARY index into the buffer pool (assuming there is enough room in it). If the table has secondary indexes and you want to load them into the buffer pool too, craft a similar query that reads from a secondary index to do the job.
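For example (the secondary index name idx_secondary is made up; FORCE INDEX is used so the optimizer really scans that index):
SELECT COUNT(*) FROM yourtable USE INDEX (PRIMARY);
SELECT COUNT(*) FROM yourtable FORCE INDEX (idx_secondary);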