After noticing that our database has become a major bottleneck on our live production systems, I decided to construct a simple benchmark to get to the bottom of the issue.
The benchmark: I time how long it takes to increment the same row in an InnoDB table 3000 times, where the row is indexed by its primary key, and the column being updated is not part of any index. I perform these 3000 updates using 20 concurrent clients running on a remote machine, each with its own separate connection to the DB.
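For anyone who wants to reproduce this, here is a rough sketch of the kind of harness described above. It is not the original benchmark code. `update_once` is a hypothetical callable that would run one UPDATE on its own connection; the sketch only shows how per-update latency and total wall-clock time are measured separately.

```python
import threading
import time


def run_benchmark(update_once, total_updates=3000, clients=20):
    """Split total_updates across `clients` threads and time each call.

    `update_once` is a hypothetical placeholder that performs one UPDATE
    on its own connection; it is not part of any real driver.
    """
    per_client = total_updates // clients
    latencies = []                     # one entry per update, in seconds
    latencies_lock = threading.Lock()

    def client():
        local = []
        for _ in range(per_client):
            start = time.perf_counter()
            update_once()
            local.append(time.perf_counter() - start)
        with latencies_lock:
            latencies.extend(local)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start

    return {
        "updates": len(latencies),
        "mean_latency_s": sum(latencies) / len(latencies),
        "wall_clock_s": wall,
    }
```

Reporting both numbers matters because, with concurrent clients, the time a single update waits can be far larger than the total time divided by the number of updates.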
I'm interested in learning why the different storage engines I benchmarked, InnoDB, MyISAM, and MEMORY, have the profiles that they do. I'm also hoping to understand why InnoDB fares so poorly in comparison.
InnoDB (20 concurrent clients):
Each update takes 0.175s.
All updates are done after 6.68s.
MyISAM (20 concurrent clients):
Each update takes 0.003s.
All updates are done after 0.85s.
Memory (20 concurrent clients):
Each update takes 0.0019s.
All updates are done after 0.80s.
Thinking that the concurrency could be causing this behavior, I also benchmarked a single client doing 100 updates sequentially.
InnoDB:
Each update takes 0.0026s.
MyISAM:
Each update takes 0.0006s.
MEMORY:
Each update takes 0.0005s.
The actual machine is an Amazon RDS instance (http://aws.amazon.com/rds/) with mostly default configurations.
I'm guessing that the answer will be along the following lines: InnoDB fsyncs after each update (since each update is an ACID-compliant transaction), whereas MyISAM does not, since it doesn't even support transactions. MyISAM is probably performing all updates in memory and regularly flushing to disk, which is how its speed approaches that of the MEMORY storage engine. If this is so, is there a way to use InnoDB for its transaction support, but relax some constraints (via configuration) so that writes are done faster at the cost of some durability?
Also, any suggestions on how to improve InnoDB's performance as the number of clients increases? It is clearly scaling worse than the other storage engines.
Update
I found https://blogs.oracle.com/MySQL/entry/comparing_innodb_to_myisam_performance, which is precisely what I was looking for. Setting innodb-flush-log-at-trx-commit=2 allows us to relax ACID constraints (flushing to disk happens once per second) for the case where a power failure or server crash occurs. This gives us a similar behavior to MyISAM, but we still get to benefit from the transaction features available in InnoDB.
Running the same benchmarks, we see a 10x improvement in write performance.
InnoDB (20 concurrent clients):
Each update takes 0.017s.
All updates are done after 0.98s.
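To make the reasoning concrete, here is a deliberately simplified model of how many times the redo log is flushed to disk under each setting. It ignores group commit and background activity, so treat the numbers as an illustration of the order of magnitude and not a measurement. The function name is made up for this sketch.

```python
import math


def estimated_log_flushes(flush_setting, commits, elapsed_seconds):
    """Rough count of redo-log flushes to disk for a run of commits.

    Simplified model that ignores group commit:
      1    -> flush at every commit (the default, fully durable)
      0, 2 -> flush roughly once per second
    """
    if flush_setting == 1:
        return commits
    if flush_setting in (0, 2):
        return max(1, math.ceil(elapsed_seconds))
    raise ValueError("innodb_flush_log_at_trx_commit must be 0, 1 or 2")


# Figures reported above: 3000 updates, 6.68 s before, 0.98 s after.
print(estimated_log_flushes(1, 3000, 6.68))  # 3000
print(estimated_log_flushes(2, 3000, 0.98))  # 1
```

With a value of 2 the log is still written to the operating system cache at every commit, so a crash of mysqld alone should not lose transactions; an OS crash or power failure can lose roughly the last second of them.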
Any other suggestions?
We have done some similar tests in our application and we noticed that if no transaction is explicitly opened, each single SQL instruction is treated inside a transaction, which takes much more time to execute. If your business logic allows, you can put several SQL commands inside a transaction block, reducing overall ACID overhead. In our case, we had great performance improvement with this approach.
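A minimal sketch of that idea follows. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; with a MySQL driver the pattern is the same (many statements, one commit), but the connection setup and placeholder style would differ. The `counters` table is hypothetical.

```python
import sqlite3


def increment_in_one_transaction(conn, row_id, times):
    """Run `times` single-row UPDATEs and commit them together."""
    # `with conn` commits once on success and rolls back on an exception.
    with conn:
        for _ in range(times):
            conn.execute(
                "UPDATE counters SET n = n + 1 WHERE id = ?", (row_id,)
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, n INTEGER NOT NULL)")
conn.execute("INSERT INTO counters (id, n) VALUES (1, 0)")
conn.commit()

increment_in_one_transaction(conn, 1, 100)
print(conn.execute("SELECT n FROM counters WHERE id = 1").fetchone()[0])  # 100
```

The trade-off is that a failure in the middle rolls back the whole group, so this only fits when the statements really belong together.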
Related
I know there is an issue in MySQL with concurrent SELECT and INSERT. However, my question is: if I open two connections to MySQL and keep loading data through both of them, does MySQL take the data concurrently, or does it wait for one load to finish before starting the other?
I’d like to know how MySQL behaves in both cases. Like when I am trying to load data in the same table or different tables concurrently when opening separate connections.
If you create a new connection to the database and perform inserts over both connections, then from the database's perspective the inserts are still sequential.
The MySQL documentation on Concurrent Inserts for MyISAM says something like this:
If MyISAM storage is used and table has no holes, multiple INSERT statements are queued and performed in sequence, concurrently with the SELECT statements.
Mind that there is no control over the order in which two concurrent inserts will take place. The order in this concurrency is at the mercy of a lot of different factors. To ensure order, by default you will have to sacrifice concurrency.
MySQL does support parallel data inserts into the same table.
But the approach to concurrent reads and writes depends on the storage engine you use.
InnoDB
MySQL uses row-level locking for InnoDB tables to support simultaneous write access by multiple sessions, making them suitable for multi-user, highly concurrent, and OLTP applications.
MyISAM
MySQL uses table-level locking for MyISAM, MEMORY, and MERGE tables, allowing only one session to update those tables at a time, making them more suitable for read-only, read-mostly, or single-user applications.
However, this MyISAM behavior can be altered with the concurrent_insert system variable in order to allow concurrent inserts. See the MySQL documentation on concurrent_insert for details.
Hence, MySQL does support concurrent inserts for both the InnoDB and MyISAM storage engines.
You ask about deadlock detection, ACID, and particularly MVCC, locking, and transactions:
Deadlock Detection and Rollback
InnoDB automatically detects transaction deadlocks and rolls back a
transaction or transactions to break the deadlock. InnoDB tries to
pick small transactions to roll back, where the size of a transaction
is determined by the number of rows inserted, updated, or deleted.
When InnoDB performs a complete rollback of a transaction, all locks
set by the transaction are released. However, if just a single SQL
statement is rolled back as a result of an error, some of the locks
set by the statement may be preserved. This happens because InnoDB
stores row locks in a format such that it cannot know afterward which
lock was set by which statement.
https://dev.mysql.com/doc/refman/5.6/en/innodb-deadlock-detection.html
Locking
The system of protecting a transaction from seeing or changing data
that is being queried or changed by other transactions. The locking
strategy must balance reliability and consistency of database
operations (the principles of the ACID philosophy) against the
performance needed for good concurrency. Fine-tuning the locking
strategy often involves choosing an isolation level and ensuring all
your database operations are safe and reliable for that isolation
level.
http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_locking
ACID
An acronym standing for atomicity, consistency, isolation, and
durability. These properties are all desirable in a database system,
and are all closely tied to the notion of a transaction. The
transactional features of InnoDB adhere to the ACID principles.
Transactions are atomic units of work that can be committed or rolled
back. When a transaction makes multiple changes to the database,
either all the changes succeed when the transaction is committed, or
all the changes are undone when the transaction is rolled back. The
database remains in a consistent state at all times -- after each
commit or rollback, and while transactions are in progress. If related
data is being updated across multiple tables, queries see either all
old values or all new values, not a mix of old and new values.
Transactions are protected (isolated) from each other while they are
in progress; they cannot interfere with each other or see each other's
uncommitted data. This isolation is achieved through the locking
mechanism. Experienced users can adjust the isolation level, trading
off less protection in favor of increased performance and concurrency,
when they can be sure that the transactions really do not interfere
with each other.
http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_acid
MVCC
InnoDB is a multiversion concurrency control (MVCC) storage engine
which means many versions of the single row can exist at the same
time. In fact there can be a huge amount of such row versions.
Depending on the isolation mode you have chosen, InnoDB might have to
keep all row versions going back to the earliest active read view, but
at the very least it will have to keep all versions going back to the
start of SELECT query which is currently running
https://www.percona.com/blog/2014/12/17/innodbs-multi-versioning-handling-can-be-achilles-heel/
It depends.
It depends on the client -- some clients allow concurrent access; some will serialize access, thereby losing the expected gain. You have not even specified PHP vs Java vs ... or Apache vs ... or Windows vs ... Many combinations simply do not provide any parallelism.
If different tables, there is only general contention for I/O, CPU, Mutexes on the buffer_pool, etc. A reasonable amount of parallelism is possible.
If same table, it depends on the indexes and access patterns. In some cases the threads will block each other. In some cases it will even "deadlock" and rollback one of the transactions. Deadlocks not only slow you down, but make you retry the inserts.
If you are looking for high-speed ingestion of a lot of rows, see my blog. It lays out techniques and points out several of the ramifications, such as replication, Engine choice, and multi-threading.
Multiple threads inserting into the same tables -- It depends a lot on the values you are providing for any PRIMARY or UNIQUE keys. It depends on whether other actions are taken in the same transaction. It depends on how much I/O is involved. It depends on whether you are doing single-row inserts or batching. It depends on ... (Sorry to be vague, but your question is not very specific.)
If you would like to present specifics on two or three designs, we can discuss the specifics.
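On the point above that deadlocks make you retry the inserts: a retry wrapper might look roughly like this. `DeadlockError` is a hypothetical stand-in for whatever exception your driver raises for MySQL error 1213 (ER_LOCK_DEADLOCK); real code would inspect the driver's error code instead.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver error carrying MySQL code 1213."""


def run_with_deadlock_retry(transaction, attempts=3, base_delay=0.01):
    """Call `transaction()` and retry it if it is chosen as deadlock victim."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == attempts:
                raise
            # Back off with jitter so the colliding sessions do not
            # immediately collide again.
            time.sleep(base_delay * attempt * random.uniform(0.5, 1.5))
```

The callable has to redo the whole transaction from the start, since InnoDB rolls back the victim's work.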
Which engine should be used for more than 100 INSERT queries per second?
I have read about the differences and the pros and cons of MyISAM and InnoDB.
But I am still confused: for 100+ INSERT queries per second into a table (basically for tracking purposes), which engine should I use?
I referred to What's the difference between MyISAM and InnoDB?
Based on my understanding, MyISAM locks the whole table for each insert, and hence InnoDB should be used for its row-level locking.
But on the other hand, the performance of MyISAM is said to be 100 times better. So what would be the optimal and correct selection, and why?
Simple code that does one-row INSERTs without any tuning maxes out at about 100 rows per second in any engine, especially InnoDB.
But, it is possible to get 1000 rows per second or even more.
The quick fix for InnoDB is to set innodb_flush_log_at_trx_commit = 2; that will uncork the main thing stopping InnoDB at 100 inserts/second using a commodity spinning disk. Setting innodb_buffer_pool_size to about 70% of available RAM is also important.
If a user is inserting multiple rows into the same table at the same time, then LOAD DATA or a batch Insert (INSERT ... VALUES (...), (...), ...) of 100 rows or more will insert ten times as fast. This applies to any Engine.
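As an illustration of the batch INSERT form, here is a small helper that turns a list of rows into multi-row statements of 100 rows each. The table and column names are made up, and `%s` is the placeholder style used by the common Python MySQL drivers; adjust for whatever client you use.

```python
def build_batch_inserts(table, columns, rows, batch_size=100):
    """Yield (sql, params) pairs, one multi-row INSERT per batch.

    `table` and `columns` must be trusted identifiers; only the values
    are passed as parameters.
    """
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    column_list = ", ".join(columns)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table, column_list, ", ".join([row_placeholder] * len(batch))
        )
        # Flatten the batch so the values line up with the placeholders.
        params = [value for row in batch for value in row]
        yield sql, params


for sql, params in build_batch_inserts("events", ["id", "name"], [(1, "a"), (2, "b")]):
    print(sql)  # INSERT INTO events (id, name) VALUES (%s, %s), (%s, %s)
```

Each yielded pair would be passed to a cursor's execute call, ideally with one commit per batch.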
MyISAM is not 100 times as fast; it is not even 10 times as fast as InnoDB. Today (5.6 or newer), you would be hard pressed to find a well tuned application that is more than a little faster in MyISAM. You are, or will be, I/O-limited.
As for corruption -- No engine suffers from corruption except during a crash. A power failure may mangle MyISAM indexes, usually recoverably. Moreover, a batch insert could be half done. InnoDB will be clean -- the entire batch is done or none of it is done; no corruption.
ARCHIVE saves disk space, but costs CPU.
MEMORY is often faster because it has no I/O. But you have too much data for that Engine, correct?
MariaDB with TokuDB can probably run faster than anything I describe here; but you have not indicated the need for it.
100 rows inserted per second = 8M/day = 3 Billion/year. Will you be purging the data eventually? Will you be querying the data? Purging: Let's talk about PARTITION. Querying: Let's talk about Summary Tables.
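The growth figures above are easy to check, assuming a steady 100 rows per second around the clock:

```python
ROWS_PER_SECOND = 100
SECONDS_PER_DAY = 24 * 60 * 60

rows_per_day = ROWS_PER_SECOND * SECONDS_PER_DAY
rows_per_year = rows_per_day * 365

print(rows_per_day)   # 8640000
print(rows_per_year)  # 3153600000
```

That is about 8.6 million rows a day and a little over 3 billion a year, which is why purging and summarizing need to be planned up front.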
Indexing: Minimize the number of indexes. If you have a 'random' index, such as a UUID, and you have a billion rows, you will be stuck with 100 rows/second, regardless of which Engine and regardless of any tuning. Do I need to explain further?
If this is a queuing system, I say "Don't queue it, just do it."
Bottom line: Use InnoDB. Tune it. Use batch inserts. Avoid random indexes. etc.
You are correct that MyISAM is a faster choice if your operational use case is lots of insertions. But that answer can change drastically based on the kind of use you make of the data. If this is an archival application you might consider the ARCHIVE storage engine. It is best for write-once, read-rarely applications.
You could investigate INSERT DELAYED, as it allows your client programs to fire-and-forget these inserts rather than waiting for completion. This burns RAM in your mysqld process, though. If that style of operation meets your needs, this is a compelling reason to go with MyISAM. Be aware that INSERT DELAYED is deprecated as of MySQL 5.6 and is treated as a plain INSERT from 5.7 on.
Beware indexes in the target table of your inserts. Maintaining indexes is a big part of the server's insert workload.
Don't forget to look into MariaDB. It's a compatible fork of MySQL with some more advanced storage engines and features.
I have experience with a similar application. In our case, the application scaled up beyond the original insert rate, and the server could not keep up. (It's always good when an application workload grows!) We ended up doing two things, one after the other.
Using a message queuing system, and running just a couple of processes to actually do the inserts. The original clients wrote their logging records to the message queue rather than directly to the database. (Amazon AWS's SQS is an example of such a queuing system.)
Reworking the insert process to use LOAD DATA INFILE to load great gobs of log rows at once.
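The consumer side of such a setup can be sketched like this. A local `queue.Queue` stands in for the real message queue, and `drain_batch` is a made-up helper name; the point is only that the inserting process pulls many records at a time and writes each batch in one go.

```python
import queue


def drain_batch(q, max_rows=1000):
    """Take up to `max_rows` queued log rows without blocking."""
    batch = []
    while len(batch) < max_rows:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


q = queue.Queue()
for i in range(2500):
    q.put(("client-%d" % i, "message"))

sizes = []
while True:
    batch = drain_batch(q, max_rows=1000)
    if not batch:
        break
    sizes.append(len(batch))   # here each batch would be bulk-loaded
print(sizes)  # [1000, 1000, 500]
```

Each batch would then go to the database as a single multi-row INSERT or a LOAD DATA INFILE call.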
(You probably have figured out that this kind of workload isn't feasible on a cheap shared hosting service or an AWS micro instance.)
On my server, inserting records into a MySQL database is very slow. According to the server status, InnoDB writes per second is around 20.
I am not an expert; I just graduated from university and don't have much experience with this.
How can I improve the speed of InnoDB writes? Is there any way to do it without upgrading the server hardware?
My server is not powerful, so I installed Microsoft Windows Server 2003 R2. The hardware is as follows:
CPU: Intel Xeon E5649 2.53 GHz
RAM: 2 GB
Any comments are welcome. Thank you.
Some hints:
Minimize the number of indexes - there will be less index maintenance. This is obviously a trade-off with SELECT performance.
Maximize the number of INSERTs per transaction - the "durability price" will be less (i.e. physical writing to disk can be done in the background while the rest of the transaction is still executing, if the transaction is long enough). One large transaction will usually be faster than many small transactions, but this is obviously contingent on the actual logic you are trying to implement.
Move the table to a faster storage, such as SSD. Reads can be cached, but a durable transaction must be physically written to disk, so just caching is not enough.
Also, it would be helpful if you could show us your exact database structure and the exact INSERT statement you are using.
If you are using the InnoDB engine on a local disk, try benchmarking with innodb_flush_method = O_DSYNC. With O_DSYNC, our bulk inserts (wrapped in a transaction) got faster.
Adjust the flush method
In some versions of GNU/Linux and Unix, flushing files to disk with
the Unix fsync() call (which InnoDB uses by default) and similar
methods is surprisingly slow. If database write performance is an
issue, conduct benchmarks with the innodb_flush_method parameter set
to O_DSYNC.
https://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-diskio.html
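Modify your MySQL server config:
innodb_flush_log_at_trx_commit = 0
Then restart the MySQL server. Note that with this setting the log is flushed to disk only about once per second, so a crash can lose up to a second of committed transactions.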
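Please also set innodb_buffer_pool_size to 512M; it may improve performance. In the config file you can write innodb_buffer_pool_size = 512M. At runtime, SET does not accept the M suffix, and the variable can only be changed without a restart in MySQL 5.7.5 or later:
SET GLOBAL innodb_buffer_pool_size = 536870912;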
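As far as I know, the K/M/G suffixes are only accepted in option files and on the command line, while a runtime SET needs a plain number of bytes. A small helper for the conversion (the function name is just for this sketch):

```python
_MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(value):
    """Convert an option-file style size such as '512M' to a byte count."""
    text = value.strip().upper()
    if text and text[-1] in _MULTIPLIERS:
        return int(text[:-1]) * _MULTIPLIERS[text[-1]]
    return int(text)


pool_bytes = size_to_bytes("512M")
print(pool_bytes)  # 536870912
print("SET GLOBAL innodb_buffer_pool_size = {};".format(pool_bytes))
```

On the 2 GB machine from the question, 512M is a quarter of the RAM, which leaves room for Windows and the rest of mysqld.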
Recommendations could vary based on your implementation. Here are some notes copied directly from MySQL documentation:
Bulk Data Loading Tips
When importing data into InnoDB, make sure that MySQL does not have
autocommit mode enabled because that requires a log flush to disk for
every insert. To disable autocommit during your import operation,
surround it with SET autocommit and COMMIT statements.
Use the multiple-row INSERT syntax to reduce communication overhead
between the client and the server if you need to insert many rows:
INSERT INTO yourtable VALUES (1,2), (5,5), ...;
If you are doing a huge batch insert, try avoiding the "select from
last_insert_id" that follows the insert as it seriously slows down the
insertions (to the order of making a 6 minute insert into a 13 hour
insert) if you need the number for another insertion (a subtable
perhaps) assign your own numbers to the id's (this obviously only
works if you are sure nobody else is doing inserts at the same time).
As mentioned already, you can increase the size of the InnoDB buffer pool (innodb_buffer_pool_size variable). This is generally a good idea because the default size is pretty small and most systems can accommodate lending more memory to the pool. This will increase the speed of most queries, especially SELECTs (as more records will be kept in the buffer between queries). The insert buffer (called the change buffer in newer versions) also lives in the buffer pool; it caches changes to secondary index pages that are not currently in memory, which reduces random I/O during inserts. Hope this helps :)
I have a table with 17 million rows. I need to grab 1 column of that table and insert it all into another table. Here's what I did:
INSERT IGNORE INTO table1(name) SELECT name FROM main WHERE ID < 500001
InnoDB executes in around 3 minutes and 45 seconds
However, MyISAM executes in just below 4 seconds. Why the difference?
I see everyone praising InnoDB but honestly I don't see how it's better for me. It's so much slower. I understand that it's great for integrity and whatnot, but many of my tables will not be updated (just read). Should I even bother with InnoDB?
The difference is most likely due to the configuration of InnoDB, which takes a bit more tweaking than MyISAM. The idea of InnoDB is to keep most of your data in memory, flushing to and reading from disk only when you have a few spare CPU cycles.
Should you even bother with InnoDB is a really good question. If you're going to keep using MySQL, it's highly recommended you get some experience with InnoDB. But if you're doing a quick-and-dirty job for a database that won't see a lot of traffic, and you're not worried about scale, then the ease of MyISAM may just be a win for you. InnoDB can be overkill in many instances where someone just wants a simple database.
but many of my tables will not be updated
You can still get a performance lift from InnoDB if you are doing 99% reading. If you configure your buffer pool size to hold your entire database in memory, InnoDB will NEVER have to go to disk to get your data, even if it misses the mysql query cache.
In MyISAM, there is a good chance you have to read the row from disk, and you're leaving the operating system to do the caching and optimization for you.
innodb-buffer-pool-size
My first guess is to check innodb_buffer_pool_size, which ships out of the box set to 8M in older versions (128M since MySQL 5.5). On a dedicated database server, it's recommended to have this around 80% of your total memory. Once you hit that limit, InnoDB performance will drop significantly because it needs to flush something out of the buffer to make room for the new data, which can be expensive.
autocommit=0
Also, make sure autocommit is turned off while you load your table, or flushing will happen on every insert. You can turn it back on after you're done; it's a per-session setting, so this is very safe.
Loading tables typically happens once
Think about whether you really want to tune your database to accommodate "inserting 17 million rows". How often do you do this? MyISAM might be quicker in this instance, but when you have 100 concurrent connections all reading and modifying this table at the same time, you'll find a well-tuned InnoDB will win and MyISAM will choke on table locks.
How MyISAM sees this operation
MyISAM will be very good at this without any tuning, because under the covers, you're simply appending each row to a file (and updating an index). Your OS and disk caching will handle all those performance problems.
How InnoDB sees this operation
InnoDB knows the table needs a write, so it throws the row into the insert buffer.
You give it no time before the next insert, so InnoDB has no time to deal with the buffer; it runs out of room and is forced to 'hold up' the insert while it writes to the buffer pool and updates indexes.
Next, your buffer pool fills up, and InnoDB is forced to 'hold up' the insert and flush some page out of the buffer pool to disk.
And you keep throwing inserts at it like crazy.
Note that when you do tune InnoDB to give you a mysql> prompt very fast after you do this, InnoDB will still be scrambling under the covers to catch up in its spare time, but will be willing to execute a new transaction for you.
MUST READ:
http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/
http://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html (see bulk data loading tips)
You're right to some extent. InnoDB is slower than MyISAM, but in which cases?
Nothing is made to meet everyone's requirements. InnoDB is a transactional storage engine while MyISAM is not. To make it an ACID-compliant, transaction-aware storage engine, we have to pay a cost in terms of response time.
Furthermore, InnoDB runs faster if it is properly tuned using my.ini or another configuration file.
In the end, these are the reasons I understand for why people praise InnoDB:
It is an ACID-compliant engine with transaction support.
It takes row-level locks while working on a table, whereas MyISAM takes table-level locks.
InnoDB is highly tunable for multi-core/multi-process machines to improve concurrency.
Last but not least: nothing can meet "everyone's" needs, so it depends solely on the scenario in which you're comparing the two engines.
Check out MYISAM vs Innodb comparison on Wikipedia.
http://en.wikipedia.org/wiki/Comparison_of_MySQL_database_engines
I ran a lookup test against an indexed MySQL table containing 20,000,000 records, and according to my results, it takes 0.004 seconds to retrieve a record given an id--even when joining against another table containing 4,000 records. This was on a 3GHz dual-core machine, with only one user (me) accessing the database. Writes were also fast, as this table took under ten minutes to create all 20,000,000 records.
Assuming my test was accurate, can I expect performance to be as snappy on a production server, with, say, 200 users concurrently reading from and writing to this table?
I assume InnoDB would be best?
That depends on the storage engine you're going to use and what's the read/write ratio.
InnoDB will be better if there are a lot of writes. If it's mostly reads with only an occasional write, MyISAM might be faster. MyISAM uses table-level locking, so it locks the whole table whenever you need to update. InnoDB uses row-level locking, so you can have concurrent updates on different rows.
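A toy model of the difference, in case it helps. This is not how either engine is implemented; it only shows that with one lock per table a second writer has to wait even for a different row, while with one lock per row it does not. The class and method names are invented for the sketch.

```python
import threading


class LockManager:
    """Toy model of lock granularity; not how either engine is implemented."""

    def __init__(self, row_level):
        self.row_level = row_level
        self._table_lock = threading.Lock()
        self._row_locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, row_id):
        if not self.row_level:
            return self._table_lock          # every row shares one lock
        with self._guard:
            return self._row_locks.setdefault(row_id, threading.Lock())

    def try_lock(self, row_id):
        """Return True if a writer could start updating `row_id` right now."""
        return self._lock_for(row_id).acquire(blocking=False)

    def unlock(self, row_id):
        self._lock_for(row_id).release()


for name, row_level in (("table-level", False), ("row-level", True)):
    locks = LockManager(row_level)
    first = locks.try_lock(1)    # writer A updates row 1
    second = locks.try_lock(2)   # writer B wants row 2 at the same time
    print(name, first, second)
# table-level True False
# row-level True True
```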
InnoDB is definitely safer, so I'd stick with it anyhow.
BTW, remember that RAM is very cheap right now, so buy a lot.
Depends on any number of factors:
Server hardware (Especially RAM)
Server configuration
Data size
Number of indexes and index size
Storage engine
Writer/reader ratio
I wouldn't expect it to scale that well. More importantly, this kind of thing is too important to speculate about. Benchmark it and see for yourself.
Regarding storage engine, I wouldn't dare to use anything but InnoDB for a table of that size that is both read and written to. If you run any write query that isn't a primitive insert or single row update you'll end up locking the table using MyISAM, which yields terrible performance as a result.
There's no reason that MySQL couldn't handle that kind of load without any significant issues. There are a number of other variables involved though (otherwise, it's a 'how long is a piece of string' question). Personally, I've had a number of tables in various databases that are well beyond that range.
How large is each record (on average)?
How much RAM does the database server have, and how much is allocated to the various MySQL/InnoDB settings?
A default configuration may only allow for a default 8MB buffer between disk and client (which might work fine for a single user) - but trying to fit a 6GB+ database through that is doomed to failure. That problem was real btw - and was causing several crashes a day of a database/website till I was brought in to trouble-shoot it.
If you are likely to do a great deal more with that database, I'd recommend getting someone with a little more experience, or at least doing what you can to be able to give it some optimisations. Reading 'High Performance MySQL, 2nd Edition' is a good start, as is looking at some tools like Maatkit.
As long as your schema design and DAL are constructed well enough, you understand query optimization inside out, can adjust all the server configuration settings at a professional level, and have "enough" hardware properly configured, yes (except for sufficiently pathological cases).
Same answer for both engines.
You should probably perform a load test to verify, but as long as the index was created properly (meaning indexes are optimized to your query statements), the SELECT queries should perform at an acceptable speed (the INSERTS and/or UPDATES may be more of a speed issue though depending on how many indexes you have, and how large the indexes get).