MySQL InnoDB vs MyISAM inserts

I have a table with 17 million rows. I need to grab 1 column of that table and insert it all into another table. Here's what I did:
INSERT IGNORE INTO table1(name) SELECT name FROM main WHERE ID < 500001
InnoDB executes in around 3 minutes and 45 seconds
However, MyISAM executes in just below 4 seconds. Why the difference?
I see everyone praising InnoDB but honestly I don't see how it's better for me. It's so much slower. I understand that it's great for integrity and whatnot, but many of my tables will not be updated (just read). Should I even bother with InnoDB?

The difference is most likely due to the configuration of InnoDB, which takes a bit more tweaking than MyISAM. The idea of InnoDB is to keep most of your data in memory, flushing to and reading from disk only when there are spare CPU cycles.
Whether you should even bother with InnoDB is a really good question. If you're going to keep using MySQL, it's highly recommended you get some experience with InnoDB. But if you're doing a quick-and-dirty job for a database that won't see a lot of traffic, and you're not worried about scale, then the ease of MyISAM may just be a win for you. InnoDB can be overkill in many instances where someone just wants a simple database.
but many of my tables will not be updated
You can still get a performance lift from InnoDB if you are doing 99% reading. If you configure your buffer pool size to hold your entire database in memory, InnoDB will NEVER have to go to disk to get your data, even if it misses the mysql query cache.
In MyISAM, there is a good chance you have to read the row from disk, and you're leaving the operating system to do the caching and optimization for you.
innodb_buffer_pool_size
My first guess is to check innodb_buffer_pool_size, which ships out of the box set to 8M. It's recommended to set this to around 80% of your total memory. Once you hit that limit, InnoDB performance will drop significantly because it needs to flush something out of the buffer pool to make room for the new data, which can be expensive.
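As a rough sketch (on MySQL 5.7 and later the buffer pool can be resized at runtime; on older versions put the value in my.cnf and restart; the 6G figure is purely illustrative for a machine with roughly 8G of RAM):
-- check the current setting (the value is in bytes)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
-- resize online (MySQL 5.7+ only); 6G is an example value, not a recommendation
SET GLOBAL innodb_buffer_pool_size = 6 * 1024 * 1024 * 1024;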
autocommit=0
Also, make sure autocommit is turned off while you load your table, or flushing will happen on every insert. You can turn it back on after you're done, and it's a client-side (session) setting, so it's very safe.
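For the load in the question, that pattern looks roughly like this (one commit at the end instead of an implicit commit per row):
SET autocommit = 0;
INSERT IGNORE INTO table1(name) SELECT name FROM main WHERE ID < 500001;
COMMIT;
SET autocommit = 1;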
Loading tables typically happens once
Think about whether you really want to tune your database to accommodate "inserting 17 million rows". How often do you do this? MyISAM might be quicker in this instance, but when you have 100 concurrent connections all reading and modifying this table at the same time, you'll find a well-tuned InnoDB will win and MyISAM will choke on table locks.
How MyISAM sees this operation
MyISAM will be very good at this without any tuning, because under the covers, you're simply appending each row to a file (and updating an index). Your OS and disk caching will handle all those performance problems.
How InnoDB sees this operation
InnoDB will know the table needs a write, so it throws the row into the insert buffer.
You give it no time before the next insert, so InnoDB has no time to deal with the buffer; it runs out of room and is forced to 'hold up' the insert while it writes to the buffer pool and updates the indexes.
Next, your buffer pool fills up, and InnoDB is forced to 'hold up' the insert and flush some page out of the buffer pool to disk.
And you keep throwing inserts at it like crazy.
Note that even when you do tune InnoDB so that it gives you a mysql> prompt back very quickly after you do this, InnoDB will still be scrambling under the covers to catch up in its spare time, but it will be willing to execute a new transaction for you.
MUST READ:
http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/
http://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html (see bulk data loading tips)

You're right up to a point. InnoDB is slower than MyISAM, but in which cases?
Not everything is made to meet everyone's requirements. InnoDB is a transactional storage engine while MyISAM is not. To make it an ACID-compliant, transaction-aware storage engine, we have to pay a cost in terms of response time.
Furthermore, InnoDB runs faster if it is properly tuned via my.ini or another configuration file.
In the end, I can see the following reasons why people praise InnoDB:
It is an ACID-compliant, transaction-supporting engine
It takes row-level locks while working on a table, whereas MyISAM takes table-level locks
InnoDB is highly tunable for multi-core/multi-processor machines to improve concurrency
Last but not least, a comment from my side: nothing can meet everyone's needs, so it solely depends on the scenario in which you're comparing the two engines.

Check out MYISAM vs Innodb comparison on Wikipedia.
http://en.wikipedia.org/wiki/Comparison_of_MySQL_database_engines

Which engine to be used for more than 100 insert query per second
I have read about the differences and the pros and cons of MyISAM and InnoDB.
But I am still confused: for 100+ insert queries per second into a table (basically for tracking purposes), which engine should I use?
I referred to What's the difference between MyISAM and InnoDB?
Based on my understanding, MyISAM will lock the whole table for each insert, and hence InnoDB should be used for its row-level locking.
But on the other hand, MyISAM's performance is supposedly 100 times better. So what is the optimal and correct choice, and why?
Simple code that does one-row INSERTs without any tuning maxes out at about 100 rows per second in any engine, especially InnoDB.
But, it is possible to get 1000 rows per second or even more.
The quick fix for InnoDB is to set innodb_flush_log_at_trx_commit = 2; that will uncork the main thing stopping InnoDB at 100 inserts/second using a commodity spinning disk. Setting innodb_buffer_pool_size to about 70% of available RAM is also important.
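As a minimal sketch of that change (2 flushes the log once per second instead of at every commit, so up to a second of transactions can be lost in a crash):
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- or put it in my.cnf so the setting survives a restart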
If a user is inserting multiple rows into the same table at the same time, then LOAD DATA or a batch Insert (INSERT ... VALUES (...), (...), ...) of 100 rows or more will insert ten times as fast. This applies to any Engine.
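For example, a single batched statement against a hypothetical tracking table (the table and column names are made up for illustration):
INSERT INTO tracking (user_id, event, created_at) VALUES
(1, 'click', NOW()),
(2, 'view', NOW()),
(3, 'click', NOW());
-- ...extend to 100+ rows per statement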
MyISAM is not 100 times as fast; it is not even 10 times as fast as InnoDB. Today (5.6 or newer), you would be hard pressed to find a well tuned application that is more than a little faster in MyISAM. You are, or will be, I/O-limited.
As for corruption -- No engine suffers from corruption except during a crash. A power failure may mangle MyISAM indexes, usually recoverably. Moreover, a batch insert could be half done. InnoDB will be clean -- the entire batch is done or none of it is done; no corruption.
ARCHIVE saves disk space, but costs CPU.
MEMORY is often faster because it has no I/O. But you have too much data for that Engine, correct?
MariaDB with TokuDB can probably run faster than anything I describe here; but you have not indicated the need for it.
100 rows inserted per second ≈ 8.6M/day ≈ 3 billion/year. Will you be purging the data eventually? Will you be querying the data? Purging: Let's talk about PARTITION. Querying: Let's talk about Summary Tables.
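If purging by age is the plan, a sketch of monthly RANGE partitioning (table, columns, and dates are all hypothetical) shows why it makes purging cheap:
CREATE TABLE tracking (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at DATETIME NOT NULL,
  event VARCHAR(32) NOT NULL,
  PRIMARY KEY (id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
  PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
-- dropping a month of data is then a near-instant metadata operation:
ALTER TABLE tracking DROP PARTITION p2024_01;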
Indexing: Minimize the number of indexes. If you have a 'random' index, such as a UUID, and you have a billion rows, you will be stuck with 100 rows/second, regardless of which Engine and regardless of any tuning. Do I need to explain further?
If this is a queuing system, I say "Don't queue it, just do it."
Bottom line: Use InnoDB. Tune it. Use batch inserts. Avoid random indexes. Etc.
You are correct that MyISAM is a faster choice if your operational use case is lots of insertions. But that answer can change drastically based on the kind of use you make of the data. If this is an archival application you might consider the ARCHIVE storage engine. It is best for write-once, read-rarely applications.
You should investigate INSERT DELAYED as it will allow your client programs to fire-and-forget these inserts rather than waiting for completion. This burns RAM in your mysqld process, though. If that style of operation meets your needs, this is a compelling reason to go with MyISAM.
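A sketch of that fire-and-forget style (note that DELAYED works only with MyISAM-family tables, was deprecated in MySQL 5.6 and removed in 5.7; the table name here is hypothetical):
INSERT DELAYED INTO tracking (user_id, event, created_at)
VALUES (1, 'click', NOW());
-- the client returns immediately; mysqld buffers the row and writes it later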
Beware indexes in the target table of your inserts. Maintaining indexes is a big part of the server's insert workload.
Don't forget to look into MariaDB. It's a compatible fork of MySQL with some more advanced storage engines and features.
I have experience with a similar application. In our case, the application scaled up beyond the original insert rate, and the server could not keep up. (It's always good when an application workload grows!) We ended up doing two things, one after the other:
Using a message queuing system, and running just a couple of processes to actually do the inserts. The original clients wrote their logging records to the message queue rather than directly to the database. (Amazon AWS's SQS is an example of such a queuing system).
Reworking the insert process to use LOAD DATA INFILE to load great gobs of log rows at once.
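A sketch of that second step, assuming a tab-separated batch file written by the queue consumers (the file path and column names are hypothetical):
LOAD DATA LOCAL INFILE '/tmp/tracking_batch.tsv'
INTO TABLE tracking
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(user_id, event, created_at);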
(You probably have figured out that this kind of workload isn't feasible on a cheap shared hosting service or an AWS micro instance.)

InnoDB Bottleneck: Relaxing ACID to Improve Performance

After noticing that our database has become a major bottleneck on our live production systems, I decided to construct a simple benchmark to get to the bottom of the issue.
The benchmark: I time how long it takes to increment the same row in an InnoDB table 3000 times, where the row is indexed by its primary key, and the column being updated is not part of any index. I perform these 3000 updates using 20 concurrent clients running on a remote machine, each with its own separate connection to the DB.
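In other words, each client loops over a statement of this shape (the table and column names here are illustrative, not the poster's actual schema):
UPDATE counters SET hits = hits + 1 WHERE id = 42;
-- primary-key lookup; the `hits` column is not part of any index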
I'm interested in learning why the different storage engines I benchmarked, InnoDB, MyISAM, and MEMORY, have the profiles that they do. I'm also hoping to understand why InnoDB fares so poorly in comparison.
InnoDB (20 concurrent clients):
Each update takes 0.175s.
All updates are done after 6.68s.
MyISAM (20 concurrent clients):
Each update takes 0.003s.
All updates are done after 0.85s.
Memory (20 concurrent clients):
Each update takes 0.0019s.
All updates are done after 0.80s.
Thinking that the concurrency could be causing this behavior, I also benchmarked a single client doing 100 updates sequentially.
InnoDB:
Each update takes 0.0026s.
MyISAM:
Each update takes 0.0006s.
MEMORY:
Each update takes 0.0005s.
The actual machine is an Amazon RDS instance (http://aws.amazon.com/rds/) with mostly default configurations.
I'm guessing that the answer will be along the following lines: InnoDB fsyncs after each update (since each update is an ACID-compliant transaction), whereas MyISAM does not, since it doesn't even support transactions. MyISAM is probably performing all updates in memory and regularly flushing to disk, which is how its speed approaches the MEMORY storage engine. If this is so, is there a way to use InnoDB for its transaction support, but perhaps relax some constraints (via configuration) so that writes are done faster at the cost of some durability?
Also, any suggestions on how to improve InnoDB's performance as the number of clients increases? It is clearly scaling worse than the other storage engines.
Update
I found https://blogs.oracle.com/MySQL/entry/comparing_innodb_to_myisam_performance, which is precisely what I was looking for. Setting innodb-flush-log-at-trx-commit=2 relaxes the durability constraint (the log is flushed to disk once per second), so up to a second of transactions can be lost if a power failure or server crash occurs. This gives us behavior similar to MyISAM, but we still get to benefit from the transaction features available in InnoDB.
Running the same benchmarks, we see a 10x improvement in write performance.
InnoDB (20 concurrent clients):
Each update takes 0.017s.
All updates are done after 0.98s.
Any other suggestions?
We have done some similar tests in our application, and we noticed that if no transaction is explicitly opened, each individual SQL statement is treated as its own transaction, which takes much more time to execute. If your business logic allows it, you can put several SQL commands inside a transaction block, reducing the overall ACID overhead. In our case, we saw a great performance improvement with this approach.
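A sketch of that batching, using the same kind of single-row updates as the benchmark above (the table is hypothetical):
START TRANSACTION;
UPDATE counters SET hits = hits + 1 WHERE id = 1;
UPDATE counters SET hits = hits + 1 WHERE id = 2;
UPDATE counters SET hits = hits + 1 WHERE id = 3;
COMMIT;
-- one log flush for the whole block instead of one per autocommitted statement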

MySQL - InnoDB or MyISAM - Read Only Tables

I have a database with 48 tables and 45 of the tables are InnoDB.
The 3 MyISAM tables have roughly 200 records, 1.5 million records, and 6.5 million records respectively.
These 3 tables contain GEO location information and are read-only (never written, unless I were to update one, which would be extremely infrequent).
I considered changing them to InnoDB to make the database 100% the same, but then read that MyISAM is faster. Note: I don't need any of the special InnoDB features; it's just selects/joins... that's it.
Should I keep these MyISAM or change them to InnoDB?
thx
MyISAM used to be faster years ago, but if you use any reasonably current version of InnoDB, then InnoDB is faster for most workloads. Here's a performance comparison from way back in 2007 that shows InnoDB already matched or bettered MyISAM in all but a few types of queries.
http://www.mysqlperformanceblog.com/2007/01/08/innodb-vs-myisam-vs-falcon-benchmarks-part-1/
Since that test in 2007, InnoDB has continued to get better, whereas the MySQL developers have spent virtually no time improving MyISAM. It's dead, Jim.
The only cases where MyISAM may be faster is when doing full table-scans, and you should try to define indexes to avoid table-scans anyway.
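For example, adding an index on a commonly filtered column of a hypothetical geo lookup table (the names are illustrative) avoids the scan:
ALTER TABLE geo_locations ADD INDEX idx_country (country_code);
-- a query like ... WHERE country_code = 'US' can then use the index instead of scanning the table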
InnoDB has been the default storage engine in MySQL since 5.5 (circa 2010). With each major version of MySQL, it becomes more clear that MyISAM is going away.
InnoDB has many benefits even if you don't use the explicit features like transactions or foreign keys. Try this:
Execute a long-running UPDATE against a MyISAM table.
Interrupt it partway through. How many rows have been changed? Some, but not all.
Repeat the same test with an InnoDB table. How many rows have been changed? Zero!
InnoDB supports atomic changes, so every SQL statement either succeeds completely, or else rolls back. You won't get partially-completed changes.
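The experiment above might look like this (the table and column are hypothetical; interrupt the UPDATE with Ctrl-C or KILL QUERY partway through):
UPDATE big_table SET status = 'archived' WHERE status = 'active';
SELECT COUNT(*) FROM big_table WHERE status = 'archived';
-- MyISAM: some rows changed; InnoDB: none, because the interrupted statement rolled back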
InnoDB also supports crash recovery, so you won't lose data if mysqld ever crashes. MyISAM is renowned for corrupting tables during a crash.
InnoDB also caches data in RAM (the InnoDB buffer pool), whereas MyISAM relies on the filesystem cache to speed up data I/O. This makes some queries a lot faster in InnoDB if you have enough RAM.
Use MyISAM only if you don't care about your data.
No need to change to InnoDB. As you say, they have a lot of records, so they are faster as MyISAM.
MyISAM in most cases will be faster than InnoDB for run of the mill sort of work. Selecting, updating and inserting are all very speedy under normal circumstances.
I wouldn't bother changing it. I was just researching the same thing and came across this useful post: http://www.kavoir.com/2009/09/mysql-engines-innodb-vs-myisam-a-comparison-of-pros-and-cons.html
The main reason you'd want Innodb would be for data integrity and to avoid locking the entire table on inserts. But if you're not doing a lot of inserts and these are not high traffic tables, then why make the change?
No change is necessary. I am working on a similar project where the database is going to be read-only, and MyISAM is the best option for it.
In addition, you can even use Sphinx if you want faster reads.
hope this helps.

MEMORY(HEAP) vs. InnoDB in a Read and Write Environment

I want to program a real-time application using MySQL.
It needs a small table (less than 10000 rows) that will be under heavy read (scan) and write (update and some insert/delete) load. I am really speaking of 10000 updates or selects per second. These statements will be executed on only a few (less than 10) open mysql connections.
The table is small and does not contain any data that needs to be stored on disk. So I ask which is faster: InnoDB or MEMORY (HEAP)?
My thoughts are:
Both engines will probably serve SELECTs directly from memory, as even InnoDB will cache the whole table. What about the UPDATEs? (innodb_flush_log_at_trx_commit?)
My main concern is the locking behavior: InnoDB row lock vs. MEMORY table lock. Will this present the bottleneck in the MEMORY implementation?
Thanks for your thoughts!
If you really need that many concurrent updates, it's almost certain that InnoDB will perform better, as MEMORY (HEAP) tables only have table-level locks, not row-level locks like InnoDB.
If you're starting from scratch I would investigate using MySQL 5.5 or Percona's XtraDB as they both contain many scalability improvements over the stock MySQL 5.1.
It's not just a question of row locks - InnoDB also has MVCC http://en.wikipedia.org/wiki/Multiversion_concurrency_control so the readers won't even block writers.
But I think your question is missing the all important detail - what sort of data are you storing? If you need to be able to recover post-crash MEMORY is not an option.
If you don't need to recover post crash, then why are you using a database? Why not use something like memcached or redis?

Are there any pitfalls / things you need to know when changing from MyISAM to InnoDB

One of my projects use the MyISAM engine in MySQL, but I'm considering changing it to InnoDB as I need transaction support here and there.
What should I look at or consider before doing this?
Can I just change the engine, or should the data be prepared for it?
Yes, absolutely there are many things; you should test your application extremely thoroughly:
Transactions can deadlock and need to be repeated. This is the case (in some circumstances) even with an autocommitted transaction which only inserts one row.
Disc usage will almost certainly increase
I/O load during writes will almost certainly increase
Behaviour of indexing will change because InnoDB uses clustered indexes - this may be a beneficial effect in some cases
Your backup strategy will be impacted. Consider this carefully.
The migration process itself will need to be carefully planned, as it will take a long time if you have a lot of data (during which time the data will be either readonly, or completely unavailable - do check!)
There is one big caveat. If you get any kind of hardware failure (or similar) during a write, InnoDB will corrupt tables.
MyISAM will also, but a mysqlcheck --auto-repair will repair them. Trying this with InnoDB tables will fail. Yes, this is from experience.
This means you need to have a good regular data backup plan to use InnoDB.
Some other notes:
InnoDB does not give free space back to the filesystem after you drop a table/database or delete records; this can be solved by dumping and re-importing, or by setting innodb_file_per_table=1 in my.cnf.
Adding/removing indexes on a large InnoDB table can be quite painful, because it locks the current table, creates a temporary one with your altered indexes, and inserts the data row by row. There is a plugin from Innobase, but it works only for MySQL 5.1.
InnoDB is also MUCH MORE memory-intensive. I suggest you make the innodb_buffer_pool_size variable as large as your server memory allows (70-80% should be a safe bet). If your server is UNIX/Linux, consider reducing the sysctl variable vm.swappiness to 0, and use innodb_flush_method=O_DIRECT to avoid double buffering. Always test whether you hit swap when toggling those values. You can always read more at the Percona blog, which is great.
Also, you can run mysqldump with --single-transaction --skip-lock-tables and have no table locks while the backup is running.
In any case, InnoDB is great, do not let some pitfalls discourage you.
Just altering the table and setting the engine should be fine.
One of the big ones to watch out for is that select count(*) from MyTable is much slower in InnoDB than MyISAM.
auto_increment values will reset to the highest value in the table +1 after a server restart -- this can cause funny problems if you have a messy db with some deletes.
Optimum server settings are going to be different to a mainly MyISAM db.
Make sure the size of the innodb file is big enough to hold all your data or you'll be crucified by constant reallocation when you change the engines of the tables.
If you are intending to use InnoDB as a way to get concurrent queries, then you will want to set innodb_flush_log_at_trx_commit=2 so you get some performance back. OTOH, if you were looking to re-code your application to be transaction-aware, then deciding this setting will be part of the general performance review needed of the InnoDB settings.
The other major thing to watch out for is that InnoDB does not support FullText indices, nor INSERT DELAYED. But then, MyISAM doesn't support referential integrity. :-)
However, you can move over only the tables you need transaction aware. I've done this. Small tables (up to several thousand rows) can often be changed on-the-fly, incidentally.
The performance characteristics can be different, so you may need to keep an eye on the load.
The data will be fine.