How-to's for MySQL InnoDB (insert) performance optimization? - mysql

I'm still struggling with the performance of my MySQL database using the InnoDB engine. Especially the performance of data insertion, minor the performance of running queries.
I've been googling for information, how-to's and so on, but I found most information rather profound matter. Can I find somewhere on the net some basic information for "newbies", a starting point for performance optimization? The first, most import steps for InnoDB optimization, explained in a less complicated way.
I'm using the Windows platform

I used to manage a couple very large MySQL Databases (like, 1TB+). They were huge, unforgiving beasts with an endless appetite to cause me stomach problems.
I read everything I could find on MySQL Performance Tuning and innodb. Here's a summary of what helped me:
The book High Performance MySQL is good, but only gets you so far.
The blog MySQL Performance Blog (this link is to their posts tagged 'innodb') was the most useful overall resource I found on the net. They go into detail on a lot of innodb tuning issues. It gets 'ranty' at times, but overall it's great. Here's another link there on InnoDB Performance Optimization Basics that's good.
The last main thing I did to learn it was to simply read the MySQL Docs themselves. I read how every last parameter works, changed them on my server and then did some basic profiling. After a while you figure out what works by running big queries and seeing what happens. Here's a good place to start:
InnoDB Performance Tuning and Troubleshooting
In the end, it's just experimenting and working through things until you gain enough knowledge to know what works.

For newbies: innodb_flush_log_at_trx_commit=0, if you can afford to lose up to 1 second of your work if server crashes. This is the performace vs reliability tradeoff, but it will improve your write performance hugely. If you can afford battery backed write cache, use it.
Specifically on Windows, and for write performance, MariaDB 5.3 might be a better idea than stock MySQL from Oracle, since MariaDB is able to better utilize asynchronous IO on Windows. I wrote a note about it some time ago here, on standard synthetic benchmark it performs up to 500% better than stock MySQL 5.5 (see pictures at the end of the note).
However, the first and foremost thing that kills performance is the disk flushing. This is solvable if you relax durability with *innodb_flush_log_at_trx_commit* parameter, of with battery backed write cache. Also you might consider using larger transactions, they reduce the amount of disk flushes.

Try the MySQL Primer script: http://day32.com/MySQL/

I didn't use the 'net, I used books. :)
The book I used to learn MySQL is "Beginning MySQL" from Wrox Press, by Robert Sheldon and Geoff Moes. Chapter 15 goes into some basics of optimization. I liked this book a lot and think it would be good reading and has been my #1 reference. But it isn't very storage engine specific.
I have another book, Pro MySQL from apress that goes into a lot more detail about particular storage engines, but it is also much harder to read. Still a good reference though.

Related

MySQL changing large table to InnoDB

I have a MySQL server running on CentOS which houses a large (>12GB) DB. I have been advised to move to InnoDB for performance reasons as we are experiencing lockups where the application that relies on the DB becomes unresponsive when the server is busy.
I have been reading around and can see that the ALTER command that changes the table to InnoDB is likely to take a long time and hammer the server in the process. As far as I can see, the only change required is to use the following command:
ALTER TABLE t ENGINE=InnoDB
I have run this on a test server and it seems to complete fine, taking about 26 minutes on the largest of the tables that needs to be converted.
Having never run this on a production system I am interested to know the following:
What changes are recommended to be made to the MySQL config to take advantage of additional performance of InnoDB tables? The server currently has 3GB assigned to InnoDB cache - was thinking of increasing this to 15GB once the additional RAM is installed.
Is there anything else I should do to the server with this change?
I would really recommend using either Percona MySQL or MariaDB. Both have tools that will help you get the most out of InnoDB, as well as some tools to help you diagnose and optimize your database further (for example, Percona's Online Schema Change tool could be used to alter your tables without downtime).
As far as optimization of InnoDB, I think most would agree that innodb_buffer_pool_size is one of the most important parameters to tune (and typically people set it around 70-80% of total available memory, but that's not a magic number). It's not the only important config variable, though, and there's really no magic run_really_fast setting. You should also pay attention to innodb_buffer_pool_instances (and there's a good discussion about this topic on https://dba.stackexchange.com/questions/194/how-do-you-tune-mysql-for-a-heavy-innodb-workload)
Also, you should definitely check out the tips offered in the MySQL documentation itself (http://dev.mysql.com/doc/refman/5.6/en/optimizing-innodb.html). It's also a good idea to pay attention to your InnoDB hit ratio (Rolado over at DBA Stackexchange has a great answer on this topic, eg, https://dba.stackexchange.com/questions/65341/innodb-buffer-pool-hit-rate) and analyze your slow query logs carefully. Towards that later end, I would definitely recommend taking a look at Percona again. Their slow query analyzer is top notch and can really give you a leg up when it comes to optimizing SQL performance.

MySQL was shut down during Index - How to fix the index?

MySQL was shut down in the middle of an indexing operation.
It still works but some of the queries seem much slower than before.
Is there anything particular we can check?
Is it possible that an index gets half way through?
Thanks much
As a suggested in my comment, you could try a repair on the relevant table(s).
That said, there's a section of the MySQL manual dedicated to this precise topic, which details how to use the REPAIR <table> statement and indeed dump/re-import.
Is this doesn't make any difference, you may need to check the database settings (if it's a InnoDB engined table/database, it'll love being able to be resident in memory for example) and perhaps try to see what specific indexes are being used via an EXPLAIN on the queries that are causing pain.
There are also commercial tools such as New Relic that'll show what specific queries are being sluggish in quite a lot of detail as well as monitoring other aspects of your system, which may be worth exploring if this is a commercial project/web site.

Maximum capabilities of MySQL

How do I know when a project is just to big for MySQL and I should use something with a better reputation for scalability?
Is there a max database size for MySQL before degradation of performance occurs? What factors contribute to MySQL not being a viable option compared to a commercial DBMS like Oracle or SQL Server?
Google uses MySQL. Is your project bigger than Google?
Smart-alec comments aside, MySQL is a professional level database application. If your application puts a strain on MySQL, I bet it'll do the same to just about any other database.
If you are looking for a couple of examples:
Facebook moved to Cassandra only after it was storing over 7 Terabytes of inbox data. (Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.) (... Even though they were having quite a few issues at that stage.)
Wikipedia also handles hundreds of Gigabytes of text data in MySQL.
I work for a very large Internet company. MySQL can scale very, very large with very good performance, with a couple of caveats.
One problem you might run into is that an index greater than 4 gigabytes can't go into memory. I spent a lot of time once trying to improve the MySQL's full-text performance by fiddling with some index parameters, but you can't get around the fundamental problem that if your query hits disk for an index, it gets slow.
You might find some helper applications that can help solve your problem. For the full-text problem, there is Sphinx: http://www.sphinxsearch.com/
Jeremy Zawodny, who now works at Craig's List, has a blog on which he occasionally discusses the performance of large databases: http://blog.zawodny.com/
In summary, your project probably isn't too big for MySQL. It may be too big for some of the ways that you've used MySQL before, and you may need to adapt them.
Mostly it is table size.
I am assuming here that you will use the Oracle innoDB plugin for mysql as your engine. If you do not, that probably means you're using a commercial engine such as infiniDB, InfoBright for Tokutek, in which case your questions should be sent to them.
InnoDB gets a bit nasty with very large tables. You are advised to partition your tables if at all possible with very large instances. Essentially, if your (frequently used) indexes don't all fit into ram, inserts will be very slow as they need to touch a lot of pages not in ram. This cannot be worked around.
You can use the MySQL 5.1 partitioning feature if it does what you want, or partition your tables at the application level if it does not. If you can get your tables' indexes to fit in ram, and only load one table at a time, then you're on a winner.
You can use the plugin's compression to make your ram go a bit further (as the pages are compressed in ram as well as on disc) but it cannot beat the fundamental limtation.
If your table's indexes don't all (or at least MOSTLY - if you have a few indexes which are NULL in 99.99% of cases you might get away without those ones) fit in ram, insert speed will suck.
Database size is not a major issue, provided your tables individually fit in ram while you're doing bulk loading (and of course, you only load one at once).
These limitations really happen with most row-based databases. If you need more, consider a column database.
Infobright and Infinidb both use a mysql-based core and are column based engines which can handle very large tables.
Tokutek is quite interesting too - you may want to contact them for an evaluation.
When you evaluate the engine's suitability, be sure to load it with very large data on production-grade hardware. There's no point in testing it with a (e.g.) 10G database, that won't prove anything.
MySQL is a commercial DBMS, you just have the option to get the support/monitoring that is offered by Oracle or Microsoft. Or you can use community support or community provided monitoring software.
Things you should look at are not only size at operations. Critical are also:
Scenaros for backup and restore?
Maintenance. Example: SQL Server Enterprise can rebuild an index WHILE THE OLD ONE IS AVAILABLE - transparently. This means no downtime for an index rebuild.
Availability (basically you do not want to have to restoer a 5000gb database if a server dies) - mirroring preferred, replication "sucks" (technically).
Whatever you go for, be carefull with Oracle RAC (their cluster) - it is known to be "problematic" (to say it finely). SQL Server is known to be a lot cheaper, scale a lot worse (no "RAC" option) but basically work without making admins want to commit suicide every hour (the "RAC" option seems to do that). Scalability "a lot worse" still is good enough for the Terra Server (http://msdn.microsoft.com/en-us/library/aa226316(SQL.70).aspx)
THere wer some questions here recently of people having problems rebuilding indices on a 10gb database or something.
So much for my 2 cents. I am sure some MySQL specialists will jump in on issues there.

Are the consistency/data loss/query optimization issues I read about "that bad"?

As I've been looking into the differences between Postgres and MySQL, it has struck me that, if what I read is to be believed, MySQL should be (disclaimer: by reading the rest of this sentence, you agree to read the next paragraph as well) the laughingstock of the RMDB world: it doesn't enforce ACID by default, the net is rife with stories of MySQL-related data loss and by all accounts and the query optimizer is a joke.
But none of this seems to matter. It's not hard to tell that MySQL has about a million times* as much hype as Postgres (it's LAMP, not LAPP), big installations of MySQL are not unheard of (LJ? Digg?) and I haven't noticed a drop in MySQL's popularity.
This makes me wonder: are these "problems" with MySQL really that bad?
So, if you have used MySQL for a reasonably large project**, what was your experience like? Did you use Postgres as well? How was it worse? How was it better?
*: [citation needed]
**: I'm well aware that, for "small things" (blogs, what have you), MySQL (along with practically every other RDB) is just fine.
Since it's tagged [subjective], I'll be subjective. For me it's about the little things. PostgreSQL is more developer friendly and makes it easy to do the right thing regarding data integrity by default.
If you give MySQL an incorrect type, it will implicitly convert it even if the conversion is incorrect. PostgreSQL will complain.
EXPLAIN in PostgeSQL is way more useful than in MySQL. It gives you the exact structured query plan. What kind of algorithm will it use, what cost does does each step have, etc. This means that if the query optimizer in MySQL doesn't do what you think it does, you will have hard time to debug it.
If you ever wrote anything more complex in the MySQL stored procedure language, you will know how painful it is. PL/pgSQL is actually a nice language + you can use many other languages.
MySQL doesn't have sequences, so if you need them you have to roll your own. Most people will do it wrong and have race conditions in their code.
PostgreSQL exposes most of it's internal lock types to the developer. If you need to lock your table in a special way, you can do that.
Everything is programmable in PostgreSQL. For example, if you need your own data type for some specific data, you can add it. You can add casts and operators for the data types. Probably not worth the effort for small projects, but it's better than storing things as strings.
PostgreSQL adds every action including DDL changes to a transaction, unlike MySQL. If you have a conversion script that creates/drops tables, BEGIN/END won't help you in MySQL to keep it in consistent state.
That doesn't mean it's impossible to write good database applications with MySQL, it just requires more effort.
MySQL can be used for reasonably large applications, provided you really know what you do and don't trust the defaults.
MySQL defaults are optimized to be easy-to-use and to get started quickly and to provide best performance (usually). Other databases choose defaults that are at the very least ACID and are scalable (i.e. choose defaults that are not necessarily the best/fastest for small data sets)
Another item is that MySQL only learned to be a "real database" relatively recent, while almost all competing products started life with full ACID in mind.
MySQL had problems with almost all aspects of ACID at one time or another. Most of them are gone or can be configured away, but you will have to check each one. The problem with troubles in atomicity for example is that you will not notice them until you place your system under heavy load (which often coincides with it being a production system, unfortunately).
So my summary would be: MySQL is capable of working in this environments, but it takes work. And the path it took to get to that point cost it quite a few points in the confidence area.
Provided you know what its capabilities are, then it may fit your use case.
If used correctly, then it is ACID compliant. If used incorrectly, it is not. The trouble is, that people seem to assume that it's a good thing to have ACID compliance.
In reality ACID is often the enemy of performance (Particularly the D for durability). By relaxing durability very slightly, we can typically get a very large performance boost.
Likewise, even using the MyISAM engine (which doesn't have much by way of durability, and not a lot of the others either) is still appropriate to some problem domains.
We are using MySQL in some applications - and it is doing a pretty good job.
In the newer projects we are using the InnoDB engine - and albeit it may be slower than the default engine it is working well.
Right now we are using an ORM mapper - and so most of the complexity is hidden behind the ORM mapper (and working nice).
I think the infrastructure (Tools and information) is one of MySQL's big plusses: we are using really nice tools: Toad for MySQL and MySQL Administrator.
Altough I have to admit that I had a shocking experience last week when helping a friend with a SQL statement and the correleated subquery nearly stopped his MySQL server - but with the trick of enclosing it in another query - it worked really well.
This is nothing which REALLY shocks me - because I've used other DB systems which cost big bucks (I'm looking at you - DB2) - and they had other things to work around. (maybe not as drastic - but still you had to optimize for them).
I haven't used both for a single large project, but having used both I have some idea of how they compare.
In general almost all MySQL's problems can be worked around with good discipline. The issue is more that developer has to know all the gotchas and work around them. After working with PostgreSQL or Oracle this feels a bit like death by a thousand papercuts. You get that used to stuff just working.
This is a pretty significant issue in the types of stuff that I have worked on. Complex schemas with complex queries and lots of data. tight schedules with little time for performance engineering meaning that getting consistently reasonable performance without having to manually optimize queries is important. A good cost based optimizer is almost a requirement. Combine that with quite a lot of outsourcing with development teams that don't have the experience to catch all the gotchas in time and the little issues escalate to large QA problems. Hitting any of MySQL silent data corruption gotchas in production is something that really scares me. I'll take any declarative constraints at the database level that I can get to have atleast some safety net, MySQL unfortunately falls short on that.
PostgreSQL has the added benefit that it can run significantly more algorithms using more advanced data-structures in the database. Most of our large projects have a few cases where MySQL will hit its limits. Moving the algorithms outside the database requires considerably more effort with pretty tricky code involving correct locking and synchronization. In particular I have at one time or another hit the need for partial indexes, indexes on expressions, custom aggregate functions, set returning stored procedures, array and hash datatypes, inverted indexes on array values, update/delete-returning, deferrable foreign key constraints.
On the other hand MySQL has at least for now a better story for scale out. If I had to support a huge number users on a reasonably simple application, and had the team to build a heavily partitioned and replicated database with eventual consistency, I'd pick MySQL over PostgreSQL for the low level data storage building block. On the other the competitors in that space are the key-value databases.
are these "problems" with MySQL really that bad?
Actually, the pain MySQL will inflict on you can range from moderate to insane, and much of it depends on MyISAM.
I find a good rule of thumb is this :
are you backing up some MyISAM tables ?
MyISAM is great for data you don't really care about, like traffic logs and the like, or for data that you can easily restore in case of a problem since it's read-only and hence never changed since the time you loaded that 10GB dump. In those cases the compact row format of MyISAM brings great space savings (that however do not translate into faster seq scan speed, for some reason).
If the data you put in MyISAM tables is worth backing up, you are going to enter in a world of hurt when you realize some day that it is all inconsistent because of the lack of FK and constraint checks, and incidentally all your backups will contain inconsistent data too.
If you make lots of concurrent updates to MyISAM tables, then you are gonna go way past the world of hurt stage : when the load reaches a certain threshold, you are doomed. Of course the readers block writers which block readers which block queued writers, etc, so the performance is bad, load avg goes to 200, and your box is nuked, but also I could consistently crasy MyISAM tables in a benchmark I wrote 2 years ago just by hitting them with too much load. Random data ensued, sometimes crashing the mysql on selects or spewing random errors.
So, if you avoid MyISAM like the plague it is, the problems with MySQL aren't really that bad. InnoDB is robust. However, generally I find it inferior to Postgres, which is faster and has so many less gotchas, and Gets The Job Done easier and faster.
No, the issues you mention are NOT a big deal. See Google and Facebook as two examples of companies that are using MySQL to accomplish Herculean tasks you'll only ever dream of encountering.
I use the following rules when running a MySQL to prevent headaches down the line:
Take daily, weekly, monthly snapshots of database. More often than not the problems you'll run in to have nothing to do with MySQL, instead it's a boneheaded developer running:
DELETE FROM mytable; # Where is the WHERE?
Use InnoDB by default, the only reason to use MyISAM is for full text search.
Get your database schema under source control.

MySQL database optimization best practices

What are the best practices for optimizing a MySQL installation for best performance when handling somewhat larger tables (> 50k records with a total of around 100MB per table)? We are currently looking into rewriting DelphiFeeds.com (a news site for the Delphi programming community) and noticed that simple Update statements can take up to 50ms. This seems like a lot. Are there any recommended configuration settings that we should enable/set that are typically disabled on a standard MySQL installation (e.g. to take advantage of more RAM to cache queries and data and so on)?
Also, what performance implications does the choice of storage engines have? We are planning to go with InnoDB, but if MyISAM is recommended for performance reasons, we might use MyISAM.
The "best practice" is:
Measure performance, isolating the relevant subsystem as well as you can.
Identify the root cause of the bottleneck. Are you I/O bound? CPU bound? Memory bound? Waiting on locks?
Make changes to alleviate the root cause you discovered.
Measure again, to demonstrate that you fixed the bottleneck and by how much.
Go to step 2 and repeat as necessary until the system works fast enough.
Subscribe to the RSS feed at http://www.mysqlperformanceblog.com and read its historical articles too. That's a hugely useful resource for performance-related wisdom. For example, you asked about InnoDB vs. MyISAM. Their conclusion: InnoDB has ~30% higher performance than MyISAM on average. Though there are also a few usage scenarios where MyISAM out-performs InnoDB.
InnoDB vs. MyISAM vs. Falcon benchmarks - part 1
The authors of that blog are also co-authors of "High Performance MySQL," the book mentioned by #Andrew Barnett.
Re comment from #ʞɔıu: How to tell whether you're I/O bound versus CPU bound versus memory bound is platform-dependent. The operating system may offer tools such as ps, iostat, vmstat, or top. Or you may have to get a third-party tool if your OS doesn't provide one.
Basically, whichever resource is pegged at 100% utilization/saturation is likely to be your bottleneck. If your CPU load is low but your I/O load is at its maximum for your hardware, then you are I/O bound.
That's just one data point, however. The remedy may also depend on other factors. For instance, a complex SQL query may be doing a filesort, and this keeps I/O busy. Should you throw more/faster hardware at it, or should you redesign the query to avoid the filesort?
There are too many factors to summarize in a StackOverflow post, and the fact that many books exist on the subject supports this. Keeping databases operating efficiently and making best use of the resources is a full-time job requiring specialized skills and constant study.
Jeff Atwood just wrote a nice blog article about finding bottlenecks in a system:
The Computer Performance Shell Game
Go buy "High Performance MySQL" from O'Reilly. It's almost 700 pages on the topic, so I doubt you'll find a succinct answer on SO.
It's hard to broadbrush things, but a moderately high-level view is possible.
You need to evaluate read:write ratios. For tables with ratios lower than about 5:1, you will probably benefit from InnoDB because then inserts won't block selects. But if you aren't using transactions, you should change innodb_flush_log_at_trx_commit to 1 to get performance back over MyISAM.
Look at the memory parameters. MySQL's defaults are very conservative and some of the memory limits can be raised by a factor of 10 or more on even ordinary hardware. This will benefit your SELECTs rather than INSERTs.
MySQL can log things like queries that aren't using indices, as well as queries that just take too long (user-defineable).
The query cache can be useful, but you need to instrument it (i.e. see how much it is used). Cacti can do that; as can Munin.
Application design is also important:
Lightly caching frequently fetched but smallish datasets will have a big difference (i.e. cache lifetime of a few seconds).
Don't re-fetch data that you already have to hand.
Multi-step storage can help with a high volume of inserts into tables that are also busily read. The basic idea is that you can have a table for ad-hoc inserts (INSERT DELAYED can also be useful), but a batch process to move the updates within MySQL from there to where all the reads are happening. There are variations of this.
Don't forget that perspective and context are important, too: what you might think is a long time for an UPDATE to happen might actually be quite trivial if that "long" update only happens once a day.
There are tons of best practices which have been previously discussed so there is no reason to repeat them. For actually concrete advice on what to do, I would try running MySQL Tuner. Its a perl script that you can download and then run on your database server, it will give you a bunch of statistics on how your database is performing (e.g. cache hits) along with some concrete recommendations for what issues or config parameters need to be adjusted to improve performance.
While these statistics are all available in MySQL itself, I find that this tool provides them in a much easier to understand fashion. While it is important to note that YMMV with respect to the recommendations, I have found them to generally be pretty accurate. Just make sure that you have done a good job exercising the database beforehand with realistic traffic.