MySQL database optimization best practices

What are the best practices for optimizing a MySQL installation for best performance when handling somewhat larger tables (> 50k records with a total of around 100MB per table)? We are currently looking into rewriting DelphiFeeds.com (a news site for the Delphi programming community) and noticed that simple Update statements can take up to 50ms. This seems like a lot. Are there any recommended configuration settings that we should enable/set that are typically disabled on a standard MySQL installation (e.g. to take advantage of more RAM to cache queries and data and so on)?
Also, what performance implications does the choice of storage engines have? We are planning to go with InnoDB, but if MyISAM is recommended for performance reasons, we might use MyISAM.

The "best practice" is:
Measure performance, isolating the relevant subsystem as well as you can.
Identify the root cause of the bottleneck. Are you I/O bound? CPU bound? Memory bound? Waiting on locks?
Make changes to alleviate the root cause you discovered.
Measure again, to demonstrate that you fixed the bottleneck and by how much.
Go to step 2 and repeat as necessary until the system works fast enough.
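For example, to put a number on the 50 ms UPDATE from the question, you can profile a single statement from the mysql client. This is only a sketch: the news_items table and its columns are invented, EXPLAIN on UPDATE needs MySQL 5.6+, and SHOW PROFILE is deprecated in favour of the Performance Schema on newer versions, but it illustrates the "measure first" step.

    -- Hypothetical table; substitute one of your own slow statements.
    SET profiling = 1;
    UPDATE news_items SET click_count = click_count + 1 WHERE id = 42;
    SHOW PROFILES;                  -- lists recent statements with their total duration
    SHOW PROFILE FOR QUERY 1;       -- per-stage breakdown for the statement above
    EXPLAIN UPDATE news_items SET click_count = click_count + 1 WHERE id = 42;
                                    -- shows whether the WHERE clause can use an index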
Subscribe to the RSS feed at http://www.mysqlperformanceblog.com and read its historical articles too. That's a hugely useful resource for performance-related wisdom. For example, you asked about InnoDB vs. MyISAM. Their conclusion: InnoDB has ~30% higher performance than MyISAM on average, though there are also a few usage scenarios where MyISAM outperforms InnoDB.
InnoDB vs. MyISAM vs. Falcon benchmarks - part 1
The authors of that blog are also co-authors of "High Performance MySQL," the book mentioned by @Andrew Barnett.
Re comment from @ʞɔıu: How to tell whether you're I/O bound versus CPU bound versus memory bound is platform-dependent. The operating system may offer tools such as ps, iostat, vmstat, or top. Or you may have to get a third-party tool if your OS doesn't provide one.
Basically, whichever resource is pegged at 100% utilization/saturation is likely to be your bottleneck. If your CPU load is low but your I/O load is at its maximum for your hardware, then you are I/O bound.
That's just one data point, however. The remedy may also depend on other factors. For instance, a complex SQL query may be doing a filesort, and this keeps I/O busy. Should you throw more/faster hardware at it, or should you redesign the query to avoid the filesort?
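To make the filesort example concrete (the articles table and index below are invented, not from the question): EXPLAIN will flag the filesort, and an index that matches the filter and sort order often removes it.

    EXPLAIN SELECT id, title
    FROM articles
    WHERE category_id = 7
    ORDER BY published_at DESC
    LIMIT 20;
    -- If the Extra column reports "Using filesort", a composite index on the
    -- filter column plus the sort column usually eliminates it:
    ALTER TABLE articles ADD INDEX idx_category_published (category_id, published_at);

Whether that is cheaper than throwing faster hardware at the problem is exactly the judgment call described above.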
There are too many factors to summarize in a StackOverflow post, and the fact that many books exist on the subject supports this. Keeping databases operating efficiently and making best use of the resources is a full-time job requiring specialized skills and constant study.
Jeff Atwood just wrote a nice blog article about finding bottlenecks in a system:
The Computer Performance Shell Game

Go buy "High Performance MySQL" from O'Reilly. It's almost 700 pages on the topic, so I doubt you'll find a succinct answer on SO.

It's hard to broadbrush things, but a moderately high-level view is possible.
You need to evaluate read:write ratios. For tables with ratios lower than about 5:1, you will probably benefit from InnoDB, because then inserts won't block selects. But if you aren't using transactions, you should change innodb_flush_log_at_trx_commit to 2 (it defaults to 1, which flushes the log on every commit) to get performance back over MyISAM.
Look at the memory parameters. MySQL's defaults are very conservative and some of the memory limits can be raised by a factor of 10 or more on even ordinary hardware. This will benefit your SELECTs rather than INSERTs.
MySQL can log things like queries that aren't using indices, as well as queries that just take too long (the threshold is user-definable).
The query cache can be useful, but you need to instrument it (i.e. see how much it is used). Cacti can do that; as can Munin.
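If you just want a quick look at those memory settings and at how much the query cache is actually being used, without setting up Cacti or Munin first, the server will report the numbers directly. A rough sketch (note that the query cache was removed entirely in MySQL 8.0, so the Qcache counters only exist on older versions):

    -- Current values of the main memory-related settings
    SHOW VARIABLES WHERE Variable_name IN
        ('key_buffer_size', 'innodb_buffer_pool_size', 'sort_buffer_size', 'tmp_table_size');

    -- Query cache usage counters (MySQL 5.x only)
    SHOW GLOBAL STATUS LIKE 'Qcache%';   -- hits, inserts, lowmem_prunes, free memory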
Application design is also important:
Lightly caching frequently fetched but smallish datasets will make a big difference (e.g. a cache lifetime of a few seconds).
Don't re-fetch data that you already have to hand.
Multi-step storage can help with a high volume of inserts into tables that are also busily read. The basic idea is that you have one table for ad-hoc inserts (INSERT DELAYED can also be useful) and a batch process that moves the rows within MySQL from there to where all the reads are happening. There are variations of this.
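A minimal sketch of that multi-step pattern, with invented table names: ad-hoc writes go into a narrow staging table, and a periodic job folds them into the table the readers use.

    -- Hypothetical staging table for the write-heavy path
    CREATE TABLE page_hits_staging (
        page_id  INT UNSIGNED NOT NULL,
        hit_time DATETIME     NOT NULL
    ) ENGINE=InnoDB;

    -- Run periodically (cron, event scheduler, ...) to fold staged rows into the
    -- read-optimised table; page_hit_counts is assumed to have page_id as its primary key.
    START TRANSACTION;
    INSERT INTO page_hit_counts (page_id, hits)
        SELECT page_id, COUNT(*) FROM page_hits_staging GROUP BY page_id
        ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits);
    DELETE FROM page_hits_staging;
    COMMIT;
    -- In production you would bound the batch by hit_time, or swap tables with
    -- RENAME TABLE, so rows inserted mid-batch are not lost.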
Don't forget that perspective and context are important, too: what you might think is a long time for an UPDATE to happen might actually be quite trivial if that "long" update only happens once a day.

There are tons of best practices which have been discussed previously, so there is no reason to repeat them. For concrete advice on what to do, I would try running MySQL Tuner. It's a Perl script that you download and then run on your database server; it will give you a bunch of statistics on how your database is performing (e.g. cache hits) along with concrete recommendations for which issues or config parameters need to be adjusted to improve performance.
While these statistics are all available in MySQL itself, I find that this tool presents them in a much easier-to-understand fashion. It is important to note that YMMV with respect to the recommendations, but I have found them to be generally pretty accurate. Just make sure that you have done a good job of exercising the database beforehand with realistic traffic.

Related

Is enabling innodb_dedicated_server good for performance?

From the MySQL 8 documentation:
When innodb_dedicated_server is enabled, InnoDB automatically configures the following variables:
innodb_buffer_pool_size
innodb_log_file_size
innodb_log_files_in_group (as of MySQL 8.0.14)
innodb_flush_method
Only consider enabling innodb_dedicated_server if the MySQL instance resides on a dedicated server
where it can use all available system resources. Enabling innodb_dedicated_server is not recommended
if the MySQL instance shares system resources with other applications.
Assuming the server is dedicated for MySQL, does enabling innodb_dedicated_server actually give better performance than tuning those parameters on my own?
Short answer: No, it does not improve performance any more than setting those tuning options yourself.
The variable innodb_dedicated_server is explained in detail when the feature was announced (2017-08-24):
https://mysqlserverteam.com/plan-to-improve-the-out-of-the-box-experience-in-mysql-8-0/
It's just a shorthand for a number of tuning options. The new variable doesn't improve performance in any special way, it's exactly the same as setting those other tuning options yourself.
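To make that concrete, here is a hedged sketch of what setting the same options yourself could look like on a hypothetical server with 64 GB of RAM dedicated to MySQL. The exact values innodb_dedicated_server would derive are formula-based and documented in the manual; the numbers below are merely illustrative.

    -- Illustrative values for an assumed 64 GB dedicated server; tune to your own workload.
    SET PERSIST innodb_buffer_pool_size = 48 * 1024 * 1024 * 1024;   -- dynamic; roughly 75% of RAM here
    SET PERSIST_ONLY innodb_flush_method = 'O_DIRECT';               -- read-only variable; applies after restart
    SET PERSIST_ONLY innodb_log_file_size = 2 * 1024 * 1024 * 1024;  -- read-only; superseded by
                                                                     -- innodb_redo_log_capacity in 8.0.30+

Whether you do this by hand or let innodb_dedicated_server do it, the resulting server behaves the same, which is the point made above.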
I wrote this comment on the blog when they announced the feature:
I’m sorry, but I don’t like this feature at all. I understand the goal
of improving the out-of-the-box experience for naive users, but I
don’t think this solution will be successful at this goal.
Trying to pre-tune a MySQL installation with some formula is a
one-size-fits-all solution, and these kinds of solutions are
unreliable. We can recall examples of other products that have tried
to do this, but eventually removed their auto-tuning features.
It’s not a good assumption that the buffer pool needs as much physical
RAM as you can afford. You already know this, because you need the
innodb_dedicated_server option. Rick mentioned the possibility that
the dataset is already smaller than RAM. In this case, adding more RAM
has little or no benefit.
Many naive users mistakenly believe (after reading some blog) that
increasing RAM allocation always increases performance. It’s difficult
to explain to them why this is not true.
Likewise innodb log file. We assume that bigger is better, because of
benchmarks showing that heavy write traffic benefits from bigger log
files, because of delaying checkpoints. But what if you don’t have
heavy write traffic? What if you use MySQL for a blog or a CMS that is
99% reads? The large log file is unnecessary. Sizing it for an assumed
workload or dataset size has a high chance of being the wrong choice
for tuning.
I understand the difficulty of asking users questions during
installation. I recently did a project automating MySQL provisioning
with apt. It was annoying having to figure out debconf to work around
the installation prompts that do exist (btw, please document MySQL’s
debconf variables!).
There’s also the problem that even if you do prompt the user for
information, they don’t know the answers to the questions. This is
especially true of the naive users that you’re targeting with this
feature.
If the installer asks “Do you use MySQL on a dedicated server?” do
they even know what this means? They might think “dedicated” is simply
the opposite of shared hosting.
If the installer asks “Do you want to use all available memory on this
system?” you will be surprised at how many users think “memory” refers
to disk space, not RAM.
In short: (1) Using formulas to tune MySQL is error-prone. (2) Asking
users to make choices without information is error-prone.
I have an alternative suggestion: Make it easier for users to become
less naive about their choices.
I think users need a kind of friendly cheat-sheet or infographic of
how to make tuning decisions. This could include a list of questions
about their data size and workload, and then a list of performance
indicators to monitor and measure, like buffer pool page create rate,
and log file write rate. Give tips on how to measure these things,
what config options to change, and then how to measure again to verify
that the change had the desired effect.
A simple monitoring tool would also be useful. Nothing so
sophisticated as PMM or VividCortex for long-term trending, but
something more like pt-mext for quick, ephemeral measurements.
The only thing the installation process needs to do is tell the user
that tuning is a thing they need to do (many users don’t realize
this), and refer them to the cheat-sheet documentation.
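For what it's worth, the indicators mentioned in that comment (buffer pool page create rate, log write rate) can be sampled with plain SHOW GLOBAL STATUS; the counters are cumulative since server start, so take two samples a few seconds apart and look at the deltas. A rough sketch:

    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_created';  -- pages newly created in the buffer pool
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- reads that could not be served from the pool
    SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';             -- bytes written to the redo log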
It's just tuning.
It is a challenging task to provide "good" defaults for everything. The biggest impediment is not knowing how much of the machine's RAM and CPU will be consumed by other products (Java, WordPress, etc.) running on the same server.
A large number of MySQL servers are used by big players; they separate MySQL servers from webservers, etc. This makes it simple for them to tweak a small number of tunables quickly when deploying a server.
Meanwhile, less-heavy users get decent tuning out of the box by leaving that setting disabled.

Is mongoDB or Cassandra better than MySQL for large datasets?

In our (currently MySQL) database there are over 120 million records, and we make frequent use of complex JOIN queries and application-level logic in PHP that touch the database. We're a marketing company that does data mining as our primary focus, so we have many large reports that need to be run on a daily, weekly, or monthly basis.
Concurrently, customer service operates on a replicated slave of the same database.
We would love to be able to make these reports happen in real time on the web instead of having to manually generate spreadsheets for them. However, many of our reports take a significant amount of time to pull data for (in some cases, over an hour).
We do not operate in the cloud, choosing instead to operate using two physical servers in our server room.
Given all this, what is our best option for a database?
I think you're going the wrong way about the problem.
Thinking that you'll get better performance just by dropping in NoSQL is not really true. At the lowest level, you're writing and retrieving a fair chunk of data, which implies your bottleneck is (most likely) HDD I/O (the most common bottleneck).
Sticking with the hardware you currently have and using monolithic data storage isn't scalable and, as you noticed, has implications when you want to do something in real time.
What are your options? You need to scale your server and software setup (which is what you'd have to do with any NoSQL solution anyway, and stick in faster hard drives at some point).
You also might want to look into alternative storage engines (other than MyISAM and InnoDB); for example, one of the better engines at turning random I/O into sequential I/O is TokuDB.
Implementing a faster disk subsystem would also address your needs (FusionIO, if you have the resources to get it).
Without more information on your end (what the server setup is, what MySQL version you're using and what storage engines + data sizes you're operating with), it's all speculation.
Cassandra still needs Hadoop for MapReduce, and MongoDB has limited concurrency with regard to MapReduce...
... so ...
... 120 million records is not that much, and MySQL should easily be able to handle that. My guess is an I/O bottleneck, or that you're doing lots of random reads instead of sequential reads. I'd rather hire a MySQL techie for a month or so to tune your schema and queries than invest in a new solution.
If you provide more information about your cluster, we might be able to help you better. "NoSQL" by itself is not the solution to your problem.
As much as I'm not a fan of MySQL once your data gets large, I have to say that you're nowhere near needing to move to a NoSQL solution. 120M rows is not a big deal: the database I'm currently working with has ~600M in one table alone and we query it efficiently. Managing that much data from an ops perspective is the problem; querying it isn't.
It's all about proper indexes and the correct use of them when joining, and secondarily memory settings. Find your slow queries (the MySQL slow query log FTW!), and learn to use the EXPLAIN keyword to understand why they are slow. Then tweak your indexes so your queries are efficient. Further, make sure you understand MySQL's memory settings. There are great pages in the docs explaining how they work, and they aren't that hard to understand.
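To illustrate (the tables below are invented, not from the question): run EXPLAIN on one of the slow report joins and look at the type, key and rows columns for each table; a join column with no index typically shows up as type ALL scanning millions of rows.

    -- Hypothetical reporting query
    EXPLAIN
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.created_at >= '2012-01-01'
    GROUP BY c.name;

    -- If the orders table shows type=ALL, an index on the join and filter columns helps:
    ALTER TABLE orders ADD INDEX idx_customer_created (customer_id, created_at);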
If you've done both of those things and you're still having problems, make sure disk I/O isn't an issue. If it is, then you should look into another solution for querying your data.
NoSQL solutions like Cassandra have a lot of benefits. Cassandra is fantastic at writing data. Scaling your writes is very easy--just add more nodes! But the tradeoff is that it's harder to get the data back out. From a cost perspective, if you have expertise in MySQL, it's probably better to leverage that and scale your current solution until it hits a limit before completely switching your underlying architecture.

Does MySQL consume significantly more resources compared to other DBMS?

There is a growing tendency to shift from MySQL to NoSQL, SQLite, etc. I have read many blogs and articles comparing the speed of MySQL with other types of DBMS. However, I believe that speed is not a problem with MySQL, as it is really fast; the problem is more connected with resource usage. It is common to face extreme server load due to slow MySQL queries. For instance, an advantage of Oracle over MySQL is having fewer problems associated with memory leaks.
Is it true that MySQL consumes significantly more resources (CPU and memory) compared with other databases such as SQLite, non-relational databases, and key/value databases? By "significantly" I mean: is it the main reason for not using MySQL for large databases (to save server costs)?
If YES (to 1), what would be an estimate of how much better the resource usage of a similar system like SQLite is compared with MySQL?
Note: Consider a simple system, as the advanced features of MySQL are not needed; just compare the performance for simple queries.
If you're only using "simple" queries, I don't think there's much of a difference in resource usage between MySQL and e.g. Oracle.
Those "professional" DBMSs do a lot of "magic" regarding caching, prefetching and data maintenance.
Of course MySQL does that as well, but it might not be as efficient for really complex databases and advanced queries.
Your choice of DBMS depends highly on what you're planning to do, especially if you're choosing between SQL/NoSQL/key-value/..., which are for completely different scenarios… that's not so much a question of memory and CPU usage.
CPU and memory are never the reason, as they are cheap. The problem is I/O speed. NoSQL databases are used in write-intensive applications, as well as in applications which need a schema-less database (because changing the table schema in MySQL involves rewriting the table, which may be extremely slow). So some trade-offs are made to optimize the disk operations, which often leads to consuming more CPU, memory or disk space.
Another reason could be pessimistic vs optimistic locks. Which is another topic.
But since the answer to the question "Is it true that mysql consumes significantly more resources (CPU and memory) comparing with other databases" is NO, it is pointless to discuss it further :)

How to measure MySQL bottlenecks?

What MySQL server variables should we be looking at, and what thresholds are significant, for the following problem scenarios:
CPU bound
Disk read bound
Disk write bound
And for each scenario, what solutions are recommended to improve them, short of getting better hardware or scaling the database to multiple servers?
This is a complicated area. The "thresholds" that will affect each of your three categories overlap quite a bit.
If you are having problems with your operations being CPU bound, then you definitely need to look at:
(a) The structure of your database - is it fully normalized? Bad DB structure leads to complex queries which hit the processor.
(b) Your indexes - is everything needed for your queries sufficiently indexed? Lack of indexes can hit both the processor and the memory VERY hard. To check indexes, do "EXPLAIN ...your query". Any row in the resulting explanation that says it isn't using an index, you need to look at closely and, if possible, add an index.
(c) Use prepared statements wherever possible (see the sketch after this list). These can save the CPU from doing quite a bit of crunching.
(d) Use a better compiler with optimizations appropriate for your CPU. This is one for the dedicated types, but it can gain you the odd extra percent here and there.
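A minimal illustration of point (c), using MySQL's server-side prepared statement syntax (the users table is made up); most client libraries expose the same thing through their own prepare/bind API:

    -- Parse and plan the statement once...
    PREPARE fetch_user FROM 'SELECT id, name FROM users WHERE email = ?';

    -- ...then execute it repeatedly with different parameters.
    SET @e := 'alice@example.com';
    EXECUTE fetch_user USING @e;
    SET @e := 'bob@example.com';
    EXECUTE fetch_user USING @e;

    DEALLOCATE PREPARE fetch_user;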
If you are having problems with your operations being read bound:
(a) Ensure that you are caching where possible. Check the configuration variables for query_cache_limit and query_cache_size. This isn't a magic fix, but raising these can help.
(b) As with above, check your indexes. Good indexes reduce the amount of data that needs to be read.
If you are having problems with your operations being write bound:
(a) See if you need all the indexes you currently have (see the sketch after this list). Indexes are good, but the trade-off for improved query time is that maintaining those indexes can impact the time spent writing the data and keeping them up to date. Normally you want indexes if in doubt, but sometimes you're more interested in rapidly writing to a table than in reading from it.
(b) Make possible use of INSERT DELAYED to "queue" writes to the database. Note, this is not a magic fix and often inappropriate, but in the right circumstances can be of help.
(c) Check for tables that are heavily read from and written to at the same time, e.g. an access list that updates visitors' session data constantly and is read from just as much. It's easy to optimize a table for reading from, or for writing to, but not really possible to design a table to be good at both. If you have such a case and it's a bottleneck, consider whether it's possible to split its functions or to move any complex operations using that table to a temporary table that you can update as a block periodically.
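As a sketch of point (a) above (table and index names are hypothetical, and the sys schema view only ships with MySQL 5.7 and later):

    -- List the indexes a write-heavy table has to maintain on every write
    SHOW INDEX FROM page_hits;

    -- On MySQL 5.7+, the sys schema can list indexes that have never been used
    SELECT * FROM sys.schema_unused_indexes
    WHERE object_schema = 'mydb' AND object_name = 'page_hits';

    -- Dropping an index nobody reads makes every INSERT/UPDATE/DELETE on the table cheaper
    ALTER TABLE page_hits DROP INDEX idx_rarely_used;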
Note, the only stuff in the above that has a major effect is good query design / indexing. Beyond that, you want to start considering better hardware. In particular, you can get a lot of benefit out of a RAID-0 array, which doesn't do a lot for write-bound problems but can do wonders for read-bound problems. And it can be a pretty cheap solution for a big boost.
You also missed two items off your list.
Memory bound. If you are hitting memory problems then you must check that everything that can be usefully indexed is indexed. You can also look at greater connection pooling if for some reason you're using a lot of discrete connections to your DB.
Network bound. If you are hitting network bound problems... well you probably aren't, but if you are, you need another network card or a better network.
Note that a convenient way to analyze your DB performance is to turn on the log_slow_queries option and set long_query_time either to 0 to catch everything, or to 0.3 or similar to catch anything that might be holding your database up. You can also turn on log-queries-not-using-indexes to see if anything interesting shows up. Note that this sort of logging can kill a busy live server. Try it on a development box to start.
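For reference, on recent MySQL versions the switches mentioned above look roughly like this (log_slow_queries was renamed slow_query_log in 5.6); as noted, try it on a development box first:

    SET GLOBAL slow_query_log = 1;                   -- formerly log_slow_queries
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
    SET GLOBAL long_query_time = 0.3;                -- seconds; 0 logs every statement
    SET GLOBAL log_queries_not_using_indexes = 1;    -- can be very noisy on a busy server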
Hope that's of some help. I'd be interested in anyone's comments on the above.

Best storage engine for constantly changing data

I currently have an application that is using 130 MySQL tables, all with the MyISAM storage engine. Every table receives multiple queries every second, including select/insert/update/delete queries, so the data and the indexes are constantly changing.
The problem I am facing is that the hard drive is unable to cope, with waiting times of up to 6+ seconds for I/O access given how many reads/writes MySQL is doing.
I was thinking of changing to just one table and making it memory-based. I've never used a MEMORY table for something with so many queries though, so I am wondering if anyone can give me any feedback on whether it would be the right thing to do.
One possibility is that there may be other issues causing performance problems - 6 seconds seems excessive for CRUD operations, even on a complex database. Bear in mind that (back in the day) ArsDigita could handle 30 hits per second on a two-way Sun Ultra 2 (IIRC) with fairly modest disk configuration. A modern low-mid range server with a sensible disk layout and appropriate tuning should be able to cope with quite a substantial workload.
Are you missing an index? - check the query plans of the slow queries for table scans where they shouldn't be.
What is the disk layout on the server? - do you need to upgrade your hardware or fix some disk configuration issues (e.g. not enough disks, logs on the same volume as data).
As the other poster suggests, you might want to use InnoDB on the heavily written tables (see the sketch after this list).
Check the setup for memory usage on the database server. You may want to configure more cache.
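If you do try InnoDB for the hot tables as suggested above, the conversion itself is one statement per table, though it rewrites the whole table and can take a while on large ones; the table and schema names below are hypothetical:

    -- See which engine each table currently uses
    SELECT table_name, engine
    FROM information_schema.tables
    WHERE table_schema = 'mydb';

    -- Convert a heavily written table (rewrites the table; do it off-peak)
    ALTER TABLE visitor_sessions ENGINE = InnoDB;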
Edit: Database logs should live on quiet disks of their own. They use a sequential access pattern with many small sequential writes. Where they share disks with a random-access workload like data files, the random disk access creates a big system performance bottleneck on the logs. Note that this is write traffic that needs to be completed (i.e. written to physical disk), so caching does not help with this.
I've now changed to a MEMORY table and everything is much better. In fact I now have extra spare resources on the server allowing for further expansion of operations.
Is there a specific reason you aren't using InnoDB? It may yield better performance due to caching and a different concurrency model. It will likely require more tuning, but may yield much better results.
should-you-move-from-myisam-to-innodb
I think that your database structure is very wrong and needs to be optimised; this has nothing to do with the storage engine.