Is enabling innodb_dedicated_server good for performance?

From the MySQL 8 documentation:
When innodb_dedicated_server is enabled, InnoDB automatically configures the following variables:
innodb_buffer_pool_size
innodb_log_file_size
innodb_log_files_in_group (as of MySQL 8.0.14)
innodb_flush_method
Only consider enabling innodb_dedicated_server if the MySQL instance resides on a dedicated server
where it can use all available system resources. Enabling innodb_dedicated_server is not recommended
if the MySQL instance shares system resources with other applications.
Assuming the server is dedicated for MySQL, does enabling innodb_dedicated_server actually give better performance than tuning those parameters on my own?

Short answer: No, it does not improve performance any more than setting those tuning options yourself.
The variable innodb_dedicated_server was explained in detail when the feature was announced (2017-08-24):
https://mysqlserverteam.com/plan-to-improve-the-out-of-the-box-experience-in-mysql-8-0/
It's just a shorthand for a number of tuning options. The new variable doesn't improve performance in any special way; it's exactly the same as setting those other tuning options yourself.
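For illustration, on a hypothetical dedicated server with 16 GB of RAM, enabling the option works out to roughly the same thing as hand-setting values along these lines in my.cnf (the figures are approximate; the exact formulas depend on the MySQL 8.0 minor version):
[mysqld]
# Approximate values for a hypothetical 16 GB dedicated server; treat these
# as illustrative, not as the exact output of the documented formulas.
innodb_buffer_pool_size   = 12G                 # roughly 75% of RAM when RAM > 4 GB
innodb_log_file_size      = 1G                  # derived from the buffer pool size
innodb_log_files_in_group = 8                   # also derived (8.0.14 and later)
innodb_flush_method       = O_DIRECT_NO_FSYNC   # where the platform supports it
Either way, the server does nothing beyond computing and applying values like these for you.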
I wrote this comment on the blog when they announced the feature:
I’m sorry, but I don’t like this feature at all. I understand the goal
of improving the out-of-the-box experience for naive users, but I
don’t think this solution will be successful at this goal.
Trying to pre-tune a MySQL installation with some formula is a
one-size-fits-all solution, and these kinds of solutions are
unreliable. We can recall examples of other products that have tried
to do this, but eventually removed their auto-tuning features.
It’s not a good assumption that the buffer pool needs as much physical
RAM as you can afford. You already know this, because you need the
innodb_dedicated_server option. Rick mentioned the possibility that
the dataset is already smaller than RAM. In this case, adding more RAM
has little or no benefit.
Many naive users mistakenly believe (after reading some blog) that
increasing RAM allocation always increases performance. It’s difficult
to explain to them why this is not true.
Likewise innodb log file. We assume that bigger is better, because of
benchmarks showing that heavy write traffic benefits from bigger log
files, because of delaying checkpoints. But what if you don’t have
heavy write traffic? What if you use MySQL for a blog or a CMS that is
99% reads? The large log file is unnecessary. Sizing it for an assumed
workload or dataset size has a high chance of being the wrong choice
for tuning.
I understand the difficulty of asking users questions during
installation. I recently did a project automating MySQL provisioning
with apt. It was annoying having to figure out debconf to work around
the installation prompts that do exist (btw, please document MySQL’s
debconf variables!).
There’s also the problem that even if you do prompt the user for
information, they don’t know the answers to the questions. This is
especially true of the naive users that you’re targeting with this
feature.
If the installer asks “Do you use MySQL on a dedicated server?” do
they even know what this means? They might think “dedicated” is simply
the opposite of shared hosting.
If the installer asks “Do you want to use all available memory on this
system?” you will be surprised at how many users think “memory” refers
to disk space, not RAM.
In short: (1) Using formulas to tune MySQL is error-prone. (2) Asking
users to make choices without information is error-prone.
I have an alternative suggestion: Make it easier for users to become
less naive about their choices.
I think users need a kind of friendly cheat-sheet or infographic of
how to make tuning decisions. This could include a list of questions
about their data size and workload, and then a list of performance
indicators to monitor and measure, like buffer pool page create rate,
and log file write rate. Give tips on how to measure these things,
what config options to change, and then how to measure again to verify
that the change had the desired effect.
A simple monitoring tool would also be useful. Nothing so
sophisticated as PMM or VividCortex for long-term trending, but
something more like pt-mext for quick, ephemeral measurements.
The only thing the installation process needs to do is tell the user
that tuning is a thing they need to do (many users don’t realize
this), and refer them to the cheat-sheet documentation.
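To make the kind of measurement described above concrete (my sketch, not part of the quoted comment), the buffer pool page create rate and the log write rate can be sampled with nothing more than SHOW GLOBAL STATUS, taken twice and diffed over the interval:
-- Sample these counters, wait a known interval, sample again; the per-second
-- rate is (second value - first value) / interval.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_created';  -- buffer pool page create counter
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';             -- bytes written to the redo log
SHOW GLOBAL STATUS LIKE 'Innodb_log_writes';                 -- redo log write operations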

Just tuning.
It is a challenging task to provide "good" defaults for everything. The biggest impediment is not knowing how much of the machine's RAM and CPU will be consumed by other products (Java, WordPress, etc.) running on the same server.
A large number of MySQL servers are used by big players; they separate MySQL servers from webservers, etc. This makes it simple for them to tweak a small number of tunables quickly when deploying the server.
Meanwhile, less-heavy users get decent tuning out-of-the-box by leaving that setting out.

Related

Tuning MySQL Database

I have a MySQL database running on a dedicated Ubuntu server with 2GB RAM and a 500GB hard drive. I would appreciate any help on fine-tuning the database to increase performance. The enhancements need to affect the CRUD tasks of the database, including the performance of procedure calls and scheduled events.
I have searched the web and found various mechanisms and tools on various websites for doing this, but I need to know the proper way to improve the performance of a MySQL database itself (e.g., the execution time of an SQL query) without using any third-party tools or software. The database configuration I have is listed below.
MySQL version: 5.5
Used storage engine: MyISAM
Operating system: Ubuntu 12
Hard disk capacity: 500GB
RAM: 2GB
Other: The database consists of Tables, Indexes, Stored Procedures, Scheduled Events and Views
You have said nothing about the specifics of your data, its distribution, the type of workload you use, the ratio of reads to writes, the variety of your queries, the complexity of your queries, and so on. This is a vital part of the tuning process for one simple reason:
Tuning is specific to your data and your workload.
The guys who make database platforms such as MySQL pay a lot of attention to making sure the default settings are good enough for the majority of users. If there was some easy route to improving the performance of a database, they'd already have done it at the factory.
The guys who make the third party tools, on the other hand, write code that reads your data and your logs to find out information about your tables, their contents, and your queries, and that code makes best-guess estimates about tuning based on your data and your workload. They're not perfect, but they sure beat having to do that stuff manually if you don't know how to.
Think of tuning a database like tuning a guitar: You start with an idea of what you want (Standard tuning? Drop D? DADGAD?) and then you make small adjustments to one string at a time, measuring it against your desired result. Once you've achieved the best possible result for that string, you move onto the next one and make small changes there etc. When you get to the final string, you might have adjusted the balance of the whole guitar so you might have to revisit the settings from the beginning to make tiny incremental changes until the whole lot is singing perfectly.
Read http://dev.mysql.com/doc/refman/5.5/en/server-parameters.html to get started on the most important "strings" to tune in MySQL 5.5. There are lots, but none of them are particularly difficult on their own.
As a tangent, tuning your server away from the defaults might give you a 5-10% boost in performance. You'd be much better spending your time looking at your database design, data types, and the indexes you're using. You can often make 50%-100% improvements in performance by doing that sort of thing.
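As a quick illustration of that last point (table and column names here are made up), checking whether a slow query actually uses an index, and adding one when it does not, is frequently worth more than any server variable:
-- EXPLAIN shows whether the query can use an index at all.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- If the plan reports a full table scan (type: ALL), an index usually helps:
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);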
You should find http://www.mysqlcalculator.com/ helpful for starters.
This will show you some critical general defaults and allow you to enter your own values, as displayed by
SHOW GLOBAL VARIABLES
to calculate MySQL's maximum memory usage.
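For example, the global and per-connection buffers that dominate that calculation can be pulled straight from the server (a sketch of roughly the variables the calculator asks for):
-- Global buffers, plus per-connection buffers multiplied by max_connections,
-- give a rough upper bound on MySQL's memory usage.
SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('key_buffer_size', 'innodb_buffer_pool_size', 'innodb_log_buffer_size',
   'query_cache_size', 'max_connections', 'sort_buffer_size',
   'read_buffer_size', 'read_rnd_buffer_size', 'join_buffer_size',
   'thread_stack', 'tmp_table_size');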
This will only scratch the surface - and will be enlightening.
There is NO simple answer.

Maximum capabilities of MySQL

How do I know when a project is just too big for MySQL and I should use something with a better reputation for scalability?
Is there a max database size for MySQL before degradation of performance occurs? What factors contribute to MySQL not being a viable option compared to a commercial DBMS like Oracle or SQL Server?
Google uses MySQL. Is your project bigger than Google?
Smart-alec comments aside, MySQL is a professional level database application. If your application puts a strain on MySQL, I bet it'll do the same to just about any other database.
If you are looking for a couple of examples:
Facebook moved to Cassandra only after it was storing over 7 Terabytes of inbox data. (Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.) (... Even though they were having quite a few issues at that stage.)
Wikipedia also handles hundreds of Gigabytes of text data in MySQL.
I work for a very large Internet company. MySQL can scale very, very large with very good performance, with a couple of caveats.
One problem you might run into is that an index greater than 4 gigabytes can't go into memory. I spent a lot of time once trying to improve MySQL's full-text performance by fiddling with some index parameters, but you can't get around the fundamental problem that if your query hits disk for an index, it gets slow.
You might find some helper applications that can help solve your problem. For the full-text problem, there is Sphinx: http://www.sphinxsearch.com/
Jeremy Zawodny, who now works at Craig's List, has a blog on which he occasionally discusses the performance of large databases: http://blog.zawodny.com/
In summary, your project probably isn't too big for MySQL. It may be too big for some of the ways that you've used MySQL before, and you may need to adapt them.
Mostly it is table size.
I am assuming here that you will use the Oracle InnoDB plugin for MySQL as your engine. If you do not, that probably means you're using a commercial engine such as InfiniDB, Infobright, or Tokutek, in which case your questions should be sent to them.
InnoDB gets a bit nasty with very large tables. You are advised to partition your tables if at all possible with very large instances. Essentially, if your (frequently used) indexes don't all fit into ram, inserts will be very slow as they need to touch a lot of pages not in ram. This cannot be worked around.
You can use the MySQL 5.1 partitioning feature if it does what you want, or partition your tables at the application level if it does not. If you can get your tables' indexes to fit in ram, and only load one table at a time, then you're on a winner.
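For example (a made-up table, just to show the MySQL 5.1 syntax being referred to), a range-partitioned log table lets you load or query one partition at a time, so only that partition's index pages need to be in ram:
-- Hypothetical log table; each year lives in its own partition.
CREATE TABLE access_log (
  id        BIGINT UNSIGNED NOT NULL,
  logged_at DATETIME NOT NULL,
  url       VARCHAR(255),
  PRIMARY KEY (id, logged_at)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(logged_at)) (
  PARTITION p2009 VALUES LESS THAN (2010),
  PARTITION p2010 VALUES LESS THAN (2011),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);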
You can use the plugin's compression to make your ram go a bit further (as the pages are compressed in ram as well as on disc), but it cannot beat the fundamental limitation.
If your table's indexes don't all (or at least MOSTLY - if you have a few indexes which are NULL in 99.99% of cases you might get away without those ones) fit in ram, insert speed will suck.
Database size is not a major issue, provided your tables individually fit in ram while you're doing bulk loading (and of course, you only load one at once).
These limitations really happen with most row-based databases. If you need more, consider a column database.
Infobright and Infinidb both use a mysql-based core and are column based engines which can handle very large tables.
Tokutek is quite interesting too - you may want to contact them for an evaluation.
When you evaluate the engine's suitability, be sure to load it with very large data on production-grade hardware. There's no point in testing it with a (e.g.) 10G database; that won't prove anything.
MySQL is a commercial DBMS; you just have the option to get the support/monitoring that is offered by Oracle or Microsoft, or you can use community support or community-provided monitoring software.
The things you should look at are not only the size of operations. Also critical are:
Scenarios for backup and restore.
Maintenance. Example: SQL Server Enterprise can rebuild an index WHILE THE OLD ONE IS AVAILABLE - transparently. This means no downtime for an index rebuild.
Availability (basically you do not want to have to restore a 5000 GB database if a server dies) - mirroring preferred, replication "sucks" (technically).
Whatever you go for, be careful with Oracle RAC (their cluster) - it is known to be "problematic" (to put it mildly). SQL Server is known to be a lot cheaper and to scale a lot worse (no "RAC" option), but basically to work without making admins want to commit suicide every hour (the "RAC" option seems to do that). Scalability "a lot worse" is still good enough for the TerraServer (http://msdn.microsoft.com/en-us/library/aa226316(SQL.70).aspx)
There were some questions here recently from people having problems rebuilding indexes on a 10 GB database or so.
So much for my 2 cents. I am sure some MySQL specialists will jump in on issues there.

What is optimal isolation level for MySql using InnoDB running Moodle 1.9.X

Which InnoDB isolation level should be used with Moodle 1.9.X? The default is REPEATABLE READ; is it safe, however, to use READ COMMITTED for better performance?
You won't get a sensible answer.... without.... getting.... more detailed. This REALLY depends on the usage of the database - you may even MIX them. Read-only fast transactions in a web application, for example...
you only read, no write, when creating the form
you don't need repeatable read, as you only load drop-downs (as an example)
=> no need for more isolation than READ COMMITTED.
OTOH if you do complex processing and updates, then READ COMMITTED may not be good enough.
I have seen applications using multiple different levels in different parts.
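If you do decide to drop the level for specific read-only work, it can be done per session or per transaction rather than server-wide, for example:
-- Lower the isolation level only for this connection's cheap read-only work;
-- the server default (REPEATABLE READ) stays in place for everything else.
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or for the next transaction only:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT id, fullname FROM course;  -- hypothetical Moodle-style lookup
COMMIT;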
Moodle will run on MyISAM, so the answer is 'probably yes, but it is probably easier to increase performance through other means, and getting support with other issues on moodle.org may be harder once you do this.'
What you might want to do is some profiling.
Download and install XDebug, a PHP extension for tracing and profiling PHP functions.
More details about the Xdebug profiler are available here.
With Xdebug, it's really easy to find bottlenecks and to understand how much a function or an operation is heavy for both memory and CPU.
Play with the parameters, try different settings and profile!
Also, please share the results with the Moodle community.

MySQL database optimization best practices

What are the best practices for optimizing a MySQL installation for best performance when handling somewhat larger tables (> 50k records with a total of around 100MB per table)? We are currently looking into rewriting DelphiFeeds.com (a news site for the Delphi programming community) and noticed that simple Update statements can take up to 50ms. This seems like a lot. Are there any recommended configuration settings that we should enable/set that are typically disabled on a standard MySQL installation (e.g. to take advantage of more RAM to cache queries and data and so on)?
Also, what performance implications does the choice of storage engines have? We are planning to go with InnoDB, but if MyISAM is recommended for performance reasons, we might use MyISAM.
The "best practice" is:
Measure performance, isolating the relevant subsystem as well as you can.
Identify the root cause of the bottleneck. Are you I/O bound? CPU bound? Memory bound? Waiting on locks?
Make changes to alleviate the root cause you discovered.
Measure again, to demonstrate that you fixed the bottleneck and by how much.
Go to step 2 and repeat as necessary until the system works fast enough.
Subscribe to the RSS feed at http://www.mysqlperformanceblog.com and read its historical articles too. That's a hugely useful resource for performance-related wisdom. For example, you asked about InnoDB vs. MyISAM. Their conclusion: InnoDB has ~30% higher performance than MyISAM on average, though there are also a few usage scenarios where MyISAM outperforms InnoDB.
InnoDB vs. MyISAM vs. Falcon benchmarks - part 1
The authors of that blog are also co-authors of "High Performance MySQL," the book mentioned by @Andrew Barnett.
Re comment from @ʞɔıu: How to tell whether you're I/O bound versus CPU bound versus memory bound is platform-dependent. The operating system may offer tools such as ps, iostat, vmstat, or top. Or you may have to get a third-party tool if your OS doesn't provide one.
Basically, whichever resource is pegged at 100% utilization/saturation is likely to be your bottleneck. If your CPU load is low but your I/O load is at its maximum for your hardware, then you are I/O bound.
That's just one data point, however. The remedy may also depend on other factors. For instance, a complex SQL query may be doing a filesort, and this keeps I/O busy. Should you throw more/faster hardware at it, or should you redesign the query to avoid the filesort?
There are too many factors to summarize in a StackOverflow post, and the fact that many books exist on the subject supports this. Keeping databases operating efficiently and making best use of the resources is a full-time job requiring specialized skills and constant study.
Jeff Atwood just wrote a nice blog article about finding bottlenecks in a system:
The Computer Performance Shell Game
Go buy "High Performance MySQL" from O'Reilly. It's almost 700 pages on the topic, so I doubt you'll find a succinct answer on SO.
It's hard to broadbrush things, but a moderately high-level view is possible.
You need to evaluate read:write ratios. For tables with ratios lower than about 5:1, you will probably benefit from InnoDB because then inserts won't block selects. But if you aren't relying on transactional durability, you can change innodb_flush_log_at_trx_commit to 2 to get performance back over MyISAM, at the cost of up to a second of durability on a crash.
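If you don't already know your read:write ratio, a rough figure can be taken from the cumulative statement counters (a sketch; these are standard status variables):
-- Reads vs. writes since server startup; divide Com_select by the sum of the
-- others to approximate the read:write ratio.
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Com_select', 'Com_insert', 'Com_update', 'Com_delete');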
Look at the memory parameters. MySQL's defaults are very conservative and some of the memory limits can be raised by a factor of 10 or more on even ordinary hardware. This will benefit your SELECTs rather than INSERTs.
MySQL can log things like queries that aren't using indexes, as well as queries that just take too long (user-definable).
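For example, both kinds of logging can be switched on at runtime (assuming MySQL 5.1 or later, where these variables are dynamic):
-- Send statements slower than 1 second, plus statements not using an index,
-- to the slow query log.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;
SET GLOBAL log_queries_not_using_indexes = ON;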
The query cache can be useful, but you need to instrument it (i.e. see how much it is used). Cacti can do that, as can Munin.
Application design is also important:
Lightly caching frequently fetched but smallish datasets will make a big difference (i.e. a cache lifetime of a few seconds).
Don't re-fetch data that you already have to hand.
Multi-step storage can help with a high volume of inserts into tables that are also busily read. The basic idea is that you have a table for ad-hoc inserts (INSERT DELAYED can also be useful), and a batch process to move the updates within MySQL from there to where all the reads are happening. There are variations of this.
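A minimal sketch of that pattern, with hypothetical table names: ad-hoc writes land in a small staging table, and a periodic job folds them into the main, heavily read table inside MySQL:
-- Writers hit the small staging table; readers hit articles.
INSERT INTO articles_staging (id, title, body) VALUES (1, 'title', 'body');

-- Periodic batch job (in practice, move only rows up to a checkpoint id,
-- inside a transaction, so nothing is lost between the two statements).
INSERT INTO articles (id, title, body)
  SELECT id, title, body FROM articles_staging;
DELETE FROM articles_staging;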
Don't forget that perspective and context are important, too: what you might think is a long time for an UPDATE to happen might actually be quite trivial if that "long" update only happens once a day.
There are tons of best practices which have been previously discussed, so there is no reason to repeat them. For concrete advice on what to do, I would try running MySQL Tuner. It's a Perl script that you can download and then run on your database server; it will give you a bunch of statistics on how your database is performing (e.g. cache hits) along with some concrete recommendations for which issues or config parameters need to be adjusted to improve performance.
While these statistics are all available in MySQL itself, I find that this tool provides them in a much easier to understand fashion. While it is important to note that YMMV with respect to the recommendations, I have found them to generally be pretty accurate. Just make sure that you have done a good job exercising the database beforehand with realistic traffic.

Concurrency handling using the filesystem vs. an RDBMS (MySQL)

I'm building an English web dictionary where users can type in words and get definitions. I thought about this for a while, and since the data is 100% static and I would only retrieve one word at a time, I figured I would be better off using the filesystem (ext3) as the database system instead of opting for MySQL to store definitions. I figured there would be less overhead, considering that you have to connect to MySQL and that in itself is a very slow operation.
My fear is that if my system were to get bombarded by, let's say, 500 word retrievals/sec, would I still be better off using the filesystem as the database? Or will the increased filesystem reads hinder performance, as opposed to something that MySQL might be doing under the hood?
Currently the hierarchy is segmented by first letter, second letter and third letter of the word. So if you were to search for the definition of "water", the script (PHP) will try to read from "../dict/w/a/t/water.word" (after cleaning up the word of problematic characters and lowercasing it)
Am I heading in the right direction with this or is there a faster solution (not counting storing definitions in memory using something like memcached)? Will the amount of files stored in any directory factor in performance? What's the rough benchmark for the number of files that I should store in a directory?
What are your grounds for believing that this decision will matter to the overall performance of the solution? What does it do other than provide definitions?
Do you have MySQL as part of the solution anyway, or would you need to add it should you select it as the solution here?
Where is the definitive source of definitions? The (maybe replicated) filesystem, or some off line DB?
It seems like something that should be in a DB architecturally - filesystems are a strange place to map a large number of names to values (as is evidenced by your file system structure breaking things down by initial letters)
If it's in the DB, answering questions like "how many definitions are there?" is a lot easier, but if you don't care about such things for your application, this may not matter.
So to some extent this feels like looking to hyper optimise the performance of something whose performance won't actually make much difference to the overall solution.
I'm a fan of "make it correct, then make it fast", and "correct" would be more straightforward to achieve with a DB.
And of course, the ultimate answer would be to try both and see which one works best in your situation.
Paul
The type of lookups that a dictionary requires is exactly what a database is good at. I think the filesystem method you describe will be unworkable. Don't make it hard! Use a Database.
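For what it's worth, the whole lookup described in the question is a single indexed point read once the words live in a table (a sketch with made-up names):
-- One row per word; the primary key makes the lookup an index point read.
CREATE TABLE definitions (
  word       VARCHAR(64) NOT NULL PRIMARY KEY,
  definition TEXT NOT NULL
) ENGINE=InnoDB;

SELECT definition FROM definitions WHERE word = 'water';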
You can keep a connection pool around to speed up connecting to the DB.
Also, if this application needs to scale to multiple servers, the file system may be tricky to share between servers.
So, I third the suggestion. Use a DB.
But unless it's a fabulously large dictionary, caching would mean you're nearly always getting stuff from local memory, so I don't think this is going to be the biggest issue for your application :)
A DB sounds perfect for your needs.
I also don't see why memcached is relevant (how big is your data? Can't be more than a few GB... right?)
The data is approximately a couple of GBs, and my goal is speed, speed, speed (definitions will be loaded using XHR). The data, as I said, is static and is never going to change, and nowhere would I use anything other than a single read operation for each request. So I'm having a pretty hard time getting convinced to use MySQL and all its bloat.
Which would be the first to fail under high load using this strategy, the filesystem or MySQL? As for scaling, replication is the answer, since the data will never change and is only a couple of GBs.
Make it work first. Premature optimisation is bad.
Using a database enables easier refactoring of your schema, and you don't have to write an implementation of an index-based lookup, which in actual fact is nontrivial.
Saying that connecting to a database "is a very slow operation" overstates the problem. Actually connecting should not take very long, plus you can reuse connections anyway.
If you are worried about read-scaling, a 1G database is very small, so you can push readonly replicas of it to each web server and they can each read from their local copy. Provided the writes stay at a level which doesn't impact read performance, that gives you almost perfect read-scalability.
Moreover, 1G of data will fit into ram easily, so you can make it fast by loading the entire database into memory at startup time (before that node advertises itself to the load balancer).
500 lookups per second is trivially small. I would start worrying about 5000 per second per server, maybe. If you can't achieve 5000 key lookups per second on modern hardware (from a database which fits in RAM?!!), there is something seriously wrong with your implementation.
Agreeing that this is premature optimization, and that MySQL surely will be performant enough for this use case. I must add that you can also use a file-based database, like the very fast Tokyo Cabinet, as a compromise. Sadly it doesn't have a PHP binding, so you could use its grandfather, DBM.
That said, do not use a filesystem, there's no good reason to, as far as I can see.
Use a virtual drive in your RAM (Google how to set one up for your distro), or if your data is served by PHP, use APC; memcached might also work well with MySQL. Personally, I don't think the optimization you are doing here is really where you should be spending your time. 500 requests a second is massive. I think using MySQL would give you better features to build on later. You need to concentrate on features, not speed, if you want to differentiate yourself from your competitors. Also, there are a few good talks about UI for the web; server speed is only a small factor in the whole picture.
Good luck
You might also think about a NoSQL database (like Riak, Mongo, or even Redis) for something like this. They are all super-fast and help out with your replication. MySQL might be overkill and hard to scale in an instance like this, but the other ones have some robust tools.