Changing from BLOB to file storage - MySQL tweaks

I took over a project some time ago in which file binaries were stored as BLOBs. These ranged in size from 0.5 to 50 MB, so that table was touched as little as possible (e.g. via eBeans lazy loading). The BLOB approach worked quite well as long as the whole system was running on one dedicated server; once we switched to AWS EC2 instances + RDS, things were (obviously) slower.
Therefore I switched the data storage from BLOB to S3 (+ a reference to the bucket/key stored in the DB), which is much faster for our backend and our clients.
Now my question: the previous programmer obviously set up the MySQL DB to handle bigger chunks of data (max_allowed_packet etc.), and I also stumbled over some discussion about connection pool size.
What are critical parameters to check in the mySQL setup, and what are effective ways of evaluating them?

The most likely answer to your question is "change nothing."
MySQL has many, many, many "tunable" parameters, and there is an absolute wealth of bad advice available online about "optimizing" them. But this is a temptation best avoided.
To the extent that system variables have been changed from their defaults: if you ever find yourself believing that tweaking the configuration is necessary, your first instinct should be to revert those settings to their defaults unless you have a specific and justified reason not to.
A setting like max_allowed_packet will break some things (like large blobs) if set too small, but will have little or no impact if set larger than necessary... the "excess" isn't allocated or otherwise harmful. In the case of max_allowed_packet, the setting does impose a constraint on memory use by limiting the amount of memory the server would ever need to allocate for a single packet, but since it's a brick wall limit, you don't necessarily want to shrink it. If you aren't sending packets that big, it isn't hurting anything.
It is safe to increase the value of this variable because the extra memory is allocated only when needed. For example, mysqld allocates more memory only when you issue a long query or when mysqld must return a large result row. The small default value of the variable is a precaution to catch incorrect packets between the client and server and also to ensure that you do not run out of memory by using large packets accidentally.
http://dev.mysql.com/doc/refman/5.7/en/packet-too-large.html
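For reference, checking the current value and (only if genuinely needed) raising it looks like this; a minimal sketch, run as a privileged user:
SHOW VARIABLES LIKE 'max_allowed_packet';
-- 64 MB, as an example; affects new connections only and reverts on
-- restart unless the same value is also placed in my.cnf
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;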
Other parameters, though, can have dramatically counter-intuitive negative effects, because the range of "valid" values is a superset of the range of "optimal" values. The query cache is a prime example of this. "But it's more cache! How can that be bad?!" Well, a bigger house increases the amount of housework you have to do, and the query cache is a big house with only one tiny broom (a global mutex that each thread contends for when entering and leaving).
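If you suspect the query cache is one of those counter-intuitive settings hurting you, the status counters give a rough hint; a sketch (a high Qcache_lowmem_prunes value suggests churn):
SHOW GLOBAL STATUS LIKE 'Qcache%';
-- setting the size to 0 effectively disables the cache (on some versions
-- you also want query_cache_type = 0 to skip the mutex entirely)
SET GLOBAL query_cache_size = 0;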
Still others, like innodb_buffer_pool_size, really have only one relatively small optimal range of values for a given server. Too small a value will increase disk I/O and impair performance because the pool is smaller than the system could support; too large a value will increase disk I/O due to the server using swap space, or crash the server entirely by exhausting every last available kilobyte of free RAM.
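A quick sanity check for the buffer pool is to compare logical read requests against reads that actually had to go to disk; a sketch:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- hit ratio ~ 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests);
-- a ratio well below 0.99 under a steady workload hints the pool may be too small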
Perhaps you get the idea.
Unless you have a specific parameter that you believe may be suboptimally configured, leave a working system working. If you change things, change them one at a time, and prove or disprove that each change was a good idea before proceeding. If you are using a non-default value, consider the default as a potentially good candidate value.
And stay away from "tuning scripts" that make suggestions about parameters you should change. Those are interesting to look at, but their advice is often dangerous. I've often thought about writing one of these myself, but all it would do is check for values not set to the default and tell the user to explain themselves or set them back. :) Maybe that would catch on.

Apache & MySQL with Persistent Disks to Multiple Instances

I plan on mounting persistent disks into the Apache (/var/www) and MySQL (/var/lib/mysql) folders to avoid having to replicate information between servers.
Has anyone done tests to find out whether the I/O performance of a persistent disk is similar when attaching the same disk to 100 instances versus only 2? Also, is there a limit on how many instances one persistent disk can be attached to?
I'm not sure exactly what setup you're planning to use, so it's a little hard to comment specifically.
If you plan to attach the same persistent disk to all servers, note that a disk can only be attached to multiple instances in read-only mode, so you may not be able to use temporary tables, etc. in MySQL without extra configuration.
It's a bit hard to give performance numbers for a hypothetical configuration; I'd expect performance to depend on the amount of data stored (e.g. 1 TB of data will behave differently than 100 MB), instance size (larger instances have more memory for page cache and more CPU for processing I/O), and access pattern (random reads vs. sequential reads).
The best option is to set up a small test system and run an actual load test using something like ApacheBench, JMeter, or httperf. Failing that, you can try to construct an artificial load that's similar to your target workload.
Note that just running bonnie++ or fio against the disk may not tell you if you're going to run into problems; for example, it could be that a combination of sequential reads from one machine and random reads from another causes problems, or that 500 simultaneous sequential reads from the same block causes a problem, but that your application never does that. (If you're using Apache+MySQL, it would seem unlikely that your application would do that, but it's hard to know for sure until you test it.)

MySQL scale up or scale out?

I have been tasked with investigating reasons why our internal web application is hitting performance problems.
The web application itself is part written in PHP and part written in Perl, and we have a MySQL database, which I believe is where the performance hit is occurring.
We have about 400 users of the system, of which, most are spread across different timezones, so generally there are only ever a max of 30 users online at any one time. The performance problems have crept up on us, particularly over the past year as the database keeps growing.
The system is running on a single 32-bit Debian server with 6 GB of RAM and 8 x 2.4 GHz Intel CPUs. This is probably not hefty enough for the job in hand. However, even at times when I am the only user online, page loading can still be slow.
I'm trying to determine whether we need to scale up or scale out. Firstly, I'd like to know how well our hardware is coping with the demands placed upon it, and secondly, whether it might be worth scaling out and creating some replication slaves to balance the load.
There are a lot of tools available on the internet - probably a bit too many to investigate. Can anyone recommend any tools that provide profiling/performance monitoring that may help me on my quest?
Many thanks,
ns
Your slow-down seems to be related to the data and not to the number of concurrent users.
Properly indexed queries tend to scale logarithmically with the amount of data - i.e. doubling the data increases the query time by some constant C, doubling it again adds the same C, doubling again adds the same C, and so on. Before you know it, you have humongous amounts of data, yet your queries are just a little slower.
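To put a number on that intuition - a sketch, assuming a B-tree index lookup whose cost grows with the tree depth:
T(N) ≈ C · log2(N), so T(2N) = T(N) + C · log2(2) = T(N) + C
Each doubling of the data adds the same constant C, which is why the growth in query time feels almost flat.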
If the slow-down wasn't that gradual in your case (i.e. it was linear in the amount of data, or worse), this might be an indication of badly optimized queries. Throwing more iron at the problem will postpone it, but unless you have an unlimited budget, you'll have to actually solve the root cause at some point:
(1) Measure the query performance on the actual data to identify slow queries.
(2) Examine the execution plans for possible improvements.
(3) If necessary, learn about indexing, clustering, covering indexes and other performance techniques.
(4) Finally, apply that knowledge to the queries you identified in steps (1) and (2); a sketch follows below.
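As a minimal sketch of steps (1) and (2), assuming a hypothetical orders table:
-- step (2): inspect the plan of a query the slow query log flagged in step (1)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- "type: ALL" with a large "rows" estimate means a full table scan;
-- an index on the filtered column is often the fix (step 4):
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);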
If nothing else helps, think about your data model. Sometimes a "perfectly" normalized model is not the best-performing one, so a little judicious denormalization might be warranted.
The easy (lazy) way, if you have the budget, is just to throw some more iron at it.
A better way, before deciding where or how to scale, would be to identify the bottlenecks. Is every page load slow, or just particular pages? If it is just a few pages, then invest in a profiler (for PHP, both Xdebug and the Zend Debugger can do profiling). I would also (if you haven't already) invest in a test system that is as similar as possible to the live system, to run diagnostics on.
You could also look at gathering some stats, either at the server level with a program such as sar (from the sysstat package), or at the DB level (have you got the slow query log running?).

Upload large files to BLOB

I'm working on saving big files (~200 MB) directly into the DB.
I have an issue with that.
It is caused by a huge increase in RAM usage (about 3 GB of RAM and 3 GB of swap) at the stage when the file is saved to the DB:
#job.pdf = params[:job][:pdf].read
After this completes, some RAM and swap remain in use.
Is there some way to optimize that?
P.S. The project is on Rails 3.0.3, uses MySQL, and runs on Mongrel.
In MySQL, to be able to save or read BLOB fields larger than 1 MB, you have to increase the server-side parameter max_allowed_packet beyond its default. In practice, you can't go much further than 16-32 MB for this parameter. The price for this increase is that every new DB client will consume at least as much memory, and in general, server performance will suffer greatly.
In other words, MySQL does not really support handling BLOB fields larger than about 1 MB (if you can't or don't want to fiddle with the server configuration) to around 16 MB (even if you do want to do that).
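If you do have to push large blobs through MySQL anyway, a common workaround is to write the blob in chunks so that no single statement exceeds max_allowed_packet; a rough sketch, with a hypothetical files table and ? standing for chunk data bound from application code:
-- files(id INT PRIMARY KEY, data LONGBLOB)
INSERT INTO files (id, data) VALUES (1, ?);            -- first chunk
UPDATE files SET data = CONCAT(data, ?) WHERE id = 1;  -- repeat per following chunk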
This may be a philosophical question - is it a good idea or not to keep big blobs in the database? I think for many tasks (though not for all) it is a great idea, and because MySQL is so bad at it (and for a host of other reasons), I simply avoid using it as my SQL server solution.
Instead, I use PostgreSQL, which supports BLOBs (actually, BYTEA) perfectly well up to the advertised limit of 1 GB without any tweaks on the client or the server. In addition to that, it will transparently compress them with an LZ-family algorithm - slightly worse than gzip, but still much better than no compression at all.
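For comparison, a minimal PostgreSQL sketch with a hypothetical job_files table; values beyond a couple of kilobytes are TOASTed (compressed and stored out of line) automatically:
CREATE TABLE job_files (
    id  serial PRIMARY KEY,
    pdf bytea NOT NULL  -- up to 1 GB per value, no server tuning required
);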

How to measure MySQL bottlenecks?

What MySQL server variables should we be looking at, and what thresholds are significant, for the following problem scenarios:
CPU bound
Disk read bound
Disk write bound
And for each scenario, what solutions are recommended to improve them, short of getting better hardware or scaling the database to multiple servers?
This is a complicated area. The "thresholds" that will affect each of your three categories overlap quite a bit.
If you are having problems with your operations being CPU bound, then you definitely need to look at:
(a) The structure of your database - is it fully normalized? Bad DB structure leads to complex queries which hit the processor.
(b) Your indexes - is everything needed for your queries sufficiently indexed? A lack of indexes can hit both the processor and the memory VERY hard. To check indexes, run "EXPLAIN ...your query". Any row in the resulting explanation that says it isn't using an index deserves a close look, and if possible, add an index.
(c) Use prepared statements wherever possible. These can save the CPU from doing quite a bit of crunching (a sketch follows this list).
(d) Use a better compiler with optimizations appropriate for your CPU. This is one for the dedicated types, but it can gain you the odd extra percent here and there.
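For point (c), a minimal sketch of server-side prepared statements against a hypothetical users table; from PHP you would normally go through mysqli or PDO rather than issuing these statements by hand:
PREPARE fetch_user FROM 'SELECT name FROM users WHERE id = ?';
SET @uid = 42;
EXECUTE fetch_user USING @uid;   -- parsed once, executable many times
DEALLOCATE PREPARE fetch_user;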
If you are having problems with your operations being read bound:
(a) Ensure that you are caching where possible. Check the configuration variables query_cache_limit and query_cache_size. This isn't a magic fix, but raising these can help.
(b) As with above, check your indexes. Good indexes reduce the amount of data that needs to be read.
If you are having problems with your operations being write bound:
(a) See if you need all the indexes you currently have. Indexes are good, but the trade-off for them improving query time is that maintaining those indexes can impact the time spent writing the data and keeping them up to date. Normally you want indexes if in doubt, but sometimes you're more interested in rapidly writing to a table than in reading from it.
(b) Make use of INSERT DELAYED where possible to "queue" writes to the database. Note, this is not a magic fix and is often inappropriate, but in the right circumstances it can help (see the example after this list).
(c) Check for tables that are heavily read from and written to at the same time, e.g. an access list that updates visitors' session data constantly and is read from just as much. It's easy to optimize a table for reading from, or for writing to, but not really possible to design a table to be good at both. If you have such a case and it's a bottleneck, consider whether you can split its functions, or move any complex operations using that table into a temporary table that you update as a block periodically.
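For point (b), a sketch against a hypothetical access_log table. Bear in mind that INSERT DELAYED only ever worked for MyISAM (and a few other non-transactional engines), was deprecated in MySQL 5.6, and is ignored (treated as a plain INSERT) in later versions:
-- returns immediately; the row is queued and written in the background
INSERT DELAYED INTO access_log (user_id, visited_at) VALUES (42, NOW());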
Note, the only things above that have a major effect are good query design and indexing. Beyond that, you want to start considering better hardware. In particular, you can get a lot of benefit out of a RAID-0 array, which doesn't do a lot for write-bound problems but can do wonders for read-bound ones. And it can be a pretty cheap solution for a big boost.
You also missed two items off your list.
Memory bound. If you are hitting memory problems, then check that everything that can usefully be indexed is indexed. You can also look at greater connection pooling if, for some reason, you're using a lot of discrete connections to your DB.
Network bound. If you are hitting network bound problems... well you probably aren't, but if you are, you need another network card or a better network.
Note that a convenient way to analyze your DB performance is to turn on the log_slow_queries option and set long_query_time either to 0 to catch everything, or to 0.3 or similar to catch anything that might be holding your database up. You can also turn on log-queries-not-using-indexes to see if anything interesting shows up. Note that this sort of logging can kill a busy live server; try it on a development box to start.
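On MySQL 5.1 and later these can be toggled at runtime without a restart (log_slow_queries is the older config-file name for what is now slow_query_log):
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.3;                 -- seconds; 0 logs everything
SET GLOBAL log_queries_not_using_indexes = 'ON';
-- remember to turn these off again on a busy production server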
Hope that's of some help. I'd be interested in anyone's comments on the above.

MySQL schema size

I have a development MySQL (InnoDB only) server with several users. Each user has access to one exclusive schema. How can I limit the schema size so that each user can use only 1 GB (for example)?
MySQL itself does not offer a quota system. Using the method suggested by James McNellis would probably work; however, having InnoDB suddenly hit a hard quota limit would certainly not benefit stability, since all data files are still connected via the system table space, which you cannot get rid of.
Unfortunately I do not see a practical way to achieve what you want. If you are concerned about disk space usage exceeding predefined limits, and do not want to go the way of external quota regulation, I suggest staying with the combined table space settings (i.e. no innodb_file_per_table) and removing the :autoextend from the configuration.
That way you still will not get per-user or per-schema limits, but at least you prevent the disk from filling up with data, because the table space will not grow past its initial size in this setup. With innodb_file_per_table there is unfortunately no way to configure each file to stop at a certain maximum size.
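Although you cannot enforce a per-schema limit, you can at least monitor usage and react externally; a sketch (InnoDB's length figures are estimates):
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
GROUP BY table_schema;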
This is one of the aspects in which MySQL differs from other, supposedly more enterprise-level databases. Don't get me wrong though; we use InnoDB with lots of data in several thousand installations, so it has certainly proven to be production-grade. Only the management features are a little lacking at times.