gCloud / GCE Disk Size warning - is it meaningful? - google-compute-engine

When I create a boot disk with gCloud less than 200GB in size, I see this error:
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks/persistent-disks#pdperformance.
I don't, however, see any details about this 200GB threshold anywhere on the page at that URL.
Should I care about this warning at all? I wonder if it is more of a ploy to make more money by encouraging you to lease more space.
Note: I'm using a standard disk, not a solid-state one. The only disk access performance I care about is via MySQL, with very small reads/writes 99% of the time and the occasional blob in the range of, say, 1 to 100 MB.

It looks like the documentation has shifted around a little, and the warning is out of date.
There is a section of the Block Storage page that explains the relationship between persistent disk size and performance.
We'll fix the URL in gcloud.
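In the meantime, for a rough sense of why the warning exists at all, here is a minimal sketch assuming standard persistent disk performance caps scale linearly with provisioned size. The per-GB rates below are purely illustrative placeholders, not current published numbers; check the persistent disk documentation for the real figures:

```python
# Rough illustration of why small persistent disks trigger the warning:
# standard PD performance caps scale with provisioned size.
# NOTE: the per-GB rates below are placeholders, NOT published numbers.

READ_IOPS_PER_GB = 0.75    # assumed, illustrative only
WRITE_IOPS_PER_GB = 1.5    # assumed, illustrative only
READ_MBPS_PER_GB = 0.12    # assumed, illustrative only
WRITE_MBPS_PER_GB = 0.09   # assumed, illustrative only

def pd_limits(size_gb: int) -> dict:
    """Estimate the per-disk performance cap for a standard PD of size_gb."""
    return {
        "read_iops": size_gb * READ_IOPS_PER_GB,
        "write_iops": size_gb * WRITE_IOPS_PER_GB,
        "read_mb_s": size_gb * READ_MBPS_PER_GB,
        "write_mb_s": size_gb * WRITE_MBPS_PER_GB,
    }

for size in (10, 50, 200):
    print(f"{size} GB -> {pd_limits(size)}")
```

Whatever the exact rates, the point is the same: a small disk gets a proportionally small slice of I/O capacity, which is what the warning is about.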

Related

Changing from BLOB to filestorage - mySQL tweaks

I took over a project some time ago in which file binaries were stored as BLOBs. They ranged from 0.5 to 50 MB in size, so that table was touched as little as possible (eBeans lazy loading, etc.). The BLOB approach worked fine as long as the whole system was running on one dedicated server; once we switched to AWS EC2 instances + RDS, things were (obviously) slower.
Therefore I switched the data storage from BLOB to S3 (+ reference to the bucket/key stored in the DB), which is much faster for our backend and our clients.
Now to my question: the previous programmer obviously set up the MySQL DB to handle bigger chunks of data (max packet size, etc.), and I also stumbled over some discussion about connection pool size.
What are the critical parameters to check in the MySQL setup, and what are effective ways of evaluating them?
The most likely answer to your question is "change nothing."
MySQL has many, many, many "tunable" parameters, and there is an absolute wealth of bad advice available online about "optimizing" them. But this is a temptation best avoided.
If system variables have been changed from their defaults and you ever find yourself believing that tweaking the configuration is necessary, your first instinct should be to revert those settings to their defaults unless you have a specific, justified reason not to.
Settings like max_allowed_packet, if set too small, will break some things (like large blobs), but if set larger than necessary will have little or no impact... the "excess" isn't allocated or otherwise harmful. In the case of max_allowed_packet, this does impose a constraint on memory use by limiting the amount of memory the server would ever need to allocate for a single packet, but since it's a brick-wall limit, you don't necessarily want to shrink it. If you aren't sending packets that big, it isn't hurting anything.
It is safe to increase the value of this variable because the extra memory is allocated only when needed. For example, mysqld allocates more memory only when you issue a long query or when mysqld must return a large result row. The small default value of the variable is a precaution to catch incorrect packets between the client and server and also to ensure that you do not run out of memory by using large packets accidentally.
http://dev.mysql.com/doc/refman/5.7/en/packet-too-large.html
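If you want to sanity-check this against the blobs you actually store, here is a minimal sketch using mysql-connector-python; the connection details are placeholders, and the 50 MB figure is taken from the blob sizes you mentioned:

```python
# Compare max_allowed_packet against the largest blob you expect to send.
# Connection parameters are placeholders -- adjust for your environment.
import mysql.connector

LARGEST_EXPECTED_BLOB = 50 * 1024 * 1024  # ~50 MB, per the question

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="appdb"
)
cur = conn.cursor()
cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
_, value = cur.fetchone()
max_packet = int(value)

print(f"max_allowed_packet = {max_packet} bytes")
if max_packet < LARGEST_EXPECTED_BLOB:
    print("Too small for your largest blobs -- raise it (client and server).")
else:
    print("Large enough; leaving it as-is costs you nothing.")

cur.close()
conn.close()
```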
Other parameters, though, can have dramatically counter-intuitive negative effects, because the range of "valid" values is a superset of the range of "optimal" values. The query cache is a prime example of this. "But it's more cache! How can that be bad?!" Well, a bigger house increases the amount of housework you have to do, and the query cache is a big house with only one tiny broom (a global mutex that each thread contends for when entering and leaving).
Still others, like innodb_buffer_pool_size, really have only one relatively small optimal range of values for a given server. Too small, and disk I/O increases and performance suffers because the pool is smaller than the system could support; too large, and disk I/O increases because the server starts swapping, or the server crashes entirely after exhausting every last available kilobyte of free RAM.
Perhaps you get the idea.
Unless you have a specific parameter that you believe may be suboptimally configured, leave a working system working. If you change things, change them one at a time, and prove or disprove that each change was a good idea before proceeding. If you are using a non-default value, consider the default as a potentially good candidate value.
And stay away from "tuning scripts" that make suggestions about parameters you should change. Those are interesting to look at, but their advice is often dangerous. I've often thought about writing one of these myself, but all it would do is check for values not set to the default and tell the user to explain themselves or set it back. :) Maybe that would catch on.
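For what it's worth, here is a minimal sketch of that idea. It assumes you have captured the tab-separated output of SHOW GLOBAL VARIABLES twice, once from a stock server running defaults and once from your server, and it simply reports what differs:

```python
# Report variables whose values differ from a default baseline.
# Usage: python diff_vars.py baseline.tsv current.tsv
# Both files are the tab-separated output of:
#   mysql -B -e "SHOW GLOBAL VARIABLES"
# captured from a stock default server and from your server respectively.
import sys

def load_vars(path):
    variables = {}
    with open(path) as fh:
        next(fh, None)  # skip the "Variable_name  Value" header row
        for line in fh:
            name, _, value = line.rstrip("\n").partition("\t")
            variables[name] = value
    return variables

baseline = load_vars(sys.argv[1])
current = load_vars(sys.argv[2])

for name in sorted(current):
    if name in baseline and current[name] != baseline[name]:
        print(f"{name}: default={baseline[name]!r} current={current[name]!r}"
              "  -- explain yourself or set it back")
```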

couchbase metadata overhead warning. 62% RAM is taken up by keys and metadata

Okay, since I don't have 10 reputation I'm unable to post images, but I will try to explain in text.
I have a 7 node Couchbase (Community) cluster with 4 buckets.
Recently I've been getting spammed (constantly) by metadata overhead warnings for one of the buckets.
The warning pops up and looks like this:
Metadata overhead warning. Over 62% of RAM allocated to bucket XXXX on node "xxx" is taken up by keys and metadata.
And I've read that this is usually a sign that the bucket needs more RAM. But I don't think that is the issue for me; I simply have a lot of metadata, I would guess.
When I look at the Data Buckets tab, this bucket has a RAM/Quota Usage of 64GB/75GB. So to me it looks like there is around 11GB (75-64GB) available.
If I look at the Bucket Analytics VBUCKET RESOURCES metrics, I see that there is 59GB of user data in RAM and 46GB of metadata in RAM. So to my understanding there should be 105GB in RAM on a bucket that has a total of 75GB!?!
But that doesn't add up for me so clearly there is something that I don't understand here.
And yes 46GB of 75GB is around 62%. But what about the 59GB user data that is supposedly in RAM?
EDIT:
A typical document can look like this:
ID=1:CAESEA---rldZ5PhdV4msSdEchI
CONTENT=z2TjZEzkZ84=
And to my question: what do I do? Is the situation acceptable in my circumstances? If so, do I change the threshold for that warning (which I read is not recommended, since the warning is set at 50% for a reason)?
Or do I assign more RAM? And if so, how does that help me if there is already 11GB free?
Please help me clarify these numbers and suggest if I need to take any actions.
First of all, there isn't necessarily a problem with having a high percentage of memory used by metadata - it just means there's less RAM available for caching actual documents. If your application is working well, then it may be fine for your use case. However, having said that, let me try to address your questions, and what to change if you do want to improve things:
If I look at the Bucket Analytics VBUCKET RESOURCES metrics, I see that there is 59GB of user data in RAM and 46GB of metadata in RAM. So to my understanding there should be 105GB in RAM on a bucket that has a total of 75GB!?!
IIRC "user data in RAM" is inclusive of "metadata in RAM" - so you have a total of 59 GB data used, of which 46 GB is metadata.
And to my question: what do I do? Is the situation acceptable in my circumstances? If so, do I change the threshold for that warning (which I read is not recommended, since the warning is set at 50% for a reason)?
Or do I assign more RAM? And if so, how does that help me if there is already 11GB free?
So basically you are storing lots of very small documents, so the per-document metadata overhead (~48 bytes plus the length of the key) is very high compared to the actual document size.
The 11GB free is mainly made up of the difference between the bucket quota and the high watermark.
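To see how quickly that overhead adds up with tiny documents, here is a minimal sketch using the ~48 bytes plus key length figure above. The document count, average key length, and replica count are placeholders, since they aren't given in the question:

```python
# Estimate how much of a bucket's RAM quota goes to metadata alone.
# Per-document overhead (~48 bytes + key length) is from the answer above;
# the document count, key length and replica count are assumptions.

METADATA_OVERHEAD_BYTES = 48        # approx. per-document overhead
AVG_KEY_LENGTH = 30                 # e.g. "1:CAESEA..." style keys, assumed
NUM_DOCUMENTS = 200_000_000         # placeholder
COPIES = 1 + 2                      # 1 active + 2 replicas, assumed
BUCKET_QUOTA_GB = 75                # from the question

metadata_bytes = (METADATA_OVERHEAD_BYTES + AVG_KEY_LENGTH) * NUM_DOCUMENTS * COPIES
metadata_gb = metadata_bytes / 1024 ** 3

print(f"Estimated metadata in RAM: {metadata_gb:.1f} GB")
print(f"That is {100 * metadata_gb / BUCKET_QUOTA_GB:.0f}% of the {BUCKET_QUOTA_GB} GB quota")
```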
Here are a few options to improve this:
Allocate more RAM to the bucket (as you mentioned) - if there's any unallocated in the Server Quota.
Add more memory to the nodes (and allocate to the server quota and bucket).
Reduce the number of replicas (if that's acceptable to you) - at the moment you are essentially storing each object (and its metadata) three times: once for the active vBuckets and twice for the two replica vBucket sets.
Change your documents to have shorter keys - This will reduce the average metadata per document.
Consolidate multiple documents into one - This will reduce the number of documents, and hence the overall metadata overhead.

Does couchbase actually support datasets larger than memory?

Couchbase documentation says that "Disk persistence enables you to perform backup and restore operations, and enables you to grow your datasets larger than the built-in caching layer," but I can't seem to get it to work.
I am testing Couchbase 2.5.1 on a three node cluster, with a total of 56.4GB memory configured for the bucket. After ~124,000,000 100-byte objects -- about 12GB of raw data -- it stops accepting additional puts. 1 replica is configured.
Is there a magic "go ahead and spill to disk" switch that I'm missing? There are no suspicious entries in the errors log.
It does support data greater than memory - see Ejection and working set management in the manual.
In your instance, what errors are you getting from your application? When you start to reach the low memory watermark, items need to be ejected from memory to make room for newer items.
Depending on the disk speed and the rate of incoming items, this can result in TEMP_OOM errors being sent back to the client, telling it that it needs to temporarily back off before retrying the set, but these should generally be rare in most instances. Details on handling these can be found in the Developer Guide.
My guess would be that it's not the raw data that is filling up your memory, but the metadata associated with it. Couchbase 2.5 needs 56 bytes per key, so in your case that would be approximately 7GB of metadata, which is much less than your memory quota.
But... metadata can become fragmented in memory. If you batch-inserted all 124M objects in a very short time, I would assume you got at least 90% fragmentation. That means that even with only 7GB of useful metadata, the space required to hold it has filled up your RAM, with lots of unused space in each allocated block.
The solution to your problem is to defragment the data, which can be done either by triggering compaction manually or by configuring auto-compaction to run as needed.
If you need more insights about why compaction is needed, you can read this blog article from Couchbase.
Even if none of your documents is stored in RAM, Couchbase still stores all document IDs and metadata in memory (this will change in version 3), and it also needs some available memory to run efficiently. The relevant section in the docs:
http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#memory-quota
Note that when you use a replica you need twice as much RAM. The formula is roughly:
(56 + avg_size_of_your_doc_ID) * nb_docs * 2 (replica) * (1 + headroom) / (high_water_mark)
So depending on your configuration, it's quite possible that 124,000,000 documents require 56 GB of memory.
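Plugging the question's numbers into that formula, here is a minimal sketch; the average doc ID length, headroom, and high water mark values are assumptions, so adjust them to your setup:

```python
# Rough RAM estimate using the formula above.
# 56 bytes of metadata per key and 124M documents come from this thread;
# the average doc ID length, headroom and high water mark are assumptions.

META_PER_KEY = 56            # bytes, Couchbase 2.5 per-key metadata
AVG_DOC_ID_LEN = 40          # bytes, assumed
NUM_DOCS = 124_000_000       # from the question
REPLICA_FACTOR = 2           # 1 active copy + 1 replica
HEADROOM = 0.30              # assumed fragmentation/overhead headroom
HIGH_WATER_MARK = 0.85       # assumed high water mark fraction

ram_bytes = ((META_PER_KEY + AVG_DOC_ID_LEN) * NUM_DOCS
             * REPLICA_FACTOR * (1 + HEADROOM) / HIGH_WATER_MARK)

print(f"Estimated RAM needed for metadata alone: {ram_bytes / 1024**3:.1f} GB")
```

Longer document IDs, more replicas, or a lower high water mark push the estimate up quickly, which is how you can end up needing tens of GB of RAM for metadata on only ~12GB of raw data.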

Apache & MySQL with Persistent Disks to Multiple Instances

I plan on mounting persistent disks onto the Apache (/var/www) and MySQL (/var/lib/mysql) directories to avoid having to replicate information between servers.
Has anyone done tests to determine whether the I/O performance of a persistent disk is similar when the same disk is attached to 100 instances versus only 2? Also, is there a limit on how many instances one persistent disk can be attached to?
I'm not sure exactly what setup you're planning to use, so it's a little hard to comment specifically.
If you plan to attach the same persistent disk to all servers, note that a disk can only be attached to multiple instances in read-only mode, so you may not be able to use temporary tables, etc. in MySQL without extra configuration.
It's a bit hard to give performance numbers for a hypothetical configuration; I'd expect performance to depend on the amount of data stored (e.g. 1TB of data will behave differently than 100MB), instance size (larger instances have more memory for page cache and more CPU for processing I/O), and access pattern (random reads vs. sequential reads).
The best option is to set up a small test system and run an actual load test using something like apachebench, jmeter, or httperf. Failing that, you can try to construct an artificial load that's similar to your target benchmark.
Note that just running bonnie++ or fio against the disk may not tell you if you're going to run into problems; for example, it could be that a combination of sequential reads from one machine and random reads from another causes problems, or that 500 simultaneous sequential reads from the same block causes a problem, but that your application never does that. (If you're using Apache+MySQL, it would seem unlikely that your application would do that, but it's hard to know for sure until you test it.)

MySQL schema size

I have a development MySQL (InnoDB only) server with several users. Each user has access on one exclusive schema. How can I limit the schema size so that each user can use only 1GB (for example)?
MySQL itself does not offer a quota system. Using the method suggested by James McNellis would probably work; however, having InnoDB suddenly hit a hard quota limit would certainly not benefit stability, since all data files are still connected via the system tablespace, which you cannot get rid of.
Unfortunately I do not see a practical way to achieve what you want. If you are concerned about disk space usage exceeding predefined limits and do not want to go the way of external quota regulations, I suggest staying with the combined tablespace settings (i.e. no innodb_file_per_table) and removing the :autoextend option from the configuration.
That way you still will not get user- or schema-specific limits, but at least you prevent the disk from being filled up with data, because the tablespace will not grow past its initial size in this setup. With innodb_file_per_table there is unfortunately no way to configure each file to stop at a certain maximum size.
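If you do end up policing sizes externally rather than in the database itself, here is a minimal monitoring sketch; the connection details and the 1 GB limit are placeholders, and the figures from information_schema are approximate:

```python
# Flag schemas whose reported footprint exceeds a per-schema limit.
# MySQL has no built-in quota, so this only reports offenders; connection
# details and the 1 GB limit are placeholders.
import mysql.connector

LIMIT_BYTES = 1 * 1024 ** 3  # 1 GB per schema, as in the question

conn = mysql.connector.connect(host="localhost", user="admin", password="secret")
cur = conn.cursor()
cur.execute(
    """
    SELECT table_schema, SUM(data_length + index_length) AS total_bytes
    FROM information_schema.tables
    WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
    GROUP BY table_schema
    """
)
for schema, total_bytes in cur:
    total_bytes = int(total_bytes or 0)
    status = "OVER LIMIT" if total_bytes > LIMIT_BYTES else "ok"
    print(f"{schema}: {total_bytes / 1024**2:.1f} MB [{status}]")

cur.close()
conn.close()
```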
The lack of per-schema quotas is one of the aspects in which MySQL differs from other, supposedly more enterprise-grade databases. Don't get me wrong, though: we use InnoDB with lots of data in several thousand installations, so it has certainly proven itself production-ready. It's just that the management features are a little lacking at times.