Coherence Topology Suggestion

Data to be cached:
100 GB of data
Objects of size 500-5000 bytes
1000 objects updated/inserted per minute on average (peak 5000)
We need a suggestion for a Coherence topology in production and test (distributed with backup):
number of servers
nodes per server
heap size per node
Questions
How much free memory is needed per node compared to the memory used by cached data (assuming 100% usage is not possible)?
How much overhead will 1-2 additional indexes per cache element generate?
We do not know how many read operations will be done; that depends on each use case. The solution will be used by clients for whom low response times are more critical than data consistency. The cache will be updated from the DB by polling at a fixed frequency and populating the cache (since the cache is the data master, not the system using the cache).

The rule of thumb for sizing a JVM for Coherence is that the data is 1/3 the heap assuming 1 backup: 1/3 for cache data, 1/3 for backup, and 1/3 for index and overhead.
The biggest difficulty in sizing is that there are no good ways to estimate index sizes. You have to try with real-world data and measure.
A rule of thumb for JDK 1.6 JVMs is to start with 4GB heaps, so you would need 75 cache server nodes. Some people have been successful with much larger heaps (16GB), so it is worth experimenting. With large heaps (e.g., 16GB) you should not need as much as 1/3 for overhead and can hold more than 1/3 for data. With heaps greater than 16GB, garbage collector tuning becomes critical.
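As a sanity check on those numbers, here is a minimal back-of-the-envelope sketch of the 1/3 rule; the inputs (100GB of data, 4GB heaps, 1 backup) are the assumptions from this thread, not measurements:

// Back-of-the-envelope sizing based on the 1/3 rule of thumb above.
// All inputs are assumptions from this thread, for illustration only.
public class CoherenceSizing {
    public static void main(String[] args) {
        double dataGb = 100.0;               // primary cache data to hold
        double heapPerNodeGb = 4.0;          // rule-of-thumb JDK 1.6 heap per node
        double usableFraction = 1.0 / 3.0;   // 1/3 data, 1/3 backup, 1/3 index/overhead

        double totalHeapGb = dataGb / usableFraction;              // ~300 GB of heap
        int nodes = (int) Math.ceil(totalHeapGb / heapPerNodeGb);  // ~75 cache server nodes

        System.out.printf("Total heap needed: %.0f GB across %d nodes%n", totalHeapGb, nodes);
    }
}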
For maximum performance, you should have 1 core per node.
The number of server machines depends on practical limits of manageability, capacity (cores and memory), and failure handling. For example, even if you have a server that can handle 32 nodes, what happens to your cluster when a machine fails? The cluster will be machine-safe (backups are not on the same machine), but recovery would be very slow given the massive amount of data to be moved to new backups. On the other hand, 75 machines are hard to manage.
I've seen Coherence achieve latencies of 250 microseconds (not milliseconds) for a 1K object put, including network hops and backup. So the number of inserts and updates you are looking for should be achievable. Test with multiple threads inserting/updating and make sure your test client is not the bottleneck.

A few more "rules of thumb":
1) For high availability, three nodes is a good minimum.
2) With Java 7, you can use larger heaps (e.g. 27GB) and the G1 garbage collector.
3) For 100GB of data, using David's guidelines, you will want 300GB total of heap. On servers with 128GB of memory, that could be done with 3 physical servers, each running 4 JVMs with 27GB heap each (~324GB total).
4) Index memory usage varies significantly with data type and arity. It is best to test with a representative data set, both with and without indexes, to see what the memory usage difference is.
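One hedged way to do that measurement is to load a representative data set into a storage-enabled member and compare heap usage before and after adding the index. The cache name "trades" and the getter "getSymbol" below are hypothetical placeholders, and Runtime-based figures are only approximate:

// Rough comparison of heap usage with and without an index, intended to be run
// on a storage-enabled member after loading a representative data set.
// The cache name "trades" and getter "getSymbol" are hypothetical placeholders.
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.extractor.ReflectionExtractor;

public class IndexOverheadCheck {
    public static void main(String[] args) throws Exception {
        NamedCache cache = CacheFactory.getCache("trades");

        long before = usedHeap();
        cache.addIndex(new ReflectionExtractor("getSymbol"), /* ordered */ false, null);
        Thread.sleep(5000);   // give the index time to build
        System.gc();          // best effort only; treat the numbers as approximate
        long after = usedHeap();

        System.out.printf("Approximate index overhead on this member: %d MB%n",
                (after - before) / (1024 * 1024));
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }
}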

Related

Could MySQL InnoDB pages get fragmented on the hard disk? Or does InnoDB prevent that from happening to speed up queries?

By pages I mean:
https://dev.mysql.com/doc/internals/en/innodb-page-structure.html
Could these 16KB MySQL pages get fragmented in memory or on disk? That is, if we take a disk image or memory image, is there a chance that some of these 16KB pages are fragmented? In other words, if I take an image of the MySQL folder, will the 16KB pages be in contiguous blocks, or could some of them be fragmented?
Or does MySQL implement them in a way that they don't get fragmented? If so, how?
This is more of a filesystem or operating system question than MySQL or InnoDB. InnoDB is writing to a file, and it is indeed possible that the filesystem fragments the writes to a single 16 KiB InnoDB page so that it is not contiguous on the physical disk. However, with modern servers and SSD storage, this basically happens 100% of the time anyway on the underlying media.
It is generally not a concern for me, at least.
(I already discussed disk at length in your similar question. I'll address RAM here.)
Perhaps two decades ago, more than half of CPUs had moved to "virtual" addresses being distinct from "physical" addresses. To achieve this, the hardware designers implemented a "translation" mechanism that maps the 32-bit (or 64-bit) virtual address a program uses into a physical address for accessing RAM. This is done at a relatively low level in the hardware. Along with the hardware came the handling of the exception raised when the physical page has been swapped to disk or has not yet been allocated for the user program.
The manufacturers mostly settled on a 4KB page size as the granularity of the translation. That is, the bottom 12 bits (2^12 = 4K) are fed through the translation untouched; the top bits are mapped (via an on-chip lookup table, backed by an array in RAM) from virtual to physical. This is done for every memory access (except perhaps during booting).
With that mechanism, user programs can be totally oblivious to where the 4KB pages are in RAM, and to whether or not they are scattered.
Bottom line: Forget about the memory image. Fragmentation is really a non-issue.
On the other hand, swapping can be an issue. I think that MySQL and InnoDB are designed with the assumption that everything in RAM stays in RAM. Notice how they go to the effort to cache data blocks, index blocks, table definitions, etc. So, do not tune MySQL such that the system needs to swap.
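As a quick, hedged illustration of that last point, a sketch like the following (placeholder JDBC URL and credentials, MySQL Connector/J assumed on the classpath) reads innodb_buffer_pool_size so it can be compared against the machine's physical RAM:

// Reads innodb_buffer_pool_size over JDBC; URL and credentials are placeholders
// and MySQL Connector/J is assumed to be on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BufferPoolCheck {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/", "root", "secret");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")) {
            if (rs.next()) {
                long bytes = Long.parseLong(rs.getString("Value"));
                // Keep this comfortably below physical RAM (leave room for the OS and
                // per-connection buffers) so the server never has to swap.
                System.out.printf("innodb_buffer_pool_size = %d MB%n", bytes / (1024 * 1024));
            }
        }
    }
}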

Resize Amazon RDS storage

We are currently working with a 200 GB database and we are running out of space, so we would like to increase the allocated storage.
We are using General Purpose (SSD) and a MySQL 5.5.53 database (without Multi-AZ deployment).
If I go to the Amazon RDS menu and change the Allocated storage to a bit more (from 200 to 500) I get the following "warnings":
Deplete the initial General Purpose (SSD) I/O credits, leading to longer conversion times: What does this mean?
Impact instance performance until operation completes: And this is the most important question for me. Can I resize the instance with zero downtime? I mean, I don't care if the queries are a bit slower as long as they keep working while it's resizing, but what I don't want to do is stop all my production websites, resize the instance, and open them again (i.e. have downtime).
Thanks in advance.
You can expect degraded performance but you should really test the impact in a dev environment before running this on production so you're not caught off guard. If you perform this operation during off-peak hours you should be fine though.
To answer your questions:
RDS instances can burst using something called I/O credits. Bursting means performance can go above the baseline to meet spikes in demand. Burning through the credits shouldn't be a big deal unless your instance relies on them; you can determine this from the RDS instance metrics (see the CloudWatch sketch after these points). Have a read through I/O Credits and Burst Performance.
Changing the disk size will not result in a complete RDS instance outage, just performance degradation, so it's better to do it during off-peak hours to minimise the impact as much as possible.
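To see whether your instance actually relies on burst credits, as mentioned in the first point, you can watch its BurstBalance metric. A hedged sketch using the AWS SDK for Java follows; the instance identifier "my-db-instance" is a placeholder:

// Fetches the BurstBalance metric (percentage of gp2 I/O credits remaining)
// for an RDS instance over the last hour; "my-db-instance" is a placeholder.
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsResult;
import com.amazonaws.services.cloudwatch.model.Statistic;
import java.util.Date;

public class BurstBalanceCheck {
    public static void main(String[] args) {
        AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.defaultClient();

        GetMetricStatisticsRequest request = new GetMetricStatisticsRequest()
                .withNamespace("AWS/RDS")
                .withMetricName("BurstBalance")
                .withDimensions(new Dimension()
                        .withName("DBInstanceIdentifier")
                        .withValue("my-db-instance"))
                .withStartTime(new Date(System.currentTimeMillis() - 3_600_000)) // last hour
                .withEndTime(new Date())
                .withPeriod(300)
                .withStatistics(Statistic.Average);

        GetMetricStatisticsResult result = cw.getMetricStatistics(request);
        result.getDatapoints().forEach(dp ->
                System.out.printf("%s  BurstBalance=%.1f%%%n", dp.getTimestamp(), dp.getAverage()));
    }
}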
First, according to the RDS FAQs, there should be no downtime at all as long as you are only increasing the storage size and not upgrading the instance class.
Q: Will my DB instance remain available during scaling?
The storage capacity allocated to your DB Instance can be increased
while maintaining DB Instance availability.
Second, according to RDS documentation:
Baseline I/O performance for General Purpose SSD storage is 3 IOPS for
each GiB, which means that larger volumes have better performance....
Volumes below 1 TiB in size also have ability to burst to 3,000 IOPS
for extended periods of time (burst is not relevant for volumes above
1 TiB). Instance I/O credit balance determines burst performance.
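For example, at the current 200 GiB allocation that baseline is roughly 3 × 200 = 600 IOPS, while at 500 GiB it rises to roughly 1,500 IOPS, so the resize also raises the steady-state I/O capacity once it completes.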
I cannot say for certain why, but I guess that when RDS increases the disk size it may defragment the data or rearrange data blocks, which causes heavy I/O. If your server is under heavy usage during the resizing, it may fully consume the I/O credits, resulting in less I/O and longer conversion times. However, given that you started with 200GB, I suppose it should be fine.
Finally, I would suggest using a Multi-AZ deployment if you are worried about downtime or performance impact. During maintenance windows or snapshots there will be a brief I/O suspension of a few seconds, which can be avoided with a standby or read replicas.
The technical answer is that AWS supports no downtime when scaling storage.
However, in the real world you need to factor in how busy your current database is and how the "slowdown" will affect users. Consider the possibility that connections might time out or that the site may appear slower than usual for the duration of the scaling event.
In my experience, RDS storage resizing has been smooth and without problems. However, we pick the least busy time of day to implement it, and we also go through a backup procedure: we snapshot and bring up a standby server that we can switch over to manually, just in case.

Does couchbase actually support datasets larger than memory?

Couchbase documentation says that "Disk persistence enables you to perform backup and restore operations, and enables you to grow your datasets larger than the built-in caching layer," but I can't seem to get it to work.
I am testing Couchbase 2.5.1 on a three-node cluster, with a total of 56.4GB of memory configured for the bucket. After ~124,000,000 100-byte objects -- about 12GB of raw data -- it stops accepting additional puts. One replica is configured.
Is there a magic "go ahead and spill to disk" switch that I'm missing? There are no suspicious entries in the errors log.
It does support data greater than memory - see Ejection and working set management in the manual.
In your instance, what errors are you getting from your application? When you start to reach the low memory watermark, items need to be ejected from memory to make room for newer items.
Depending on the disk speed and the rate of incoming items, this can result in TEMP_OOM errors being sent back to the client, telling it that it needs to temporarily back off before performing the set, but these should generally be rare in most instances. Details on handling these can be found in the Developer Guide.
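If you do see them, the usual pattern is to back off briefly and retry. A minimal, hedged sketch in the style of the Couchbase Java SDK 2.x (bucket name, key, and retry limits are placeholder choices; your client version may surface the error differently):

// Backs off and retries when the server signals a temporary out-of-memory
// condition; bucket name, key, and retry limits are placeholder choices.
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.error.TemporaryFailureException;

public class TempOomRetry {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = CouchbaseCluster.create("127.0.0.1");
        Bucket bucket = cluster.openBucket("default");
        JsonDocument doc = JsonDocument.create("item::1",
                JsonObject.create().put("payload", "100-byte-ish value"));

        long backoffMs = 50;
        for (int attempt = 1; attempt <= 5; attempt++) {
            try {
                bucket.upsert(doc);
                break;                        // stored successfully
            } catch (TemporaryFailureException e) {
                Thread.sleep(backoffMs);      // give ejection/persistence time to catch up
                backoffMs *= 2;               // exponential backoff
            }
        }
        cluster.disconnect();
    }
}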
My guess would be that it's not the raw data that is filling up your memory, but the metadata associated with it. Couchbase 2.5 needs 56 bytes per key, so in your case that would be approximately 7GB of metadata -- much less than your memory quota.
But... metadata can become fragmented in memory. If you batch-inserted all 124M objects in a very short time, I would assume you got at least 90% fragmentation. That means that with only 7GB of useful metadata, the space required to hold it has filled up your RAM, with lots of unused parts in each allocated block.
The solution to your problem is to compact (defragment) the data. This can either be done manually or triggered automatically as needed:
manually: run a bucket compaction from the admin console or the REST API
automatically: configure the auto-compaction thresholds in the cluster or bucket settings
If you need more insights about why compaction is needed, you can read this blog article from Couchbase.
Even if none of your documents are stored in RAM, Couchbase still stores all the document IDs and metadata in memory (this will change in version 3), and it also needs some available memory to run efficiently. The relevant section in the docs:
http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#memory-quota
Note that when you use a replica you need twice as much RAM. The formula is roughly:
(56 + avg_size_of_your_doc_ID) * nb_docs * 2 (replica) * (1 + headroom) / (high_water_mark)
So depending on your configuration, it's quite possible that 124,000,000 documents require 56 GB of memory.
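For example, assuming 40-byte keys, 25% headroom and an 85% high-water mark, the formula gives (56 + 40) × 124,000,000 × 2 × 1.25 / 0.85 ≈ 35 GB of RAM for metadata alone, before any document values are cached; with document IDs closer to 100 bytes the figure reaches the full 56 GB quota.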

How does Couchbase Server support high concurrency and high throughput?

I am curious to know how Couchbase Server supports high concurrency and high throughput.
It's a very broad question to answer, but I'll try to cover some of the key reasons why Couchbase is fast and scalable.
Writes in Couchbase are asynchronous by default: replication and persistence happen in the background, and the smart clients (SDKs) are notified of success or failure. So basically any new documents or mutations to documents are written to RAM and then asynchronously flushed to disk and replicated to other nodes in the background. This means there is no waiting on or contention over I/O and disk speed. (It also means it is possible for a write to reach RAM and the node to fall over before the request has been persisted to disk or replicated to a second/third node.) It is possible to make writes synchronous, but that will slow down throughput considerably.
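To make that trade-off concrete, here is a hedged sketch in the style of the Couchbase Java SDK 2.x: the plain upsert is acknowledged as soon as the mutation is in RAM, while the PersistTo/ReplicateTo variant waits for disk persistence and replication (document ID and bucket name are placeholders):

// Contrasts the default in-memory acknowledgement with an explicit durability
// requirement; document ID and bucket name are placeholders.
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.PersistTo;
import com.couchbase.client.java.ReplicateTo;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

public class WriteDurability {
    public static void main(String[] args) {
        Bucket bucket = CouchbaseCluster.create("127.0.0.1").openBucket("default");
        JsonDocument doc = JsonDocument.create("order::42",
                JsonObject.create().put("status", "paid"));

        bucket.upsert(doc);                                     // fast: acknowledged once in RAM
        bucket.upsert(doc, PersistTo.MASTER, ReplicateTo.ONE);  // slower: waits for disk + 1 replica
    }
}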
When dealing with RAM, writes and reads are VERY fast (we've only pushed our cluster to 20k operations a second), but large companies easily hit upwards of 400k operations a second. LinkedIn sustains this ops rate with only 4 nodes ---> http://www.couchbase.com/customer-stories
In traditional database architectures the setup is usually a master DB (MySQL/Postgres/Oracle) coupled with a slave DB for data redundancy; writes and reads can also be split between the two as load gets higher. Couchbase is meant to be used as a distributed system (Couchbase recommends at least 3 nodes in production). Data is automatically sharded between the nodes in a cluster, spreading the writes/reads across multiple machines. If you need higher throughput, adding a node in Couchbase is as simple as clicking add node and then rebalancing the cluster; the data will be automatically partitioned across the new cluster map.
So essentially, writing/reading from RAM with async disk persistence + distributed reads and writes == high throughput.
Hope that helps!
#scalabilitysolved already gave a great overview, but if you want a longer (and more detailed) description take a look at the Couchbase_Server_Architecture_Review on couchbase.com

CPU usage: PostgreSQL vs MySQL on Windows

Currently I have this server:
processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2392.149
cache size : 512 KB
My application causes more than 96% CPU usage in MySQL with 200-300 transactions per second.
Can anyone assist or provide links on how to:
benchmark PostgreSQL
decide whether PostgreSQL would improve CPU utilization compared to MySQL
find links or wikis that simply present a benchmark comparison
A common misconception for database users is that high CPU use is bad.
It isn't.
A database has exactly one speed: as fast as possible. It will always use up every resource it can, within administrator set limits, to execute your queries quickly.
Most queries require lots more of one particular resource than others. For most queries on bigger databases that resource is disk I/O, so the database will be thrashing your storage as fast as it can. While it is waiting for the hard drive it usually can't do any other work, so that thread/process will go to sleep and stop using the CPU.
Smaller databases, or queries on small datasets within big databases, often fit entirely in RAM. The operating system will cache the data from disk and have it sitting in RAM and ready to return when the database asks for it. This means the database isn't waiting for the disk and being forced to sleep, so it goes all-out processing the data with the CPU to get you your answers quickly.
There are two reasons you might care about CPU use:
You have something else running on that machine that isn't getting enough CPU time; or
You think that, given the 100% CPU use, you aren't getting enough performance from your database.
For the first point, don't blame the database. It's an admin issue. Set operating system scheduler controls like nice levels to re-prioritize the workload - or get a bigger server that can do all the work you require of it without falling behind.
For the second point you need to look at your database tuning, at your queries, etc. It's not a "database uses 100% cpu" problem, it's a "I'm not getting enough throughput and seem to be CPU-bound" problem. Database and query tuning is a big topic and not one I'll get into here, especially since I don't generally use MySQL.