My program uses shared memory as data storage. This data must be available to any running application, and fetching it must be fast. But some applications can run on different NUMA nodes, and for them data access is really expensive. Is duplicating the data for every NUMA node the only way to handle this?
There are two primary sources of slowdown that can be attributed to NUMA. The first is the increased latency of remote access which can vary depending on the platform. On the platforms that I work with, there is about a 30% hit in latency.
The other source of performance loss can come from contention over the communication links and controllers between NUMA nodes.
The default allocation scheme on Linux is to allocate data on the node where it was created. If the majority of the data in the application is initialized by a single thread, then it will generate a lot of cross-NUMA-domain traffic and contention for that one memory node.
If your data is read only, then replication is a good solution.
Otherwise, interleaving the data allocation across all your nodes will distribute the requests across all the nodes and will help relieve congestion.
To interleave the data, you can use set_mempolicy() from numaif.h if you are using Linux.
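As a minimal sketch of what enabling interleaving looks like, here is a Python version that calls set_mempolicy() through ctypes; it assumes libnuma is installed, and the node mask below is just an example. In C the call is the same: include numaif.h, link against -lnuma, and pass the same mode, mask and maxnode arguments.

```python
import ctypes
import ctypes.util

# Sketch only: call set_mempolicy() through libnuma so that pages this process
# faults in from now on are interleaved across NUMA nodes 0 and 1.
libnuma = ctypes.CDLL(ctypes.util.find_library("numa"), use_errno=True)

MPOL_INTERLEAVE = 3                  # policy constant from <numaif.h>
nodemask = ctypes.c_ulong(0b11)      # one bit per node to use (here: nodes 0 and 1)
maxnode = ctypes.c_ulong(64)         # number of valid bits in the mask

if libnuma.set_mempolicy(MPOL_INTERLEAVE, ctypes.byref(nodemask), maxnode) != 0:
    raise OSError(ctypes.get_errno(), "set_mempolicy failed")

# Memory allocated (strictly: first touched) after this point is spread
# page by page across the nodes in the mask instead of landing on one node.
```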
I have a lot of data that needs to be stored on disk.
Since it is only key-value pairs, I want to use Couchbase for it.
The data is several GB, and I only allocate 1 GB of RAM to the bucket.
I thought the RAM given to Couchbase was only a cache.
But after inserting a lot of data I got:
Hard Out Of Memory Error. Bucket "test2" on node 100.66.32.169 is full. All memory allocated to this bucket is used for metadata.
when I open the Couchbase web console.
Can Couchbase be used as a database that stores data on disk, or is it RAM-oriented?
Update:
OK, let me make the question more specific:
In Couchbase:
1. If I allocate 1 GB of RAM to a bucket, can I store 10 GB of data in that bucket?
2. If 1. is possible, can I consider the 1 GB of RAM to be a kind of cache for the 10 GB of data (just as a CPU's L2 cache is a cache for RAM)?
By default, Couchbase stores all keys (and some metadata) in RAM, and fills whatever remains with values. Starting with version 3.0, you can set your bucket to full-eviction mode, which only keeps the keys of cached documents in RAM. This lets you store much more data than you have memory, but at a performance cost for some read operations, especially attempts to retrieve keys that don't exist.
To solve your specific problem, edit the bucket and set it to full metadata eviction. Note that this will restart the bucket.
Couchbase tries to keep as much of the "live dataset" (i.e. the most used / requested keys) as possible in the node's memory. This is key to the performance of the database and part of its design, so good memory sizing of your nodes and good quotas for your buckets are essential.
It does offer persistence, but I'd say this is not a disk-first oriented database.
Persistence to disk is mainly for two things: making data durable and resilient to node shutdowns (of course), and off-loading data (least-used data first) from RAM to disk.
I think you're asking a bunch of different questions here.
Specifically about the error message: it looks like your bucket is simply too small to hold all the data you're storing in it.
About persisting to disk: you can force Couchbase to write to disk (and even configure the number of nodes that docs are replicated to), but as noted above, that would probably hurt your performance a little.
Have a look, for example, at the persist_to flag in the set() API of the Python client for Couchbase.
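As a rough sketch of what that looks like with the older Python SDK (the bucket name, key and document below are placeholders):

```python
from couchbase import Couchbase

# Connect to a bucket; host and bucket name are placeholders.
cb = Couchbase.connect(host="localhost", bucket="test2")

# persist_to=1 makes set() wait until the document has been written to disk on
# the active node; replicate_to=1 additionally waits for one replica copy.
# Both options trade some latency for stronger durability guarantees.
cb.set("user::1001", {"name": "alice"}, persist_to=1, replicate_to=1)
```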
Recently I heard about the concept of an in-memory database.
In any type of database, the data is ultimately stored on the computer, and our program fetches it from there. How are in-memory database operations fast compared to the others?
Will the in-memory database load all the data from the database into memory (RAM)?
Thanks in advance....
An in-memory database (IMDB; also main memory database system or MMDB or memory resident database) is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism. Main memory databases are faster than disk-optimized databases since the internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory eliminates seek time when querying the data, which provides faster and more predictable performance than disk.
Applications where response time is critical, such as those running telecommunications network equipment and mobile advertising networks, often use main-memory databases.
In reply to your query: yes, it loads the data into your computer's RAM.
On-Disk Databases
All data is stored on disk; disk I/O is needed to move data into main memory when needed.
Data is always persisted to disk.
Traditional data structures like B-Trees are designed to store tables and indices efficiently on disk.
Virtually unlimited database size.
Support a very broad set of workloads, i.e. OLTP, data warehousing, mixed workloads, etc.
In-Memory Databases
All data is stored in main memory; there is no need to perform disk I/O to query or update data.
Data is persistent or volatile depending on the in-memory database product.
Specialized data structures and index structures assume data is always in main memory.
Optimized for specialized workloads, i.e. communications industry-specific HLR/HSS workloads.
Database size is limited by the amount of main memory.
MySQL offerings
MySQL has several "Engines". In all engines, actions are performed in RAM. The Engines differ significantly in how good they are at making sure the data "persists" on disk.
ENGINE=MEMORY -- This is not persistent; the data is found only in RAM. It is limited to some preset max size. On a power failure, all data (in a MEMORY table) is lost.
ENGINE=MyISAM -- This is an old engine; it persists data to disk, but in the case of power failure, sometimes the indexes are corrupted and need 'repairing'.
ENGINE=InnoDB -- This is the preferred engine. It not only persists to disk but 'guarantees' consistency even across power failures.
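As a small illustration, the engine is chosen per table in the CREATE TABLE statement. The sketch below assumes the mysql-connector-python package and a local test database; host, credentials and table names are placeholders.

```python
import mysql.connector  # assumes the mysql-connector-python package is installed

# Placeholder connection settings.
conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="test")
cur = conn.cursor()

# Same table definition three times; only the ENGINE clause changes.
cur.execute("CREATE TABLE t_memory (id INT PRIMARY KEY, v VARCHAR(64)) ENGINE=MEMORY")  # RAM only, lost on restart
cur.execute("CREATE TABLE t_myisam (id INT PRIMARY KEY, v VARCHAR(64)) ENGINE=MyISAM")  # on disk, weak crash safety
cur.execute("CREATE TABLE t_innodb (id INT PRIMARY KEY, v VARCHAR(64)) ENGINE=InnoDB")  # on disk, crash-safe

conn.close()
```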
In-memory DBs usually have the whole database in memory (like the MySQL MEMORY engine).
This is a huge performance boost, but RAM is expensive and often not persistent, so you would lose data on restart.
There are some ways to mitigate the latter issue, e.g. timed snapshots or replication to an on-disk database.
There are also some hybrid types, with just part of the DB in memory.
There are also in-memory databases like Tarantool that can work with data sets larger than available RAM. Tarantool is able to work with these sets because it is optimized for fast random writes, the main bottleneck that arises.
I have a cluster that includes three nodes. We created a bucket in it and set the number of bucket replicas to 2. The RAM quota is set to 10 GB per node, that is, the total RAM quota is 30 GB.
I used the client side to save data into this bucket. Hours later, the client printed a Temporary failure error, and the Couchbase web console showed that the bucket's RAM usage had reached 29 GB. I repeatedly compacted the data, but the RAM usage did not go down any further.
My questions are organized as follows.
1. I guess the keys in a bucket can only be saved in RAM, not in hardware. Right or wrong?
2. Is the 29 GB of data, which cannot be compressed onto hardware, made up of keys or not?
3. Is the replica information that each node saves for other nodes stored in hardware or not? If not, how is it stored?
4. Every time the client side saves data, it uses a hash function to compute the vbucket in order to determine which nodes the data will be saved on. Is this process carried out on the client side?
In response to your specific questions:
1. I guess the keys in a bucket can only be saved in RAM, not in hardware. Right or wrong?
If by hardware you mean disk, then yes: currently Couchbase must hold all document keys (along with some additional metadata) in RAM. This is to ensure that any request for a key can be answered immediately, both in the positive ("yes, this key exists and here's its value") and in the negative ("no, such a key doesn't exist").
2. Is the 29 GB of data, which cannot be compressed onto hardware, made up of keys or not?
Some of this is probably the metadata. If you go to the Bucket tab and display a bucket's statistics by clicking on its name, you can see the amount of memory used - look specifically under the VBucket Resources tab to see how much is used for metadata and for user data. See the Couchbase Admin Guide - Viewing Bucket and cluster statistics for more details.
3. Is the replica information that each node saves for other nodes stored in hardware or not? If not, how is it stored?
The replica metadata is also always kept in RAM, but the replica values (like active values) can be ejected to disk to free up memory.
4. Every time the client side saves data, it uses a hash function to compute the vbucket in order to determine which nodes the data will be saved on. Is this process carried out on the client side?
Yes, the vbucket hashing is done on the client - see the Architecture and Concepts - Vbuckets section in the Admin guide.
In general, you may want to review the Sizing chapter in the Admin guide to determine how much of your memory is being used for storing key metadata - specifically the Memory Sizing section. The exact calculation depends on the version of Couchbase (and so I won't duplicate it here).
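As a back-of-the-envelope illustration only (the per-document overhead and counts below are assumed figures for the example, not the numbers from any specific Couchbase version), the key/metadata RAM requirement is roughly:

```python
# Illustrative sizing sketch; all numbers are made up for the example.
documents        = 100_000_000   # items in the bucket
avg_key_size     = 30            # bytes per document ID
metadata_per_doc = 56            # assumed metadata overhead per document, in bytes
copies           = 1 + 2         # one active copy plus two replicas

metadata_ram = documents * (avg_key_size + metadata_per_doc) * copies
print(f"RAM needed just for keys and metadata: {metadata_ram / 2**30:.1f} GiB")
# -> roughly 24 GiB before any document values are cached at all
```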
I am curious to know how Couchbase Server supports high concurrency and high throughput.
It's a very broad question to answer, but I'll try to cover some of the key reasons why Couchbase is fast and scalable.
Writes in Couchbase are asynchronous by default: replication and persistence happen in the background, and the smart clients (SDKs) are notified of success or failure. So basically any new documents or mutations to documents are written to RAM and then asynchronously flushed to disk in the background and replicated to other nodes. This means that there is no waiting time or contention on I/O or disk speed. (It also means it is possible to write to RAM and then have the node fall over before the request has been persisted to disk or replicated to a second or third node.) It is possible to make writes synchronous, but that will slow down throughput considerably.
When dealing with RAM, writes and reads are VERY fast (we've only pushed our cluster to 20k operations a second), but large companies easily hit upwards of 400k operations a second. LinkedIn sustains this ops rate with only 4 nodes - see http://www.couchbase.com/customer-stories
In traditional database architectures, the setup would usually be a master DB (MySQL/Postgres/Oracle) coupled with a slave DB for data redundancy; writes and reads can also be split between the two as load gets higher. Couchbase is meant to be used as a distributed system (Couchbase recommends at least 3 nodes in production). Data is automatically sharded between the nodes in a cluster, thus spreading the writes and reads across multiple machines. If you need higher throughput, adding a node in Couchbase is as simple as clicking "add node" and then rebalancing the cluster; the data will be automatically partitioned across the new cluster map.
So essentially: writing and reading from RAM with async disk persistence + distributed reads and writes == high throughput.
Hope that helps!
#scalabilitysolved already gave a great overview, but if you want a longer (and more detailed) description take a look at the Couchbase_Server_Architecture_Review on couchbase.com
I searched for an explanation of how Couchbase achieves strong consistency inside a cluster. Is all of this a result of using Membase?
Couchbase IS Membase, by the way. Couchbase is both a product and a company; the company is a merger of NorthScale (Membase) and the CouchDB founders, and the resulting name for both the company and the product was Couchbase.
Update operations (replace and [forced] set) update the RAM cache first, and subsequent reads return the new value; this is the consistency model.
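A small sketch of what that looks like from the (older) Python SDK, with placeholder keys and values:

```python
from couchbase import Couchbase

# Placeholder connection; the point is the ordering of operations, not the setup.
cb = Couchbase.connect(host="localhost", bucket="default")

cb.set("counter::1", {"value": 1})       # write lands in the active node's RAM cache first
cb.replace("counter::1", {"value": 2})   # the update also hits the RAM cache first

# A later read is served by the same active node, so it already returns the new
# value even if the mutation has not yet been persisted to disk or replicated.
assert cb.get("counter::1").value == {"value": 2}
```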
Couchbase has an "eventually persisted" (EP) architecture, where CRUD operations update the RAM cache first and are inserted into the EP queue for disk I/O. At the same time, when replicas are configured, they go into replica queues and are transferred to the other nodes. The EP architecture is what allows for immediate consistency and super high throughput, as disk I/O is the slowest component of all systems.
As WiredPrairie mentioned, a single node is responsible (active) for a given key. The key is hashed, and the result of the hash is the particular partition it should live in. The partition-to-Couchbase-node map, which the SDKs maintain, allows them to go directly to the active node for each partition. Again, this reduces latency because a request doesn't have to go through a load balancer (it's load-balanced by the architecture itself), nor through a "master" node (each node is a master), nor through a "shard master" whose job is to redirect clients to a particular shard. By bypassing all of those, latency is reduced to a minimum.
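A simplified sketch of that key-to-node lookup (the real SDKs use a CRC32-based hash and a vBucket map downloaded from the cluster; the map and node names below are made up):

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase's default number of vBuckets (partitions)

# Hypothetical vBucket map: vBucket index -> node that is active for it.
# In a real cluster the SDK fetches and maintains this map itself.
vbucket_map = (["node-a:11210", "node-b:11210", "node-c:11210"] * NUM_VBUCKETS)[:NUM_VBUCKETS]

def active_node_for(key: str) -> str:
    # Hash the key to a vBucket, then look the vBucket up in the map.
    # The exact hashing is defined by the client protocol; this shows the idea.
    vbucket_id = zlib.crc32(key.encode()) % NUM_VBUCKETS
    return vbucket_map[vbucket_id]

print(active_node_for("user::1001"))  # the client sends the request straight to this node
```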
Couchbase guarantees strong consistency by enforcing that all reads for a particular piece of data go to a single node in a cluster. You cannot read from a replica. If you could, you might end up with inconsistent data.
When using the 2.0 XDCR, Couchbase provides only eventual consistency.
I wouldn't say it's a "result" of anything other than a specific design requirement they had for their software.
There's some additional information in this blog post.
I don't think it is strong consistency, since if a node with an active vbucket reboots before data has been replicated or persisted, that data will be lost.
Strong consistency requires W + R > N, and R = 1 here, so we would need W = N, which means all replicas would have to be ACID.
We could call it fake strong consistency.