Can I really persist data on disk using Couchbase?

I have a lot of data that needs to be stored on disk.
Since it is only key-value pairs, I want to use Couchbase for it.
The data is several GB and I only allocated 1 GB of RAM to the bucket.
I thought the RAM given to Couchbase was only a cache.
But after inserting a lot of data I got:
Hard Out Of Memory Error. Bucket "test2" on node 100.66.32.169 is full. All memory allocated to this bucket is used for metadata.
when I opened the Couchbase web console.
Can Couchbase be a database that stores data on disk, or is it RAM-oriented?
Update:
OK, let me make the question more specific.
In Couchbase:
1. If I allocate 1 GB of RAM to a bucket, can I store 10 GB of data in that bucket?
2. If I can do 1., can I consider that 1 GB of RAM to be a kind of cache for the 10 GB of data (just like a CPU L2 cache is a cache for RAM)?

By default, Couchbase stores all keys (and some metadata) in RAM, and fills whatever remains with values. Starting with version 3.0, you can set your bucket to full-eviction mode, which only keeps the keys of cached documents in RAM. This lets you store much more data than you have memory, but at a performance cost for some read operations, especially when trying to retrieve keys that don't exist.
To solve your specific problem, edit the bucket and set it to full metadata eviction. Note that this will restart the bucket.
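If you'd rather script the change than click through the web console, something along these lines should work against the bucket-edit REST endpoint. This is only a rough sketch: the credentials are placeholders and the exact parameter name should be checked against the REST docs for your server version.

# Minimal sketch (not verified against your server version): switch an
# existing bucket to full eviction via the bucket REST endpoint.
# Host, credentials and the parameter name below are assumptions.
import requests

COUCHBASE_HOST = "http://100.66.32.169:8091"   # admin port, node from the question
ADMIN_USER = "Administrator"                   # placeholder credentials
ADMIN_PASS = "password"

resp = requests.post(
    f"{COUCHBASE_HOST}/pools/default/buckets/test2",
    auth=(ADMIN_USER, ADMIN_PASS),
    data={"evictionPolicy": "fullEviction"},   # default is value-only eviction
)
resp.raise_for_status()
print("Bucket updated; note that changing the eviction policy restarts the bucket")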

Couchbase tries to keep as much of the "live dataset" (i.e. the most used / requested keys) as possible in the node's memory. This is key to the performance of the database, and part of its design, so good memory sizing of your nodes and quotas for your buckets is important.
It does offer persistence, but I'd say it is not a disk-first database.
Persistence to disk is mainly for two things: making data durable and resilient to node shutdown (of course), and offloading data (least-used data first) from RAM to disk.

I think you're asking a bunch of different questions here.
Specifically about the error message: it looks like your bucket is simply too small to hold all the data you're storing in it.
About persisting to disk: you can force Couchbase to write to disk (and even configure the number of nodes that documents are replicated to), but as noted above, that would probably hurt your performance a little.
Have a look, for example, at the persist_to flag in the set() API of the Python client for Couchbase.
couchbase client for python
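For illustration, here is a minimal sketch using the 2.x-era Python SDK (where set() became upsert()); the connection string, key and durability numbers are placeholders, not something taken from your setup.

# Rough sketch with the 2.x Python SDK mentioned above; values are placeholders.
from couchbase.bucket import Bucket

bucket = Bucket("couchbase://localhost/test2")  # hypothetical connection string

# persist_to=1: don't return until the value has been written to disk on at
# least one node; replicate_to=1: and replicated to at least one replica.
bucket.upsert("user::1234", {"name": "alice"}, persist_to=1, replicate_to=1)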

Related

Elasticsearch stuck at 213 documents and lost data

I have an Elasticsearch setup with 1 node and no replica nodes, sharing a droplet with a Kibana setup on DigitalOcean. My droplet has 2 GB of RAM and enough CPU. My Elasticsearch JVM is set to use 768 MB of RAM (so Kibana can have its share).
My problem is that I seem to be losing data, since my node is stuck at 213 documents and I have already noticed that some important documents are gone.
I couldn't find documentation on how this works. The only thing I found is that more RAM is better when dealing with large amounts of data, and that having a secondary node to store replicas is good practice.
Should I allocate more RAM? How can I know whether my data is being deleted so that I can allocate more? Is this some sort of pagination? Could this be a Kibana problem?
I solved it. The problem was my ID generation. I was generating IDs manually and not persistently storing the last ID generated. Once the system restarted, the IDs were lost, and Elasticsearch allows ID overriding, so new documents overwrote existing ones.
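For anyone who lands here with the same symptom, the simplest fix is to let Elasticsearch generate the document IDs itself. A minimal sketch with the Python client (the index name and document are made up, and this assumes a 7.x-era elasticsearch-py client where doc_type is no longer required):

# Omitting the id lets Elasticsearch assign a unique one, so a restarted ID
# counter can no longer silently overwrite existing documents.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

doc = {"message": "example event", "level": "info"}

# No explicit id, unlike es.index(..., id=my_counter) which replaces any
# document that already has that id.
es.index(index="app-logs", body=doc)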

Couchbase metadata overhead warning: 62% of RAM is taken up by keys and metadata

Okay, since I don't have 10 reputation I'm unable to post images, but I will try to explain in text.
I have a 7 node Couchbase (Community) cluster with 4 buckets.
Recently I've been getting spammed (constantly) with metadata overhead warnings for one of the buckets.
The warning pops up and looks like this:
Metadata overhead warning. Over 62% of RAM allocated to bucket XXXX on node "xxx" is taken up by keys and metadata.
And I've read that this is usually a sign that the bucket needs more RAM. But I don't think that is the issue for me; I simply have a lot of metadata, I would guess.
When I look at the Data Buckets tab, this bucket shows RAM/Quota Usage of 64 GB/75 GB. So to me it looks like there is around 11 GB (75 - 64 GB) available.
If I look at the Bucket Analytics VBUCKET RESOURCES metrics, I see that there is 59 GB of user data in RAM and 46 GB of metadata in RAM. So to my understanding there should be 105 GB in RAM on a bucket that has a total of 75 GB!?!
But that doesn't add up for me, so clearly there is something that I don't understand here.
And yes, 46 GB of 75 GB is around 62%. But what about the 59 GB of user data that is supposedly in RAM?
EDIT:
A typical document can look like this:
ID=1:CAESEA---rldZ5PhdV4msSdEchI
CONTENT=z2TjZEzkZ84=
And now to my question: what do I do? Is the situation acceptable in my circumstances? If so, do I change the threshold for that warning (which I read is not recommended, since the warning is set at 50% for a reason)?
Or do I assign more RAM? And if so, how does that help me if there is already 11 GB free?
Please help me clarify these numbers and suggest if I need to take any actions.
First of all, there isn't necessarily a problem with having a high percentage of memory used by metadata - it just means there's less RAM available for caching actual documents. If your application is working well, then it may be fine for your use case. Having said that, let me try to address your questions and what to change if you do want to improve things:
If I look at the Bucket Analytics VBUCKET RESOURCES metrics, I see that there is 59 GB of user data in RAM and 46 GB of metadata in RAM. So to my understanding there should be 105 GB in RAM on a bucket that has a total of 75 GB!?!
IIRC "user data in RAM" is inclusive of "metadata in RAM" - so you have a total of 59 GB of data in RAM, of which 46 GB is metadata.
And now to my question: what do I do? Is the situation acceptable in my circumstances? If so, do I change the threshold for that warning (which I read is not recommended, since the warning is set at 50% for a reason)?
Or do I assign more RAM? And if so, how does that help me if there is already 11 GB free?
So basically you are storing lots of very small documents, and the per-document metadata overhead (~48 bytes plus the length of the key) is very high compared to the actual document size.
The 11 GB free is mainly made up of the difference between the bucket quota and the high watermark.
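To get a feel for the numbers, here is a back-of-the-envelope sketch of that overhead; the document count, key length and replica count below are made-up values for illustration, not figures from your cluster.

# Rough estimate of metadata RAM, using the ~48 bytes + key length figure above.
METADATA_PER_DOC = 48          # approximate fixed overhead per document (bytes)

def metadata_ram_gb(num_docs, avg_key_len, num_replicas):
    copies = 1 + num_replicas  # active vBuckets plus each replica vBucket set
    total_bytes = (METADATA_PER_DOC + avg_key_len) * num_docs * copies
    return total_bytes / (1024 ** 3)

# e.g. lots of tiny documents with ~30-byte keys and 2 replicas (illustrative):
print(round(metadata_ram_gb(num_docs=200_000_000, avg_key_len=30, num_replicas=2), 1), "GB")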
Here are a few options to improve this:
Allocate more RAM to the bucket (as you mentioned) - if there's any unallocated in the Server Quota.
Add more memory to the nodes (and allocate to the server quota and bucket).
Reduce the number of replicas (if that's acceptable to you) - at the moment you are essentially storing each object (and its metadata) three times: once for the active vBuckets and twice for the two replica vBucket sets.
Change your documents to have shorter keys - This will reduce the average metadata per document.
Consolidate multiple documents into one - This will reduce the number of documents, and hence the overall metadata overhead.

Couchbase RAM quota and vBucket detail questions

I had a cluster which includes three nodes. We created a bucket in it and set the number of bucket replicas to 2. The RAM quota is set to 10 GB per node, that is, the total RAM quota is 30 GB.
I used the client side to save data into this bucket. Hours later, the client side printed a Temporary failure error, and the Couchbase web console showed that the bucket's RAM usage had reached 29 GB. I ran data compaction repeatedly, but RAM usage did not go down any further.
My questions are as follows:
1. I guess the keys in a bucket can only be kept in RAM, not on disk (hardware). Right or wrong?
2. Is the 29 GB of data, which cannot be ejected to disk, made up of keys or not?
3. Is the replica information that each node holds for other nodes stored on disk or not? If not, how is it stored?
4. Every time the client saves data, it uses a hash function to work out the vBucket, in order to determine which node the data will be saved on. Is this process carried out on the client side?
In response to your specific questions:
1. I guess the keys in a bucket can only be kept in RAM, not on disk (hardware). Right or wrong?
If by hardware you mean disk, then yes: currently Couchbase must hold all document keys (along with some additional metadata) in RAM. This is to ensure that any request for a key can be answered immediately, both in the positive ("yes, this key exists and here's its value") and the negative ("no, such a key doesn't exist").
2. Is the 29 GB of data, which cannot be ejected to disk, made up of keys or not?
Some of this is probably metadata. If you go to the Bucket tab and display the bucket's statistics by clicking on its name, you can see the amount of memory used - specifically under the VBucket Resources tab, which shows how much is used for metadata and how much for user data. See the Couchbase Admin Guide - Viewing Bucket and cluster statistics for more details.
3. Is the replica information that each node holds for other nodes stored on disk or not? If not, how is it stored?
The replica metadata is also always kept in RAM, but the replica values (like active values) can be ejected to disk to free up memory.
4. Every time the client saves data, it uses a hash function to work out the vBucket, in order to determine which node the data will be saved on. Is this process carried out on the client side?
Yes, the vBucket hashing is done on the client - see the Architecture and Concepts - vBuckets section in the Admin guide.
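For illustration, here is a simplified sketch of what the SDKs do under the hood; the exact hashing constants are an implementation detail of the client libraries, so treat this as illustrative only.

# Simplified sketch of client-side vBucket hashing.
import zlib

NUM_VBUCKETS = 1024  # default number of vBuckets in Couchbase Server

def vbucket_for_key(key: str) -> int:
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % NUM_VBUCKETS

# The client then looks the vBucket up in the cluster's vBucket map
# (pushed to clients by the cluster) to find the node that owns it.
vb = vbucket_for_key("user::1234")
print(f"key 'user::1234' maps to vBucket {vb}")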
In general you may want to review the Sizing chapter in the Admin guide to determine how much of your memory is being used for storing key metadata - specifically the Memory Sizing section. The exact calculation depends on the version of Couchbase (so I won't duplicate it here).

Does couchbase actually support datasets larger than memory?

Couchbase documentation says that "Disk persistence enables you to perform backup and restore operations, and enables you to grow your datasets larger than the built-in caching layer," but I can't seem to get it to work.
I am testing Couchbase 2.5.1 on a three-node cluster, with a total of 56.4 GB of memory configured for the bucket. After ~124,000,000 100-byte objects (about 12 GB of raw data) it stops accepting additional puts. 1 replica is configured.
Is there a magic "go ahead and spill to disk" switch that I'm missing? There are no suspicious entries in the errors log.
It does support data greater than memory - see Ejection and working set management in the manual.
In your instance, what errors are you getting from your application? When you start to reach the low memory watermark, items need to be ejected from memory to make room for newer items.
Depending on the disk speed and the rate of incoming items, this can result in TEMP_OOM errors being sent back to the client, telling it that it needs to temporarily back off before performing the set, but these should generally be rare in most instances. Details on handling these can be found in the Developer Guide.
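Handling those temporary failures typically means retrying with a short back-off. A rough sketch, assuming the 2.x Python SDK where they surface as TemporaryFailError; the exception name, connection string and delays are assumptions to adapt to your SDK version.

# Retry-with-backoff sketch for TEMP_OOM responses (2.x Python SDK assumed).
import time
from couchbase.bucket import Bucket
from couchbase.exceptions import TemporaryFailError

bucket = Bucket("couchbase://localhost/mybucket")  # placeholder connection string

def set_with_backoff(key, value, retries=5, delay=0.1):
    for attempt in range(retries):
        try:
            return bucket.upsert(key, value)
        except TemporaryFailError:
            # Server is busy ejecting / persisting items; back off and retry.
            time.sleep(delay * (2 ** attempt))
    raise RuntimeError(f"still getting temporary failures after {retries} retries")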
My guess would be that it's not the raw data that is filling up your memory, but the metadata associated with it. Couchbase 2.5 needs 56 bytes per key, so in your case that would be approximately 7 GB of metadata, which is much less than your memory quota.
But... metadata can become fragmented in memory. If you batch-inserted all 124M objects in a very short time, I would assume you got at least 90% fragmentation. That means that with only 7 GB of useful metadata, the space required to hold it has filled up your RAM, with lots of unused space in each allocated block.
The solution to your problem is to defragment the metadata by compacting the bucket. Compaction can either be triggered manually (there is a Compact button per bucket in the web console) or configured to run automatically via the auto-compaction settings.
If you need more insights about why compaction is needed, you can read this blog article from Couchbase.
Even if none of your documents is stored in RAM, Couchbase still stores all the document IDs and metadata in memory (this will change in version 3.0), and it also needs some available memory to run efficiently. The relevant section in the docs:
http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#memory-quota
Note that when you use a replica you need twice as much RAM. The formula is roughly:
(56 + avg_size_of_your_doc_ID) * nb_docs * 2 (replica) * (1 + headroom) / (high_water_mark)
So depending on your configuration, it's quite possible that 124,000,000 documents require 56 GB of memory.
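Written out as code, with every input a placeholder you would swap for your own numbers (the 100-byte average key size below in particular is purely an assumption):

# The sizing formula above, written out; all inputs are illustrative defaults.
def required_ram_gb(nb_docs, avg_key_size, metadata_per_doc=56,
                    replicas=1, headroom=0.25, high_water_mark=0.85):
    copies = 1 + replicas  # active copy plus each replica
    bytes_needed = (metadata_per_doc + avg_key_size) * nb_docs * copies
    bytes_needed *= (1 + headroom)
    bytes_needed /= high_water_mark
    return bytes_needed / (1024 ** 3)

# e.g. the 124M documents from the question, assuming fairly long keys:
print(round(required_ram_gb(nb_docs=124_000_000, avg_key_size=100), 1), "GB")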

How to check a particular bucket's capacity in Couchbase?

How do I check a particular bucket's capacity in Couchbase? The idea is: if it is going to reach its capacity, make a new bucket and start inserting data into that bucket. Can that be done with the Java API?
My question is: if we have a big data set that we need to insert into Couchbase, and the bucket size is not enough... and I am using the REST API to make that happen... is there a way I can do something like:
if(bucket has reached capacity)
create another bucket dynamically createbucket( )
and insert data now into this newly created bucket
In addition to @m03geek's answer:
So, first of all, do not use buckets to manage space. The underlying disk is the same for all buckets (you can only choose one directory for data and another for indexes).
So the limit is the size of your disk; remember that the data is distributed across many nodes, so we can say that the space available for your database is the sum of the free space on all nodes.
It is also important to remember that couchbase has two types of files:
data
index
(and some replicas)
All data in these files is managed using an append-only approach; this means that the files grow and are then compacted. You can find more information about this here:
http://www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-admin-tasks-compaction.html
http://blog.couchbase.com/compaction-magic-couchbase-server-20
Also, when you create a bucket you have to set the RAM quota, which limits the size of the cache that is used to store all the metadata and to cache the values. Once again this is distributed over all nodes of the cluster; for example, if you have a 5-node cluster and you set a 2 GB RAM quota for your bucket, you have 10 GB of RAM available for this bucket.
The space is managed automatically by Couchbase, which removes data from RAM (after it has been persisted to disk) when necessary.
Finally, if you are looking for some stats from your cluster you can access many of them using the REST API documented here:
http://www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-admin-restapi.html
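For example, here is a rough sketch that pulls per-bucket usage from that REST API; the host, credentials and exact field names are assumptions and may differ between Couchbase versions.

# Sketch: list buckets with their basic RAM/disk usage via the admin REST API.
import requests

resp = requests.get(
    "http://localhost:8091/pools/default/buckets",
    auth=("Administrator", "password"),        # placeholder credentials
)
resp.raise_for_status()

for bucket in resp.json():
    stats = bucket.get("basicStats", {})
    print(bucket["name"],
          "RAM used:", stats.get("memUsed"),
          "disk used:", stats.get("diskUsed"),
          "items:", stats.get("itemCount"))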
Couchbase bucket size is equal to the free disk space. So if you have, for example, a 1 TB HDD and create one bucket, its capacity will be 1 TB. If you create 2 buckets or 100 buckets, the capacity of your HDD will not change. So if you run out of bucket size, buy more HDDs. The bucket size that you enter in the admin console is how much RAM Couchbase can consume to store frequently requested keys, not a limit on how much data can be stored.