I am setting up a Geth archive node, and it takes a lot of space. I have a few SSDs (1 TB each), and I want to configure Geth so that when SSD1 is full, it automatically continues storing new data on SSD2, and so on.
Any help?
Try setting up a ZFS pool or RAID array; this combines the drives into a single volume in a given configuration.
If you don't care about data loss in the event of a disk failure, you can use a RAID 0 (striped) configuration, although it is not recommended, since the failure of a single disk will result in the total loss of your data.
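If the goal is just to present the SSDs as one big volume for Geth's data directory, a striped pool is the simplest shape. A minimal sketch, assuming Linux with ZFS installed; the device names and paths are hypothetical, so adjust them to your system (check lsblk first):

```shell
# Create a striped pool (RAID 0 equivalent: no redundancy, one failed
# disk loses everything) spanning three 1 TB SSDs:
zpool create gethpool /dev/sdb /dev/sdc /dev/sdd

# ZFS mounts the pool at /gethpool by default; point Geth's data
# directory at it so the archive node writes into the combined space:
geth --datadir /gethpool/geth-data --syncmode full --gcmode archive
```

With striping you get the combined capacity and better throughput but no fault tolerance; a mirrored or raidz layout trades capacity for resilience.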
I have a lot of data that needs to be stored on disk.
Since it is only key-value pairs, I want to use Couchbase for it.
The data is several GB, and I only allocated 1 GB of RAM to the bucket.
I thought RAM in Couchbase was only a cache.
But after inserting a lot of data I got:
Hard Out Of Memory Error. Bucket "test2" on node 100.66.32.169 is full. All memory allocated to this bucket is used for metadata.
when I open the Couchbase web console.
Can Couchbase be used as a database that stores data on disk, or is it RAM-oriented?
Update:
OK, let me make the question more specific:
In Couchbase:
1. If I allocate 1 GB of RAM to a bucket, can I store 10 GB of data in that bucket?
2. If 1. is possible, can I consider the 1 GB of RAM a kind of cache for the 10 GB of data (just as the CPU's L2 cache is a cache for RAM)?
By default, Couchbase stores all keys (and some metadata) in RAM and fills whatever remains with values. Starting with version 3.0, you can set your bucket to full-eviction mode, which keeps keys in RAM only for cached documents. This lets you store much more data than you have memory, but at a performance cost for some read operations, especially attempts to retrieve keys that don't exist.
To solve your specific problem, edit the bucket and set it to full metadata eviction. Note that this will restart the bucket.
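For example, with couchbase-cli (the host, credentials, and bucket name are placeholders, and the flag name should be double-checked against your couchbase-cli version):

```shell
# Switch the bucket's ejection policy to full eviction, so keys and
# metadata of non-resident documents can be evicted from RAM too.
# Requires Couchbase Server 3.0+ and will restart the bucket.
couchbase-cli bucket-edit -c 127.0.0.1:8091 -u Administrator -p password \
  --bucket test2 --bucket-eviction-policy fullEviction
```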
Couchbase tries to keep as much of the "live dataset" (i.e. the most used / requested keys) as possible in the node's memory. This is key to the database's performance and part of its design, so proper memory sizing of your nodes and quotas of your buckets is essential.
It does offer persistence, but I'd say it is not a disk-first database.
Persistence to disk mainly serves two purposes: making data durable and resilient to node shutdown (of course), and offloading data (least-used first) from RAM to disk.
I think you're asking a bunch of different questions here.
Specifically about the error message: looks like your bucket is simply too small to hold all the data you're storing in it.
About persisting to disk: you can force Couchbase to write to disk (and even configure the number of nodes a document is replicated to), but as noted above, that would probably hurt your performance a little.
Have a look, for example, at the persist_to flag in the set() API of the Python client for Couchbase.
I have a Couchbase server storing a huge amount of data.
This data grows daily, but I also delete it daily after processing it.
Currently, the bucket holds about 1,320,168 items, with 2.97 GB of data usage.
So why is the disk usage so large, at 135 GB?
My disk is running low on space to store more data.
Could I delete the data log files to reduce disk usage?
Couchbase uses an append-only format for storage. This means that every update or delete operation is actually stored as a new entry in the storage file and consumes more disk space.
A process called compaction then reclaims the unnecessarily used space. Compaction can either be configured to run automatically, when a certain fragmentation percentage is reached in your cluster, or be run manually on each node.
IIRC auto-compaction is not on by default.
So what you probably want to do is run compaction on your cluster. Note that it may temporarily require quite a large amount of disk space, as noted here...
See the docs on how to perform compaction (in your case, I guess you have an "off-peak" window at the end of the business day where you currently delete data and could also perform compaction).
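A manual compaction run can also be triggered from the command line; a sketch with couchbase-cli (host, credentials, and bucket name are placeholders):

```shell
# Compact the data (and view) files for one bucket. Run this in an
# off-peak window: compaction writes a new, defragmented file before
# deleting the old one, so it temporarily needs extra disk space.
couchbase-cli bucket-compact -c 127.0.0.1:8091 -u Administrator -p password \
  --bucket mybucket
```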
PS: The folks in the official forums may have more insight and recommendations to offer.
I'm going to use RabbitMQ in a project where large amounts of data (~2*10^7 messages, 800 bytes each) need to be stored and processed. Of course, all this data won't fit in RAM, so my question is: how do I configure RabbitMQ to keep only part of the messages in RAM and the rest on disk?
Thank you.
Oops, found answer on my own question, let me share it:
Accordingly to http://www.rabbitmq.com/blog/2012/04/25/rabbitmq-performance-measurements-part-2/ :
When queues are small(ish) they will reside entirely within memory. Persistent messages will also get written to disc, but they will only get read again if the broker restarts. But when queues get larger, they will get paged to disc, persistent or not.
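As an aside, RabbitMQ versions newer than that 2012 post (3.6+) let you force this behaviour per queue with lazy queues, which move messages to disk as soon as possible instead of waiting for memory pressure. A sketch (the policy name and queue pattern are made up):

```shell
# Apply "queue-mode: lazy" to every queue whose name starts with "bulk.",
# so their messages are written to disk immediately rather than held in RAM.
rabbitmqctl set_policy lazy-bulk "^bulk\." '{"queue-mode":"lazy"}' --apply-to queues
```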
Couchbase documentation says that "Disk persistence enables you to perform backup and restore operations, and enables you to grow your datasets larger than the built-in caching layer," but I can't seem to get it to work.
I am testing Couchbase 2.5.1 on a three node cluster, with a total of 56.4GB memory configured for the bucket. After ~124,000,000 100-byte objects -- about 12GB of raw data -- it stops accepting additional puts. 1 replica is configured.
Is there a magic "go ahead and spill to disk" switch that I'm missing? There are no suspicious entries in the errors log.
It does support data greater than memory - see Ejection and working set management in the manual.
In your instance, what errors is your application getting? When you start to reach the low-memory watermark, items need to be ejected from memory to make room for newer ones.
Depending on the disk speed and the rate of incoming items, this can result in TEMP_OOM errors being sent back to the client, telling it to temporarily back off before retrying the set, but these should generally be rare in most instances. Details on handling them can be found in the Developer Guide.
My guess would be that it's not the raw data that is filling up your memory, but the metadata associated with it. Couchbase 2.5 needs 56 bytes per key, so in your case that would be approximately 7 GB of metadata, which by itself is much less than your memory quota.
But metadata can become fragmented in memory. If you batch-inserted all 124M objects in a very short time, I would assume you got at least 90% fragmentation. That means that even though there is only 7 GB of useful metadata, the space required to hold it has filled up your RAM, with lots of unused gaps in each allocated block.
The solution to your problem is to defragment the metadata by running compaction. It can either be run manually (from the web console or the command line) or triggered automatically when a configured fragmentation threshold is reached (see the auto-compaction settings).
If you need more insights about why compaction is needed, you can read this blog article from Couchbase.
Even if none of your documents is stored in RAM, Couchbase still stores all document IDs and metadata in memory (this will change in version 3), and it also needs some available memory to run efficiently. The relevant section in the docs:
http://docs.couchbase.com/couchbase-manual-2.5/cb-admin/#memory-quota
Note that when you use a replica you need twice as much RAM. The formula is roughly:
(56 + avg_size_of_your_doc_ID) * nb_docs * 2 (replica) * (1 + headroom) / (high_water_mark)
So depending on your configuration, it's quite possible that 124,000,000 documents require 56 GB of memory.
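As a rough worked example of that formula (the average document-ID size, headroom, and high-water mark below are assumed values, not figures from the question):

```shell
# (56 + 40-byte avg doc ID) * 124M docs * 2 copies (1 replica)
#   * 1.3 (30% headroom) / 0.85 (high-water mark)
# Using integer arithmetic: * 1.3 = * 13 / 10, and / 0.85 = * 100 / 85.
bytes=$(( (56 + 40) * 124000000 * 2 * 13 / 10 * 100 / 85 ))
echo "$bytes"   # prints 36412235294, i.e. ~36.4 GB for metadata alone
```

So with somewhat longer IDs or a larger headroom, the metadata alone can plausibly approach the 56.4 GB quota.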
I plan to mount persistent disks at the Apache (/var/www) and MySQL (/var/lib/mysql) folders to avoid having to replicate information between servers.
Has anyone run tests to see whether the I/O performance of a persistent disk attached to 100 instances is similar to one attached to only 2 instances? Also, is there a limit on how many instances one persistent disk can be attached to?
I'm not sure exactly what setup you're planning to use, so it's a little hard to comment specifically.
If you plan to attach the same persistent disk to all servers, note that a disk can only be attached to multiple instances in read-only mode, so you may not be able to use temporary tables, etc. in MySQL without extra configuration.
It's a bit hard to give performance numbers for a hypothetical configuration; I'd expect performance would depend on amount of data stored (e.g. 1TB of data will behave differently than 100MB), instance size (larger instances have more memory for page cache and more CPU for processing I/O), and access pattern. (Random reads vs. sequential reads)
The best option is to set up a small test system and run an actual load test using something like ApacheBench (ab), jmeter, or httperf. Failing that, you can try to construct an artificial load that's similar to your target benchmark.
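For instance, a first ApacheBench run against the page that exercises MySQL might look like this (the URL and numbers are placeholders):

```shell
# 10,000 requests total, 100 concurrent, against one representative page.
ab -n 10000 -c 100 http://your-server.example.com/index.php
```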
Note that just running bonnie++ or fio against the disk may not tell you if you're going to run into problems; for example, it could be that a combination of sequential reads from one machine and random reads from another causes problems, or that 500 simultaneous sequential reads from the same block causes a problem, but that your application never does that. (If you're using Apache+MySQL, it would seem unlikely that your application would do that, but it's hard to know for sure until you test it.)