What is the ideal approach to archive and delete documents in Couchbase?

What is the suggested approach for archiving Couchbase documents, based on certain criteria, to a secondary database, considering the secondary cluster is also Couchbase?
After moving those documents to the secondary cluster, they must be deleted from the primary database; that is the requirement.
What is an efficient approach to move millions of documents to the other (secondary) Couchbase cluster and then delete them from the primary database?

If you want the archiving to be continuous, you could configure Cross Data Center Replication (XDCR) to the secondary cluster, with a filter that ignores document expiry and deletion.
Specifically, enable:
Do not replicate document expirations
Remove TTL from replicated items
Do not replicate DELETE operations
Then set a max-TTL on the bucket in the source cluster so the documents there expire automatically. Alternatively, delete documents from the source cluster on your own schedule.
CAVEAT: You'll want to keep a close eye on the XDCR status to make sure documents are replicated before they are deleted/expired.
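If you would rather archive in batches than run XDCR continuously, the key safety property is the same: copy first, verify, and only then delete. Here is a minimal sketch of that pattern, with plain dicts standing in for the two Couchbase buckets (a real job would use the Couchbase SDK, batch its operations, and handle retries):

```python
def archive_documents(primary, secondary, should_archive):
    """Copy matching docs to the secondary, verify, then delete from primary.

    `primary` and `secondary` are dicts standing in for Couchbase buckets;
    `should_archive` is a predicate over (key, doc). Illustrative only.
    """
    to_delete = []
    for key, doc in list(primary.items()):
        if should_archive(key, doc):
            secondary[key] = doc           # 1. write to the archive cluster
            if secondary.get(key) == doc:  # 2. verify the copy landed
                to_delete.append(key)
    for key in to_delete:                  # 3. only then delete from primary
        del primary[key]
    return len(to_delete)

primary = {"order::1": {"status": "closed"}, "order::2": {"status": "open"}}
secondary = {}
moved = archive_documents(primary, secondary, lambda k, d: d["status"] == "closed")
```

The verify step is what protects you from the caveat above: a document is only removed from the primary once its copy is confirmed on the secondary.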

Related

Why cloning an RDS cluster is faster and more space efficient than restoring a snapshot

I want to create a duplicate (clone) of an Aurora DB cluster.
Both the source and the copy are in the same region, and both are for dev purposes.
Both are MySQL.
I want to access each cluster via a different URL.
I have been reading about the copy-on-write protocol for Aurora cloning, and about SQL snapshots.
The AWS docs state: "Creating a clone is faster and more space-efficient than physically copying the data using a different technique such as restoring a snapshot." (source)
Yet I don't quite understand why using a snapshot is an inferior solution.
A snapshot is slower because the first snapshot copies the entire DB storage:
The amount of time it takes to create a DB cluster snapshot varies with the size your databases. Since the snapshot includes the entire storage volume, the size of files, such as temporary files, also affects the amount of time it takes to create the snapshot.
So if your database has, say, 100 GB, the first snapshot of it will require copying 100 GB. This operation can take time.
In contrast, when you clone, no copy is done at first. Both the original and the new database use the same storage. Only when a write operation is performed do they start to diverge.
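The difference can be pictured with a toy copy-on-write model (illustrative only, not Aurora's actual page format): the clone shares the parent's storage and materializes a page only when that page is written.

```python
class ClonedVolume:
    """Toy copy-on-write clone: reads fall through to the parent's pages;
    only pages that have been written are materialized in the clone."""

    def __init__(self, parent_pages):
        self.parent = parent_pages   # shared storage, never copied up front
        self.own = {}                # pages that diverged after a write

    def read(self, page_id):
        return self.own.get(page_id, self.parent[page_id])

    def write(self, page_id, data):
        self.own[page_id] = data     # copy-on-write: materialize only now

source = {0: "page-A", 1: "page-B"}  # stands in for 100 GB of pages
clone = ClonedVolume(source)
# Creating the clone copied nothing: zero extra pages so far.
clone.write(1, "page-B'")
# Only the written page diverged; page 0 is still shared with the source.
```

A snapshot restore would have copied both pages up front; the clone pays that cost lazily, and only for pages that actually change.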

Connecting 3rd party reporting tools to MySQL

I have an application that runs on a MySQL database, the application is somewhat resource intensive on the DB.
My client wants to connect Qlikview to this DB for reporting. I was wondering if someone could point me to a white paper or URL regarding the best way to do this without causing locks etc on my DB.
I have searched Google to no avail.
Qlikview is an in-memory tool with preloaded data, so your client only has to pull data during periodic reloads, not all the time.
The best approach is for your client to schedule the reload once per night and make it incremental. If your tables only ever receive new records, load each night only the records with a primary key greater than the last one loaded.
If your tables have records that get modified, you need to add a last_modified_time column in MySQL, and preferably an index on that column as well:
last_modified_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
If rows get deleted, it is best to soft-delete them (set deleted=1) in MySQL; otherwise your client will need to reload those tables in full just to discover which rows disappeared.
Additionally, to save resources, your client should load the data in a really simple style, one query per table, without JOINs:
SELECT [fields] FROM TABLE WHERE `id` > $(vLastId);
Qlikview is really good and fast at data modelling/joins, so your client can build the whole data model inside Qlikview.
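The incremental strategy above amounts to generating a simple watermark query per table. A sketch of that logic (table and column names are illustrative; in Qlikview the watermark would live in a variable such as vLastId):

```python
def incremental_query(table, last_id=None, last_modified=None):
    """Build the per-table incremental extract, avoiding JOINs.

    Prefer the last_modified_time watermark for tables whose rows are
    updated in place; fall back to the primary-key watermark for
    append-only tables. Names are illustrative, not a real schema.
    """
    if last_modified is not None:
        return (f"SELECT * FROM {table} "
                f"WHERE last_modified_time > '{last_modified}'")
    if last_id is not None:
        return f"SELECT * FROM {table} WHERE id > {last_id}"
    return f"SELECT * FROM {table}"  # first run: full extract

q = incremental_query("orders", last_id=41250)
```

The watermarks here come from the loader's own bookkeeping, not from user input; a query against anything user-supplied should be parameterized instead of string-built.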
Reporting can indeed cause problems on a busy transactional database.
One approach worth examining is to run a replica (slave) of your database. MySQL supports this very well, and the replica's data can be as up to date as you require. You can then attach any reporting system to the replica and run heavy reports without affecting your main database. The replica also gives you a second copy of the data, which can in turn be used to take offline backups, again without touching the main database.
There is plenty of information on setting up MySQL replicas, so that part is not too hard.
I hope that helps.

Where can I find the clear definitions for a Couchbase Cluster, Couchbase Node and a Couchbase Bucket?

I am new to Couchbase and NoSQL terminology. From my understanding, a Couchbase Node is a single system running the Couchbase Server application, and a collection of such nodes holding the same data by replication forms a Couchbase Cluster.
Also, a Couchbase Bucket is somewhat like a table in an RDBMS, in which you put your documents. But how do I relate the Node to the Bucket? Can someone please explain it to me in simple terms?
a Node is a single machine (1 IP/ hostname) that executes Couchbase Server
a Cluster is a group of Nodes that talk to each other. Data is distributed between the nodes automatically, so the load is balanced. The cluster can also provide replication of data for resilience.
a Bucket is the "logical" entity where your data is stored. It is both a namespace (like a database schema) and a table, to some extent. You can store multiple types of data in a single bucket; it doesn't care what form the data takes as long as it is a key and its associated value (so you can store users, apples and oranges in the same Bucket).
The Bucket gives the level of granularity for things like configuration (how much of the available memory do you want to dedicate to this bucket?), replication factor (how many backup copies of each document do you want on other nodes?), password protection...
Note that I said Buckets were a "logical" entity? They are in fact divided into 1024 virtual fragments, called vBuckets, which are spread across all the nodes of the cluster (that's how data distribution is achieved).
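The key-to-vBucket mapping can be sketched as follows. This mirrors the hashing scheme commonly described for Couchbase clients (CRC32 of the key, folded down and taken modulo the vBucket count); treat it as an illustration rather than the SDK's exact code.

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase's default number of virtual buckets

def vbucket_id(key: str) -> int:
    """Map a document key to one of the 1024 vBuckets (illustrative)."""
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % NUM_VBUCKETS

# Every key lands deterministically in exactly one vBucket; the cluster
# map then assigns each vBucket to a node (plus its replica nodes).
vb = vbucket_id("user::1001")
```

Because the mapping is deterministic, any client can compute which node owns a key without asking a central coordinator; only the vBucket-to-node map needs to be kept up to date.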

How to ignore delete in couchbase unidirectional replication?

I have bucket1 at datacenter1 and I am replicating it to datacenter2 via XDCR (unidirectional).
Now I want any update to or creation of a document in bucket1 on datacenter1 to be replicated to datacenter2, but any delete of a document in bucket1 on datacenter1 should not be replicated to datacenter2.
In simple terms, I want to ignore delete operations on documents during unidirectional replication.
How can I do that? Is there any setting in Couchbase (I am using Couchbase Server 3.0) or any other way possible?
At the moment, there is no way to do that with XDCR in Couchbase. Is your goal to archive all data in one location but have just operational data in the other? It may be worth filing a feature request over at https://issues.couchbase.com.
Since the source is open and what you're talking about isn't a super hard feature, you could try implementing it at the receiving side. Most of the hard work is configuration/UI kinds of things.

Mysql cluster for dummies

So what's the idea behind a cluster?
You have multiple machines with the same copy of the DB across which you spread the reads/writes? Is this correct?
How does this idea work? When I make a SELECT query, does the cluster analyze which server has fewer reads/writes and point my query to that server?
When should you start using a cluster? I know this is a tricky question, but maybe someone can give me an example, like 1 million visits and a 100-million-row DB.
1) Correct, with one caveat: no single data node holds a full copy of the cluster data, but every single piece of data is stored on at least two nodes.
2) Essentially correct. MySQL Cluster supports distributed transactions.
3) When vertical scaling is not possible anymore, and replication becomes impractical :)
As promised, some recommended readings:
Setting Up Multi-Master Circular Replication with MySQL (simple tutorial)
Circular Replication in MySQL (higher-level warnings about conflicts)
MySQL Cluster Multi-Computer How-To (step-by-step tutorial, it assumes multiple physical machines, but you can run your test with all processes running on the same machine by following these instructions)
The MySQL Performance Blog is a reference in this field
1 -> Your first point is correct, in a way. But if multiple machines shared the same data, that would be replication rather than clustering.
In clustering, the data is divided among the various machines by horizontal partitioning, meaning the division is based on rows: the records are distributed among the machines by some algorithm.
The data is divided in such a way that each record gets a unique key, just as in a key-value pair, and each machine has a unique machine_id, which is used to decide which key-value pair goes to which machine.
Each machine in the cluster is a node consisting of an individual mysql-server, its own share of the data, and a cluster manager; data is also shared between all the cluster nodes so that all of it is available to every node at any time.
Retrieval of data can go through memcached devices/servers for fast access,
and there can also be a replication server for a particular cluster to back up the data.
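The key-to-machine assignment described above can be sketched like this (a toy hash-based placement; MySQL Cluster's real partitioning function on the table's partition key is different, this only illustrates the idea):

```python
import zlib

def machine_for(key: str, machine_ids: list) -> str:
    """Pick which machine stores a given record's key-value pair.

    Toy horizontal partitioning: hash the record's key and map it onto
    the list of machine_ids. Illustrative, not MySQL Cluster's code.
    """
    return machine_ids[zlib.crc32(key.encode("utf-8")) % len(machine_ids)]

machines = ["node-1", "node-2", "node-3"]
rows = ["order:1", "order:2", "order:3", "order:4"]
placement = {key: machine_for(key, machines) for key in rows}
# Every row is assigned to exactly one machine, deterministically.
```

Deterministic placement is what lets any node (or client) locate a record from its key alone, without scanning the whole cluster.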
2 -> Yes, that is possible, because all the data is shared among the cluster nodes. You can also use a load balancer to balance the load. The idea of a load balancer is quite common, since most serving setups use one, but if you are trying this just for your own learning there is no need: you will not see the kind of load that requires one, and the cluster manager itself can handle the whole thing.
3 -> RandomSeed is right. You feel the need for a cluster when replication becomes impractical: if you are using the master server for writes and a slave for reads, then at some point, when the traffic becomes so huge that the servers can no longer keep up, you will feel the need for clustering, simply to speed the whole process up.
This is not the only case; it is just one example scenario.
Hope this is helpful for you!