Creating a full Couchbase replica

I have an environment with a Couchbase server that contains a lot of information (buckets, documents, indexes...). I want to copy all of this information to another environment's Couchbase server. Is there any way to achieve that? Maybe by saving something locally to my computer and then uploading it?

cbbackupmgr is probably what you are looking for (depending on which version of Couchbase you're using). It backs up (and restores) bucket settings, view definitions, GSI definitions, FTS index definitions, and key-value data.
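For illustration, here is a minimal sketch of the backup-and-restore flow, shown as Python subprocess calls; the archive path, repository name, hosts, and credentials are placeholders, and flag names can differ between Couchbase versions, so check cbbackupmgr help for your release.

```python
# Hypothetical sketch of driving cbbackupmgr; paths, repo name, hosts, and
# credentials below are placeholders, not values from the original question.
import subprocess

ARCHIVE = "/backups/cb-archive"   # local backup archive directory (assumption)
REPO = "prod-copy"                # backup repository name (assumption)

# Create the backup repository once.
subprocess.run(["cbbackupmgr", "config", "--archive", ARCHIVE, "--repo", REPO], check=True)

# Back up the source cluster (bucket settings, index definitions, key-value data).
subprocess.run([
    "cbbackupmgr", "backup",
    "--archive", ARCHIVE, "--repo", REPO,
    "--cluster", "couchbase://source-host",
    "--username", "Administrator", "--password", "password",
], check=True)

# Restore the backup into the target environment's cluster.
subprocess.run([
    "cbbackupmgr", "restore",
    "--archive", ARCHIVE, "--repo", REPO,
    "--cluster", "couchbase://target-host",
    "--username", "Administrator", "--password", "password",
], check=True)
```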

Related

Storing in a JSON file vs. JSON object vs. MySQL database

I've programmed a chat application using nodejs, expressjs and socket.io.
When I used a MySQL database the application slowed down, so I replaced it with data stored in JSON objects (on the server side of Node.js).
Now everything works and the application is fast, but whenever I release an update to the app.js file the server has to be restarted, so everything held in those JSON objects is lost!
How can I fix this problem? Can I store the data in a JSON file instead, and will the application stay just as fast?
Storing data in RAM will always be faster than writing to a database, but the problem in your case is that you need to persist it somewhere.
There are many solutions to this, but since you are already working with JSON, I recommend looking at MongoDB.
MongoDB supports multiple storage engines, and the one you are interested in is the in-memory engine.
An interesting architecture for you can be the following replica set configuration:
Primary with in-memory storage engine
Secondary with in-memory storage engine
Another secondary with the WiredTiger storage engine.
You get the speed benefit of keeping data in RAM, while the WiredTiger member still persists it to disk.
A simpler possibility would be to use a key-value store like Redis, which is easier to configure.
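As a rough illustration of the Redis option (shown in Python for brevity; the same pattern applies from Node.js), chat messages could be appended to a Redis list so they survive a restart of the application process. Key names and connection details below are made up, and the redis-py package is assumed.

```python
# Hedged sketch: persist chat messages in Redis instead of in-process objects,
# so an app restart does not lose them. Names are placeholders.
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def save_message(room: str, user: str, text: str) -> None:
    # Append the message to a per-room list kept by Redis (persisted to disk
    # by Redis itself, depending on its RDB/AOF configuration).
    r.rpush(f"chat:{room}", json.dumps({"user": user, "text": text}))

def load_history(room: str) -> list:
    # Read the full history back after an application restart.
    return [json.loads(m) for m in r.lrange(f"chat:{room}", 0, -1)]

save_message("lobby", "alice", "hello")
print(load_history("lobby"))
```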

Couchbase -> PouchDB deleted document issue

I have this setup: Couchbase -> Sync Gateway -> PouchDB.
I had a document in a Couchbase bucket which was later deleted (a manual delete from the bucket).
The problem is that when the sync happens on a new client, that specific deleted document comes down along with the other documents (I can see the doc ID and other data).
The strange thing is that I cannot find that doc anywhere, neither in the Couchbase buckets nor in the _sync bucket.
I also used cbft (Couchbase Full Text Search); the most it returns is the _sync information for that document.
Could it be possible that it exists in rev cache and trying to replicate?
Any help is appreciated. Thanks in Advance.
In general, you should currently avoid manipulating documents directly in Couchbase Server if you're using it with Sync Gateway. The reason is that Sync Gateway (and Couchbase Lite) need extra metadata for syncing, versioning, and conflict resolution.
If you set up bucket shadowing (which is deprecated), there is a "shadow bucket" that works along with a normal bucket to allow accessing a db through CB Server and Sync Gateway. It sounds like your doc is still in the shadow bucket.
Best practice is to run everything through Sync Gateway, and not manipulate documents directly on CB Server (meaning treat CB Server as a read-only source).
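As a hedged sketch of that practice, writes and deletes could go through Sync Gateway's public REST API rather than the bucket; the host, port (assumed default 4984), database name, and document ID below are placeholders.

```python
# Minimal sketch, not from the original answer: create/update and delete a
# document through Sync Gateway so its sync metadata and tombstones stay intact.
import requests

SG = "http://sync-gateway-host:4984/mydb"   # public REST endpoint (assumption)

# Create or update a document through Sync Gateway.
resp = requests.put(f"{SG}/doc-123", json={"type": "profile", "name": "alice"})
resp.raise_for_status()
rev = resp.json()["rev"]

# Delete it through Sync Gateway as well; this writes a tombstone that
# replicates correctly to PouchDB / Couchbase Lite clients.
requests.delete(f"{SG}/doc-123", params={"rev": rev}).raise_for_status()
```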

Mysql Data Directory on S3

I want MySQL to store its data on Amazon S3, so I mounted an S3 bucket on my server and changed the data directory path in my.cnf to the mounted directory.
After doing this, I restarted the server and created a database without any problem, but when I try to create a table (say test), it gives me the following error:
ERROR 1033 (HY000): Incorrect information in file: './test/t.frm'
Can anyone please tell me whether what I am trying to do is actually possible?
If yes, where am I going wrong?
If no, Why?
There is no viable solution for storing MySQL databases on S3. None.
There's nothing wrong with using s3fs in limited applications where it is appropriate, but it's not appropriate here.
S3 is not a filesystem. It is an object store. To modify a single byte of a multi-gigabyte "file" in S3 requires that the entire file be copied over itself.
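A rough boto3 illustration of that point (bucket and key names are made up): there is no in-place write, so "changing one byte" means reading and re-uploading the whole object.

```python
# Hedged sketch: with boto3 there is no way to modify bytes in place;
# "editing" an object means re-uploading the entire body.
import boto3

s3 = boto3.client("s3")

# Read the whole object...
body = s3.get_object(Bucket="my-bucket", Key="datadir/ibdata1")["Body"].read()

# ...change one byte locally...
patched = body[:42] + b"\x00" + body[43:]

# ...and write the entire object back. A multi-gigabyte file is re-uploaded
# for a one-byte change, which is why a database cannot run on top of this.
s3.put_object(Bucket="my-bucket", Key="datadir/ibdata1", Body=patched)
```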
Now... there are tools like s3nbd and s3backer that take a different approach to using S3 for storage. These use S3 to emulate a block device over which you can create a filesystem, and these would come closer than s3fs to being an appropriate bridge between what S3 is and what MySQL would need, but still this approach cannot reliably be used either, for one reason.
Consistency.
When MySQL writes data to a file, it needs absolute assurance that if it reads that same data, that it will get back what it wrote. S3 does not guarantee this.
Q: What data consistency model does Amazon S3 employ?
Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.
https://aws.amazon.com/s3/faqs/
When an object in S3 is "modified" (that's done with an overwrite PUT), there is no guarantee that a read of that file won't return a previous version for a short time after the write occurred.
In short, you are pursuing an essentially impossible objective trying to use S3 for something it isn't designed to do.
There is, however, a built-in mechanism in MySQL that can save on storage costs: InnoDB natively supports on-the-fly table compression.
Or if you have large, read-only MyISAM tables, those can also be compressed with myisampack.
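As a small sketch of the InnoDB option above (table name and connection details are placeholders; assumes mysql-connector-python and innodb_file_per_table enabled):

```python
# Hedged sketch: enable InnoDB table compression on an existing table.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="test")
cur = conn.cursor()

# Rebuild the table with compressed pages; KEY_BLOCK_SIZE controls the
# compressed page size in KB.
cur.execute("ALTER TABLE big_table ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8")

cur.close()
conn.close()
```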
Some EC2 instances include the ephemeral disks/instance store, which are zero-cost, but volatile, hard drives that should never be used for critical data, but that might be a good option to consider if the database in question is a secondary database that can easily be rebuilt from authoritative sources in the event of data loss. They can be quite nice for "disposable" databases, like QA databases or log analytics, where the database is not the authoritative store for the data.
Actually, S3 is not really a file system, so it will not work as a data directory in a normal scenario.
Maybe you can use it as the data directory by mounting the bucket at the data directory location (e.g. /var/lib/mysql), but it will still perform slowly, so I don't think it is a good idea.
An S3 bucket is a storage location where you can keep your images, files, backups, etc.
If you still want to use it as a data directory, this may help:
http://centosfaq.org/centos/s3-as-mysql-directory/
Files cannot be appended to or modified in AWS S3 once created, so it is probably not possible to store a MySQL database on S3.
MySQL with RocksDB engine can possibly do this:
Run MyRocks on S3FS, or
Use Rockset's RocksDB-Cloud and modify MyRocks to support RocksDB-Cloud.
Both approaches would likely require some modifications to MyRocks.
See the source code:
MyRocks
RocksDB-cloud

With Solr, do I need a SQL DB as well?

I'm thinking about using Solr to implement spatial and text indexing. At the moment, I have entries going into a MySQL database as well as Solr. When Solr starts, it reads all the data from MySQL. As new entries come in, my web servers write them to MySQL and, at the same time, add documents to Solr. More and more, my MySQL implementation is becoming a write-only persistent store (more or less a backup for the data in Solr); all reading of entries is done via Solr queries. Really the only data being read from MySQL is user info, which doesn't need to be indexed/searched.
A few questions:
Do I really need the MySQL implementation, or could I simply store all of my data in Solr?
If solr only, what are the risks associated with this solution?
Thanks!
Almost always, the answer is yes. It needn't be a database necessarily, but you should retain the original data somewhere outside of Solr in case you later change how you index the data in Solr. Unlike most databases (and Solr is not a database), Solr can't simply re-index itself. You could hypothetically configure your schema so that all your original data is marked as "stored" and then perhaps do a CSV dump and re-index that way, but I wouldn't recommend this approach.
Shameless plug: For any information on using Solr, I recommend my book.
I recommend a separate repository. MySQL is one choice. Some people use the filesystem.
You often want a different schema for searching than for storing. That is easy to do with a separate repository.
When you change the Solr schema, you need to reload the content. Unloading all the content from Solr can be slow. If it is already in a separate repository, then you don't need to dump it from Solr, you can overwrite what is there.
In general, making Solr be both a search engine and a repository really reduces your flexibility and options for making search the best it can be.
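As a rough sketch of that workflow (core URL, table, and field names are invented; assumes the pysolr and mysql-connector-python packages), re-indexing Solr from the separate MySQL repository might look like this:

```python
# Hedged sketch: rebuild the Solr index from the MySQL repository after a
# schema change. All names and fields below are placeholders.
import mysql.connector
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/entries", always_commit=False)
db = mysql.connector.connect(host="localhost", user="app",
                             password="secret", database="app")

cur = db.cursor(dictionary=True)
cur.execute("SELECT id, title, body, lat, lng FROM entries")

batch = []
for row in cur:
    batch.append({
        "id": str(row["id"]),
        "title": row["title"],
        "body": row["body"],
        # spatial field expressed as "lat,lng", a common Solr location format
        "location": f"{row['lat']},{row['lng']}",
    })
    if len(batch) >= 1000:
        solr.add(batch)   # index documents in batches
        batch = []

if batch:
    solr.add(batch)
solr.commit()
```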

If you're mainly archiving binary data and not serving it, is it better to store it as BLOBs in MySQL or on S3?

I have an application where customers upload files like Powerpoints and Excel spreadsheets to the application through a web UI. The files then have meta data associated with them and they are stored as BLOBs in a MySQL database. The users may download these files occasionally, but not very often. The emphasis here is on archiving. Security of data is also important.
If that is the case, what are the pros and cons of storing the files as BLOBs in MySQL as opposed to putting them on Amazon S3? I've never used S3 before but hear that it's popular for storing files.
The main advantage of relational databases (such as MySQL) is the elegance with which they let you query for data. BLOB columns, however, offer very little in terms of rich query semantics compared to other column types, so if that's your main use case, there's hardly any reason to use a relational database at all; it doesn't offer much above and beyond a regular filesystem or a simple key-value datastore (such as S3).
Dollars to bytes, S3 is likely much more cost-effective.
On the other hand, there are some things a relational database can bring that would be worthwhile. The most obvious is transactional semantics (only with the InnoDB engine, not available with MyISAM), so that you can be confident whole groups of uploads or modifications take place consistently. Another advantage is that you can still add metadata about your BLOBs (even if only over time, as your application improves), so you can still benefit from the rich queries MySQL supports.
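A small, hypothetical schema sketch of that trade-off: queryable metadata lives in ordinary columns, while the payload is either a BLOB kept in MySQL or a key pointing at S3 (table and column names are made up; assumes mysql-connector-python).

```python
# Hedged sketch: metadata in normal columns, payload either inline or in S3.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app",
                               password="secret", database="archive")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS uploads (
        id          BIGINT AUTO_INCREMENT PRIMARY KEY,
        customer_id BIGINT NOT NULL,
        filename    VARCHAR(255) NOT NULL,
        mime_type   VARCHAR(100) NOT NULL,
        uploaded_at DATETIME NOT NULL,
        payload     LONGBLOB NULL,        -- used if the file stays in MySQL
        s3_key      VARCHAR(1024) NULL,   -- used if the file lives in S3
        INDEX (customer_id, uploaded_at)
    ) ENGINE=InnoDB
""")
conn.commit()
```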
Storing binary data in a BLOB:
it makes your database fat
there is a size limitation (overcome in later versions of MySQL)
data portability is lost (you need a MySQL API/client to access the data)
there is no true security
If you are archiving the binary data, store it as a normal disk file.
If security is important, consider separating your UI server from your storage server, though that is harder to achieve; you can always consider embedding a password / encryption into the binary files themselves.
Security over Amazon S3:
http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?UsingAuthAccess.html
http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?S3_QSAuth.html
Security of data is also important.
Do note that files on S3 are not stored on encrypted disks, so you may have to encrypt client-side or on your servers before sending it up to S3.
I've been storing data in S3 for years and completely love it! What I do is upload the file to S3 (where it's copied multiple times, by the way) and then store a reference to the file path and name in my MySQL files table. If nothing else, it takes that much load off the MySQL DB, and S3 now offers AES-256 encryption with rotating master keys, so you know it's secure!
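A hedged sketch of that pattern (bucket, key, table, and column names are placeholders; assumes boto3 and mysql-connector-python): upload with server-side AES-256 encryption, then record only the reference in MySQL.

```python
# Hedged sketch: store the object in S3 with SSE-S3 encryption and keep only
# the bucket/key reference plus metadata in a MySQL "files" table.
import boto3
import mysql.connector

s3 = boto3.client("s3")
conn = mysql.connector.connect(host="localhost", user="app",
                               password="secret", database="archive")

bucket, key = "customer-archive", "uploads/123/report.pptx"

# Server-side encryption at rest (SSE-S3 / AES256).
with open("report.pptx", "rb") as fh:
    s3.put_object(Bucket=bucket, Key=key, Body=fh, ServerSideEncryption="AES256")

cur = conn.cursor()
cur.execute(
    "INSERT INTO files (customer_id, filename, s3_bucket, s3_key) VALUES (%s, %s, %s, %s)",
    (123, "report.pptx", bucket, key),
)
conn.commit()
```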