Couchbase Sync Gateway buckets and databases

I have an app I'm developing where each user has their own database, and I want them to be able to sync across devices. It seems as if buckets and databases should have a one-to-one relationship (I tried putting two databases I created via the Sync Gateway admin API in one bucket, and the item count returned was the total of both databases combined). Creating a bucket per database seems a bit much, because you have to define the amount of RAM per bucket in advance, which is less than ideal.
I'm trying to figure out how buckets fit into the architecture. Do I need to create a bucket per database?

It's not recommended to have one database per user. Have a look at the "channels" feature of Sync Gateway in the Couchbase documentation: http://docs.couchbase.com/sync-gateway/#developing
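To make that concrete: a single Sync Gateway database (backed by one bucket) can route each user's documents into a per-user channel from its sync function, so you don't need a bucket or database per user. A minimal sketch, assuming the pre-3.0 JSON database config and the admin REST API on port 4985; the names, URLs, and ports here are illustrative:

```python
# Sketch: one Sync Gateway database, one bucket, per-user channels.
# Assumes the pre-3.0 admin REST API on port 4985; adjust for your version.
import json
import requests

SYNC_FUNCTION = """
function (doc, oldDoc) {
    // Only the owner may write their own documents.
    requireUser(doc.owner);
    // Route the document into that user's channel.
    channel("user-" + doc.owner);
}
"""

db_config = {
    "server": "http://localhost:8091",   # Couchbase Server (illustrative)
    "bucket": "app_data",                # one shared bucket for all users
    "sync": SYNC_FUNCTION,
}

# Create (or reconfigure) the database on the admin port.
resp = requests.put(
    "http://localhost:4985/app_db/",
    data=json.dumps(db_config),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
```

Each device then pull-replicates only the channels its user has access to, so one bucket and one Sync Gateway database can serve all of your users.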

Related

Where can I find the clear definitions for a Couchbase Cluster, Couchbase Node and a Couchbase Bucket?

I am new to Couchbase and NoSQL terminologies. From my understanding, a Couchbase node is a single system running the Couchbase Server application, and a collection of such nodes sharing the same data through replication forms a Couchbase Cluster.
Also, a Couchbase Bucket is somewhat like a table in an RDBMS wherein you put your documents. But how can I relate the Node with the Bucket? Can someone please explain it to me in simple terms?
a Node is a single machine (one IP/hostname) that runs Couchbase Server
a Cluster is a group of Nodes that talk together. Data is distributed between the nodes automatically, so the load is balanced. The cluster can also provide replication of data for resilience.
a Bucket is the "logical" entity where your data is stored. It is both a namespace (like a database schema) and a table, to some extent. You can store multiple types of data in a single bucket, it doesn't care what form the data takes as long as it is a key and its associated value (so you can store users, apples and oranges in a same Bucket).
The bucket gives the level of granularity for things like configuration (how much of the available memory do you want to dedicate to this bucket?), replication factor (how many backup copies of each document do you want on other nodes?), password protection...
Note how I said that Buckets are a "logical" entity? They are in fact divided into 1024 virtual fragments (vBuckets) which are spread across all the nodes of the cluster (that's how data distribution is achieved).
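To make the "virtual fragments" idea concrete, here is a rough sketch of how a document key might be mapped to one of the 1024 vBuckets and from there to a node. The real client libraries use a CRC32-based hash and a vBucket map maintained by the cluster, so treat the details below as illustrative only:

```python
# Illustrative sketch of vBucket-style data distribution (not the exact
# algorithm Couchbase clients use, but the same idea).
import zlib

NUM_VBUCKETS = 1024

def vbucket_for_key(key: str) -> int:
    # Hash the document key and fold it into one of 1024 vBuckets.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS

# A vBucket map assigns each vBucket to a node; the cluster maintains the
# real map and updates it when nodes are added, removed, or rebalanced.
nodes = ["node-a", "node-b", "node-c"]
vbucket_map = {vb: nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)}

key = "user::42"
vb = vbucket_for_key(key)
print(key, "-> vBucket", vb, "-> stored on", vbucket_map[vb])
```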

Magento importing customers (700k) using CSV Dataflow is too slow

I am building up a Magento eCommerce website and importing customer profiles from the old one. I am using the CSV importer, and this process is way slower than I imagined: it takes almost 4 seconds for just one customer. So far the process has been running for 6+ hours and only 30k customers have been imported. The CSV file is chunked into several ~10M smaller ones.
For now, I am using an Amazon Web Services EC2 instance (micro) as the development server. It has 1 vCPU (2.5GHz) and 1GiB memory, but I don't think this can be the issue. I increased the PHP memory limit to 1G.
I've read an article saying that these speed issues when importing products are very common because of Magento's EAV database system and the heavy PHP API modules [Speeding up Magento Imports]. It says that Magento sends 450 MySQL queries in order to import one single product. I have also seen a workaround using [Magmi], which tries to bypass Magento's API and insert data directly into the MySQL tables. However, AFAIK it doesn't seem to import customers, only products and categories. I don't know if they (products and customers) use the same mechanism.
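To illustrate why that direct-insert approach is so much faster than going through the per-record API: batching rows into a handful of multi-row INSERTs avoids the hundreds of round trips per record. A rough Python sketch against a hypothetical staging table (the table and column names are made up for illustration; a real Magento import ultimately has to populate the EAV tables):

```python
# Rough sketch: batched inserts instead of one API call (and hundreds of
# queries) per record. Table/column names are hypothetical.
import csv
import pymysql

conn = pymysql.connect(host="localhost", user="magento",
                       password="secret", database="magento")

INSERT_SQL = ("INSERT INTO customer_import_staging "
              "(email, firstname, lastname) VALUES (%s, %s, %s)")

with open("customers.csv", newline="") as f, conn.cursor() as cur:
    reader = csv.DictReader(f)
    batch = []
    for row in reader:
        batch.append((row["email"], row["firstname"], row["lastname"]))
        if len(batch) >= 1000:
            # One round trip for 1000 rows instead of 1000+ round trips.
            cur.executemany(INSERT_SQL, batch)
            conn.commit()
            batch = []
    if batch:
        cur.executemany(INSERT_SQL, batch)
        conn.commit()
```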
I disabled caching management and set the option of index management to 'manual update'. Though customer profiles don't really use these processes.
Do you have any suggestion to increase this CSV importing speed?
[Follow-up]
I have found one of the problem sources, Amazon EC2 T2 instances. They use CPU Credits to control maximum CPU usage. For micro instances, the base CPU performance is limited to 10% of its capacity. I used all of the CPU credits, and the server didn't allow me to use the full CPU.
I just checked the importing performance now, and it is importing approximately 2 records per sec. I think this is still too slow, so I will wait more for any other solutions. Perhaps optimizations?
Forget about Magmi, forget about Dataflow. Do it the best practice magento way...
Use this one:
https://github.com/avstudnitz/AvS_FastSimpleImport
You can import both product and customer entities from any array.
If you need to update existing products via CSV import, include only the columns you want to update, plus the SKU. Yes, SKU is a required column; beyond that, include whichever columns should update the product attributes.
It imports products so fast!!!!

MySQL cluster for dummies

So what's the idea behind a cluster?
1) You have multiple machines with the same copy of the DB where you spread the reads/writes? Is this correct?
2) How does this idea work? When I make a SELECT query, does the cluster analyze which server has fewer reads/writes and point my query to that server?
3) When should you start using a cluster? I know this is a tricky question, but maybe someone can give me an example, like 1 million visits and a 100-million-row DB.
1) Correct, with one nuance: a single data node does not hold a full copy of the cluster's data, but every single bit of data is stored on at least two nodes.
2) Essentially correct. MySQL Cluster supports distributed transactions.
3) When vertical scaling is not possible anymore, and replication becomes impractical :)
As promised, some recommended readings:
Setting Up Multi-Master Circular Replication with MySQL (simple tutorial)
Circular Replication in MySQL (higher-level warnings about conflicts)
MySQL Cluster Multi-Computer How-To (step-by-step tutorial, it assumes multiple physical machines, but you can run your test with all processes running on the same machine by following these instructions)
The MySQL Performance Blog is a reference in this field
1) Your first point is correct in a way, but if multiple machines all held the same data, that would be replication rather than clustering.
In clustering, the data is divided among the various machines. This is horizontal partitioning: the data is split by rows, and the records are distributed among the machines by an algorithm. Each record gets a unique key, just as in a key-value pair, and each machine has an identifier that determines which key-value pairs it is responsible for (see the sketch after this answer).
Each machine in the cluster is a node consisting of an individual MySQL server, its own slice of the data, and a cluster manager; the nodes also share data with each other, so that all of the data is available to every node at any time.
Retrieval can go through memcached servers for fast reads, and each fragment of data is also replicated within the cluster so it is not lost if a node fails.
2) Yes, that is possible, because all of the data is shared among the cluster nodes. You can also use a load balancer to balance the load; load balancers are quite common in front of most serious deployments. But if you are only trying this out for your own knowledge, there is no need: you will not see the kind of load that requires a load balancer, and the cluster manager itself can handle the whole thing.
3) RandomSeed is right: you feel the need for a cluster when your replication becomes impractical. For example, if you are using the master server for writes and slaves for reads, and the traffic grows so large that the servers can no longer keep up, that is when clustering becomes attractive, simply to speed up the whole process.
This is not the only case, just one common scenario.
Hope this is helpful!
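Here is a tiny sketch of the key-based partitioning idea described in point 1 above. It is purely illustrative; MySQL Cluster's NDB storage engine does its own hashing and fragment placement internally:

```python
# Illustrative sketch of horizontal partitioning: each record's key is
# hashed to decide which node stores it. MySQL Cluster does this for you
# via the NDB storage engine; this only shows the idea.
import hashlib

NODES = ["data-node-1", "data-node-2", "data-node-3", "data-node-4"]

def node_for_key(primary_key: str) -> str:
    # Hash the primary key and map it onto one of the data nodes.
    digest = hashlib.md5(primary_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

for user_id in ["1001", "1002", "1003", "1004"]:
    print("row", user_id, "lives on", node_for_key(user_id))
```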

What is the proper way to separate data in Couchbase

I am thinking of working with couchbase for my next web application, and I am wondering how my data should be structured, specifically the use of buckets. For example, assuming each user is going to have a lot of unique data, should a separate bucket be created for each user (maybe even for different categories of data)? Also I am wondering if there is any real advantage/disadvantage to separating data using buckets (aside from the obvious organizational benefit) instead of simply storing everything in one bucket.
You will not get any performance gain from using more or fewer buckets. The reason that Couchbase has buckets is so that it can be multi-tenant. The best use case I can think of for using multiple buckets is if you are a hosting provider and you want to have different users using the same database server. Buckets can be password protected, which prevents one user from accessing another user's data.
Some people create multiple buckets for organizational purposes. Maybe you are running two different applications and you want the data to be separate or as you mentioned maybe you want to split data by category.
In terms of management, though, it is probably best to create as few buckets as possible for your application, since it will simplify your client logic by reducing the number of connections you need to Couchbase from your web tier (client). For each bucket you have, you must create a separate client connection.
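In practice, people usually keep one bucket and separate users (and categories of data) through key naming conventions plus a "type" field in each document. A minimal sketch, assuming the 2.x-style Couchbase Python SDK; the connection API differs in SDK 3.x, and the key scheme here is just an example:

```python
# One bucket for everything; separation is done via key prefixes and a
# "type" field. Assumes the 2.x-style Python SDK (SDK 3.x connects via a
# Cluster object instead).
from couchbase.bucket import Bucket

bucket = Bucket("couchbase://localhost/app_data")  # single shared bucket

def user_key(user_id: str, category: str, item_id: str) -> str:
    # e.g. "user::42::orders::1001"
    return "user::{}::{}::{}".format(user_id, category, item_id)

bucket.upsert(user_key("42", "profile", "main"),
              {"type": "profile", "name": "Alice"})
bucket.upsert(user_key("42", "orders", "1001"),
              {"type": "order", "total": 19.99})

print(bucket.get(user_key("42", "profile", "main")).value)
```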

Cloud service for large number of small MySQL databases?

I have an application which is going to be distributed to a hosting platform, most probably phpfog.
It is very similar to how WordPress.com operates, where each customer can host their own individual installation of the app on our servers. We host the 'work' files and provide the database (However, it is NOT WordPress; it's a custom app).
Each user of the application has their own separate MySQL database.
I am wondering what the most cost-effective service would be to provide this. It seems that most cloud services offer, for instance, one massive 50GB database. It is definitely conceivable that instead of individual databases, we have one huge one and prefix all the tables per user, but that seems really bloated and unwieldy. It's also not really possible without major structural changes to have one big database for everyone (and the same tables inside it for everyone), as the app is primarily designed to be standalone.
Each database really won't get that big. We are talking low GB - I'd suggest the biggest would be 5GB. However, there will be a LOT of them as obviously it's one per customer.
What would be the most cost- and performance-effective way of handling this?
Amazon RDS in fact provides a full database server rather than an individual database; I misunderstood their offering.
In this case, RDS is a drop-in replacement for existing MySQL databases and will work perfectly.
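Since a single RDS MySQL instance can hold many logical databases, provisioning a new customer is just a matter of creating a database and a user scoped to it. A rough sketch; the host, credentials, and naming scheme are illustrative:

```python
# Sketch: one RDS MySQL server hosting one small database per customer.
# Hostname, credentials, and the naming scheme are illustrative.
import pymysql

admin = pymysql.connect(host="myapp.example.rds.amazonaws.com",
                        user="admin", password="secret")

def provision_customer(customer_id: int, password: str) -> None:
    db = "app_customer_{}".format(customer_id)
    user = "cust_{}".format(customer_id)
    with admin.cursor() as cur:
        # Create the per-customer database and a user limited to it.
        cur.execute("CREATE DATABASE IF NOT EXISTS `{}`".format(db))
        cur.execute("CREATE USER %s@'%%' IDENTIFIED BY %s", (user, password))
        cur.execute("GRANT ALL PRIVILEGES ON `{}`.* TO %s@'%%'".format(db),
                    (user,))
    admin.commit()

provision_customer(42, "a-strong-generated-password")
```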