Couchbase: 160x faster with only one node: why?

On the Couchbase website, one can read that Couchbase can easily reach 100,000 requests per second.
As my application basically needs only a key/value store, I gave Couchbase a try.
So I tried to build a small cluster at my hosting provider.
I use the Python client and Couchbase Server 2.2.0 Community Edition.
With a single node in the "cluster" I can do 16,000 requests per second: nice!
But with 2 nodes in the cluster, I get only 100 requests per second for set(key, val), and the same for get(key) (I used the default bucket).
This is for a very small number of keys: 10,000 keys, each only 10 bytes long!
Looking at the stats, there seems to be no bottleneck (CPU/disk/RAM).
My hardware:
Core i5 (3.4 GHz)
32 GB RAM
Disk: 120 GB SSD
Network: Gigabit, bandwidth limited to 200 Mbps
The only point I see is that I have a 10 ms latency between the 2 nodes.
What would be a "good" latency between nodes?
What performance can I expect with a gigabit connection?
I used the default bucket; should I use another one with specific parameters?

10 ms latency is pretty high if you're running both your client and server in the same datacenter, so the first thing I would do is figure out why your network is giving you such high latencies.
As you mentioned, you are doing about 100 ops/sec, and this makes sense if your network latency is 10 ms: you are likely doing synchronous IO over the network, waiting for each request to make a full round trip before sending the next, and 1 request / 0.010 s = 100 requests per second. The Python client has async and multi-key APIs that let you send multiple requests without waiting for each response to come back first. This will vastly improve the number of ops/sec you can do.
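For example, with the Python SDK the multi operations batch many keys into pipelined requests instead of paying one blocking round trip per key (a minimal sketch; the host, bucket, and key layout are assumptions):

    from couchbase import Couchbase

    cb = Couchbase.connect(host='192.168.1.10', bucket='default')

    # Synchronous: one 10 ms round trip per key -> ~100 ops/sec
    for i in range(10000):
        cb.set('key_%d' % i, 'x' * 10)

    # Batched: the client pipelines the whole dictionary, so the
    # round-trip cost is paid per batch instead of per key
    batch = dict(('key_%d' % i, 'x' * 10) for i in range(10000))
    cb.set_multi(batch)
    results = cb.get_multi(list(batch))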
I know the website mentions that Couchbase can do 100k ops/sec on a single node, but I've gotten up to almost 250k ops/sec. The only things that will really slow you down are the network (which I maxed out in this case) and how many items are resident in memory when you request them, since having to go to disk will lower your performance, especially if you only have a few connections to the database.
Here are some answers to the questions you posted.
Nodes should be in the same datacenter if they are part of the same cluster. (Use the cross-datacenter replication feature if they are in different datacenters.)
Expect to be able to max out the network connection, and expect that the server will not be the bottleneck when all of your data is resident in memory.
There are no specific parameters that you need to tune in order to get performance from Couchbase.
[EDIT] There is no reason why 1 node would perform better than 2 nodes. In fact, having more nodes should give you more throughput.

Related

Performance issue (Nginx, NodeJS, MySQL)

I have the following problem.
Using REST, I am getting binary content (BLOBs) from a MySQL database via a NodeJS Express app.
All works fine, but I am having issues scaling the solution.
I increased the number of NodeJS instances to 3; they are running on ports 4000, 4001, and 4002.
On the same machine I have Nginx installed and configured to load-balance between my 3 instances.
I am using Apache Bench to do some perf testing.
Please see the attached pic.
Assuming I have a dummy GET REST endpoint that goes to the DB, reads the blob (roughly 600 KB in size), and returns it (all over HTTP), I am making 300 simultaneous calls. I would have thought that using Nginx to distribute the requests would make things faster, but it does not.
Why is this happening?
I am assuming it has to do with MySQL?
My NodeJS app is using a connection pool with a limit of 100 connections. What should the relation be between this value and the max_connections value in MySQL? If I increase the connection pool to a higher number of connections, I get worse results.
Any suggestion on how to scale?
Thanks!
"300 simultaneous" is folly. No one (today) has the resources to effectively do more than a few dozen of anything.
4 CPU cores -- If you go much beyond 4 threads, they will be stumbling over each over trying to get CPU time.
1 network -- Have you check to see whether your big blobs are using all the bandwidth, thereby being the bottleneck?
1 I/O channel -- Again, lots of data could be filling up the pathway to disk.
(This math is not quite right, but it makes a point...) You cannot effectively run any faster than what you can get from 4+1+1 "simultaneous" connections. (In reality, you may be able to, but not 300!)
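As a rough sanity check on the network point (a sketch; the gigabit figure is an assumption about this setup):

    # 300 simultaneous ~600 KB responses queued at once
    in_flight_bytes = 300 * 600 * 1024        # ~184 MB
    gigabit_per_sec = 125 * 1000 * 1000       # 1 Gbit/s is ~125 MB/s raw
    print(in_flight_bytes / float(gigabit_per_sec))  # ~1.5 s just to move the bytes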
The typical benchmarks try to find how many "connections" (or whatever) lead to the system keeling over. Those hard-to-read screenshots say that about 7 per second is the limit.
I also quibble with the word "simultaneous". The only thing close to "simultaneous" in your system is the ability to use 4 cores simultaneously. Every other metric involves sharing of resources. Based on what you say, ...
If you start about 7 each second, some resource will be topped out, but each request will be fast (perhaps less than a second).
If you start 300 all at once, they will stumble over each other, some of them taking perhaps minutes to finish.
There are two interesting metrics (see the sketch below):
How many requests per second you can sustain. (Perhaps 7/sec)
How long the average (and, perhaps, the 95th percentile) takes.
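If you log per-request latencies during the benchmark, both numbers fall out directly (a minimal sketch; the sample values are made up):

    # Example latencies from a test run, in seconds
    latencies = sorted([0.12, 0.15, 0.18, 0.22, 0.35, 0.41, 0.90])

    average = sum(latencies) / len(latencies)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    print("avg: %.3f s, 95th percentile: %.3f s" % (average, p95))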
Try 10 "simultaneous" connections and report back. Try 7. Try some other small numbers like those.

Getting very bad performance with Galera compared to a standalone MariaDB server

I am getting unacceptably low performance with the Galera setup I created. In my setup there are 2 nodes in active-active, and I am doing reads/writes on both nodes in a round-robin fashion using an HAProxy load balancer.
I was easily able to get over 10,000 TPS on my application with a single MariaDB server with the configuration below:
36 vCPU, 60 GB RAM, SSD, 10 Gig dedicated pipe
With Galera I am hardly getting 3,500 TPS, although I am using 2 DB nodes (36 vCPU, 60 GB RAM) load-balanced by HAProxy. For information, HAProxy is hosted as a standalone node on a different server. I have removed HAProxy for now, but there is no improvement in performance.
Can someone please suggest some tuning parameters in my.cnf I should consider to tune this severely under-performing setup?
I am using the below my.cnf file:
"I was easily able to get over 10000 TPS on my application with the single mariadb server with the below configuration: 36 vpcu, 60 GB RAM, SSD, 10Gig dedicated pipe. With galera i am hardly getting 3500 TPS although i am using 2 nodes (36vcpu, 60 GB RAM) of DB load balanced by ha-proxy."
Clusters based on Galera are not designed to scale writes the way you intend to; in fact, as Rick mentioned above, sending writes to multiple nodes for the same tables will end up causing certification conflicts, which show up as deadlocks for your application and add huge overhead.
"I am getting an unacceptable low performance with the galera setup i created. In my setup there are 2 nodes in active-active and i am doing read/writes on both the nodes in a round robin fashion using HA-proxy load balancer."
Please send all writes to a single node and see if that improves performance. There will always be some overhead due to the virtually-synchronous replication that Galera uses, which literally adds network overhead to each write you perform (although true clock-based parallel replication offsets this impact quite a bit, you are still bound to see slightly lower throughput).
Also make sure to keep your transactions short and COMMIT as soon as you are done with an atomic unit of work, since the replication-certification process is single-threaded and will stall writes on the other nodes (if your writer node shows transactions in the wsrep pre-commit stage, that means the other nodes are doing certification for a large transaction, or that a node is suffering performance problems of some sort: swap, full disk, abusively large reads, etc.).
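As an illustration of committing per atomic unit of work (a sketch using a Python DB-API driver; the table, columns, and connection details are made up):

    import MySQLdb  # any DB-API driver behaves the same here

    conn = MySQLdb.connect(host="node1", user="app", passwd="secret", db="test")
    cur = conn.cursor()

    # One COMMIT per atomic unit keeps each Galera write-set small,
    # so the single-threaded certification step never stalls on it.
    for i in range(1000):
        cur.execute("INSERT INTO orders (id, total) VALUES (%s, %s)", (i, 10))
        conn.commit()

    # By contrast, wrapping all 1000 inserts in one transaction makes a
    # single huge write-set that every node must certify at once.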
Hope that helps, and let us know how it goes when you move to a single writer node.
Turn off the QC:
    query_cache_size = 0     # not 22 bytes
    query_cache_type = OFF   # the QC is incompatible with Galera
Increase innodb_io_capacity.
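Put together in my.cnf, that might look like the following (the innodb_io_capacity value is only a starting guess for SSD storage; the default of 200 is sized for spinning disks):

    [mysqld]
    query_cache_size = 0
    query_cache_type = OFF
    innodb_io_capacity = 1000   # raise for SSDs; default is 200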
How far apart (ping time) are the two nodes?
Suggest you pretend that it is Master-Slave. That is, have HAProxy send all traffic to one node, leaving the other as a hot backup. Certain things can run faster in this mode; I don't know about your app.

How does Couchbase Server support high concurrency and high throughput?

I am curious to know how Couchbase Server supports high concurrency and high throughput.
It's a very broad question to answer, but I'll try to cover some of the key reasons why Couchbase is fast and scalable.
Writes in Couchbase are asynchronous by default: replication and persistence happen in the background, and the smart clients (SDKs) are notified of success or failure. So any new documents or mutations to documents are written to RAM and then asynchronously flushed to disk and replicated to other nodes in the background. This means there is no waiting time or contention on IO/disk speed. (It also means it is possible to write to RAM and then have the node fall over before the request has been persisted to disk or replicated to a second/third node.) It is possible to make writes synchronous, but it will slow down throughput considerably.
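For instance, with the Python SDK a write returns as soon as the mutation is in RAM, and durability can be requested per operation (a sketch; the host, bucket, and exact durability keywords are assumptions about the SDK version in use):

    from couchbase import Couchbase

    cb = Couchbase.connect(host='127.0.0.1', bucket='default')

    # Default: returns once the item is in managed cache (RAM);
    # disk persistence and replication happen in the background.
    cb.set('doc::1', {'state': 'fast'})

    # Opt-in durability: block until the mutation is persisted on
    # 1 node and replicated to 1 replica. Much slower per op.
    cb.set('doc::2', {'state': 'safe'}, persist_to=1, replicate_to=1)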
When dealing with RAM, writes and reads are VERY fast (we've only pushed our cluster to 20k operations a second), but large companies easily hit upwards of 400k operations a second. LinkedIn sustains this ops rate with only 4 nodes ---> http://www.couchbase.com/customer-stories
In traditional database architectures, the setup is usually a master DB (MySQL/Postgres/Oracle) coupled with a slave DB for data redundancy; writes/reads can also be split between the 2 as load gets higher. Couchbase is meant to be used as a distributed system (Couchbase recommends at least 3 nodes in production). Data is automatically sharded between the nodes in a cluster, thus spreading the writes/reads across multiple machines. If you need higher throughput, adding a node in Couchbase is as simple as clicking "add node" and then rebalancing the cluster; the data will be automatically partitioned across the new cluster map.
So essentially: writing/reading from RAM with async disk persistence + distributed reads and writes == high throughput.
Hope that helps!
#scalabilitysolved already gave a great overview, but if you want a longer (and more detailed) description, take a look at the Couchbase_Server_Architecture_Review on couchbase.com.

CPU usage: PostgreSQL vs MySQL on Windows

Currently I have this server:
    processor  : 3
    vendor_id  : GenuineIntel
    cpu family : 15
    model      : 2
    model name : Intel(R) Xeon(TM) CPU 2.40GHz
    stepping   : 9
    cpu MHz    : 2392.149
    cache size : 512 KB
My application causes more than 96% CPU usage for MySQL at 200-300 transactions per second.
Can anyone assist, or provide links, on the following:
how to benchmark PostgreSQL
do you think PostgreSQL can improve CPU utilization compared to MySQL?
links or wikis that simply present a benchmark comparison
A common misconception for database users is that high CPU use is bad.
It isn't.
A database has exactly one speed: as fast as possible. It will always use up every resource it can, within administrator-set limits, to execute your queries quickly.
Most queries require lots more of one particular resource than others. For most queries on bigger databases that resource is disk I/O, so the database will be thrashing your storage as fast as it can. While it is waiting for the hard drive it usually can't do any other work, so that thread/process will go to sleep and stop using the CPU.
Smaller databases, or queries on small datasets within big databases, often fit entirely in RAM. The operating system will cache the data from disk and have it sitting in RAM and ready to return when the database asks for it. This means the database isn't waiting for the disk and being forced to sleep, so it goes all-out processing the data with the CPU to get you your answers quickly.
There are two reasons you might care about CPU use:
You have something else running on that machine that isn't getting enough CPU time; or
You think that, given the 100% CPU use, you aren't getting enough performance from your database.
For the first point, don't blame the database. It's an admin issue. Set operating system scheduler controls like nice levels to re-prioritize the workload - or get a bigger server that can do all the work you require of it without falling behind.
For the second point you need to look at your database tuning, at your queries, etc. It's not a "database uses 100% cpu" problem, it's a "I'm not getting enough throughput and seem to be CPU-bound" problem. Database and query tuning is a big topic and not one I'll get into here, especially since I don't generally use MySQL.

Does MySQL packet size cause slowdown?

I have written a program which uses a MySQL database, and communication between the database server (a very powerful one) and the clients happens over an ADSL connection (1 Mbit/s).
But I have a very, very slow connection between each client and the server: only approximately 3-4 KB/s of data is transferred. Neither the server nor the clients use the Internet for other purposes; only my program uses the connection.
I can't figure out why. Is the reason the MySQL server packet size?
Any suggestions?
Try using mytop to identify the cause of the server's low performance.
Another one: you may be using SELECT COUNT(*) FROM ... on large InnoDB tables, which causes a table scan.
And can you test with some other service whether the data rate between the machines is OK? Even though the upstream bandwidth is lower for ADSL users, 3-4 KB/s is low enough that the link itself might not be the reason for the poor performance.
The effective transfer rate is often heavily limited by the number of round trips between client and server. Without seeing your code it is somewhat difficult to tell, but you should check the number of requests happening.
If you have a single request that returns many records, you should see better usage of bandwidth than with a higher number of requests that each deliver only a few rows.
In the latter case the actual result transfer is probably quite fast, but the latencies involved in the "control communications" (i.e. the statements themselves, login requests, etc.) add up, effectively lowering overall throughput.
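A sketch of the difference (Python DB-API; the table, column, and connection details are made up):

    import MySQLdb

    conn = MySQLdb.connect(host="dbserver", user="app", passwd="secret", db="test")
    cur = conn.cursor()

    # Many round trips: each statement pays the full ADSL latency.
    for i in range(1000):
        cur.execute("SELECT data FROM items WHERE id = %s", (i,))
        cur.fetchone()

    # One round trip: the same rows in a single request, so the link
    # latency is paid once and the bandwidth is actually used.
    cur.execute("SELECT data FROM items WHERE id BETWEEN 0 AND 999")
    rows = cur.fetchall()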
As for the packet size: when it is very small, there is more overhead in the communications, increasing the aforementioned effect. The server's default max_allowed_packet size is 1 MB if memory serves, but that should be fine with your connection.
You first have to debug both connections.
What is your upload speed if you upload a file with WinSCP or an equivalent to the MySQL server? It should be near 90 KB/s with 1 Mbit/s ADSL.