Couchbase Views in Cluster inconsistency

When I query a view from a Couchbase cluster,
will each machine return the same result for the view query?
I would expect the cluster to return the same response regardless of which machine actually responds to the request.

How critically does your application depend upon consistent view results? Remember that Couchbase indexes are eventually consistent: the index on each node is not updated at exactly the same moment, especially when there is a high volume of data changes. So, for data that has been around for a while, you can expect consistent result sets between machines; however, very recent changes may not yet be reflected in a view query. The key is to design your application to deal with this case.

The storage of Couchbase views is spread across the cluster nodes, just like the data is. For example, on a three-node cluster with one view, 1/3 of the view would be on each node of the cluster. The part of the view on each node is the part that corresponds to the data in the vBuckets on that node. So when you query a Couchbase view, the query goes to each node to work on its share. This is all transparent to you when you use the SDK, but that is what is happening in the background. When a rebalance happens, the views change too, since the vBuckets are moving. By default, IndexAwareRebalances are set to true.
You also have to realize how often views are updated. By default, the indexer runs every 5 seconds AND only if there have been 5000 data mutations; not or. These defaults can be tuned, but if, for example, you only have 1000 mutations, the indexer will not run on its own. Be careful with stale=false too. It can get you into real trouble if you use it all of the time, especially during rebalances.
Also, know that Couchbase key-value operations are strongly consistent, but views are eventually consistent.
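To make the trade-off concrete, here is a minimal sketch with the 1.4.x Java SDK; the cluster address, bucket, and the "users"/"by_id" design document and view names are placeholders. Stale.UPDATE_AFTER is the default behavior described above, and Stale.FALSE is the forced-index-update variant to use sparingly:

import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.Query;
import com.couchbase.client.protocol.views.Stale;
import com.couchbase.client.protocol.views.View;
import com.couchbase.client.protocol.views.ViewResponse;
import com.couchbase.client.protocol.views.ViewRow;

import java.net.URI;
import java.util.Arrays;

public class ViewStaleExample {
    public static void main(String[] args) throws Exception {
        // Placeholder cluster address, bucket, and password.
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")), "default", "");

        View view = client.getView("users", "by_id"); // placeholder design doc/view
        Query query = new Query();

        // Default: read whatever is in the index right now; fast, but recent
        // mutations may be missing, and the index is updated after the read.
        query.setStale(Stale.UPDATE_AFTER);
        // Alternative: force the indexer to catch up before answering.
        // Use sparingly, especially during rebalances.
        // query.setStale(Stale.FALSE);

        ViewResponse response = client.query(view, query);
        for (ViewRow row : response) {
            System.out.println(row.getId());
        }
        client.shutdown();
    }
}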

Related

AWS RDS MySQL Read Replica - do I need to update my API to point to the replica for any URIs that are only reading?

This is the first time I have worked with a read replica, so please bear with me. I tried multiple searches on here and Google to no avail.
I now have a primary database instance and a read replica on RDS. Replication is working great, and I can connect to both instances with no problem. My question is twofold.
Do I need to update my API models so that read only operations are directed to the replica?
Are there any tips to get the best performance from this setup? Can I adjust indexes so that the master, which handles writes, carries fewer indexes than the tables I need for reads? Lastly, some of my models read a dataset, return the value, then log the activity in an audit log table. Would I need to split this functionality out so the logging activity occurs on the main write database?
Appreciate any guidance here.
It depends. The replica will always be slightly behind its source instance, so queries that always need to read current data must read from the source instance.
Different queries even in your single app will have different needs in that respect. So you really need to decide on a case-by-case basis which query can read from the read-replica instance.
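As an illustration only, here is a minimal sketch of per-query routing with plain JDBC; the endpoints, credentials, and the needsFreshData flag are placeholders for whatever routing rule you settle on:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadWriteSplit {
    // Placeholder endpoints; substitute your RDS source and replica hosts.
    private static final String SOURCE_URL  = "jdbc:mysql://source.example.com:3306/app";
    private static final String REPLICA_URL = "jdbc:mysql://replica.example.com:3306/app";

    // Per-query routing: reads that tolerate replication lag go to the
    // replica; writes and read-your-own-writes queries go to the source.
    static Connection connectionFor(boolean isWrite, boolean needsFreshData) throws Exception {
        String url = (isWrite || needsFreshData) ? SOURCE_URL : REPLICA_URL;
        return DriverManager.getConnection(url, "user", "password");
    }

    public static void main(String[] args) throws Exception {
        try (Connection c = connectionFor(false, false); // lag-tolerant read
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT id, name FROM products")) {
            while (rs.next()) {
                System.out.println(rs.getLong("id") + " " + rs.getString("name"));
            }
        }
    }
}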
I wrote a presentation about this: Read/Write Splitting in MySQL (also available as a video).
Regarding performance, you may have different secondary indexes on the same tables on each instance. Indexes depend on which queries you will run, so you have to decide first which queries you will run on the source versus the replica.
But consider that if the read-replica goes offline for some reason, or falls too far behind or something, you might want to be able to run the same queries on the source instance until the replica is ready again. In that case, you'd want the same indexes on both, or else your query will run very slowly.
Another possibility is that the source instance dies for some reason. In the cloud, you must always have a plan for resources disappearing on short notice, including databases! It happens. So if the source instance dies, the read-replica could take over as your new source. But if it has different indexes, different tables, different instance size, or whatever, then it may not be prepared to be that substitute.
It's simpler to just make the source and replica the same in as many ways as you can. Less details to track, and you're prepared for failover.

AWS RDS MySQL performance drop after random timespan

QUESTION OUTLINE
Our AWS RDS instance starts slowing down after about 7-14 days, by a quite large factor (~400% load times for a specific set of queries). RDS monitoring shows no signs of resource shortage. (See the detailed problem description below the question update.)
Question Update
So after more than one month of investigation and some developer support from AWS, I am not much closer to a solution.
Here are a couple of steps which I checked off the list, more or less without any further hint of the problem:
Index / Fragmentation (all tables have correct indexes/keys and have no fragmentation)
MySQL Stats Update (manually updating stats source)
Thread Concurrency (changing innodb_thread_concurrency to various different parameters)
Query Cache Hit Ratio doesn't show problems
EXPLAIN to see if any SELECTs are actually slow or not using indexes/keys
SLOW QUERY LOG (returns no results; as explained in the paragraph below, it's a number of prepared SELECTs)
RDS and EC2 are within one VPC
For context: the Play Framework in use (2.3.8) comes with BoneCP, and we are using eBeans to select our data. So basically I am running through a nested object and all of its child objects, which produces a couple of hundred prepared SELECTs for the API call in question. This should basically be fine for the hardware in use; neither CPU nor RAM is heavily used by these operations.
I also included New Relic for more insight into this issue and did some JVM profiling. Apparently, most of the time is consumed by Netty/eBeans?
Is anyone able to make sense of this?
ORIGINAL QUESTION: Problem Outline
Our AWS RDS instance starts slowing down after about 7-14 days, by a quite large factor (~400% load times for a specific set of queries). RDS monitoring shows no signs of resource shortage.
Infrastructure
We run a PlayFramework backend for a mobile app on AWS EC2 instances, connected to AWS RDS MySQL instances, one PROD environment, one DEV environment. Usually the PROD EC2 instance is pointing to the PROD RDS instance, and the DEV EC2 points to the DEV RDS (hi from captain obvious!); however sometimes we also let the DEV EC2 point to the PROD DB for some testing purposes. The PlayFramework in use is working with BoneCP.
Detailed Problem Description
In a quite essential sync process, our app is making a certain API call many times a day per user. I discussed the backgrounds of the functionality in this SO question, where, thanks to comments, I could nail the problem down to be a MySQL issue of some kind.
In short, the API call loads a set of data, at most about 1MB of JSON, which currently takes about 18s. When things are running perfectly fine, it takes about 4s.
Curiously enough, what "solved" the problem last time was upgrading the RDS instance to another instance type (from db.m3.large to db.m4.large, which is just a very marginal step). Now, after about 2-3 weeks, the RDS instance is once again performing as slowly as before. Rebooting the RDS instance had no effect, and re-launching the EC2 instance also had no effect.
I also checked whether the indexes of the affected MySQL tables are set properly, which is the case. The API call itself is not eager-loading any BLOB fields or similar; I double-checked this. The CPU usage of the RDS instance is below 1% most of the time; when I stress tested it with 100 simultaneous API calls, it went to ~5%, so this is not the bottleneck. Memory is fine too, so I guess the RDS instance doesn't start swapping, which could slow down the whole process.
As hard evidence: a (smaller) public API call on the DEV environment currently takes 2.30s to load, while on the PROD environment it takes 4.86s. Which is interesting, because the DEV environment uses a much smaller instance type for both EC2 and RDS. So basically the turtle wins the race here. (If you are interested in this API call I am happy to share it with you via PN, but I don't really want to post links to API calls, even if they are basically public.)
Conclusion
Concluding, it feels (I wittingly say "feels") like the DB gets clogged after x days of usage / after a certain amount of API calls. Not sure if this is an RDS-specific issue; once I "largely" reset the DB instance by changing the instance type, things ran fast and smooth. But re-creating my DB instance from a snapshot every 2 weeks is not an option, especially if I don't understand why this is happening.
Do you have any ideas what further steps I could take to investigate this matter?
(Too long for just a comment) I know you have checked a lot of things, but I would like to look at them with a different set of eyes...
Please provide
SHOW VARIABLES; (you will probably need to post it somewhere due to size; see the JDBC sketch below)
SHOW GLOBAL STATUS;
how much RAM? Sounds like 7.5G
The query. -- Unclear what query/queries you are using
SHOW CREATE TABLE for the table(s) in the query -- indexes, datatypes, etc
(Some of the above may help with "clogging over time" question.)
Meanwhile, here are some guesses/questions/etc...
Some other customer sharing the hardware is busy.
It could be a network problem?
Shrink long_query_time to 1 so you can catch slow queries.
When are backups done on your instance?
4s-18s to load a megabyte -- what percentage of that is SQL statements?
Do you "batch" the inserts? Is it a single transaction? Are lengthy queries going on at the same time?
What, if any, MySQL tunables did you change from the AWS defaults?
6GB buffer_pool on a 7.5GB partition? That sounds dangerously tight. Can you see if there was any swapping?
Any PARTITIONing involved? (Of course the CREATE will answer that.)
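If it helps with gathering the items requested above, here is a minimal JDBC sketch; the endpoint and credentials are placeholders. Note that on RDS, long_query_time and the slow query log are changed through the DB parameter group, since SET GLOBAL needs privileges the RDS master user does not have:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MysqlDiagnostics {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials; point this at the slow instance.
        try (Connection c = DriverManager.getConnection(
                     "jdbc:mysql://your-rds-endpoint:3306/app", "user", "password");
             Statement s = c.createStatement()) {
            // Dump the snapshots requested above; post the output somewhere shareable.
            for (String sql : new String[] {
                    "SHOW VARIABLES",
                    "SHOW GLOBAL STATUS",
                    "SHOW VARIABLES LIKE 'long_query_time'"}) {
                System.out.println("-- " + sql);
                try (ResultSet rs = s.executeQuery(sql)) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " = " + rs.getString(2));
                    }
                }
            }
        }
    }
}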
There is one very important bit of information missing from your description: The total allocated space for the database. I/O for RDS is around 3x the allocated space, so for a 100GB allocation, you should get around 300 IOPS. That allocated space also includes logs.
Since you don't really know what's going on, the first step should be to turn on detailed monitoring, which will give you more idea of what is happening on the instance.
Until you have additional stats gathered during a slowdown, you can try increasing the allocated space, which will increase the IOPS available.
Also, check the events for the db - are logs getting purged on a regular basis? That might indicate that there's not enough space.
Finally, you can try going with PIOPS (provisioned IOPS) if you have an idea of what the application needs, though at this point it sounds like that would be a guess.
Maybe your burst credit balance is (slowly) being depleted? Eventually you end up with baseline performance, which may appear "too slow".
This would also explain why the upgrade to another instance type helped, as you then start with a full burst balance again.
I would suggest increasing the size of the volume, even if you don't need the extra space, as the baseline performance grows linearly with volume size.

Couchbase development view doesn't show results until published to production

I've created a very simple development view on my production Couchbase server:
function (doc, meta) {
  if (meta.id.indexOf("user:") == 0) emit(meta.id, doc);
}
This view doesn't return any results. Testing the same view on my local Couchbase server works fine.
Publishing this view as a production view works fine.
What could be wrong? I'm using Couchbase version: 2.2.0 community edition (build-837)
When you are developing views, Couchbase will only use a subset of the data in your bucket for the results. IIRC this is one vBucket's worth of data. It uses a subset because some views can be quite large, and when developing a new view you do not want to bog down the whole cluster. So while rare, it is possible that when you run your development view, the subset of data picked up on that one node has no matching results. Once the index is promoted to a production view, i.e. you are done developing it and are ready to use it in prod, it will use the whole data set in that bucket.
On a related note, it looks like you are emitting the entire document, if I am not mistaken. Depending on your data set and the selectivity of this view on that data, that could get quite expensive as your data grows, so you will have to watch this. Make sure you have enough disk space for the index and enough OS RAM (not taken up by Couchbase) to keep as much of it in memory as possible, since it is treated differently than Couchbase objects. Emitting a lot of data can also slow down your rebalances, especially if you are using the default internal settings like index-aware rebalances. I do not know your use case, obviously, but it might be better to just emit the IDs and then bulk-get those IDs (see the sketch below). This might be one of those situations where the question is not "can views do this" (yes, they can) but "should they, and is there a better way to solve the same problem". Anyhow, just something to think about.
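For illustration, a minimal sketch of that "emit only IDs, then bulk-get" pattern with the 1.x Java SDK; the design document and view names are placeholders, and the view is assumed to do emit(meta.id, null):

import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.Query;
import com.couchbase.client.protocol.views.View;
import com.couchbase.client.protocol.views.ViewResponse;
import com.couchbase.client.protocol.views.ViewRow;

import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class EmitIdsThenBulkGet {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")), "default", "");

        // Placeholder view assumed to do: emit(meta.id, null);
        View view = client.getView("users", "user_ids");
        ViewResponse response = client.query(view, new Query());

        // Collect only the keys from the (small) index...
        List<String> keys = new ArrayList<String>();
        for (ViewRow row : response) {
            keys.add(row.getId());
        }

        // ...then fetch the full documents from the KV layer in one bulk call.
        Map<String, Object> docs = client.getBulk(keys);
        System.out.println("Fetched " + docs.size() + " documents");
        client.shutdown();
    }
}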

Achieve a strong consistent view

I just started using Couchbase and am hoping to use it as my data store.
One of my requirements is performing a query that will return a certain field about all the documents in the store. This query is done once at server startup.
For this purpose I need all the documents that exist and can't miss any of them.
I understand that views in Couchbase are eventually consistent, but I still hope this query can be done (at the cost of performance).
Notes about my configuration:
I have only one Couchbase server instance (I don't need sharding or replication)
I am using the Java client (1.4.1)
What I have tried to do is save my documents this way:
client.set(key, value, PersistTo.ONE).get();
And querying using:
query.setStale(Stale.FALSE);
Adding the PersistTo parameter caused the following exception:
Caused by: net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: <unknown>
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:167)
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:140)
So I guess I am actually asking 3 questions:
Is it possible to get the consistent results I need?
If so, is what I suggested the correct way of doing that?
How can I prevent those exceptions?
The mapping I'm using:
function (doc, meta) {
  if (doc.doc_type && doc.doc_type == "MyType" && doc.myField) {
    emit(meta.id, null);
  }
}
Thank you
Is it possible to get the consistent results I need?
Yes, it is possible to get consistent results from Couchbase views by setting the stale flag to false, as you've done. However, there are performance impacts: depending on your data size, the query may be slow. If you are only going to be doing it once a day, it should be OK.
Couchbase is designed to be a distributed system comprising more than one node; it's not really suitable for single-node deployments. I have read (but can't find the link) that view performance is much better on larger clusters.
You are also forcing more of a synchronous processing model onto a system that shines with async requests. PersistTo is OK to use for some requests, but not system-wide on every call (personal opinion); it will definitely throttle throughput and performance.
If so, is what I suggested the correct way of doing that?
You say the query is done once your application server is running. Is this once per day or more? If once a day, then your approach should work (I'd consider upping the node count ;)). If you have to run this query a lot and you are "hammering" the node over and over with sets, then I'd expect to see what you are currently experiencing.
How can I prevent those exceptions?
It could be a variety of reasons. What are the specs of your machine: RAM, CPU, disk? How much RAM is allocated to Couchbase, how much to your bucket, and what percentage of the bucket RAM is used?
I've personally seen this when I've hammered some lower-end AWS instances on some not-so-amazing networks. What version of Couchbase are you using? It could be a whole variety of factors, and it deserves to be a separate question.
Hope that helps!
EDIT regarding more information on the Stale = false parameter (from official docs)
http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-views-writing-stale
The index is updated before the query is executed. This ensures that any documents updated (and persisted to disk) are included in the view. The client will wait until the index has been updated before the query has executed, and therefore the response will be delayed until the updated index is available.
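Putting the pieces together, here is a minimal sketch of the write-with-PersistTo-then-query-with-stale=false pattern on the 1.4.x Java SDK; the bucket, design document, and view names are placeholders. The no-argument get() on the OperationFuture is where the timeout from your stack trace surfaces, so that is the call to guard:

import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.Query;
import com.couchbase.client.protocol.views.Stale;
import com.couchbase.client.protocol.views.View;
import com.couchbase.client.protocol.views.ViewResponse;
import com.couchbase.client.protocol.views.ViewRow;
import net.spy.memcached.PersistTo;
import net.spy.memcached.internal.OperationFuture;

import java.net.URI;
import java.util.Arrays;

public class ConsistentViewExample {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")), "default", "");

        // Durable write: block until the document is persisted to disk on one node.
        OperationFuture<Boolean> op = client.set("user:1",
                "{\"doc_type\":\"MyType\",\"myField\":1}", PersistTo.ONE);
        try {
            if (!op.get()) {
                System.err.println("Set failed: " + op.getStatus().getMessage());
            }
        } catch (RuntimeException e) {
            // The no-arg get() wraps CheckedOperationTimeoutException in a
            // RuntimeException when the persistence wait times out; retry or
            // back off here instead of letting it kill the startup.
            System.err.println("Durability wait failed: " + e.getMessage());
        }

        // stale=false: the indexer runs before the query answers, so the
        // persisted document above is included in the result.
        Query query = new Query();
        query.setStale(Stale.FALSE);
        View view = client.getView("my_design", "by_type"); // placeholder names
        ViewResponse response = client.query(view, query);
        int rows = 0;
        for (ViewRow row : response) {
            rows++;
        }
        System.out.println("Rows: " + rows);
        client.shutdown();
    }
}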

MySQL cluster for dummies

So what's the idea behind a cluster?
You have multiple machines with the same copy of the DB among which you spread the reads/writes? Is this correct?
How does this idea work? When I make a SELECT query, does the cluster analyze which server has fewer reads/writes and point my query to that server?
When should you start using a cluster? I know this is a tricky question, but maybe someone can give me an example, like 1 million visits and a 100-million-row DB.
1) Roughly correct. Each data node does not hold a full copy of the cluster data, but every single bit of data is stored on at least two nodes.
2) Essentially correct. MySQL Cluster supports distributed transactions.
3) When vertical scaling is not possible anymore, and replication becomes impractical :)
As promised, some recommended readings:
Setting Up Multi-Master Circular Replication with MySQL (simple tutorial)
Circular Replication in MySQL (higher-level warnings about conflicts)
MySQL Cluster Multi-Computer How-To (step-by-step tutorial, it assumes multiple physical machines, but you can run your test with all processes running on the same machine by following these instructions)
The MySQL Performance Blog is a reference in this field
1 -> Your first point is correct in a way, but I think if multiple machines shared the same data it would be replication instead of clustering.
In clustering, the data is divided among the various machines via horizontal partitioning: the division is row-based, and the records are distributed among those machines by some algorithm (see the toy sketch below).
The data is divided in such a way that each record gets a unique key, just as in a key-value pair, and each machine has a unique machine_id that defines which key-value pair goes to which machine.
Each such machine is a cluster node, consisting of an individual mysql-server, its own data, and a cluster manager. There is also data sharing between all the cluster nodes, so that all the data is available to every node at any time.
Retrieval of data can go through memcached servers for speed, and there is also a replication server per cluster to save the data.
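To make the key-to-machine idea concrete, here is a toy sketch (an illustration only, not how MySQL Cluster actually hashes keys internally):

public class ToyPartitioner {
    // Toy illustration: assign each record's key to one of N data nodes.
    static int nodeFor(String recordKey, int nodeCount) {
        // Mask off the sign bit so negative hash codes still map correctly.
        return (recordKey.hashCode() & 0x7fffffff) % nodeCount;
    }

    public static void main(String[] args) {
        int nodes = 4;
        for (String key : new String[] {"user:1", "user:2", "order:99"}) {
            System.out.println(key + " -> node " + nodeFor(key, nodes));
        }
    }
}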
2 -> Yes, that is possible, because all the data is shared among the cluster nodes, and you can also use a load balancer to balance the load. Load balancers are quite common; most large deployments use them. But if you are trying this just for your own knowledge, there is no need: you will not generate the kind of load that requires a load balancer, and the cluster manager itself can handle the whole thing.
3 -> RandomSeed is right. You feel the need for a cluster when replication becomes impractical: if you are using the master server for writes and a slave for reads, at some point the traffic becomes so huge that the servers can no longer keep up smoothly, and you will feel the need for clustering, simply to speed up the whole process.
This is not the only case; it is just one possible scenario.
Hope this is helpful for you!