Couchbase Document with abnormal metadata

We're running Couchbase Server with Sync Gateway in our project. When I query Couchbase, I encounter documents with the metadata below; the flags and expiration fields are both zero.
The problem is, this document was not in the DB yesterday. Is it normal to have a document with metadata like this? The document body has a lastupdate field that points to a date last year. It's possible that we deleted it before, but it has now reappeared.
When I checked the Couchbase and Sync Gateway logs, I didn't find any entries for its ID.
{
  "flags": 0,
  "expiration": 0,
  "id": "SomeIDXXX::YYY",
  "cas": 1669165471522357248,
  "type": "json"
}
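An aside that may help with the timeline: on recent Couchbase Server versions the cas value is a hybrid logical clock timestamp whose physical part is Unix time in nanoseconds, so you can roughly decode when the document was last mutated (a sketch, assuming GNU date):
# Convert the CAS value from nanoseconds to seconds and render it as a UTC date.
date -u -d @$((1669165471522357248 / 1000000000))
# => Wed Nov 23 01:04:31 UTC 2022, i.e. a recent mutation,
#    even though the body's lastupdate points to last year.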

Related

Growing number of couchbase binary documents in XDCR destination bucket

I am running Couchbase Enterprise Edition version 6.6.2 on Windows Server 2016 Standard Edition.
I have two buckets called A and B. Bucket A is configured with enable_shared_bucket_access = true; my Sync Gateway creates new documents in bucket A, and a bunch of services change and delete these documents.
XDCR replicates documents from bucket A to bucket B. All changes to documents in bucket A are replicated to bucket B, except that deletions in bucket A are not replicated. When documents in bucket B get older than 62 days, they are deleted by an external service.
Over time I noticed that 93% of the documents in bucket B are binary documents! My own documents are JSON; I don't use any kind of binary document in my solution. This leads me to the conclusion that these binary documents are internal Couchbase documents.
Here is an example of these binary documents:
{
  "$1": {
    "cas": 1667520921496387584,
    "expiration": 0,
    "flags": 50331648,
    "id": "_sync:rev:00001abd-1f99-4b4e-a695-d11574ea9ed8:0:",
    "type": "base64"
  },
  "pa": "<binary (1 b)>"
},
{
  "$1": {
    "cas": 1667484959445614592,
    "expiration": 0,
    "flags": 50331648,
    "id": "_sync:rev:00001abd-1f99-4b4e-a695-d11574ea9ed8:34:2-d3fb2d58672f853d98ce343d3ae84c1d",
    "type": "base64"
  },
  "pa": "<binary (1129 b)>"
}
My issue with these documents is that their number increases dramatically over time, and they don't get cleaned up automatically! They just grow and consume resources.
What are these documents used for?
Why aren’t these documents cleaned automatically?
Is it safe to simply delete these documents?
Is this a bug or a feature? :-)
Regards,
Siraf
The issue was solved by adding AND NOT REGEXP_CONTAINS(META().id, "^_sync:rev") to the XDCR replication filter expression. This stopped the binary documents from being replicated from bucket A to bucket B.
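If bucket B never needs any Sync Gateway bookkeeping documents at all, a broader filter (my assumption, not part of the fix above) would exclude every _sync:-prefixed ID, covering checkpoints and other metadata as well as revision bodies:
NOT REGEXP_CONTAINS(META().id, "^_sync:")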

Couchbase deleted documents reappearing in database

We are experiencing a problem where deleted documents are reappearing on our Couchbase server.
We have a scenario where documents are created on CBL. These documents are synced up to the server. The user realizes an error has been made and flags the document as incorrect. On the server, the admin can then view all of the flagged documents and delete them from the server. The Sync Gateway has been set up to only sync up these types of documents, i.e. once an edit has been made to these documents on the server, the changes are not synced back down to CBL.
Here is the process of what is happening:
1. A document is created on CBL with a TTL of 15 days and synced to Sync Gateway.
2. The document is updated on CBL and synced to Sync Gateway.
3. The document is deleted from the Couchbase Server bucket with a DELETE N1QL query.
4. After the document is deleted from the bucket, it randomly gets added back within a few days.
5. Only documents that are still on the devices, i.e. not older than the TTL of 15 days, are added back to the bucket.
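For illustration, the N1QL delete in step 3 might look like the following; the document ID is a placeholder, and the bucket name is taken from the Sync Gateway config below:
DELETE FROM prod_bukcet WHERE META().id = "SomeDocID";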
We tried increasing the Metadata Purge Interval to more than 15 days, but this did not resolve the problem.
Does anybody have any suggestions or possibly know what could be the problem here?
Couchbase Server Community Edition 6.5.1 build 6299
Sync gateway 2.7.3
Couchbase Lite Android 2.8.1
Thanks in advance!
PS: Here is our Sync Gateway config with the sync function:
"log": [
"*"
],
"adminInterface": "0.0.0.0:4985",
"interface": "0.0.0.0:4984",
"databases": {
"prod": {
"server": "http://localhost:8091",
"bucket": "prod_bukcet",
"username": "sync_gateway",
"password": "XXX",
"enable_shared_bucket_access": true,
"import_docs": "continuous",
"use_views": true,
"users": {
"user_X": {
"password": "XXX",
"admin_channels": ["*"],
"disabled": false
}
},
"sync":`
function sync(doc, oldDoc) {
/* sanity check */
// check if document was removed from server or via SDK
// In this case, just return
if (isRemoved()) {
return;
}
//Only sync down documents that are created on the server
if (doc.deviceDoc == true) {
channel("server");
} else {
if (doc.siteId) {
channel(doc.siteId);
} else {
channel("devices");
}
}
// This is when document is removed via SDK or directly on server
function isRemoved() {
return (isDelete() && oldDoc == null);
}
function isDelete() {
return (doc._deleted == true);
}
}`,
}
}
}
In shared bucket access mode (enable_shared_bucket_access: true), an N1QL delete on a document creates a tombstone. Tombstones are always synced. The Metadata Purge Interval setting on the server determines the period after which the tombstone gets purged on the server, so it is typically set to a value that matches the maximum partition window of a client, ensuring that all disconnected clients have the opportunity to learn of the deleted document. Setting it to more than 15 days just means that the tombstone will be purged after that interval, and the tombstoned documents will be synced down to clients in the meantime.
In your case, if you don't want documents to be synced down to clients, because the lifetime of the document is managed independently on the CBL side via the expirationDate(), then purge the document instead of deleting it on the server.
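A minimal sketch of such a purge, using the Sync Gateway admin REST API's _purge endpoint with the database name from the config above (the document ID and host are placeholders; "*" purges all revisions):
curl -X POST http://localhost:4985/prod/_purge \
  -H "Content-Type: application/json" \
  -d '{"SomeDocID": ["*"]}'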

sensu client subscriptions non-responding

I have set up the Sensu server and client successfully, and all is working except for one thing. In the attached screenshot
you can see that there are alerts for the mysql and webserver ports, but right now I have only given the "mysql" subscription in the client.json file on my client system. I removed the "webserver" subscription from client.json (which I had added initially, before replacing it with "mysql"), but the checks associated with the "webserver" subscription are still displayed. Why is this, and how do I display only the checks associated with the given subscription? Here is my client.json:
{
  "client": {
    "name": "sensuclient2",
    "address": "127.0.0.1",
    "keepalive": {
      "thresholds": {
        "warning": 60,
        "critical": 120
      },
      "handlers": ["default", "mailer", "sns"]
    },
    "subscriptions": [
      "mysql"
    ]
  }
}
It's possible Uchiwa is showing older checks, prior to the change you made to your client configuration file (at least I went through that once!). Try deleting the events. If the API is not running the checks anymore, the events won't come up again.
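To confirm which stale events are still registered, you can first list the client's events through the Sensu API (host and port are placeholders, as below):
curl -s http://yourhost:yourport/events/sensuclient2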
You can either use sensu-cli to delete the events:
sensu-cli event delete sensuclient2 check_http
https://github.com/agent462/sensu-cli
Or make an API call...
curl -s -i -X DELETE http://yourhost:yourport/events/sensuclient2/check_http
https://sensuapp.org/docs/1.1/api/events-api.html#eventsclientcheck-delete
If the checks do come back, you should check both the server-side and client-side check definitions and the client configuration.
Also, the simplest solution is the best, as #vishal.k himself reminded me:
you can always delete the events using Uchiwa's interface. :)

HTTP ReST: update large collections: better approach than JSON PATCH?

I am designing a web service to regularly receive updates to lists. At this point, a list can still be modeled as a single entity (/lists/myList) or an actual collection with many resources (/lists/myList/entries/<ID>). The lists are large (millions of entries) and the updates are small (often less than 10 changes).
The client will get web service URLs and lists to distribute, e.g.:
http://hostA/service/lists: list1, list2
http://hostB/service/lists: list2, list3
http://hostC/service/lists: list1, list3
It will then push lists and updates as configured. It is likely but undetermined if there is some database behind the web service URLs.
I have been researching and it seems that an HTTP PATCH using the JSON Patch format is the best approach.
Context and examples:
Each list has an identifying name, a priority and millions of entries. Each entry has an ID (determined by the client) and several optional attributes. Example to create a list "requiredItems" with priority 1 and two list entries:
PUT /lists/requiredItems
Content-Type: application/json
{
  "priority": 1,
  "entries": {
    "1": {
      "color": "red",
      "validUntil": "2016-06-29T08:45:00Z"
    },
    "2": {
      "country": "US"
    }
  }
}
For updates, the client would first need to know what the list looks like now on the server. For this I would add a property "revision" to the list entity.
Then, I would query this attribute:
GET /lists/requiredItems?property=revision
Then the client would see what needs to change between the revision on the server and the latest revision known by the client and compose a JSON patch. Example:
PATCH /lists/requiredItems
Content-Type: application/json-patch+json
[
  { "op": "test", "path": "/revision", "value": 3 },
  { "op": "add", "path": "/entries/3", "value": { "color": "blue" } },
  { "op": "remove", "path": "/entries/1" },
  { "op": "remove", "path": "/entries/2/country" },
  { "op": "add", "path": "/entries/2/color", "value": "green" },
  { "op": "replace", "path": "/revision", "value": 10 }
]
Questions:
This approach has the drawback of slightly less client support due to the not-often-used HTTP verb PATCH. Is there a more compatible approach without sacrificing HTTP compatibility (idempotency et cetera)?
Modelling the individual list entries as separate resources and using PUT and DELETE (perhaps with ETag and/or If-Match) seems an option (PUT /lists/requiredItems/entries/3, DELETE /lists/requiredItems/entries/1, PUT /lists/requiredItems/revision), but how would I make sure all those operations are applied when the network drops in the middle of an update chain? Is an HTTP PATCH allowed to work on multiple resources?
Is there a better way to 'version' the lists, perhaps implicitly also improving how they are updated? Note that the client determines the revision number.
Is it correct to query the revision number with GET /lists/requiredItems?property=revision? Should it be a separate resource like /lists/requiredItems/revision? If it should be a separate resource, how would I update it atomically (i.e. the list and revision are both updated or both not updated)?
Would it work in JSON patch to first test the revision value to be 3 and then update it to 10 in the same patch?
This approach has the drawback of slightly less client support due to the not-often-used HTTP verb PATCH.
As far as I can tell, PATCH is really only appropriate if your server is acting like a dumb document store, where the action is literally "please update your copy of the document according to the following description".
So if your resource really just is a JSON document that describes a list with millions of entries, then JSON-Patch is a great answer.
But if you are expecting that the patch will, as a side effect, update an entity in your domain, then I'm suspicious.
Is an HTTP PATCH allowed to work on multiple resources?
RFC 5789
The PATCH method affects the resource identified by the Request-URI, and it also MAY have side effects on other resources
I'm not keen on querying the revision number; it doesn't seem to have any clear advantage over using an ETag/If-Match approach. Some obvious disadvantages: the caches between you and the client don't know that the list and the version number are related; a cache will happily tell a client that version 12 of the list is version 7, or vice versa.
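For concreteness, a sketch of the ETag/If-Match variant (the ETag value "abc123" is a placeholder); if the list has changed since the client last fetched it, the server answers 412 Precondition Failed and applies nothing:
PATCH /lists/requiredItems
If-Match: "abc123"
Content-Type: application/json-patch+json
[
  { "op": "add", "path": "/entries/3", "value": { "color": "blue" } },
  { "op": "remove", "path": "/entries/1" }
]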
Answering my own question. My first bullet point may be opinion-based and, as has been pointed out, I've asked many questions in one post. Nevertheless, here's a summary of what was answered by others (VoiceOfUnreason) and my own additional research:
ETags are HTTP's resource 'hashes'. They can be combined with If-Match headers to build a versioning system. However, ETag headers are normally not used to declare the ETag of a resource that is being created (PUT) or updated (POST/PATCH); the server storing the resource usually determines the ETag. I've not found anything explicitly forbidding a client-supplied ETag, but many implementations may assume that the server determines it and get confused when it is provided with a PUT or PATCH.
A separate revision resource is a valid alternative to ETags for versioning. This resource must be updated at the same time as the resource it is the revision of.
It is not semantically enforceable at the HTTP level to have commit/rollback transactions, unless the transaction itself is modeled as a ReST resource, which would make things much more complicated.
However, some properties of PATCH allow it to be used for this:
An HTTP PATCH must be atomic and can operate on multiple resources. RFC 5789:
The server MUST apply the entire set of changes atomically and never provide (e.g., in response to a GET during this operation) a partially modified representation. If the entire patch document cannot be successfully applied, then the server MUST NOT apply any of the changes.
The PATCH method affects the resource identified by the Request-URI, and it also MAY have side effects on other resources; i.e., new resources may be created, or existing ones modified, by the application of a PATCH. PATCH is neither safe nor idempotent.
A JSON Patch can consist of multiple operations on multiple resources, and either all must be applied or none, making it an implicit transaction. Per RFC 6902, operations are applied sequentially in the order they appear in the array.
Thus, the revision can be modeled as a separate resource and still be updated at the same time. Querying the current revision is a simple GET. Committing a transaction is a single PATCH request containing first a test of the revision, then the operations on the resource(s) and finally the operation to update the revision resource.
The server can still choose to publish the revision as ETag of the main resource.

Context Broker: ONTIMEINTERVAL subscription immediately sends request to reference

The problem is that even if I set condValues to PT10S, when I send the request to the Context Broker it calls back the reference URL right away, not after 10 seconds, and then continues to send requests every 10 seconds.
My question: is there a way to avoid the first initial request?
Here is the body of the request that I send to the server where the Context Broker is installed:
{
  "entities": [{
    "type": "Cycle",
    "isPattern": "false",
    "id": "someid"
  }],
  "attributes": [
    ...
  ],
  "reference": "someurl",
  "duration": "P1M",
  "notifyConditions": [{
    "type": "ONTIMEINTERVAL",
    "condValues": [
      "PT10S"
    ]
  }]
}
At the present moment (Orion 1.1), the initial notification cannot be avoided. However, being able to configure that behaviour would be an interesting feature to develop in the future and, consequently, a GitHub issue was created some time ago about it.
In addition, note that ONTIMEINTERVAL subscriptions are no longer supported, so you should avoid using them:
ONTIMEINTERVAL subscriptions have several problems (they introduce state in the CB, thus making horizontal scaling configuration much harder, and they make it difficult to introduce pagination/filtering). Actually, they aren't really needed, as any use case based on ONTIMEINTERVAL notifications can be converted to an equivalent use case in which the receptor runs queryContext at the same frequency (taking advantage of the features of queryContext, such as pagination or filtering).
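For example, instead of the subscription above, the receiving application could poll every 10 seconds with an equivalent NGSI v1 queryContext request (a sketch based on the entity in the question):
POST /v1/queryContext
Content-Type: application/json
{
  "entities": [{
    "type": "Cycle",
    "isPattern": "false",
    "id": "someid"
  }]
}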
EDIT: the possibility of avoiding the initial notification has finally been implemented in Orion. Details are in this section of the documentation. It is now in the master branch (so if you use the fiware/orion:latest Docker image you will get it) and will be included in the next Orion version (2.2.0).