Possible to chain results in N1QL? - couchbase

I'm currently trying to do a bit of complex N1QL for a project I'm working on. Theoretically I could do all of this processing with multiple N1QL calls, parsing the results each time, but if possible I'd like for this to be contained in one call.
What I would like to do is:
Filter all documents that contain a "dataSync.test.id" field with more than one id
Read back all other ids in that list
Use that list to get other documents containing those ids
Get the "dataSync.test._channels" field for those documents (optionally a filter by docType might help parsing)
This would probably return a list of "dataSync.test._channels"
Is this possible in N1QL? It appears like it might be but I can't get the syntax right.
My data structures look a little like this:
{
  "dataSync": {
    "test": {
      "_channels": [
        "RP"
      ],
      "id": [
        "dataSync_user_1015",
        "dataSync_user_1010",
        "dataSync_user_1005"
      ],
      "_lastUpdatedBy": "TEST"
    }
  },
  ...
}
{
  "dataSync": {
    "test": {
      "_channels": [
        "RSD"
      ],
      "id": [
        "dataSync_user_1010"
      ],
      "_lastUpdatedBy": "TEST"
    }
  },
  ...
}

Yes, I think you can do all of this.
The initial set of IDs with filtering can be retrieved as a subquery, and then you can get the subsequent documents via joins.
SELECT fulldoc
FROM (SELECT META().id AS dockey FROM doc WHERE a = 1) AS mydoc
INNER JOIN doc AS fulldoc ON KEYS mydoc.dockey;
There are optimizations that can be done here. Try the sequencing first to make sure it gets the job done.
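Applied to the structure above, a sketch might look like the following. It assumes the bucket is named mybucket and that the values in dataSync.test.id are the document keys of the related documents; if they are not document keys, you would need an index-backed join on the field instead.

SELECT fulldoc.dataSync.test._channels
FROM (
    SELECT DISTINCT otherId
    FROM mybucket AS doc
    UNNEST doc.dataSync.test.id AS otherId
    WHERE ARRAY_LENGTH(doc.dataSync.test.id) > 1
) AS ids
INNER JOIN mybucket AS fulldoc ON KEYS ids.otherId;

A WHERE fulldoc.docType = "..." filter on the outer query can be added if filtering by document type helps your parsing.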

Related

Goessner JSON Query Syntax and Filtering

Looking to filter this JSON body for specific key/values when a certain condition is met.
For this body, I'd like to retrieve ONLY the recipient ID and tracking number when the requester ID is 67890.
{
  "metadata": "someinformation",
  "access": "XXXX",
  "recipient": {
    "id": "12345"
  },
  "requester": {
    "id": "67890"
  },
  "trackingNumber": "ABCDEF"
}
This would be using Goessner https://goessner.net/articles/JsonPath/index.html
I am able to get the attributes mostly using $..[trackingNumber,requester,recipient], but it drops the "trackingNumber" key and returns only the value.
Also, the filter I want to use alongside that would be: [?($.requester.id=="67890")]
The expectation is that other requester IDs will be in other JSON bodies - but we only want to filter for the ones where this requester is present and select those specific attributes.
You're going to need to do this in two queries, one for each value that you want back.
For recipient:
$[?(@.requester.id == '67890')].recipient.id
For tracking number:
$[?(@.requester.id == '67890')].trackingNumber
I don't think Goessner's implementation supports returning multiple values like you want. It's not something that will be supported in the upcoming spec, either.
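If you're post-processing in code anyway, the two selections can be combined there instead. A minimal Python sketch, assuming the bodies arrive as a list of parsed dicts (the documents name is hypothetical):

# a hypothetical batch of parsed JSON bodies
documents = [
    {"recipient": {"id": "12345"}, "requester": {"id": "67890"}, "trackingNumber": "ABCDEF"},
    {"recipient": {"id": "99999"}, "requester": {"id": "11111"}, "trackingNumber": "GHIJKL"},
]

# keep only bodies whose requester id matches, then project the two fields
matches = [
    {"recipientId": d["recipient"]["id"], "trackingNumber": d["trackingNumber"]}
    for d in documents
    if d.get("requester", {}).get("id") == "67890"
]
print(matches)  # [{'recipientId': '12345', 'trackingNumber': 'ABCDEF'}]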

How to merge two collections keeping the document with highest timestamp in MongoDB

I'm creating a MongoDB client for a Go application, using the MongoDB Go Driver. In particular, I have two databases with one collection each. These collections can be modified asynchronously by different clients, so I need to periodically synchronize them, keeping the most recently edited document among those with the same id field.
The two databases are stored on different hosts, so I need to export the collection from one host using mongoexport and import it into the other host using mongoimport.
I already tried using mongoimport --collection=myColl --mode=merge, but this doesn't fit my goal because it simply overrides the conflicting documents in myColl with the imported ones.
My idea is to import the JSON into a temp collection, but I don't know how to compare the timestamps during the aggregation/merge process.
My collections are structured like this. Any ideas?
Collection 1
{"_id":"K1","value":"VAL1","timest":{"$date":"2021-09-26T09:05:09.942Z"}}
{"_id":"K2","value":"VAL2","timest":{"$date":"2021-09-26T09:05:10.234Z"}}
Collection 2
{"_id":"K2","value":"VAL3","timest":{"$date":"2021-09-26T09:15:09.942Z"}}
{"_id":"K3","value":"VAL4","timest":{"$date":"2021-09-26T09:15:10.234Z"}}
Desired Behaviour
Conflict
{"_id":"K2","value":"VAL2","timest":{"$date":"2021-09-26T09:05:10.234Z"}}
{"_id":"K2","value":"VAL3","timest":{"$date":"2021-09-26T09:15:09.942Z"}}[LATEST]
Output
{"_id":"K1","value":"VAL1","timest":{"$date":"2021-09-26T09:05:09.942Z"}}
{"_id":"K2","value":"VAL3","timest":{"$date":"2021-09-26T09:15:09.942Z"}}
{"_id":"K3","value":"VAL4","timest":{"$date":"2021-09-26T09:15:10.234Z"}}
You can use $merge.
The query below merges testdb1.coll into testdb2.coll based on the same _id, and keeps the document with the latest date. If the _id is not found, the document is inserted.
Data in
testdb1.coll
[{"_id" "K2","value" "VAL3","timest" (date "2021-09-26T09:15:09.942Z")}
{"_id" "K3","value" "VAL4","timest" (date "2021-09-26T09:15:10.234Z")}]
testdb2.coll
[{"_id" "K1","value" "VAL1","timest" (date "2021-09-26T09:05:09.942Z")}
{"_id" "K2","value" "VAL2","timest" (date "2021-09-26T09:05:10.234Z")}]
Results
testdb2.coll (after the merge)
{"_id": "K1", "value": "VAL1", "timest": {"$toDate": "2021-09-26T09:05:09.942Z"}}
{"_id": "K2", "value": "VAL3", "timest": {"$toDate": "2021-09-26T09:15:09.942Z"}}
{"_id": "K3", "value": "VAL4", "timest": {"$toDate": "2021-09-26T09:15:10.234Z"}}
Query
(instead of let you could use the built-in $$new variable, as shown after the query)
client.db("testdb1").collection("coll").aggregate(
[
{
"$merge": {
"into": {
"db": "testdb2",
"coll": "coll"
},
"on": [
"_id"
],
"let": {
"p_ROOT": "$$ROOT"
},
"whenMatched": [
{
"$replaceRoot": {
"newRoot": {
"$cond": [
{
"$gt": [
"$$p_ROOT.timest",
"$timest"
]
},
"$$p_ROOT",
"$$ROOT"
]
}
}
}
],
"whenNotMatched": "insert"
}
}
])
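With $$new, the whenMatched stage alone would look like this (a sketch with the same behavior; $merge binds $$new to the incoming document automatically when no let is given):

"whenMatched": [
  {
    "$replaceRoot": {
      "newRoot": {
        "$cond": [
          { "$gt": ["$$new.timest", "$timest"] },  // incoming newer?
          "$$new",   // keep the incoming document
          "$$ROOT"   // keep the existing document
        ]
      }
    }
  }
]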
You can do the following in an aggregation pipeline (sketched below):
use $unionWith to combine the 2 collections
use $sort to order them by timest
use $first to get the latest document per _id
use $replaceRoot to get the final form you want
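A minimal sketch of that pipeline, assuming both collections live in the same database as coll1 and coll2 (placeholder names) and that you're on MongoDB 4.4+ for $unionWith:

db.coll1.aggregate([
  // 1. combine the two collections into one stream
  { $unionWith: { coll: "coll2" } },
  // 2. newest first
  { $sort: { timest: -1 } },
  // 3. per _id, keep the first (i.e. latest) document
  { $group: { _id: "$_id", doc: { $first: "$$ROOT" } } },
  // 4. restore the original document shape
  { $replaceRoot: { newRoot: "$doc" } }
])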
Here is the Mongo playground for your reference.

Creating nodes and relations from JSON (dynamically)

I've got a couple hundred JSONs in a structure like the following example:
{
  "JsonExport": [
    {
      "entities": [
        {
          "identity": "ENTITY_001",
          "surname": "SMIT",
          "entityLocationRelation": [
            {
              "parentIdentification": "PARENT_ENTITY_001",
              "typeRelation": "SEEN_AT",
              "locationIdentity": "LOCATION_001"
            },
            {
              "parentIdentification": "PARENT_ENTITY_001",
              "typeRelation": "SEEN_AT",
              "locationIdentity": "LOCATION_002"
            }
          ],
          "entityEntityRelation": [
            {
              "parentIdentification": "PARENT_ENTITY_001",
              "typeRelation": "FRIENDS_WITH",
              "childIdentification": "ENTITY_002"
            }
          ]
        },
        {
          "identity": "ENTITY_002",
          "surname": "JACKSON",
          "entityLocationRelation": [
            {
              "parentIdentification": "PARENT_ENTITY_002",
              "typeRelation": "SEEN_AT",
              "locationIdentity": "LOCATION_001"
            }
          ]
        },
        {
          "identity": "ENTITY_003",
          "surname": "JOHNSON"
        }
      ],
      "identification": "REGISTRATION_001",
      "locations": [
        {
          "city": "LONDON",
          "identity": "LOCATION_001"
        },
        {
          "city": "PARIS",
          "identity": "LOCATION_002"
        }
      ]
    }
  ]
}
With these JSONs, I want to make a graph consisting of the following nodes: Registration, Entity and Location. This part I've figured out, and I made the following:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
MERGE(r:Registration {id:data.identification})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..locations.*" ) YIELD value AS locations
MERGE(l:Locations{identity:locations.identity, name:locations.city})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..entities.*" ) YIELD value AS entities
MERGE(e:Entities {name:entities.surname, identity:entities.identity})
All the entities and locations should have a relation with the registration. I thought I could do this by using the following code:
MERGE (e)-[:REGISTERED_ON]->(r)
MERGE (l)-[:REGISTERED_ON]->(r)
However, this code doesn't give the desired output. It creates extra "empty" nodes and doesn't connect to the registration node. So the first question is: how do I connect the location and entity nodes to the registration node? And, in light of the other JSONs, the entities and locations should only be linked to that specific registration.
Furthermore, I would like to create the entity -> location and entity -> entity relations, using the given type of relation (SEEN_AT or FRIENDS_WITH) as the label for each relation. How can this be done? I'm kind of lost at this point and don't see how to solve this. If someone could guide me in the right direction I would be much obliged.
Variable names (like e and r) are not stored in the DB, and are bound to values only within individual queries. MERGE on a pattern with an unbound variable will just create the entire pattern (including creating an empty node for unbound node variables).
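For illustration, a hypothetical standalone snippet (labels and properties borrowed from the data above):

// e and r are unbound here, so this MERGE creates two new empty nodes
// plus a relationship between them:
MERGE (e)-[:REGISTERED_ON]->(r)

// Bind the variables first, then MERGE only the relationship:
MATCH (e:Entities {identity: 'ENTITY_001'})
MATCH (r:Registration {id: 'REGISTRATION_001'})
MERGE (e)-[:REGISTERED_ON]->(r)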
When you MERGE a node, you should only specify the unique identifying property for that node, to avoid duplicates. Any other properties you want to set at the time of creation should be set using ON CREATE SET.
It is inefficient to parse through the JSON data 3 times to get different areas of the data. And it is especially inefficient the way your query was doing it, since each subsequent CALL/MERGE group of clauses would be executed multiple times (every previous CALL produces multiple rows, so the number of rows increases multiplicatively). You can use aggregation to get around that, but it is unnecessary in your case, since you can do the entire query in a single pass through the JSON data.
This may work for you:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file, "$.JsonExport.*") YIELD value AS data
MERGE (r:Registration {id: data.identification})
FOREACH (ent IN data.entities |
  MERGE (e:Entities {identity: ent.identity})
  ON CREATE SET e.name = ent.surname
  MERGE (e)-[:REGISTERED_ON]->(r)
  FOREACH (loc1 IN ent.entityLocationRelation |
    MERGE (l1:Locations {identity: loc1.locationIdentity})
    MERGE (e)-[:SEEN_AT]->(l1))
  FOREACH (ent2 IN ent.entityEntityRelation |
    MERGE (e2:Entities {identity: ent2.childIdentification})
    MERGE (e)-[:FRIENDS_WITH]->(e2))
)
FOREACH (loc IN data.locations |
  MERGE (l:Locations {identity: loc.identity})
  ON CREATE SET l.name = loc.city
  MERGE (l)-[:REGISTERED_ON]->(r)
)
For simplicity, it hard-codes the SEEN_AT, FRIENDS_WITH, and REGISTERED_ON relationship types, since MERGE only supports hard-coded relationship types.
So, playing with Neo4j/Cypher I've learned some new stuff and arrived at another solution to the problem. Based on the given example data, the following creates the nodes and edges dynamically.
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
CALL apoc.merge.node(['Registration'], {id:data.identification}, {},{}) YIELD node AS vReg
UNWIND data.entities AS ent
CALL apoc.merge.node(['Person'], {id:ent.identity}, {}, {id:ent.identity, surname:ent.surname}) YIELD node AS vPer1
UNWIND ent.entityEntityRelation AS entRel
CALL apoc.merge.node(['Person'],{id:entRel.childIdentification},{id:entRel.childIdentification},{}) YIELD node AS vPer2
CALL apoc.merge.relationship(vPer1, entRel.typeRelation, {},{},vPer2) YIELD rel AS ePer
UNWIND data.locations AS loc
CALL apoc.merge.node(['Location'], {id:loc.identity}, {name:loc.city}) YIELD node AS vLoc
UNWIND ent.entityLocationRelation AS locRel
CALL apoc.merge.relationship(vPer1, locRel.typeRelation, {},{},vLoc) YIELD rel AS eLoc
CALL apoc.merge.relationship(vLoc, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg1
CALL apoc.merge.relationship(vPer1, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg2
CALL apoc.merge.relationship(vPer2, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg3
RETURN vPer1,vPer2, vReg, vLoc, eLoc, eReg1, eReg2, eReg3

How to query documents which are arrays

I'm a novice Couchbase user, and the bucket I've created contains documents which are actually arrays, in the form of:
{
  "key": [
    {
      "data1": "somedata1"
    },
    {
      "data2": "somedata2"
    }
  ]
}
I want to query these documents via N1QL statements and have yet to find a way to do this properly. More specifically, I would like to select fields inside each sub-document that sits in the array under a certain key. For example, I would like to access key[1].data2 or key[0].data1.
How should I do it?
Couchbase has some reserved keywords that need to be escaped, and key is one of them. For example, if you're querying against my_bucket, then
SELECT my_bucket.`key`[0].data1 FROM my_bucket;
should return somedata1
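By the same token, the second element's field would be my_bucket.`key`[1].data2. And if you want all sub-documents regardless of their position, one option (a sketch, still assuming the bucket is named my_bucket) is to flatten the array with UNNEST, which yields one result row per array element:

SELECT item.*
FROM my_bucket
UNNEST my_bucket.`key` AS item;

This returns {"data1": "somedata1"} and {"data2": "somedata2"} as separate rows.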

How do I query a complex JSONB field in Django 1.9

I have a table item with a field called data of type JSONB. I would like to query all items that have text that equals 'Super'. I am trying to do this currently by doing this:
Item.objects.filter(Q(data__areas__texts__text='Super'))
Django debug toolbar is reporting the query used for this is:
WHERE "item"."data" #> ARRAY['areas', 'texts', 'text'] = '"Super"'
But I'm not getting back any matching results. How can I query this using Django? If it's not possible in Django, then how can I query this in Postgresql?
Here's an example of the contents of the data field:
{
  "areas": [
    {
      "texts": [
        {
          "text": "Super"
        }
      ]
    },
    {
      "texts": [
        {
          "text": "Duper"
        }
      ]
    }
  ]
}
Try Item.objects.filter(data__areas__0__texts__0__text='Super').
It is not an exact answer, but it may clarify some JSONB filter features; also read the Django docs.
I am not sure what you want to achieve with this structure, but I was able to get the desired result only with a rather strange raw query, which can look like this:
Item.objects.raw("SELECT id, data FROM (SELECT id, data, jsonb_array_elements(\"table_name\".\"data\" #> '{areas}') AS areas_data FROM \"table_name\") foo WHERE areas_data #> '{texts}' @> '[{\"text\": \"Super\"}]'")
Don't forget to change table_name in the query (in your case it should be yourappname_item).
I'm not sure you'd use this query in real programs, but it can probably help you find your way to a better solution.
Also, there is a very good intro to JSONB query syntax.
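If a containment match is enough (i.e. you don't care at which array index the text appears), a possibly simpler alternative is the contains lookup on django.contrib.postgres's JSONField, which maps to Postgres's @> containment operator; @> descends into arrays at any position. A sketch, with the import path assumed:

from myapp.models import Item  # hypothetical app path

# matches any item whose data.areas contains an element whose
# texts array contains an element with text == 'Super'
Item.objects.filter(data__contains={"areas": [{"texts": [{"text": "Super"}]}]})

The SQL this generates is roughly:

SELECT id, data FROM yourappname_item
WHERE data @> '{"areas": [{"texts": [{"text": "Super"}]}]}';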
Hope it helps.