Updating JSON arrays in MarkLogic 9 - json

I'm having trouble working out how to write a bit of XQuery. I have a JSON structure in MarkLogic that looks like:
{
"id": "pres003A10",
"title": "A Course About Something",
"description": "This course teaches people about some things they may not know.",
"author": "A.N. Author",
"updated": "2007-01-19",
"decks": [
{
"id":"really-basic-stuff",
"exclude": ["slide3", "slide12"]
},
{
"id":"cleverer-stuff",
"exclude": []
}
]
}
The exclude array contains the identifiers for slides in decks (presentations are made up of one or more decks of slides). I'm trying to write a piece of code that will look for a slide id in that exclude list and remove it if present or add it if not (a toggle).
I can obtain the array node itself using:
let $exclude := doc('/presentations/presentation.json')/object-node()/decks[id = 'markup-intro']/array-node('exclude')
but I can't for the life of me see how I then update that array to either remove an item or add one. The intention is to call a function something like:
declare function local:toggle-slide($presentation) as object-node()
{
(: xdmp:node-update(...) goes here :)
};
So, how do I update that array?

In-memory JSON node trees (and XML trees, for that matter) are immutable.
The way to modify a tree is to construct a new tree, copying the nodes that haven't changed and creating new parent and ancestor nodes that carry the changes.
That said, there's an easier way to modify JSON. If you call xdmp:from-json() on the root node, you get a mutable in-memory map / array structure.
You can then navigate to the array using map:get() on the maps and [ITEM_NUMBER] on the arrays, and delete or insert items on the appropriate json:array object.
When you're done, call xdmp:to-json() to turn the root map back into a node.
Hoping that helps,
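To make the copy-and-rebuild idea concrete, here is a minimal sketch of the toggle logic in TypeScript, working on a plain parsed JSON value. It is not MarkLogic API code: the Deck and Presentation shapes and the toggleSlide name are assumptions made up for illustration, and in MarkLogic the same logic would have to be applied to whatever structure you actually obtain from the document.

```typescript
// Hypothetical shapes mirroring the presentation document from the question.
interface Deck {
  id: string;
  exclude: string[];
}

interface Presentation {
  id: string;
  decks: Deck[];
}

// Toggle a slide id in the exclude list of one deck.
// Builds and returns a new presentation; the input is left untouched.
function toggleSlide(
  presentation: Presentation,
  deckId: string,
  slideId: string
): Presentation {
  const decks = presentation.decks.map((deck) => {
    if (deck.id !== deckId) {
      return deck; // unchanged decks are reused as they are
    }
    const isExcluded = deck.exclude.indexOf(slideId) !== -1;
    const exclude = isExcluded
      ? deck.exclude.filter((s) => s !== slideId) // present: remove it
      : deck.exclude.concat([slideId]); // absent: add it
    return { ...deck, exclude };
  });
  // The spread carries over any other top-level fields unchanged.
  return { ...presentation, decks };
}
```

Calling toggleSlide twice with the same deck and slide id gives back the original exclude list, which is a quick way to sanity-check the toggle.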

If you need to update the JSON in the database, you can use xdmp:node-replace. The catch with node-replace, though, is that you have to feed it a named node. To do that, wrap the array-node in an object-node, and then grab the array-node inside the object-node on the fly. Here is a working example:
xquery version "1.0-ml";
(: insert test data :)
xdmp:document-insert("/presentations/presentation.json", xdmp:unquote('{
"id": "pres003A10",
"title": "A Course About Something",
"description": "This course teaches people about some things they may not know.",
"author": "A.N. Author",
"updated": "2007-01-19",
"decks": [
{
"id":"markup-intro",
"exclude": ["slide3", "slide12"]
},
{
"id":"cleverer-stuff",
"exclude": []
}
]
}'
))
;
(: node-replace array-node :)
let $exclude := doc('/presentations/presentation.json')/object-node()/decks[id = 'markup-intro']/array-node('exclude')
return xdmp:node-replace($exclude, object-node{
"exclude": array-node{ "other", "slides" }
}/node())
;
(: view if changed :)
doc('/presentations/presentation.json')
Note: consider looking at MarkLogic's Server-side JavaScript (SJS) support. Updating JSON might seem more natural that way, particularly if you need to make multiple changes in one go.
HTH!

Related

getDegree()/isOutgoing() functions don't work in graphAware/neo4j-to-elasticsearch mapping.json

Neo4j Version: 3.2.2
Operating System: Ubuntu 16.04
I use the getDegree() function in the mapping.json file, but it always returns null. I'm using the Movie/Actor dataset from the Neo4j tutorial.
Output from elasticsearch request
mapping.json
{
"defaults": {
"key_property": "uuid",
"nodes_index": "default-index-node",
"relationships_index": "default-index-relationship",
"include_remaining_properties": true
},
"node_mappings": [
{
"condition": "hasLabel('Person')",
"type": "getLabels()",
"properties": {
"getDegree": "getDegree()",
"getDegree(type)": "getDegree('ACTED_IN')",
"getDegree(direction)": "getDegree('OUTGOING')",
"getDegree('type', 'direction')": "getDegree('ACTED_IN', 'OUTGOING')",
"getDegree-degree": "degree"
}
}
],
"relationship_mappings": [
{
"condition": "allRelationships()",
"type": "type"
}
]
}
Also, if I use isOutgoing(), isIncoming(), or otherNode in the properties part of relationship_mappings, Elasticsearch never loads the relationship data from Neo4j. I think I have probably misunderstood the sentence 'only when one of the participating nodes "looking" at the relationship is provided' on this page: https://github.com/graphaware/neo4j-framework/tree/master/common#inclusion-policies
mapping.json
{
"defaults": {
"key_property": "uuid",
"nodes_index": "default-index-node",
"relationships_index": "default-index-relationship",
"include_remaining_properties": true
},
"node_mappings": [
{
"condition": "allNodes()",
"type": "getLabels()"
}
],
"relationship_mappings": [
{
"condition": "allRelationships()",
"type": "type",
"properties": {
"isOutgoing": "isOutgoing()",
"isIncoming": "isIncoming()",
"otherNode": "otherNode"
}
}
]
}
BTW, is there any page that lists all of the functions that we can use in mapping.json? I know of two:
github.com/graphaware/neo4j-framework/tree/master/common#inclusion-policies
github.com/graphaware/neo4j-to-elasticsearch/blob/master/docs/json-mapper.md
but it seems there are more, since I can use getType(), which hasn't been listed in any of the above pages.
Please let me know if I can provide any further help to solve the problem
Thanks!
The getDegree() function is not available, unlike getType(). I will explain why:
When the mapper (the part responsible for creating the representation of a node or relationship as an ES document) does its job, it receives a DetachedGraphObject, that is, a detached node or relationship.
Detached means that this happens outside of a transaction, so query operations against the database are no longer available. getType() is available because it is part of the relationship metadata and is cheap. Doing the same for getDegree() could be considerably more costly during the DetachedObject creation (which happens in a transaction), depending on the number of different types, etc.
This is, however, something we are working on, by externalising the mapper into a standalone Java application coupled with a broker such as Kafka or RabbitMQ between Neo4j and this app. We would not, however, offer the possibility to requery the graph in the current version of the module, as it can have a serious performance impact if the user is not very careful.
Finally, the only suggestion I can give you is to keep a property on your node that holds the degree values you need to replicate to ES.
UPDATE
Regarding this part of the documentation :
For Relationships only when one of the participating nodes "looking" at the relationship is provided:
This is used only when you are not using the JSON definition, so you can use one or the other. The JSON definition was added later, and the two cannot be used together.
To answer this part: it means that the nodes on the incoming or outgoing side, depending on the definition, should be included in the inclusion policy for nodes, such as hasLabel('Employee') || hasProperty('form') || getProperty('age', 0) > 20. If you have an allNodes policy, then it is fine.

Getting the parent object from a child object in nested JSON objects with TypeScript

Is it possible to get the parent object from a child object dynamically? Essentially, all I'm trying to accomplish is to dynamically retrieve the value of a property belonging to a child object's parent. For example, in the following JSON, I want to extract the driver of a particular car:
{
"driver": [
{
"id": 1, |
"name": "Bob", |=> this is the parent
"age": "34", |
"car": [
{
"make": "BMW", |
"model": "3.20", | this is the child
"colour": "Silver",|
"mileage": [
{
"total": "350523",
"year": [
{
"2011": "3535",
"2012": "7852",
"2013": "8045"
}
],
"month": [
{
"december": "966",
"november": "546",
"october": "7657"
}
]
}
]
}
]
}
]
}
Using for loops:
for(let parent of data.driver) {
for(let car of parent.car) {
if(car.make === 'BMW') {
// can do what you like with 'parent'
}
}
}
Using filter() or find() (standard JavaScript):
const drivers_who_drive_bmw = data.driver.filter((parent) => {
// find() returns undefined if no car matched
// car.make === 'BMW'
return parent['car'].find((car) => car.make === 'BMW') !== undefined
})
Also:
Your naming conventions are confusing. I would expect driver.car to be a single car, but in your code it is an array of cars. If it always contains a single car, it would be better not to use an array. The same goes for .driver: a better key would be .drivers, to indicate multiple drivers. (But maybe it is XML converted to JSON, in which case you are stuck with it?)
Whatever strategy you choose, you will basically be iterating through and returning. So I feel the "best" strategy is using what you are most comfortable with.
Typescript is just Javascript. So if you are comfortable with a Javascript-ey "functional programming" way of doing things you can use Array map & filter.
You of course will have to deal with application specific logic you have not specified like "What happens when the same make/model exists across different drivers?".
If you are not comfortable with functional programming you can always build up a series of maps and then perform lookups.
But if you need to get it right, always do what you are comfortable doing.
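As a rough illustration of the "build up a map and then perform lookups" option, the sketch below indexes drivers by the make and model of their cars. The Car and Driver interfaces and the indexDriversByCar name are assumptions for illustration, based on the JSON in the question. Each key maps to an array of drivers, so the same make/model appearing under several drivers is handled.

```typescript
// Hypothetical shapes based on the JSON in the question.
interface Car {
  make: string;
  model: string;
  colour: string;
}

interface Driver {
  id: number;
  name: string;
  age: string;
  car: Car[];
}

// Build a lookup table from "make/model" to the drivers who own such a car.
function indexDriversByCar(drivers: Driver[]): Record<string, Driver[]> {
  const index: Record<string, Driver[]> = {};
  for (const driver of drivers) {
    for (const car of driver.car) {
      const key = car.make + "/" + car.model;
      if (!index[key]) {
        index[key] = []; // first driver seen for this make/model
      }
      index[key].push(driver);
    }
  }
  return index;
}
```

After building the index once, a lookup such as index["BMW/3.20"] returns the matching drivers without walking the nested arrays again.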
To answer this question: an object reference is just a memory location, so there is no built-in concept of a parent. An object may have no parent at all (no other object holds a reference to it), or many objects may refer to the same memory location (multiple parents, by your logic).
1> Either you put a parent reference on each child element programmatically. Note that you cannot get this just by parsing a JSON string, because JSON contains only data, not references.
2> Or you search for the driver object (i.e. the parent object) whose child object contains the value you are looking for. You can use the filter and map array functions in JavaScript to do so, but whatever you do, you are just iterating and finding. In that case, Underscore.js would be a good library to use.
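A minimal sketch of the first option, adding parent references after parsing, could look like this. The CarNode and DriverNode interfaces, the owner property, and the linkCarsToDrivers name are all assumptions chosen for illustration. The back-reference is defined as non-enumerable so that JSON.stringify, which only serializes enumerable properties, does not run into a circular structure.

```typescript
// Hypothetical shapes; "owner" is an invented name for the back-reference.
interface CarNode {
  make: string;
  model: string;
  owner?: DriverNode;
}

interface DriverNode {
  id: number;
  name: string;
  car: CarNode[];
}

// Walk the parsed data once and give every car a reference to its driver.
function linkCarsToDrivers(drivers: DriverNode[]): void {
  for (const driver of drivers) {
    for (const car of driver.car) {
      Object.defineProperty(car, "owner", {
        value: driver,
        enumerable: false, // keeps the reference out of JSON.stringify
        writable: true,
        configurable: true,
      });
    }
  }
}
```

Once the links exist, any code holding a car can read car.owner directly. The trade-off is that the links have to be rebuilt whenever the data is parsed again.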

Mongo DB query of complex json structure

Say I have a json structure like so:
{
"A":{
"name":"dog",
"foo":"bar",
"array":[
{"name":"one"},
{"name":"two"}
]
},
"B":{
"name":"cat",
"foo":"bar",
"array":[
{"name":"one"},
{"name":"three"}
]
}
}
I want to be able to do two things.
1: Query for any "name":* within "A.array".
2: Query for any "name":"one" within "*.array".
That is, any object within a specific document's array, and any specific object within any document's array.
I hope I have used proper terminology here, I am just starting to familiarize myself with a lot of these concepts. I have tried searching for an answer but am having trouble finding something like my case.
Thanks.
EDIT:
Since I still haven't really made progress on this, I'll just explain what I'm trying to do: I want to use the "AllSets" dataset (after I trim it down below 16 MB) available on mtgjson.com. I am having problems getting mongo to play nicely though.
In an effort to try and learn what's going on, I have downloaded one set: http://mtgjson.com/json/OGW.json.
Here is a photo of its structure laid out:
I am unable to even get mongo to return an object from within the cards array using:
find({cards: {$elemMatch: {name: "Deceiver of Form"}}})
find({"cards.name": "Deceiver of Form"})
When I run either of the commands above it just returns the entire document to me.
You could use the positional projection $ operator to limit the contents of an array. For example, if you have a single document like below:
{
"block": "Battle for Zendikar",
"booster": "...",
"translations": "...",
"cards": [
{
"name": "Deceiver of Form",
"power": "8"
},
{
"name": "Eldrazi Mimic",
"power": "2"
},
{
"name": "Kozilek, the Great Distortion",
"power": "12"
}
]
}
You can query for a card name matching "Deceiver of Form", and limit fields to return only the matching array card element(s) using:
> db.collection.find({"cards.name":"Deceiver of Form"}, {"cards.$":1})
{
"_id": ObjectId("..."),
"cards": [
{
"name": "Deceiver of Form",
"power": "8"
}
]
}
Having said the above, I think you should reconsider your data model. MongoDB is a document-oriented database, and a record in MongoDB is a document, so keeping everything in a single record does not bring out the potential of the database. It is similar to storing all data in a single row of a table.
You should try storing the cards in a collection instead, where each document is a single card. Depending on your use case, you could add a reference to another collection containing the deck information (block, type, releaseDate, etc.). For example:
// a document in cards collection:
{
"name": "Deceiver of Form",
"power": "8",
"deck_id": 1
}
// a document in decks collection:
{
"deck_id": 1,
"releaseDate": "2016-01-22",
"type": "expansion"
}
For different types of data model designs and examples, please see Data Model Design.
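The restructuring described above can be sketched as a plain data transformation, done before anything is inserted into MongoDB. The interfaces and the splitSet name below are assumptions for illustration; the field names follow the example documents in this answer, and the real mtgjson data has many more fields per card.

```typescript
// Hypothetical shapes following the example documents above.
interface CardInSet {
  name: string;
  power: string;
}

interface SetDocument {
  block: string;
  releaseDate: string;
  type: string;
  cards: CardInSet[];
}

interface CardDocument {
  name: string;
  power: string;
  deck_id: number;
}

interface DeckDocument {
  deck_id: number;
  block: string;
  releaseDate: string;
  type: string;
}

// Split one large set document into a deck document plus one document per card.
function splitSet(
  setDoc: SetDocument,
  deckId: number
): { deck: DeckDocument; cards: CardDocument[] } {
  const deck: DeckDocument = {
    deck_id: deckId,
    block: setDoc.block,
    releaseDate: setDoc.releaseDate,
    type: setDoc.type,
  };
  // Every card carries the deck_id so it can be joined back to its deck.
  const cards: CardDocument[] = setDoc.cards.map((card) => ({
    name: card.name,
    power: card.power,
    deck_id: deckId,
  }));
  return { deck, cards };
}
```

The resulting cards array can then be inserted as individual documents, so a query on name returns just the matching card rather than the whole set.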

Displaying empty parameters in JSON

I'm building my first API which outputs in JSON, and was wondering: If one of the parameters is empty, is it best to still include that parameter name with an empty value, or not include it at all? For example, if a certain product has batteries it would normally output
"batteries": [
{
"device": "Vehicle",
"number": "4",
"type": "AA",
"included": "Not Included"
},
{
"device": "Remote",
"number": "2",
"type": "AAA",
"included": "Not Included"
}
],
If there are no remote batteries, should I just not include that second section? What if there aren't batteries at all, should I remove the whole battery node?
From the perspective of the json interpreter it won't matter. You should send the JSON however you want the consumer to reconstruct your objects...
Do you want the consumer to have a "Remote" object indicating there are no batteries?
Your example doesn't look like an empty node to me, it looks like meaningful data!
For actually empty nodes it may only matter if you need to keep the serialized object as small as possible (for whatever reason) or if you need to have something else besides JSON look at the serialized object later.
In my personal opinion from an API I like to see all meaningful nodes populated because it gives me an idea of the possibilities of the API.... "Oh, I see, so some of them have remotes and include batteries and this API can tell me that!"
In JavaScript, you can treat an absent property in almost the same way you would treat a property set to null:
> a_unset = {}
> a_null = {a: null}
> a_null.a == a_unset.a
true
> a_null.a ? 1 : 0
0
> a_unset.a ? 1 : 0
0
Therefore in JSON, which is based on Javascript and most often consumed by Javascript code, it is customary to omit empty values.
But this is not a hard rule. JSON does provide the null value, so if you think your client code or target users would need to know that a property is there but unset, null might be a good choice. Otherwise just omit it, you will save space.
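On the producing side, the difference between omitting a property and sending null can be seen with JSON.stringify, which drops properties whose value is undefined but keeps null. The sketch below uses TypeScript; the Product shape and the toWire and hasBatteriesKey names are assumptions for illustration.

```typescript
// Hypothetical product shape; "batteries" may be absent, null, or a list.
interface Product {
  name: string;
  batteries?: string[] | null | undefined;
}

// Serialize a product the way an API response body would be produced.
function toWire(product: Product): string {
  return JSON.stringify(product);
}

// Consumer-side check: was the key sent at all, regardless of its value?
function hasBatteriesKey(json: string): boolean {
  const parsed: unknown = JSON.parse(json);
  return typeof parsed === "object" && parsed !== null && "batteries" in parsed;
}

const withoutKey = toWire({ name: "Toy car" });
// withoutKey is '{"name":"Toy car"}'

const withNull = toWire({ name: "Toy car", batteries: null });
// withNull is '{"name":"Toy car","batteries":null}'

const withUndefined = toWire({ name: "Toy car", batteries: undefined });
// withUndefined is '{"name":"Toy car"}' (undefined is dropped)
```

A client that needs to tell "no batteries" apart from "not reported" can use a check like hasBatteriesKey, which only works if the API sends an explicit null.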

Get file ID of a given path

is there a direct method to get file ID by giving a path (e.g. /some/folder/deep/inside/file.txt)? I know this can be done by recursively checking folder's contents, but a simple call would be much better.
Thanks
We currently don't have support for this, but the feedback will definitely be considered as we continue building out the v2 API.
An alternative to this would be to extract the target file/folder name from the path and search for it using the search API
like this: https://api.box.com/2.0/search?query=filename.txt
This gives back all the matching entries with their path_collections which provides the whole hierarchy for every entry. Something like this:
"path_collection": {
"total_count": 2,
"entries": [
{
"type": "folder",
"id": "0",
"sequence_id": null,
"etag": null,
"name": "All Files"
},
{
"type": "folder",
"id": "2988397987",
"sequence_id": "0",
"etag": "0",
"name": "dummy"
}
]
}
Path for this entry can be reverse engineered as /dummy/filename.txt
Just compare this path against the path you're looking for. If it matches, that is the search result you want. This is just to reduce the number of REST calls you need to make to arrive at the result. Hope it makes sense.
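The path comparison can be sketched in TypeScript as follows. It works on search results that have already been fetched and parsed, so no HTTP call is shown. The interfaces and function names are assumptions for illustration, as is the idea that each result carries a name field next to its id; the code also assumes that the first path_collection entry is always the root folder ("All Files"), as in the sample above.

```typescript
// Hypothetical shapes covering only the fields used here.
interface PathEntry {
  type: string;
  id: string;
  name: string;
}

interface SearchEntry {
  id: string;
  name: string;
  path_collection: { total_count: number; entries: PathEntry[] };
}

// Rebuild "/folder/.../name" for one search result, skipping the root entry.
function fullPath(entry: SearchEntry): string {
  const folders = entry.path_collection.entries.slice(1).map((e) => e.name);
  return "/" + folders.concat([entry.name]).join("/");
}

// Return the id of the first result whose rebuilt path equals wantedPath.
function findIdByPath(
  results: SearchEntry[],
  wantedPath: string
): string | undefined {
  for (const entry of results) {
    if (fullPath(entry) === wantedPath) {
      return entry.id;
    }
  }
  return undefined; // no result lives at that path
}
```

If the search returns several items with the same name in different folders, only the one whose rebuilt path matches is picked.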
Here is my approach to getting a folder id based on a path without recursively going through the whole tree; it can easily be adapted for files as well. It is based on PHP and cURL, but it is easy to port to any other application:
//WE SET THE SEARCH FOLDER:
$search_folder="XXXX/YYYYY/ZZZZZ/MMMMM/AAAAA/BBBBB";
//WE NEED THE LAST BIT SO WE CAN DO A SEARCH FOR IT
$folder_structure=array_reverse (explode("/",$search_folder));
// We run a CURL (I'm assuming all the authentication and all other CURL parameters are already set!) to search for the last bit, if you want to search for a file rather than a folder, amend the search query accordingly
curl_setopt($curl, CURLOPT_URL, "https://api.box.com/2.0/search?query=".urlencode($folder_structure[0])."&type=folder");
// Let's make a nice array out of that response
$json=json_decode(curl_exec($curl),true);
$i=0;
$notthis=true;
// We need to loop through the results until we either find a matching element or reach the end of the array
while ($notthis && $i<count($json['entries'])) {
$result_info=$json['entries'][$i];
//The path of each search result is kept in a multidimensional array, so we just rebuild that array, ignoring the first element (that is Always the ROOT)
if ($search_folder == implode("/",array_slice(array_column($result_info['path_collection']['entries'],'name'),1))."/".$folder_structure[0])
{
$notthis=false;
$folder_id=$result_info['id'];
}
else
{
$i++;
}
}
if ($notthis) {echo "Path not found....";} else {echo "Folder id: $folder_id";}