getDegree()/isOutgoing() funcitons don't work in graphAware/neo4j-to-elasticsearch mapping.json - json

Neo4j Version: 3.2.2
Operating System: Ubuntu 16.04
I use getDegree() function in mapping.json file, but the return would always be null; I'm using the dataset neo4j tutorial Movie/Actor dataset.
Output from elasticsearch request
mapping.json
{
"defaults": {
"key_property": "uuid",
"nodes_index": "default-index-node",
"relationships_index": "default-index-relationship",
"include_remaining_properties": true
},
"node_mappings": [
{
"condition": "hasLabel('Person')",
"type": "getLabels()",
"properties": {
"getDegree": "getDegree()",
"getDegree(type)": "getDegree('ACTED_IN')",
"getDegree(direction)": "getGegree('OUTGOING')",
"getDegree('type', 'direction')": "getDegree('ACTED_IN', 'OUTGOING')",
"getDegree-degree": "degree"
}
}
],
"relationship_mappings": [
{
"condition": "allRelationships()",
"type": "type",
}
]
}
Also if I use isOutgoing(), isIncoming(), otherNode function in relationship_mappings properties part, elasticsearch would never load the relationship data from neo4j. I think I probably have some misunderstanding of this sentence only when one of the participating nodes "looking" at the relationship is provided on this page https://github.com/graphaware/neo4j-framework/tree/master/common#inclusion-policies
mapping.json
{
"defaults": {
"key_property": "uuid",
"nodes_index": "default-index-node",
"relationships_index": "default-index-relationship",
"include_remaining_properties": true
},
"node_mappings": [
{
"condition": "allNodes()",
"type": "getLabels()"
}
],
"relationship_mappings": [
{
"condition": "allRelationships()",
"type": "type",
"properties": {
"isOutgoing": "isOutgoing()",
"isIncomming": "isIncomming()",
"otherNode": "otherNode"
}
}
]
}
BTW, is there any page that list all of the functions that we can use in mapping.json? I know two of them
github.com/graphaware/neo4j-framework/tree/master/common#inclusion-policies
github.com/graphaware/neo4j-to-elasticsearch/blob/master/docs/json-mapper.md
but it seems there are more, since I can use getType(), which hasn't been listed in any of the above pages.
Please let me know if I can provide any further help to solve the problem
Thanks!

The getDegree() function is not available to use, in contrary to getType(). I will explain why :
When the mapper (the part responsible to create a node or relationship representation as ES document ) is doing its job, it receive a DetachedGraphObject being a detached node or relationship.
The meaning of detached is that it is happening outside of a transaction and thus query operations are not available against the database anymore. The getType() is available because it is part of the relationship metadata and it is cheap, however if we would want to do the same for getDegree() this can be seriously more costly during the DetachedObject creation (which happen in a tx) depending on the number of different types etc.
This is however something we are working on, by externalising the mapper in a standalone java application coupled with a broker like kafa, rabbit,.. between neo and this app. We would not, however offer the possibilty to requery the graph in the current version of the module as it can have serious performance impacts if the user is not very careful.
As last, the only suggestion I can give you is to keep a property on your node with the updates of degrees you need to replicate to ES.
UPDATE
Regarding this part of the documentation :
For Relationships only when one of the participating nodes "looking" at the relationship is provided:
This is used only when not using the json definition, so you can use one or the other. the json definition has been added later as addition and both cannot be used together.
For answering this part, it means that the nodes of the incoming or outgoing side, depending on the definition, should be included in the inclusion policy for nodes, like hasLabel('Employee') || hasProperty('form') || getProperty('age', 0) > 20 . If you have an allNodes policy then it is fine.

Related

ADF - Data Flow- Json Expression for Property name

I have a requirement to convert the json into csv(or a SQL table) or any other flatten structure using Data Flow in Azure Data Factory. I need to take the property names at some hierarchy and values of the child properties at lower of hierrarchy from the source json and add them both as column/row values in csv or any other flatten structure.
Source Data Rules/Constraints :
Parent level data property names will change dynamically (e.g. ABCDataPoints,CementUse, CoalUse, ABCUseIndicators names are dynamic)
The hierarchy always remains same as in below sample json.
I need some help in defining Json path/expression to get the names ABCDataPoints,CementUse, CoalUse, ABCUseIndicators etc. I am able to figure out how to retrieve the values for the properties Value,ValueDate,ValueScore,AsReported.
Source Data Structure :
{
"ABCDataPoints": {
"CementUse": {
"Value": null,
"ValueDate": null,
"ValueScore": null,
"AsReported": [],
"Sources": []
},
"CoalUse": {
"Value": null,
"ValueDate": null,
"AsReported": [],
"Sources": []
}
},
"ABCUseIndicators": {
"EnvironmentalControversies": {
"Value": false,
"ValueDate": "2021-03-06T23:22:49.870Z"
},
"RenewableEnergyUseRatio": {
"Value": null,
"ValueDate": null,
"ValueScore": null
}
},
"XYZDataPoints": {
"AccountingControversiesCount": {
"Value": null,
"ValueDate": null,
"AsReported": [],
"Sources": []
},
"AdvanceNotices": {
"Value": null,
"ValueDate": null,
"Sources": []
}
},
"XYXIndicators": {
"AccountingControversies": {
"Value": false,
"ValueDate": "2021-03-06T23:22:49.870Z"
},
"AntiTakeoverDevicesAboveTwo": {
"Value": 4,
"ValueDate": "2021-03-06T23:22:49.870Z",
"ValueScore": "0.8351945854483925"
}
}
}
Expected Flatten structure
Background:
After having multiple calls with ADF experts at Microsoft(Our workplace have Microsoft/Azure partnership), they concluded this is not possible with out of the box activities provided by ADF as is, neither by Dataflow(need not to use data flow though) nor Flatten feature. Reasons are Dataflow/Flatten only unroll the Array objects and there are no mapping functions available to pick the property names - Custom expression are in internal beta testing and will in PA in near future.
Conclusion/Solution:
We concluded with an agreement based on calls with Microsoft emps ended up to go multiple approaches but both needs the custom code - with out custom code this is not possible by using out of box activities.
Solution-1 : Use some code to flatten as per requirement using a ADF Custom Activity. The downside of this you need to use an external compute(VM/Batch), the options supported are not on-demand. So it is little bit expensive but works best if have continuous stream workloads. This approach also continuously monitor if input sources are of different sizes because the compute needs to be elastic in this case or else you will get out of memory exceptions.
Solution-2 : Still needs to write the custom code - but in a function app.
Create a Copy Activity with source as the files with Json content(preferably storage account).
Use target as Rest Endpoint of function(Not as a function activity because it has 90sec timeout when called from an ADF activity)
The function app will takes Json lines as input and parse and flatten.
If you use the above way so you can scale the number of lines cane be send in each request to function and also scale the parallel requests.
The function will do the flatten as required to one file or multiple files and store in blob storage.
The pipeline will continue from there as needed from there.
One problem with this approach is if any of the range is failed the copy activity will retry but it will run the whole process again.
Trying something very similar, is there any other / native solution to address this?
As mentioned in the response above, has this been GA yet? If yes, any reference documentation / samples would be of great help!
Custom expression are in internal beta testing and will in PA in near future.

Room object in Revit files

I followed the instruction in the link below to extract Room objects from Revit models:
https://forge.autodesk.com/blog/new-rvt-svf-model-derivative-parameter-generates-additional-content-including-rooms-and-spaces
I made the changes as instructed and tested the sample Revit file (rac_basic_sample_project.rvt). But, still I don't see the rooms or the viewables (phases). Below is fhe request I post. Am I missing anything?
{
"input": {
"urn": "dXJuOmFkc2sub2JqZWN0czpvcy5vYmplY3Q6YzQ4ZDUxNDNhMDRiNDAxNmI3ODYxY2NlMzQ2ZDkyNjdfZmFjaWxpdHlfOTUvZWIyYzMzNDgtNDAxYS00ZjQ3LTgwM2EtMjM1OGYwYmI0YjY2LnJ2dA"
},
"output": {
"destination": {
"region": "us"
},
"formats": [
{
"type": "svf",
"views": [
"3d"
],
"advanced": {
"generateMasterViews": true
}
}
]
}
}
I just tested the feature and I can see the room data:
The JSON payload seems ok, so try checking the following things:
Make sure you use the x-ads-force header (explained in the blog post you linked to); if you had already processed your Revit model before, triggering a new Model Derivative job would not do anything unless you force the translation
Try using another design (and from a newer version of Revit if possible); in my screenshot I'm using one of the official samples for Revit 2020, although I remember being able to get the room data from older samples as well
The room data is only available in certain "viewables" so make sure you're looking at the right one; for my sample project, for example, the room data is not available in the "{3D}" viewable but it is available in the "Working Drawings" viewable

Updating JSON arrays in MarkLogic 9

I'm having trouble working out how to write a bit of XQuery. I have a JSON structure in MarkLogic that looks like:
{
"id": "pres003A10",
"title": "A Course About Something",
"description": "This course teaches people about some things they may not know.",
"author": "A.N. Author",
"updated": "2007-01-19",
"decks": [
{
"id":"really-basic-stuff",
"exclude": ["slide3", "slide12"]
},
{
"id":"cleverer-stuff",
"exclude": []
}
]
}
The exclude array contains the identifiers for slides in decks (presentations are made up of one or more decks of slides). I'm trying to write a piece of code that will look for a slide id in that exclude list and remove it if present or add it if not (a toggle).
I can obtain the array node itself using:
let $exclude := doc('/presentations/presentation.json')/object-node()/decks[id = 'markup-intro']/array-node('exclude')
but I can't for the life of me see how I then update that array to either remove an item or add it. The intention is call a function something like:
local:toggle-slide($presentation) as object-node()
{
(: xdmp:node-update(...) goes here :)
};
So, how do I update that array?
In memory JSON node trees (and XML trees, for that matter) are immutable.
The way to modify a tree is to construct a new tree, copying the nodes that haven't changed and creating the parent node and ancestor node with the changes.
That said, there's an easier way to modify JSON. If you call xdmp:from-json() on the root node, you will get a mutable in-memory map / array structure.
You can then navigate to the array using map:get() on the maps and [ITEM_NUMBER] on the arrays and delete or insert items FOR the appropriate json:array object.
When you're done, call xdmp:to-json() to turn the root map back into a node.
Hoping that helps,
If you need to update the json in the database, you can use xdmp:node-replace. The catch with node-replace is though, that you have to feed it with a named node. To do that, you need to wrap the array-node in an object-node, and then grab the array-node inside the object-node on the fly. Here a working example:
xquery version "1.0-ml";
(: insert test data :)
xdmp:document-insert("/presentations/presentation.json", xdmp:unquote('{
"id": "pres003A10",
"title": "A Course About Something",
"description": "This course teaches people about some things they may not know.",
"author": "A.N. Author",
"updated": "2007-01-19",
"decks": [
{
"id":"markup-intro",
"exclude": ["slide3", "slide12"]
},
{
"id":"cleverer-stuff",
"exclude": []
}
]
}'
))
;
(: node-replace array-node :)
let $exclude := doc('/presentations/presentation.json')/object-node()/decks[id = 'markup-intro']/array-node('exclude')
return xdmp:node-replace($exclude, object-node{
"exclude": array-node{ "other", "slides" }
}/node())
;
(: view if changed :)
doc('/presentations/presentation.json')
Note: consider looking at MarkLogic's Server-side JavaScript (SJS) support. Updating JSON might seem more natural that way, particularly if you need to make multiple changes in one go.
HTH!

Query sync gateway buckets using N1QL

I wanted to know if it's possible to query the sync gateway buckets using N1QL? Does it behave as a normal couchbase bucket or because of the metadata that sync gateway adds, is it possible to query it only through Rest APIs?
Currently I have a webhooks handler, which keeps a replica of the documents residing under sync gateway buckets. I need to do some aggrgations which need to be pushed back to clients. So, can I do all this heavy lifting directly trhough n1ql on sync gateway or using webhooks which does the aggregations and simply pushes the updated docs to sync gateway is the right option?
PS: The webhooks+Rest APIS option works perfectly for me currently. Just wanted to understand if this hop is necessary or not?
Yes, it is possible to query the sync gateway using N1QL - you just can't change it (update/delete/insert), as it would break the revisions' metadata.
You need to ignore the documents with IDs starting with _sync: and the _sync property of each document, which contains internal metadata. The remaining attributes are your usual document.
Example:
select db.* from db where meta().id not like '_sync:%'
Result:
[
{
"_sync": {
"history": {
"channels": [
null,
null
],
"parents": [
-1,
0
],
"revs": [
"1-b7a15ec4afbb8c4d95e2e897d0ec0a2e",
"2-919b17d3f418100df7298a12ef2a84bb"
]
},
"recent_sequences": [
6,
7
],
"rev": "2-919b17d3f418100df7298a12ef2a84bb",
"sequence": 7,
"time_saved": "2016-05-04T18:54:26.952202911Z"
},
"name": "Document with two revisions"
}
]
Ignoring the _sync attribute:
select name from db where meta().id not like '_sync:%'
Result:
[
{
"name": "Document with two revisions"
}
]
In Couchbase 4.5 (BETA as of today) we can use the object_remove function - although I'd avoid it in favor of the previous more explicit syntax.
select object_remove(db, '_sync') from db where meta().id not like '_sync:%'
Result:
[
{
"$1": {
"name": "Document with two revisions"
}
}
]
I don't know what's your setup currently, but AFAIK, it's perfectly fine to keep querying the bucket throught N1QL while using the REST API for the data changes.

Play application.conf : Where to consult the list of all possible variable?

Where can i find the list of all possible variable that is possible to set in play application.conf ?
I can't find this information on playframework website.
Thank you
If you use IDE such as eclipse or IntelliJ, you can inspect Play.application().configuration() at runtime while debugging and it will contain all possble configuration key/value pairs. It briefly looks as follows:
{
"akka":{ },
"application":{ },
"applyEvolutions":{ },
"awt":{ },
"db":{ },
"dbplugin":"disabled",
"evolutionplugin":"enabled",
"file":{ },
"java":{ },
"jline":{ },
"line":{ },
"logger":{ },
"os":{ },
"path":{ },
"play":{ },
"promise":{ },
"report":{ },
"sbt":{ },
"sun":{ },
"user":{ }
}
There is no such list of all possible variables, since the application.conf is arbitrarily extensible by all sorts of tools and components, most of them third party, and can contain any config the user wants.
For example: the configuration detailing Play's thread pools is really just Akka configuration.
The key things (DB config, languages, evolutions) are in the template, either with default values or commented out, when you initialise a new Play application.
The config page on the site discusses some additional configuration you might need, but this mostly relates to concerns external to the application, like launching and logging.