Get last element of array by parsing JSON with Neo4j APOC

Short task description: I need to get the last element of an array/list in one of the fields of a nested JSON. Here is the input JSON file:
{
"origin": [{
"label": "Alcohol drinks",
"tag": [],
"type": "string",
"xpath": []
},
{
"label": "Wine",
"tag": ["red", "white"],
"type": "string",
"xpath": ["Alcohol drinks"]
},
{
"label": "Port wine",
"tag": ["Portugal", "sweet", "strong"],
"type": "string",
"xpath": ["Alcohol drinks", "Wine"]
},
{
"label": "Sandeman Cask 33",
"tag": ["red", "expensive"],
"type": "string",
"xpath": ["Alcohol drinks", "Wine", "Port wine"]
}
]
}
I need to get the last element of the "xpath" field, in order to create a relationship with the appropriate "label". Here is the code, which creates connections to all elements mentioned in "xpath"; I need just the connection to the last one:
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.origin as or
MERGE(label:concept{name:or.label})
ON CREATE SET label.type = or.type
FOREACH(tagName IN or.tag | MERGE(tag:concept{name:tagName})
MERGE (tag)-[r:link]-(label)
ON CREATE SET r.Weight=1
ON MATCH SET r.Weight=r.Weight+1)
FOREACH(xpathName IN or.xpath | MERGE (xpath:concept{name:xpathName})
MERGE (label)-[r:link]-(xpath))
Probably there is something like:
apoc.agg.last(or.xpath)
which returns just an array of arrays of all the "xpath" values from all 4 records of "origin".
I will appreciate any help; there are probably some workarounds (not necessarily what I proposed) to solve this issue. Thank you in advance!
N.B. All this should be done from an app, not from within Neo4j browser.

Probably the easiest way would be to split this query into two, if you only want the xpath array of the last element in the origin object.
Query 1:
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.origin as or
MERGE(label:concept{name:or.label})
ON CREATE SET label.type = or.type
FOREACH(tagName IN or.tag | MERGE(tag:concept{name:tagName})
MERGE (tag)-[r:link]-(label)
ON CREATE SET r.Weight=1
ON MATCH SET r.Weight=r.Weight+1)
Query 2:
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
WITH value.origin[-1] as or
MATCH(label:concept{name:or.label})
FOREACH(xpathName IN or.xpath | MERGE (xpath:concept{name:xpathName})
MERGE (label)-[r:link]-(xpath))
Combining these two queries into a single one feels hacky anyway and I would avoid it, but I guess you can do the following.
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.origin as or
MERGE(label:concept{name:or.label})
ON CREATE SET label.type = or.type
FOREACH(tagName IN or.tag | MERGE(tag:concept{name:tagName})
MERGE (tag)-[r:link]-(label)
ON CREATE SET r.Weight=1
ON MATCH SET r.Weight=r.Weight+1)
// Any aggregation function will break the UNWIND loop
// and return a single row as we want to write it only once
WITH value.origin[-1] as last, count(*) as agg
FOREACH(xpathName IN last.xpath |
MERGE(label:concept{name:last.label})
MERGE (xpath:concept{name:xpathName})
MERGE (label)-[r:link]-(xpath))
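The row-collapsing effect of that count(*) is easy to see in isolation (a toy query, independent of the file; the names are illustrative):
UNWIND [1, 2, 3] AS x
WITH 'constant' AS k, count(*) AS agg
UNWIND [10, 20] AS y
RETURN k, agg, y
// 2 rows, not 6: count(*) condensed the three UNWIND rows into one
// before the second UNWIND fanned out again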

Sounds like you're looking for the last() function? This will return the last element of a list.
In this case, since you UNWIND the origin to 4 rows, you'll get the last element of the list for each of those rows.
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.origin as or
RETURN last(or.xpath) as last
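If the aim is to link each label only to the last element of its own xpath, last() can be dropped straight into the original import query. A sketch (same file path and property names as in the question; the CASE wrapper is just a guard for records whose xpath is empty, like "Alcohol drinks"):
WITH "file:///D:/project/neo_proj/input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.origin AS or
MERGE (label:concept {name: or.label})
ON CREATE SET label.type = or.type
// wrap last(or.xpath) in a one-element list so FOREACH simply
// skips records whose xpath array is empty
FOREACH (xpathName IN CASE WHEN size(or.xpath) > 0 THEN [last(or.xpath)] ELSE [] END |
MERGE (xpath:concept {name: xpathName})
MERGE (label)-[:link]-(xpath))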

Related

Update JSON Array in Postgres with specific key

I have a complex array which looks like the following in a table column:
{
"sometag": {},
"where": [
{
"id": "Krishna",
"nick": "KK",
"values": [
"0"
],
"function": "ADD",
"numValue": [
"0"
]
},
{
"id": "Krishna1",
"nick": "KK1",
"values": [
"0"
],
"function": "SUB",
"numValue": [
"0"
]
}
],
"anotherTag": [],
"TagTag": {
"tt": "tttttt",
"tt1": "tttttt"
}
}
In this array, I want to update the function and numValue of id: "Krishna".
Kindly help.
This is really nasty, because:
Updating an element inside a JSON array always requires expanding the array
On top of that, the array is nested
The identifier for the elements to update is a sibling, not a parent, which means you have to filter by a sibling
So I came up with a solution, but I want to add a disclaimer: You should avoid doing this as a regular database action! Better would be:
Parsing your JSON in the backend and doing the operations in your backend code
Normalizing the JSON in your database if this is a common task, meaning: create tables with appropriate columns and extract your JSON into that table structure (sketched below). Do not store entire JSON objects in the database! That would make every single task much easier and incredibly more performant!
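A rough sketch of what that normalization could look like for this document (hypothetical table and column names):
-- one row per element of the "where" array, instead of one JSON blob
CREATE TABLE where_entries (
    id        text PRIMARY KEY,  -- "Krishna", "Krishna1", ...
    nick      text,
    func      text,              -- "function" in the JSON
    num_value text[]             -- "numValue" in the JSON
);
-- the original request then shrinks to a one-line UPDATE:
UPDATE where_entries
SET func = 'ADDITION', num_value = '{0,1}'
WHERE id = 'Krishna';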
demo:db<>fiddle
SELECT
jsonb_set( -- 5
(SELECT mydata::jsonb FROM mytable),
'{where}',
updated_array
)::json
FROM (
SELECT
jsonb_agg( -- 4
CASE WHEN array_elem ->> 'id' = 'Krishna' THEN
jsonb_set( -- 3
jsonb_set(array_elem.value::jsonb, '{function}', '"ADDITION"'::jsonb), -- 2
'{numValue}',
'["0","1"]'::jsonb
)
ELSE array_elem::jsonb END
) as updated_array
FROM mytable,
json_array_elements(mydata -> 'where') array_elem -- 1
) s
The numbers below refer to the markers in the query:
1. Extract the nested array elements, one row per element
2. Replace the function value. Note the casts from type json to type jsonb: they are necessary because there is no json_set() function, only jsonb_set(). Naturally, if you already have type jsonb, the casts are not needed.
3. Replace the numValue value
4. Reaggregate the array
5. Replace the where value of the original JSON object with the newly created array object
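To persist the change rather than just select it, the same expression can be wrapped in an UPDATE with a correlated subquery (a sketch, assuming the mytable/mydata names from the fiddle and a json-typed column):
UPDATE mytable
SET mydata = (
    SELECT jsonb_set(mydata::jsonb, '{where}', jsonb_agg(
        CASE WHEN array_elem ->> 'id' = 'Krishna' THEN
            jsonb_set(
                jsonb_set(array_elem.value::jsonb, '{function}', '"ADDITION"'::jsonb),
                '{numValue}',
                '["0","1"]'::jsonb
            )
        ELSE array_elem::jsonb END
    ))::json
    FROM json_array_elements(mydata -> 'where') array_elem
);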

How can Postgres extract parts of json, including arrays, into another JSON field?

I'm trying to convince PostgreSQL 13 to pull out parts of a JSON field into another field, including a subset of properties within an array based on a discriminator (type) property. For example, given a data field containing:
{
"id": 1,
"type": "a",
"items": [
{ "size": "small", "color": "green" },
{ "size": "large", "color": "white" }
]
}
I'm trying to generate new_data like this:
{
"items": [
{ "size": "small" },
{ "size": "large"}
]
}
items can contain any number of entries. I've tried variations of SQL like:
UPDATE my_table
SET new_data = (
CASE data->>'type'
WHEN 'a' THEN
json_build_object(
'items', json_agg(json_array_elements(data->'items') - 'color')
)
ELSE
null
END
);
but I can't seem to get it working. In this case, I get:
ERROR: set-returning functions are not allowed in UPDATE
LINE 6: 'items', json_agg(json_array_elements(data->'items')...
I can get a set of items using json_array_elements(data->'items') and thought I could roll this up into a JSON array using json_agg and remove unwanted keys using the - operator. But now I'm not sure if what I'm trying to do is possible. I'm guessing it's a case of PEBCAK. I've got about a dozen different types each with slightly different rules for how new_data should look, which is why I'm trying to fit the value for new_data into a type-based CASE statement.
Any tips, hints, or suggestions would be greatly appreciated.
One way is to handle the set json_array_elements() returns in a subquery.
UPDATE my_table
SET new_data = CASE
WHEN data->>'type' = 'a' THEN
(SELECT json_build_object('items',
json_agg(jae.item::jsonb - 'color'))
FROM json_array_elements(data->'items') jae(item))
END;
db<>fiddle
Also note that the - operator isn't defined for json, only for jsonb. So unless your columns are actually jsonb, you need a cast. And you don't need an explicit ... ELSE NULL ... in a CASE expression; NULL is already the default value when no ELSE branch is specified.
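For instance, the difference shows up even on a literal (the same cast as jae.item::jsonb in the query above):
-- '-' exists only for jsonb; a json value must be cast first
SELECT '{"size":"small","color":"green"}'::jsonb - 'color';
-- result: {"size": "small"}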

Creating nodes and relations from JSON (dynamically)

I've got a couple hundred JSONs in a structure like the following example:
{
"JsonExport": [
{
"entities": [
{
"identity": "ENTITY_001",
"surname": "SMIT",
"entityLocationRelation": [
{
"parentIdentification": "PARENT_ENTITY_001",
"typeRelation": "SEEN_AT",
"locationIdentity": "LOCATION_001"
},
{
"parentIdentification": "PARENT_ENTITY_001",
"typeRelation": "SEEN_AT",
"locationIdentity": "LOCATION_002"
}
],
"entityEntityRelation": [
{
"parentIdentification": "PARENT_ENTITY_001",
"typeRelation": "FRIENDS_WITH",
"childIdentification": "ENTITY_002"
}
]
},
{
"identity": "ENTITY_002",
"surname": "JACKSON",
"entityLocationRelation": [
{
"parentIdentification": "PARENT_ENTITY_002",
"typeRelation": "SEEN_AT",
"locationIdentity": "LOCATION_001"
}
]
},
{
"identity": "ENTITY_003",
"surname": "JOHNSON"
}
],
"identification": "REGISTRATION_001",
"locations": [
{
"city": "LONDON",
"identity": "LOCATION_001"
},
{
"city": "PARIS",
"identity": "LOCATION_002"
}
]
}
]
}
With these JSON's, I want to make a graph consisting of the following nodes: Registration, Entity and Location. This part I've figured out and made the following:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
MERGE(r:Registration {id:data.identification})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..locations.*" ) YIELD value AS locations
MERGE(l:Locations{identity:locations.identity, name:locations.city})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..entities.*" ) YIELD value AS entities
MERGE(e:Entities {name:entities.surname, identity:entities.identity})
All the entities and locations should have a relation with the registration. I thought I could do this by using the following code:
MERGE (e)-[:REGISTERED_ON]->(r)
MERGE (l)-[:REGISTERED_ON]->(r)
However, this code doesn't give the desired output. It creates extra "empty" nodes and doesn't connect to the registration node. So the first question is: how do I connect the location and entity nodes to the registration node? And in light of the other JSONs, the entities and locations should only be linked to their specific registration.
Furthermore, I would like to create the entity -> location relations and the entity - entity relations, and use the given type of relation (SEEN_AT or FRIENDS_WITH) as the label for that relation. How can this be done? I'm kind of lost at this point and don't see how to solve this. If someone could guide me in the right direction I would be much obliged.
Variable names (like e and r) are not stored in the DB, and are bound to values only within individual queries. MERGE on a pattern with an unbound variable will just create the entire pattern (including creating an empty node for unbound node variables).
When you MERGE a node, you should only specify the unique identifying property for that node, to avoid duplicates. Any other properties you want to set at the time of creation should be set using ON CREATE SET.
It is inefficient to parse through the JSON data 3 times to get different areas of the data. And it is especially inefficient the way your query was doing it, since each subsequent CALL/MERGE group of clauses would be executed multiple times (every previous CALL produces multiple rows, and the number of rows increases multiplicatively). You can use aggregation to get around that, but it is unnecessary in your case, since you can do the entire query in a single pass through the JSON data.
This may work for you:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
MERGE(r:Registration {id:data.identification})
FOREACH(ent IN data.entities |
MERGE (e:Entities {identity: ent.identity})
ON CREATE SET e.name = ent.surname
MERGE (e)-[:REGISTERED_ON]->(r)
FOREACH(loc1 IN ent.entityLocationRelation |
MERGE (l1:Locations {identity: loc1.locationIdentity})
MERGE (e)-[:SEEN_AT]->(l1))
FOREACH(ent2 IN ent.entityEntityRelation |
MERGE (e2:Entities {identity: ent2.childIdentification})
MERGE (e)-[:FRIENDS_WITH]->(e2))
)
FOREACH(loc IN data.locations |
MERGE (l:Locations{identity:loc.identity})
ON CREATE SET l.name = loc.city
MERGE (l)-[:REGISTERED_ON]->(r)
)
For simplicity, it hard-codes the SEEN_AT, FRIENDS_WITH, and REGISTERED_ON relationship types, since MERGE only supports hard-coded relationship types.
So, playing with neo4j/Cypher, I've learned some new stuff and came to another solution for the problem. Based on the given example data, the following can create the nodes and edges dynamically.
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
CALL apoc.merge.node(['Registration'], {id:data.identification}, {},{}) YIELD node AS vReg
UNWIND data.entities AS ent
CALL apoc.merge.node(['Person'], {id:ent.identity}, {}, {id:ent.identity, surname:ent.surname}) YIELD node AS vPer1
UNWIND ent.entityEntityRelation AS entRel
CALL apoc.merge.node(['Person'],{id:entRel.childIdentification},{id:entRel.childIdentification},{}) YIELD node AS vPer2
CALL apoc.merge.relationship(vPer1, entRel.typeRelation, {},{},vPer2) YIELD rel AS ePer
UNWIND data.locations AS loc
CALL apoc.merge.node(['Location'], {id:loc.identity}, {name:loc.city}) YIELD node AS vLoc
UNWIND ent.entityLocationRelation AS locRel
CALL apoc.merge.relationship(vPer1, locRel.typeRelation, {},{},vLoc) YIELD rel AS eLoc
CALL apoc.merge.relationship(vLoc, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg1
CALL apoc.merge.relationship(vPer1, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg2
CALL apoc.merge.relationship(vPer2, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg3
RETURN vPer1,vPer2, vReg, vLoc, eLoc, eReg1, eReg2, eReg3
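One caveat with this approach (my observation, not part of the original answer): UNWIND over a missing or empty list yields zero rows, so an entity without entityEntityRelation (such as ENTITY_003 above) drops out of everything after that UNWIND. A common guard, shown here in isolation:
// UNWIND of an empty (or null) list emits no rows at all,
// silently dropping the current entity from the rest of the query
WITH [] AS rels // e.g. ent.entityEntityRelation for ENTITY_003
UNWIND (CASE WHEN rels IS NULL OR size(rels) = 0 THEN [null] ELSE rels END) AS rel
RETURN rel // one row containing null, instead of no rows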

Importing json in neo4J

[PROBLEM - My final solution below]
I'd like to import a json file containing my data into Neo4J.
However, it is super slow.
The JSON file is structured as follows:
{
"graph": {
"nodes": [
{ "id": 3510982, "labels": ["XXX"], "properties": { ... } },
{ "id": 3510983, "labels": ["XYY"], "properties": { ... } },
{ "id": 3510984, "labels": ["XZZ"], "properties": { ... } },
...
],
"relationships": [
{ "type": "bla", "startNode": 3510983, "endNode": 3510982, "properties": {} },
{ "type": "bla", "startNode": 3510984, "endNode": 3510982, "properties": {} },
....
]
}
}
It is similar to the one proposed here: How can I restore data from a previous result in the browser?.
By looking at the answer, I discovered that I can use
CALL apoc.load.json("file:///test.json") YIELD value AS row
WITH row, row.graph.nodes AS nodes
UNWIND nodes AS node
CALL apoc.create.node(node.labels, node.properties) YIELD node AS n
SET n.id = node.id
and then
CALL apoc.load.json("file:///test.json") YIELD value AS row
with row
UNWIND row.graph.relationships AS rel
MATCH (a) WHERE a.id = rel.endNode
MATCH (b) WHERE b.id = rel.startNode
CALL apoc.create.relationship(a, rel.type, rel.properties, b) YIELD rel AS r
return *
(I have to do it in two passes, because otherwise there are duplicate relationships due to the two UNWINDs.)
But this is super slow, because I have a lot of entities and I suspect the program searches over all of them for each relationship.
At the same time, I know "startNode": 3510983 refers to a node.
So the question: is there any way to speed up the import process by using the ids as an index, or something else?
Note that my nodes have different labels, so I did not find a way to create an index for all of them, and I suppose that would be too huge (memory).
[MY SOLUTION]
CALL apoc.load.json('file:///test.json') YIELD value
WITH value.graph.nodes AS nodes, value.graph.relationships AS rels
UNWIND nodes AS n
CALL apoc.create.node(n.labels, apoc.map.setKey(n.properties, 'id', n.id)) YIELD node
WITH rels, COLLECT({id: n.id, node: node, labels:labels(node)}) AS nMap
UNWIND rels AS r
MATCH (w{id:r.startNode})
MATCH (y{id:r.endNode})
CALL apoc.create.relationship(w, r.type, r.properties, y) YIELD rel
RETURN rel
[EDITED]
This approach may work more efficiently:
CALL apoc.load.json("file:///test.json") YIELD value
WITH value.graph.nodes AS nodes, value.graph.relationships AS rels
UNWIND nodes AS n
CALL apoc.create.node(n.labels, apoc.map.setKey(n.properties, 'id', n.id)) YIELD node
WITH rels, apoc.map.fromPairs(COLLECT([toString(n.id), node])) AS nMap // map keys must be strings
UNWIND rels AS r
CALL apoc.create.relationship(nMap[toString(r.startNode)], r.type, r.properties, nMap[toString(r.endNode)]) YIELD rel
RETURN rel
This query does not use MATCH at all (and does not need indexing), since it just relies on an in-memory mapping from the imported node ids to the created nodes. However, this query could run out of memory if there are a lot of imported nodes.
It also avoids invoking SET by using apoc.map.setKey to add the id property to n.properties.
The 2 UNWINDs do not cause a cartesian product, since this query uses the aggregating function COLLECT (before the second UNWIND) to condense all the preceding rows into one (because the grouping key, rels, is a singleton).
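If the node map does outgrow memory, one workaround (not from the answer) is to batch the relationship pass with apoc.periodic.iterate, falling back to the property-based MATCH of the first solution:
CALL apoc.periodic.iterate(
  "CALL apoc.load.json('file:///test.json') YIELD value
   UNWIND value.graph.relationships AS r RETURN r",
  "MATCH (w {id: r.startNode})
   MATCH (y {id: r.endNode})
   CALL apoc.create.relationship(w, r.type, r.properties, y) YIELD rel
   RETURN rel",
  {batchSize: 10000})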
Have you tried indexing the nodes before loading the JSON? This may not be tenable since you have multiple node labels, but if they are limited you can create a placeholder node, create an index, and then delete the placeholder. After this, run the load:
CREATE (n:YourLabel {indx: 'xxx'})
CREATE INDEX ON :YourLabel(indx)
MATCH (n:YourLabel) DELETE n
The index will speed up the matching or merging.
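Two side notes (mine, not from the answer): on Neo4j 4.x and later the index syntax is different, e.g.
CREATE INDEX yourlabel_indx FOR (n:YourLabel) ON (n.indx)
Also, an index is always scoped to a label plus a property, so the label-less MATCH (w{id:r.startNode}) clauses above cannot use it. They only become index-backed if every imported node carries some common label (":Imported" here is hypothetical, added at creation time):
MATCH (w:Imported {id: r.startNode})
MATCH (y:Imported {id: r.endNode})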

how to extract properly when sqlite json has value as an array

I have an SQLite database, and in one of the fields I have stored a complete JSON object. I have to make some JSON select requests. If you look at my JSON,
the ALL key has a value which is an array. We need to extract some data, like all comments where the "pod" field is fb. How to extract properly when an SQLite JSON value is an array?
select json_extract(data,'$."json"') from datatable ; gives me the entire thing. Then I do
select json_extract(data,'$."json"[0]') but I don't want to do it manually; I want to iterate.
Kindly suggest some source where I can study and work on it.
MY JSON
{
"ALL": [{
"comments": "your site is awesome",
"pod": "passcode",
"originalDirectory": "case1"
},
{
"comments": "your channel is good",
"data": ["youTube"],
"pod": "library"
},
{
"comments": "you like everything",
"data": ["facebook"],
"pod": "fb"
},
{
"data": ["twitter"],
"pod": "tw",
"ALL": [{
"data": [{
"codeLevel": "3"
}],
"pod": "mo",
"pod2": "p"
}]
}
]
}
create table datatable ( path string , data json1 );
insert into datatable values("1" , json('<abovejson in a single line>'));
Simple List
Where your JSON represents a "simple" list of comments, you want something like:
select key, value
from datatable, json_each( datatable.data, '$.ALL' )
where json_extract( value, '$.pod' ) = 'fb' ;
which, using your sample data, returns:
2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
The use of json_each() returns a row for every element of the input JSON (datatable.data), starting at the path $.ALL (where $ is the top-level, and ALL is the name of your array: the path can be omitted if the top-level of the JSON object is required). In your case, this returns one row for each comment entry.
The fields of this row are documented at 4.13. The json_each() and json_tree() table-valued functions in the SQLite documentation: the two we're interested in are key (very roughly, the "row number") and value (the JSON for the current element). The latter will contain elements called comments and pod, etc.
Because we are only interested in elements where pod is equal to fb, we add a where clause, using json_extract() to get at pod (where $.pod is relative to value returned by the json_each function).
Nested List
If your JSON contains nested elements (something I didn't notice at first), then you need to use the json_tree() function instead of json_each(). Whereas the latter will only iterate over the immediate children of the node specified, json_tree() will descend recursively through all children from the node specified.
To give us some data to work with, I have augmented your test data with an extra element:
create table datatable ( path string , data json1 );
insert into datatable values("1" , json('
{
"ALL": [{
"comments": "your site is awesome",
"pod": "passcode",
"originalDirectory": "case1"
},
{
"comments": "your channel is good",
"data": ["youTube"],
"pod": "library"
},
{
"comments": "you like everything",
"data": ["facebook"],
"pod": "fb"
},
{
"data": ["twitter"],
"pod": "tw",
"ALL": [{
"data": [{
"codeLevel": "3"
}],
"pod": "mo",
"pod2": "p"
},
{
"comments": "inserted by TripeHound",
"data": ["facebook"],
"pod": "fb"
}]
}
]
}
'));
If we simply switch to using json_tree(), then we see that a simple query (with no where clause) will return all elements of the source JSON:
select key, value
from datatable, json_tree( datatable.data, '$.ALL' ) limit 10 ;
ALL|[{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"},{"comments":"your channel is good","data":["youTube"],"pod":"library"},{"comments":"you like everything","data":["facebook"],"pod":"fb"},{"data":["twitter"],"pod":"tw","ALL":[{"data":[{"codeLevel":"3"}],"pod":"mo","pod2":"p"},{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}]}]
0|{"comments":"your site is awesome","pod":"passcode","originalDirectory":"case1"}
comments|your site is awesome
pod|passcode
originalDirectory|case1
1|{"comments":"your channel is good","data":["youTube"],"pod":"library"}
comments|your channel is good
data|["youTube"]
0|youTube
pod|library
Because JSON objects are mixed in with simple values, we can no longer simply add where json_extract( value, '$.pod' ) = 'fb' because this produces errors when value does not represent an object. The simplest way around this is to look at the type values returned by json_each()/json_tree(): these will be the string object if the row represents a JSON object (see above documentation for other values).
Adding this to the where clause (and relying on "short-circuit evaluation" to prevent json_extract() being called on non-object rows), we get:
select key, value
from datatable, json_tree( datatable.data, '$.ALL' )
where type = 'object'
and json_extract( value, '$.pod' ) = 'fb' ;
which returns:
2|{"comments":"you like everything","data":["facebook"],"pod":"fb"}
1|{"comments":"inserted by TripeHound","data":["facebook"],"pod":"fb"}
If desired, we could use json_extract() to break apart the returned objects:
.mode column
.headers on
.width 30 15 5
select json_extract( value, '$.comments' ) as Comments,
json_extract( value, '$.data' ) as Data,
json_extract( value, '$.pod' ) as POD
from datatable, json_tree( datatable.data, '$.ALL' )
where type = 'object'
and json_extract( value, '$.pod' ) = 'fb' ;
Comments Data POD
------------------------------ --------------- -----
you like everything ["facebook"] fb
inserted by TripeHound ["facebook"] fb
Note: If your structure contained other objects, of different formats, it may not be sufficient to simply select for type = 'object': you may have to devise a more subtle filtering process.
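For example (a sketch, not part of the original answer), json_tree()'s path column can pin results to a single nesting level, here only the immediate children of $.ALL:
select key, value
from datatable, json_tree( datatable.data, '$.ALL' )
where type = 'object'
and path = '$.ALL'
and json_extract( value, '$.pod' ) = 'fb' ;
-- returns only the top-level "fb" comment, not the nested one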