I have a set of data for a family tree in Neo4J and am trying to build a Cypher query that produces a JSON data set similar to the following:
{Name: "Bob",
 parents: [
   {Name: "Roger",
    parents: [
      {Name: "Robert"},
      {Name: "Jessica"}
    ]},
   {Name: "Susan",
    parents: [
      {Name: "George"},
      {Name: "Susan"}
    ]}
 ]}
My graph has a PARENT relationship between Member nodes (i.e. MATCH (p:Member)-[:PARENT]->(c:Member)). I found the questions "Nested has_many relationships in cypher" and "neo4j cypher nested collect", but the approach there ends up grouping all parents together under the main child node I am searching for.
Adding some clarity based on feedback:
Every member has a unique identifier. The unions are currently all associated with the PARENT relationship. Everything is indexed so that performance will not suffer. When I run a query to just get back the node graph I get the results I expect. I'm trying to return an output that I can use for visualization purposes with D3. Ideally this will be done with a Cypher query as I'm using the API to access neo4j from the frontend being built.
Adding a sample query:
MATCH (p:Person)-[:PARENT*1..5]->(c:Person)
WHERE c.FirstName = 'Bob'
RETURN p.FirstName, c.FirstName
This query returns a list of each parent for five generations, but instead of showing the hierarchy, it's listing 'Bob' as the child for each relationship. Is there a Cypher query that would show each relationship in the data at least? I can format it as I need to from there...
Genealogical data might comply with the GEDCOM standard and include two types of nodes: Person and Union. The Person node has its identifier and the usual demographic facts. The Union nodes have a union_id and the facts about the union. In GEDCOM, Family is a third element bringing these two together, but in Neo4j I found it suitable to also include the union_id in Person nodes. I used 5 relationships: father, mother, husband, wife and child. The family is then two parents with an inward vector and each child with an outward vector. The image illustrates this. This is very handy for visualizing connections and generating hypotheses. For example, consider the attached picture and my ancestor Edward G Campbell, the product of union 1917, where three brothers married three Vaught sisters from union 8944 and two married Gaither sisters from union 2945. Also note, in the upper left, how Mahala Campbell married her step-brother John Greer Armstrong. Next to Mahala is an Elizabeth Campbell who is connected by marriage to other Campbells, but is likely directly related to them. Similarly, you can hypothesize about Rachael Jacobs in the upper right and how she might relate to the other Jacobs.
I use bulk inserts which can populate ~30,000 Person nodes and ~100,000 relationships in just over a minute. I have a small .NET function that returns the JSON from a dataview; this generic solution works with any dataview, so it is scalable. I'm now working on adding other data, such as locations (lat/long), documentation (particularly documents linking people, such as a census), etc.
You might also have a look at Rik van Bruggen's blog on his family data.
Regarding your query
You already create a path pattern here: (p:Person)-[:PARENT*1..5]->(c:Person). You can assign it to a variable, tree, and then operate on that variable, e.g. returning the tree, nodes(tree) or relationships(tree), or working on those collections in other ways:
MATCH tree = (p:Person)-[:PARENT*1..5]->(c:Person)
WHERE c.FirstName = 'Bob'
RETURN nodes(tree), relationships(tree), tree, length(tree),
       [n IN nodes(tree) | n.FirstName] AS names
See also the cypher reference card: http://neo4j.com/docs/stable/cypher-refcard and the online training http://neo4j.com/online-training to learn more about Cypher.
Don't forget to
create index on :Person(FirstName);
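Since the end goal is a D3 visualization, the per-path names lists can be reduced to distinct parent/child pairs on the client. Here is a minimal Python sketch, assuming each names list runs from the oldest ancestor down to the searched child (the direction of the pattern above). The function name and sample rows are made up for illustration, and real data should pair on the unique member identifier rather than on first names:

```python
def paths_to_edges(name_paths):
    """Collect distinct (parent, child) pairs from ancestor-first name lists."""
    edges = []
    seen = set()
    for names in name_paths:
        # Consecutive entries in a path are parent -> child.
        for parent, child in zip(names, names[1:]):
            if (parent, child) not in seen:
                seen.add((parent, child))
                edges.append((parent, child))
    return edges


# Hypothetical rows, shaped like the `names` column of the query above.
rows = [
    ["Roger", "Bob"],
    ["Susan", "Bob"],
    ["Robert", "Roger", "Bob"],
    ["Jessica", "Roger", "Bob"],
]
edges = paths_to_edges(rows)
```

Each pair can then be fed to D3 as a link between two nodes.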
I'd suggest building a method to flatten out your data into an array. If the objects don't have UUIDs, you would probably want to give them IDs as you flatten, and then have a parent_id key for each record.
You can then run it as a set of cypher queries (either making multiple requests to the query REST API, or using the batch REST API) or alternatively dump the data to CSV and use cypher's LOAD CSV command to load the objects.
An example cypher command with params would be:
CREATE (:Member {uuid: {uuid}, name: {name}})
And then running through the list again with the parent and child IDs:
MATCH (m1:Member {uuid: {uuid1}}), (m2:Member {uuid: {uuid2}})
CREATE (m1)<-[:PARENT]-(m2)
Make sure to have an index on the ID for members!
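The flattening step could look roughly like the Python sketch below. It assumes the source data is a nested structure with "Name" and "parents" keys (as in the question) and uses a simple counter as a stand-in for real UUIDs; the function and field names are illustrative, not part of any library:

```python
from itertools import count


def flatten(tree):
    """Return (members, links) for a nested {"Name", "parents"} tree."""
    members, links = [], []
    next_id = count(1)      # stand-in for real UUIDs
    stack = [(tree, None)]  # (node, id of the child this node is a parent of)
    while stack:
        node, child_id = stack.pop()
        node_id = next(next_id)
        members.append({"uuid": node_id, "name": node["Name"]})
        if child_id is not None:
            links.append({"parent_uuid": node_id, "child_uuid": child_id})
        # Reverse so parents are visited in their original order.
        for parent in reversed(node.get("parents", [])):
            stack.append((parent, node_id))
    return members, links


tree = {"Name": "Bob", "parents": [
    {"Name": "Roger", "parents": [{"Name": "Robert"}]},
    {"Name": "Susan"},
]}
members, links = flatten(tree)
```

Each entry in members would supply the parameters for the CREATE statement, and each entry in links the two IDs for the MATCH ... CREATE statement (child as uuid1, parent as uuid2, given the direction of the arrow above).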
The only way I have found thus far to get the data I am looking for is to actually return the relationship information, like so:
MATCH ft = (person {firstName: 'Bob'})<-[:PARENT*]-(p:Person)
RETURN EXTRACT(n in nodes(ft) | {firstName: n.firstName}) as parentage
ORDER BY length(ft);
Which will return a dataset I am then able to morph:
["Bob", "Roger"]
["Bob", "Susan"]
["Bob", "Roger", "Robert"]
["Bob", "Susan", "George"]
["Bob", "Roger", "Jessica"]
["Bob", "Susan", "Susan"]
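The morphing step can be sketched in Python as follows. This is only an illustration under two assumptions: every row starts with the searched child, and names are unique among the parents of any one person (with real data, key on the unique member identifier instead). The function name is made up:

```python
def paths_to_tree(paths):
    """Fold child-first name paths into a nested {"Name", "parents"} dict."""
    root = None
    for path in paths:
        if not path:
            continue
        if root is None:
            root = {"Name": path[0], "parents": []}
        node = root
        for name in path[1:]:
            # Reuse the parent entry if an earlier path already created it.
            match = next((p for p in node["parents"] if p["Name"] == name), None)
            if match is None:
                match = {"Name": name, "parents": []}
                node["parents"].append(match)
            node = match
    return root


rows = [
    ["Bob", "Roger"],
    ["Bob", "Susan"],
    ["Bob", "Roger", "Robert"],
    ["Bob", "Susan", "George"],
    ["Bob", "Roger", "Jessica"],
    ["Bob", "Susan", "Susan"],
]
tree = paths_to_tree(rows)
```

The resulting dict has the nested shape asked for in the question and can be serialized with json.dumps.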
I am using Firebase and Xamarin Forms to deploy an app. I am trying to figure out how to get one or more objects matching a criterion. Let's say I have a collection of characters, each with attributes such as name, age and city, plus a final attribute that is an array of strings listing the tools they have.
For example, with these three characters in the collection:
{ "characters":
  {
    "char001": {
      "name": "John",
      "tools": [ "knife", "magicX", "laser", "fire" ]
    },
    "char002": {
      "name": "Albert",
      "tools": [ "magicX" ]
    },
    "char003": {
      "name": "Chris",
      "tools": [ "pistol", "knife", "magicX" ]
    }
  }
}
I want to retrieve the characters who have both a knife and magicX, so the query should return char001 and char003.
That said, I have a large data set: more than 10,000 characters in the collection, and each character can have up to 10 items in tools.
I can retrieve the objects when the tools attribute is a single string, but with tools as an array I have to iterate through all the items of each character to find those with a knife, repeat the procedure for magicX, and then combine the two result sets. That is very slow.
I would like to do it on the back-end side directly, and just receive the correct data.
How could I perform the query in firebase?
Thank you so much in advance,
Cheers.
In Firebase, this is easy, assuming that characters is a collection...
If that's the case, one way to do it is to structure your character documents like so:
'char001': {
  name: "John",
  tools: {
    knife: true,
    magicX: true,
    laser: true
  }
}
This way, you'll be able to perform compound EQUALITY queries and get back all the characters with the tools you're searching for. Something like:
db.collection('characters').where('tools.knife', '==', true).where('tools.magicX', '==', true)
Mind you, you can combine up to 10 equality clauses in a query.
I hope this helps, search for "firestore compound queries" for more info.
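To make the restructuring concrete, here is a small Python sketch that converts the tools array into a map of flags and applies the same "every wanted tool is true" test locally. It is not Firestore client code, only an illustration of the data shape; the helper names are made up. Field names are case-sensitive, so the sketch assumes the spelling magicX everywhere:

```python
def tools_to_flags(tools):
    """Turn ["knife", "laser"] into {"knife": True, "laser": True}."""
    return {tool: True for tool in tools}


def has_all_tools(character, wanted):
    """Local equivalent of chaining one equality clause per wanted tool."""
    flags = character.get("tools", {})
    return all(flags.get(tool) is True for tool in wanted)


characters = {
    "char001": {"name": "John", "tools": tools_to_flags(["knife", "magicX", "laser", "fire"])},
    "char002": {"name": "Albert", "tools": tools_to_flags(["magicX"])},
    "char003": {"name": "Chris", "tools": tools_to_flags(["pistol", "knife", "magicX"])},
}
matches = sorted(
    char_id for char_id, char in characters.items()
    if has_all_tools(char, ["knife", "magicX"])
)
```

In the real database the filtering happens server-side through the chained where clauses; the sketch only shows why the map layout makes each tool individually addressable.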
I have a collection of nodes that make up a DAG (directed acyclic graph) with no loops guaranteed. I want to store the nodes in a database and have the database execute a search that shows me all paths between two nodes.
For example, you could think that I have the git history of a complex project.
Each node can be described with a JSON object that has:
{'id':'id',
 'outbound':['id1','id2','id3']
}
So if I had these nodes in the database:
{'id':'id0',
 'outbound':['id1','id2']
}
{'id':'id1',
 'outbound':['id2','id3','id4','id5','id6']
}
{'id':'id2',
 'outbound':['id2','id3']
}
And if I wanted to know all of the paths connecting id0 and id3, I would want to get three lists:
id0 -> id1 -> id3
id0 -> id2 -> id3
id0 -> id1 -> id2 -> id3
I have thousands of these nodes today, and I will have tens of thousands of them tomorrow. However, there are many DAGs in the database, and the typical DAG only has 5-10 nodes, so this problem is tractable.
I believe that there is no way to do this efficiently in MySQL (right now all of the objects are stored in a table in a JSON column); however, I believe that it is possible to do it efficiently in a graph database like Neo4j.
I've looked at the Neo4J documentation on Path Finding Algorithms and perhaps I'm confused, but the examples don't really look like working examples. I found a MySQL example which uses stored procedures and it doesn't look like it parallelizes very well. I'm not even sure what Amazon Neptune is doing; I think that it is using Spark GraphX.
I'm sort of lost as to where to start on this.
It's perfectly doable with Neo4j.
Importing json data
[
{"id":"id0",
"outbound":["id1","id2"]
},
{"id":"id1",
"outbound":["id2","id3","id4","id5","id6"]
},
{"id":"id2",
"outbound":["id2","id3"]
}
]
CALL apoc.load.json("graph.json")
YIELD value
MERGE (n:Node {id: value.id})
WITH n, value.outbound AS outbound
UNWIND outbound AS o
MERGE (n2:Node {id: o})
MERGE (n)-[:Edge]->(n2)
Apparently the data you provided is not acyclic: id2 lists itself in its outbound array.
Getting all paths between two nodes
As you are not mentioning shortest paths, but all paths, there is no specific algorithm required:
MATCH p=(:Node {id: "id0"})-[:Edge*]->(:Node {id: "id3"}) RETURN nodes(p)
[{"id":"id0"},{"id":"id1"},{"id":"id3"}]
[{"id":"id0"},{"id":"id2"},{"id":"id3"}]
[{"id":"id0"},{"id":"id1"},{"id":"id2"},{"id":"id3"}]
[{"id":"id0"},{"id":"id2"},{"id":"id2"},{"id":"id3"}]
[{"id":"id0"},{"id":"id1"},{"id":"id2"},{"id":"id2"},{"id":"id3"}]
Comparison with MySQL
See how-much-faster-is-a-graph-database-really
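For graphs as small as the ones described (5-10 nodes each), the all-paths search is also easy to reason about outside the database. The Python sketch below is a plain depth-first enumeration over the same sample data. Unlike the Cypher query, it never revisits a node already on the current path, so the id2 self-loop is skipped and only the three paths expected in the question are produced. The function name is made up for illustration:

```python
def all_paths(graph, start, goal):
    """Enumerate every simple path from start to goal by depth-first search."""
    paths = []

    def walk(node, path):
        if node == goal:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip nodes already on this path (no cycles)
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


graph = {
    "id0": ["id1", "id2"],
    "id1": ["id2", "id3", "id4", "id5", "id6"],
    "id2": ["id2", "id3"],
}
found = all_paths(graph, "id0", "id3")
```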
The Graph Data Science library pathfinding algorithms are designed to find the shortest weighted paths and use algorithms similar to Dijkstra to find them. In your case, it seems that you are dealing with a directed unweighted graph, so you could use Cypher's native allShortestPaths function. Note that it returns only the paths of minimal length, not every path. An example would be:
MATCH (n1:Node{id:"A"}),(n2:Node{id:"B"})
MATCH path=allShortestPaths((n1)-[*..10]->(n2))
RETURN [n in nodes(path) | n.id] as outbound_nodes_id
It is always useful to check the Cypher refcard to see what is available with Cypher in Neo4j
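To show what "all shortest paths" means on an unweighted directed graph, here is a hedged Python sketch using breadth-first search with predecessor tracking. It illustrates the idea only and is not how Neo4j implements allShortestPaths; the function name and the sample graph (taken from the question) are assumptions for the example:

```python
from collections import deque


def all_shortest_paths(graph, start, goal):
    """Return every minimum-length path from start to goal (unweighted BFS)."""
    dist = {start: 0}
    preds = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in dist:
                # First time we reach nxt: this is its shortest distance.
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another equally short way to reach nxt.
                preds[nxt].append(node)
    if goal not in dist:
        return []

    def build(node):
        if node == start:
            return [[start]]
        return [p + [node] for pred in preds[node] for p in build(pred)]

    return build(goal)


graph = {
    "id0": ["id1", "id2"],
    "id1": ["id2", "id3", "id4", "id5", "id6"],
    "id2": ["id2", "id3"],
}
shortest = all_shortest_paths(graph, "id0", "id3")
```

On this data only the two 2-hop paths come back; the longer id0 -> id1 -> id2 -> id3 path is left out, which is the main difference from an all-paths query.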
Warning:
this is an exercise to understand better JSON database design in Firebase
it is not necessarily realistic
I have a two-way relationship between users and door keys. I would like to understand:
how to represent this relationship visually (I can imagine it only as two separate trees)
how this would work on Firebase: would both users and door-keys be children of a parent node "myparentnodename"?
If I model the database in this way it feels highly inefficient because every time I would query the child node "users" I would get all the users back. Or am I wrong? Is it possible to only get back the data matching to a specific user? E.g. get user where "user = user1"? Can we do nested queries? e.g. combine the previous condition with some condition on the door keys so the JSON object returned is only relevant to the door-keys contained in the "user1" node?
This is a very long answer as your question was actually about 5 different questions.
root node is: myparentnodename
Your users
users
  uid_0
    name: "William"
    door_keys:
      key_0: true
      key_3: true
  uid_2
    name: "Leonard"
    door_keys:
      key_3: true
      key_5: true
and your keys
keys
  key_0
    uid_0: true
  key_3
    uid_0: true
    uid_2: true
  key_5
    uid_2: true
With this structure, all of the elements 'point' at each other.
If you query uid_0, you can see that they use keys 0 and 3.
If you query key_3, you can see that it belongs to users 0 and 2.
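Because both trees store the same relationship, they have to be kept in sync. As a rough illustration, the Python sketch below derives the keys tree from the users tree. The data mirrors the example above, and the function name is made up:

```python
def build_key_index(users):
    """Derive the keys -> users tree from the users -> keys tree."""
    keys = {}
    for uid, user in users.items():
        for key_id in user.get("door_keys", {}):
            # Each key node lists every user that holds it.
            keys.setdefault(key_id, {})[uid] = True
    return keys


users = {
    "uid_0": {"name": "William", "door_keys": {"key_0": True, "key_3": True}},
    "uid_2": {"name": "Leonard", "door_keys": {"key_3": True, "key_5": True}},
}
keys = build_key_index(users)
```

In practice both sides would be written together whenever a key is granted or revoked; the sketch only shows how the two trees mirror each other.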
Your question was
every time I would query the child node "users" I would get all the
users back
That's slightly incomplete. When a query is done, you usually query for something specific. With Firebase however, there are two ways to retrieve data: observing a node and a query.
If you want back all users in the users node you would observe that node by .Value (or .ChildAdded for 1 at a time).
// myParentNodeName is a Firebase reference to the root node
let usersRef = myParentNodeName.childByAppendingPath("users")
usersRef.observeEventType(.Value, withBlock: { snapshot in
    // .Value can return multiple nodes within the snapshot, so iterate over them
    for child in snapshot.children {
        let name = child.value.objectForKey("name") as! String
        print(name) // prints each user's name
    }
})
Note that the above attaches an observer to the users node, so any future change within that node will notify the app and re-send the entire node to it.
If you want just one user's info, and don't want to continue watching for changes:
// myParentNodeName is a Firebase reference to the root node
let usersRef = myParentNodeName.childByAppendingPath("users")
let thisUserRef = usersRef.childByAppendingPath("uid_2")
thisUserRef.observeSingleEventOfType(.Value, withBlock: { snapshot in
    // the snapshot is the uid_2 node itself, so read the name directly from it
    let name = snapshot.value.objectForKey("name") as! String
    print(name) // prints this user's name
})
Finally, to query for all keys that belong to uid_0 (which is a little redundant in this example since we already know which keys they have from their node). If the keys ref also contained other info like the door name, the building the door was in, or the door location, it would be more appropriate and would require a different structure, so assume that's the case:
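// myParentNodeName is a Firebase reference to the root node
let keysRef = myParentNodeName.childByAppendingPath("keys")
keysRef.queryOrderedByChild("uid_0").queryEqualToValue(true)
    .observeSingleEventOfType(.Value, withBlock: { snapshot in
        // the query can match several keys, so iterate over the results
        for child in snapshot.children {
            let doorLocation = child.value.objectForKey("door_location") as! String
            print(doorLocation) // prints the location of each door uid_0 has a key for
        }
    })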
Note: this code is Swift, since the platform was not specified in the question.
The other question:
Can we do nested queries? e.g. combine the previous condition with
some condition on the door keys so the JSON object returned is only
relevant to the door-keys contained in the "user1" node?
I think you mean can you query for uid_2, see which keys they have and then load in the info from those specific keys.
Yes! But... (there's always a but)
Firebase is asynchronous so you have to take that into account when nesting queries i.e. you need to ensure all of the data is returned before getting more data. So for example, if you wanted uid_2 key data, you could observeSingleEventOfType on node uid_2. You would then have their keys and could then observeSingleEventOfType on each key.
Technically this will work but with asynchronous data flying around, you could end up with code stomping on other code and processing data before it's actually been returned.
The better option (per my example) is to just avoid that entirely and query the keys node for uid_2's keys.
As a side note, observing a node has a lot less overhead than a query so if you want to load a single node and you know the path, use observe.
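The ordering concern with nested lookups can be sketched in Python with asyncio: read the user node first, then start the per-key reads and wait until all of them have returned before using the results. Everything here is hypothetical, including the in-memory DB, the fetch helper and the door_location values; it stands in for the asynchronous Firebase calls rather than reproducing them:

```python
import asyncio

# Hypothetical in-memory stand-in for the remote database.
DB = {
    "users": {"uid_2": {"name": "Leonard", "door_keys": {"key_3": True, "key_5": True}}},
    "keys": {
        "key_3": {"door_location": "lobby"},
        "key_5": {"door_location": "lab"},
    },
}


async def fetch(path):
    """Pretend network read: resolve a slash-separated path in DB."""
    await asyncio.sleep(0)  # yield control, as a real request would
    node = DB
    for part in path.split("/"):
        node = node[part]
    return node


async def key_details_for(uid):
    # Step 1: wait for the user node before touching its keys.
    user = await fetch(f"users/{uid}")
    key_ids = sorted(user["door_keys"])
    # Step 2: start all key reads, then wait until every one has returned.
    details = await asyncio.gather(*(fetch(f"keys/{k}") for k in key_ids))
    return dict(zip(key_ids, details))


result = asyncio.run(key_details_for("uid_2"))
```

The point of the gather step is that no code runs on the key data until the whole batch is back, which avoids processing partial results.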
Just to level-set:
CakePHP 3 introduces Entity objects which can represent an ORM (database) record as an object rather than an array. Creating the object from the raw data is called "hydration". This has pros and cons, depending on what you're trying to achieve, so CakePHP gives you the option to control hydration through the hydrate() function which can be chained in the query.
What I've observed is that only the top-level results are hydrated; nested results are not. For example, a query like
$authors=$this->Authors->find("all")->contain("Books");
$this->set("authors",$authors);
returns something along the lines of
authors (array) << This is an array since we can have multiple records
  0 (object) <<< This is the Entity object representing the first Author
    id 1
    name "Roger Kaplan"
    [other author fields]
    books (array) <<< This is an array because there are multiple books
      0 (array) <<< I expect this to be an entity object!!
        id 100
        title "CakePHP Made Easy"
        [other book fields]
      1 (array) <<< I want this to be an entity too
        id 101
        title "Solving Java-induced Neuroses"
        [other book fields]
Is it possible to have the nested entities hydrated as well?
The reason I'm asking is that I'm building helpers which expect an entity object to be passed, and uses the metadata on the entity object to do interesting things. I want to be able to pass nested records as well as top-level ones.
EDIT:
Something I just noticed is that belongsTo associations, which will only contain one record, will be inserted as an array of values (ie not an entity) while hasMany associations will return an array of entities. Here is the dump from my actual project; I've attempted to edit it down for clarity:
$message = $this->Messages->find("all")->where(["Messages.id" => $message_id])->contain(["MessageBodies","JobOrders","Candidates"])->first();
Messages belongsTo Candidates and JobOrders, and hasMany MessageBodies.
Here is a rendering of the result:
message(array)
  id 1
  job_order_id 2
  candidate_id 1
  candidate(array)
    id 1
    first_name Roger
    last_name Kaplan
  job_order(array)
    id 2
    name Chief Cook and Bottle Washer
  message_bodies(array)
    0(object)
    1(object)
    2(object)
So if my assumption is correct, that only hasMany associations are returned as an array of entities, the question is: how can I get belongsTo (and possibly hasOne, which I tend not to use) contained data to show up as entities?
The data passed into the view does indeed contain nested entities, as verified by the debugger. I was using the variable viewer in the Cake toolbar to look into the structure, and that view was reporting the embedded entities as arrays. Based on that data, I was reaching into the structure with ["array syntax"] but the Cake entity is smart enough to intercept that and convert to get() calls.
Assume an object schema stored in ScriptDb:
{name: 'alice',
age: 12,
interests: [
{interest: 'tea parties', enthusiasm: 'high'},
{interest: 'croquet', enthusiasm: 'moderate'},
]
}
I understand how to query against the first two attributes, but not how to run a query that returns all rows where interests[enthusiasm = moderate].
Taking that example literally and trying: db.query({interests:[{enthusiasm: 'moderate'}]});
returns a ScriptDbResult but any attempt to use that result's methods results in an error:
Queries can only contain letters, numbers, spaces, dashes and underscores as keys.
This is not currently possible. It may be supported in a future update. The best you can do now is load all interests and loop through them yourself.
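The load-and-loop workaround amounts to a client-side filter over the nested array. A language-neutral sketch in Python, assuming the records have already been loaded into memory (the second record and the function name are made up for illustration):

```python
def with_interest_enthusiasm(records, level):
    """Keep records that have at least one interest with the given enthusiasm."""
    return [
        record for record in records
        if any(i.get("enthusiasm") == level for i in record.get("interests", []))
    ]


records = [
    {"name": "alice", "age": 12, "interests": [
        {"interest": "tea parties", "enthusiasm": "high"},
        {"interest": "croquet", "enthusiasm": "moderate"},
    ]},
    # Hypothetical second record, added only to show a non-match.
    {"name": "bob", "age": 13, "interests": [
        {"interest": "chess", "enthusiasm": "low"},
    ]},
]
moderate = with_interest_enthusiasm(records, "moderate")
```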