Relationships in Neo4J - csv

I have two CSV files: file1.csv (fields: Gene, Tumor, Totalpatients, Level) and file2.csv (fields: Gene, Sample, Value, Abundance).
I need to create relationships between the two files, such that a Gene is connected to a Tumor and also to a Sample, and so on (similar relations).
I am trying the following, but it does not give me the desired result (explained below the code):
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'file///home/file1.csv' AS row
LOAD CSV WITH HEADERS
FROM 'file///home/file2.csv' AS line
MERGE (genes:Genes { name: 'Genes' })
MERGE (cancer:Cancer { name: 'Cancer' })
MERGE (rna:RNA {name: 'RNA'})
MERGE (gene:Gene {name: UPPER(row.Gene)})
MERGE (tumor:Tumor {name: UPPER(row.Tumor)})
MERGE (patient:Patients { name: 'Patients' })
MERGE (total:Totalpatients {name: UPPER(row.Totalpatients)})
MERGE (level:Level {name: UPPER(row.Level)})
MERGE (count:Countpatients {name: UPPER(row.Countpatients)})
MERGE (sample:Sample{name: UPPER(line.Sample)})
MERGE (genes)-[:GENES]->(gene)
MERGE (genes)-[:TUMOR]->(tumor)
MERGE (gene)-[:RNA]->(Sample)
RETURN row;
After executing it, the RNA relationship connects each Gene to an empty node, i.e. a node with no properties to show.
How can I correct this?

In the absence of detailed knowledge of the contents of your .csv files and assuming that the Cypher posted above is exactly the same as that you are executing...
I think this could be a simple casing issue. Spot the difference:
MERGE (sample:Sample{name: UPPER(line.Sample)})
MERGE (gene)-[:RNA]->(Sample)
You have the identifier sample beginning with a lower case 's' but in the relationship MERGE statement it is an upper case 'S', i.e. Sample.
This could be confirmed by checking whether the node at the other end of the :RNA relationship from gene:Gene has a label.
MATCH (gene:Gene)-[:RNA]-(sample)
RETURN labels(sample)
If it has none, then the relationship MERGE in your original statement created a new node for the unbound identifier Sample, because no matching pattern existed. That new node has no labels and no properties, hence your observation.
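If that is the cause, the fix is to reference the bound identifier in the relationship, i.e. MERGE (gene)-[:RNA]->(sample) with a lower-case 's'. The empty nodes already created by earlier runs would remain in the graph. A minimal cleanup sketch follows. It assumes every label-less node at the end of an :RNA relationship is such an artifact, so check that against your data before deleting anything.

```cypher
// Find nodes without any label that hang off a :RNA relationship
// (assumed to be the accidental nodes created by the unbound `Sample`).
MATCH (:Gene)-[:RNA]->(orphan)
WHERE size(labels(orphan)) = 0
// DETACH DELETE removes the node together with its relationships.
DETACH DELETE orphan;
```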

Related

Creating one to many relations in neo4j

I'm very new to graph databases and have chosen Neo4j. I'm trying to build a simple recommender system based on the graph nodes.
So I have my original dataset that is a CSV that looks like this:
Since some of the fields contain semicolons, I split them and wrote the result to a new CSV (basically every combination of field values).
The new CSV looks like this:
The image above shows only N2; I have done the same for N1 and N3 as well.
Now, I need to create nodes and relations in such a way that each
Name KNOWS Language
Name WORKED_WITH Database.
Hence, I ran the following query:
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row
CREATE (n:Name {name: row.Name})
CREATE (l: Language {language: row.Language})
CREATE (d: Database {database: row.Database})
CREATE (n)-[:KNOWS]->(l)
CREATE (n)-[:WORKED_WITH]->(d)
This is the following output I get:
Only shown for N2 nodes
Since I want to build a recommender, my idea was to link the name to language and database.
Expected output:
I want to link it in this way so I can count the total number of incoming nodes on a Language or Database to recommend it.
Can someone tell me where I'm going wrong?
The CREATE clause creates new nodes every time it runs, so each CSV row produces its own Name, Language and Database nodes.
If you want to reuse an existing node and create one only when it doesn't exist yet, use the MERGE clause instead of CREATE.
Here is your query with MERGE:
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS row
MERGE (n:Name {name: row.Name})
MERGE (l: Language {language: row.Language})
MERGE (d: Database {database: row.Database})
MERGE (n)-[:KNOWS]->(l)
MERGE (n)-[:WORKED_WITH]->(d)
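Once the nodes are shared, the count the question asks for (how many names point at each language) can be read straight off the graph. This is a sketch only. It assumes the property key is `language`, as in the question's CREATE query, and the result column names are illustrative.

```cypher
// Rank languages by the number of distinct people who know them.
MATCH (l:Language)<-[:KNOWS]-(n:Name)
RETURN l.language AS language, count(DISTINCT n) AS people
ORDER BY people DESC;
```

The same shape works for databases by swapping in (d:Database)<-[:WORKED_WITH]-(n:Name).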

Create a node for each column only once while importing csv into Neo4j

I have a csv file that looks the following way:
I want to create a Neo4j database from it. Rows are nodes with the label gene, and columns are also nodes, with the label cell. I need to write a CREATE query that creates all my gene and cell nodes, plus one relationship for each combination of gene and cell. Currently I am stuck with the following code:
LOAD CSV WITH HEADERS FROM 'file:///merged_full.csv' AS line
CREATE (:Gene {id: line.gene_ids, name: line.wikigene_name})
I need to somehow iterate over all columns - starting from index 3 - after creating gene nodes, but I do not know how to do that.
Here are 3 queries that, performed in order, should do what you want.
This query creates a temporary Headers node with a names property that contains the collection of headers from the CSV file. It uses LIMIT 1 to only process the first row of the file. It also creates all the Cell nodes, each with its own name property.
LOAD CSV FROM 'file:///merged_full.csv' AS line
WITH line
LIMIT 1
MERGE (h:Headers)
SET h.names = line
WITH line
UNWIND line[3..] AS name
MERGE (c:Cell {name: name})
This query uses the APOC function apoc.map.fromNodes to generate a map named cells, which maps each cell name to its cell node. It also gets the Headers node. It then loads the non-header data from the CSV file (using SKIP 1 to skip over the header row), and processes each row as follows. It uses MERGE to get/create a Gene node, g, with the desired id and name. It uses the REDUCE function to generate a collection of the Cell nodes that have a "1" column value in the current row, and the FOREACH clause then creates a (g)-[:HAS]->(x) relationship (if necessary) for every cell, x, in that collection.
WITH apoc.map.fromNodes('Cell', 'name') AS cells
MATCH (h:Headers)
LOAD CSV FROM 'file:///merged_full.csv' AS line
WITH h, cells, line
SKIP 1
MERGE (g:Gene {id: line[1], name: line[2]})
FOREACH(
x IN REDUCE(s = [], i IN RANGE(3, SIZE(line)-1) |
CASE line[i] WHEN "1" THEN s + cells[h.names[i]] ELSE s END) |
MERGE (g)-[:HAS]->(x))
This query just deletes the temporary Headers node (if you wish):
MATCH (h:Headers)
DELETE h;
If the columns correspond to cell nodes, then you should know all the cell nodes you need just by looking at the CSV header.
I'd recommend writing a small query just to create each of the cell nodes you need, then create an index or unique constraint on :Cell(id) (or name, or whatever the property is that is meant to identify a :Cell).
At that point the problem becomes getting and processing each relevant column (I assume only the ones with 1 as the value). APOC Procedures may help here.
apoc.map.sortedProperties() can be used to take your line map and give you a list of key/value list pairs, which you can filter down to those where the key begins with 'V', and where the value is 1, then use what's remaining to match on the relevant :Cell node and create the relationship.
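The filtering step described above can also be sketched without APOC, using the built-in keys() function on the row map instead of apoc.map.sortedProperties(). The sketch assumes the :Cell nodes already exist with a name equal to the column header, that the cell columns start with 'V' as stated above, and that gene_ids and wikigene_name are the header names from the question.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///merged_full.csv' AS line
// Identify the gene by its id only, then set the remaining property.
MERGE (g:Gene {id: line.gene_ids})
SET g.name = line.wikigene_name
WITH g, line
// Keep only the cell columns whose value in this row is '1'
// (LOAD CSV delivers every value as a string).
UNWIND [k IN keys(line) WHERE k STARTS WITH 'V' AND line[k] = '1'] AS cellName
MATCH (c:Cell {name: cellName})
MERGE (g)-[:HAS]->(c);
```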

Mapping Similarities based on Attributes

I have a CSV that looks like this:
Bob 123.com random.com something.something.com etc
Mike 123.com random.com something.something.something.com etc
Joe etc.com random.domain.com random.com something.com
The names are the labels I am using and the domain names are the attributes that I would want to connect to one another based on similarity (name of attribute). How can I do this without typing every single one of the labels and attributes?
Given your CSV file format, here is an example of how to create unique Person and Domain nodes, and the relationships between them:
LOAD CSV FROM 'url-of-csv' AS row
MERGE (p:Person {name: row[0]})
WITH p, TAIL(row) AS domains
UNWIND domains AS domain
MERGE (d:Domain {name: domain})
MERGE (p)-[:IN]->(d);
And here is an example of how you'd get all the people who are in the random.com domain:
MATCH (d:Domain {name: 'random.com'})<-[:IN]-(p:Person)
RETURN p;
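For the similarity part of the question, the same pattern can be extended to list pairs of people together with the domains they share. This is a sketch built on the Person, Domain and :IN names used above; the result column names are illustrative.

```cypher
// Pair up people who are linked to at least one common domain.
MATCH (a:Person)-[:IN]->(d:Domain)<-[:IN]-(b:Person)
// Compare names so each pair is reported once, not twice.
WHERE a.name < b.name
RETURN a.name AS person1,
       b.name AS person2,
       collect(d.name) AS sharedDomains,
       count(d) AS shared
ORDER BY shared DESC;
```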

Load from CSV to create a binary tree

I need to create a binary tree in Neo4j. I started by creating two CSV files, one for vertices and one for edges, and then ran two queries to create the entire tree.
I thought that I could create the entire tree with only one query.
The CSV from where I start is this:
"parent","child_1","child_1_attr1","child_1_attr2","edge_1_attr1","edge_1_attr2","child_2","child_2_attr1","child_2_attr2","edge_2_attr1","edge_2_attr2"
"vertex_1","vertex_2","2","5","4","1","vertex_3","5","3","2","2"
"vertex_2","vertex_4","3","5","2","3","vertex_5","4","4","4","3"
"vertex_3","vertex_6","2","1","2","4","vertex_7","2","2","5","5"
"vertex_4","vertex_8","4","4","4","5","vertex_9","2","3","2","5"
"vertex_5","vertex_10","1","1","3","3","vertex_11","1","3","2","3"
"vertex_6","vertex_12","3","1","1","1","vertex_13","1","2","5","1"
"vertex_7","vertex_14","4","2","2","1","vertex_15","2","5","4","3"
Then I tried this query:
LOAD CSV WITH HEADERS FROM 'file:///Prova1.csv' AS line
Match (p:Vertex {name: line.parent})
Create (c1:Vertex {name: line.child_1, attr1: line.child_1_attr1, attr2: line.child_1_attr2})
Create (c2:Vertex {name: line.child_2, attr1: line.child_2_attr1, attr2: line.child_2_attr2})
Create (p)<-[:EDGE {attr1: line.edge_1_attr1, attr2: line.edge_1_attr2}]-(c1)
Create (p)<-[:EDGE {attr1: line.edge_2_attr1, attr2: line.edge_2_attr2}]-(c2)
Before this query I manually create the first Vertex and then run the query, but the only result I get is the creation of vertices 1, 2 and 3.
It should match the parent (which always already exists), create the two children, and then connect both children to their parent.
Can anyone help me?
Likely your view of execution is, for each line/row, execute all Cypher code, then repeat this for the next line/row until finished, and this is incorrect.
Instead, each individual Cypher operation will execute for all rows, then the next Cypher operation will execute for all rows, etc.
This means your MATCH operation:
Match (p:Vertex {name: line.parent})
is performed across all lines in your CSV at the same time, and only then will it proceed to the next operation (your CREATE, acting on all lines), and so on.
Since you stated that you manually created the first vertex, that vertex is the only one that will be matched. The MATCH fails for all other lines in your CSV because the CREATE statements haven't executed yet, so those nodes don't exist. This means only two vertices will be created: the children of that single matched node.
It's usually good practice when importing CSV data to create all nodes first, and then have a separate CSV processing for matching upon the already created nodes and creating the relevant relationships.
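As a rough sketch of that second, relationship-only pass: it assumes a previous node-only import has already created every :Vertex named in the file. It also uses MERGE rather than CREATE for the edges, a deliberate swap so that re-running the import does not duplicate relationships.

```cypher
// Pass 2: all vertices are assumed to exist already, so only MATCH them.
LOAD CSV WITH HEADERS FROM 'file:///Prova1.csv' AS line
MATCH (p:Vertex {name: line.parent})
MATCH (c1:Vertex {name: line.child_1})
MATCH (c2:Vertex {name: line.child_2})
// Edges point from child to parent, as in the question.
MERGE (c1)-[e1:EDGE]->(p)
  SET e1.attr1 = line.edge_1_attr1, e1.attr2 = line.edge_1_attr2
MERGE (c2)-[e2:EDGE]->(p)
  SET e2.attr1 = line.edge_2_attr1, e2.attr2 = line.edge_2_attr2;
```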
However, if you do want to create everything in one go, you'll likely want to use MERGE in various places. This is tricky if you don't fully understand the behavior of MERGE (it's like an attempt to MATCH, and if no match is found, a CREATE) or how Cypher is executed (as in this case).
You'll also want to MERGE according to unique node values instead of all properties, and SET the remaining properties. It's also especially helpful to have either unique constraints or indexes (whichever is appropriate) on the relevant label/properties for faster execution, especially as the size of your graph grows.
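For the vertices here, a unique constraint on the name property might look like the sketch below. The constraint name vertex_name_unique is a hypothetical choice, and the FOR ... REQUIRE form is the syntax used by Neo4j 4.4 and later; older releases use CREATE CONSTRAINT ON (v:Vertex) ASSERT v.name IS UNIQUE instead.

```cypher
// Enforce one :Vertex per name; this also backs the MERGE lookups with an index.
CREATE CONSTRAINT vertex_name_unique IF NOT EXISTS
FOR (v:Vertex) REQUIRE v.name IS UNIQUE;
```

Run it before the import so that every MERGE on name can use the index.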
This query may work.
LOAD CSV WITH HEADERS FROM 'file:///Prova1.csv' AS line
MERGE (p:Vertex {name: line.parent})
MERGE (c1:Vertex {name: line.child_1})
SET c1.attr1 = line.child_1_attr1, c1.attr2 = line.child_1_attr2
MERGE (c2:Vertex {name: line.child_2})
SET c2.attr1 = line.child_2_attr1, c2.attr2 = line.child_2_attr2
Create (p)<-[:EDGE {attr1: line.edge_1_attr1, attr2: line.edge_1_attr2}]-(c1)
Create (p)<-[:EDGE {attr1: line.edge_2_attr1, attr2: line.edge_2_attr2}]-(c2)
The reason this one works is that by the time your very first MERGE is completed for all parent nodes, that will have created ALL parent nodes (or rather, nodes that will be parents) in your graph.
So when we reach the MERGE for your child nodes, that will MATCH most of those already created nodes in your graph...the only new nodes that will be created at that point will be leaf nodes, which would not have been created by your first MERGE, since they won't be parents to any other nodes, and do not show up in your parent column in your CSV.
Your import query does not work because it matches the parent first and only then creates the nodes and relationships. I modified the query as follows, and it works now:
LOAD CSV WITH HEADERS FROM 'file:///test.csv' AS line
CREATE (c1:Vertex {name: line.child_1, attr1: line.child_1_attr1, attr2: line.child_1_attr2}),
(c2:Vertex {name: line.child_2, attr1: line.child_2_attr1, attr2: line.child_2_attr2}) WITH c1,c2, line
MATCH (p:Vertex {name:line.parent}) CREATE (p)<-[:EDGE {attr1: line.edge_1_attr1, attr2: line.edge_1_attr2}]-(c1),
(p)<-[:EDGE {attr1: line.edge_2_attr1, attr2: line.edge_2_attr2}]-(c2)
So if you first create the nodes, then match the parent and create the relationships, the query works. The result looks like this:
I will look into your query further, because I do not yet understand why the original ordering fails.
foreach (num in range(1,15) |
merge (parent:Node {number: num})
merge (left:Node {number: num + num})
merge (right:Node {number: num + num + 1})
merge (left)<-[:LEFT]-(parent)-[:RIGHT]->(right)
)
Explanation:
This creates a perfect binary tree structure with 31 nodes. Then you can include the same numbers in the CSV to find and add properties to each correspondingly numbered node.
In a binary tree if you include a number property on the first (root or top most node) with value 1, then increment each subsequent node's number value by 1 (left to right; top to bottom) then you get a convenient mathematical relationship where each node's left child has a number value of the parent's number + number and the right child is number + number + 1.
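Attaching CSV data to the numbered nodes could then look roughly like this. The file name nodes.csv and its columns number, attr1 and attr2 are hypothetical; they stand in for whatever per-node data you have.

```cypher
// Hypothetical file with one row per node: number,attr1,attr2
LOAD CSV WITH HEADERS FROM 'file:///nodes.csv' AS line
// CSV values arrive as strings, so convert before matching on the integer property.
MATCH (n:Node {number: toInteger(line.number)})
SET n.attr1 = line.attr1, n.attr2 = line.attr2;
```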

Connect imported nodes (CSV LOAD) to a general group

I was trying to build a query that solves these tasks:
Import a CSV with the format "user","group" to Neo4J
Generate for each USER a node - avoid duplicates
Generate for each GROUP a node - avoid duplicates
Connect the node USER to the imported GROUP
Finally connect every imported GROUP to a MAINGROUP
I have written the query like this:
LOAD CSV FROM "file:.....csv" AS csvLine
MERGE (u:User { name: csvLine[0]})
MERGE (g:Group { name: csvLine[1]})
MERGE (u)-[:IS_MEMBER_OF]->(g)
MERGE (g)-[:IS_MEMBER_OF]->(m:Group {name: "MAINGROUP"})
So far this works as I get every User and every Group and they are connected.
Problem: my GROUPs are not all related to a single MAINGROUP node. Instead, each GROUP has a relationship to its own duplicate MAINGROUP. The query seems to generate a new MAINGROUP for every GROUP (although I was hoping MERGE would prevent this), so I end up with as many MAINGROUP nodes as I have imported GROUPs.
How do I need to alter the query to get the desired graph?
This is a common gotcha of using MERGE. See the docs here.
When you use MERGE on a pattern, it creates everything if the entire pattern didn't already exist, not just the portions of the pattern that didn't already exist.
What you should be doing is using MERGE once to find/create (m:Group {name: "MAINGROUP"}) and then MERGE just the new relationship. Because MERGE is matching on the whole pattern (g)-[:IS_MEMBER_OF]->(m:Group {name: "MAINGROUP"}) and it doesn't exist, it's re-creating the main group every time.
So you might want to do this:
LOAD CSV FROM "file:.....csv" AS csvLine
MERGE (u:User { name: csvLine[0]})
MERGE (g:Group { name: csvLine[1]})
MERGE (u)-[:IS_MEMBER_OF]->(g)
MERGE (m:Group {name: "MAINGROUP"})
MERGE (g)-[:IS_MEMBER_OF]->(m)
The last two lines are different.
This way of getting tripped up with MERGE is unfortunately really common. :)
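A quick way to check the result is to count the MAINGROUP nodes and the groups attached to them. This is a sketch using the labels and relationship type from the query above; the result column names are illustrative. On a database that was empty before the corrected import, you would expect a single MAINGROUP node.

```cypher
MATCH (m:Group {name: "MAINGROUP"})
OPTIONAL MATCH (g:Group)-[:IS_MEMBER_OF]->(m)
// More than one main group node means duplicates from an earlier run are still present.
RETURN count(DISTINCT m) AS mainGroupNodes, count(g) AS memberGroups;
```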