LOAD CSV: mulitple MERGE and Eager operator - csv

I have a large CSV file that contains mulitple nodes per line. I would like to use LOAD CSV to MERGE the nodes and set some properties. However, I always get the "Eager operator" warning for this query:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///MRCONSO.RRF' AS line FIELDTERMINATOR '|'
MERGE (c:Concept {cui: line[0]})
ON CREATE SET c.language = line[1]
MERGE (l:LexicalForm {lui: line[3]})
ON CREATE SET l.status = line[2];
When I remove the ON CREATE part it works but I want to merge on a specific ID, not on the combination of the ID and the other properties.
Is it possible to rephrase this somehow to avoid the Eager operator? I would like to create 6 different nodes from a single line and the alternative would be to iterate the file 6 times.

You have an RRF file and load CSV requires a CSV line. When you reference a field in the csv you need to include the data type and your tag (line). You also need the periodic commit number of rows per iteration. Your file needs to be in the import folder of your database.
For example:
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:///MRCONSO.RRF' AS line FIELDTERMINATOR '|'
MERGE (c:Concept {cui: toInteger(linecui),language: toString(line.language})

Related

Neo4j Cypher - Load Tree Structure from CSV

New to cypher, and I'm trying to load in a csv of a tree structure with 5 columns. For a single row, every item is a node, and every node in column n+1 is a child of the node in column n.
Example:
Csv columns: Level1, Level2, Level3, Level4, Level5
Structure: Level1_thing <--child_of-- Level2_thing <--child_of-- Level3_thing etc...
The database is non-normalized, so there are many repetitions of node names in all the levels except the lowest ones. What's the best way to load in this csv using cypher and create this tree structure from the csv?
Apologies if this question is poorly formatted or asked, I'm new to both stack overflow and graph DBs.
What you are searching is the MERGE command.
To do your script you have to do it in two phases for an optimal execution
1) Create nodes if they don't already exist
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///my_file.csv" AS row
MERGE (l5:Node {value:row.Level5})
MERGE (l4:Node {value:row.Level4})
MERGE (l3:Node {value:row.Level3})
MERGE (l2:Node {value:row.Level2})
MERGE (l1:Node {value:row.Level1})
2) Create relationships if they don't already exist
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///my_file.csv" AS row
MATCH (l5:Node {value:row.Level5})
MATCH (l4:Node {value:row.Level4})
MATCH (l3:Node {value:row.Level3})
MATCH (l2:Node {value:row.Level2})
MATCH (l1:Node {value:row.Level1})
MERGE (l5)-[:child_of]->(l4)
MERGE (l4)-[:child_of]->(l3)
MERGE (l3)-[:child_of]->(l2)
MERGE (l2)-[:child_of]->(l1)
And before all, you need to create a constraint on your node to facilitate the work of the MERGE. On my example it will be :
CREATE CONSTRAINT ON (n:Node) ASSERT n.value IS UNIQUE;
IIUC, you can use the LOAD CSV function in Cypher to load both nodes and relationships. In your case you can use MERGE to take care of duplicates. Your example should work in this way, with a bit of pseudo-code:
LOAD CSV with HEADERS from "your_path" AS row
MERGE (l1:Label {prop:row.Level1}
...
MERGE (l5:Label {prop:row.Level1}
MERGE (l1)<-[CHILD_OF]-(l2)<-...-(l5)
Basically you can create on-the-fly nodes and relationships while reading from the .csv file with headers. Hope that helps.
If the csv-file does not have a header line, and the column sequence is fixed, then you can solve the problem like this:
LOAD CSV from "file:///path/to/tree.csv" AS row
// WITH row SKIP 1 // If there is a headers, you can skip the first line
// Pass on the columns:
UNWIND RANGE(0, size(row)-2) AS i
MERGE (P:Node {id: row[i]})
MERGE (C:Node {id: row[i+1]})
MERGE (P)<-[:child_of]-(C)
RETURN *
And yes, before that it's really worth adding an index:
CREATE INDEX ON :Node(id)

cypher - load multiple csv files

I have many csv files with names 0_0.csv , 0_1.csv , 0_2.csv , ... , 1_0.csv , 1_1.csv , ... , z_17.csv.
I wanted to know how can I import them in a loop or something ?
Also I wanted to know am I doing it good ? ( each file is 50MB and whole files size is about 100GB )
This is my code :
create index on :name(v)
create index on :value(v)
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///0_0.txt" AS csv
FIELDTERMINATOR ','
MERGE (n:name {v:csv.name})
MERGE (m:value {v:csv.value})
CREATE (n)-[:kind {v:csv.kind}]->(m)
You could handle multiple files by constructing a file name. Unfortunately this seems to break when using the USING PERIODIC COMMIT query hint so it won't be a good option for you. You could create a script to wrap it up and send the commands to bin/cypher-shell though.
UNWIND ['0','1','z'] as outer
UNWIND range(0,17) as inner
LOAD CSV WITH HEADERS FROM 'file:///'+ outer +'_' + toString(inner) + '.csv' AS csv
FIELDTERMINATOR ','
MERGE (n:name {v:csv.name})
MERGE (m:value {v:csv.value})
CREATE (n)-[:kind {v:csv.kind}]->(m)
As far as your actual load query goes. Do you name and value nodes come up multiple times in the files? If they are unique, you would be better off loading the the data in multiple passes. Load the nodes first without the indexes; then add the indexes once the nodes are loaded; and then do the relationships as the last step.
Using CREATE for the :kind relationship will result in multiple relationships even if it is the same value for csv.kind. You might want to use MERGE instead if that is the case.
For 100 GB of data though if you are starting with an empty database and are looking for speed, I would take a look at using bin/neo4j-admin import.

Neo4J Create Relationships in Cypher returns no changes, no rows

I have a CSV dataset through which I'm trying to build relationships between two node types(Comment and Person) that already exist in my database.
This is the database information -
This is the csv file of the current relationship comment_hasCreator_person that I'm trying to build -
The problem is - no matter which Cypher query I try, all of them returns the same thing - "no changes, no rows".
Here are the different variations of the query I've tried -
This is the first query -
// comment_hasCreator_person_0_0.csv
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://dl.dropbox.com/s/qb4occggixmaz9g/comment_hasCreator_person_0_0.csv" AS line
MATCH (comment:Comment { id: toInt(line.Comment.id)}),(person:Person { id: toInt(line.Person.id)})
CREATE (comment)-[:hasCreator]->(person)
I assumed this might have not worked because my CSV headers were initially named Comment.id and Person.id. So I removed the . and tried out the query, with the same result -
// comment_hasCreator_person_0_0.csv
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://dl.dropbox.com/s/qb4occggixmaz9g/comment_hasCreator_person_0_0.csv" AS line
MATCH (comment:Comment { id: toInt(line.Commentid)}),(person:Person { id: toInt(line.Personid)})
CREATE (comment)-[:hasCreator]->(person)
When that didn't work, I followed this answer and tried using MERGE instead of CREATE, even though it shouldn't make a difference because the relationships didn't exist in the first place -
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://www.dropbox.com/s/qb4occggixmaz9g/comment_hasCreator_person_0_0.csv?dl=0" AS line
MATCH (comment:Comment { id: toInt(line.Commentid)}),(person:Person { id: toInt(line.Personid)})
MERGE (comment)-[r:hasCreator]->(person)
RETURN comment,r, person
This query just returned "no rows".
I also tried a variation of the query where I didn't use the toInt() function, but that didn't make any difference.
To ensure the nodes exist, I selected random cell values from the CSV file and used a MATCH clause to ensure the corresponding Comment and Person nodes exist in the database, and I did find all the nodes.
As the last step, I decided to create a relationship manually between the first row values from the CSV file -
MATCH (c:Comment{id:1236950581249}), (p:Person{id:10995116284808})
CREATE (c)-[r:hasCreator]->(p)
RETURN c,r,p
and this worked just fine -
I'm totally clueless as to why the relationships won't get created when I import it from the CSV file. I would appreciate any help.
You have a problem in yout CSV file. The field terminator character used in it is "|" and not the default ",". You can edit your CSV file and chenge the field terminator character to "," or use the option FIELDTERMINATOR available in the LOAD CSV.
Try editing your query to something like this:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://www.dropbox.com/s/qb4occggixmaz9g/comment_hasCreator_person_0_0.csv?dl=0" AS line
FIELDTERMINATOR '|'
MATCH (comment:Comment { id: toInt(line.Commentid)}),(person:Person { id: toInt(line.Personid)})
MERGE (comment)-[r:hasCreator]->(person)
RETURN comment,r, person
You are missing the field terminator here as it is | in your case, instead of ;.
You can try this out:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "filename" AS LINE FIELDTERMINATOR '|'
MERGE (comment:Comment { id: toInt(LINE.Commentid)})
MERGE (person:Person { id: toInt(line.Personid)})
MERGE (comment) - [r:has_creator] -> (person)
RETURN comment,r,person
Another reason for this kind of error may be white spaces in CSV file. If line in CSV looks like:
2a9b40bc-78f0-4e79-9b2b-441108883448, Pink node - 2, 2, pink
then index 1 for results will be: ' Pink node - 2' (notice space at beginning), not: 'Pink node - 2'. Editing csv files or using trim() function would be the solution here:
...
WHERE a.id = trim(line[0]) AND b.id = trim(line[1])
...

Create Relationship from CSV Import Adding a Relationship Property

I have created a set of nodes from a CSV import and labelled them as 'Argument'.
I have another CSV file which contains Connector_ID, Start_Object_ID, End_Object_ID which I want to:
Create the relationship (from start object to end object)
Add the value of the Connector_ID to the relationship created
At the moment I've only got as far as failing to create the relationships (valid syntax but does nothing) using:
LOAD CSV WITH HEADERS FROM "file:///Users/argument_has_part_argument.txt" AS row
MATCH (argument1:Argument {object_ID: row.Start_Object_ID})
MATCH (argument2:Argument {object_ID: row.End_Object_ID})
MERGE (argument1)-[:has_part]->(argument2);
but cannot yet see
why it fails to do anything
how to get it to create a relationship
and how to add the Connector_ID to the connector so created.
Any pointers?
from: http://neo4j.com/developer/guide-import-csv/#_csv_data_quality
Cypher
What Cypher sees, is what will be imported, so you can use that to your advantage. You can use LOAD CSV without creating graph structure and just output samples, counts or distributions. So it is also possible to detect incorrect header column counts, delimiters, quotes, escapes or spelling of header names.
// assert correct line count
LOAD CSV FROM "file-url" AS line
RETURN count(*);
// check first few raw lines
LOAD CSV FROM "file-url" AS line WITH line
RETURN line
LIMIT 5;
// check first 5 line-sample with header-mapping
LOAD CSV WITH HEADERS FROM "file-url" AS line WITH line
RETURN line
LIMIT 5;
For your last question:
LOAD CSV WITH HEADERS FROM "file:///Users/argument_has_part_argument.txt" AS row
MATCH (argument1:Argument {object_ID: row.Start_Object_ID})
MATCH (argument2:Argument {object_ID: row.End_Object_ID})
MERGE (argument1)-[r:has_part]->(argument2)
ON CREATE SET r.connector_ID = row.Connector_ID;

Uploading CSV in neo4j

I am trying to upload the following csv (https://www.dropbox.com/s/95j774tg13qsdxr/out.csv?dl=0) file in to neo4j by following command
LOAD CSV WITH HEADERS FROM
"file:/home/pavan637/Neo4jDemo/out.csv"
AS csvimport
match (uniprotid:UniprotID{Uniprotid: csvimport.Uniprot_ID})
merge (Prokaryotes_Proteins: Prokaryotes_Proteins{UniprotID: csvimport.DBUni, ProteinID: csvimport.ProteinID, IdentityPercentage: csvimport.IdentityPercentage, AlignedLength:csvimport.al, Mismatches:csvimport.mm, QueryStart:csvimport.qs, QueryEnd: csvimport.qe, SubjectStrat: csvimport.ss, SubjectEnd: csvimport.se, Evalue: csvimport.evalue, BitScore: csvimport.bs})
merge (uniprotid)-[:BlastResults]->(Prokaryotes_Proteins)
I used "match" command in the LOAD CSV command in order to match with the "Uniprot_ID's" of previously loaded CSV.
I have first loaded ReactomeDB.csv (https://www.dropbox.com/s/9e5m1629p3pi3m5/Reactomesample.csv?dl=0) with the following cypher
LOAD CSV WITH HEADERS FROM
"file:/home/pavan637/Neo4jDemo/Reactomesample.csv"
AS csvimport
merge (uniprotid:UniprotID{Uniprotid: csvimport.Uniprot_ID})
merge (reactionname: ReactionName{ReactionName: csvimport.ReactionName, ReactomeID: csvimport.ReactomeID})
merge (uniprotid)-[:ReactionInformation]->(reactionname)
into neo4j which was successful.
Later on I am uploading out.csv
From both the CSV files, Uniprot_ID columns are present and some of those ID's are same. Though some of the Uniprot_ID are common, neo4j is not returning any rows.
Any solutions
Thanks in Advance
Pavan Kumar Alluri
Just a few tips:
only use ONE label and ONE property for MERGE
set the others with ON CREATE SET ...
try to create nodes and rels separately, otherwise you might get into memory issues
you should be consistent with your spelling and upper/lowercase of properties and labels, otherwise you will spent hours in debugging (labels, rel-types and property-names are case-sensitive)
you probably don't need merge for relationships, create should do fine
for your statement:
CREATE CONSTRAINT ON (up:UniprotID) assert pp.Uniprotid is unique;
CREATE CONSTRAINT ON (pp:Prokaryotes_Proteins) assert pp.UniprotID is unique;
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/pavan637/Neo4jDemo/out.csv" AS csvimport
merge (pp: Prokaryotes_Proteins {UniprotID: csvimport.DBUni})
ON CREATE SET pp.ProteinID=csvimport.ProteinID,
pp.IdentityPercentage=csvimport.IdentityPercentage, ...
;
LOAD CSV WITH HEADERS FROM "file:/home/pavan637/Neo4jDemo/out.csv" AS csvimport
match (uniprotid:UniprotID{Uniprotid: csvimport.Uniprot_ID})
match (pp: Prokaryotes_Proteins {UniprotID: csvimport.DBUni})
merge (uniprotid)-[:BlastResults]->(Prokaryotes_Proteins);