Neo4J load CSV and create relationships doesn't import all data - csv

I am using the following LOAD CSV Cypher statement to import a CSV file with about 3.5m records, but it only imports about 3.2m, so about 300,000 records are missing.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///path/to/csvfile.csv") as line
CREATE (ticket:Ticket {id: line.transaction_hash, from_stop: toInt(line.from_stop), to_stop: toInt(line.to_stop), ride_id: toInt(line.ride_id), price: toFloat(line.price)})
MATCH (from_stop:Stop)-[r:RELATES]->(to_stop:Stop) WHERE toInt(line.route_id) in r.routes
CREATE (from_stop)-[:CONNECTS {ticket_id: ID(ticket)}]->(to_stop)
Note that the Stop nodes were already created in a separate import statement.
When I only created nodes without creating relationships, all the data was imported. The same import statement works fine with a smaller CSV file in the same format.
I tried twice just to make sure it wasn't terminated accidentally.
Is there a node-to-relationship limit in Neo4J? Or what else could be the reason?
Neo4J version: 3.0.3. The size of the database directory is 5.31 GiB.

This is probably because whenever the MATCH does not succeed for a line, the entire query for that line (including the first CREATE) also fails.
On the other hand, the failure of an OPTIONAL MATCH would not abort the entire query for a line. Try this:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM ("file:///path/to/csvfile.csv") as line
CREATE (ticket:Ticket {id: line.transaction_hash, from_stop: toInt(line.from_stop), to_stop: toInt(line.to_stop), ride_id: toInt(line.ride_id), price: toFloat(line.price)})
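// WITH separates the updating CREATE from the read clause that follows
WITH line, ticket
OPTIONAL MATCH (from:Stop)-[r:RELATES]->(to:Stop)
WHERE toInt(line.route_id) in r.routes
// iterate over an empty list when nothing matched, so nothing is created
FOREACH(x IN CASE WHEN from IS NULL THEN [] ELSE [1] END |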
CREATE (from)-[:CONNECTS {ticket_id: ID(ticket)}]->(to)
);
The FOREACH clause uses a somewhat roundabout technique to only CREATE the relationship if the OPTIONAL MATCH succeeded for a line.
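To check whether unmatched route_id values could explain the missing rows, one option is to compare the CSV against the route ids stored on the RELATES relationships before importing. The sketch below is only an illustration in Python, not part of the import itself: `known_routes` is a hypothetical set that would first have to be exported from the database, and the sample rows are made up.

```python
import csv
import io


def count_unmatched(csv_file, known_routes):
    """Count CSV rows whose route_id is not in known_routes.

    known_routes is a hypothetical set of ints, e.g. exported from the
    `routes` property of the existing RELATES relationships.
    Returns (total_rows, unmatched_rows).
    """
    total = 0
    unmatched = 0
    for row in csv.DictReader(csv_file):
        total += 1
        try:
            route_id = int(row["route_id"])
        except (KeyError, TypeError, ValueError):
            # A missing or non-numeric route_id can never match.
            unmatched += 1
            continue
        if route_id not in known_routes:
            unmatched += 1
    return total, unmatched


if __name__ == "__main__":
    # Made-up sample data standing in for the real csvfile.csv.
    sample = io.StringIO(
        "transaction_hash,route_id\n"
        "a1,10\n"
        "b2,99\n"
        "c3,\n"
    )
    print(count_unmatched(sample, known_routes={10, 11}))
```

If the unmatched count is close to the number of missing records, the MATCH is the likely culprit.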

Related

LOAD CSV: multiple MERGE and Eager operator

I have a large CSV file that contains multiple nodes per line. I would like to use LOAD CSV to MERGE the nodes and set some properties. However, I always get the "Eager operator" warning for this query:
USING PERIODIC COMMIT
LOAD CSV FROM 'file:///MRCONSO.RRF' AS line FIELDTERMINATOR '|'
MERGE (c:Concept {cui: line[0]})
ON CREATE SET c.language = line[1]
MERGE (l:LexicalForm {lui: line[3]})
ON CREATE SET l.status = line[2];
When I remove the ON CREATE part it works but I want to merge on a specific ID, not on the combination of the ID and the other properties.
Is it possible to rephrase this somehow to avoid the Eager operator? I would like to create 6 different nodes from a single line and the alternative would be to iterate the file 6 times.
You have an RRF file, and LOAD CSV expects CSV lines. When you reference a field, include the data type conversion and your row variable (line). You should also give PERIODIC COMMIT the number of rows per commit. Your file needs to be in the import folder of your database.
For example:
USING PERIODIC COMMIT 5000
LOAD CSV FROM 'file:///MRCONSO.RRF' AS line FIELDTERMINATOR '|'
// without WITH HEADERS, fields are addressed by index; drop toInteger() if cui is not numeric
MERGE (c:Concept {cui: toInteger(line[0]), language: toString(line[1])})

cypher - load multiple csv files

I have many CSV files named 0_0.csv, 0_1.csv, 0_2.csv, ..., 1_0.csv, 1_1.csv, ..., z_17.csv.
How can I import them in a loop or something similar?
Also, am I doing this the right way? (Each file is 50 MB and the total size is about 100 GB.)
This is my code:
create index on :name(v)
create index on :value(v)
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///0_0.txt" AS csv
FIELDTERMINATOR ','
MERGE (n:name {v:csv.name})
MERGE (m:value {v:csv.value})
CREATE (n)-[:kind {v:csv.kind}]->(m)
You could handle multiple files by constructing a file name. Unfortunately this seems to break when using the USING PERIODIC COMMIT query hint so it won't be a good option for you. You could create a script to wrap it up and send the commands to bin/cypher-shell though.
UNWIND ['0','1','z'] as outer
UNWIND range(0,17) as inner
LOAD CSV WITH HEADERS FROM 'file:///'+ outer +'_' + toString(inner) + '.csv' AS csv
FIELDTERMINATOR ','
MERGE (n:name {v:csv.name})
MERGE (m:value {v:csv.value})
CREATE (n)-[:kind {v:csv.kind}]->(m)
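The wrapper-script idea mentioned above could look roughly like the following Python sketch, which prints one LOAD CSV statement per file so that USING PERIODIC COMMIT can still be used. The function name `build_statements` is made up for illustration, and the assumption that the prefixes run over 0-9 and a-z with files 0 to 17 each is a guess from the file names in the question.

```python
import string


def build_statements(outer_prefixes, inner_count):
    """Return one LOAD CSV statement per file named <outer>_<inner>.csv."""
    # Doubled braces are literal braces in str.format templates.
    template = (
        "USING PERIODIC COMMIT\n"
        "LOAD CSV WITH HEADERS FROM 'file:///{name}' AS csv\n"
        "MERGE (n:name {{v: csv.name}})\n"
        "MERGE (m:value {{v: csv.value}})\n"
        "CREATE (n)-[:kind {{v: csv.kind}}]->(m);"
    )
    statements = []
    for outer in outer_prefixes:
        for inner in range(inner_count):
            statements.append(template.format(name=f"{outer}_{inner}.csv"))
    return statements


if __name__ == "__main__":
    # Assumption: prefixes are 0-9 and a-z, each with files 0..17.
    prefixes = string.digits + string.ascii_lowercase
    print("\n".join(build_statements(prefixes, 18)))
```

The printed statements can be saved to a file or piped to bin/cypher-shell on standard input, which runs them one after another.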
As for your actual load query: do your name and value nodes come up multiple times in the files? If they are unique, you would be better off loading the data in multiple passes. Load the nodes first without the indexes; then add the indexes once the nodes are loaded; and then do the relationships as the last step.
Using CREATE for the :kind relationship will result in multiple relationships even if it is the same value for csv.kind. You might want to use MERGE instead if that is the case.
For 100 GB of data though if you are starting with an empty database and are looking for speed, I would take a look at using bin/neo4j-admin import.

Neo4J Create Relationships in Cypher returns no changes, no rows

I have a CSV dataset through which I'm trying to build relationships between two node types(Comment and Person) that already exist in my database.
This is the database information: (image not shown)
This is the CSV file of the current relationship comment_hasCreator_person that I'm trying to build: (image not shown)
The problem is - no matter which Cypher query I try, all of them return the same thing - "no changes, no rows".
Here are the different variations of the query I've tried -
This is the first query -
// comment_hasCreator_person_0_0.csv
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://dl.dropbox.com/s/qb4occggixmaz9g/comment_hasCreator_person_0_0.csv" AS line
MATCH (comment:Comment { id: toInt(line.Comment.id)}),(person:Person { id: toInt(line.Person.id)})
CREATE (comment)-[:hasCreator]->(person)
I assumed this might not have worked because my CSV headers were initially named Comment.id and Person.id. So I removed the . and tried the query again, with the same result -
// comment_hasCreator_person_0_0.csv
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://dl.dropbox.com/s/qb4occggixmaz9g/comment_hasCreator_person_0_0.csv" AS line
MATCH (comment:Comment { id: toInt(line.Commentid)}),(person:Person { id: toInt(line.Personid)})
CREATE (comment)-[:hasCreator]->(person)
When that didn't work, I followed this answer and tried using MERGE instead of CREATE, even though it shouldn't make a difference because the relationships didn't exist in the first place -
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://www.dropbox.com/s/qb4occggixmaz9g/comment_hasCreator_person_0_0.csv?dl=0" AS line
MATCH (comment:Comment { id: toInt(line.Commentid)}),(person:Person { id: toInt(line.Personid)})
MERGE (comment)-[r:hasCreator]->(person)
RETURN comment,r, person
This query just returned "no rows".
I also tried a variation of the query where I didn't use the toInt() function, but that didn't make any difference.
To ensure the nodes exist, I selected random cell values from the CSV file and used a MATCH clause to ensure the corresponding Comment and Person nodes exist in the database, and I did find all the nodes.
As the last step, I decided to create a relationship manually between the first row values from the CSV file -
MATCH (c:Comment{id:1236950581249}), (p:Person{id:10995116284808})
CREATE (c)-[r:hasCreator]->(p)
RETURN c,r,p
and this worked just fine.
I'm totally clueless as to why the relationships won't get created when I import it from the CSV file. I would appreciate any help.
You have a problem in your CSV file. The field terminator character used in it is "|" and not the default ",". You can edit your CSV file and change the field terminator character to ",", or use the FIELDTERMINATOR option available in LOAD CSV.
Try editing your query to something like this:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://www.dropbox.com/s/qb4occggixmaz9g/comment_hasCreator_person_0_0.csv?dl=0" AS line
FIELDTERMINATOR '|'
MATCH (comment:Comment { id: toInt(line.Commentid)}),(person:Person { id: toInt(line.Personid)})
MERGE (comment)-[r:hasCreator]->(person)
RETURN comment,r, person
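A quick way to see which terminator a file actually uses is to look at its header line before writing the LOAD CSV statement. The Python sketch below is a minimal heuristic under that assumption: `guess_delimiter` is a made-up helper that simply picks the candidate character occurring most often in the header, and the sample header is reconstructed from the column names mentioned in the question.

```python
def guess_delimiter(header_line, candidates=(",", "|", ";", "\t")):
    """Return the candidate character that occurs most often in the header."""
    counts = {c: header_line.count(c) for c in candidates}
    best = max(counts, key=counts.get)
    # If no candidate occurs at all, there is nothing to report.
    return best if counts[best] > 0 else None


if __name__ == "__main__":
    # Header reconstructed from the question; the real file may differ.
    print(repr(guess_delimiter("Comment.id|Person.id")))
```

Whatever character this reports is the value to pass to FIELDTERMINATOR.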
You are missing the field terminator here: it is | in your case, not the default ,.
You can try this out:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "filename" AS line FIELDTERMINATOR '|'
// variable names are case-sensitive, so use `line` consistently
MERGE (comment:Comment { id: toInt(line.Commentid)})
MERGE (person:Person { id: toInt(line.Personid)})
MERGE (comment)-[r:hasCreator]->(person)
RETURN comment,r,person
Another reason for this kind of error may be whitespace in the CSV file. If a line in the CSV looks like:
2a9b40bc-78f0-4e79-9b2b-441108883448, Pink node - 2, 2, pink
then the value at index 1 will be ' Pink node - 2' (note the leading space), not 'Pink node - 2'. Editing the CSV file or using the trim() function would be the solution here:
...
WHERE a.id = trim(line[0]) AND b.id = trim(line[1])
...
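The same effect is easy to reproduce outside Neo4j. The Python sketch below parses the sample line above with the standard csv module, purely as an illustration of why the untrimmed value does not equal the expected one; the helper name `parse` is made up.

```python
import csv
import io

line = "2a9b40bc-78f0-4e79-9b2b-441108883448, Pink node - 2, 2, pink"


def parse(text, **kwargs):
    """Parse a single CSV line into a list of fields."""
    return next(csv.reader(io.StringIO(text), **kwargs))


raw = parse(line)                              # whitespace after commas is kept
trimmed = [field.strip() for field in raw]     # equivalent of trim() per field

if __name__ == "__main__":
    print(repr(raw[1]), repr(trimmed[1]))
```

An exact comparison against `raw[1]` fails because of the leading space, while the trimmed value matches.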

Uploading CSV in neo4j

I am trying to upload the following CSV file (https://www.dropbox.com/s/95j774tg13qsdxr/out.csv?dl=0) into neo4j with the following command
LOAD CSV WITH HEADERS FROM
"file:/home/pavan637/Neo4jDemo/out.csv"
AS csvimport
match (uniprotid:UniprotID{Uniprotid: csvimport.Uniprot_ID})
merge (Prokaryotes_Proteins: Prokaryotes_Proteins{UniprotID: csvimport.DBUni, ProteinID: csvimport.ProteinID, IdentityPercentage: csvimport.IdentityPercentage, AlignedLength:csvimport.al, Mismatches:csvimport.mm, QueryStart:csvimport.qs, QueryEnd: csvimport.qe, SubjectStrat: csvimport.ss, SubjectEnd: csvimport.se, Evalue: csvimport.evalue, BitScore: csvimport.bs})
merge (uniprotid)-[:BlastResults]->(Prokaryotes_Proteins)
I used "match" command in the LOAD CSV command in order to match with the "Uniprot_ID's" of previously loaded CSV.
I have first loaded ReactomeDB.csv (https://www.dropbox.com/s/9e5m1629p3pi3m5/Reactomesample.csv?dl=0) with the following cypher
LOAD CSV WITH HEADERS FROM
"file:/home/pavan637/Neo4jDemo/Reactomesample.csv"
AS csvimport
merge (uniprotid:UniprotID{Uniprotid: csvimport.Uniprot_ID})
merge (reactionname: ReactionName{ReactionName: csvimport.ReactionName, ReactomeID: csvimport.ReactomeID})
merge (uniprotid)-[:ReactionInformation]->(reactionname)
into neo4j, which was successful.
Later on I uploaded out.csv.
Both CSV files have a Uniprot_ID column, and some of those IDs are the same. Although some of the Uniprot_IDs are common, neo4j is not returning any rows.
Any solutions?
Thanks in Advance
Pavan Kumar Alluri
Just a few tips:
only use ONE label and ONE property for MERGE
set the others with ON CREATE SET ...
try to create nodes and rels separately, otherwise you might get into memory issues
you should be consistent with the spelling and upper/lowercase of properties and labels, otherwise you will spend hours debugging (labels, rel-types and property-names are case-sensitive)
you probably don't need merge for relationships, create should do fine
for your statement:
CREATE CONSTRAINT ON (up:UniprotID) assert up.Uniprotid is unique;
CREATE CONSTRAINT ON (pp:Prokaryotes_Proteins) assert pp.UniprotID is unique;
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/pavan637/Neo4jDemo/out.csv" AS csvimport
merge (pp: Prokaryotes_Proteins {UniprotID: csvimport.DBUni})
ON CREATE SET pp.ProteinID=csvimport.ProteinID,
pp.IdentityPercentage=csvimport.IdentityPercentage, ...
;
LOAD CSV WITH HEADERS FROM "file:/home/pavan637/Neo4jDemo/out.csv" AS csvimport
match (uniprotid:UniprotID{Uniprotid: csvimport.Uniprot_ID})
match (pp: Prokaryotes_Proteins {UniprotID: csvimport.DBUni})
merge (uniprotid)-[:BlastResults]->(pp);

CSV LOAD and updating existing nodes / creating new ones

I might be on the wrong track so I could use some helpful input. I receive data from other systems by CSV files which I can import into my DB with CSV LOAD. So far so good.
I got stuck when I needed to reload the CSV to pick up updates. I cannot delete the former data, as I might already have additional user input attached, so I need a query that imports the CSV data, tries a match, and when it finds the node just uses SET to overwrite the existing properties. That said, I am unsure how to catch the cases where there is no node in the DB (new record) and we need to create one.
LOAD CSV FROM "file:xxx.csv" AS csvLine
MATCH (c:Customer {code:"ABC"})
SET c.name = name: csvLine[0]
***OPTIONAL MATCH // Here I am unsure how to express when the node is not found***
MERGE (c:Customer { name: csvLine[0], code: csvLine[1]})
So ideally Cypher would check if the node is there and update it by SETting the new property from the CSV or, if the node cannot be found, create a new one with the CSV data.
And as a side note: how would I find nodes that are in the DB but not in the CSV file, in order to mark them as obsolete? (This might not be possible during the import, but maybe someone has an idea how to solve it, to keep the DB clean of deleted records, which can only be detected by comparison with the latest CSV import. Happy for every idea.)
Any idea or hint how to write the query for updating the graph while importing?
You need to use MERGE's ON MATCH and/or ON CREATE handlers, see http://neo4j.com/docs/stable/query-merge.html#_use_on_create_and_on_match. I assume the customer code in the second column is the identifier - so the name in column one might change on updates:
LOAD CSV FROM "file:xxx.csv" AS csvLine
MERGE (c:Customer {code:csvLine[1]})
ON CREATE SET c.name = csvLine[0]
ON MATCH SET c.name = csvLine[0]
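The effect of the MERGE above can be pictured with a small in-memory model. The Python sketch below is not Neo4j code: it stands in a plain dict for the Customer nodes, keyed by code, to show that the imported name is overwritten while other properties survive. The `obsolete_codes` helper is an added idea for the side question about records missing from the latest CSV, not something taken from the answer; all names and sample data are hypothetical.

```python
def upsert_customers(db, rows):
    """Apply (name, code) rows to db, a dict mapping code -> properties.

    Mirrors a MERGE on the code with the name set in both the ON CREATE
    and ON MATCH branches. Returns the set of codes seen in this import.
    """
    seen = set()
    for name, code in rows:
        customer = db.setdefault(code, {})  # find the entry or create it
        customer["name"] = name             # overwrite only the imported property
        seen.add(code)
    return seen


def obsolete_codes(db, seen):
    """Codes present in db but absent from the latest import."""
    return set(db) - seen


if __name__ == "__main__":
    # Made-up data: one existing customer with extra user input, one stale one.
    db = {"ABC": {"name": "Old", "note": "user input"}, "OLD": {"name": "Gone"}}
    seen = upsert_customers(db, [("New", "ABC"), ("Fresh", "XYZ")])
    print(db)
    print(obsolete_codes(db, seen))
```

In the graph itself, a comparable approach might be to stamp each merged node with an import marker and then look for nodes that lack the latest marker.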