Create nodes from CSV in Neo4j

I have a CSV file and I want to create two nodes with a relationship between them: (Country)-[:reported_on]->(Report_date). I tried the code below, but it returns empty nodes labelled with numbers instead of the country name.
Here is what my dataset looks like:
PEOPLE_POSITIVE_CASES_COUNT;REPORT_DATE;COUNTRY_SHORT_NAME;PEOPLE_DEATH_COUNT;LIFE_EXPECTANCY;GDP;DENSITY_POPULATION;WORKFORCE
0;22.01.2020;Lesotho;0;54.836;875.353432963926;70.5616600790514
134;09.07.2020;Lesotho;1;54.836;875.353432963926;70.5616600790514
79557;02.03.2021;Zambia;1104;64.194;985.132436038869;94.4781600309238
106470;02.03.2021;Kenya;1863;66.991;1878.58070251348;94.4781600309238
Here is the code that I used:
LOAD CSV WITH HEADERS FROM "file:///dataset.csv"
as row WITH row WHERE row.COUNTRY_SHORT_NAME IS NOT NULL
MERGE (c:Country {name: row.COUNTRY_SHORT_NAME,
life_exp: row.LIFE_EXPECTANCY,
gdp: row.GDP,
density_population: row.DENSITY_POPULATION,
worforce: row.WORKFORCE } )
MERGE ( d:Report_date { date: row.REPORT_DATE } )
MERGE (c)-[:reported_on {cases_count: row.PEOPLE_POSITIVE_CASES_COUNT,
death_count: row.PEOPLE_DEATH_COUNT}]->(d)
EDIT
I changed the field terminator to ';' because that is what our dataset uses. However, we still get bad results in Neo4j after running this code:
LOAD CSV WITH HEADERS FROM "file:///dataset.csv"
as row FIELDTERMINATOR ';' WITH row WHERE row.COUNTRY_SHORT_NAME IS NOT NULL
MERGE (c:Country {name: row.COUNTRY_SHORT_NAME,
life_exp: row.LIFE_EXPECTANCY,
gdp: row.GDP,
density_population: row.DENSITY_POPULATION,
worforce: row.WORKFORCE } )
MERGE ( d:Report_date { date: row.REPORT_DATE } )
MERGE (c)-[:reported_on {cases_count: row.PEOPLE_POSITIVE_CASES_COUNT,
death_count: row.PEOPLE_DEATH_COUNT}]->(d)

I think you are confused by the node caption in Neo4j Browser. Every node is assigned an internal node id, and the nodes are probably showing that id as their caption. You can change the caption to the country name property by clicking on the node label.
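Separately from the caption, the MERGE in the question matches on every property in the map, so it is worth considering a merge on the identifying property only. The sketch below is an untested variant under these assumptions: the country name identifies a Country, the numeric columns should be stored as numbers, and the property name workforce (the question spells it worforce) is my choice. It also avoids merging on WORKFORCE, which is missing from the sample rows and would therefore be null.

```cypher
// Sketch only: merge on the key property, then set the remaining values.
LOAD CSV WITH HEADERS FROM 'file:///dataset.csv' AS row FIELDTERMINATOR ';'
WITH row
WHERE row.COUNTRY_SHORT_NAME IS NOT NULL AND row.REPORT_DATE IS NOT NULL
MERGE (c:Country {name: row.COUNTRY_SHORT_NAME})
  ON CREATE SET c.life_exp = toFloat(row.LIFE_EXPECTANCY),
                c.gdp = toFloat(row.GDP),
                c.density_population = toFloat(row.DENSITY_POPULATION),
                c.workforce = row.WORKFORCE   // null values are simply not stored
MERGE (d:Report_date {date: row.REPORT_DATE})
MERGE (c)-[r:reported_on]->(d)
SET r.cases_count = toInteger(row.PEOPLE_POSITIVE_CASES_COUNT),
    r.death_count = toInteger(row.PEOPLE_DEATH_COUNT);

// Run separately: check the stored names without relying on the Browser caption.
MATCH (c:Country) RETURN c.name, c.gdp LIMIT 5;
```

If the second query returns country names, the data is loaded correctly and only the caption setting in the Browser needs changing.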

Related

Is there a way to import a csv into Neo4j using foreach or unwind?

I am using the following .csv file for Neo4j import. There are 202 rackets. The numbers below racketX are the rating the user has given that racket.
I want to create the relationships among the users and the rating they have given to each racket. This is my current approach:
LOAD CSV WITH HEADERS FROM 'http://spreding.online/racket-recommendation-system/data/formattedFiles/formattedUsers.csv' AS row
WITH row
WHERE row.username IS NOT NULL
MERGE (u:User {
username: row.username,
height_m: toInteger(row.height),
weight_kg: toInteger(row.weight)
})
WITH row, u, range(3, 204) as indexes
MATCH (r:Racket)
UNWIND r as racket
UNWIND indexes as i
MERGE (u)-[:RATES {rating:toInteger(row[i])}]->(racket)
I get a "cannot access a map" error. Can you help me?
I would break down the load into multiple steps.
Load the users.
LOAD CSV WITH HEADERS FROM 'http://spreding.online/racket-recommendation-system/data/formattedFiles/formattedUsers.csv' AS row
WITH row
WHERE row.username IS NOT NULL
MERGE (u:User {
username: row.username,
height_m: toInteger(row.height),
weight_kg: toInteger(row.weight)
})
Load the rackets.
UNWIND RANGE(1,202) as idx
CREATE (:Racket {racketNumber:"racket"+idx})
Load the relationships.
LOAD CSV WITH HEADERS FROM 'http://spreding.online/racket-recommendation-system/data/formattedFiles/formattedUsers.csv' AS row
UNWIND RANGE (1,202) as idx
MATCH (u:User {username:row.username})
MATCH (r:Racket {racketNumber:"racket"+idx})
MERGE (u)-[:RATES {rating:toInteger(row["racket"+idx])}]->(r)
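If some users have not rated every racket, the last step may fail, because MERGE does not accept a null property value. The following is a hedged variant of the relationship step, assuming unrated rackets appear as empty cells in the CSV; it filters those out and stores the rating with SET instead of inside the MERGE pattern.

```cypher
// Sketch only: skip missing ratings and keep one RATES relationship per user/racket pair.
LOAD CSV WITH HEADERS FROM 'http://spreding.online/racket-recommendation-system/data/formattedFiles/formattedUsers.csv' AS row
WITH row
WHERE row.username IS NOT NULL
UNWIND range(1, 202) AS idx
WITH row, idx, toInteger(row['racket' + idx]) AS rating
WHERE rating IS NOT NULL            // assumption: empty cell means "not rated"
MATCH (u:User {username: row.username})
MATCH (r:Racket {racketNumber: 'racket' + idx})
MERGE (u)-[rel:RATES]->(r)
SET rel.rating = rating
```

Merging on the relationship type alone also means that re-running the import updates an existing rating rather than adding a second RATES relationship.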

Loading CSV Neo4j "Neo.ClientError.Statement.SemanticError: Cannot merge node using null property value for Test1'"

I am using grades.csv data from the link below,
https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html
I noticed that all the strings in the CSV file are enclosed in double quotes, which causes this error message:
Neo.ClientError.Statement.SemanticError: Cannot merge node using null property value for Test1
so I removed the double quotes from the headers.
The code I was trying to run:
LOAD CSV WITH HEADERS FROM 'file:///grades.csv' AS row
MERGE (t:Test1 {Test1: row.Test1})
RETURN count(t);
error message:
Neo.ClientError.Statement.SyntaxError: Type mismatch: expected Any, Map, Node, Relationship, Point, Duration, Date, Time, LocalTime, LocalDateTime or DateTime but was List<String> (line 2, column 24 (offset: 65))
"MERGE (t:Test1 {Test1: row.Test1})
You cannot merge a node using a null property value. In your case, Test1 must be null for one or more lines in your file. If you don't see blank values for Test1, check whether there is a blank line at the end of the file.
You can also filter out nulls before the MERGE using WITH ... WHERE, like this:
LOAD CSV ...
WITH row
WHERE row.Test1 IS NOT NULL
MERGE (t:Test1 {Test1: row.Test1})
RETURN count(t);
The issues are:
The file is missing a comma after the Test1 value in the row for "Airpump".
The file has white spaces between the values in each row. (Search for the regexp ", +" and replace with ",".)
Your query should work after fixing the above issues.
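Before or after cleaning the file, it can help to look at what the parser actually produced. This small sketch only reads the file and returns the header names and the first few rows; it makes no assumption beyond the file name used in the question.

```cypher
// Inspect the parsed headers and the first rows without writing anything.
LOAD CSV WITH HEADERS FROM 'file:///grades.csv' AS row
RETURN keys(row) AS headers, row
LIMIT 3
```

If the header names come back with leading spaces or quotes, row.Test1 will be null for every line, which would explain the MERGE error.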

Import big csv file partially into Neo4j

I have a big CSV file (ca. 7 GB) that holds the results of a pairwise computation between rows of objects 1 and objects 2, using three different methods. It looks like this:
obj 1 , obj 2 , method1 , method2 , method3
obj1 and obj2 are strings, and method1, method2 and method3 are float values.
I don't want to import the whole CSV into Neo4j. When the value of method1 in a row is above a certain threshold, I want that row to be imported and an edge to be created between object1 and object2 of that row. The same applies to method2 and method3, each with its own edge type. Thanks to cybersam and Dave Bennett, who wrote the original queries for me; I changed them a bit and ran the three queries below separately. Afterwards, when I run a simple query such as:
match (n)-[r:similar_on_method2]-(m) return n,r,m
I get not only the requested relationship type but also other relationships in the result graph. I don't know what is wrong. These are the three queries:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'file:///objects.csv'
AS line
WITH line
WHERE toFloat(line.method1) >= $x
MERGE (obj1:Object {name: line.obj1})
MERGE (obj2:Object {name: line.obj2})
MERGE (obj1)-[:similar_on_method1]->(obj2)
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'file:///objects.csv'
AS line
WITH line
WHERE toFloat(line.method2) >= $x
MERGE (obj1:Object {name: line.obj1})
MERGE (obj2:Object {name: line.obj2})
MERGE (obj1)-[:similar_on_method2]->(obj2)
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'file:///objects.csv'
AS line
WITH line
WHERE toFloat(line.method3) >= $x
MERGE (obj1:Object {name: line.obj1})
MERGE (obj2:Object {name: line.obj2})
MERGE (obj1)-[:similar_on_method3]->(obj2)
It depends on how big "big" is, but something along these lines should get you started. If the obj1 and obj2 values repeat in your data or already exist in your database, you will want to create an index on the property that holds them.
LOAD CSV WITH HEADERS
FROM 'file:///objects.csv'
AS line
WITH line
WHERE toFloat(line.method1) >= $x
AND toFloat(line.method2) >= $y
AND toFloat(line.method3) >= $z
MERGE (obj1:Object {name: line.obj1})
MERGE (obj2:Object {name: line.obj2})
MERGE (obj1)-[:LINK]->(obj2)
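The index mentioned above could look like the following sketch. The index name object_name is hypothetical, and the syntax shown is the one used by Neo4j 4.x and later; older 3.x releases use CREATE INDEX ON :Object(name) instead.

```cypher
// Sketch only: index the property used in the MERGE lookups.
// Create it before running the import so each MERGE can use it.
CREATE INDEX object_name IF NOT EXISTS
FOR (o:Object) ON (o.name);
```

Without such an index, every MERGE on (:Object {name: ...}) has to scan all Object nodes, which becomes slow on a file of this size.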

What is good way to import a directory/file structure in Neo4j from CSV file?

I am looking to import a lot of filenames into a graph database, using Neo4j. The data is from an external source and is available as a CSV file. I'd like to create a tree structure from the data, so that I can easily 'navigate' the structure in queries later on (i.e. find all files underneath a certain directory, all files that occur in multiple directories, etc.).
So, given the example input:
/foo/bar/example.txt
/bar/baz/another.csv
/example.txt
/foo/bar/onemore.txt
I'd like to create the following graph:
( / ) <-[:in]- ( foo ) <-[:in]- ( bar ) <-[:in]- ( example.txt )
                                        <-[:in]- ( onemore.txt )
      <-[:in]- ( bar ) <-[:in]- ( baz ) <-[:in]- ( another.csv )
      <-[:in]- ( example.txt )
(where each node label is actually an attribute, e.g. path:).
I've been able to achieve the desired effect when using a fixed number of directory levels; for example when each file is at three levels deep, I could create a CSV file with 4 columns:
dir_a,dir_b,dir_c,file
foo,bar,baz,example.txt
foo,bar,ban,example.csv
foo,bar,baz,another.txt
And import it using a cypher query:
LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS row
MERGE (dir_a:Path {name: row.dir_a})
MERGE (dir_b:Path {name: row.dir_b}) <-[:in]- (dir_a)
MERGE (dir_c:Path {name: row.dir_c}) <-[:in]- (dir_b)
MERGE (:Path {name: row.file}) <-[:in]- (dir_c)
But I'd like to have a general solution that works for any level of sub-directories (and combinations of levels in one dataset). Note that I am able to pre-process my input if necessary, so I can create any desirable structure in the input CSV file.
I've looked at gists or plugins, but cannot seem to find anything that works. I think/hope that I should be able to do something with the split() function, i.e. use split('/',row.path) to get a list of path elements, but I do not know how to process this list into a chain of MERGE operations.
Here is a first cut at something more generalized.
The premise is that you split the fully qualified path into its components, and then split the path again on each component so that you can construct the fully qualified path of that individual component. Use that path as the key to merge on, and set the component name after the merge. If a component is not at the root level, find its parent and create the relationship back to it. This breaks down if a component name occurs more than once in a fully qualified path.
First, I created a uniqueness constraint on fq_path:
create constraint on (c:Component) assert c.fq_path is unique;
Here is the load statement.
load csv from 'file:///path.csv' as line
with line[0] as line, split(line[0],'/') as path_components
unwind range(0, size(path_components)-1) as idx
with case
       when idx = 0 then '/'
       else path_components[idx]
     end as component
   , case
       when idx = 0 then '/'
       else split(line, path_components[idx])[0] + path_components[idx]
     end as fq_path
   , case
       when idx = 0 then null
       when idx = 1 then '/'
       else substring(split(line, path_components[idx])[0],0,size(split(line, path_components[idx])[0])-1)
     end as parent
   , case
       when idx = 0 then []
       else [1]
     end as find_parent
merge (new_comp:Component {fq_path: fq_path})
set new_comp.name = component
foreach ( y in find_parent |
  merge (theparent:Component {fq_path: parent} )
  merge (theparent)<-[:IN]-(new_comp)
)
return *
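The limitation with duplicate component names comes from rebuilding each path by splitting on the component text. One possible way around it, shown here as an untested sketch rather than a drop-in replacement, is to build each fully qualified path from a list slice with reduce(). It reuses the Component label, the fq_path and name properties, and the IN relationship from the statement above, and assumes one absolute path per line in path.csv.

```cypher
// Sketch only: build fq_path from the first `depth` components instead of string splitting.
LOAD CSV FROM 'file:///path.csv' AS line
WITH line
WHERE line[0] IS NOT NULL
WITH [p IN split(line[0], '/') WHERE p <> ''] AS parts
// make sure the root node exists
MERGE (root:Component {fq_path: '/'})
  ON CREATE SET root.name = '/'
WITH parts
UNWIND range(1, size(parts)) AS depth
WITH parts[depth - 1] AS name,
     reduce(acc = '', p IN parts[0..depth] | acc + '/' + p) AS fq_path,
     CASE WHEN depth = 1 THEN '/'
          ELSE reduce(acc = '', p IN parts[0..(depth - 1)] | acc + '/' + p)
     END AS parent_path
MERGE (child:Component {fq_path: fq_path})
  ON CREATE SET child.name = name
MERGE (parent:Component {fq_path: parent_path})
MERGE (child)-[:IN]->(parent)
```

Because the key is assembled from positions in the list, a path such as /foo/foo/example.txt yields /foo, /foo/foo and /foo/foo/example.txt as three distinct components.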
If you want to differentiate between files and folders, here are a few queries you can run afterwards to set another label on the respective nodes.
Find the files and set them as File
// find the last Components in a tree (no inbound IN)
// and set them as Files
match (c:Component)
where not (c)<-[:IN]-(:Component)
set c:File
return c
Find the folders and set them as Folder
// find all Components with an inbound IN
// and set them as Folders
match (c:Component)
where (c)<-[:IN]-(:Component)
set c:Folder
return c

Relationship that connect the same node in Neo4j

I will try to describe my problem succinctly. I have Person nodes that I loaded from a .csv file, and I have another .csv file to load: person_speaks_language_0.csv
(it has this header: idPerson|languagePSL)
How can I relate this? How can I create this relationship?
Here is another, very similar example that I can't solve. I have the Comment nodes loaded in Neo4j and I need to load another .csv file: comment_replyOf_comment_0.csv
(it has this header: idComment|idComment)
How can I load this file? How can I create a relationship that goes "in and out" of the same kind of node?
For the first example, there are two options.
If you want Language to be a separate node, try this cypher:
LOAD CSV FROM 'file:///person_speaks_language_0.csv' AS line FIELDTERMINATOR '|'
MATCH (p:Person)
WHERE p.id=line[0]
MERGE (p)-[r:Speaks]->(l:Language { name: line[1] })
RETURN p, l, r
Or, probably a better option:
LOAD CSV FROM 'file:///person_speaks_language_0.csv' AS line FIELDTERMINATOR '|'
MERGE (p:Person { id:line[0] })-[r:Speaks]->(l:Language { name: line[1] })
RETURN p, l, r
If you want Language to be a property, try this:
LOAD CSV FROM 'file:///person_speaks_language_0.csv' AS line FIELDTERMINATOR '|'
MATCH (p:Person { id:line[0] })
SET p.language = line[1]
RETURN p
The RETURN clause is optional, and you don't want to include it for big CSV files (although it can be useful for debugging).
For the second example, try this:
LOAD CSV FROM 'file:///comment_replyOf_comment_0.csv' AS line FIELDTERMINATOR '|'
MERGE (c1:Comment { id:line[0] })-[r:Commented]->(c2:Comment { id:line[1] })
RETURN c1, r, c2
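Since the Comment nodes are already loaded, a more cautious variant is to match both ends first and merge only the relationship, so that no extra Comment nodes are created when the pattern does not exist yet. This is a sketch under several assumptions: the first line of the file is the header row, the ids were stored as strings when the comments were loaded, and the first column is the start of the relationship.

```cypher
// Sketch only: connect existing Comment nodes without creating new ones.
LOAD CSV FROM 'file:///comment_replyOf_comment_0.csv' AS line FIELDTERMINATOR '|'
WITH line SKIP 1                       // skip the idComment|idComment header row
MATCH (c1:Comment {id: line[0]})
MATCH (c2:Comment {id: line[1]})
MERGE (c1)-[:Commented]->(c2)
```

If the ids were stored as integers, wrap line[0] and line[1] in toInteger() so that the MATCH clauses find the existing nodes.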