Create nodes conditionally when loading from CSV in Neo4j

I have a data set in CSV format. One of the fields is "elem_type". Based on this type, I need to create different types of nodes and give the "columns" of my CSV different names, depending on the "elem_type", when loading the data with LOAD CSV. Is there any way to do that?
My CSV has no header and the data looks like this:
0, 123, Marco, Ciao
1, 345, Merceds, Car, Yellow
2, 987, Boat, 150cm
Based on the first column, which is my "elem_type", I want to load the data, define 3 types of nodes (Person, Car, Boat), and also define different headers depending on the elem_type.

I highly recommend pre-parsing the CSV file into separate files for each label; it will make the Cypher for the import much easier. In the following I use a little trick: wrapping a CASE expression inside a FOREACH:
load csv from "file:///test.csv" as line
foreach (i in case when line[0] = '0' then [1] else [] end |
merge (p:Person {id: line[1]}) set p.name = line[2] )
foreach (i in case when line[0] = '1' then [1] else [] end |
merge (c:Car {id: line[1]}) set c.name = line[2], c.color = line[4] )
foreach (i in case when line[0] = '2' then [1] else [] end |
merge (b:Boat {id: line[1]}) set b.name = line[2] )
Also, don't forget to add indexes on the properties you are merging on.
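For example, a minimal sketch (the index names are arbitrary, and the exact syntax depends on your Neo4j version, so adjust accordingly):
// Neo4j 4.x/5.x syntax; on 3.x you would write e.g. CREATE INDEX ON :Person(id)
CREATE INDEX person_id IF NOT EXISTS FOR (p:Person) ON (p.id);
CREATE INDEX car_id IF NOT EXISTS FOR (c:Car) ON (c.id);
CREATE INDEX boat_id IF NOT EXISTS FOR (b:Boat) ON (b.id);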

Related

How to check for specific field values based on some condition while converting csv file to json format

Below is the code to convert a CSV file to JSON format in Python.
I have two fields, 'recommendation' and 'rating'. Based on the recommendation value I need to set the value of the rating field, e.g. if recommendation is 1 then rating = 1, and vice versa. With the answer I got, I'm getting output for only one record instead of all the records; I think it's being overridden. Do I need to create a separate list and append each record entry to it to get the output for all records?
Here's the updated code:
import csv
import json
from collections import OrderedDict

def main(input_file):
    csv_rows = []
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter='|')
        title = reader.fieldnames
        for row in reader:
            entry = OrderedDict()
            for field in title:
                entry[field] = row[field]
            # note: this iterates over (and exhausts) the remaining rows of `reader`
            [c.update({'RATING': c['RECOMMENDATIONS']}) for c in reader]
            csv_rows.append(entry)
    with open(json_file, 'w') as f:
        json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)
        f.write('\n')
I want to create the nested format like the below:
"rating": {
"user_rating": {
"rating": 1
},
"recommended": {
"rating": 1
}
After you've read the file in using csv.DictReader, you'll have a sequence of dicts. Since you want to set the values now, it's a simple dict manipulation. There are several ways, of which one is:
[c.update({'rating': c['recommendation']}) for c in read_csvDictReader]
Hope that helps.

Import big csv file partially into Neo4j

A big CSV file (ca. 7 GB), which is my result of some pairwise computation between rows of object 1 and object 2 using 3 different methods, looks like this:
obj 1 , obj 2 , method1 , method2 , method3
obj1 & obj2 are strings and method1,method2,method3 are float values.
I don't want to import the whole csv into Neo4j, but when values of method1 is above certain threshold, I want that specific row be imported and between object1 and object2 of that row an edge to be defined and the same for method2 and method3, what I mean is, that when values of method2 is above certain threshold, I want that specific row be imported and between object1 and object2 of that row an edge to be defined. thanks #cybersam and #Dave Bennett, who wroted me here original queries, I changed them a bit and run 3 following queries seperately. After that when i write a for example simple query like:
match (n)-[r:similar_on_method2]-(m) return n,r,m
I get not only the requested relationship but also other relationships included in the result graph. I don't know what is wrong.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'file:///objects.csv'
AS line
WITH line
WHERE toFloat(line.method1) >= $x
MERGE (obj1:Object {name: line.obj1})
MERGE (obj2:Object {name: line.obj2})
MERGE (obj1)-[:similar_on_method1]->(obj2)
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'file:///objects.csv'
AS line
WITH line
WHERE toFloat(line.method2) >= $x
MERGE (obj1:Object {name: line.obj1})
MERGE (obj2:Object {name: line.obj2})
MERGE (obj1)-[:similar_on_method2]->(obj2)
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'file:///objects.csv'
AS line
WITH line
WHERE toFloat(line.method3) >= $x
MERGE (obj1:Object {name: line.obj1})
MERGE (obj2:Object {name: line.obj2})
MERGE (obj1)-[:similar_on_method3]->(obj2)
I guess how big is big? But something along these lines should get you started. If the obj1 and obj2 values repeat in your data or already exist in your database, you will want to create some indexes on the obj1 and obj2 values.
LOAD CSV WITH HEADERS
FROM 'file:///objects.csv'
AS line
WITH line
WHERE toInteger(line.method1) >= $x
AND toInteger(line.method2) >= $y
AND toInteger(line.method3) >= $z
MERGE (obj1:Object {name: line.obj1})
MERGE (obj2:Object {name: line.obj2})
MERGE (obj1)-[:LINK]->(obj2)
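As a side note: since all three of the queries above scan the same 7 GB file, the FOREACH/CASE trick from the first answer on this page could be used to do everything in a single pass. This is only a sketch, assuming the same headers and the $x/$y/$z threshold parameters used above (and using toFloat, as in the asker's own queries); an index or uniqueness constraint on :Object(name) will help the MERGEs either way:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///objects.csv' AS line
WITH line
WHERE toFloat(line.method1) >= $x OR toFloat(line.method2) >= $y OR toFloat(line.method3) >= $z
MERGE (obj1:Object {name: line.obj1})
MERGE (obj2:Object {name: line.obj2})
FOREACH (i IN CASE WHEN toFloat(line.method1) >= $x THEN [1] ELSE [] END |
  MERGE (obj1)-[:similar_on_method1]->(obj2))
FOREACH (i IN CASE WHEN toFloat(line.method2) >= $y THEN [1] ELSE [] END |
  MERGE (obj1)-[:similar_on_method2]->(obj2))
FOREACH (i IN CASE WHEN toFloat(line.method3) >= $z THEN [1] ELSE [] END |
  MERGE (obj1)-[:similar_on_method3]->(obj2))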

How to store a list of numbers in CSV file as integers in Neo4j

I have a CSV file with the following numbers. For example, example.csv contains (field A1):
1, 2, 4, 5, 6, 7, 10
I tried to use the following Neo4j code
LOAD CSV FROM "file:///example.csv" AS line
CREATE (d:Data) SET d.date= timestamp()
SET d.data = line WITH d
MATCH (t:Timestamp {name: "Digital"})
CREATE (d)<-[:HAS]-(t)
So in d.data, a list of strings is stored but I want a list of integers. I tried to use the code below but it gives an error.
SET d.data = toInt(line)
Is it possible to store the CSV data above in d.data as a list of integers? Thanks in advance.
line is a list of values, so you can use a list comprehension to convert each of its elements to an integer. If there are spaces in the strings, use trim to remove them (ltrim might also work, but the performance gains are minor).
LOAD CSV FROM "file:///example.csv" AS line
CREATE (d:Data) SET d.date = timestamp()
SET d.data = [value IN line | toInteger(trim(value))]
WITH d
MATCH (t:Timestamp {name: "Digital"})
CREATE (d)<-[:HAS]-(t)
(Note that the line variable is only a list if you do not use the WITH HEADERS option.)
You could map a function over the list of strings like so:
LOAD CSV FROM "file:///example.csv" AS line
CREATE (d:Data) SET d.date= timestamp()
SET d.data = extract(i in line | toInt(i)) WITH d
MATCH (t:Timestamp {name: "Digital"})
CREATE (d)<-[:HAS]-(t)
I'm not sure if Cypher will automatically convert line to a list, but if it doesn't, you can also try:
LOAD CSV FROM "file:///example.csv" AS line
CREATE (d:Data) SET d.date= timestamp()
SET d.data = extract(i in split(line,",") | toInt(i)) WITH d
MATCH (t:Timestamp {name: "Digital"})
CREATE (d)<-[:HAS]-(t)
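Two notes on the above: extract() and toInt() are older Cypher syntax that has since been deprecated and removed in newer Neo4j versions, and (as noted in the first answer) line is already a list when you load without headers, so no split() is needed. On a current release the equivalent would be roughly:
SET d.data = [i IN line | toInteger(i)]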

What is good way to import a directory/file structure in Neo4j from CSV file?

I am looking to import a lot of filenames into a graph database, using Neo4j. The data is from an external source and available in a CSV file. I'd like to create a tree structure from the data, so that I can easily 'navigate' the structure in queries later on (i.e. find all files underneath a certain directory, all files that occur in multiple directories, etc.).
So, given the example input:
/foo/bar/example.txt
/bar/baz/another.csv
/example.txt
/foo/bar/onemore.txt
I'd like to create the following graph:
( / ) <-[:in]- ( foo ) <-[:in]- ( bar ) <-[:in]- ( example.txt )
<-[:in]- ( onemore.txt )
<-[:in]- ( bar ) <-[:in]- ( baz ) <-[:in]- ( another.csv )
<-[:in]- ( example.txt )
(where each node label is actually an attribute, e.g. path:).
I've been able to achieve the desired effect when using a fixed number of directory levels; for example when each file is at three levels deep, I could create a CSV file with 4 columns:
dir_a,dir_b,dir_c,file
foo,bar,baz,example.txt
foo,bar,ban,example.csv
foo,bar,baz,another.txt
And import it using a cypher query:
LOAD CSV WITH HEADERS FROM "file:///sample.csv" AS row
MERGE (dir_a:Path {name: row.dir_a})
MERGE (dir_b:Path {name: row.dir_b}) <-[:in]- (dir_a)
MERGE (dir_c:Path {name: row.dir_c}) <-[:in]- (dir_b)
MERGE (:Path {name: row.file}) <-[:in]- (dir_c)
But I'd like to have a general solution that works for any level of sub-directories (and combinations of levels in one dataset). Note that I am able to pre-process my input if necessary, so I can create any desirable structure in the input CSV file.
I've looked at gists or plugins, but cannot seem to find anything that works. I think/hope that I should be able to do something with the split() function, i.e. use split('/',row.path) to get a list of path elements, but I do not know how to process this list into a chain of MERGE operations.
Here is a first cut at something more generalized.
The premise is that you can split the fully qualified path into components and then use each component to split the path again, so that you can construct the fully qualified path of each individual component of the larger path. Use that fully qualified path as the key to merge items on, and set the individual component name after the merge. If a component is not at the root level, find its parent and create the relationship back to it. This will break down if there are duplicate component names within a single fully qualified path.
First, I started by creating a uniqueness constraint on fq_path:
create constraint on (c:Component) assert c.fq_path is unique;
Here is the load statement.
load csv from 'file:///path.csv' as line
with line[0] as line, split(line[0],'/') as path_components
unwind range(0, size(path_components)-1) as idx
with case
when idx = 0 then '/'
else
path_components[idx]
end as component
, case
when idx = 0 then '/'
else
split(line, path_components[idx])[0] + path_components[idx]
end as fq_path
, case
when idx = 0 then
null
when idx = 1 then
'/'
else
substring(split(line, path_components[idx])[0],0,size(split(line, path_components[idx])[0])-1)
end as parent
, case
when idx = 0 then
[]
else
[1]
end as find_parent
merge (new_comp:Component {fq_path: fq_path})
set new_comp.name = component
foreach ( y in find_parent |
merge (theparent:Component {fq_path: parent} )
merge (theparent)<-[:IN]-(new_comp)
)
return *
If you want to differentiate between files and folders, here are a few queries you can run afterwards to set another label on the respective nodes.
Find the files and set them as File
// find the last Components in a tree (no inbound IN)
// and set them as Files
match (c:Component)
where not (c)<-[:IN]-(:Component)
set c:File
return c
Find the folders and set them as Folder
// find all Components with an inbound IN
// and set them as Folders
match (c:Component)
where (c)<-[:IN]-(:Component)
set c:Folder
return c
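With the tree and the File/Folder labels in place, the kinds of queries mentioned in the question become straightforward. Two hedged sketches against the model built above (Component nodes keyed by fq_path, IN relationships pointing at the parent; '/foo/bar' is just a path from the example input):
// all files anywhere underneath a given directory
MATCH (dir:Component {fq_path: '/foo/bar'})<-[:IN*]-(f:File)
RETURN f.fq_path
// file names that occur in more than one directory
MATCH (f:File)
WITH f.name AS name, collect(f.fq_path) AS paths
WHERE size(paths) > 1
RETURN name, paths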

How to specify relationship type in CSV?

I have a CSV file with data like:
ID,Name,Role,Project
1,James,Owner,TST
2,Ed,Assistant,TST
3,Jack,Manager,TST
and want to create people whose relationships to the project are therein specified. I attempted to do it like this:
load csv from 'file:/../x.csv' as line
match (p:Project {code: line[3]})
create (n:Individual {name: line[1]})-[r:line[2]]->(p);
but it barfs with:
Invalid input '[': expected an identifier character, whitespace, '|',
a length specification, a property map or ']' (line 1, column 159
(offset: 158))
as it can't seem to dereference line in the relationship creation. If I hard-code it, it works:
load csv from 'file:/../x.csv' as line
match (p:Project {code: line[3]})
create (n:Individual {name: line[1]})-[r:WORKSFOR]->(p);
so how do I make the reference?
Right now you can't, as this is structural information. Either use the neo4j-import tool for that, specify it manually as you did, or use this workaround:
load csv with headers from 'file:/../x.csv' as line
match (p:Project {code: line.Project})
create (n:Individual {name: line.Name})
foreach (x in case line.Role when "Owner" then [1] else [] end |
create (n)-[r:Owner]->(p)
)
foreach (x in case line.Role when "Assistant" then [1] else [] end |
create (n)-[r:Assistant]->(p)
)
foreach (x in case line.Role when "Manager" then [1] else [] end |
create (n)-[r:Manager]->(p)
)
Michael's answer stands; however, I've found that what I can do is specify an attribute on the relationship, like this:
load csv from 'file:/.../x.csv' as line
match (p:Project {code: line[3]})
create (i:Individual {name: line[1]})-[r:Role { type: line[2] }]->(p)
and I can make Neo4j display the type attribute of the relationship instead of the relationship type.
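A short sketch of how those relationships could then be queried by that attribute (assuming the Individual/Role/Project pattern created just above):
MATCH (i:Individual)-[r:Role]->(p:Project)
WHERE r.type = 'Owner'
RETURN i.name, p.code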
This question is old, but there is a post by Mark Needham that provides a great and easy solution using APOC, as follows:
load csv with headers from "file:///people.csv" AS row
MERGE (p1:Person {name: row.node1})
MERGE (p2:Person {name: row.node2})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.relationship, {}, p2) YIELD rel
RETURN rel
Note: the "YIELD rel" is essential, and so is the RETURN part.
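To double-check what apoc.create.relationship produced, something along these lines should work (assuming the Person nodes from the query above):
MATCH (p1:Person)-[r]->(p2:Person)
RETURN p1.name, type(r), p2.name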