Read CSV to Neo4j creating one node per column and relations - csv

I am stuck on the Neo4j command (I am a newbie) to create a database from a CSV like this:
Country,Name1,Name2,Name3,Influence
France,John,Pete,Josh,2
Italy,Pete,Bepe,Juan,3
USA,Josh,Juan,Pete,1
Spain,Juan,John,,2
When I try to create one node per person (NameX), setting the relationships between the name columns and adding Influence and Country as tags, it fails because some names are empty.
How can I achieve this?
Thanks
UPDATE:
LOAD CSV WITH HEADERS FROM 'file:///diag.csv' AS row FIELDTERMINATOR ';'
MERGE (c:Country{name:row.Country})
WITH CASE row.name1 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name1] END as
name1List ,c
WITH CASE row.name2 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name2] END as
name2List ,c
WITH CASE row.name3 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name3] END as
name3List ,c
FOREACH (x IN name1List | MERGE (n:Node{name : x} )
MERGE (n)-[:REL_TYPE]->(c)
)
FOREACH (x IN name2List | MERGE (n:Node{name : x} )
MERGE (n)-[:REL_TYPE]->(c)
)
FOREACH (x IN name3List | MERGE (n:Node{name : x} )
MERGE (n)-[:REL_TYPE]->(c)
)
RETURN SUM(1)
Getting error:
Variable row not defined (line 4, column 11 (offset: 209))
"WITH CASE row.name2 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name2] END as >name2List ,c"

The last line has an empty Name3 field. Try adding a Name3 value to the last line in your data set:
Spain,Juan,John, {empty - fill this},2

LOAD CSV WITH HEADERS FROM 'file:///diag.csv' AS row FIELDTERMINATOR ';' MERGE (c:Country{name:row.Country})
WITH CASE row.name1 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name1] END as name1List ,
CASE row.name2 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name2] END as name2List ,
CASE row.name3 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name3] END as name3List ,c,row
FOREACH (x IN name1List | MERGE (n:Node{name : x} ) MERGE (n)-[:REL_TYPE]->(c) )
FOREACH (x IN name2List | MERGE (n:Node{name : x} ) MERGE (n)-[:REL_TYPE]->(c) )
FOREACH (x IN name3List | MERGE (n:Node{name : x} ) MERGE (n)-[:REL_TYPE]->(c) ) RETURN SUM(1)
Here, with the help of Cypher's CASE expression, we create either an empty list (when the value is null or empty) or a one-element list containing the value, i.e. [row.name3]. After the CASE check, we can iterate over this list and create a node with the name property. When the value is null or empty, you iterate zero times, so you won't get the error.
Finally, SUM(1) gives you the number of rows processed, so you can cross-check whether you have processed all the rows in the CSV file.
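The same guard pattern works outside Cypher too: wrap a possibly-missing value in a zero-or-one-element list, then iterate, so empty values are skipped without special-case branching. A minimal Python sketch of the idea (the row dicts are made up to mirror the CSV):

```python
# Mimic Cypher's CASE ... THEN [] ELSE [value] END guard:
# wrap each possibly-empty name in a zero-or-one-element list,
# then iterate -- empty/None values simply contribute nothing.
rows = [
    {"Country": "Spain", "name1": "Juan", "name2": "John", "name3": ""},
    {"Country": "USA", "name1": "Josh", "name2": "Juan", "name3": "Pete"},
]

def as_list(value):
    """Return [] for None/empty string, else a one-element list."""
    return [] if value in (None, "") else [value]

nodes = set()
for row in rows:
    for key in ("name1", "name2", "name3"):
        for name in as_list(row[key]):  # iterates zero times when empty
            nodes.add(name)

print(sorted(nodes))  # the empty name3 in the Spain row is skipped
```

The FOREACH over the CASE-built list in the Cypher above is doing exactly this zero-or-one iteration.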

Related

Adding frequency counter between nodes in neo4j during csv import

I've got a csv file with ManufacturerPartNumbers and Manufacturers. Both values can potentially be duplicated across rows one or more times, meaning I could have ManufacturerPartNumber,Manufacturer pairs like: A|X, A|Y, A|Y, B|X, C|X
In this case, I'd like to create ManufacturerPartNumber nodes (A), (B), (C) and Manufacturer nodes (X), (Y)
I also want to create relationships of
(A)-[MADE_BY]->(X)
(A)-[MADE_BY]->(Y)
And I also want to apply a weighting value in the relationship between A -> Y since it appears twice in my dataset, so that I know that there's a more frequent relationship between A|Y than there is between A|X.
Is there a more efficient way of doing this? I'm dealing with 10M rows of csv data and it is crashing during import.
:param UploadFile => 'http://localhost:11001/project-f64568ab-67b6-4560-ae89-8aea882892b0/file.csv';
//open the CSV file
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
//create nodes
MERGE (mfgr:Manufacturer {name: COALESCE(trim(toUpper(csvLine.Manufacturer)),'NULL')})
MERGE (mpn:MPN {name: COALESCE(trim(toUpper(csvLine.MPN)),'NULL')})
//set relationships
MERGE (mfgr)-[a:MAKES]->(mpn)
SET a += {appearances: (CASE WHEN a.appearances is NULL THEN 0 ELSE a.appearances END) + 1, refid: (CASE WHEN a.refid is NULL THEN csvLine.id ELSE a.refid + ' ~ ' + csvLine.id END)}
;
Separating the node creation from the relationships creation and then setting the values helped a bit.
Ultimately what had the most impact was that I spun up an AuraDB at max size and then imported all of the data, followed by resizing it back down. Probably not an ideal way to handle it, but it worked better than all the other optimization and only cost me a few bucks!
//QUERY ONE: var2 and var1 nodes
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
MERGE (var2:VAR2 {name: COALESCE(trim(toUpper(csvLine.VAR2)),'NULL')})
MERGE (var1:VAR1 {name: COALESCE(trim(toUpper(csvLine.VAR1)),'NULL')})
;
//QUERY TWO: var2 and var1 nodes
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
MERGE (var2:VAR2 {name: COALESCE(trim(toUpper(csvLine.VAR2)),'NULL')})
MERGE (var1:VAR1 {name: COALESCE(trim(toUpper(csvLine.VAR1)),'NULL')})
MERGE (var2)-[a:RELATES_TO]->(var1) SET a += {appearances: (CASE WHEN a.appearances is NULL THEN 0 ELSE a.appearances END) + 1}
;
//QUERY THREE: handle descriptors
//open the CSV file
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
UNWIND split(trim(toUpper(csvLine.Descriptor)), ' ') AS DescriptionSep1 UNWIND split(trim(toUpper(DescriptionSep1)), ',') AS DescriptionSep2 UNWIND split(trim(toUpper(DescriptionSep2)), '|') AS DescriptionSep3 UNWIND split(trim(toUpper(DescriptionSep3)), ';') AS DescriptionSep4
MERGE (var2:VAR2 {name: COALESCE(trim(toUpper(csvLine.VAR2)),'NULL')})
MERGE (var1:VAR1 {name: COALESCE(trim(toUpper(csvLine.VAR1)),'NULL')})
MERGE (descriptor:Descriptor {name: COALESCE(trim(toUpper(DescriptionSep4)),'NULL')})
SET descriptor += {appearances: (CASE WHEN descriptor.appearances is NULL THEN 0 ELSE descriptor.appearances END) + 1}
MERGE (descriptor)-[d:DESCRIBES]->(var1)
SET d += {appearances: (CASE WHEN d.appearances is NULL THEN 0 ELSE d.appearances END) + 1}
;
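The SET a += {appearances: ...} trick in these queries is an increment-or-initialize counter on the relationship. The same bookkeeping, sketched in plain Python over the hypothetical (part, manufacturer) pairs from the question, shows why duplicate rows end up as a weight instead of duplicate edges:

```python
# Count how often each (part-number, manufacturer) pair appears,
# the same way MERGE plus the "appearances" CASE turns duplicate
# CSV rows into a weight on a single relationship.
from collections import Counter

rows = [("A", "X"), ("A", "Y"), ("A", "Y"), ("B", "X"), ("C", "X")]

appearances = Counter()
for mpn, mfgr in rows:
    appearances[(mpn, mfgr)] += 1  # initialize-to-0-then-increment

print(appearances[("A", "Y")])  # the duplicated pair carries weight 2
```

MERGE guarantees one relationship per pair; the counter on it is what records the frequency.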

How to fix endless execution in a Cypher query - Neo4j Graph Database?

I'm dealing with the import of the Common Weakness Enumeration Catalog (a .json file) into the Neo4j Graph Database, using the Cypher query language and the APOC library. Although I import the fields Weaknesses, Views, and External_References properly, I have an execution problem (without any error) with the import of the Categories field, which executes without ending. Below I present the structure of the .json file and my Cypher code.
"Weakness_Catalog": {
"Weaknesses": {"Weakness":[...]}
"Categories": {"Category":[...]}
"Views": {"View":[...]}
"External_References": {"External_Reference":[...]}
}
Cypher Query
After several tests I think the logic error is between the last two parts [with value ... (catRef)]; without them, the query executes fine, in normal time. I've also changed a setting in the db configuration file due to an error (cypher.lenient_create_relationship = true). And I tested different import sequences with the same bad results (Weaknesses, Categories, Views, External_References, etc.)
call apoc.load.json(files) yield value
unwind value.Weakness_Catalog.Weaknesses.Weakness as weakness
merge (i:GeneralInfo_CWE {Name:value.Weakness_Catalog.Name, Version:value.Weakness_Catalog.Version,
Date:value.Weakness_Catalog.Date, Schema:'https://cwe.mitre.org/data/xsd/cwe_schema_v6.4.xsd'})
merge(w:CWE {Name:'CWE-' + weakness.ID})
set w.Extended_Name=weakness.Name, w.Abstraction=weakness.Abstraction,
w.Structure=weakness.Structure, w.Status=weakness.Status, w.Description=weakness.Description,
w.Extended_Description= apoc.convert.toString(weakness.Extended_Description),
w.Likelihood_Of_Exploit=weakness.Likelihood_Of_Exploit,
w.Background_Details=apoc.convert.toString(weakness.Background_Details.Background_Detail),
w.Modes_Of_Introduction=[value in weakness.Modes_Of_Introduction.Introduction | value.Phase],
w.Submission_Date=weakness.Content_History.Submission.Submission_Date,
w.Submission_Name=weakness.Content_History.Submission.Submission_Name,
w.Submission_Organization=weakness.Content_History.Submission.Submission_Organization,
w.Modifications=[value in weakness.Content_History.Modification | apoc.convert.toString(value)],
w.Alternate_Terms=apoc.convert.toString(weakness.Alternate_Terms),
w.Notes=[value in weakness.Notes.Note | apoc.convert.toString(value)],
w.Affected_Resources=[value in weakness.Affected_Resources.Affected_Resource | value],
w.Functional_Areas=[value in weakness.Functional_Areas.Functional_Area | value]
merge (w)-[:belongsTo]->(i)
with w, weakness, value
unwind weakness.Related_Weaknesses.Related_Weakness as Rel_Weakness
match (cwe:CWE) where cwe.Name='CWE-' + Rel_Weakness.CWE_ID
merge (w)-[:Related_Weakness{Nature:Rel_Weakness.Nature}]->(cwe)
with w, weakness, value
unwind weakness.Applicable_Platforms as appPl
foreach (lg in appPl.Language |
merge(ap:Applicable_Platform{Type:'Language', Prevalence:lg.Prevalence,
Name:coalesce(lg.Name, 'NOT SET'), Class:coalesce(lg.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value, appPl
foreach (tch in appPl.Technology |
merge(ap:Applicable_Platform{Type:'Technology', Prevalence:tch.Prevalence,
Name:coalesce(tch.Name, 'NOT SET'), Class:coalesce(tch.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value, appPl
foreach (arc in appPl.Architecture |
merge(ap:Applicable_Platform{Type:'Architecture', Prevalence:arc.Prevalence,
Name:coalesce(arc.Name, 'NOT SET'), Class:coalesce(arc.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value, appPl
foreach (os in appPl.Operating_System |
merge(ap:Applicable_Platform{Type:'Operating System', Prevalence:os.Prevalence,
Name:coalesce(os.Name, 'NOT SET'), Class:coalesce(os.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value
foreach (example in weakness.Demonstrative_Examples.Demonstrative_Example |
merge(ex:Demonstrative_Example {Intro_Text:apoc.convert.toString(example.Intro_Text)})
set ex.Body_Text=[value in example.Body_Text | apoc.convert.toString(value)],
ex.Example_Code=[value in example.Example_Code | apoc.convert.toString(value)]
merge (w)-[:hasExample]->(ex))
with w, weakness, value
foreach (consequence in weakness.Common_Consequences.Consequence |
merge (con:Consequence{CWE:w.Name, Scope:[value in consequence.Scope | value]})
set con.Impact=[value in consequence.Impact | value],
con.Note=consequence.Note, con.Likelihood=consequence.Likelihood
merge(w)-[:hasConsequence]->(con))
with w, weakness, value
foreach (dec in weakness.Detection_Methods.Detection_Method |
merge(d:Detection_Method {Method:dec.Method})
merge(w)-[wd:canBeDetected{Description:apoc.convert.toString(dec.Description)}]->(d)
set wd.Effectiveness=dec.Effectiveness, wd.Effectiveness_Notes=dec.Effectiveness_Notes,
wd.Detection_Method_ID=dec.Detection_Method_ID)
with w, weakness, value
foreach (mit in weakness.Potential_Mitigations.Mitigation |
merge(m:Mitigation {Description:apoc.convert.toString(mit.Description)})
set m.Phase=[value in mit.Phase | value], m.Strategy=mit.Strategy,
m.Effectiveness=mit.Effectiveness, m.Effectiveness_Notes=mit.Effectiveness_Notes,
m.Mitigation_ID=mit.Mitigation_ID
merge(w)-[:hasMitigation]->(m))
with w, weakness, value
foreach (rap in weakness.Related_Attack_Patterns.Related_Attack_Pattern |
merge(cp:CAPEC {Name:rap.CAPEC_ID})
merge(w)-[:RelatedAttackPattern]->(cp))
with w, weakness, value
foreach (reference in value.Weakness_Catalog.External_References.External_Reference |
merge(r:External_Reference{Reference_ID:reference.Reference_ID})
set r.Author=[value in reference.Author | value], r.Title=reference.Title,
r.Edition=reference.Edition, r.URL=reference.URL,
r.Publication_Year=reference.Publication_Year, r.Publisher=reference.Publisher)
with w, weakness, value
unwind weakness.References.Reference as exReference
match (ref:External_Reference) where ref.Reference_ID=exReference.External_Reference_ID
merge(w)-[:hasExternal_Reference]->(ref)
with value
unwind value.Weakness_Catalog.Views.View as view
merge (v:CWE_VIEW{ViewID:view.ID})
set v.Name=view.Name, v.Type=view.Type, v.Status=view.Status,
v.Objective=apoc.convert.toString(view.Objective), v.Filter=view.Filter,
v.Notes=apoc.convert.toString(view.Notes),
v.Submission_Name=view.Content_History.Submission.Submission_Name,
v.Submission_Date=view.Content_History.Submission.Submission_Date,
v.Submission_Organization=view.Content_History.Submission.Submission_Organization,
v.Modification=[value in view.Content_History.Modification | apoc.convert.toString(value)]
foreach (value in view.Audience.Stakeholder |
merge (st:Stakeholder{Type:value.Type})
merge (v)-[rel:usefulFor]->(st)
set rel.Description=value.Description)
with v, view, value
unwind (case view.Members.Has_Member when [] then [null] else view.Members.Has_Member end) as members
optional match (MemberWeak:CWE{Name:'CWE-' + members.CWE_ID})
merge (v)-[:hasMember{ViewID:members.View_ID}]->(MemberWeak)
with v, view, value
unwind (case view.References.Reference when [] then [null] else view.References.Reference end) as viewExReference
optional match (viewRef:External_Reference{Reference_ID:viewExReference.External_Reference_ID})
merge (v)-[:hasExternal_Reference{ViewID:v.ViewID}]->(viewRef)
with value
unwind value.Weakness_Catalog.Categories.Category as category
merge (c:CWE_Category{CategoryID:category.ID})
set c.Name=category.Name, c.Status=category.Status, c.Summary=apoc.convert.toString(category.Summary),
c.Notes=apoc.convert.toString(category.Notes), c.Submission_Name=category.Content_History.Submission.Submission_Name,
c.Submission_Date=category.Content_History.Submission.Submission_Date,
c.Submission_Organization=category.Content_History.Submission.Submission_Organization,
c.Modification=[value in category.Content_History.Modification | apoc.convert.toString(value)]
with c, category
unwind (case category.References.Reference when [] then [null] else category.References.Reference end) as categoryExReference
optional match (catRef:External_Reference{Reference_ID:categoryExReference.External_Reference_ID})
merge (c)-[:hasExternal_Reference{CategoryID:c.CategoryID}]->(catRef)
So, the problem was that every time I use WITH, I'm working in nested loops. The more nested loops, the slower the query will be. A good way to speed it up is to write simpler queries where possible.
For example in the json file:
"Weakness_Catalog": {
"Weaknesses": {"Weakness":[...]}
"Categories": {"Category":[...]}
"Views": {"View":[...]}
"External_References": {"External_Reference":[...]}
}
i will execute one query for Weaknesses, one for Categories, one for Views and one for External_References.
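The restructuring amounts to one independent pass per top-level key instead of a single deeply nested query. A rough Python illustration over a stub of the catalog structure (the catalog entries here are invented placeholders):

```python
# One independent pass per top-level collection, instead of a single
# query whose chained WITH clauses multiply rows into nested loops.
catalog = {
    "Weakness_Catalog": {
        "Weaknesses": {"Weakness": [{"ID": "79"}, {"ID": "89"}]},
        "Categories": {"Category": [{"ID": "1000"}]},
        "Views": {"View": [{"ID": "699"}]},
        "External_References": {"External_Reference": [{"Reference_ID": "REF-1"}]},
    }
}

root = catalog["Weakness_Catalog"]
passes = {
    "Weaknesses": root["Weaknesses"]["Weakness"],
    "Categories": root["Categories"]["Category"],
    "Views": root["Views"]["View"],
    "External_References": root["External_References"]["External_Reference"],
}

for name, items in passes.items():
    # in the real import, each pass would be its own Cypher query
    print(name, len(items))
```

Each pass touches only its own collection, so the row count never multiplies across UNWINDs.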

Comparing the rows of the same dataframe

A B C
1 2 3
4 2 3
1 2 3
I want to compare row1 with row2, row2 with row3, and so on up to rown with row1.
If they are the same I want to print "same", otherwise "different", in another data frame.
output for above table:
A B C
Different same same
Different same same
same same same
With the code below I'm getting
TRUE or FALSE as the output. I want to replace that with "Different" and "same".
compare = t(combn(nrow(Data.matrix),2,FUN=function(x)Data.matrix[x[1],]==Data.matrix[x[2],]))
rownames(compare) = combn(nrow(Data.matrix),2,FUN=function(x)paste0("seq",x[1],"_seq",x[2]))
View(compare)
There are plenty of options for how to do that.
Since you added the MySQL tag, the easiest way is to do it with SQL, in case you have a limited number of columns; you can also use the sqldf package in R:
library(sqldf)
sqldf("select
  case when a = b then 'same' else 'different' end as a,
  case when b = c then 'same' else 'different' end as b,
  case when c = a then 'same' else 'different' end as c
from my_dataset")
Does this deliver the desired results?
data_test = data.frame(A = c(1,4,1), B = c(2,2,2), C = c(3,3,3))
# create shifted helper-columns
data_test_help = cbind(data_test, data_test[c(2:NROW(data_test), 1),])
# apply comparision on each row
t(apply(data_test_help,1, function(f) f[1:3] == f[4:6]))
# for same, different notation instead of true false
t(apply(data_test_help,1, function(f) ifelse(f[1:3] == f[4:6], "same", "Different")))
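For comparison, the same consecutive-row check (with wrap-around from the last row back to the first) can be written without helper columns; a Python sketch over plain lists mirroring the question's table:

```python
# Compare each row with the next one (wrapping around to row 1),
# emitting "same"/"Different" per cell -- the same output shape as
# the shifted-helper-columns approach in R.
rows = [
    [1, 2, 3],
    [4, 2, 3],
    [1, 2, 3],
]

result = []
for i, row in enumerate(rows):
    nxt = rows[(i + 1) % len(rows)]  # wrap-around: last row vs first
    result.append(["same" if a == b else "Different" for a, b in zip(row, nxt)])

for line in result:
    print(line)
```

This produces the ["Different", "same", "same"] / ["same", "same", "same"] pattern the question asks for.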

creating multiple labels with csv

I am trying to load a csv file to create nodes and labels. Is there a way I can add more than one label at the same time? (I am using neo4j 2.1.1)
this is my csv:
1,Test1,hardkey,button
2,Test2,touch,button
3,Test3,,screen
I tried this:
LOAD CSV FROM 'file:/Users/Claudia/Documents/nodes.csv' AS csvLine
FOREACH (n IN (CASE WHEN csvLine[2]='hardkey' THEN [1] ELSE[] END) |
MERGE (p:hardkey {name: csvLine[1]})
)
FOREACH (n IN (CASE WHEN csvLine[2]='touch' THEN [1] ELSE[] END) |
MERGE (p:touch {name: csvLine[1]})
)
This works, but how do I get the other column ("button" and "screen") included?
Thanks a lot.
Like this?
See the MERGE documentation.
LOAD CSV FROM 'file:/Users/Claudia/Documents/nodes.csv' AS csvLine
FOREACH (n IN (CASE WHEN csvLine[2]='hardkey' THEN [1] ELSE[] END) |
MERGE (p:hardkey {name: csvLine[1]}) ON CREATE SET p.what = csvLine[3]
)
FOREACH (n IN (CASE WHEN csvLine[2]='touch' THEN [1] ELSE[] END) |
MERGE (p:touch {name: csvLine[1]}) ON CREATE SET p.what = csvLine[3]
)
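If it helps to see the branching outside Cypher: each row picks its label from column 3 and stores column 4 as a property, and rows matching neither guard are skipped. A Python sketch of the same dispatch (the row data mirrors the question's CSV; note Cypher's csvLine is 0-indexed, so csvLine[2] is the third column):

```python
import csv
import io

# The question's CSV: id, name, kind (hardkey/touch/empty), what
data = """1,Test1,hardkey,button
2,Test2,touch,button
3,Test3,,screen"""

nodes = []
for row in csv.reader(io.StringIO(data)):
    _id, name, kind, what = row
    if kind in ("hardkey", "touch"):  # mirrors the two FOREACH/CASE guards
        nodes.append({"label": kind, "name": name, "what": what})

print(nodes)  # row 3 has no kind, so it creates no node
```

The ON CREATE SET in the answer plays the role of the "what" key here: the fourth column rides along as a property on whichever labeled node gets created.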

How do I sum up properties of a JSON object in CoffeeScript?

I have an object that looks like this one:
object =
title : 'an object'
properties :
attribute1 :
random_number: 2
attribute_values:
a: 10
b: 'irrelevant'
attribute2 :
random_number: 4
attribute_values:
a: 15
b: 'irrelevant'
some_random_stuff: 'random stuff'
I want to extract the sum of the 'a' values on attribute1 and attribute2.
What would be the best way to do this in Coffeescript?
(I have already found one way to do it but that just looks like Java-translated-to-coffee and I was hoping for a more elegant solution.)
Here is what I came up with (edited to be more generic based on comment):
sum_attributes = (x) =>
sum = 0
for name, value of object.properties
sum += value.attribute_values[x]
sum
alert sum_attributes('a') # 25
alert sum_attributes('b') # 0irrelevantirrelevant
So, that does what you want... but it probably doesn't do exactly what you want with strings.
You might want to pass in the accumulator seed, like sum_attributes 0, 'a' and sum_attributes '', 'b'
Brian's answer is good. But if you wanted to bring in a functional programming library like Underscore.js, you could write a more succinct version:
sum = (arr) -> _.reduce arr, ((memo, num) -> memo + num), 0
sum _.pluck(object.properties, 'a')
total = (attr.attribute_values.a for key, attr of obj.properties).reduce (a,b) -> a+b
or
sum = (arr) -> arr.reduce((a, b) -> a+b)
total = sum (attr.attribute_values.a for k, attr of obj.properties)
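For readers outside CoffeeScript, the same pluck-and-sum over the nested object is a short comprehension in Python (the dict literal mirrors the example object above):

```python
# Sum attribute_values["a"] across every entry under "properties",
# mirroring the CoffeeScript comprehension-plus-reduce versions.
obj = {
    "title": "an object",
    "properties": {
        "attribute1": {"random_number": 2,
                       "attribute_values": {"a": 10, "b": "irrelevant"}},
        "attribute2": {"random_number": 4,
                       "attribute_values": {"a": 15, "b": "irrelevant"}},
    },
    "some_random_stuff": "random stuff",
}

total = sum(attr["attribute_values"]["a"] for attr in obj["properties"].values())
print(total)  # 25
```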