Creating multiple labels with CSV

I am trying to load a CSV file to create nodes and labels. Is there a way to add more than one label at the same time? (I am using Neo4j 2.1.1)
This is my CSV:
1,Test1,hardkey,button
2,Test2,touch,button
3,Test3,,screen
I tried this:
LOAD CSV FROM 'file:/Users/Claudia/Documents/nodes.csv' AS csvLine
FOREACH (n IN (CASE WHEN csvLine[2]='hardkey' THEN [1] ELSE [] END) |
MERGE (p:hardkey {name: csvLine[1]})
)
FOREACH (n IN (CASE WHEN csvLine[2]='touch' THEN [1] ELSE [] END) |
MERGE (p:touch {name: csvLine[1]})
)
This works, but how do I get the last column ("button" / "screen") included as well?
Thanks a lot.

Like this?
See the MERGE documentation.
LOAD CSV FROM 'file:/Users/Claudia/Documents/nodes.csv' AS csvLine
FOREACH (n IN (CASE WHEN csvLine[2]='hardkey' THEN [1] ELSE [] END) |
MERGE (p:hardkey {name: csvLine[1]}) ON CREATE SET p.what = csvLine[3]
)
FOREACH (n IN (CASE WHEN csvLine[2]='touch' THEN [1] ELSE [] END) |
MERGE (p:touch {name: csvLine[1]}) ON CREATE SET p.what = csvLine[3]
)

Related

Loading CSV in Neo4j is time-consuming

I want to load a CDR CSV file with 648,000 records into Neo4j (4.4.10), but it has been running for about 4 days and is still not complete.
My CSV has 648,000 records with 7 columns, and the file is about 48 MB.
My computer has 100 GB of RAM and an Intel Xeon E5 CPU.
The columns of the CSV are:
OP_Name
TP_Name
Called_Number
OP_ANI
Setup_Time
Duration
OP_Price
The code I use to load the CSV into Neo4j is:
```Cypher
:auto load csv with headers from 'file:///cdr.csv' as line FIELDTERMINATOR ','
with line
where line['Called_Number'] is not null and line['OP_ANI'] is not null
with line['OP_ANI'] as OP_Phone,
(CASE line['OP_Name']
WHEN 'TIC' THEN 'IRAN'
ELSE 'Foreign' END) AS OP_country,
line['Called_Number'] as Called_Phone,
(CASE line['TP_Name']
WHEN 'TIC' THEN 'IRAN'
ELSE 'Foreign' END) AS TP_country,
line['Setup_Time'] as Setup_Time,
line['Duration'] as Duration,
line['OP_Price'] as OP_Price
call {
with OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price
MERGE (c:Customer{phone: toInteger(Called_Phone)})
on create set c.country = TP_country
WITH c, OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price
CALL apoc.create.addLabels( c, [ c.country ] ) YIELD node
MERGE (c2:Customer{phone: toInteger(OP_Phone)})
on create set c2.country = OP_country
WITH c2, OP_Phone, OP_country, Called_Phone, TP_country, Setup_Time, Duration, OP_Price, c
CALL apoc.create.addLabels( c2, [ c2.country ] ) YIELD node
MERGE (c2)-[r:CALLED{setupTime: Setup_Time,
duration: Duration,
OP_Price: OP_Price}]->(c)
} IN TRANSACTIONS
```
How can I speed up the load operation?
MERGE acts as an upsert in Neo4j. So the statement:
MERGE (c:Customer{phone: toInteger(Called_Phone)})
checks whether a Customer node with the given phone number already exists. If it does, it performs the update; otherwise it creates the node. When there is a large number of nodes, this lookup can be very slow, and the CSV import will be slow overall. Creating an index on the phone property of Customer should do the trick. You can create the index like this:
CREATE INDEX phone IF NOT EXISTS FOR (n:Customer) ON (n.phone)
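The reason the index matters: without it, each MERGE must scan the existing Customer nodes to decide between match and create, so the total import cost grows roughly quadratically with the node count; with an index, each lookup is cheap. A toy Python model of the difference (an analogy, not Neo4j internals):

```python
def merge_without_index(store, phone):
    """No index: linear scan over all nodes per MERGE -> O(n) per row."""
    for node in store:
        if node["phone"] == phone:
            return node
    node = {"phone": phone}
    store.append(node)
    return node

def merge_with_index(index, phone):
    """Indexed property: hash lookup per MERGE -> O(1) per row."""
    if phone not in index:
        index[phone] = {"phone": phone}
    return index[phone]

store, index = [], {}
phones = [i % 1000 for i in range(5000)]  # repeated callers, like a CDR file
for p in phones:
    merge_without_index(store, p)   # 5000 scans over a growing list
    merge_with_index(index, p)      # 5000 constant-time lookups

# both end up with the same 1000 distinct customers
assert len(store) == len(index) == 1000
```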

Adding frequency counter between nodes in neo4j during csv import

I've got a CSV file with ManufacturerPartNumbers and Manufacturers. Both values can potentially be duplicated across rows one or more times, meaning I could have ManufacturerPartNumber|Manufacturer pairs such as: A|X, A|Y, A|Y, B|X, C|X.
In this case, I'd like to create ManufacturerPartNumber nodes (A), (B), (C) and Manufacturer nodes (X), (Y)
I also want to create relationships of
(A)-[MADE_BY]->(X)
(A)-[MADE_BY]->(Y)
And I also want to apply a weighting value in the relationship between A -> Y since it appears twice in my dataset, so that I know that there's a more frequent relationship between A|Y than there is between A|X.
Is there a more efficient way of doing this? I'm dealing with 10M rows of csv data and it is crashing during import.
:param UploadFile => 'http://localhost:11001/project-f64568ab-67b6-4560-ae89-8aea882892b0/file.csv';
//open the CSV file
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
//create nodes
MERGE (mfgr:Manufacturer {name: COALESCE(trim(toUpper(csvLine.Manufacturer)),'NULL')})
MERGE (mpn:MPN {name: COALESCE(trim(toUpper(csvLine.MPN)),'NULL')})
//set relationships
MERGE (mfgr)-[a:MAKES]->(mpn)
SET a += {appearances: (CASE WHEN a.appearances is NULL THEN 0 ELSE a.appearances END) + 1, refid: (CASE WHEN a.refid is NULL THEN csvLine.id ELSE a.refid + ' ~ ' + csvLine.id END)}
;
Separating the node creation from the relationships creation and then setting the values helped a bit.
Ultimately what had the most impact was that I spun up an AuraDB at max size and then imported all of the data, followed by resizing it back down. Probably not an ideal way to handle it, but it worked better than all the other optimization and only cost me a few bucks!
//QUERY ONE: var2 and var1 nodes
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
MERGE (var2:VAR2 {name: COALESCE(trim(toUpper(csvLine.VAR2)),'NULL')})
MERGE (var1:VAR1 {name: COALESCE(trim(toUpper(csvLine.VAR1)),'NULL')})
;
//QUERY TWO: var2 and var1 nodes
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
MERGE (var2:VAR2 {name: COALESCE(trim(toUpper(csvLine.VAR2)),'NULL')})
MERGE (var1:VAR1 {name: COALESCE(trim(toUpper(csvLine.VAR1)),'NULL')})
MERGE (var2)-[a:RELATES_TO]->(var1) SET a += {appearances: (CASE WHEN a.appearances is NULL THEN 0 ELSE a.appearances END) + 1}
;
//QUERY THREE: handle descriptors
//open the CSV file
:auto USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM $UploadFile AS csvLine with csvLine where csvLine.id is not null
UNWIND split(trim(toUpper(csvLine.Descriptor)), ' ') AS DescriptionSep1 UNWIND split(trim(toUpper(DescriptionSep1)), ',') AS DescriptionSep2 UNWIND split(trim(toUpper(DescriptionSep2)), '|') AS DescriptionSep3 UNWIND split(trim(toUpper(DescriptionSep3)), ';') AS DescriptionSep4
MERGE (var2:VAR2 {name: COALESCE(trim(toUpper(csvLine.VAR2)),'NULL')})
MERGE (var1:VAR1 {name: COALESCE(trim(toUpper(csvLine.VAR1)),'NULL')})
MERGE (descriptor:Descriptor {name: COALESCE(trim(toUpper(DescriptionSep4)),'NULL')})
SET descriptor += {appearances: (CASE WHEN descriptor.appearances is NULL THEN 0 ELSE descriptor.appearances END) + 1}
MERGE (descriptor)-[d:DESCRIBES]->(var1)
SET d += {appearances: (CASE WHEN d.appearances is NULL THEN 0 ELSE d.appearances END) + 1}
;
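Since the relationship weight only counts how often each pair appears, another option (a suggestion of mine, not part of the answer above) is to pre-aggregate the 10M rows into one row per (Manufacturer, MPN) pair with a precomputed count before import, so each MERGE runs once per distinct pair instead of once per CSV row. A sketch in Python:

```python
import csv
from collections import Counter
from io import StringIO

# stand-in for the real 10M-row file
raw = """id,Manufacturer,MPN
1,X,A
2,Y,A
3,Y,A
4,X,B
5,X,C
"""

# count appearances of each (Manufacturer, MPN) pair
pairs = Counter()
for row in csv.DictReader(StringIO(raw)):
    key = (row["Manufacturer"].strip().upper(), row["MPN"].strip().upper())
    pairs[key] += 1

# one output row per distinct pair, weight already computed;
# write this back out as a CSV and LOAD it with a plain SET a.appearances
aggregated = [(m, p, n) for (m, p), n in sorted(pairs.items())]
print(aggregated)
```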

How to fix a never-ending execution of a Cypher query - Neo4j Graph Database?

I'm dealing with the import of the Common Weakness Enumeration Catalog (a .json file) into the Neo4j Graph Database, using the Cypher query language and the APOC library. Although I import the fields Weaknesses, Views, and External_References properly, I have an execution problem (without any error) with the import of the Categories field, which executes without ever ending. Below I present the structure of the .json file and my Cypher code.
"Weakness_Catalog": {
"Weaknesses": {"Weakness":[...]}
"Categories": {"Category":[...]}
"Views": {"View":[...]}
"External_References": {"External_Reference":[...]}
}
Cypher Query
After several tests I think that the logic error is between the last 2 parts [with value....(catRef)]; without them, the query executes fine, in normal time. I also changed a setting parameter in the DB configuration file due to an error (cypher.lenient_create_relationship = true). And I tested different import sequences with the same bad results (Weaknesses, Categories, Views, External_References, etc.).
call apoc.load.json(files) yield value
unwind value.Weakness_Catalog.Weaknesses.Weakness as weakness
merge (i:GeneralInfo_CWE {Name:value.Weakness_Catalog.Name, Version:value.Weakness_Catalog.Version,
Date:value.Weakness_Catalog.Date, Schema:'https://cwe.mitre.org/data/xsd/cwe_schema_v6.4.xsd'})
merge(w:CWE {Name:'CWE-' + weakness.ID})
set w.Extended_Name=weakness.Name, w.Abstraction=weakness.Abstraction,
w.Structure=weakness.Structure, w.Status=weakness.Status, w.Description=weakness.Description,
w.Extended_Description= apoc.convert.toString(weakness.Extended_Description),
w.Likelihood_Of_Exploit=weakness.Likelihood_Of_Exploit,
w.Background_Details=apoc.convert.toString(weakness.Background_Details.Background_Detail),
w.Modes_Of_Introduction=[value in weakness.Modes_Of_Introduction.Introduction | value.Phase],
w.Submission_Date=weakness.Content_History.Submission.Submission_Date,
w.Submission_Name=weakness.Content_History.Submission.Submission_Name,
w.Submission_Organization=weakness.Content_History.Submission.Submission_Organization,
w.Modifications=[value in weakness.Content_History.Modification | apoc.convert.toString(value)],
w.Alternate_Terms=apoc.convert.toString(weakness.Alternate_Terms),
w.Notes=[value in weakness.Notes.Note | apoc.convert.toString(value)],
w.Affected_Resources=[value in weakness.Affected_Resources.Affected_Resource | value],
w.Functional_Areas=[value in weakness.Functional_Areas.Functional_Area | value]
merge (w)-[:belongsTo]->(i)
with w, weakness, value
unwind weakness.Related_Weaknesses.Related_Weakness as Rel_Weakness
match (cwe:CWE) where cwe.Name='CWE-' + Rel_Weakness.CWE_ID
merge (w)-[:Related_Weakness{Nature:Rel_Weakness.Nature}]->(cwe)
with w, weakness, value
unwind weakness.Applicable_Platforms as appPl
foreach (lg in appPl.Language |
merge(ap:Applicable_Platform{Type:'Language', Prevalence:lg.Prevalence,
Name:coalesce(lg.Name, 'NOT SET'), Class:coalesce(lg.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value, appPl
foreach (tch in appPl.Technology |
merge(ap:Applicable_Platform{Type:'Technology', Prevalence:tch.Prevalence,
Name:coalesce(tch.Name, 'NOT SET'), Class:coalesce(tch.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value, appPl
foreach (arc in appPl.Architecture |
merge(ap:Applicable_Platform{Type:'Architecture', Prevalence:arc.Prevalence,
Name:coalesce(arc.Name, 'NOT SET'), Class:coalesce(arc.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value, appPl
foreach (os in appPl.Operating_System |
merge(ap:Applicable_Platform{Type:'Operating System', Prevalence:os.Prevalence,
Name:coalesce(os.Name, 'NOT SET'), Class:coalesce(os.Class, 'NOT SET')})
merge(w)-[:Applicable_Platform]->(ap))
with w, weakness, value
foreach (example in weakness.Demonstrative_Examples.Demonstrative_Example |
merge(ex:Demonstrative_Example {Intro_Text:apoc.convert.toString(example.Intro_Text)})
set ex.Body_Text=[value in example.Body_Text | apoc.convert.toString(value)],
ex.Example_Code=[value in example.Example_Code | apoc.convert.toString(value)]
merge (w)-[:hasExample]->(ex))
with w, weakness, value
foreach (consequence in weakness.Common_Consequences.Consequence |
merge (con:Consequence{CWE:w.Name, Scope:[value in consequence.Scope | value]})
set con.Impact=[value in consequence.Impact | value],
con.Note=consequence.Note, con.Likelihood=consequence.Likelihood
merge(w)-[:hasConsequence]->(con))
with w, weakness, value
foreach (dec in weakness.Detection_Methods.Detection_Method |
merge(d:Detection_Method {Method:dec.Method})
merge(w)-[wd:canBeDetected{Description:apoc.convert.toString(dec.Description)}]->(d)
set wd.Effectiveness=dec.Effectiveness, wd.Effectiveness_Notes=dec.Effectiveness_Notes,
wd.Detection_Method_ID=dec.Detection_Method_ID)
with w, weakness, value
foreach (mit in weakness.Potential_Mitigations.Mitigation |
merge(m:Mitigation {Description:apoc.convert.toString(mit.Description)})
set m.Phase=[value in mit.Phase | value], m.Strategy=mit.Strategy,
m.Effectiveness=mit.Effectiveness, m.Effectiveness_Notes=mit.Effectiveness_Notes,
m.Mitigation_ID=mit.Mitigation_ID
merge(w)-[:hasMitigation]->(m))
with w, weakness, value
foreach (rap in weakness.Related_Attack_Patterns.Related_Attack_Pattern |
merge(cp:CAPEC {Name:rap.CAPEC_ID})
merge(w)-[:RelatedAttackPattern]->(cp))
with w, weakness, value
foreach (reference in value.Weakness_Catalog.External_References.External_Reference |
merge(r:External_Reference{Reference_ID:reference.Reference_ID})
set r.Author=[value in reference.Author | value], r.Title=reference.Title,
r.Edition=reference.Edition, r.URL=reference.URL,
r.Publication_Year=reference.Publication_Year, r.Publisher=reference.Publisher)
with w, weakness, value
unwind weakness.References.Reference as exReference
match (ref:External_Reference) where ref.Reference_ID=exReference.External_Reference_ID
merge(w)-[:hasExternal_Reference]->(ref)
with value
unwind value.Weakness_Catalog.Views.View as view
merge (v:CWE_VIEW{ViewID:view.ID})
set v.Name=view.Name, v.Type=view.Type, v.Status=view.Status,
v.Objective=apoc.convert.toString(view.Objective), v.Filter=view.Filter,
v.Notes=apoc.convert.toString(view.Notes),
v.Submission_Name=view.Content_History.Submission.Submission_Name,
v.Submission_Date=view.Content_History.Submission.Submission_Date,
v.Submission_Organization=view.Content_History.Submission.Submission_Organization,
v.Modification=[value in view.Content_History.Modification | apoc.convert.toString(value)]
foreach (value in view.Audience.Stakeholder |
merge (st:Stakeholder{Type:value.Type})
merge (v)-[rel:usefulFor]->(st)
set rel.Description=value.Description)
with v, view, value
unwind (case view.Members.Has_Member when [] then [null] else view.Members.Has_Member end) as members
optional match (MemberWeak:CWE{Name:'CWE-' + members.CWE_ID})
merge (v)-[:hasMember{ViewID:members.View_ID}]->(MemberWeak)
with v, view, value
unwind (case view.References.Reference when [] then [null] else view.References.Reference end) as viewExReference
optional match (viewRef:External_Reference{Reference_ID:viewExReference.External_Reference_ID})
merge (v)-[:hasExternal_Reference{ViewID:v.ViewID}]->(viewRef)
with value
unwind value.Weakness_Catalog.Categories.Category as category
merge (c:CWE_Category{CategoryID:category.ID})
set c.Name=category.Name, c.Status=category.Status, c.Summary=apoc.convert.toString(category.Summary),
c.Notes=apoc.convert.toString(category.Notes), c.Submission_Name=category.Content_History.Submission.Submission_Name,
c.Submission_Date=category.Content_History.Submission.Submission_Date,
c.Submission_Organization=category.Content_History.Submission.Submission_Organization,
c.Modification=[value in category.Content_History.Modification | apoc.convert.toString(value)]
with c, category
unwind (case category.References.Reference when [] then [null] else category.References.Reference end) as categoryExReference
optional match (catRef:External_Reference{Reference_ID:categoryExReference.External_Reference_ID})
merge (c)-[:hasExternal_Reference{CategoryID:c.CategoryID}]->(catRef)
So, the problem was that every time I use WITH, I'm working in nested loops. The more nested loops, the slower the query will be. A good way to speed it up is to create simpler queries when possible.
For example in the json file:
"Weakness_Catalog": {
"Weaknesses": {"Weakness":[...]}
"Categories": {"Category":[...]}
"Views": {"View":[...]}
"External_References": {"External_Reference":[...]}
}
I will execute one query for Weaknesses, one for Categories, one for Views, and one for External_References.
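The cost model behind this advice: each UNWIND multiplies the number of rows flowing through every later stage of the query, so chaining several of them behaves like nested loops, while separate queries run each stage independently. A toy illustration in Python (an analogy for the row-multiplication effect, not how Cypher actually executes):

```python
weaknesses = list(range(10))
refs_per_weakness = list(range(20))
members = list(range(30))

# one combined query: every later stage runs once per row produced so far
combined_ops = 0
for w in weaknesses:
    for r in refs_per_weakness:
        for m in members:
            combined_ops += 1

# separate queries: each stage iterates its own input once
separate_ops = len(weaknesses) + len(refs_per_weakness) + len(members)

print(combined_ops, separate_ops)  # the combined form does far more work
```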

Read CSV to Neo4j creating one node per column and relations

I am stuck on the command in Neo4j (I am a newbie) to create a database based on a CSV like this:
Country,Name1,Name2,Name3,Influence
France,John,Pete,Josh,2
Italy,Pete,Bepe,Juan,3
USA,Josh,Juan,Pete,1
Spain,Juan,John,,2
When I try to create one node per person (NameX), setting the relationships between the name columns and adding the Influence and Country tags, it fails because there are empty names.
How can I achieve this?
Thanks
UPDATE:
LOAD CSV WITH HEADERS FROM 'file:///diag.csv' AS row FIELDTERMINATOR ';'
MERGE (c:Country{name:row.Country})
WITH CASE row.name1 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name1] END as
name1List ,c
WITH CASE row.name2 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name2] END as
name2List ,c
WITH CASE row.name3 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name3] END as
name3List ,c
FOREACH (x IN name1List | MERGE (n:Node{name : x} )
MERGE (n)-[:REL_TYPE]->(c)
)
FOREACH (x IN name2List | MERGE (n:Node{name : x} )
MERGE (n)-[:REL_TYPE]->(c)
)
FOREACH (x IN name3List | MERGE (n:Node{name : x} )
MERGE (n)-[:REL_TYPE]->(c)
)
RETURN SUM(1)
Getting error:
Variable row not defined (line 4, column 11 (offset: 209))
"WITH CASE row.name2 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name2] END as >name2List ,c"
The last line has an empty Name3 field. Try adding Name3 to the last line in your data set.
Spain,Juan,John, {empty - fill this},2
LOAD CSV WITH HEADERS FROM 'file:///diag.csv' AS row FIELDTERMINATOR ';' MERGE (c:Country{name:row.Country})
WITH CASE row.name1 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name1] END as name1List ,
CASE row.name2 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name2] END as name2List ,
CASE row.name3 WHEN NULL THEN [] WHEN '' THEN [] ELSE [row.name3] END as name3List ,c,row
FOREACH (x IN name1List | MERGE (n:Node{name : x} ) MERGE (n)-[:REL_TYPE]->(c) )
FOREACH (x IN name2List | MERGE (n:Node{name : x} ) MERGE (n)-[:REL_TYPE]->(c) )
FOREACH (x IN name3List | MERGE (n:Node{name : x} ) MERGE (n)-[:REL_TYPE]->(c) ) RETURN SUM(1)
Here, with the help of Cypher's CASE expression, we create either an empty list (when the value is null or empty) or a list with one value, i.e. [row.name3]. After the CASE check, we can use this list to iterate and create a node with that name property. So when the value is null or empty, you iterate zero times and you won't get the error.
Finally, SUM(1) will give you the number of rows you processed, so you can cross-check whether you have processed all the rows in the CSV file.

Scala Spark - For loop in Data Frame and compare date

I have a Data Frame which has 3 columns like this:
---------------------------------------------
| x(string) | date(date) | value(int) |
---------------------------------------------
I want to SELECT all the rows [i] that satisfy all 4 conditions:
1) row [i] and row [i - 1] have the same value in column 'x'
AND
2) 'date' at row [i] == 'date' at row [i - 1] + 1 (two consecutive days)
AND
3) 'value' at row [i] > 5
AND
4) 'value' at row [i - 1] <= 5
I think maybe I need a for loop, but I don't know exactly how! Please help me!
Any help is much appreciated!
This can be done very easily with window functions; look at the lag function:
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import sqlContext.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
// test data
val list = Seq(
("x", "2016-12-13", 1),
("x", "2016-12-14", 7)
);
val df = sc.parallelize(list).toDF("x", "date", "value");
// add lags - so read previous value from dataset
val withPrevs = df
.withColumn ("prevX", lag('x, 1).over(Window.orderBy($"date")))
.withColumn ("prevDate", lag('date, 1).over(Window.orderBy($"date")))
.withColumn ("prevValue", lag('value, 1).over(Window.orderBy($"date")))
// filter values and select only needed fields
withPrevs
.where('x === 'prevX)
.where('value > lit(5))
.where('prevValue <= lit(5))
.where('date === date_add('prevDate, 1))
.select('x, 'date, 'value)
.show()
Note that without an order, i.e. by date, this cannot be done. A Dataset has no meaningful intrinsic order; you must specify the order explicitly.
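To make the lag semantics concrete, the same selection can be written in plain Python (a sketch of the logic, not Spark code): sort by date, then compare each row with its predecessor, which is exactly what lag(..., 1) over a date-ordered window provides.

```python
from datetime import date, timedelta

rows = [
    ("x", date(2016, 12, 13), 1),
    ("x", date(2016, 12, 14), 7),
    ("y", date(2016, 12, 14), 9),  # extra row: no qualifying predecessor
]

rows.sort(key=lambda r: r[1])  # the window must be ordered by date
selected = []
for prev, cur in zip(rows, rows[1:]):         # (row[i-1], row[i]) pairs
    same_x = cur[0] == prev[0]                       # condition 1
    consecutive = cur[1] == prev[1] + timedelta(1)   # condition 2
    if same_x and consecutive and cur[2] > 5 and prev[2] <= 5:  # 3 and 4
        selected.append(cur)
print(selected)
```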
If you have a DataFrame created, then all you need to do is call the filter function on the DataFrame with all your conditions.
For example:
df1.filter($"Column1" === 2 || $"Column2" === 3)
You can pass as many conditions as you want. It will return a new DataFrame with the filtered data.