Neo4j CSV file load with empty cells

I am loading a basic CSV file into a Neo4j database. It has two columns, "name" and "property". The name column always has a value, and the "property" column can either have a value or be blank. I would like the values to be linked with a relationship "property1".
I am using this code:
LOAD CSV WITH HEADERS FROM 'file:///fileName.csv' AS line
MERGE (Test_Document:A {name: line.name})
WITH line, Test_Document
FOREACH (x IN CASE WHEN line.property IS NULL THEN [] ELSE [1] END |
MERGE (Properties:B {property1: line.property})
WITH Test_Document, Properties
FOREACH (y IN CASE WHEN Properties IS NULL THEN [] ELSE [1] END |
MERGE (Test_Document)-[:property1]->(Properties))
I am getting an error message:
Unexpected end of input: expected whitespace, LOAD CSV, START, MATCH, UNWIND, MERGE, CREATE, SET, DELETE, REMOVE, FOREACH, WITH, CALL, RETURN or ')' (line 8, column 54 (offset: 423))
" MERGE (Test_Document)-[:property1]->(Properties))"
Any help would be appreciated.

There are two problems with your query:
The first FOREACH is missing its closing parenthesis
Properties is not in scope for the second FOREACH, since it is declared inside the previous FOREACH (variables declared within a FOREACH are scoped to that FOREACH clause only)
Try this:
LOAD CSV WITH HEADERS FROM 'file:///fileName.csv' AS line
MERGE (Test_Document:A {name: line.name})
WITH line, Test_Document
FOREACH (x IN CASE WHEN line.property IS NULL THEN [] ELSE [1] END |
MERGE (Properties:B {property1: line.property})
MERGE (Test_Document)-[:property1]->(Properties)
)

Another approach is to use WHERE so that the B node and relationship are only created for rows where the property value is not missing:
LOAD CSV WITH HEADERS FROM 'file:///fileName.csv' AS line
WITH line, line.name AS Name, line.property AS Property
MERGE (Test_Document:A {name: Name})
WITH Test_Document, Property
WHERE Property <> ""
MERGE (Properties:B {property1: Property})
MERGE (Test_Document)-[:property1]->(Properties)
This creates the B node and the link only when the property field is not empty.

Python 3 psycopg2 COPY from stdin failed: error in .read()

I am trying to apply the code found on this page, in particular the 'Copy Data from String Iterator' part of the Table of Contents, but I run into an issue with my code.
Since not all lines coming from the generator (here log_lines) can be imported into the PostgreSQL database, I try to keep only the valid lines (here row) using itertools.filterfalse, as in the code block below:
def copy_string_iterator(connection, log_lines) -> None:
    with connection.cursor() as cursor:
        create_staging_table(cursor)
        log_string_iterator = StringIteratorIO((
            '|'.join(map(clean_csv_value, (
                row['date'],
                row['time'],
                row['cs_uri_query'],
                row['s_contentpath'],
                row['sc_status'],
                row['s_computername'],
                ...
                row['sc_substates'],
                row['s_port'],
                row['cs_version'],
                row['c_protocol'],
                row.update({'cs_cookie':'x'}),
                row['timetakenms'],
                row['cs_uri_stem'],
            ))) + '\n'
            for row in filterfalse(lambda line: "#" in line.get('date'), log_lines)
        ))
        cursor.copy_from(log_string_iterator, 'log_table', sep='|')
When I run this, cursor.copy_from() gives me the following error:
QueryCanceled: COPY from stdin failed: error in .read() call
CONTEXT: COPY log_table, line 112910
I understand why this error happens: in the test file I use, only 112909 lines meet the filterfalse condition. But why does it try to copy line 112910 and throw the error instead of just stopping?
Since Python doesn't have a null-coalescing operator, add something like:
(map(clean_csv_value, (
    row['date'] if 'date' in row else None,
    ...
    row['cs_uri_stem'] if 'cs_uri_stem' in row else None,
))) + '\n')
for each of your fields, so you can handle any missing fields in the input data. Of course, the fields should be nullable in the database if you use None; otherwise, replace None with some default value for that field.
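If the rows are plain dicts, dict.get gives the same null-on-missing behavior more compactly. Here is a minimal sketch of that idea; the clean_csv_value helper below is an assumption modeled on the linked article (mapping None to COPY's \N null marker), and the field list is abbreviated for illustration:
from typing import Any, Iterator

def clean_csv_value(value: Any) -> str:
    # Assumed helper: COPY's text format uses \N for NULL; escape embedded newlines.
    if value is None:
        return r'\N'
    return str(value).replace('\n', '\\n')

FIELDS = ['date', 'time', 'cs_uri_query', 'cs_uri_stem']  # abbreviated for illustration

def rows_to_copy_lines(rows: Iterator[dict]) -> Iterator[str]:
    # row.get(field) returns None for missing keys, which clean_csv_value
    # turns into \N, so absent fields arrive as NULLs in the staging table.
    for row in rows:
        yield '|'.join(clean_csv_value(row.get(field)) for field in FIELDS) + '\n'
The same generator can then be wrapped in StringIteratorIO and passed to cursor.copy_from as before.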

Julia - Rewriting a CSV

Complete Julia newbie here.
I'd like to do some processing on a CSV. Something along the lines of:
using CSV
in_file = CSV.Source('/dir/in.csv')
out_file = CSV.Sink('/dir/out.csv')
for line in CSV.eachline(in_file)
replace!(line, "None", "")
CSV.writeline(out_file, line)
end
This is in pseudocode, those aren't existing functions.
Idiomatically, should I iterate on 1:CSV.countlines(in_file)? Do a while and check something?
If all you want to do is replace a string in the line, you do not need any CSV parsing utilities. All you do is read the file line by line, replace, and write. So:
infile = "/path/to/input.csv"
outfile = "/path/to/output.csv"
out = open(outfile, "w+")
for line in readlines(infile)
    newline = replace(line, "a", "b")  # replace "a" with "b"; newer Julia versions use replace(line, "a" => "b")
    write(out, newline)
end
close(out)
This will replicate the pseudocode you have in your question.
If you need to parse and read the CSV field by field, use the readcsv function in Base.
data=readcsv(infile)
typeof(data) #Array{Any,2}
This will return the data in the file as a 2 dimensional array. You can process this data any way you want, and write it back using the writecsv function.
for i in 1:size(data,1)  # iterate by rows
    data[i, 1] = "This is " * data[i, 1]  # add text to the first column
end
writecsv(outfile, data)
Documentation for these functions:
http://docs.julialang.org/en/release-0.5/stdlib/io-network/?highlight=readcsv#Base.readcsv
http://docs.julialang.org/en/release-0.5/stdlib/io-network/?highlight=readcsv#Base.writecsv

neo4j: How to load CSV using conditional correctly?

What I am trying to do is import a dataset with a tree data structure from CSV into neo4j. Nodes are stored along with their parent node and depth level (max 6) in the tree. So I try to check the depth level using CASE and then attach the node to its parent like this (creating a node just for the 1st level so far, for testing purposes):
export FILEPATH=file:///Example.csv
CREATE CONSTRAINT ON (n:Node) ASSERT n.id IS UNIQUE;
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS
FROM {FILEPATH} AS line
WITH DISTINCT line,
line.`Level` AS level,
line.`ParentCodeID_Cal` AS parentCode,
line.`CodeSet` AS codeSet,
line.`Category` AS nodeCategory,
line.`Type` AS nodeType,
line.`L1code` AS l1Code, line.`L1Description` AS l1Description, line.`L1Name` AS l1Name, line.`L1NameAb` AS l1NameAb,
line.`L2code` AS l2Code, line.`L2Description` AS l2Description, line.`L2Name` AS l2Name, line.`L2NameAb` AS l2NameAb,
line.`L3code` AS l3Code, line.`L3Description` AS l3Description, line.`L3Name` AS l3Name, line.`L3NameAb` AS l3NameAb,
line.`L1code` AS l4Code, line.`L4Description` AS l4Description, line.`L4Name` AS l4Name, line.`L4NameAb` AS l4NameAb,
line.`L1code` AS l5Code, line.`L5Description` AS l5Description, line.`L5Name` AS l5Name, line.`L5NameAb` AS l5NameAb,
line.`L1code` AS l6Code, line.`L6Description` AS l6Description, line.`L6Name` AS l6Name, line.`L6NameAb` AS l6NameAb,
codeSet + parentCode AS nodeId
CASE line.`Level`
WHEN '1' THEN CREATE (n0:Node{id:nodeId, description:l1Description, name:l1Name, nameAb:l1NameAb, category:nodeCategory, type:nodeType})
ELSE
END;
But I get this result:
WARNING: Invalid input 'S': expected 'l/L' (line 17, column 3 (offset: 982))
"CASE level "
  ^
I'm aware there is a mistake in the syntax.
I'm using neo4j 3.0.4 & Windows 10 (using neo4j shell running it with D:\Program Files\Neo4j CE 3.0.4\bin>java -classpath neo4j-desktop-3.0.4.jar org.neo4j.shell.StartClient).
You have several syntax errors. For example, a CASE expression cannot contain a CREATE clause.
In any case, you should be able to greatly simplify your Cypher. For example, this might suit your needs:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS
FROM {FILEPATH} AS line
WITH DISTINCT line, ('L' + line.Level) AS prefix
CREATE (:Node{
id: line.CodeSet + line.ParentCodeID_Cal,
description: line[prefix + 'Description'],
name: line[prefix + 'Name'],
nameAb: line[prefix + 'NameAb'],
category: line.Category,
type: line.Type})

Loading column from CSV file as a list assigned to a variable

Given is a function f(a,b,x,y) in gnuplot, where we have a 3D space with x, y, z (using splot).
Also given is a CSV file (without any header) of the following structure:
2 4
1 9
6 7
...
Is there a way to read out all the values of the first column and assign them to the variable a? Implicitly it should create something like:
a = [2,1,6]
b = [4,9,7]
The idea is to plot the function f(a,b,x,y), iterating over all (a,b) tuples.
I've read through other posts that I hoped were related, e.g. Reading dataset value into a gnuplot variable (start of X series), but I could not make any progress.
Is there a way to go through all rows of a CSV file with two columns, using each row's column values as parameters of a function?
Say you have the following data file called data:
1 4
2 5
3 6
You can load the 1st and 2nd column values to variables a and b easily using an awk system call (you can also do this using plot preprocessing with gnuplot but it's more complicated that way):
a=system("awk '{print $1}' data")
b=system("awk '{print $2}' data")
f(a,b,x,y)=a*x+b*y # Example function
set yrange [-1:1]
set xrange [-1:1]
splot for [i in a] for [j in b] f(i,j,x,y)
This is a gnuplot-only solution without the need for a system call:
a=""
b=""
splot "data" u (a=sprintf(" %s %f", a, $1), b=sprintf(" %s %f", b, \
$2)):(1/0):(1/0) not, for [i in a] for [j in b] f(i,j,x,y)

Parse txt file with shell

I have a txt file containing the output from several commands executed on a piece of networking equipment. I want to parse this txt file so I can sort the data and print it on an HTML page.
What is the best/easiest way to do this? Export every command's output to an array and then print the sorted arrays into the HTML?
Each command's output sits between ruler lines and is tabular data. Example:
*********************************************************************
# command 1
*********************************************************************
Object column1 column2 Total
-------------------------------------------------------------------
object 1 526 9484 10010
object 2 2 10008 10010
Object 3 0 20000 20000
*********************************************************************
# command 2
*********************************************************************
(... tabular data ...)
Can someone suggest any code or an example showing how to make this work?
Thanks!
This can be easily done in Python with this example code:
# Parse the tables delimited by '****' ruler lines and print each one as a list of rows.
rulers = 0
table = []
with open('input.txt') as f:
    for line in f.readlines():
        if '****' in line:
            rulers += 1
            if rulers == 2:
                # second ruler: the command header is done, table rows follow
                table = []
            elif rulers > 2:
                # this ruler both ends the current block and starts the next one
                print(table)
                table = []
                rulers = 1
            continue
        if line == '\n' or '----' in line or line.startswith('#'):
            continue
        table.append(line.split())
print(table)
It just prints a list of lists of the tabular values, but it can be formatted into HTML or any other format you need.
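For instance, a small helper can turn each parsed table into markup. This is only a sketch, and the render_html_table name is made up here; it treats the first parsed row as the header:
def render_html_table(rows):
    # rows is one parsed table (a list of lists); the first row is the header.
    header, *body = rows
    html = ['<table>']
    html.append('<tr>' + ''.join('<th>%s</th>' % cell for cell in header) + '</tr>')
    for row in body:
        html.append('<tr>' + ''.join('<td>%s</td>' % cell for cell in row) + '</tr>')
    html.append('</table>')
    return '\n'.join(html)
Calling print(render_html_table(table)) in place of print(table) above would emit the HTML directly.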
Import into your spreadsheet software. Export to HTML from there, and modify as needed.