Having trouble creating relationships from csv import in neo4j - csv

I have used a command like this to successfully create named nodes from csv:
load csv with headers from "file:/Users/lwyglend/Developer/flavourGroups.csv" as flavourGroup
create (fg {name: flavourGroup.flavourGroup})
set fg:flavourGroup
return fg
However I am not having any luck using LOAD CSV to create relationships with a similar command:
load csv with headers from "file:/Users/lwyglend/Developer/flavoursByGroup.csv" as relationship
match (flavour {name: relationship.flavour}),
(flavourGroup {name: relationship.flavourGroup})
create flavour-[:BELONGS_TO]->flavourGroup
From a CSV file with headers that looks a bit like this:
flavour, flavourGroup
fish, marine
caviar, marine
There are no errors, the command seems to execute, but no relationships are actually created.
If I do a simple match on name: fish and name: marine and then construct the BELONGS_TO relationship between the pre-existing fish and marine nodes with Cypher, the relationship is created fine.
Is there a problem with importing from CSV? Is my code wrong somehow? I have played around with a few different things, but as a total newb to Neo4j I would appreciate any advice you have.

Wiggle,
I don't know for sure if this is your problem, but I discovered that if you have spaces after your commas in your CSV file (as you show in your example), they appear to be included as part of the field names and field contents. When I made a CSV file like the one you showed and tried to load it, I found that it failed. When I took out the spaces, I found that it succeeded.
As a test, try this query:
LOAD CSV WITH HEADERS FROM "file:/Users/lwyglend/Developer/flavoursByGroup.csv" AS line
RETURN line.flavourGroup
then try this query:
LOAD CSV WITH HEADERS FROM "file:/Users/lwyglend/Developer/flavoursByGroup.csv" AS line
RETURN line.` flavourGroup`
Grace and peace,
Jim

I'm a bit late in answering your question, but I don't think the spaces alone are the culprit. In your example Cypher there is no association to the actual nodes in your database, only the CSV alias named "relationship".
Try something along this line instead:
load csv with headers from "file:/Users/lwyglend/Developer/flavoursByGroup.csv" as relationship
match (f:flavour), (fg:flavourGroup)
where f.name = relationship.flavour and
fg.name = relationship.flavourGroup
create (f)-[:BELONGS_TO]->(fg)
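For completeness, a version that folds in both suggestions (labels on the matched nodes, plus Jim's point about the stray spaces) might look like the following. It's only a sketch: it assumes the flavour nodes really do carry a :flavour label and that the spaces only show up in the relationship CSV, which is why the values are run through trim(). If the spaces turn out not to be there, the trim() calls are harmless.
load csv with headers from "file:/Users/lwyglend/Developer/flavoursByGroup.csv" as relationship
match (f:flavour {name: trim(relationship.flavour)})
match (fg:flavourGroup {name: trim(relationship.` flavourGroup`)})
create (f)-[:BELONGS_TO]->(fg)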

Related

trying to csv load relationships in NEO4J

I'm trying to csv load relationships. My nodes represent 80 priests and 200 churches. I am trying to do this - which works:
MATCH (p:Priest{name: "Baranowski, Alexander Sylvester" }),(c:Church{name: "St Wenceslaus"})
MERGE (p)-[:POSTED {posting:'1955-61', zip: '60618'}]->(c)
but with 800 rels.
My CSV sheet has each priest listed perhaps 10 times, so each needs to connect to 10 different churches.
My relationship properties are years and zip codes. Nothing I have read and tried has worked. Ideas?
Thanks for your help.
You can try this.
Put your CSV into the import folder of your Neo4j instance.
load csv with headers from "file:///postings.csv" as row
MERGE (p:Priest{name: row.priest })
MERGE (c:Church{name: row.church })
MERGE (p)-[:POSTED {posting:row.posting, zip: row.zip}]->(c)
Assuming posting is always present in the data, an alternative is to merge on posting alone and set zip only when the relationship is first created:
load csv with headers from "file:///postings.csv" as row
MERGE (p:Priest{name: row.priest })
MERGE (c:Church{name: row.church })
MERGE (p)-[rel:POSTED{posting:row.posting}]->(c)
ON CREATE SET rel.zip = row.zip
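Both variants assume postings.csv has one row per posting, with headers matching the row.* references in the queries - something like this made-up line based on the example in the question (note the quotes around the priest name, since it contains a comma):
priest,church,posting,zip
"Baranowski, Alexander Sylvester",St Wenceslaus,1955-61,60618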

Difficulties creating CSV table in Google BigQuery

I'm having some difficulties creating a table in Google BigQuery using CSV data that we download from another system.
The goal is to have a bucket in Google Cloud Platform to which we will upload one CSV file per month. These CSV files have around 3,000-10,000 rows of data, depending on the month.
The error I am getting from the job history in the Big Query API is:
Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 2949; errors: 1. Please look into the errors[] collection for more details.
When I am uploading the CSV files, I am selecting the following:
file format: csv
table type: native table
auto detect: tried automatic and manual
partitioning: no partitioning
write preference: WRITE_EMPTY (cannot change this)
number of errors allowed: 0
ignore unknown values: unchecked
field delimiter: comma
header rows to skip: 1 (also tried 0 and manually deleting the header rows from the csv files).
Any help would be greatly appreciated.
This usually points to an error in the structure of the data source (in this case your CSV file). Since your CSV file is small, you can run a little validation script to check that the number of columns is exactly the same across all rows of the CSV before running the export.
Maybe something like:
cat myfile.csv | awk -F, '{ a[NF]++ } END { for (n in a) print a[n], "rows have", n, "columns" }'
Or, you can bind it to a condition (let's say your number of columns should be 5):
ncols=$(cat myfile.csv | awk -F, '{ a[NF]++ } END { for (n in a) { print n; exit } }'); if [ "$ncols" -eq 5 ]; then python myexportscript.py; else echo "number of columns invalid: $ncols"; fi
It's impossible to point out the error without seeing an example CSV file, but it's very likely that your file is incorrectly formatted. As a result, one typo confuses BQ into thinking there are thousands. Let's say you have the following csv file:
Sally Whittaker,2018,McCarren House,312,3.75
Belinda Jameson 2017,Cushing House,148,3.52 //Missing a comma after the name
Jeff Smith,2018,Prescott House,17-D,3.20
Sandy Allen,2019,Oliver House,108,3.48
With the following schema:
Name(String) Class(Int64) Dorm(String) Room(String) GPA(Float64)
Since that row is missing a comma, everything in it is shifted one column over. In a large file this results in thousands of errors as BigQuery attempts to insert Strings into Ints/Floats.
I suggest you run your CSV file through a CSV validator before uploading it to BQ. It might find whatever is breaking the load. It's even possible that one of your fields has a comma inside the value, which throws everything off.
Another thing to investigate is whether all required columns receive an appropriate (non-null) value. A common cause of this error is casting data incorrectly, which returns a null value for a specific field in every row.
As mentioned by Scicrazed, this issue is usually generated because some file rows have an incorrect format, in which case you need to validate the content to figure out the specific error behind it.
I recommend checking the errors[] collection, which might contain additional information about what is making the process fail. You can do this by using the Jobs: get method, which returns detailed information about your BigQuery job, or by referring to the additionalErrors field of the JobStatus Stackdriver logs, which contains the same complete error data reported by the service.
I'm probably too late for this, but it seems the file has some errors (it can be a character that cannot be parsed or just a string in an int column) and BigQuery cannot upload it automatically.
You need to understand what the error is and fix it somehow. An easy way to do it is by running this command on the terminal:
bq --format=prettyjson show -j <JobID>
and you will be able to see additional logs for the error to help you understand the problem.
If the error happens only a few times, you can just increase the number of errors allowed.
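With the bq command-line tool that is the --max_bad_records flag; for example (dataset, table and bucket names here are only placeholders):
bq load --source_format=CSV --autodetect --skip_leading_rows=1 --max_bad_records=10 mydataset.mytable gs://mybucket/myfile.csv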
If it happens many times you will need to manipulate your CSV file before you upload it.
Hope it helps

Problems loading .csv files generated by the Social Network Benchmark into Neo4j

I've got problems in my work with Neo4j, and if you can help, I will thank you a lot!
My work is something like this. I've got to study and evaluate several graph databases, and to do that I must use a benchmark. The benchmark I'm using is the Social Network Benchmark (SNB).
I generate files with different setups, all according to the setup chosen. Something similar to this: forum_0.csv
These .csv files have certain headers, like this: id | title | creationDate | etc...
The next step in my project is to load them into Neo4j and build a database to test with certain queries, and my problems start at this point.
I have loaded some files into Neo4j, but others fail with errors and I don't understand why.
I'm using this code to load those files. In this example I load forum_0.csv into Neo4j.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM ".../forum_0.csv" AS csvLine
FIELDTERMINATOR "|"
CREATE (:FORUM_0 {id:csvLine.id, title:csvLine.title, creationDate:csvLine.creationDate})
And with this code, the data from this file is loaded into Neo4j correctly.
But with this file - forum_containerOf_post_0.csv, which has the header Forum.id | Post.id - I can't load the data correctly.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM ".../forum_containerOf_post_0.csv" AS csvLine
FIELDTERMINATOR "|"
CREATE (:FCOP_0 {Forum.id:csvLine.Forum.id, Post.id:csvLine.Post.id})
The problem here is that I can't access the id from forum_0.csv in the load process of forum_containerOf_post_0.csv. How can I access that id, or another property? Am I missing some Cypher code?
Is there something wrong in the process? Is there someone here who works with this - SNB and Neo4j?
Is there someone here who can help me with this problem?
I tried to explain my problem but if you have questions about my problem, feel free to ask.
Thank you for your time
The problem is with the headers in the second file. If you want to embed periods (.) in the header column names, you need to backtick the columns when you reference them in the LOAD CSV statement.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM ".../forum_containerOf_post_0.csv" AS csvLine
FIELDTERMINATOR "|"
CREATE (:FCOP_0 {Forum.id:csvLine.`Forum.id`, Post.id:csvLine.`Post.id`})
Yeah, you got it right in your answer, but with a little correction:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM ".../forum_containerOf_post_0.csv" AS csvLine
FIELDTERMINATOR "|"
CREATE (:FCOP_0 {`Forum.id`:csvLine.Forum.id, `Post.id`:csvLine.Post.id})
But I discovered another problem. This creates the FCOP_0 node label but without the properties that forum_containerOf_post_0.csv has. The two properties are Forum.id and Post.id, but with this process the properties are not loaded onto the respective nodes... it creates the FCOP_0 node label in Neo4j, but its nodes don't have those two properties.
Can you please help me?
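In case anyone else runs into this: the "little correction" above backticks the property keys but drops the backticks on the csvLine references, so csvLine.Forum.id is read as property id of the non-existent csvLine.Forum and evaluates to null - and null-valued properties are simply not stored, which is why the FCOP_0 nodes come out empty. A version with backticks in both places (a sketch, not tested against the SNB files) would be:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM ".../forum_containerOf_post_0.csv" AS csvLine
FIELDTERMINATOR "|"
CREATE (:FCOP_0 {`Forum.id`: csvLine.`Forum.id`, `Post.id`: csvLine.`Post.id`})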

How to Convert a Comma-Separated File to Pipe-Delimited File in Powershell?

I am downloading CSV files which are comma-separated. The problem I'm having is that the commas are screwing up my import into a database table (SQL Server). For example, I have a header column called HOTEL_NAME, but some of the names are like the following:
HOTEL_NAME
hilton
cambridge,the
The problem is that fields containing a comma in the hotel name will move to the adjacent column. I'm wondering if converting from CSV to a pipe-delimited format will work.
The problem is that I'm not sure how to get started. I've tried following the PowerShell documentation but get basic errors. I think this is because I'm new to PowerShell and not understanding something. Can someone please post a script that changes a comma-separated file to a pipe-delimited file?
Sorry if this is confusing; I'm finding the formatting on Stack Overflow to be a bit crazy.
Taken from Dealing with commas in a CSV file
Use " to wrap data that contains a comma.
For example
Server000,"Microsoft(R) Windows(R) Server 2003, Enterprise Edition"

CSV LOAD and updating existing nodes / creating new ones

I might be on the wrong track so I could use some helpful input. I receive data from other systems by CSV files which I can import into my DB with CSV LOAD. So far so good.
I got stuck when I needed to reload the CSV to follow up on updates. I cannot delete the former data, as I might have additional user input already attached, so I need a query that imports the CSV data, makes a match, and when it finds the node just uses SET to override the existing properties. That said, I am unsure how to catch the cases where there is no node in the DB (a new record) and we need to create one.
LOAD CSV FROM "file:xxx.csv" AS csvLine
MATCH (c:Customer {code:"ABC"})
SET c.name = name: csvLine[0]
***OPTIONAL MATCH // Here I am unsure how to express when the node is not found***
MERGE (c:Customer { name: csvLine[0], code: csvLine[1]})
So ideally Cypher would check if the node is there and update it by SETting the new property coming from the CSV, or - if the node cannot be found - create a new one with the CSV data.
And - as a sidenote: how would I find nodes that are not in the CSV file but are in the DB, in order to mark them as obsolete? (This might not be possible during the import, but maybe someone has an idea how to solve it in order to keep the DB clean of deleted records - which can only be detected by a comparison with the latest CSV import. Happy for every idea.)
Any idea or hint on how to write the query for updating the graph while importing?
You need to use MERGE's ON MATCH and/or ON CREATE handlers, see http://neo4j.com/docs/stable/query-merge.html#_use_on_create_and_on_match. I assume the customer code in the second column is the identifier - so the name in column one might change on updates:
LOAD CSV FROM "file:xxx.csv" AS csvLine
MERGE (c:Customer {code:csvLine[1]})
ON CREATE SET c.name = csvLine[0]
ON MATCH SET c.name = csvLine[0]
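For the sidenote about nodes that are still in the DB but no longer in the CSV: one common pattern (just a sketch - the lastSeen property and the batch tag are names you would have to introduce yourself) is to stamp every node the import touches and afterwards flag everything without the current stamp:
LOAD CSV FROM "file:xxx.csv" AS csvLine
MERGE (c:Customer {code: csvLine[1]})
SET c.name = csvLine[0], c.lastSeen = 'import-2015-06'
and then, after the import has run:
MATCH (c:Customer)
WHERE c.lastSeen IS NULL OR c.lastSeen <> 'import-2015-06'
SET c:Obsolete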