Neo4j CSV import datatype error

I am trying to import genealogy data into Neo4j from a CSV file. The dates are strings such as 2012 or 19860105. However, during import Neo4j interprets them as LongValue, which causes an error.
My import statement is either
LOAD CSV WITH HEADERS FROM 'file:///Neo4jPersonNodes1.csv' AS line FIELDTERMINATOR '|'
CREATE (:Person{RN: toInteger(line[0]),fullname: line[1],surname: line[2],name: line[3],sex: line[4],union_id: toInteger(line[5]),mn: line[6],BD: line[7],BDGed: line[8],DD: line[9],DDGed: line[10],bp_id: toInteger(line[11]),dp_id: toInteger(line[12]),BP: line[13],DP: line[14],kit: line[15]})
or, adding the toString() function
LOAD CSV WITH HEADERS FROM 'file:///Neo4jPersonNodes1.csv' AS line FIELDTERMINATOR '|'
CREATE (:Person{RN: toInteger(line[0]),fullname: toString(line[1]),surname: toString(line[2]),name: toString(line[3]),sex: toString(line[4]),union_id: toInteger(line[5]),mn: toString(line[6]),BD: toString(line[7]),BDGed: toString(line[8]),DD: toString(line[9]),DDGed: toString(line[10]),bp_id: toInteger(line[11]),dp_id: toInteger(line[12]),BP: toString(line[13]),DP: toString(line[14]),kit: toString(line[15])})
A sample of the CSV is
"RN"|"fullname"|"surname"|"name"|"sex"|"union_id"|"mn"|"BD"|"BDGed"|"DD"|"DDGed"|"bp_id"|"dp_id"|"BP"|"DP"|"kit"
"5"|"Ale Harmens Slump"|"Slump"|"Ale Harmens"|"M"|"313"|"3"|"18891223"|"23 Dec 1889"|"19890111"|"11 Jan 1989"|"23"|"4552"|"Echten, Friesland, Neth."|"Sebastopol, California"|""
The error message is:
Neo4j.Driver.V1.ClientException: 'Error when pulling unconsumed
session.run records into memory in session: Expected Long(7) to be a
org.neo4j.values.storable.TextValue, but it was a
org.neo4j.values.storable.LongValue'
I'm not sure why Neo4j does not treat the numeric string as a string.

Since your CSV file has a header row (and you specified WITH HEADERS), your Cypher code must treat line as a map (whose keys match your header names) instead of as an array.
For example, instead of line[0], you must use line.RN. If you fix all the uses of line accordingly, you should no longer get such errors.
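For instance, your first statement rewritten with map access looks like this (a sketch; the property names and toInteger conversions are taken unchanged from your query):
LOAD CSV WITH HEADERS FROM 'file:///Neo4jPersonNodes1.csv' AS line FIELDTERMINATOR '|'
CREATE (:Person {RN: toInteger(line.RN), fullname: line.fullname, surname: line.surname, name: line.name, sex: line.sex, union_id: toInteger(line.union_id), mn: line.mn, BD: line.BD, BDGed: line.BDGed, DD: line.DD, DDGed: line.DDGed, bp_id: toInteger(line.bp_id), dp_id: toInteger(line.dp_id), BP: line.BP, DP: line.DP, kit: line.kit})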

How can I Download to CSV in Neo4j

I've been trying to export some of the data in my graph and it returns this error:
Neo.ClientError.Statement.SyntaxError: Type mismatch: expected List<Node> but was Node (line 2, column 27 (offset: 77))"CALL apoc.export.csv.data(c,[], "contrib.csv",{})"
This is the query I ran:
MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
CALL apoc.export.csv.data(c, [], "contrib.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
What went wrong? :(
Thanks
The signature of the procedure apoc.export.csv.data is:
apoc.export.csv.data(nodes, rels, file, config)
It exports the given nodes and relationships as CSV to the provided file.
The nodes argument is a collection of nodes rather than a single node.
OLD: MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
CALL apoc.export.csv.data(c,[], "contrib.csv",{})
NEW: MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
WITH collect(c) as contribs
CALL apoc.export.csv.data(contribs, [], "contrib.csv", {})
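Putting it together (a sketch; the YIELD/RETURN clause is the one from your original query, unchanged):
MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
WITH collect(c) AS contribs
CALL apoc.export.csv.data(contribs, [], "contrib.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data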

Python 3: Replacing special characters in a .csv file after converting it from JSON

I am trying to develop a program using Python 3.6.4 which converts a JSON file into a CSV file and also cleans the data in the CSV file. For example:
My JSON File:
{emp:[{"Name":"Bo#b","email":"bob#gmail.com","Des":"Unknown"},
{"Name":"Martin","email":"mar#tin#gmail.com","Des":"D#eveloper"}]}
Problem 1:
After converting it to CSV, a blank row appears between every two rows:
**Name email Des**
[<BLANK ROW>]
Bo#b bob#gmail.com Unknown
[<BLANK ROW>]
Martin mar#tin#gmail.com D#eveloper
Problem 2:
In my code I am using emp but I need to use it dynamically.
fobj = open("D:/Users/shamiks/PycharmProjects/jsonSamle.txt")
jsonCont = fobj.read()
print(jsonCont)
fobj.close()
employee_parsed = json.loads(jsonCont)
emp_data = employee_parsed['employee']
We will not know the structure or content of the incoming JSON file in advance.
Problem 3:
I also need to remove all # characters from the CSV file.
To solve Problem 3, you can use str.replace (https://www.tutorialspoint.com/python/string_replace.htm).
For Problem 2, you can take the dictionary's keys and use the first one.
import json

fobj = open("D:/Users/shamiks/PycharmProjects/jsonSamle.txt")
jsonCont = fobj.read().replace("#", "")
print(jsonCont)
fobj.close()
employee_parsed = json.loads(jsonCont)
# dict.keys() can't be indexed in Python 3, so take the first key via an iterator
first_key = next(iter(employee_parsed))
emp_data = employee_parsed[first_key]
I can't solve Problem 1 without more code showing how you are exporting the result. It may be that your data has newlines in it, in which case you could add .replace("\n", "") and/or .replace("\r", "") after the previous replace, so the line would read fobj.read().replace("#", "").replace("\n", "").replace("\r", ""). If you are writing with csv.writer on Windows, also check that the output file is opened with newline='' (Python 3) or in binary mode (Python 2); forgetting this is the classic cause of a blank row between every record.
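For completeness, here is a minimal sketch of the whole conversion addressing all three problems, assuming the input is valid JSON (note that the sample shown has an unquoted emp key, which json.loads would reject) and Python 3; the file names are placeholders:
import csv
import json

# Read the JSON text and strip the unwanted '#' characters (Problem 3)
with open("jsonSample.txt") as fobj:
    json_cont = fobj.read().replace("#", "")

parsed = json.loads(json_cont)

# Discover the top-level key ("emp" in the sample) dynamically (Problem 2)
first_key = next(iter(parsed))
records = parsed[first_key]  # a list of dicts

# newline='' stops csv.writer from emitting blank rows on Windows (Problem 1)
with open("employees.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)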

How to omit the header when using Spark to read a CSV file?

I am trying to use Spark to read a CSV file in a Jupyter notebook. So far I have:
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").getOrCreate()
reviews_df = spark.read.option("header", "true").csv("small.csv")
reviews_df.collect()
This is what reviews_df looks like:
[Row(reviewerID=u'A1YKOIHKQHB58W', asin=u'B0001VL0K2', overall=u'5'),
Row(reviewerID=u'A2YB0B3QOHEFR', asin=u'B000JJSRNY', overall=u'5'),
Row(reviewerID=u'AAI0092FR8V1W', asin=u'B0060MYKYY', overall=u'5'),
Row(reviewerID=u'A2TAPSNKK9AFSQ', asin=u'6303187218', overall=u'5'),
Row(reviewerID=u'A316JR2TQLQT5F', asin=u'6305364206', overall=u'5')...]
But each row of the data frame contains the column names. How can I reformat the data so that it becomes:
[(u'A1YKOIHKQHB58W', u'B0001VL0K2', u'5'),
(u'A2YB0B3QOHEFR', u'B000JJSRNY', u'5')....]
A DataFrame always returns Row objects; that's why, when you call collect() on a DataFrame, you see:
Row(reviewerID=u'A1YKOIHKQHB58W', asin=u'B0001VL0K2', overall=u'5')
To get what you want, you can do:
reviews_df.rdd.map(lambda row: (row.reviewerID, row.asin, row.overall)).collect()
This will return a list of tuples of the row values.
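If you'd rather not hard-code the column names, note that Row is a subclass of tuple, so a generic variation (my addition, not part of the original answer) works for any schema:
reviews_df.rdd.map(tuple).collect()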

Setting properties of a node from a csv - Neo4j

This is an example of my csv file:
_id,official_name,common_name,country,started_by,
ABO.00,Association Football Club Bournemouth,Bournemouth,England,"{""day"":NumberInt(1),""month"":NumberInt(1),""year"":NumberInt(1899)}"
AOK.00,PAE Kerkyra,Kerkyra,Greece,"{""day"":NumberInt(30),""month"":NumberInt(11),""year"":NumberInt(1968)}"
I have to import this csv into Neo4j:
LOAD CSV WITH HEADERS FROM
'file:///Z:/path/to/file/team.csv' AS line
CREATE (p:Team {_id: line._id, official_name: line.official_name, common_name: line.common_name, country: line.country, started_by_day: line.started_by.day, started_by_month: line.started_by.month, started_by_year: line.started_by.year})
I get an error (Neo.ClientError.Statement.InvalidType) when setting started_by.day, started_by.month, and started_by.year.
How can I correctly set the started_by properties?
The format of your CSV should be the following:
_id,official_name,common_name,country,started_by_day,started_by_month,started_by_year
ABO.00,Association Football Club Bournemouth,Bournemouth,England,1,1,1899
Cypher:
LOAD CSV WITH HEADERS FROM 'file:///Z:/path/to/file/team.csv' as line
CREATE (p:Team {_id:line._id, official_name:line.official_name, common_name:line.common_name, country:line.country, started_by_day:line.started_by_day,started_by_month:line.started_by_month,started_by_year:line.started_by_year})
It looks like the date part in your CSV file is in JSON format; don't you need to parse that first?
line.started_by is this string:
"{""day"":NumberInt(30),""month"":NumberInt(11),""year"":NumberInt(1968)}"
There is no line.started_by.day.
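If regenerating the CSV is not an option, one possible workaround is to pull the numbers out of that string inside Cypher. This is only a sketch, under the assumption that the field always contains exactly day, month, and year in that order; because the NumberInt(...) wrappers make the field invalid JSON, it uses plain string splitting rather than a JSON parser:
LOAD CSV WITH HEADERS FROM 'file:///Z:/path/to/file/team.csv' AS line
WITH line, [part IN split(line.started_by, 'NumberInt(')[1..] | toInteger(split(part, ')')[0])] AS dmy
CREATE (p:Team {_id: line._id, official_name: line.official_name, common_name: line.common_name, country: line.country, started_by_day: dmy[0], started_by_month: dmy[1], started_by_year: dmy[2]})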

Using Python's csv.DictReader to search for a specific key and print its value

BACKGROUND:
I am having issues trying to search through some CSV files.
I've gone through the python documentation: http://docs.python.org/2/library/csv.html
about the csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds) object of the csv module.
My understanding is that csv.DictReader assumes the first line/row of the file contains the fieldnames; however, my CSV dictionary file simply starts with "key","value" rows and goes on for at least 500,000 lines.
My program will ask the user for the title (thus the key) they are looking for and present the value (the second column) on the screen using the print function. My problem is how to use csv.DictReader to search for a specific key and print its value.
Sample Data:
Below is an example of the csv file and its contents...
"Mamer","285713:13"
"Champhol","461034:2"
"Station Palais","972811:0"
So if I want to find "Station Palais" (the input), my output will be 972811:0. I am able to manipulate the string and create the overall program; I just need help with csv.DictReader. I appreciate any assistance.
EDITED PART:
import csv

def main():
    with open('anchor_summary2.csv', 'rb') as file_data:
        list_of_stuff = []
        reader = csv.DictReader(file_data, ("title", "value"))
        for i in reader:
            list_of_stuff.append(i)
        print list_of_stuff

main()
The documentation you linked to provides half the answer:
class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
[...] maps the information read into a dict whose keys are given by the optional fieldnames parameter. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.
It would seem that if the fieldnames parameter is passed, the given file will not have its first record interpreted as headers (the parameter will be used instead).
# file_data is the open file object from the question, not the filename
reader = csv.DictReader(file_data, ("title", "value"))
for i in reader:
    list_of_stuff.append(i)
which will (apparently; I've been having trouble with it) produce the following data structure:
[{"title": "Mamer", "value": "285713:13"},
{"title": "Champhol", "value": "461034:2"},
{"title": "Station Palais", "value": "972811:0"}]
which may need to be further massaged into a title-to-value mapping by something like this:
data = {}
for i in list_of_stuff:
    data[i["title"]] = i["value"]
Now just use the keys and values of data to complete your task.
And here it is as a dictionary comprehension:
data = {row["title"]: row["value"] for row in csv.DictReader(file_data, ("title", "value"))}
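Either way, once data is built, the lookup the question asks for is a plain dictionary access (using the sample rows above):
print(data["Station Palais"])  # prints 972811:0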
The currently accepted answer is fine, but there's a slightly more direct way of getting at the data: the dict() constructor in Python can take any iterable of key/value pairs.
In addition, your code might have issues on Python 3, because Python 3's csv module expects the file to be opened in text mode, not binary mode. You can make your code compatible with 2 and 3 by using io.open instead of open.
import csv
import io

with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
    data = dict(csv.reader(f))

print(data['Champhol'])
As a warning, if your csv file has two rows with the same value in the first column, the later value will overwrite the earlier value. (This is also true of the other posted solution.)
If your program really is only supposed to print the result, there's no reason to build a keyed dictionary at all.
import csv
import io

# Python 2/3 compat
try:
    input = raw_input
except NameError:
    pass

def main():
    # Case-insensitive & leading/trailing-whitespace-insensitive match
    user_city = input('Enter a city: ').strip().lower()
    with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
        for city, value in csv.reader(f):
            if user_city == city.lower():
                print(value)
                break
        else:
            print("City not found.")

if __name__ == '__main__':
    main()
The advantage of this technique is that the CSV isn't loaded into memory and the data is only iterated over once. I also added a little code that calls lower() on both sides of the comparison to make the match case-insensitive. Another advantage is that if the city the user requests is near the top of the file, the search returns almost immediately and stops reading the file.
With all that said, if searching performance is your primary consideration, you should consider storing the data in a database.