How can I download to CSV in Neo4j?

I've been trying to export certain data from my graph and it returns this error:
Neo.ClientError.Statement.SyntaxError: Type mismatch: expected List<Node> but was Node (line 2, column 27 (offset: 77))
"CALL apoc.export.csv.data(c,[], "contrib.csv",{})"
This is the query I ran:
MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
CALL apoc.export.csv.data(c, [], "contrib.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
What went wrong? :(
Thanks

The signature of apoc.export.csv.data is:
apoc.export.csv.data(nodes,rels,file,config)
It exports the given nodes and relationships as CSV to the provided file.
The first argument, nodes, must be a collection of nodes rather than a single node.
OLD: MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
CALL apoc.export.csv.data(c,[], "contrib.csv",{})
NEW: MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
WITH collect(c) as contribs
CALL apoc.export.csv.data(contribs, [], "contrib.csv", {})
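If you are driving the export from Python, here is a minimal sketch using the official neo4j driver with the corrected query; the URI and credentials are placeholder assumptions.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (c:Contrib) WHERE c.nationality CONTAINS "|"
WITH collect(c) AS contribs
CALL apoc.export.csv.data(contribs, [], "contrib.csv", {})
YIELD file, nodes, relationships, properties, done
RETURN file, nodes, relationships, properties, done
"""

with driver.session() as session:
    # The file is written server-side, so apoc.export.file.enabled
    # must be set to true in the Neo4j configuration.
    print(session.run(query).data())

driver.close()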

Related

How to convert CSV to JSON using expression transformation in Informatica?

I have a CSV file that I am converting to JSON array format. Below are the row-wise operations in the expression transformation for the two fields:
region
country
Json(output port): '{'||'"region": '||'"'||Region||'",'||'"Country": '||'"'||Country||'"'||'},'
output:
{"region": "Australia and Oceania", "Country": "Tuvalu"},
This output is saved in a text file with session file properties as fixed width.
second mapping expression:
JSON(input)
V_JSON_start(variable port):INSTR(JSON,'{',1,1)
V_JSON_end(variable port):instr(JSON,'}',1,10)
O_Json(output port):'['||substr(JSON,V_JSON_start,V_JSON_end)||']'
output:
[{"region": "Australia and Oceania","Country": "Tuvalu"},
{"region": "Central America and the Caribbean","Country": "Grenada"}]
When I try to fetch the next 10 records as JSON, it pulls 20 records instead of ten.
Below is the expression:
JSON(input)
V_JSON_start(variable port):INSTR(JSON,'{',1,11)
V_JSON_end(variable port):instr(JSON,'}',1,20)
O_Json(output port):'['||substr(JSON,V_JSON_start,V_JSON_end)||']'
Kindly look into this and correct where I am going wrong.
Input: flat file (CSV with two fields, region and country)
Expected output (5 sessions, each session with 10 records in JSON format):
eg., [{"region":"value","country":"value"},
{"region":"value","country":"value"}]
session1 (csv to json) --> session2 -- session3 -- session4 -- session5 -- session6 (all parallel sessions using the file of the 1st session, 5 records in JSON format)
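One thing worth checking in the second expression: Informatica's SUBSTR takes a length as its third argument, not an end position, so passing the position of the 20th '}' while starting at the 11th '{' reads roughly ten records past the intended end. Here is a plain-Python sketch of that arithmetic, with made-up data (50 records, matching the 5-sessions-of-10 setup described above):

json_str = "".join(
    '{"region": "r%d", "Country": "c%d"},' % (i, i) for i in range(1, 51)
)

def instr(s, sub, start, occurrence):
    # Mimic Informatica's INSTR: 1-based position of the Nth occurrence.
    pos = start - 1
    for _ in range(occurrence):
        idx = s.find(sub, pos)
        if idx == -1:
            return 0
        pos = idx + 1
    return pos

start = instr(json_str, '{', 1, 11)  # opening brace of record 11
end = instr(json_str, '}', 1, 20)    # closing brace of record 20

# Buggy: treating the position `end` as SUBSTR's length argument reads far
# past record 20, which is why 20 records come back instead of 10.
buggy = '[' + json_str[start - 1:start - 1 + end] + ']'

# Fixed: the length should be end - start + 1.
fixed = '[' + json_str[start - 1:start - 1 + (end - start + 1)] + ']'

print(buggy.count('{'))  # 20
print(fixed.count('{'))  # 10

So the corrected output port would presumably be '['||substr(JSON, V_JSON_start, V_JSON_end - V_JSON_start + 1)||']'.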

Loading CSV in Neo4j: "Neo.ClientError.Statement.SemanticError: Cannot merge node using null property value for Test1"

I am using the grades.csv data from the link below:
https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html
I noticed that all the strings in the CSV file were wrapped in double quotes ("") and it caused this error message:
Neo.ClientError.Statement.SemanticError: Cannot merge node using null property value for Test1
so I removed the quotes ("") from the headers.
The code I was trying to run:
LOAD CSV WITH HEADERS FROM 'file:///grades.csv' AS row
MERGE (t:Test1 {Test1: row.Test1})
RETURN count(t);
error message:
Neo.ClientError.Statement.SyntaxError: Type mismatch: expected Any, Map, Node, Relationship, Point, Duration, Date, Time, LocalTime, LocalDateTime or DateTime but was List<String> (line 2, column 24 (offset: 65))
"MERGE (t:Test1 {Test1: row.Test1})
Basically, you cannot merge a node using a null property value. In your case, Test1 must be null for one or more lines in your file. If you don't see blank values for Test1, check whether there is a blank line at the end of the file.
You can also handle the null check before the MERGE using WITH and WHERE, like:
LOAD CSV ...
WITH row
WHERE row.Test1 IS NOT NULL
MERGE (t:Test1 {Test1: row.Test1})
RETURN count(t);
The issues are:
The file is missing a comma after the Test1 value in the row for "Airpump".
The file has white spaces between the values in each row. (Search for the regexp ", +" and replace with ",".)
Your query should work after fixing the above issues.
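A quick sketch of that cleanup in Python, with assumed filenames; note that a blanket comma-plus-spaces replacement would also touch any quoted values that legitimately contain ", ", which grades.csv does not have.

import re

# Read the raw file, collapse the whitespace after each comma,
# and write a cleaned copy rather than editing in place.
with open('grades.csv') as src:
    text = src.read()

cleaned = re.sub(r', +', ',', text)

with open('grades_clean.csv', 'w') as dst:
    dst.write(cleaned)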

How do I search for a string in this JSON with Python

My JSON file looks something like:
{
"generator": {
"name": "Xfer Records Serum",
....
},
"generator": {
"name: "Lennar Digital Sylenth1",
....
}
}
I ask the user for a search term, and the input is searched for in the name key only. All matching results are returned, so if I input just 's', both of the above would be returned. Please also explain how to return the names of all objects that are generators. The simpler the method, the better for me. I use the json library, but if another library is required, that's not a problem.
Before switching to JSON I tried XML but it did not work.
If your goal is just to search all name properties, this will do the trick:
import re

def search_names(term, lines):
    # Raw strings keep the \s escapes intact; re.I makes the match case-insensitive.
    name_search = re.compile(r'\s*"name"\s*:\s*"(.*' + term + r'.*)",?$', re.I)
    return [x.group(1) for x in [name_search.search(y) for y in lines] if x]

with open('path/to/your.json') as f:
    lines = f.readlines()

print(search_names('s', lines))
which would return both names you listed in your example.
The way the search_names() function works is that it builds a regular expression matching any line that starts with "name": " (with a varying amount of whitespace), followed by your search term with any other characters around it, terminated by " and an optional , at the end of the string. It then applies that to each line from the file. Finally, it filters out any non-matching lines and returns the value of the name property (the capture group contents) for each match.
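If the file can first be restructured into valid JSON (as posted it has duplicate "generator" keys, which json.load would silently collapse into one), a sketch using the json library is also possible; the top-level "generators" list here is an assumption about that restructuring.

import json

def search_names_json(term, path):
    with open(path) as f:
        data = json.load(f)
    term = term.lower()
    # Case-insensitive substring match on each generator's name;
    # the "generators" key is assumed, not part of the original file.
    return [g["name"] for g in data["generators"] if term in g["name"].lower()]

print(search_names_json('s', 'path/to/your.json'))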

JSON as input in a mapreduce

I have a JSON file that contains fields such as machine_id, category, and ... The category field contains machine states such as "alarm" and "failure". I simply want to see how many times each machine_id has been reported, using rmr2.
For example, if I have the following:
machine_id, state
48, alarm
39, failure
48, utilization
I would like to see this result:
48,2
39,1
What I did:
I wrote a simple MapReduce job to read the values of the JSON file and used them as input to a second MapReduce job. The code is:
mp = function(k, v) {
  machine_id = v$machine_id
  keyval(machine_id, 1)
}
rd = function(k, v) keyval(k, length(v))

mapreduce(
  input = mapreduce(input = '/user/cloudera/sample.json',
                    input.format = "json",
                    map = function(k, v) keyval(k, v)),
  map = mp,
  reduce = rd
)
Unfortunately, it returns only the last two values of the JSON file. It seems that it doesn't read the entire JSON file. I would appreciate any help.
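For reference, the aggregation being attempted here (a count of reports per machine_id) is equivalent to this plain-Python sketch; the records are made up and the field names follow the question.

from collections import Counter

# Hypothetical parsed records standing in for the JSON file's contents.
records = [
    {"machine_id": 48, "category": "alarm"},
    {"machine_id": 39, "category": "failure"},
    {"machine_id": 48, "category": "utilization"},
]

# Map step: emit (machine_id, 1); reduce step: sum the ones per key.
counts = Counter(r["machine_id"] for r in records)
for machine_id, n in counts.items():
    print("%d,%d" % (machine_id, n))  # 48,2 and 39,1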

Using Python's csv.DictReader to search for a specific key and print its value

BACKGROUND:
I am having issues trying to search through some CSV files.
I've gone through the python documentation: http://docs.python.org/2/library/csv.html
about the csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds) object of the csv module.
My understanding is that csv.DictReader assumes the first line/row of the file contains the fieldnames; however, my CSV dictionary file simply starts with "key","value" rows and goes on for at least 500,000 lines.
My program will ask the user for the title (thus the key) they are looking for, and present the value (the 2nd column) on the screen using the print function. My problem is how to use csv.DictReader to search for a specific key and print its value.
Sample Data:
Below is an example of the csv file and its contents...
"Mamer","285713:13"
"Champhol","461034:2"
"Station Palais","972811:0"
So if I want to find "Station Palais" (input), my output will be 972811:0. I am able to manipulate the string and create the overall program; I just need help with csv.DictReader. I appreciate any assistance.
EDITED PART:
import csv

def main():
    with open('anchor_summary2.csv', 'rb') as file_data:
        list_of_stuff = []
        reader = csv.DictReader(file_data, ("title", "value"))
        for i in reader:
            list_of_stuff.append(i)
        print list_of_stuff

main()
The documentation you linked to provides half the answer:
class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
[...] maps the information read into a dict whose keys are given by the optional fieldnames parameter. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.
It would seem that if the fieldnames parameter is passed, the given file will not have its first record interpreted as headers (the parameter will be used instead).
# file_data is the open file object, not the filename
reader = csv.DictReader(file_data, ("title", "value"))
for i in reader:
    list_of_stuff.append(i)
which will (apparently; I've been having trouble with it) produce the following data structure:
[{"title": "Mamer", "value": "285713:13"},
{"title": "Champhol", "value": "461034:2"},
{"title": "Station Palais", "value": "972811:0"}]
which may need to be further massaged into a title-to-value mapping by something like this:
data = {}
for i in list_of_stuff:
    data[i["title"]] = i["value"]
Now just use the keys and values of data to complete your task.
And here it is as a dictionary comprehension:
data = {row["title"]: row["value"] for row in csv.DictReader(file_data, ("title", "value"))}
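Either way, the final lookup is then just a dictionary access; a tiny usage sketch (the prompt text is made up):

title = input('Enter a title: ')  # raw_input on Python 2
print(data.get(title, 'Not found'))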
The currently accepted answer is fine, but there's a slightly more direct way of getting at the data: the dict() constructor in Python can take any iterable of key/value pairs.
In addition, your code might have issues on Python 3, because Python 3's csv module expects the file to be opened in text mode, not binary mode. You can make your code compatible with 2 and 3 by using io.open instead of open.
import csv
import io

with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
    data = dict(csv.reader(f))

print(data['Champhol'])
As a warning, if your csv file has two rows with the same value in the first column, the later value will overwrite the earlier value. (This is also true of the other posted solution.)
If your program really is only supposed to print the result, there's really no reason to build a keyed dictionary.
import csv
import io

# Python 2/3 compat
try:
    input = raw_input
except NameError:
    pass

def main():
    # Case-insensitive & leading/trailing whitespace insensitive
    user_city = input('Enter a city: ').strip().lower()
    with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
        for city, value in csv.reader(f):
            if user_city == city.lower():
                print(value)
                break
        else:
            print("City not found.")

if __name__ == '__main__':
    main()
The advantage of this technique is that the CSV isn't loaded into memory and the data is only iterated over once. I also added a little code that calls lower() on both keys to make the match case-insensitive. Another advantage is that if the city the user requests is near the top of the file, it returns almost immediately and stops looking through the file.
With all that said, if searching performance is your primary consideration, you should consider storing the data in a database.
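As a sketch of that last suggestion, here is what a sqlite3 version might look like; the database file and table name are assumptions. The CSV is loaded once, and every later search hits an indexed table instead of rescanning the file.

import csv
import io
import sqlite3

conn = sqlite3.connect('anchor_summary2.db')
conn.execute('CREATE TABLE IF NOT EXISTS anchors (title TEXT PRIMARY KEY, value TEXT)')

# One-time load; INSERT OR REPLACE keeps the last of any duplicate titles.
with io.open('anchor_summary2.csv', 'r', newline='', encoding='utf-8') as f:
    conn.executemany('INSERT OR REPLACE INTO anchors VALUES (?, ?)', csv.reader(f))
conn.commit()

row = conn.execute('SELECT value FROM anchors WHERE title = ?',
                   ('Station Palais',)).fetchone()
print(row[0] if row else 'City not found.')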