Neo4j apoc load json: No data in Neo4j - json

I am exporting my whole Neo4j database to JSON using the APOC procedures and then importing it again the same way. The import query executes successfully, but I cannot find any data in Neo4j.
Export query:
CALL apoc.export.json.all('complete-db.json',{useTypes:true, storeNodeIds:false})
Import query:
CALL apoc.load.json('complete-db.json')
When I execute:
MATCH (n) RETURN n
It shows no results found.

This is a little confusing, but apoc.load.json just reads (loads) data from the JSON file or URL.
It doesn't import the data or create the graph. You need to create the graph (nodes and/or relationships) yourself with Cypher statements.
In this case you just read the file and didn't do anything with it, so the statement executed successfully. Your query isn't an import query, it's a JSON load query.
Refer to the following example of importing with apoc.load.json:
CALL apoc.load.json('complete-db.json') YIELD value
UNWIND value.items AS item
CREATE (i:Item {name: item.name, id: item.id})

apoc.import.json does what you need.
The export-import process:
Export:
CALL apoc.export.json.all('file:///complete-db.json', {useTypes:true, storeNodeIds:false})
Import:
CALL apoc.import.json("file:///complete-db.json")
(#rajendra-kadam explains why your version does not work, and this is the complementary API call to apoc.export.json.all that you were expecting.)
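If you would rather drive the same pair of calls from a script instead of the browser, a minimal sketch using the official Neo4j Python driver (placeholder URI and credentials; the file path resolves inside Neo4j's import directory) could look like this:
from neo4j import GraphDatabase

# Placeholder connection details -- swap in your own.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Export the whole database to JSON in the import directory...
    session.run("CALL apoc.export.json.all('file:///complete-db.json', "
                "{useTypes:true, storeNodeIds:false})")
    # ...and (typically against an empty database) import it back.
    session.run("CALL apoc.import.json('file:///complete-db.json')")

driver.close()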

Related

Is opening multiple cursors in MySQL a costly operation when using Python?

I have a few tables in MySQL that need to be loaded into Teradata. I am going with a file-based approach: I export the MySQL tables into delimited files and then try to load those files into Teradata. The clarification I am looking for is this: we maintain a MySQL stored procedure to extract the data from the tables, and I call this stored procedure from a Python script to fetch the table data. Is it good/optimal to use a stored procedure here? To get the list of tables, retention period, database and other details, I create one cursor to fetch data from a table, and then I have to create another cursor to call the stored procedure (see the code below).
Is creating a cursor a costly operation in MySQL?
Instead of a table, is it a good idea to maintain the list of tables, retention period, database and other details in a flat file?
Please share your thoughts.
import sys
import mysql.connector
from mysql.connector import MySQLConnection, Error
import csv

output_file_path = '/home/XXXXXXX/'
sys.path.insert(0, '/home/XXXXXXX/')
from mysql_config import *

def stored_proc_call(tbl):
    print('SP call:', tbl)
    conn_sp = mysql.connector.connect(host=dsn, database=database, user=username,
                                      passwd=password, allow_local_infile=True)
    conn_sp_cursor = conn_sp.cursor(buffered=True)
    conn_sp_cursor.callproc('mysql_stored_proc', [tbl])
    output_file = output_file_path + tbl + '.txt'
    print('output_file:', output_file)
    with open(output_file, 'w') as filehandle:
        writer = csv.writer(filehandle, delimiter='\x10')
        for result in conn_sp_cursor.stored_results():
            print('Stored proc cursor:{}, value:{}'.format(type(result), result))
            for row in result:
                writer.writerow(row)
                #print('cursor row', row)

# Allow loading client-side files using the LOAD DATA LOCAL INFILE statement.
con = mysql.connector.connect(host=dsn, database=database, user=username,
                              passwd=password, allow_local_infile=True)
cursor = con.cursor(buffered=True)
cursor.execute("select * from table")
for row in cursor:
    print('Archive table cursor:{}, value:{}'.format(type(row), row))
    (db, table, col, orgid, *allvalues) = row
    stored_proc_call(table)
    #print('db:{}, table:{}, col:{}, orgid:{}, ret_period:{}, allvalues:{}'.format(db,table,col,orgid,ret_period,allvalues))
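For illustration of the cursor question itself: creating an extra cursor on an already-open connection is generally cheap compared with opening a second connection, so a minimal sketch (same mysql_config assumptions and placeholder query as the code above) could reuse one connection for both the metadata query and the stored-procedure calls:
import mysql.connector
from mysql_config import *   # assumption, as above: provides dsn, database, username, password

con = mysql.connector.connect(host=dsn, database=database, user=username,
                              passwd=password, allow_local_infile=True)

# One buffered cursor for the metadata table (rows are fetched client-side)...
meta_cursor = con.cursor(buffered=True)
meta_cursor.execute("select * from table")   # placeholder query, as in the question

# ...and a second cursor, on the same connection, for the stored-procedure calls.
sp_cursor = con.cursor(buffered=True)
for (db, table, col, orgid, *allvalues) in meta_cursor:
    sp_cursor.callproc('mysql_stored_proc', [table])
    for result in sp_cursor.stored_results():
        for row in result:
            pass  # write the row out, as stored_proc_call() does

meta_cursor.close()
sp_cursor.close()
con.close()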

ID as int in neo4j bulk import produces error in relationships import

I use the admin import tool of Neo4j to import bulk data in CSV format. I use Integer as the ID datatype in the header [journal:ID:int(Journal-ID)], and the node import works fine. When the import tool comes to the relationships, I get an error that the referred node is missing.
It seems like the relationship import is looking up the ID as a String.
I already tried changing the type of the ID in the relationship file as well, but got another error. I found no way to specify the ID as int in the relationship file.
Here is a minimal example. Let's say we have two node types with the headers:
journal:ID:int(Journal-ID)
and
documentID:ID(Document-ID),title
and the example files journal.csv:
"123"
"987"
and document.csv:
"PMID:1", "Title"
"PMID:2", "Other Title"
We also have a relation "hasDocument" with the header:
:START_ID(Journal-ID),:END_ID(Document-ID)
and the example file relation.csv:
"123", "PMID:1"
When running the import I get the error:
Error in input data
Caused by:123 (Journal-ID)-[hasDocument]->PMID:1 (Document-ID) referring to missing node 123
I tried to specify the relation header as
:START_ID:int(Journal-ID),:END_ID(Document-ID)
but this also produces an error.
The command to start the import is:
neo4j-admin import --nodes:Document="document-header.csv,documentNodes.csv" --nodes:Journal="journal-header.csv,journalNodes.csv" --relationships:hasDocument="hasDocument-header.csv,relationsHasDocument.csv"
Is there a way to specify the ID in the relationship file as an Integer, or is there another solution to this problem?
It doesn't seem to be supported. The documentation doesn't mention it, and the code doesn't have such a test case.
You could import the data with String IDs and cast them after you start the database:
MATCH (j:Journal)
SET j.id = toInteger(j.id)
If your dataset is large, you can use APOC with apoc.periodic.iterate:
call apoc.periodic.iterate("
MATCH (j:Journal) RETURN j
","
SET j.id = toInteger(j.id)
",{batchSize:10000})

Writing to MySQL with Python without using SQL strings

I am importing data into my Python 3 environment and then writing it to a MySQL database. However, there are a lot of different data tables, so writing out each INSERT statement isn't really practical, especially since some tables have 50+ columns.
Is there a good way to create a table in MySQL directly from a dataframe, and then send insert commands to that same table using a dataframe of the same format, without having to type out all the column names? I started trying to pull out the column names, format them, and concatenate everything into a string, but it is extremely messy.
Ideally there is a function out there to directly handle this. For example:
apiconn.request("GET", url, headers=datheaders)
# pull in some JSON data from an API
eventres = apiconn.getresponse()
eventjson = json.loads(eventres.read().decode("utf-8"))

# create a dataframe from the data
eventtable = json_normalize(eventjson)

dbconn = pymysql.connect(host='hostval',
                         user='userval',
                         passwd='passval',
                         db='dbval')
cursor = dbconn.cursor()

sql = sqltranslate(table='eventtable', fun='append')
# where sqltranslate() is some magic function that takes a dataframe and
# creates SQL commands that pymysql can execute.
cursor.execute(sql)
What you want is a way to abstract the generation of the SQL statements.
A library like SQLAlchemy will do a good job, including a powerful way to construct DDL, DML, and DQL statements without needing to directly write any SQL.
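Since the question already has a pandas DataFrame, a minimal sketch of that SQLAlchemy route (placeholder credentials and table name) can let DataFrame.to_sql() and a SQLAlchemy engine generate both the CREATE TABLE and the INSERT statements:
from sqlalchemy import create_engine
import pandas as pd

# Placeholder connection string -- swap in the real host/user/password/db.
engine = create_engine("mysql+pymysql://userval:passval@hostval/dbval")

# Any DataFrame works; in the question it would be the json_normalize() result.
eventtable = pd.DataFrame([{"id": 1, "name": "example"}])

# to_sql() creates the table from the DataFrame's dtypes if it doesn't exist
# and generates the INSERTs itself, so no column names are typed out by hand.
eventtable.to_sql("eventtable", con=engine, if_exists="append", index=False)
Using if_exists="replace" instead would drop and recreate the table on each run.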

ETL script in Python to load data from another server's .csv file into MySQL

I work as a business analyst and am new to Python.
In one of my projects, I want to extract data from a .csv file and load that data into my MySQL DB (staging).
Can anyone guide me with sample code and the frameworks I should use?
Here is a simple program that creates a SQLite database. You can read the CSV file and use dynamic_data_entry() to insert into your desired target table.
import sqlite3
import time
import datetime
import random

conn = sqlite3.connect('test.db')
c = conn.cursor()

def create_table():
    c.execute('create table if not exists stuffToPlot(unix REAL, datestamp TEXT, keyword TEXT, value REAL)')

def data_entry():
    c.execute("INSERT INTO stuffToPlot VALUES(1452549219,'2016-01-11 13:53:39','Python',6)")
    conn.commit()
    c.close()
    conn.close()

def dynamic_data_entry():
    unix = time.time()
    date = str(datetime.datetime.fromtimestamp(unix).strftime('%Y-%m-%d %H:%M:%S'))
    keyword = 'python'
    value = random.randrange(0, 10)
    c.execute("INSERT INTO stuffToPlot(unix,datestamp,keyword,value) values(?,?,?,?)",
              (unix, date, keyword, value))
    conn.commit()

def read_from_db():
    c.execute('select * from stuffToPlot')
    #data = c.fetchall()
    #print(data)
    for row in c.fetchall():
        print(row)

read_from_db()
c.close()
conn.close()
You can iterate through the data in the CSV and load it into sqlite3 (a minimal sketch follows the link below). Please refer to the link below as well:
Quick easy way to migrate SQLite3 to MySQL?
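A minimal sketch of that CSV iteration (hypothetical input.csv whose columns match the stuffToPlot table above; the same pattern works against MySQL with a different connector):
import csv
import sqlite3

conn = sqlite3.connect('test.db')
c = conn.cursor()
c.execute('create table if not exists stuffToPlot(unix REAL, datestamp TEXT, keyword TEXT, value REAL)')

# input.csv is a hypothetical file with columns: unix, datestamp, keyword, value
with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)                 # skip the header row
    c.executemany(
        "INSERT INTO stuffToPlot(unix, datestamp, keyword, value) VALUES (?,?,?,?)",
        reader,                  # executemany accepts any iterable of rows
    )

conn.commit()
conn.close()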
If that's a properly formatted CSV file, you can use MySQL's LOAD DATA INFILE command and you won't need any Python. Then, after it is loaded into the staging area (without processing), you can continue transforming it with the SQL/ETL tool of your choice.
https://dev.mysql.com/doc/refman/8.0/en/load-data.html
A drawback is that you need to map all of the columns, but even if the file contains data you don't need, you might prefer to load everything into staging anyway.
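If you do want to drive LOAD DATA from Python anyway, a minimal sketch with pymysql (hypothetical file path and staging table name; both client and server must allow local_infile) could look like this:
import pymysql

conn = pymysql.connect(host='xxxxx', user='xxxxx', password='xxxx',
                       db='staging_db', local_infile=True)
try:
    with conn.cursor() as cur:
        # Raw string so MySQL, not Python, interprets the \n line terminator.
        cur.execute(r"""
            LOAD DATA LOCAL INFILE '/path/to/input.csv'
            INTO TABLE staging_table
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
            LINES TERMINATED BY '\n'
            IGNORE 1 LINES
        """)
    conn.commit()
finally:
    conn.close()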

Importing MySQL query output to a CSV file using Python

I am trying to export the output of a query against a MySQL database to a CSV file on the local system using Python. There are two issues. First, with fetchall() I am not getting any data (the same query run in the database produces more than 5000 rows), though I did get data output initially. Second, I would like to know how to put the username and password in a separate file which the user cannot access but which is imported by this script when it runs.
import os
import csv
import pymysql
import pymysql.cursors

d = open('c:/Users/dasa17/Desktop/pylearn/Roster.csv', 'w')
c = csv.writer(d)

Connection = pymysql.connect(host='xxxxx', user='xxxxx', password='xxxx',
                             db='xxxx', charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)
a = Connection.cursor()
a.execute("select statement")
data = a.fetchall()
for item in data:
    c.writerow(item)
a.close()
d.close()
Connection.close()
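For the second part of the question (keeping the username and password out of the script), a minimal sketch of the separate-module pattern (hypothetical db_config.py, the same idea as the mysql_config import in the earlier question) could look like this:
# db_config.py -- hypothetical module stored where only the service account can read it
# (e.g. chmod 600) and excluded from version control.
host = 'xxxxx'
user = 'xxxxx'
password = 'xxxx'
db = 'xxxx'

# main script
import pymysql
import pymysql.cursors
import db_config   # picks up the credentials without hard-coding them here

Connection = pymysql.connect(host=db_config.host, user=db_config.user,
                             password=db_config.password, db=db_config.db,
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)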