Export and Import JSON in Neo4j - json

Neo4j gives you the possibility to export your graph as JSON, but I can't find out how to import that JSON into a new graph.
First, is it possible?
Is there another way to do it?
Thank you.

You can use the apoc.load.json procedure to load JSON data into Neo4j using Cypher.
For example, given the JSON file person.json:
{
"name": "Bob",
"age": 27
}
You can load this using Cypher:
CALL apoc.load.json("file:///person.json") YIELD value AS person
CREATE (p:Person {name: person.name})
SET p.age = person.age
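If the exported JSON holds many objects under one key - say a hypothetical people.json shaped like {"people": [{"name": "Bob", "age": 27}, ...]}; the file name and structure are assumptions, not from the original question - the same idea can be sketched with UNWIND and MERGE:
CALL apoc.load.json("file:///people.json") YIELD value
UNWIND value.people AS person
MERGE (p:Person {name: person.name})
SET p.age = person.age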

Related

Write JSON to a file in pretty print format in Django

I need to create some JSON files for exporting data from a Django system to Google BigQuery.
The problem is that Google BQ imposes some requirements on the JSON file, for example that each object must be on a separate line.
json.dumps writes a stringified version of the JSON, so it is not useful for me.
Django's serializers write better JSON, but they put everything on one line. All the information I found about pretty-printing is about json.dumps, which I cannot use.
I would like to know if anyone knows a way to create a JSON file in the format required by BigQuery.
Example:
JSONSerializer = serializers.get_serializer("json")
json_serializer = JSONSerializer()
data_objects = DataObject.objects.all()
with open("dataobjects.json", "w") as out:
json_serializer.serialize(data_objects, stream=out)
json.dumps is fine. You just have to use the indent argument, like this:
import json
myjson = '{"latitude":48.858093,"longitude":2.294694}'
mydata = json.loads(myjson)
print(json.dumps(mydata, indent=4, sort_keys=True))
Output:
{
"latitude": 48.858093,
"longitude": 2.294694
}
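If what you actually need is BigQuery's newline-delimited format from the question (one compact object per line rather than an indented block), a minimal sketch - assuming the data can be reduced to a list of plain dictionaries called records, an illustrative name that is not from the original post - is to call json.dumps once per object without indent and write one line each:
import json

records = [{"latitude": 48.858093, "longitude": 2.294694},
           {"name": "example", "value": 1}]

with open("dataobjects.json", "w") as out:
    for record in records:
        # one compact JSON object per line (newline-delimited JSON)
        out.write(json.dumps(record) + "\n")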

How to set key_type and value_type in DjangoCassandraModel Map field?

In models.py:
import uuid
from cassandra.cqlengine import columns
from django_cassandra_engine.models import DjangoCassandraModel
class ExampleModel(DjangoCassandraModel):
    response = columns.Map(key_type=?, value_type=?)
I have a JSON response like this:
{
"name": "MH Habib",
"university": "X university"
}
I want to save this JSON in my model. I found out that the Map field is suitable for storing JSON data. Now I have to set a text type for key_type and a text type for value_type. How can I solve this issue?
Finally, I found an answer. When I set
response = columns.Map(key_type=columns.Text(), value_type=columns.Text())
and then run this command to sync the database
python manage.py sync_cassandra
the problem is solved.
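For completeness, a minimal sketch of what the full model and a save might look like - the id column, its default, and the create call are assumptions for illustration, not part of the original post:
import uuid
from cassandra.cqlengine import columns
from django_cassandra_engine.models import DjangoCassandraModel

class ExampleModel(DjangoCassandraModel):
    # assumed primary key; every cqlengine model needs one
    id = columns.UUID(primary_key=True, default=uuid.uuid4)
    response = columns.Map(key_type=columns.Text(), value_type=columns.Text())

# assumed usage: store the JSON payload as a text-to-text map
ExampleModel.create(response={"name": "MH Habib", "university": "X university"})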

Reading data from file and converting into JSON in Python

I have the file log.txt with the following data:
{"__TIMESTAMP":"2020-07-09T19:05:20.858013","__LABEL":"web_channel","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"Port web_channel/diagnose_client not connected!"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229737","__LABEL":"context_logging_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"startup component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229761","__LABEL":"context_logging_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"activate component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229793","__LABEL":"context_monitoring_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"startup component"}
{"__TIMESTAMP":"2020-07-09T19:05:21.229805","__LABEL":"context_monitoring_addon","__LEVEL":4,"__DIAGNOSE_SLOT":"","msg":"activate component"}
If I define a single row, I can convert it into a real JSON type:
import json
import datetime
from json import JSONEncoder
log = {
    "__TIMESTAMP": "2020-07-09T19:05:21.229737",
    "__LABEL": "context_logging_addon",
    "__LEVEL": 4,
    "__DIAGNOSE_SLOT": "",
    "msg": "Port web_channel/diagnose_client not connected!"}

class DateTimeEncoder(JSONEncoder):
    # Override the default method
    def default(self, obj):
        if isinstance(obj, (datetime.date, datetime.datetime)):
            return obj.isoformat()
print("Printing to check how it will look like")
print(DateTimeEncoder().encode(log))
I get the following output, whose format is perfect JSON.
Printing to check how it will look like
{"__TIMESTAMP": "2020-07-09T19:05:21.229737", "__LABEL": "context_logging_addon", "__LEVEL": 4, "__DIAGNOSE_SLOT": "", "msg": "Port web_channel/diagnose_client not connected!"}
But I don't know how I should open the log.txt file and read the data to convert it into JSON without any failure.
Could you help me, please? Thanks in advance.
Let us say your log.txt file is in the same directory as your .py file.
Just open it with with open(..., then parse your file according to its syntax to create a list of dictionaries (each item corresponding to a row), then process each dictionary as you're currently doing.
Here is how you could open and parse your file:
with open("log.txt","r") as file:
all_text = file.readlines()
parsed_line = list()
for text in all_text:
parsed_line.append(dict([item.split('":"') for item in text[2:-2].split('","')]))
If you have any questions about the parsing, let me know; this one is pretty straightforward.
Hope this helps.
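Since every row in log.txt is already a valid JSON object, a simpler and more robust sketch is to let json.loads do the parsing line by line (note that __LEVEL is an unquoted integer, which a purely string-splitting approach will not handle):
import json

parsed_lines = []
with open("log.txt", "r") as file:
    for line in file:
        line = line.strip()
        if line:  # skip empty lines
            parsed_lines.append(json.loads(line))  # each row is one JSON object

print(parsed_lines[0]["msg"])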
Try it this way:
logs = """[your log file above]"
for log in logs.splitlines():
print(DateTimeEncoder().encode(log))
Output:
"{\"__TIMESTAMP\":\"2020-07-09T19:05:20.858013\",\"__LABEL\":\"web_channel\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"Port web_channel/diagnose_client not connected!\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229737\",\"__LABEL\":\"context_logging_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"startup component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229761\",\"__LABEL\":\"context_logging_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"activate component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229793\",\"__LABEL\":\"context_monitoring_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"startup component\"}"
"{\"__TIMESTAMP\":\"2020-07-09T19:05:21.229805\",\"__LABEL\":\"context_monitoring_addon\",\"__LEVEL\":4,\"__DIAGNOSE_SLOT\":\"\",\"msg\":\"activate component\"}"

Loading json data into hive using spark sql

I am unable to push JSON data into Hive. Below are the sample JSON data and my attempt. Please suggest what I am missing.
The JSON data:
{
    "Employees" : [
        {
            "userId":"rirani",
            "jobTitleName":"Developer",
            "firstName":"Romin",
            "lastName":"Irani",
            "preferredFullName":"Romin Irani",
            "employeeCode":"E1",
            "region":"CA",
            "phoneNumber":"408-1234567",
            "emailAddress":"romin.k.irani@gmail.com"
        },
        {
            "userId":"nirani",
            "jobTitleName":"Developer",
            "firstName":"Neil",
            "lastName":"Irani",
            "preferredFullName":"Neil Irani",
            "employeeCode":"E2",
            "region":"CA",
            "phoneNumber":"408-1111111",
            "emailAddress":"neilrirani@gmail.com"
        },
        {
            "userId":"thanks",
            "jobTitleName":"Program Directory",
            "firstName":"Tom",
            "lastName":"Hanks",
            "preferredFullName":"Tom Hanks",
            "employeeCode":"E3",
            "region":"CA",
            "phoneNumber":"408-2222222",
            "emailAddress":"tomhanks@gmail.com"
        }
    ]
}
I tried to use SQLContext and the jsonFile method to load it, but it fails to parse the JSON:
val f = sqlc.jsonFile("file:///home/vm/Downloads/emp.json")
f.show
The error is: java.lang.RuntimeException: Failed to parse a value for data type StructType() (current token: VALUE_STRING)
I tried a different way and was able to crack it and get the schema:
val files = sc.wholeTextFiles("file:///home/vm/Downloads/emp.json")
val jsonData = files.map(x => x._2)
sqlc.jsonRDD(jsonData).registerTempTable("employee")
val emp= sqlc.sql("select Employees[1].userId as ID,Employees[1].jobTitleName as Title,Employees[1].firstName as FirstName,Employees[1].lastName as LastName,Employees[1].preferredFullName as PeferedName,Employees[1].employeeCode as empCode,Employees[1].region as Region,Employees[1].phoneNumber as Phone,Employees[1].emailAddress as email from employee")
emp.show // displays all the values
I am able to get the data and schema separately for each record, but I am missing an idea of how to get all the data and load it into Hive.
Any help or suggestion is much appreciated.
Here is the cracked answer:
val files = sc.wholeTextFiles("file:///home/vm/Downloads/emp.json")
val jsonData = files.map(x => x._2)
import org.apache.spark.sql.hive._
import org.apache.spark.sql.hive.HiveContext
val hc=new HiveContext(sc)
hc.jsonRDD(jsonData).registerTempTable("employee")
val fuldf=hc.jsonRDD(jsonData)
val dfemp=fuldf.select(explode(col("Employees")))
dfemp.saveAsTable("empdummy")
val df=sql("select * from empdummy")
df.select ("_c0.userId","_c0.jobTitleName","_c0.firstName","_c0.lastName","_c0.preferredFullName","_c0.employeeCode","_c0.region","_c0.phoneNumber","_c0.emailAddress").saveAsTable("dummytab")
Any suggestions for optimising the above code are welcome.
SparkSQL only supports reading JSON files when the file contains one JSON object per line.
SQLContext.scala
/**
 * Loads a JSON file (one object per line), returning the result as a [[DataFrame]].
 * It goes through the entire dataset once to determine the schema.
 *
 * @group specificdata
 * @deprecated As of 1.4.0, replaced by `read().json()`. This will be removed in Spark 2.0.
 */
@deprecated("Use read.json(). This will be removed in Spark 2.0.", "1.4.0")
def jsonFile(path: String): DataFrame = {
  read.json(path)
}
Your file should look like this (strictly speaking, it's not a proper JSON file):
{"userId":"rirani","jobTitleName":"Developer","firstName":"Romin","lastName":"Irani","preferredFullName":"Romin Irani","employeeCode":"E1","region":"CA","phoneNumber":"408-1234567","emailAddress":"romin.k.irani#gmail.com"}
{"userId":"nirani","jobTitleName":"Developer","firstName":"Neil","lastName":"Irani","preferredFullName":"Neil Irani","employeeCode":"E2","region":"CA","phoneNumber":"408-1111111","emailAddress":"neilrirani#gmail.com"}
{"userId":"thanks","jobTitleName":"Program Directory","firstName":"Tom","lastName":"Hanks","preferredFullName":"Tom Hanks","employeeCode":"E3","region":"CA","phoneNumber":"408-2222222","emailAddress":"tomhanks#gmail.com"}
Please have a look at the outstanding JIRA issue. I don't think it is that high a priority, but it's noted here just for the record.
You have two options:
1. Convert your JSON data to the supported format, one object per line.
2. Have one file per JSON object - but this will result in too many files.
Note that SQLContext.jsonFile is deprecated; use SQLContext.read.json instead.
There are examples in the Spark documentation.
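Putting the pieces together, a minimal sketch - assuming emp.json has been rewritten to the one-object-per-line format shown above, that sqlc is a HiveContext (needed for persistent Hive tables on Spark 1.x), and that the table name is made up for illustration:
val df = sqlc.read.json("file:///home/vm/Downloads/emp.json")
df.printSchema()                   // userId, jobTitleName, firstName, ...
df.write.saveAsTable("employees")  // hypothetical Hive table name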

How to convert DataFrame to Json?

I have a huge JSON file; a small part of it follows:
{
    "socialNews": [{
        "adminTagIds": "",
        "fileIds": "",
        "departmentTagIds": "",
        ........
        ........
        "comments": [{
            "commentId": "",
            "newsId": "",
            "entityId": "",
            ....
            ....
        }]
    }]
    .....
}
I have applied lateral view explode on socialNews as follows:
val rdd = sqlContext.jsonFile("file:///home/ashish/test")
rdd.registerTempTable("social")
val result = sqlContext.sql("select * from social LATERAL VIEW explode(socialNews) social AS comment")
Now I want to convert this result (DataFrame) back to JSON and save it to a file, but I am not able to find any Scala API to do the conversion.
Is there any standard library to do this, or some way to figure it out?
val result: DataFrame = sqlContext.read.json(path)
result.write.json("/yourPath")
The write method is available on DataFrame objects and returns a DataFrameWriter, which provides json. Just make sure that your rdd is of type DataFrame and not of the deprecated type SchemaRDD. You can explicitly provide the type annotation val data: DataFrame or convert to a DataFrame with toDF().
If you have a DataFrame, there is an API to convert it back to an RDD[String] that contains the JSON records.
val df = Seq((2012, 8, "Batman", 9.8), (2012, 8, "Hero", 8.7), (2012, 7, "Robot", 5.5), (2011, 7, "Git", 2.0)).toDF("year", "month", "title", "rating")
df.toJSON.saveAsTextFile("/tmp/jsonRecords")
df.toJSON.take(2).foreach(println)
This should be available from Spark 1.4 onward. Call the API on the result DataFrame you created.
The APIs available are listed here
sqlContext.read().json(dataFrame.toJSON())
When you run your Spark job as
--master local --deploy-mode client
then
df.write.json('path/to/file/data.json') works.
If you run on a cluster [from the head node] with --master yarn --deploy-mode cluster, a better approach is to write the data to AWS S3 or Azure Blob storage and read it from there:
df.write.json('s3://bucket/path/to/file/data.json') works.
If you still can't figure out a way to convert a DataFrame into JSON, you can use the built-in Spark functions to_json or toJSON.
Let me know if you have a sample DataFrame and the JSON format you want to convert to.
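As a sketch of the to_json route (assuming Spark 2.1 or later, where to_json and struct live in org.apache.spark.sql.functions; df and the column names are placeholders):
import org.apache.spark.sql.functions.{col, struct, to_json}

// pack every column of each row into a struct and render it as a JSON string column
val withJson = df.withColumn("json", to_json(struct(df.columns.map(col): _*)))
withJson.select("json").show(false)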