Reading JSON file from R - json

I'm trying to read a JSON file in R using rjson, but I keep getting errors. I validated the JSON file using various online validators. Here is the content of the JSON file:
{
"scenarios": [
{
"files": {
"type1": "/home/blah/Desktop/temp/scen_0.type1",
"type2": "/home/blah/Desktop/temp/scen_0.type2"
},
"ID": "scen_0",
"arr": [],
"TypeToElementStatsFilename": {
"type1": "/home/blah/Desktop/temp/scen_0.type1.elements",
"type2": "/home/blah/Desktop/temp/scen_0.type2.elements"
}
}
],
"randomSeed": "39327314969888",
"zone": {
"length": 1000000,
"start": 1
},
"instanceFilename": "/home/blah/bloo/data/XY112.zip",
"txtFilename": "/home/blah/bloo/data/XY112.txt",
"nSimulations": 2,
"TypeTodbFilename": {
"type1": "/home/blah/bloo/data/map.type1.oneAmb.XY112.out"
},
"arr": {
"seg11": {
"length": 1000,
"start": 147000
},
"seg12": {
"length": 1000,
"start": 153000
},
"seg5": {
"length": 1000,
"start": 145000
},
"seg6": {
"length": 1000,
"start": 146000
},
"seg1": {
"length": 100,
"start": 20000
}
},
"outPath": "/home/blah/Desktop/temp",
"instanceID": "XY112",
"arrIds": [
"seg5",
"seg6",
"seg1",
"seg11",
"seg12"
],
"truth": {
"files": {
"type1": "/home/blah/Desktop/temp/truth.type1",
"type2": "/home/blah/Desktop/temp/truth.type2"
},
"ID": "truth",
"TypeToElementStatsFilename": {
"type1": "/home/blah/Desktop/temp/truth.type1.elements",
"type2": "/home/blah/Desktop/temp/truth.type2.elements"
}
}
}
And the error:
> json_file <- "~/json"
> json_data <- fromJSON(paste(readLines(json_file), collapse=""))
Error in fromJSON(paste(readLines(json_file), collapse = "")) :
unexpected character: :

rjson freaks out about empty arrays:
fromJSON( '{ "arr": [ ] }')
Error in fromJSON("{ \"arr\": [ ] }") : unexpected character: :

You can try the fromJSON function in the RJSONIO package hosted at http://www.omegahat.org. It seems to read the file fine.

There's a fix for this.
Create a new function to replace the existing getURL function used in RCurl and you should have your solution.
# Temporarily replace RCurl's internal mapUnicodeEscapes with an identity
# function so \uXXXX escapes are passed through untouched, restoring the
# original binding when the call finishes.
myGetURL <- function(...) {
  rcurlEnv <- getNamespace("RCurl")
  mapUnicodeEscapes <- get("mapUnicodeEscapes", rcurlEnv)
  unlockBinding("mapUnicodeEscapes", rcurlEnv)
  assign("mapUnicodeEscapes", function(str) str, rcurlEnv)
  on.exit({
    assign("mapUnicodeEscapes", mapUnicodeEscapes, rcurlEnv)
    lockBinding("mapUnicodeEscapes", rcurlEnv)
  }, add = TRUE)
  return(getURL(...))
}
Test:
> json <- myGetURL("http://abicky.net/hatena/rcurl/a.json")
> cat(json, fill = TRUE)
{"a":"\\\"\u0030\\\""}
> fromJSON(json)
$a
[1] "\\\"0\\\""

Related

How to get all the index values in Groovy JSON xpath

Please find the attached Groovy code, which I am using to get a particular field from the response body.
Query 1:
It retrieves the result when I use the correct index value: data.RenewalDetails[0] gives Value1 as output, and data.RenewalDetails[1] gives Value2.
But in my real case I will never know the number of blocks in the response, so I want to get all the values that satisfy the condition. I tried data.RenewalDetails[*] but it is not working. Can you please help?
Query 2:
Apart from the above condition, I want to add one more filter: "FamilyCode": "PREMIUM" in the Itemdetails. Can you help with that as well?
def BoundId = new groovy.json.JsonSlurper().parseText('{"data":{"RenewalDetails":[{"ExpiryDetails":{"duration":"xxxxx","destination":"LHR","from":"AUH","value":2,"segments":[{"valudeid":"xxx-xx6262-xxxyyy-1111-11-11-1111"}]},"Itemdetails":[{"BoundId":"Value1","isexpired":true,"FamilyCode":"PREMIUM","availabilityDetails":[{"travelID":"AAA-AB1234-AAABBB-2022-11-10-1111","quota":"X","scale":"XXX","class":"X"}]}]},{"ExpiryDetails":{"duration":"xxxxx","destination":"LHR","from":"AUH","value":2,"segments":[{"valudeid":"xxx-xx6262-xxxyyy-1111-11-11-1111"}]},"Itemdetails":[{"BoundId":"Value2","isexpired":true,"FamilyCode":"PREMIUM","availabilityDetails":[{"travelID":"AAA-AB1234-AAABBB-2022-11-10-1111","quota":"X","scale":"XXX","class":"X"}]}]}]},"warnings":[{"code":"xxxx","detail":"xxxxxxxx","title":"xxxxxxxx"}]}')
.data.RenewalDetails[0].Itemdetails.find { itemDetail ->
itemDetail.availabilityDetails[0].travelID.length() == 33
}?.BoundId
println "Hello " + BoundId
Something like this:
def txt = '''\
{
"data": {
"RenewalDetails": [
{
"ExpiryDetails": {
"duration": "xxxxx",
"destination": "LHR",
"from": "AUH",
"value": 2,
"segments": [
{
"valudeid": "xxx-xx6262-xxxyyy-1111-11-11-1111"
}
]
},
"Itemdetails": [
{
"BoundId": "Value1",
"isexpired": true,
"FamilyCode": "PREMIUM",
"availabilityDetails": [
{
"travelID": "AAA-AB1234-AAABBB-2022-11-10-1111",
"quota": "X",
"scale": "XXX",
"class": "X"
}
]
}
]
},
{
"ExpiryDetails": {
"duration": "xxxxx",
"destination": "LHR",
"from": "AUH",
"value": 2,
"segments": [
{
"valudeid": "xxx-xx6262-xxxyyy-1111-11-11-1111"
}
]
},
"Itemdetails": [
{
"BoundId": "Value2",
"isexpired": true,
"FamilyCode": "PREMIUM",
"availabilityDetails": [
{
"travelID": "AAA-AB1234-AAABBB-2022-11-10-1111",
"quota": "X",
"scale": "XXX",
"class": "X"
}
]
}
]
}
]
},
"warnings": [
{
"code": "xxxx",
"detail": "xxxxxxxx",
"title": "xxxxxxxx"
}
]
}'''
def json = new groovy.json.JsonSlurper().parseText txt
List<String> BoundIds = json.data.RenewalDetails.Itemdetails*.find { itemDetail ->
itemDetail.availabilityDetails[0].travelID.size() == 33 && itemDetail.FamilyCode == 'PREMIUM'
}?.BoundId
assert BoundIds.toString() == '[Value1, Value2]'
Note that you will get the BoundIds as a List.
If you amend your code like this:
def json = new groovy.json.JsonSlurper().parse(prev.getResponseData())
you would be able to access the number of returned items as:
def size = json.data.RenewalDetails.size()
as RenewalDetails represents a List
Just add as many conditions as you want using Groovy's && operator:
find { itemDetail ->
itemDetail.availabilityDetails[0].travelID.length() == 33 &&
itemDetail.FamilyCode.equals('PREMIUM')
}
More information:
Apache Groovy - Parsing and producing JSON
Apache Groovy: What Is Groovy Used For?

How to update a JSON file sorted by number? - Python

For example, I have a JSON file with numeric keys in a messy order:
{
  "data": {
    "31": {
      ...
    },
    "52": {
      ...
    },
    "1": {
      ...
    }
  }
}
I want to make it sorted by number so the JSON data will not be messed up:
{
  "data": {
    "1": {
      ...
    },
    "31": {
      ...
    },
    "52": {
      ...
    }
  }
}
I tried code that uses:
with open("file.json", "r", encoding="utf-8") as f:
    file = json.load(f)

file["data"].update(
    {num: {"question": question, "answer": answer, "options": options}}
)
My error: TypeError: cannot convert dictionary update sequence element #0 to a sequence
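As an aside, that TypeError is what dict.update() raises when it is given an iterable whose elements are not (key, value) pairs; the values in the sketch below are purely illustrative:
d = {}
# Raises: TypeError: cannot convert dictionary update sequence element #0 to a sequence,
# because the element 1 is not a (key, value) pair.
# d.update([1])

# Works: pass a mapping (or an iterable of pairs).
d.update({"1": {"question": "q", "answer": "a", "options": []}})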
Option #1
Use sorted with key argument:
import json
my_dict = {
    "data": {
        "31": {
            "k": "..."
        },
        "52": {
            "k": "..."
        },
        "1": {
            "k": "..."
        }
    }
}
my_dict['data'] = dict(sorted(my_dict['data'].items(), key=lambda t: int(t[0])))
print(my_dict['data'])
Result:
{'1': {'k': '...'}, '31': {'k': '...'}, '52': {'k': '...'}}
Option #2
Use json.dumps with sort_keys argument:
print(json.dumps(my_dict, indent=2, sort_keys=True))
Result:
{
  "data": {
    "1": {
      "k": "..."
    },
    "31": {
      "k": "..."
    },
    "52": {
      "k": "..."
    }
  }
}
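If the goal is to update file.json itself so it is stored in sorted order, here is a minimal sketch building on Option #1, assuming the layout from the question (a top-level "data" object with numeric string keys) and Python 3.7+, where dicts keep insertion order. Note that sort_keys=True compares keys as strings, so it only matches numeric order when the keys happen to sort the same way lexicographically.
import json

# Read the file, sort the "data" keys numerically, and write the result back.
with open("file.json", "r", encoding="utf-8") as f:
    content = json.load(f)

content["data"] = dict(sorted(content["data"].items(), key=lambda t: int(t[0])))

with open("file.json", "w", encoding="utf-8") as f:
    json.dump(content, f, ensure_ascii=False, indent=2)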

How to get the required JSON output from a complex nested JSON format?

My original file is in CSV format, which I have converted to a Python JSON array and then to a JSON string.
jsonfile
<class 'list'>
<class 'dict'>
[
{
"key": "timestamp",
"source": "eia007",
"turnover": "65million",
"url": "abc.com",
"record": "",
"loc.reg": "nord000",
"loc.count": "abs39i5",
"loc.town": "cold54",
"co.gdp": "nscrt77",
"co.pop.min": "min50",
"co.pop.max": "max75",
"co.rev": "",
"chain.system": "5t5t5",
"chain.type": "765ef",
"chain.strat": "",
}
]
I would like to get the output as below:
{
  "timestamp001": {
    "key": "timestamp001",
    "phNo": "ner007",
    "turnover": "65million",
    "url": "abc.com",
    "record": "",
    "loc": {
      "reg": "nord000",
      "count": "abs39i5",
      "town": "cold54"
    },
    "co": {
      "form": "nscrt77",
      "pop": {
        "min": "min50",
        "max": "max75"
      },
      "rev": ""
    },
    "chain": {
      "system": "5t5t5",
      "type": "765ef",
      "strat": ""
    }
    ...
  }
  ...
}
I have tried different options; tried to enumerate, but cannot get the required output. Please help me with this. Thanks in advance.
You can use something like this to create the nested dict:
import json
def unflatten(somedict):
    unflattened = {}
    for key, value in somedict.items():
        splitkey = key.split(".")
        print(f"doing {key} {value} {splitkey}")
        # subdict is the dict that goes deeper in the nested structure
        subdict = unflattened
        for subkey in splitkey[:-1]:
            # if this is the first time we see this key, add it
            if subkey not in subdict:
                subdict[subkey] = {}
            # shift the subdict a level deeper
            subdict = subdict[subkey]
        # add the value
        subdict[splitkey[-1]] = value
    return unflattened

data = {
    "key": "timestamp",
    "source": "eia007",
    "turnover": "65million",
    "url": "abc.com",
    "record": "",
    "loc.reg": "nord000",
    "loc.count": "abs39i5",
    "loc.town": "cold54",
    "co.gdp": "nscrt77",
    "co.pop.min": "min50",
    "co.pop.max": "max75",
    "co.rev": "",
    "chain.system": "5t5t5",
    "chain.type": "765ef",
    "chain.strat": "",
}

unflattened = unflatten(data)
print(json.dumps(unflattened, indent=4))
Which produces:
{
    "key": "timestamp",
    "source": "eia007",
    "turnover": "65million",
    "url": "abc.com",
    "record": "",
    "loc": {
        "reg": "nord000",
        "count": "abs39i5",
        "town": "cold54"
    },
    "co": {
        "gdp": "nscrt77",
        "pop": {
            "min": "min50",
            "max": "max75"
        },
        "rev": ""
    },
    "chain": {
        "system": "5t5t5",
        "type": "765ef",
        "strat": ""
    }
}
Cheers!
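If you also need the result keyed by each record's "key" value, as in the desired output shown in the question, here is a small sketch on top of unflatten (reusing data from above; the single-element records list stands in for the full list parsed from your CSV/JSON string, so treat it as an assumption):
# Apply unflatten() to every record and key the result by its "key" field.
records = [data]  # stand-in for the full list of flat records
keyed = {record["key"]: unflatten(record) for record in records}
print(json.dumps(keyed, indent=4))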

Convert Nested Dictionary to a graphable format

So I'm trying to convert a nested dictionary like:
A = {
    "root": {
        "child1": {
            "child11": "hmm",
            "child12": "not_hmm"
        },
        "child2": "hello"
    }
}
To this:
{
  "name": "root",
  "children": [
    {
      "name": "child1",
      "children": [
        {"name": "child11", "children": [{"name": "hmm"}]},
        {"name": "child12", "children": [{"name": "not_hmm"}]}
      ]
    },
    {
      "name": "child2",
      "children": [{"name": "hello"}]
    }
  ]
}
I need this, since I'm trying to visualize it with this graph drawing template: Collapsible Tree
I'm having some trouble creating a recursive method that is capable of this transformation, preferably in Python 3. So far I have:
def visit(node, parent=None):
    B = {}
    for k, v in node.items():
        B["name"] = k
        B["children"] = []
        if isinstance(v, dict):
            print("Key value pair is", k, v)
            B["children"].append(visit(v, k))
        new_dict = {}
        new_dict["name"] = v
        return [new_dict]

C = visit(A)  # This should have the final result
But it's wrong. Any help is appreciated.
We'll have a function that takes a root (assuming it has only one entry), and returns a dict, as well as a helper function that returns lists of dicts.
def convert(d):
    for k, v in d.items():
        return {"name": k, "children": convert_helper(v)}

def convert_helper(d):
    if isinstance(d, dict):
        return [{"name": k, "children": convert_helper(v)} for k, v in d.items()]
    else:
        return [{"name": d}]
which gives us
json.dumps(convert(A), indent=2)
{
  "name": "root",
  "children": [
    {
      "name": "child1",
      "children": [
        {
          "name": "child11",
          "children": [
            {
              "name": "hmm"
            }
          ]
        },
        {
          "name": "child12",
          "children": [
            {
              "name": "not_hmm"
            }
          ]
        }
      ]
    },
    {
      "name": "child2",
      "children": [
        {
          "name": "hello"
        }
      ]
    }
  ]
}
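Since the goal is to feed the d3 Collapsible Tree template, you can dump the converted structure straight to a JSON file for it to load (this reuses convert and A from above; the filename is just illustrative):
import json

# Write the converted tree to disk for the Collapsible Tree template to load.
with open("tree.json", "w", encoding="utf-8") as f:
    json.dump(convert(A), f, indent=2)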

Import JSON to OrientDB type document using ETL

How can I import some JSON files into OrientDB to use them as documents (not graph)?
My data is something like this:
{
"p_partkey": 1,
"p_name": "lace spring",
"lineorder": [{
"customer": [{
"c_name": "Customer#000014704"
}],
"lo_quantity": 49,
"lo_orderpriority": "1-URGENT",
"lo_discount": 3,
"lo_shipmode": "RAIL|",
"lo_tax": 0
}, {
"customer": [{
"c_name": "Customer#000026548"
}],
"lo_quantity": 15,
"lo_orderpriority": "3-MEDIUM",
"lo_discount": 10,
"lo_shipmode": "SHIP|",
"lo_tax": 0
}]
}
and I created a configfile.json like the one below to import it, but it doesn't work:
{
"config": {
"log": "debug"
},
"source" : {
"file": { "path": "/home/raphael/Documents/data/part/part1.json", "lock" : true }
},
"extractor" : {
"json": {}
},
"transformers" : [
{ "merge": { "joinFieldName":"p_partkey"} },
{ "vertex": { "class": "part"} }
],
"loader" : {
"orientdb": {
"dbURL": "plocal:/opt/orientdb/databases/part",
"dbUser": "root",
"dbPassword": "rasns1901",
"dbAutoCreate": true,
"tx": false,
"batchCommit": 1000,
"dbType": "document",
"classes": [
{"name": "part", "extends": "V"}
],
"indexes": [
{"class":"part", "fields":["p_partkey:integer"], "type":"UNIQUE_HASH_INDEX" }
]
}
}
}
Is there something wrong with my config file? There's no example of this in the OrientDB documentation.
I gave up on using the ETL and did it with Python; it was easier.
Here is my code:
import pyorient

def Inicio():
    db_name = "db"
    client = pyorient.OrientDB("127.0.0.1", 2424)
    session_id = client.connect("admin", "admin")
    client.db_open(db_name, "admin", "admin")
    i = 1
    while i < 3:
        # each part file holds one JSON document on a single line
        file = open('home/Desktop/part' + str(i) + '.json', 'r')
        texto = file.readline()
        # build and run the document insert from the raw JSON text
        co = 'INSERT INTO part CONTENT ' + texto
        client.command(co)
        print("Inserted:" + str(i))
        file.close()
        i = i + 1
    client.db_close()

Inicio()
The only thing you have to pay attention to is that my JSON files don't have carriage returns, so the readline() call works.
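If your JSON files do contain line breaks, a variation of the same approach is to load each file with json.load and re-serialize it to a single line before building the INSERT. This is an untested sketch; the host, credentials, and paths are illustrative:
import json
import pyorient

client = pyorient.OrientDB("127.0.0.1", 2424)
client.connect("admin", "admin")
client.db_open("db", "admin", "admin")

for i in range(1, 3):
    # json.load handles pretty-printed files; json.dumps flattens the document
    # back to a single line that OrientDB's INSERT ... CONTENT accepts.
    with open('/path/to/part' + str(i) + '.json') as f:
        doc = json.load(f)
    client.command('INSERT INTO part CONTENT ' + json.dumps(doc))
    print("Inserted: " + str(i))

client.db_close()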