Compare JSON List with JSON Dict; only by Keys - json

I have a list and a dict which need to be compared by their keys only.
The list is created by hand to define which vars will be used in the following process; it also serves as the header when writing the result to a CSV.
Some devices don't support all vars and won't send them back in the response.
base=["General.IpAddress", "General.ActualHostname", "General.UserLabel1", "General.UserLabel2"]
response_diff='{"general.actualhostname":"ST_38_217","general.ipaddress":"192.168.38.217"}'
As you can see, General.UserLabel1 and General.UserLabel2 are missing in the response (more vars can be missing).
So I have to add the missing vars to the response with a NULL value.

import json
from pprint import pprint

def compare_ListWithDict(list_base, dict_ref):
    # temp dict
    dict_base_tmp = {}
    dict_ref = dict_ref.lower()
    # run through the list and generate a dict with value 0 for every key
    for item in list_base:
        dict_base_tmp[item.lower()] = 0
    # load dict_ref as JSON
    dict_ref_json = json.loads(dict_ref)
    # get lengths
    dict_base_len = len(dict_base_tmp)
    dict_ref_len = len(dict_ref_json)
    # if the lengths are equal, return dict_ref_json (the response from the device)
    if dict_base_len == dict_ref_len:
        return dict_ref_json
    else:
        # run through list_base and search for keys that AREN'T in dict_ref_json;
        # if a missing key is found, add it to dict_ref_json with value NULL
        for item in list_base:
            item_lower = item.lower()
            if item_lower not in dict_ref_json:
                dict_ref_json[item_lower] = 'null'
        return dict_ref_json
base=["General.IpAddress", "General.ActualHostname", "General.UserLabel1", "General.UserLabel2"]
response_diff='{"general.actualhostname":"ST_38_217","general.ipaddress":"192.168.38.217"}'
response_equal='{"general.actualhostname":"ST_38_217","general.ipaddress":"192.168.38.217","general.userlabel1":"First Label", "general.userlabel2":"Second Label"}'
Results:
pprint(compare_ListWithDict(base, response_equal))
# base and response are equal by keys
{'general.actualhostname': 'st_38_217',
 'general.ipaddress': '192.168.38.217',
 'general.userlabel1': 'first label',
 'general.userlabel2': 'second label'}
pprint(compare_ListWithDict(base, response_diff))
# base and response differ by keys
{'general.actualhostname': 'st_38_217',
 'general.ipaddress': '192.168.38.217',
 'general.userlabel1': 'null',
 'general.userlabel2': 'null'}
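For comparison, a more compact sketch of the same idea, assuming the same behaviour is wanted (the whole response string is lowercased, values included, and missing keys get the string 'null'):

import json

def fill_missing_keys(list_base, json_ref):
    # note: lowercasing the raw JSON string also lowercases the values
    dict_ref = json.loads(json_ref.lower())
    # setdefault only adds a key if the device did not send it back
    for key in (item.lower() for item in list_base):
        dict_ref.setdefault(key, 'null')
    return dict_ref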

How to convert multi dimensional array in JSON as separate columns in pandas

I have a DB collection consisting of nested strings. I am trying to convert the contents under the "status" column into separate columns against each order ID, in order to track the time taken from "order confirmed" to "pick up confirmed". The string looks as follows:
I have tried the same using:
xyz_db = pd.DataFrame()  # placeholder; see below
xyz_db = db.logisticsOrders               # DB collection
df = pd.DataFrame(list(xyz_db.find()))    # JSON to dataframe
Using normalize:
parse1 = pd.json_normalize(df['status'])
It works fine in the case of non-nested arrays. But status being a nested array, the output is as follows:
Using a for loop:
data = df[['orderid', 'status']]
data = list(data['status'])
dfy = pd.DataFrame(columns=['statuscode', 'statusname', 'laststatusupdatedon'])
for i in range(0, len(data)):
    result = data[i]
    dfy.loc[i] = [data[i][0], data[i][0], data[i][0], data[i][0]]
It gives the result in the form of appended rows, which is not the format I am trying to achieve.
The output I am trying to get is:
Please help out!!
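For reference, here is a minimal sketch of the flattening the question seems to be after, using pd.json_normalize with record_path plus a pivot. The column names orderid, statusname and laststatusupdatedon are assumptions taken from the attempted code, and df is assumed to hold one row per order with status as a list of dicts:

import pandas as pd

# explode the nested status array, keeping the order id on every row
flat = pd.json_normalize(
    df.to_dict('records'),
    record_path='status',
    meta=['orderid'],
)
# one column per status name, holding the corresponding update timestamp
wide = flat.pivot_table(
    index='orderid',
    columns='statusname',
    values='laststatusupdatedon',
    aggfunc='first',
)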
I'll share the JSON-reading approach I used; maybe it helps you. It also works when a cell holds a list of two or more items:
def jsonify(z):
    genr = []
    # z == z filters out NaN cells
    if z == z and z is not None:
        z = eval(z)  # turn the stored string into a Python object (unsafe on untrusted data)
        if type(z) in (dict, list, tuple):
            for dic in z:
                for key, val in dic.items():
                    if key == "name":
                        genr.append(val)
        else:
            return None
    else:
        return None
    return genr

top_genr['genres_N'] = top_genr['genres'].apply(jsonify)

I need to parse through a dictionary and update its value

I have a problem statement in which I have to read a JSON file. The JSON file, when converted to a dictionary using json.loads(), has 12 keys.
One of the keys ('body') has a value of type string. Converting this string again with json.loads() results in a list of dictionaries. The length of this list of dictionaries is 1000, while each dictionary within has a length of 24.
I need to increase the number of dictionaries so that my list of dictionaries has a new length of 2000. Each dictionary within (of length 24) has a key ('id') that needs to be unique.
Now, this is my code snippet where I'm trying to update the value of the dictionary if my key value is 'id':
val = 1
for each_dict in list_of_dictionary:
    for k, v in each_dict.items():
        if k == 'id':
            v = val
            print("value is ", v)
            val = val + 1
O/P
value is 1
value is 2
and so on...
Now, when I am trying to view the updated value again, I can see the previous values only.
This is the code snippet:
for each_dict in list_of_dictionary:
    for k, v in each_dict.items():
        if k == 'id':
            print("value is ", v)
O/P
value is 11123
value is 11128
and so on...
Whereas I want the output shown above, since I have already updated the values.
Got the answer. In the first for-in loop I had forgotten to update the dictionary, which is why I couldn't see the updated data in the second loop. The updated code for the first loop is:
val = 1
for each_dict in list_of_dictionary:
    for k, v in each_dict.items():
        if k == 'id':
            temp = {k: val}  # note: {k, val} would build a set, not a dict
            each_dict.update(temp)
            val = val + 1
Now I'm able to see the updated data in the second loop.
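Since the broader goal was to grow the list from 1000 to 2000 entries with unique ids, here is a minimal sketch of that step, assuming list_of_dictionary holds the parsed 'body' records:

import copy

# duplicate every record, then renumber all ids so each one stays unique
list_of_dictionary += copy.deepcopy(list_of_dictionary)
for new_id, record in enumerate(list_of_dictionary, start=1):
    record['id'] = new_id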

How to manipulate Json.Encode.Value in Elm?

I'm writing some code to auto-generate JSON codecs for Elm data structures. There is a point in my code where a "sub-structure/sub-type" has already been encoded to a Json.Encode.Value, and I need to add another key-value pair to it. Is there any way to "destructure" a Json.Encode.Value in Elm? Or combine two values of type Json.Encode.Value?
Here's some sample code:
type alias Entity record =
    { entityKey : Key record
    , entityVal : record
    }

jsonEncEntity : (record -> Value) -> Entity record -> Value
jsonEncEntity localEncoder_record val =
    let
        encodedRecord = localEncoder_record val.entityVal
    in
    -- NOTE: the following line won't compile, but this is essentially
    -- what I'm looking for
    Json.combine encodedRecord (Json.Encode.object [ ( "id", jsonEncKey val.entityKey ) ])
You can decode the value into a list of key-value pairs using D.keyValuePairs D.value and then prepend the new field. Here's how you'd do that:
module Main exposing (..)

import Json.Decode as D
import Json.Encode as E exposing (Value)

addKeyValue : String -> Value -> Value -> Value
addKeyValue key value input =
    case D.decodeValue (D.keyValuePairs D.value) input of
        Ok ok ->
            E.object <| ( key, value ) :: ok

        Err _ ->
            input
> import Main
> import Json.Encode as E
> value = E.object [("a", E.int 1)]
{ a = 1 } : Json.Encode.Value
> value2 = Main.addKeyValue "b" E.null value
{ b = null, a = 1 } : Json.Encode.Value
If the input is not an object, this will return the input unchanged:
> Main.addKeyValue "b" E.null (E.int 1)
1 : Json.Encode.Value
If you want to do this, you need to use a decoder to unwrap the values by one level into a Dict String Value, then combine the dictionaries, and finally re-encode as a JSON value. You can unwrap like so:
unwrapObject : Value -> Result String (Dict String Value)
unwrapObject value =
    Json.Decode.decodeValue (Json.Decode.dict Json.Decode.value) value
Notice that you have to work with Results from this point on because, as far as Elm is concerned, your JSON value might not really have been an object (it could have been a number or a string instead, for instance), and you have to handle that case. For that reason, it's not really best practice to do too much with JSON Values directly; if you can, keep things as Dicts or some other more informative type until the end of processing, and then convert the whole result into a Value as the last step.

How to use ijson/other to parse this large JSON file?

I have this massive JSON file (8 GB), and I run out of memory when trying to read it into Python. How would I implement a similar procedure using ijson or some other library that is more efficient with large JSON files?
import pandas as pd

# There are (say) 1m objects - each is its own json object - within this file.
with open('my_file.json') as json_file:
    data = json_file.readlines()
# So I take a list of these json objects
list_of_objs = [obj for obj in data]
# But I only want about 200 of the json objects
desired_data = [obj for obj in list_of_objs if obj['feature'] == "desired_feature"]
How would I implement this using ijson or something similar? Is there a way I can extract the objects I want without reading in the whole JSON file?
The file is a list of objects like:
{
    "review_id": "zdSx_SD6obEhz9VrW9uAWA",
    "user_id": "Ha3iJu77CxlrFm-vQRs_8g",
    "business_id": "tnhfDv5Il8EaGSXZGiuQGg",
    "stars": 4,
    "date": "2016-03-09",
    "text": "Great place to hang out after work: the prices are decent, and the ambience is fun. It's a bit loud, but very lively. The staff is friendly, and the food is good. They have a good selection of drinks.",
    "useful": 0,
    "funny": 0
}
"The file is a list of objects"
This is a little ambiguous. Looking at your code snippet, it looks like your file contains a separate JSON object on each line, which is not the same as an actual JSON array that starts with [, ends with ] and has , between items.
In the case of a json-per-line file it's as easy as:
import json
from itertools import islice

with open(filename) as f:
    objects = (json.loads(line) for line in f)
    objects = islice(objects, 200)
Note the differences:
you don't need .readlines(); the file object itself is an iterable that yields individual lines
parentheses (..) instead of brackets [..] in (... for line in f) create a lazy generator expression instead of an in-memory Python list of all the lines
islice(objects, 200) will give you the first 200 items without iterating further; if objects were a list you could just do objects[:200]
Now, if your file is actually a JSON array then you indeed need ijson:
import ijson  # or choose a faster backend if needed
from itertools import islice

with open(filename) as f:
    objects = ijson.items(f, 'item')
    objects = islice(objects, 200)
ijson.items returns a lazy iterator over a parsed array. The 'item' in the second parameter means "each item in a top-level array".
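Combining this with the question's filter, a sketch for the json-per-line case that streams, filters and stops after 200 matches (the key 'feature' is the placeholder from the question's snippet):

import json
from itertools import islice

with open('my_file.json') as f:
    # parse lazily and keep only the objects that match
    matches = (obj for obj in map(json.loads, f)
               if obj.get('feature') == 'desired_feature')
    desired_data = list(islice(matches, 200))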
The problem is that not all JSON comes nicely formatted and you cannot rely on line-by-line parsing to extract your objects.
I understood your "acceptance criteria" as "want to collect only those JSON objects whose specified keys contain specified values". For example, only collecting objects about a person if that person's name is "Bob". The following function will provide a list of all objects that fit your criteria. Parsing is done character by character (something that would be much more efficient in C, but Python is still pretty good). This should be more robust because it doesn't care about newlines, formatting etc. I tested this on both formatted and unformatted JSON with 1,000,000 objects.
import json

def parse_out_objects(file, feature, desired_value):
    with open(file) as f:
        compose_object_flag = False
        ignore_characters_flag = False
        object_string = ''
        selected_objects = []
        json_object = None
        while True:
            c = f.read(1)
            if c == '"':
                # inside a string literal, braces don't count as structure
                ignore_characters_flag = not ignore_characters_flag
            if c == '{' and not ignore_characters_flag:
                compose_object_flag = True
            if c == '}' and compose_object_flag and not ignore_characters_flag:
                # object finished: parse it and keep it if it matches
                compose_object_flag = False
                object_string = object_string + '}'
                json_object = json.loads(object_string)
                if json_object[feature] == desired_value:
                    selected_objects.append(json_object)
                object_string = ''
            if compose_object_flag:
                object_string = object_string + c
            if not c:  # end of file
                break
        return selected_objects
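A hypothetical call against the review objects shown above, collecting the four-star reviews:

selected_objects = parse_out_objects('my_file.json', 'stars', 4)
print(len(selected_objects))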

spark.RDD take(n) returns array of element n, n times

I'm using code from https://github.com/alexholmes/json-mapreduce to read a multi-line json file into an RDD.
var data = sc.newAPIHadoopFile(
  filepath,
  classOf[MultiLineJsonInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf)
I printed out the first n elements to check if it was working correctly.
data.take(n).foreach { p =>
  val (line, json) = p
  println
  println(new JSONObject(json.toString).toString(4))
}
However when I try to look at the data, the arrays returned from take don't seem to be correct.
Instead of returning an array of the form
[ data[0], data[1], ... data[n] ]
it is in the form
[ data[n], data[n], ... data[n] ]
Is this an issue with the RDD I've created, or an issue with how I'm trying to print it?
I figured out why take was returning an array with duplicate values.
As the API mentions:
Note: Because Hadoop's RecordReader class re-uses the same Writable object
for each record, directly caching the returned RDD will create many
references to the same object. If you plan to directly cache Hadoop
writable objects, you should first copy them using a map function.
Therefore in my case it was reusing the same LongWritable and Text objects. For example if I did:
val foo = data.take(5)
foo.map( r => System.identityHashCode(r._1) )
The output was:
Array[Int] = Array(1805824193, 1805824193, 1805824193, 1805824193, 1805824193)
So in order to prevent it from doing this, I simply mapped the reused objects to their respective values:
val data = sc.newAPIHadoopFile(
  filepath,
  classOf[MultiLineJsonInputFormat],
  classOf[LongWritable],
  classOf[Text],
  conf).map(p => (p._1.get, p._2.toString))