Python 2.7: Generate JSON file with multiple query results in nested dict - json

What started as a personal initiative ended up being a quite interesting (may I say, somewhat challenging) project. My company decided to phase out one product and replace it with a new one which, instead of storing data in mdb files, uses JSON files. So I took the initiative to create a converter that reads the existing mdb files and converts them into the new JSON format.
However, now I'm at my wits' end with this one:
I can read the mdb files and run queries to extract specific data.
By placing targetobj inside the for loop, I managed to extract the data for each row and feed it into a dict (targetobj):
for val in rows:
    targetobj = {"connection_props": {"port": 7800, "service": "", "host": val.Hostname, "pwd": "", "username": ""},
                 "group_list": val.Groups, "cpu_core_cnt": 2, "target_name": "somename", "target_type": "somethingsamething",
                 "os": val.OS, "rule_list": [], "user_list": val.Users}
If I print targetobj to the console, I can clearly see all the extracted values for each row.
Now, my quest is to have the obtained results (for each row) inserted into main_dict under the key targets: []. (Please see the sample JSON below for illustration.)
main_dict = {"changed_time": 0, "year": 0, "description": 'blahblahblah', 'targets': [RESULTS FROM TARGETOBJ SHOULD BE ADDED HERE], "enabled": False}
So, for example, my JSON file should have a structure such as:
{"changed_time":1234556,
"year":0,
"description":"blahblahblah",
"targets":[
{"group_list":["QA"],
"cpu_core_cnt":1,
"target_name":"NewTarget",
"os":"unix",
"target_type":"",
"rule_list":[],
"user_list":[""],"connection_props":"port":someport,"service":"","host":"host1","pwd":"","username":""}
},
{"group_list":[],
"cpu_core_cnt":2,
"target_name":"",
"os":"unix",
"target_type":"",
"rule_list":[],
"user_list":["Web2user"],
"connection_props":{"port":anotherport,"service":"","host":"host2","pwd":"","username":""}}
],
"enabled":false}
So far I've been tweaking here and there to have the results written as intended; however, each time I'm getting only the last row's values written.
i.e., putting targetobj as a variable inside targets: []:
{"changed_time": 0, "year": 0, "description": 'ConvertedConfigFile', 'targets': [targetobj],
I know I'm missing something, I just need to find what and where.
Any help would be highly appreciated.
Thank you.

Just create your main_dict first and append to it in your loop, i.e.:
main_dict = {"changed_time": 0,
"year": 0,
"description": "blahblahblah",
"targets": [], # a new list for the target objects
"enabled": False}
for val in rows:
main_dict["targets"].append({ # append this dict to the targets list of main_dict
"connection_props": {
"port": 7800,
"service": "",
"host": val.Hostname,
"pwd": "",
"username": ""},
"group_list": val.Groups,
"cpu_core_cnt": 2,
"target_name": "somename",
"target_type": "somethingsamething",
"os": val.OS,
"rule_list": [],
"user_list": val.Users
})
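Since the end goal is a JSON file, the finished main_dict can then be serialized with the standard json module once the loop has run. A minimal sketch; the output file name here is just a placeholder:
import json

# After the loop above, main_dict["targets"] holds one dict per mdb row,
# so the whole structure can be written out in a single call.
with open('converted_config.json', 'w') as fp:
    json.dump(main_dict, fp, indent=4)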

Related

pandas json_normalize nested json where dictionary only exists on some records

I am trying to run pandas.json_normalize on a data file that has highly varied, nested json, where the content of the records can vary considerably.
I am processing a house listing file and trying to pull out prices. The prices data is stored as follows, and 'prices' is at the first nesting level within the json file:
"prices": [
{
"amountMax": 420000,
"amountMin": 420000,
"availability": "false",
"currency": "USD",
"dateSeen": [
"2020-12-21T11:57:17.190Z",
"2020-12-25T02:35:41.009Z"
],
"isSale": "false",
"isSold": "true",
"pricePerSquareFoot": 235,
"sourceURLs": [
"https://www.redfin.com/FL/Coconut-Creek/.../home/4146834"
]
}, # followed by additional entries
I am using the following line of code, which works if I edit the input file down to a single record that includes a 'prices' section:
df3 = pd.json_normalize(df['records'], record_path='prices',
                        meta=['id'],
                        errors='ignore'
                        )
However, the full file includes many records that do not include a prices section. If I run the code against a file with 2 records (one with, one without), it fails with KeyError: 'prices'
Clearly, errors='ignore' in json_normalize is not enough to handle this.
What can I do? I would just like to skip the records without prices entirely.
A list comprehension on your JSON will do it. I've synthesized some JSON to match your description of input data.
import pandas as pd

js = {
    "records": [
        {
            "prices": [
                {
                    "amountMax": 420000,
                    "amountMin": 420000,
                    "availability": "false",
                    "currency": "USD",
                    "dateSeen": [
                        "2020-12-21T11:57:17.190Z",
                        "2020-12-25T02:35:41.009Z"
                    ],
                    "isSale": "false",
                    "isSold": "true",
                    "pricePerSquareFoot": 235,
                    "sourceURLs": [
                        "https://www.redfin.com/FL/Coconut-Creek/.../home/4146834"
                    ]
                }
            ],
            "id": 1
        },
        {"id": 2}
    ]
}

pd.json_normalize([r for r in js["records"] if "prices" in r],
                  record_path="prices",
                  meta="id")

Powershell comparing hashtable objects

I am trying to retrieve a server status and see if it has changed since the last run, using PowerShell. My server outputs a JSON hashtable. The idea is to import the data and compare it to the previous data; if it has changed, save the new data to disk and do some "things".
It's easy enough to import the JSON into a PSObject, and it seems the common way to store it to disk is using Export-Clixml. But reading the data back with Import-Clixml does not give an identical PSObject; at the very least, they have different TypeName values.
My code boils down to this:
$newserverstatus = Get-Content C:\status.json | ConvertFrom-Json
# For real I am piping the output from the status command into
# ConvertFrom-Json; in testing I read the content from a file.
$oldserverstatus = Import-Clixml C:\oldstatus.xml
# Read the saved information from disk.
if ($oldserverstatus -ne $newserverstatus)
{
    # do "things" using the data
    $newserverstatus | Export-Clixml C:\oldstatus.xml
    # and finish with saving the updated status to disk
}
If this worked, I wouldn't be posting here. As mentioned, I have noticed the objects have different TypeName values, but the data seems identical.
The main question is whether there is a way to compare the object data without looking at the PSObject metadata.
I am not a programmer, so there are naturally many better ways to do this; I just thought this was a possible solution: import the new data into an object and compare it with an object created from the saved data.
The json data that I work with is this:
{
    "clusterName": "CLUSTER11",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "192.168.2.30:3306",
        "ssl": "DISABLED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "192.168.2.30:3306": {
                "address": "192.168.2.30:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.19"
            },
            "192.168.2.31:3306": {
                "address": "192.168.2.31:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.19"
            },
            "192.168.3.139:3306": {
                "address": "192.168.3.139:3306",
                "mode": "R/O",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.19"
            }
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "192.168.2.30:3306"
}
I have also tried saving the data in JSON format with ConvertTo-Json and using ConvertFrom-Json to read the old status, to no avail. The thought of saving the new server status straight to a file and comparing that file with an old copy has crossed my mind, but to me it seems like a horribly ugly way of doing it.

Extract data from a JSON file using python

Say I have a JSON entry as follows (the JSON file is generated by fetching data from a Firebase DB):
[{"goal_savings": 0.0, "social_id": "", "score": 0, "country": "BR", "photo": "http://graph.facebook", "id": "", "plates": 3, "rcu": null, "name": "", "email": ".", "provider": "facebook", "phone": "", "savings": [], "privacyPolicyAccepted": true, "currentRole": "RoleType.PERSONAL", "empty_lives_date": null, "userId": "", "authentication_token": "-------", "onboard_status": "ONBOARDING_WIZARD", "fcmToken": ----------", "level": 1, "dni": "", "social_token": "", "lives": 10, "bills": [{"date": "2020-12-10", "role": "RoleType.PERSONAL", "name": "Supermercado", "category": "feeding", "periodicity": "PeriodicityType.NONE", "value": 100.0"}], "payments": [], "goals": [], "goalTransactions": [], "incomes": [], "achievements": [{"created_at":", "name": ""}]}]
How do I extract the content corresponding to 'value', which is present inside the 'bills' column? Is there any way to do this?
My Python code is as follows. With this I was only able to get the data within the bills column, but I need only the entry corresponding to 'value' inside bills.
import json

filedata = open('firebase-dataset.json', 'r')
data = json.load(filedata)

listoffields = []  # To produce it into a list with fields

for dic in data:
    try:
        listoffields.append(dic['bills'])  # only non-essential bill categories.
    except KeyError:
        pass

print(listoffields)
The JSON you posted contains misplaced quotes.
I think you are trying to extract the value of the 'value' field within bills.
Try this:
print(listoffields[0][0]['value'])
which will print 100.0 as a str; use float() to use it in calculations.
---edit---
Say the JSON you have contains many JSON objects separated by commas, as in:
[{ first-entry },{ second-entry },{ third.. }, ....and so on]
and you want to find the value of each bill in each JSON object. The code below may work:
bill_value_list = []  # to store the 'value' of each bill
for bill_list in listoffields:
    bill_value_list.append(float(bill_list[0]['value']))  # bill_list[0] contains the complete bill dictionary.
print(bill_value_list)
print(sum(bill_value_list))  # do something useful
Paste it after the code you posted (no changes to your code, since it already works :-)).
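If you want every bill's value rather than just the first bill of each record, a nested list comprehension over listoffields does the same job in one pass. A sketch, assuming every bill dict has a 'value' key:
# Flatten all bills from all records and collect their 'value' fields as floats.
bill_values = [float(bill['value'])
               for bills in listoffields   # each entry is one record's list of bills
               for bill in bills
               if 'value' in bill]
print(bill_values)
print(sum(bill_values))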

assign values of nested dict to list in python

I have a file that is a list of JSON objects. It looks like this:
[
    {
        "id": 748,
        "location": {
            "slug": "istanbul",
            "parent": {
                "id": 442,
                "slug": "turkey"
            }
        },
        "rank": 110
    },
    {
        "id": 769,
        "location": {
            "slug": "dubai",
            "parent": {
                "id": 473,
                "slug": "uae"
            }
        },
        "rank": 24
    }
]
I want to create a list of hotel parent names, so I wrote the code below. I read the JSON file and assigned it to a variable; that part is correct. But look at this code:
import json

with open('hotels.json', 'r', encoding="utf8") as hotels_data:
    hotels = json.load(hotels_data)

parents_list = []
for item in hotels:
    if item["location"]["parent"]["slug"] not in parents_list:
        parents_list.append(item["location"]["parent"])
When I run this code, I get this error:
if item["location"]["parent"]["slug"] not in parents_list:
TypeError: 'NoneType' object is not subscriptable
This code does not work, so I tried to print the JSON objects by adding this inside the loop:
print(item["location"]["parent"]["slug"])
This prints the values I want, but it also gives me the exact same error.
Thank you for any help.
I tried running the code and it seems to be working fine with your dataset.
However, instead of opening the file to read the data, I just assigned hotels with your dataset, hotels = [...].
The result I got was this:
[{'id': 442, 'slug': 'turkey'}, {'id': 473, 'slug': 'uae'}]
What is your result if you print hotels? Is it the same as shown here?
If you actually have a lot more data in your dataset, then I presume that some of the entries don't contain item["location"]["parent"]["slug"]. If that is the case, you should skip those by first checking whether that element exists in each item before appending to parents_list.
For example:
try:
    item["location"]["parent"]["slug"]
except (KeyError, TypeError):
    pass
else:
    if item["location"]["parent"]["slug"] not in parents_list:
        parents_list.append(item["location"]["parent"])
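An alternative to the try/except is chaining dict.get, which returns None instead of raising when a key is missing. A sketch, assuming the data looks like the sample above; note it deduplicates on the whole parent dict rather than on the slug:
parents_list = []
for item in hotels:
    # .get() returns None when the key is absent, and "or {}" keeps the
    # chain safe when "location" or "parent" is missing or null.
    parent = (item.get("location") or {}).get("parent") or {}
    if parent and parent not in parents_list:
        parents_list.append(parent)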
I cannot replicate the same error. The only thing I can think of is that the last item in each object in the JSON shouldn't have a comma after it. See if that fixes your error.

Access deeper elements of a JSON using postgresql 9.4

I want to be able to access deeper elements of a JSON document stored in a JSON field in a PostgreSQL database. For example, I would like to access the elements along the path states->events->time from the JSON provided below. Here is the PostgreSQL query I'm using:
SELECT
    data #>> '{userId}' as user,
    data #>> '{region}' as region,
    data #>> '{priorTimeSpentInApp}' as priorTimeSpentInApp,
    data #>> '{userAttributes, "Total Friends"}' as totalFriends
FROM game_json
WHERE game_name LIKE 'myNewGame'
LIMIT 1000
and here is an example record from the json field
{
    "region": "oh",
    "deviceModel": "inHouseDevice",
    "states": [
        {
            "events": [
                {
                    "time": 1430247045.176,
                    "name": "Session Start",
                    "value": 0,
                    "parameters": {
                        "Balance": "40"
                    },
                    "info": ""
                },
                {
                    "time": 1430247293.501,
                    "name": "Mission1",
                    "value": 1,
                    "parameters": {
                        "Result": "Win ",
                        "Replay": "no",
                        "Attempt Number": "1"
                    },
                    "info": ""
                }
            ]
        }
    ],
    "priorTimeSpentInApp": 28989.41467999999,
    "country": "CA",
    "city": "vancouver",
    "isDeveloper": true,
    "time": 1430247044.414,
    "duration": 411.53,
    "timezone": "America/Cleveland",
    "priorSessions": 47,
    "experiments": [],
    "systemVersion": "3.8.1",
    "appVersion": "14312",
    "userId": "ef617d7ad4c6982e2cb7f6902801eb8a",
    "isSession": true,
    "firstRun": 1429572011.15,
    "priorEvents": 69,
    "userAttributes": {
        "Total Friends": "0",
        "Device Type": "Tablet",
        "Social Connection": "None",
        "Item Slots Owned": "12",
        "Total Levels Played": "0",
        "Retention Cohort": "Day 0",
        "Player Progression": "0",
        "Characters Owned": "1"
    },
    "deviceId": "ef617d7ad4c6982e2cb7f6902801eb8a"
}
That SQL query works, except that it doesn't give me any return values for totalFriends (e.g. data#>>'{userAttributes, "Total Friends"}' as totalFriends). I assume part of the problem is that events falls within square brackets (I don't know what that indicates in JSON) as opposed to curly braces, but I'm also unable to extract values from the userAttributes key.
I would appreciate it if anyone could help me.
I'm sorry if this question has been asked elsewhere. I'm so new to PostgreSQL, and even JSON, that I'm having trouble coming up with the proper terminology to find the answers to this (and related) questions.
You should definitely familiarize yourself with the basics of json
and json functions and operators in Postgres.
In the second source, pay attention to the operators -> and ->>.
General rule: use -> to get a JSON object, and ->> to get a JSON value as text.
Using these operators, you can rewrite your query in a way that returns the correct value of 'Total Friends':
select
    data ->> 'userId' as user,
    data ->> 'region' as region,
    data ->> 'priorTimeSpentInApp' as priorTimeSpentInApp,
    data -> 'userAttributes' ->> 'Total Friends' as totalFriends
from game_json
where game_name like 'myNewGame';
JSON objects in square brackets are elements of a JSON array. A JSON array may have many elements, which are accessed by index; arrays are indexed from 0 (the first element of an array has index 0).
Example:
select
    data -> 'states' -> 0 -> 'events' -> 1 ->> 'name'
from game_json
where game_name like 'myNewGame';
-- returns "Mission1"