How to join/merge/update JSON dictionaries without overwriting data

I have a JSON list of dictionaries like so:
data = [{
"title": "Bullitt",
"release_year": "1968",
"locations": "1153-57 Taylor Street",
"fun_facts": "Embarcadero Freeway, which was featured in the film was demolished in 1989 because of structural damage from the 1989 Loma Prieta Earthquake)",
"production_company": "Warner Brothers / Seven Arts\nSeven Arts",
"distributor": "Warner Brothers",
"director": "Peter Yates",
"writer": "Alan R. Trustman",
"actor_1": "Steve McQueen",
"actor_2": "Jacqueline Bisset",
"actor_3": "Robert Vaughn",
"id": 498
},
{
"title": "Bullitt",
"release_year": "1968",
"locations": "John Muir Drive (Lake Merced)",
"production_company": "Warner Brothers / Seven Arts\nSeven Arts",
"distributor": "Warner Brothers",
"director": "Peter Yates",
"writer": "Alan R. Trustman",
"actor_1": "Steve McQueen",
"actor_2": "Jacqueline Bisset",
"actor_3": "Robert Vaughn",
"id": 499
}]
How do I combine these dictionaries without overwriting the data?
So, the final result which I am trying to get is:
data = {
"title": "Bullitt",
"release_year": "1968",
"locations": ["1153-57 Taylor Street", "John Muir Drive (Lake Merced)"],
"fun_facts": "Embarcadero Freeway, which was featured in the film was demolished in 1989 because of structural damage from the 1989 Loma Prieta Earthquake)",
"production_company": "Warner Brothers / Seven Arts\nSeven Arts",
"distributor": "Warner Brothers",
"director": "Peter Yates",
"writer": "Alan R. Trustman",
"actor_1": "Steve McQueen",
"actor_2": "Jacqueline Bisset",
"actor_3": "Robert Vaughn",
"id": [498, 499]
}
I looked into merging JSON objects but all I came across was overwriting data. I do not want to overwrite anything. Not really sure how to approach this problem.
Would I have to make an empty list for the locations field and search through the entire data set looking for titles that are the same and take their locations and append them to the empty list and then finally update the dictionary? Or is there a better way/best practice when it comes to something like this?

This is one approach using a simple iteration.
Ex:
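result = {}
tolook = ('locations', 'id')   # fields whose values are collected into lists
for d in data:
    if d['title'] not in result:
        # first record for this title: wrap the collected fields in a list
        result[d['title']] = {k: [v] if k in tolook else v for k, v in d.items()}
    else:
        # later records: append only the collected fields
        for i in tolook:
            result[d['title']][i].append(d[i])
print(result) # Or result.values()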
Output:
{'Bullitt': {'actor_1': 'Steve McQueen',
'actor_2': 'Jacqueline Bisset',
'actor_3': 'Robert Vaughn',
'director': 'Peter Yates',
'distributor': 'Warner Brothers',
'fun_facts': 'Embarcadero Freeway, which was featured in the film '
'was demolished in 1989 because of structural damage '
'from the 1989 Loma Prieta Earthquake)',
'id': [498, 499],
'locations': ['1153-57 Taylor Street',
'John Muir Drive (Lake Merced)'],
'production_company': 'Warner Brothers / Seven Arts\nSeven Arts',
'release_year': '1968',
'title': 'Bullitt',
'writer': 'Alan R. Trustman'}}
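If you don't know in advance which fields differ between records, a more general variant can promote any field to a list as soon as a second distinct value shows up. This is only a minimal sketch: the helper name merge_records is hypothetical (it is not part of the answer above), and it assumes the records are grouped by "title" and that the original values are not lists themselves.

```python
def merge_records(records, key="title"):
    """Group records by `key`; differing values for a field are collected in a list."""
    merged = {}
    for record in records:
        target = merged.setdefault(record[key], {})
        for field, value in record.items():
            if field not in target:
                target[field] = value                    # first value seen: keep it as-is
            elif isinstance(target[field], list):
                if value not in target[field]:
                    target[field].append(value)          # already a list: add new values only
            elif target[field] != value:
                target[field] = [target[field], value]   # second distinct value: promote to list
    return merged


sample = [
    {"title": "Bullitt", "locations": "1153-57 Taylor Street", "id": 498},
    {"title": "Bullitt", "locations": "John Muir Drive (Lake Merced)", "id": 499},
]
merged = merge_records(sample)
print(merged["Bullitt"]["locations"])
print(merged["Bullitt"]["id"])
```

Fields that are identical in every record (like "title") stay plain values, so nothing is overwritten and nothing is duplicated.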

Python Dictionary
-----------------
Dictionaries store data values in key:value pairs. A dictionary is a changeable collection that does not allow duplicate keys; since Python 3.7 it also preserves insertion order.
thisdict = {
"brand": "Ford",
"model": "Mustang",
"year": 1964
}
Python List
-----------
Lists are used to store multiple items in a single variable.
We can change, add, and remove items in a list after it has been created.
Since lists are indexed, lists can have items with the same value:
mylist = ["apple", "banana", "cherry"]
Here's my logic, hope it helps.
-------------------------------
temp = {}
for dictionary in data:
    for key, value in dictionary.items():
        if key in temp:
            if value not in temp[key]:
                # new value for a known key: keep it alongside the others
                temp[key].append(value)
            # otherwise the value is already stored, so do nothing
        else:
            # first time this key is seen: start a list of its values
            temp[key] = [value]

Related

Create table from JSON file in PostgreSQL

I'm just learning how to use PostgreSQL and JSON. I came across this great tutorial, but the syntax was made for SQL Server. I am trying to take the following JSON file and begin parsing it into a table with columns for squad, name, age, and powers.
The JSON code is
CREATE TABLE heroes (
id serial NOT NULL PRIMARY KEY,
info json NOT NULL
);
insert into heroes (info)
values (('
{
"squadName": "Super hero squad",
"homeTown": "Metro City",
"formed": 2016,
"secretBase": "Super tower",
"active": true,
"members": [
{
"name": "Molecule Man",
"age": 29,
"secretIdentity": "Dan Jukes",
"powers": [
"Radiation resistance",
"Turning tiny",
"Radiation blast"
]
},
{
"name": "Madame Uppercut",
"age": 39,
"secretIdentity": "Jane Wilson",
"powers": [
"Million tonne punch",
"Damage resistance",
"Superhuman reflexes"
]
},
{
"name": "Eternal Flame",
"age": 1000000,
"secretIdentity": "Unknown",
"powers": [
"Immortality",
"Heat Immunity",
"Inferno",
"Teleportation",
"Interdimensional travel"
]
}
]
}
'::json));
I can access the first level info of the JSON with no issue, eg
SELECT info -> 'squadName' AS squad from heroes; or SELECT info -> 'active' AS active from heroes;
However, when trying to dig deeper into the JSON, I end up with a single row, the correct squad name and NULL for member names:
SELECT info -> 'squadName' AS Squad,
info ->'members' ->> 'name' AS Name
from heroes;
The tutorial uses CROSS APPLY OPENJSON(..) to handle this, but I am not sure of what to do in PostgreSQL.
Any help would be appreciated. I am using this as a learning exercise.
You can do a lateral cross join of json_array_elements().
SELECT h.info->>'squadName' AS squad,
       m.m->>'name' AS name
FROM heroes h
CROSS JOIN LATERAL json_array_elements(h.info->'members') m(m);
As a side note: the schema of this JSON looks fairly static to me. Consider modelling it relationally, with ordinary columns and lookup and/or linking tables, instead of storing everything as JSON.
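If it helps to see why the original query returned NULL: "members" is an array, not an object, so it has no "name" key of its own. The lateral join produces one row per array element, each paired with its parent row. The same expansion can be sketched in Python; the data below is a shortened copy of the JSON from the question, and the variable names are just for illustration.

```python
import json

info = json.loads("""
{
  "squadName": "Super hero squad",
  "members": [
    {"name": "Molecule Man", "age": 29, "powers": ["Radiation resistance"]},
    {"name": "Madame Uppercut", "age": 39, "powers": ["Million tonne punch"]}
  ]
}
""")

# One output row per element of "members", paired with the parent's squadName.
# This mirrors what the lateral join over json_array_elements() produces.
rows = [(info["squadName"], member["name"]) for member in info["members"]]
for row in rows:
    print(row)
```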

json.load loads a string instead of json

I have a list of dictionaries written to a data.txt file. I was expecting to be able to read the list of dictionaries in a normal way when I load, but instead, I seem to load up a string.
For example, when I print(data[0]) I expected the first dictionary in the list, but I got "[" instead.
My code and text file are below:
read_json.py
import json
with open('./data.txt', 'r') as json_file:
    data = json.load(json_file)
print(data[0])
data.txt
"[
{
"name": "Disney's Mulan (Mandarin) PG13 *",
"cast": [
"Jet Li",
"Donnie Yen",
"Yifei Liu"
],
"genre": [
"Action",
"Adventure",
"Drama"
],
"language": "Mandarin with no subtitles",
"rating": "PG13 - Some Violence",
"runtime": "115",
"open_date": "18 Sep 2020",
"description": "\u201cMulan\u201d is the epic adventure of a fearless young woman who masquerades as a man in order to fight Northern Invaders attacking China. The eldest daughter of an honored warrior, Hua Mulan is spirited, determined and quick on her feet. When the Emperor issues a decree that one man per family must serve in the Imperial Army, she steps in to take the place of her ailing father as Hua Jun, becoming one of China\u2019s greatest warriors ever."
},
{
"name": "The New Mutants M18",
"cast": [
"Maisie Williams",
"Henry Zaga",
"Anya Taylor-Joy",
"Charlie Heaton",
"Alice Braga",
"Blu Hunt"
],
"genre": [
"Action",
"Sci-Fi"
],
"language": "English",
"rating": "M18 - Some Mature Content",
"runtime": "94",
"open_date": "27 Aug 2020",
"description": "Five young mutants, just discovering their abilities while held in a secret facility against their will, fight to escape their past sins and save themselves."
}
]"
The above list is formatted properly for easy reading but the actual file is a single line and the different lines are denoted with "\n". Thanks for any help.
Removing the surrounding double quotes in data.txt worked for me.
E.g. change
"[{...},{...}]"
to
[{...},{...}]
Hope it helps!
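If you can't edit the file by hand, another option is to decode twice. This is a sketch that assumes the file was written by JSON-encoding an already encoded string (for example json.dumps(json.dumps(obj))), which would also explain the literal "\n" sequences; the helper name load_json_text is made up for this example.

```python
import json


def load_json_text(text):
    """Parse JSON text, decoding a second time if the result is still a string."""
    data = json.loads(text)
    if isinstance(data, str):
        # The payload was JSON-encoded twice, so the first pass only
        # unwrapped the outer string; decode the inner document too.
        data = json.loads(data)
    return data


# Simulate a double-encoded file's contents.
double_encoded = json.dumps(json.dumps([{"name": "The New Mutants M18"}]))
movies = load_json_text(double_encoded)
print(movies[0]["name"])
```

The better long-term fix is to find where the file is written and call json.dump on the list itself rather than on a string.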

reshape jq nested file and make csv

I've been struggling all day with this JSON, which I want to turn into a CSV.
It represents the officers attached to the company whose number is "OC418979" in the UK Companies House API.
I've already truncated the json to contain just 2 objects inside "items".
What I would like to get is a csv like this
OC418979, country_of_residence, officer_role, appointed_on
OC418979, country_of_residence, officer_role, appointed_on
OC418979, country_of_residence, officer_role, appointed_on
OC418979, country_of_residence, officer_role, appointed_on
...
There are two extra complications. First, there are two types of "officers": some are people and some are companies, so not every key present for people is present for companies and vice versa. I'd like those missing entries to be 'null'. Second, some values contain a comma (like "name"), and some are nested objects like the address, which contains several sub-fields (which I guess I could flatten in pandas, though).
{
"total_results": 13,
"resigned_count": 9,
"links": {
"self": "/company/OC418979/officers"
},
"items_per_page": 35,
"etag": "bc7955679916b089445c9dfb4bc597aa0daaf17d",
"kind": "officer-list",
"active_count": 4,
"inactive_count": 0,
"start_index": 0,
"items": [
{
"officer_role": "llp-designated-member",
"name": "BARRICK, David James",
"date_of_birth": {
"year": 1984,
"month": 1
},
"appointed_on": "2017-09-15",
"country_of_residence": "England",
"address": {
"country": "United Kingdom",
"address_line_1": "Old Gloucester Street",
"locality": "London",
"premises": "27",
"postal_code": "WC1N 3AX"
},
"links": {
"officer": {
"appointments": "/officers/d_PT9xVxze6rpzYwkN_6b7og9-k/appointments"
}
}
},
{
"links": {
"officer": {
"appointments": "/officers/M2Ndc7ZjpyrjzCXdFZyFsykJn-U/appointments"
}
},
"address": {
"locality": "Tadcaster",
"country": "United Kingdom",
"address_line_1": "Westgate",
"postal_code": "LS24 9AB",
"premises": "5a"
},
"identification": {
"legal_authority": "UK",
"identification_type": "non-eea",
"legal_form": "UK"
},
"name": "PREMIER DRIVER LIMITED",
"officer_role": "corporate-llp-designated-member",
"appointed_on": "2017-09-15"
}
]
}
What I've been doing is creating new json objects extracting the fields I needed like this:
{officer_address:.items[]?.address, appointed_on:.items[]?.appointed_on, country_of_residence:.items[]?.country_of_residence, officer_role:.items[]?.officer_role, officer_dob:items.date_of_birth, officer_nationality:.items[]?.nationality, officer_occupation:.items[]?.occupation}
But the query runs for hours - and I am sure there is a quicker way.
Right now I am trying this new approach - creating a json whose root is the company number and as argument a list of its officers.
{(.links.self | split("/")[2]): .items[]}
With jq, it's easiest to extract the shared value from the top-level object first and then generate one row per item. Go through the items at most once: every separate .items[] inside an object construction multiplies the number of outputs, which is why your query runs for so long.
$ jq -r '(.links.self | split("/")[2]) as $companyCode
| .items[]
| [ $companyCode, .country_of_residence, .officer_role, .appointed_on ]
| @csv
' input.json
OK, you want to scan the list of officers, extract some fields if they are present, and write them out in CSV format.
The first part is to extract the data from the JSON. Assuming you have loaded it into a Python object named data, you have:
print(data['items'][0]['officer_role'], data['items'][0]['appointed_on'],
data['items'][0]['country_of_residence'])
gives:
llp-designated-member 2017-09-15 England
Time to put everything together with the csv module:
import csv
...
with open('output.csv', 'w', newline='') as fd:
    wr = csv.writer(fd)
    for officer in data['items']:
        _ = wr.writerow(('OC418979',
                         officer.get('country_of_residence',''),
                         officer.get('officer_role', ''),
                         officer.get('appointed_on', '')
                         ))
The get method on a dictionary lets you supply a default value (here the empty string) if the key is not present, and the csv module ensures that a field containing a comma is enclosed in quotation marks.
With your example input, it gives:
OC418979,England,llp-designated-member,2017-09-15
OC418979,,corporate-llp-designated-member,2017-09-15
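The question asked for 'null' in place of missing values and for the company number to come from the data. One way to get both is csv.DictWriter, whose restval parameter fills in missing keys and whose extrasaction="ignore" skips keys that are not listed. A minimal sketch, using a trimmed-down stand-in for the API response and writing to an in-memory buffer instead of a file:

```python
import csv
import io

# Trimmed-down stand-in for the parsed API response from the question.
data = {
    "links": {"self": "/company/OC418979/officers"},
    "items": [
        {"name": "BARRICK, David James", "officer_role": "llp-designated-member",
         "appointed_on": "2017-09-15", "country_of_residence": "England"},
        {"name": "PREMIER DRIVER LIMITED", "officer_role": "corporate-llp-designated-member",
         "appointed_on": "2017-09-15"},
    ],
}

# "/company/OC418979/officers" -> "OC418979"
company_number = data["links"]["self"].split("/")[2]
fields = ["company_number", "name", "country_of_residence", "officer_role", "appointed_on"]

buffer = io.StringIO()  # swap for open('output.csv', 'w', newline='') to write a file
writer = csv.DictWriter(buffer, fieldnames=fields, restval="null", extrasaction="ignore")
writer.writeheader()
for officer in data["items"]:
    # Missing keys become "null"; keys not in `fields` are silently dropped.
    writer.writerow({"company_number": company_number, **officer})

print(buffer.getvalue())
```

The name containing a comma is quoted automatically, so it stays in a single column.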

How to take All JSON Name From File and store it into string

I want to take all the names from the JSON file delhi_hos and store them in a single string, so that the player can play that string. Right now it plays the names one by one. Any suggestions, please?
JSON Example
[
{
"id": 1,
"name": "JW Marriott Hotel",
"country": "IN"
},
{
"id": 2,
"name": "Le Méridien Hotel",
"country": "IN"
},
{
"id": 3,
"name": "The Leela Palace Hotel",
"country": "IN"
}
]
My Code
if "Hospital".lower() in sentence.lower():
    # print("Which City?")
    # print("1.Surat \n 2.Pune \n 3.Delhi")
    with open("store.txt", 'a') as store:
        store.truncate(0)
    if element['name'].lower() in sentence.lower():
        for items in delhi_hos:
            name3 = items.get('name')
            print(name3)
            my_text = "Near is " + name3
            my_obj = gTTS(text=my_text, lang=language, slow=False)
            my_obj.save("welcome.mp3")
            os.system("mpg123.exe welcome.mp3")
    with open("store.txt", "r+") as text_file:
        text_file.truncate(0)
I want the names from the JSON file in a string like this:
"JW Marriott Hotel Le Méridien Hotel The Leela Palace Hotel"
and then to store it in a variable:
var = "JW Marriott Hotel Le Méridien Hotel The Leela Palace Hotel"
That way I can use var as the input for my player to play this string.
My main problem is converting all the names into this string.
I'm not sure I understand your question correctly, but as far as I can tell you simply want the values of the name field in your JSON joined into one string.
I'm wondering why whats solution doesn't work as expected.
The simplest solution that comes to my mind would be:
names = ""
for item in delhi_hos:
    names += item['name'] + " "  # add a space so the names don't run together
names = names.strip()  # drop the trailing space
I suggest starting with small bites of a problem when things don't work out. First add a print; when that works, assign variables, then add if statements, and so on. Good luck!
I hope this solves your problem. Fix the indentation if you get any errors.
import functools
obj = [
{
"id": 1,
"name": "JW Marriott Hotel",
"country": "IN"
},
{
"id": 2,
"name": "Le Méridien Hotel",
"country": "IN"
},
{
"id": 3,
"name": "The Leela Palace Hotel",
"country": "IN"
}
]
var = functools.reduce(lambda a, b : a + " " + b["name"], obj, "").strip()  # strip() removes the leading space
print(var)
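For what it's worth, the usual idiom for this is str.join with a generator expression, which avoids the stray leading or trailing space altogether. A short sketch using the data from the question:

```python
delhi_hos = [
    {"id": 1, "name": "JW Marriott Hotel", "country": "IN"},
    {"id": 2, "name": "Le Méridien Hotel", "country": "IN"},
    {"id": 3, "name": "The Leela Palace Hotel", "country": "IN"},
]

# join() puts the separator only between items, never at the ends.
var = " ".join(item["name"] for item in delhi_hos)
print(var)
```

You can then build the spoken text once, outside the loop, and pass it to the player a single time.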

iterate through nested JSON object and get values with Python

I am using Python and need to iterate through JSON objects and retrieve nested values. A snippet of my data follows:
"bills": [
{
"url": "http:\/\/maplight.org\/us-congress\/bill\/110-hr-195\/233677",
"jurisdiction": "us",
"session": "110",
"prefix": "H",
"number": "195",
"measure": "H.R. 195 (110\u003csup\u003eth\u003c\/sup\u003e)",
"topic": "Seniors' Health Care Freedom Act of 2007",
"last_update": "2011-08-29T20:47:44Z",
"organizations": [
{
"organization_id": "22973",
"name": "National Health Federation",
"disposition": "support",
"citation": "The National Health Federation (n.d.). \u003ca href=\"http:\/\/www.thenhf.com\/government_affairs_federal.html\"\u003e\u003ccite\u003e Federal Legislation on Consumer Health\u003c\/cite\u003e\u003c\/a\u003e. Retrieved August 6, 2008, from The National Health Federation.",
"catcode": "J3000"
},
{
"organization_id": "27059",
"name": "A Christian Perspective on Health Issues",
"disposition": "support",
"citation": "A Christian Perspective on Health Issues (n.d.). \u003ca href=\"http:\/\/www.acpohi.ws\/page1.html\"\u003e\u003ccite\u003ePart E - Conclusion\u003c\/cite\u003e\u003c\/a\u003e. Retrieved August 6, 2008, from .",
"catcode": "X7000"
},
{
"organization_id": "27351",
"name": "Natural Health Roundtable",
"disposition": "support",
"citation": "Natural Health Roundtable (n.d.). \u003ca href=\"http:\/\/naturalhealthroundtable.com\/reform_agenda\"\u003e\u003ccite\u003eNatural Health Roundtable SUPPORTS the following bills\u003c\/cite\u003e\u003c\/a\u003e. Retrieved August 6, 2008, from Natural Health Roundtable.",
"catcode": "J3000"
}
]
},
I need to go through each object in "bills" and get "session", "prefix", etc., and I also need to go through each entry in "organizations" and get "name", "disposition", etc. I have the following code:
import csv
import json
path = 'E:/Thesis/thesis_get_data'
with open (path + "/" + 'maplightdata110congress.json',"r") as f:
    data = json.load(f)
a = data['bills']
b = data['bills'][0]["prefix"]
c = data['bills'][0]["number"]
h = data['bills'][0]['organizations'][0]
e = data['bills'][0]['organizations'][0]['name']
f = data['bills'][0]['organizations'][0]['catcode']
g = data['bills'][0]['organizations'][0]['catcode']
for i in a:
    for index in e:
        print ('name')
and it returns the string 'name' a bunch of times.
Suggestions?
This might help you.
def func1(data):
    for key,value in data.items():
        print (str(key)+'->'+str(value))
        if type(value) == type(dict()):
            func1(value)
        elif type(value) == type(list()):
            for val in value:
                if type(val) == type(str()):
                    pass
                elif type(val) == type(list()):
                    pass
                else:
                    func1(val)
func1(data)
All you have to do is pass the JSON object to the function as a dictionary.
There is also a Python library that might help you with this. You can find it here -> JsonJ
PEACE BRO!!!
I found the solution on another forum and wanted to share with everyone here in case this comes up again for someone.
import csv
import json
path = 'E:/Thesis/thesis_get_data'
with open (path + "/" + 'maplightdata110congress.json',"r") as f:
    data = json.load(f)
for bill in data['bills']:
    for organization in bill['organizations']:
        print (organization.get('name'))
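Since the question also asked for the bill-level fields, the same two loops can build one flat row per (bill, organization) pair, which is a convenient shape for csv.DictWriter later on. A sketch with a shortened copy of the data; the choice of columns is only an example.

```python
# Shortened stand-in for the parsed file from the question.
data = {
    "bills": [
        {
            "session": "110", "prefix": "H", "number": "195",
            "organizations": [
                {"name": "National Health Federation", "disposition": "support", "catcode": "J3000"},
                {"name": "Natural Health Roundtable", "disposition": "support", "catcode": "J3000"},
            ],
        }
    ]
}

rows = []
for bill in data["bills"]:
    # .get(..., []) keeps the loop from failing on a bill without organizations
    for organization in bill.get("organizations", []):
        rows.append({
            "session": bill.get("session"),
            "prefix": bill.get("prefix"),
            "number": bill.get("number"),
            "name": organization.get("name"),
            "disposition": organization.get("disposition"),
        })

for row in rows:
    print(row)
```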
Refining @Joish's answer:
def func1(data):
    for key,value in data.items():
        print (str(key)+'->'+str(value))
        if isinstance(value, dict):
            func1(value)
        elif isinstance(value, list):
            for val in value:
                if isinstance(val, str):
                    pass
                elif isinstance(val, list):
                    pass
                else:
                    func1(val)
func1(data)
Same as implemented here
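One caveat with func1: it calls .items() on every list element that is not a string or a list, so a list containing numbers would raise an AttributeError. A variant that handles any mix of dicts, lists, and plain values could look like the sketch below. The generator name walk is my own, and it yields the path to each leaf instead of printing inside the recursion.

```python
def walk(node, path=()):
    """Yield (path, value) for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, path + (index,))
    else:
        # Strings, numbers, booleans and None are leaves.
        yield path, node


sample = {"bills": [{"prefix": "H", "organizations": [{"name": "National Health Federation"}]}]}
leaves = list(walk(sample))
for path, value in leaves:
    print(path, "->", value)
```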
This question is doubly nested, so two for loops make sense.
Here's an extract from Pluralsight, using their GraphQL API, with an example that goes three levels deep to get Progress, User, or Course info:
{
"data": {
"courseProgress": {
"nodes": [
{
"user": {
"id": "1",
"email": "a@a.com",
"startedOn": "2019-07-26T05:00:50.523Z"
},
"course": {
"id": "22",
"title": "Building Machine Learning Models in Python with scikit-learn"
},
"percentComplete": 34.0248,
"lastViewedClipOn": "2019-07-26T05:26:54.404Z"
}
]
}
}
}
The code to parse this JSON:
for item in items["data"]["courseProgress"]["nodes"]:
    print(item["user"].get('email'))
    print(item["course"].get('title'))
    print(item.get('percentComplete'))
    print(item.get('lastViewedClipOn'))
print(item.get('lastViewedClipOn'))