How to write a list with nested dictionaries to a CSV file - json

I'm trying to export some API JSON data to a CSV file. I've tried a lot of the methods referred to here, but I keep getting an error.
My JSON data is shown below:
[{u'src': {}, u'product': u'WEB_MPS', u'name':
u'DOMAIN_MATCH',u'explanation':
{u'osChanges': [],
u'malwareDetected':
{u'malware':
[{
u'md5Sum': u'www.jfaewrwergwea.com',
u'name': u'Raxxxxxe.WCry.InfectionFail'}]
}},
u'vlan': 0, u'sensorIp': u'10.165.100.31', u'occurred': u'2017-08-24 01:54:54 +0530',
u'alertUrl':u'', u'applianceId': u'002590879266', u'rootInfection': 7971, u'id': 860438060,
u'action': u'notified', u'sensor': u'W-inline', u'dst': {u'mac': u'00:1b:17:00:01:10'}, u'severity': u'MINR'},
{u'src': {}, u'product': u'WEB_MPS', u'name': u'DOMAIN_MATCH',
u'explanation': {u'osChanges': [], u'malwareDetected':
{u'malware': [{u'md5Sum': u'www.iuqerea.com', u'name': u'Raxxxxre.WCry.InfectionFail'}]}},
u'vlan': 0, u'sensorIp': u'10.165.100.31', u'occurred': u'2017-08-23 19:52:56 +0530',
u'alertUrl': u'', u'applianceId': u'002590879266', u'rootInfection': 7950, u'id': 860401215,
u'action': u'notified', u'sensor': u'W0-FW-inline', u'dst': {u'mac': u'00:1b:17:00:01:10'}, u'severity': u'MINR'},
How can I get my Python code to convert this to CSV?
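One possible approach, sketched here with only the standard library, is to flatten each nested dictionary into dotted column names and hand the rows to csv.DictWriter. The flatten helper and the trimmed alerts sample are illustrative assumptions, not part of the original post; lists such as the malware entries are left as single cell values rather than expanded.

```python
import csv
import io


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep` (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty dicts; empty dicts and lists stay as plain values
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Trimmed sample modelled on the second alert in the question
alerts = [
    {
        "src": {},
        "product": "WEB_MPS",
        "name": "DOMAIN_MATCH",
        "explanation": {
            "osChanges": [],
            "malwareDetected": {
                "malware": [
                    {"md5Sum": "www.iuqerea.com", "name": "Raxxxxre.WCry.InfectionFail"}
                ]
            },
        },
        "vlan": 0,
        "dst": {"mac": "00:1b:17:00:01:10"},
        "severity": "MINR",
    },
]

rows = [flatten(alert) for alert in alerts]
# Collect every column that appears in any row so the header covers all records
fieldnames = sorted({key for row in rows for key in row})

buffer = io.StringIO()  # swap for open("alerts.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

If the malware list should become separate columns or rows, the else branch of flatten is the place to handle it.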

Related

How to parse nested JSON file in Pandas

I'm trying to transform a JSON file generated by the Day One Journal to a text file using Python but hit a brick wall.
This is broadly the format:
{'metadata': {'version': '1.0'},
'entries': [{'richText': '{"meta":{"version":1,"small-lines-removed":true,"created":{"platform":"com.bloombuilt.dayone-mac","version":1344}},"contents":[{"attributes":{"line":{"header":1,"identifier":"F78B28DA-488E-489E-9C95-1A0648099792"}},"text":"2022\\n"},{"attributes":{"line":{"header":0,"identifier":"FA8C6594-F43D-4652-B442-DAF72A379799"}},"text":"\\n"},{"attributes":{"line":{"header":0,"identifier":"0923BCC8-B24A-4C0D-963C-73D09561EECD"}},"text":"It’s the beginning of a new year"},{"embeddedObjects":[{"type":"horizontalRuleLine"}]},{"text":"\\n\\n\\n\\n"},{"embeddedObjects":[{"type":"horizontalRuleLine"}]}]}',
'duration': 0,
'creationOSVersion': '12.1',
'weather': {'sunsetDate': '2022-01-12T16:15:28Z',
'temperatureCelsius': 7,
'weatherServiceName': 'HAMweather',
'windBearing': 230,
'sunriseDate': '2022-01-12T08:00:44Z',
'conditionsDescription': 'Mostly Clear',
'pressureMB': 1042,
'visibilityKM': 48.28020095825195,
'relativeHumidity': 81,
'windSpeedKPH': 6,
'weatherCode': 'clear-night',
'windChillCelsius': 6.699999809265137},
'editingTime': 2925.313938140869,
'timeZone': 'Europe/London',
'creationDeviceType': 'Hal 9000',
'uuid': '988D9D9876624FAEB88F9BCC666FD9CD',
'creationDeviceModel': 'MacBookPro15,2',
'starred': False,
'location': {'region': {'center': {'longitude': -0.0095,
'latitude': 51},
'radius': 75},
'localityName': 'London',
'country': 'United Kingdom',
'timeZoneName': 'Europe/London',
'administrativeArea': 'England',
'longitude': -0.0095,
'placeName': 'Somewhere',
'latitude': 51},
'isPinned': False,
'creationDevice': 'somedevice'...,
}
I only want the 'text' (of which there might be a number of entries) and the 'creationDate', so that I end up with a daily record.
My code to pull out the data is straightforward:
import json

# Open the JSON file and load it as a dictionary;
# the with block closes the file automatically
with open('files/2022.json') as f:
    data = json.load(f)
I've tried using list comprehensions and then concatenating the Series in Pandas, but the two don't match in length, because multiple entries on one day mix up the dataframe.
I wanted to use this code:
result = []
for i in data['entries']:
    entry = i['creationDate'] + i['text']
    result.append(entry)
but I get this error:
KeyError: 'text'
What do I need to do?
Update:
{'richText': '{"meta":{"version":1,"small-lines-removed":true,"created":{"platform":"com.bloombuilt.dayone-mac","version":1344}},"contents":[{"text":"Later than I planned\\n"}]}',
'duration': 0,
'creationOSVersion': '12.1',
'weather': {'sunsetDate': '2022-01-12T16:15:28Z',
'temperatureCelsius': 7,
'weatherServiceName': 'HAMweather',
'windBearing': 230,
'sunriseDate': '2022-01-12T08:00:44Z',
'conditionsDescription': 'Mostly Clear',
'pressureMB': 1042,
'visibilityKM': 48.28020095825195,
'relativeHumidity': 81,
'windSpeedKPH': 6,
'weatherCode': 'clear-night',
'windChillCelsius': 6.699999809265137},
'editingTime': 672.3099998235703,
'timeZone': 'Europe/London',
'creationDeviceType': 'Computer',
'uuid': 'F53DCC5E05BB4106A49C76954117DBF4',
'creationDeviceModel': 'xompurwe',
'isPinned': False,
'creationDevice': 'Computer',
'text': 'Later than I planned \\\n',
'modifiedDate': '2022-01-05T01:01:29Z',
'isAllDay': False,
'creationDate': '2022-01-05T00:39:19Z',
'creationOSName': 'macOS'},
I sort of managed to work out a solution. Thank you to everyone who helped this morning, particularly @Tomer S.
My solution was:
result = []
for i in data['entries']:
    print(i['creationDate'] + i['text'])
    result.append(entry)
It still doesn't give me what I want.
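The KeyError suggests that some entries have no 'text' key at all. A minimal sketch of one way around that, assuming such entries still carry a 'richText' JSON string whose 'contents' parts hold the text (as in the first sample above): the entry_text helper and the two-entry data sample are hypothetical stand-ins for the real export.

```python
import json

# Hypothetical sample shaped like the Day One export in the question
data = {
    "entries": [
        {"creationDate": "2022-01-05T00:39:19Z", "text": "Later than I planned\n"},
        {
            "creationDate": "2022-01-12T08:00:00Z",
            # No 'text' key here, only the embedded richText JSON string
            "richText": json.dumps(
                {
                    "contents": [
                        {"text": "2022\n"},
                        {"embeddedObjects": [{"type": "horizontalRuleLine"}]},
                        {"text": "It's the beginning of a new year"},
                    ]
                }
            ),
        },
    ]
}


def entry_text(entry):
    """Return the entry's text, falling back to the parts inside richText."""
    if "text" in entry:
        return entry["text"]
    rich = entry.get("richText")
    if rich is None:
        return ""
    contents = json.loads(rich).get("contents", [])
    # Parts without a 'text' key (e.g. horizontal rules) contribute nothing
    return "".join(part.get("text", "") for part in contents)


# One (date, text) pair per entry, so the two columns always have the same length
result = [(e.get("creationDate", ""), entry_text(e)) for e in data["entries"]]
```

Keeping the date and text together in one tuple avoids the length mismatch that appears when the two are collected in separate lists.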

json.loads function not giving python dictionary

I am trying to convert the below mentioned json string to python dictionary. I am using python 3's json package for the same. Here is the code that I am using :
a = "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10749, 'name': 'Romance'}]"
b = json.loads(json.dumps(a))
print(type(b))
And the output that I am getting from the above code is:
<class 'str'>
I have seen similar questions on Stack Overflow, but the solutions presented for them do not apply to my case.
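The string you are trying to convert is not valid JSON: JSON requires double quotes around keys and string values. Also, you only need to call json.loads to convert a string into a dict or list; json.dumps(a) encodes the string a second time, so json.loads simply returns the original string.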
The updated code would look like:
import json
a = '[{"id": 35, "name": "Comedy"}, {"id": 18, "name": "Drama"}, {"id": 10751, "name": "Family"}, {"id": 10749, "name": "Romance"}]'
b = json.loads(a)
print(type(b))
Hope this explains why you are not getting the expected results.
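If the single-quoted string cannot be changed at its source, one alternative (not mentioned in the answer above, so treat it as a suggestion) is ast.literal_eval, which parses Python literals rather than JSON. The sketch below also shows why the original round trip returned a str; the variable names are illustrative.

```python
import ast
import json

# Single-quoted, so this is a Python literal rather than valid JSON
a = "[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}]"

# dumps wraps the whole string in a JSON string; loads just unwraps it again
round_trip = json.loads(json.dumps(a))

# literal_eval safely evaluates Python literals (lists, dicts, strings, numbers)
parsed = ast.literal_eval(a)

# Once parsed, the data can be re-serialized as proper double-quoted JSON
as_json = json.dumps(parsed)
```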
A JSON array is enclosed in [ ], while a JSON object is enclosed in { }.
The string in a is a JSON array, so it can only be converted into a list.
Keys and string values must be enclosed in double quotes; that is a requirement of the JSON format, which Python's json library enforces.
b = json.loads(a) will give a list of dictionary objects.
To get a dictionary of dictionaries, you need to associate a key with each individual dictionary.
d = dict()
ind = 0
for data in b:
    d[ind] = data
    ind += 1
Now the output that you get will be
{0: {'id': 35, 'name': 'Comedy'}, 1: {'id': 18, 'name': 'Drama'}, 2: {'id': 10751, 'name': 'Family'}, 3: {'id': 10749, 'name': 'Romance'}}
which is a dictionary of dictionaries.
Thank you
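The same index-keyed dictionary can be built more compactly with enumerate. The sketch below also keys the records by their own 'id' field, which may be more useful for lookups; by_id is an illustrative name, not something from the answer.

```python
import json

a = '[{"id": 35, "name": "Comedy"}, {"id": 18, "name": "Drama"}]'
b = json.loads(a)  # a list of dictionaries

# Key each dictionary by its position in the list
d = dict(enumerate(b))

# Alternatively, key each dictionary by its own 'id' value
by_id = {item["id"]: item for item in b}
```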

How Do I Serialize spaCy Custom Span Extensions as JSON?

I am using spaCy 2.1.6 to define a custom extension on a span.
>>> from spacy import load
>>> nlp = load("en_core_web_lg")
>>> from spacy.tokens import Span
>>> Span.set_extension('my_label', default=None)
>>> d = nlp("The fox jumped.")
>>> d[0:2]._.my_label = "ANIMAL"
>>> d[0:2]._.my_label
'ANIMAL'
The custom span extension does not appear when I serialize the document to JSON.
>>> d.to_json()
{'text': 'The fox jumped.',
'ents': [],
'sents': [{'start': 0, 'end': 15}],
'tokens': [{'id': 0,
'start': 0,
'end': 3,
'pos': 'DET',
'tag': 'DT',
'dep': 'det',
'head': 1},
{'id': 1,
'start': 4,
'end': 7,
'pos': 'NOUN',
'tag': 'NN',
'dep': 'nsubj',
'head': 2},
{'id': 2,
'start': 8,
'end': 14,
'pos': 'VERB',
'tag': 'VBD',
'dep': 'ROOT',
'head': 2},
{'id': 3,
'start': 14,
'end': 15,
'pos': 'PUNCT',
'tag': '.',
'dep': 'punct',
'head': 2}]}
(I'm specifically interested in custom annotation of Spans, but the same appears to be true of the JSON serialization of Doc object.)
Pickling and unpickling the document does preserve the custom extension.
How do I get the custom span extensions into the JSON serialization, or is that not supported?
Use this function and add your custom extensions any way you want:
import spacy


def doc2json(doc: spacy.tokens.Doc, model: str):
    # Document-level attributes
    json_doc = {
        "text": doc.text,
        "text_with_ws": doc.text_with_ws,
        "cats": doc.cats,
        "is_tagged": doc.is_tagged,
        "is_parsed": doc.is_parsed,
        "is_nered": doc.is_nered,
        "is_sentenced": doc.is_sentenced,
    }
    ents = [
        {"start": ent.start, "end": ent.end, "label": ent.label_} for ent in doc.ents
    ]
    if doc.is_sentenced:
        sents = [{"start": sent.start, "end": sent.end} for sent in doc.sents]
    else:
        sents = []
    if doc.is_tagged and doc.is_parsed:
        noun_chunks = [
            {"start": chunk.start, "end": chunk.end} for chunk in doc.noun_chunks
        ]
    else:
        noun_chunks = []
    # Token-level attributes; add custom extensions here as needed
    tokens = [
        {
            "text": token.text,
            "text_with_ws": token.text_with_ws,
            "whitespace": token.whitespace_,
            "orth": token.orth,
            "i": token.i,
            "ent_type": token.ent_type_,
            "ent_iob": token.ent_iob_,
            "lemma": token.lemma_,
            "norm": token.norm_,
            "lower": token.lower_,
            "shape": token.shape_,
            "prefix": token.prefix_,
            "suffix": token.suffix_,
            "pos": token.pos_,
            "tag": token.tag_,
            "dep": token.dep_,
            "is_alpha": token.is_alpha,
            "is_ascii": token.is_ascii,
            "is_digit": token.is_digit,
            "is_lower": token.is_lower,
            "is_upper": token.is_upper,
            "is_title": token.is_title,
            "is_punct": token.is_punct,
            "is_left_punct": token.is_left_punct,
            "is_right_punct": token.is_right_punct,
            "is_space": token.is_space,
            "is_bracket": token.is_bracket,
            "is_currency": token.is_currency,
            "like_url": token.like_url,
            "like_num": token.like_num,
            "like_email": token.like_email,
            "is_oov": token.is_oov,
            "is_stop": token.is_stop,
            "is_sent_start": token.is_sent_start,
            "head": token.head.i,
        }
        for token in doc
    ]
    return {
        "model": model,
        "doc": json_doc,
        "ents": ents,
        "sents": sents,
        "noun_chunks": noun_chunks,
        "tokens": tokens,
    }
Since I ran into the same issue and the only other answer didn't really help me, I thought I might as well give other people looking into this some hints.
In spaCy 2.1, print_tree was removed and to_json was added. to_json does not return custom extensions because "this method will output the same format as the JSON training data expected by spacy train" (https://spacy.io/usage/v2-1).
If you want to output your custom extension you need to write your own to_json function.
To do this I recommend extending the to_json() given by spacy.
I'm not really a fan of the other two answers here, since they seem a bit overkill (extending the Doc object, by @Chooklii, or the custom but flaky doc2json method, by @Laksh), so I'll just drop in what I did for one of my projects; maybe it is useful to someone.
doc = <YOUR_DOC_OBJECT>
extra_fields = [field for field in dir(doc._) if field not in ('get', 'set', 'has')]
doc_json = doc.to_json()
doc_json.update({field: doc._.get(field) for field in extra_fields})
The doc_json should now have all the fields that you set via the Extensions interface provided by spaCy along with the fields set by other spaCy pipelines.

Is this JSON data parsed into Python dict correctly?

Cannot extract components of data parsed from JSON to Python dictionary.
I attempted to print the value corresponding with a dictionary entry but get an error.
import urllib, json, requests
url = "https://storage.googleapis.com/osbuddy-exchange/summary.json"
response = urllib.urlopen(url)
data = json.loads(response.read())
print type(data)
for key, value in data.iteritems():
    print value
    print ''
print "data['entry']: ", data['99']
print "name: ", data['name']
I was hoping I could get attributes of an entry. Say the 'buy_average' given a specific key. Instead I get an error when referencing specific components.
<type 'dict'>
22467 {u'sell_average': 3001, u'buy_average': 0, u'name': u'Bastion potion(2)', u'overall_average': 3001, u'sp': 180, u'overall_quantity': 2, u'members': True, u'sell_quantity': 2, u'buy_quantity': 0, u'id': 22467}
22464 {u'sell_average': 4014, u'buy_average': 0, u'name': u'Bastion potion(3)', u'overall_average': 4014, u'sp': 270, u'overall_quantity': 612, u'members': True, u'sell_quantity': 612, u'buy_quantity': 0, u'id': 22464}
5745 {u'sell_average': 0, u'buy_average': 0, u'name': u'Dragon bitter(m)', u'overall_average': 0, u'sp': 2, u'overall_quantity': 0, u'members': True, u'sell_quantity': 0, u'buy_quantity': 0, u'id': 5745}
...
data['entry']: {u'sell_average': 7843, u'buy_average': 7845, u'name': u'Ranarr potion (unf)', u'overall_average': 7844, u'sp': 25, u'overall_quantity': 23838, u'members': True, u'sell_quantity': 15090, u'buy_quantity': 8748, u'id': 99}
name:
Traceback (most recent call last):
File "C:/Users/Michael/PycharmProjects/osrsGE/osrsGE.py", line 16, in <module>
print "name: ", data['name']
KeyError: 'name'
Process finished with exit code 1
There is no key named 'name' in the dict named 'data'.
The first-level keys are item ids stored as strings, such as "6", "2", "8", etc.
Each second-level object has a key named 'name', so code like:
print(data['2']['name']) # Cannonball
should work.
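To make the two-level structure concrete, here is a small sketch using a hand-written excerpt instead of the live URL. The two records are abbreviated from values quoted in this thread; by_name is an illustrative helper index, not part of the feed.

```python
# Abbreviated excerpt shaped like summary.json: item id (as a string) -> item record
data = {
    "2": {"id": 2, "name": "Cannonball"},
    "99": {
        "id": 99,
        "name": "Ranarr potion (unf)",
        "buy_average": 7845,
        "sell_average": 7843,
    },
}

# Attributes live on the inner record, so index by item id first
ranarr_buy = data["99"]["buy_average"]

# Build a name -> record index to look an item up by name instead of id
by_name = {item["name"]: item for item in data.values()}
cannonball_id = by_name["Cannonball"]["id"]

# .get() returns None instead of raising KeyError when an attribute is absent
cannonball_buy = data["2"].get("buy_average")

print(ranarr_buy, cannonball_id, cannonball_buy)
```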

Get JSON's attribute value in Chatterbot and Django integration

statement.text in chatterbot and Django integration returns
{'text': u'How are you doing?', 'created_at': datetime.datetime(2017, 2, 20, 7, 37, 30, 746345, tzinfo=<UTC>), 'extra_data': {}, 'in_response_to': [{'text': u'Hi', 'occurrence': 3}]}
I want a value of text attribute so that it prints How are you doing?
ChatterBot returns a dictionary here, so you can use ordinary dictionary operations like the following:
[1]: data = {'text': u'How are you doing?', 'created_at': datetime.datetime(2017, 2, 20, 7, 37, 30, 746345, tzinfo=<UTC>), 'extra_data': {}, 'in_response_to': [{'text': u'Hi', 'occurrence': 3}]}
[2]: data['text'] or data.get('text') (the second approach is safer).
What you got is a dictionary. A value can be obtained with the get() method. You can also use dict['text'], but that raises a KeyError if the key is missing, whereas get() returns None.
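A short sketch of the difference between the two access styles, using a stand-in for the dictionary from the question. timezone.utc replaces the tzinfo=<UTC> repr, which is not valid Python, and the statement variable name is an assumption.

```python
from datetime import datetime, timezone

# Stand-in for the dictionary shown in the question
statement = {
    "text": "How are you doing?",
    "created_at": datetime(2017, 2, 20, 7, 37, 30, 746345, tzinfo=timezone.utc),
    "extra_data": {},
    "in_response_to": [{"text": "Hi", "occurrence": 3}],
}

# Direct indexing works when the key is known to exist
text = statement["text"]

# get() returns None, or a supplied default, when the key is missing
missing = statement.get("no_such_key")
fallback = statement.get("no_such_key", "n/a")

# Nested values are reached by chaining the lookups
first_prompt = statement["in_response_to"][0]["text"]

# Direct indexing of a missing key raises KeyError
try:
    statement["no_such_key"]
except KeyError:
    raised = True
else:
    raised = False

print(text)
```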