I am doing some work in Python where I have to generate a series of dictionaries, and I want to write all of them to a single file.
The code that writes the dictionaries looks like this:
with open('some_name.json', 'w') as fh:
    data = function_generate_dict()  # returns a dictionary
    json.dump(data, fh)
That works fine: I can view the output file and can even load its content like this:
with open('some_name.json', 'r+') as rh:
    for line in rh.readlines():
        print(line)
But when I try to reload each dictionary from the file by doing this:
with open('some_name', 'r') as rh:
    cont = rh.read()
    js = json.loads(cont)
I always get a JSONDecodeError: Extra data: line 1 column 220 (char 219)
which I suspect is coming from where one dictionary ends and another begins.
If I do this (json.load() instead of json.loads()):
with open('some_name', 'r') as rh:
    cont = rh.read()
    js = json.load(cont)
I get this error AttributeError: 'str' object has no attribute 'read'
I have even tried using jsonl as the file format. But it doesn't work.
Here is a sample of the dictionaries I'm generating
{"measure_no": "0", "divisions": "256", "fifths": "5", "mode": "major", "beats": "4", "beat-type": "4", "transpose": "-9", "step": ["G"], "alter": ["1"], "octave": ["6"], "duration": ["256"], "syllabic": [], "text": []}{"measure_no": "1", "divisions": "256", "fifths": "5", "mode": "major", "beats": "4", "beat-type": "4", "transpose": "-9", "step": ["G", "G", "G", "G"], "alter": ["1", "1", "1", "1"], "octave": ["6", "6", "6", "6"], "duration": ["384", "128", "256", "256"], "syllabic": [], "text": []}{"measure_no": "2", "divisions": "256", "fifths": "5", "mode": "major", "beats": "4", "beat-type": "4", "transpose": "-9", "step": ["C", "G", "G"], "alter": ["1", "1", "1"], "octave": ["7", "6", "6"], "duration": ["384", "128", "512"], "syllabic": [], "text": []}
Your code dumps each dictionary onto the end of the existing line, so the file ends up as one long line. This is:
1: hard for human beings to read.
2: hard for the "readlines" function to split.
Add a newline after you dump each dictionary.
with open('some_name.json', 'a') as fh:
    # Note the use of 'a' instead of 'w': you want to append your
    # dictionaries, not overwrite them every time.
    data = function_generate_dict()  # returns a dictionary
    json.dump(data, fh)
    fh.write('\n')
Then, when you want to read the file back into dictionaries, you can use a loop that parses every line as a separate JSON dictionary.
The way I would do it is to make an empty list first to store the result of each line.
import json

jsonlist = []  # list to store the parsed JSON dictionaries
with open('some_name.json', 'r') as rh:
    for line in rh:
        jsonlist.append(json.loads(line))
You now have a variable jsonlist in which each element is one of your JSON dictionaries; all that is left is to work with those elements by index.
>>> jsonlist[0]
{'measure_no': '0', 'divisions': '256', 'fifths': '5', 'mode': 'major', 'beats': '4', 'beat-type': '4', 'transpose': '-9', 'step': ['G'], 'alter': ['1'], 'octave': ['6'], 'duration': ['256'], 'syllabic': [], 'text': []}
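If you already have a file where the dictionaries were written back to back with no newline between them (which is what triggers the "Extra data" error), one possible way to read it is json.JSONDecoder.raw_decode, which parses a single value and reports the index where it stopped. This is only a minimal sketch: the helper name iter_concatenated_json and the shortened sample string are made up for illustration and are not part of the original code.

```python
import json


def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse one value and get the index just past its end.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical, shortened stand-in for the file content shown in the question.
sample = '{"measure_no": "0", "step": ["G"]}{"measure_no": "1", "step": ["G", "G"]}\n'
measures = list(iter_concatenated_json(sample))
print(len(measures))
print(measures[1]["step"])
```

For new files, writing one dictionary per line as described above is still the simpler option, since each line can then be parsed on its own.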
Related
I am trying to convert JSON data to CSV. I am getting the values, but each block shows up as a single line in the result. I am new to Python, so any help is appreciated. I have tried the code below.
import pandas as pd
with open(r'C:\Users\anath\hard.json', encoding='utf-8') as inputfile:
    df = pd.read_json(inputfile)
df.to_csv(r'C:\Users\anath\csvfile.csv', encoding='utf-8', index=True)
Sample JSON from the source file (short snippet):
{
"issues": [
{
"issueId": 110052,
"revision": 84,
"definitionId": "DNS1012",
"subject": "urn:h:domain:fitestdea.com",
"subjectDomain": "fitestdea.com",
"title": "Nameserver name doesn\u0027t resolve to an IPv6 address",
"category": "DNS",
"severity": "low",
"cause": "urn:h:domain:ns1.gname.net",
"causeDomain": "ns1.gname.net",
"open": true,
"status": "active",
"auto": true,
"autoOpen": true,
"createdOn": "2022-09-01T02:29:09.681451Z",
"lastUpdated": "2022-11-23T02:26:28.785601Z",
"lastChecked": "2022-11-23T02:26:28.785601Z",
"lastConfirmed": "2022-11-23T02:26:28.785601Z",
"details": "{}"
},
{
"issueId": 77881,
"revision": 106,
"definitionId": "DNS2001",
"subject": "urn:h:domain:origin-mx.stagetest.test.com.test.com",
"subjectDomain": "origin-mx.stagetest.test.com.test.com",
"title": "Dangling domain alias (CNAME)",
"category": "DNS",
"severity": "high",
"cause": "urn:h:domain:origin-www.stagetest.test.com.test.com",
"causeDomain": "origin-www.stagetest.test.com.test.com",
"open": true,
"status": "active",
"auto": true,
"autoOpen": true,
"createdOn": "2022-08-10T09:34:36.929071Z",
"lastUpdated": "2022-11-23T09:33:32.553663Z",
"lastChecked": "2022-11-23T09:33:32.553663Z",
"lastConfirmed": "2022-11-23T09:33:32.553663Z",
"details": "{\"#type\": \"hardenize/com.hardenize.schemas.dns.DanglingProblem\", \"rrType\": \"CNAME\", \"rrDomain\": \"origin-mx.stagetest.test.com.test.com\", \"causeDomain\": \"origin-www.stagetest.test.com.test.com\", \"danglingType\": \"nxdomain\", \"rrEffectiveDomain\": \"origin-mx.stagetest.test.com.test.com\"}"
}
]
}
In the output I am getting, the entire record ends up in one cell. I was looking for a way to get the field names in the header and the values in separate columns. Is there also a way to get only specific fields, such as title, severity, or issueId, rather than everything?
Try:
import json
import pandas as pd
with open("your_file.json", "r") as f_in:
    data = json.load(f_in)
df = pd.DataFrame(data["issues"])
print(df[["title", "severity", "issueId"]])
Prints:
                                                title  severity  issueId
0  Nameserver name doesn't resolve to an IPv6 address       low   110052
1                       Dangling domain alias (CNAME)      high    77881
To save as CSV you can do:
df[["title", "severity", "issueId"]].to_csv('data.csv', index=False)
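Try this:
df = pd.json_normalize(json.load(inputfile), record_path="issues")
in place of the pd.read_json line you have. This needs import json, because json_normalize expects parsed JSON (a dict or a list of dicts) rather than a file object.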
Finally, this worked for me. Thanks to @Andrej Kesely for the input; sharing it as it might help others.
import pandas as pd
import json
with open(r'C:\Users\anath\hard.json', encoding='utf-8') as inputfile:
    data = json.load(inputfile)
df = pd.DataFrame(data["issues"])
print(df[["title", "severity", "issueId"]])
df[["title", "severity", "issueId"]].to_csv('data.csv', index=False)
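For anyone who does not want to pull in pandas just for this, the same field selection can probably be done with the standard library alone. The following is a sketch under that assumption, using csv.DictWriter with extrasaction="ignore" to drop the fields that were not asked for. The inline sample string is a trimmed, made-up version of the JSON above, and it writes to an in-memory buffer instead of a real file so it can be run as-is.

```python
import csv
import io
import json

# Trimmed, illustrative version of the "issues" JSON from the question.
sample = """
{"issues": [
  {"issueId": 110052, "title": "Nameserver name doesn't resolve to an IPv6 address",
   "severity": "low", "category": "DNS"},
  {"issueId": 77881, "title": "Dangling domain alias (CNAME)",
   "severity": "high", "category": "DNS"}
]}
"""

fields = ["title", "severity", "issueId"]  # the only columns we want in the CSV
data = json.loads(sample)

buffer = io.StringIO()  # swap for open('data.csv', 'w', newline='') to write a file
writer = csv.DictWriter(
    buffer,
    fieldnames=fields,
    extrasaction="ignore",  # silently skip keys that are not in `fields`
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(data["issues"])

csv_text = buffer.getvalue()
print(csv_text)
```

Each dictionary under "issues" becomes one row, and the header row comes from the fields list.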
Say I have a JSON entry as follows (the JSON file was generated by fetching data from a Firebase DB):
[{"goal_savings": 0.0, "social_id": "", "score": 0, "country": "BR", "photo": "http://graph.facebook", "id": "", "plates": 3, "rcu": null, "name": "", "email": ".", "provider": "facebook", "phone": "", "savings": [], "privacyPolicyAccepted": true, "currentRole": "RoleType.PERSONAL", "empty_lives_date": null, "userId": "", "authentication_token": "-------", "onboard_status": "ONBOARDING_WIZARD", "fcmToken": ----------", "level": 1, "dni": "", "social_token": "", "lives": 10, "bills": [{"date": "2020-12-10", "role": "RoleType.PERSONAL", "name": "Supermercado", "category": "feeding", "periodicity": "PeriodicityType.NONE", "value": 100.0"}], "payments": [], "goals": [], "goalTransactions": [], "incomes": [], "achievements": [{"created_at":", "name": ""}]}]
How do I extract the content corresponding to 'value', which is inside the 'bills' column? Is there any way to do this?
My Python code is as follows. With it I was only able to get the data within the bills column, but I need only the entry corresponding to 'value' inside bills.
import json
filedata = open('firebase-dataset.json','r')
data = json.load(filedata)
listoffields = [] # To produce it into a list with fields
for dic in data:
    try:
        listoffields.append(dic['bills']) # only non-essential bill categories.
    except KeyError:
        pass
print(listoffields)
The JSON you posted contains misplaced quotes.
I think you are trying to extract the 'value' entry of each item in bills.
Try this:
print(listoffields[0][0]['value'])
This prints 100.0. If the value is stored as a string, use float() before using it in calculations.
---edit---
Say your JSON contains many JSON objects separated by commas, like
[{ first-entry },{ second-entry },{ third.. }, ....and so on]
and you want to find the value of the bill in each JSON object.
The code below may work:
bill_value_list = []  # to store the 'value' of each bill
for bill_list in listoffields:
    if bill_list:  # skip entries whose bills list is empty
        bill_value_list.append(float(bill_list[0]['value']))  # bill_list[0] is the first bill dictionary
print(bill_value_list)
print(sum(bill_value_list))  # do something useful
Paste it after the code you posted (no changes to your code, since it already works :-) ).
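If an entry can hold more than one bill, or no 'bills' key at all, a nested comprehension may be a tidier way to collect every value. This is a sketch only: the records list below is invented sample data shaped like the Firebase export, and the second bill ("Rent") is an assumption added to show the multi-bill case.

```python
# Invented sample data shaped like the Firebase export in the question.
records = [
    {"name": "a", "bills": [{"name": "Supermercado", "value": 100.0},
                            {"name": "Rent", "value": 250.5}]},
    {"name": "b", "bills": []},   # entry with no bills
    {"name": "c"},                # entry with no 'bills' key at all
]

# Walk every bill of every record; .get() avoids a KeyError when 'bills' is missing.
bill_values = [
    float(bill["value"])
    for record in records
    for bill in record.get("bills", [])
    if "value" in bill
]

print(bill_values)
print(sum(bill_values))
```

Unlike indexing with bill_list[0], this picks up every bill in the list rather than only the first one.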
I have a text file which contains raw data. I want to parse that data and clean it so that it can be used further. The following is the raw data.
"{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x221a0d4d6e-0c00-11e7-a16f-0242ac110002\x22,\x0A \x22device_id\x22: \x22423e49efa4b8b013\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-03-22T03:21:11+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x22423e49efa4b8b013\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
I want to remove all the hexadecimal characters. I tried parsing the data, storing it in an array, and cleaning it using re.sub(), but it gives the same data.
for line in f:
    new_data = re.sub(r'[^\x00-\x7f],\x22',r'', line)
    data.append(new_data)
\x0A is the hex code for newline. After s = <your json string>, print(s) gives
>>> print(s)
{
"identifier": {
"company_code": "TSC",
"product_type": "airtime-ctg",
"host_type": "android"
},
"id": {
"type": "guest",
"group": "guest",
"uuid": "1a0d4d6e-0c00-11e7-a16f-0242ac110002",
"device_id": "423e49efa4b8b013"
},
"stats": [
{
"timestamp": "2017-03-22T03:21:11+0000",
"software_id": "A-ACTG",
"action_id": "open_app",
"values": {
"device_id": "423e49efa4b8b013",
"language": "en"
}
}
]
}
You should parse this with the json module's load (from a file) or loads (from a string) function. You will get a dict containing two dicts and a list that holds one dict.
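If the text file really contains the literal characters backslash, x, and two hex digits (rather than Python interpreting them in a string literal), they have to be decoded before json.loads will accept the text. Here is a minimal sketch under that assumption. The raw_line value is a shortened, made-up version of the data above, and decode_hex_escapes is a hypothetical helper name.

```python
import json
import re

# Shortened stand-in for one line of the file; the r prefix keeps the
# backslashes literal, as they would be when read from disk.
raw_line = r'"{\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22\x0A },\x0A \x22stats\x22: [\x0A {\x22action_id\x22: \x22open_app\x22}\x0A ]\x0A}"'


def decode_hex_escapes(text):
    """Replace each literal \\xNN sequence with the character it encodes."""
    return re.sub(r'\\x([0-9A-Fa-f]{2})', lambda m: chr(int(m.group(1), 16)), text)


# Decode the escapes, then drop the quotes wrapped around the whole document.
decoded = decode_hex_escapes(raw_line).strip().strip('"')
record = json.loads(decoded)

print(record["id"]["type"])
print(record["stats"][0]["action_id"])
```

After this, record is an ordinary dict, so there is nothing left to clean with re.sub.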
I'm using the Talend ETL tool to extract data from JSON files and store it in a MySQL database.
But I get an error while reading the very first JSON file. For reading JSON I'm using the tExtractJSONFields component.
I'm sure the configuration set up in Talend is right, so I believe there is some problem in the JSON file.
While extracting, the component shows an error like this:
Exception in component tExtractJSONFields_1
javax.xml.stream.XMLStreamException: java.io.IOException: Unexpected symbol: COMMA
at de.odysseus.staxon.base.AbstractXMLStreamReader.initialize(AbstractXMLStreamReader.java:218)
at de.odysseus.staxon.json.JsonXMLStreamReader.<init>(JsonXMLStreamReader.java:65)
at de.odysseus.staxon.json.JsonXMLInputFactory.createXMLStreamReader(JsonXMLInputFactory.java:148)
at de.odysseus.staxon.json.JsonXMLInputFactory.createXMLStreamReader(JsonXMLInputFactory.java:44)
at de.odysseus.staxon.base.AbstractXMLInputFactory.createXMLEventReader(AbstractXMLInputFactory.java:118)
I don't know how to deal with JSON, so can anyone tell me from this error where the problem in the JSON file could be?
Is some value passed as NULL, or is it something else?
Sample JSON
[
[, {
"tstamp": "123456",
"event": "tgegfght",
"is_duplicate": false,
"farm": "dyhetygdht",
"uid": "tutyvbrtyvtrvy",
"clientip": "52351365136",
"device_os_label": "MICROSOFT_WINDOWS_7",
"device_browser_label": "MOZILLA_FIREFOX",
"geo_country_code": "MA",
"geo_region_code": "55",
"geo_city_name_normalized": "agadir",
"referer": "www.abc.com",
"txn": "etvevv5r",
"txn_isnew": true,
"publisher_id": 126,
"adspot_id": 11179502,
"ad_spot": 5188,
"format_id": 1611,
"misc": {
"PUBLISHER_FOLDER": "retvrect",
"NO_PROMO": "rctrctrc",
"SECTION": "evtrevr",
"U_COMMON_ALLOW": "0",
"U_Auth": "0"
},
"handler": "uint"
}, , ]
Thanks in advance !!
You have extra empty commas in your sample JSON.
Your sample JSON should look like this:
[{
"tstamp": "123456",
"event": "tgegfght",
"is_duplicate": false,
"farm": "dyhetygdht",
"uid": "tutyvbrtyvtrvy",
"clientip": "52351365136",
"device_os_label": "MICROSOFT_WINDOWS_7",
"device_browser_label": "MOZILLA_FIREFOX",
"geo_country_code": "MA",
"geo_region_code": "55",
"geo_city_name_normalized": "agadir",
"referer": "www.abc.com",
"txn": "etvevv5r",
"txn_isnew": true,
"publisher_id": 126,
"adspot_id": 11179502,
"ad_spot": 5188,
"format_id": 1611,
"misc": {
"PUBLISHER_FOLDER": "retvrect",
"NO_PROMO": "rctrctrc",
"SECTION": "evtrevr",
"U_COMMON_ALLOW": "0",
"U_Auth": "0"
},
"handler": "uint"
}]
OR
[
{
"somethinghere": "its value"
},
{
"somethingelse": "its value"
}
]
Your sample JSON is not valid JSON because of the spurious extra commas on the second and last lines. JSON only allows commas BETWEEN elements of an array or members of an object, and empty elements are not allowed.
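One way to find this kind of problem before the file reaches Talend is to run it through any strict JSON parser and look at the position it complains about. The sketch below uses Python's json module purely as such a checker; first_json_error is a hypothetical helper name and the two strings are cut-down versions of the broken and corrected samples.

```python
import json


def first_json_error(text):
    """Return (line, column, message) for the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.msg
    return None


# Cut-down versions of the broken sample and of the corrected one.
bad = '[\n[, {"tstamp": "123456"}, , ]'
good = '[{"tstamp": "123456"}]'

print(first_json_error(bad))
print(first_json_error(good))
```

For the broken sample the reported line is the second one, where the stray "[," sits; the corrected sample returns None.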
I have data that I retrieved from a server in JSON format. I now want to pre-process these data in R.
My raw .json file (if opened in a text editor) looks like this:
{"id": 1,"data": "{\"unid\":\"wU6993\",\"age\":\"21\",\"origin\":\"Netherlands\",\"biling\":\"2\",\"langs\":\"Dutch\",\"selfrating\":\"80\",\"selfarrest\":\"20\",\"condition\":1,\"fly\":\"2\",\"flytime\":0,\"purpose\":\"na\",\"destin\":\"Madrid\",\"txtQ1\":\"I\'m flying to Madrid to catch up with friends.\"}"}
I want to parse it back for further use to its intended format:
{
  "id": 1,
  "data": {
    "unid": "wU6993",
    "age": "21",
    "origin": "Netherlands",
    "biling": "2",
    "langs": "Dutch",
    "selfrating": "80",
    "selfarrest": "20",
    "condition": 1,
    "fly": "2",
    "flytime": 0,
    "purpose": "na",
    "destin": "Madrid",
    "txtQ1": "I'm flying to Madrid to catch up with friends."
  }
}
Using jsonlite I can't read it in at all:
parsed = jsonlite::fromJSON(txt = 'exp1.json')
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
lexical error: inside a string, '\' occurs before a character which it may not.
in\":\"Madrid\",\"txtQ1\":\"I\'m flying to Madrid to catch u
(right here) ------^
I think the error tells me that some characters are escaped that should not have been.
How can I solve this and read my file?
You have extra quotes around the nested braces defining "data", so its value is actually stored as one huge string instead of as a nested JSON object. Take them out, and the result parses:
my_json <- '{"id": 1,"data": "{\"unid\":\"wU6993\",\"age\":\"21\",\"origin\":\"Netherlands\",\"biling\":\"2\",\"langs\":\"Dutch\",\"selfrating\":\"80\",\"selfarrest\":\"20\",\"condition\":1,\"fly\":\"2\",\"flytime\":0,\"purpose\":\"na\",\"destin\":\"Madrid\",\"txtQ1\":\"I\'m flying to Madrid to catch up with friends.\"}"}'
my_json <- sub('"\\{', '\\{', my_json)
my_json <- sub('\\}"', '\\}', my_json)
jsonlite::prettify(my_json)
# {
# "id": 1,
# "data": {
# "unid": "wU6993",
# "age": "21",
# "origin": "Netherlands",
# "biling": "2",
# "langs": "Dutch",
# "selfrating": "80",
# "selfarrest": "20",
# "condition": 1,
# "fly": "2",
# "flytime": 0,
# "purpose": "na",
# "destin": "Madrid",
# "txtQ1": "I'm flying to Madrid to catch up with friends."
# }
# }
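For comparison, the same idea can be written in Python without editing the quotes by hand: since "data" holds a JSON document encoded as a string, it can be decoded a second time once the invalid \' escape has been replaced. This is only a sketch in a different language from the R answer above, and the raw string is a shortened, made-up version of the file content.

```python
import json

# Shortened stand-in for the file content; the r prefix keeps the backslashes literal.
raw = r'''{"id": 1,"data": "{\"unid\":\"wU6993\",\"condition\":1,\"txtQ1\":\"I\'m flying to Madrid to catch up with friends.\"}"}'''

# \' is not a legal escape in JSON, so turn it into a plain apostrophe first.
cleaned = raw.replace("\\'", "'")

outer = json.loads(cleaned)               # "data" is still one long string here
outer["data"] = json.loads(outer["data"])  # second pass decodes the nested object

print(outer["data"]["unid"])
print(outer["data"]["txtQ1"])
```

Decoding twice keeps every other character in the file untouched, which may be safer than a regex if a text answer happens to contain braces next to quotes.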