Format JSON file in R: lexical error with character encoding - json

I have data that I retrieved from a server in JSON format. I now want to pre-process these data in R.
My raw .json file (if opened in a text editor) looks like this:
{"id": 1,"data": "{\"unid\":\"wU6993\",\"age\":\"21\",\"origin\":\"Netherlands\",\"biling\":\"2\",\"langs\":\"Dutch\",\"selfrating\":\"80\",\"selfarrest\":\"20\",\"condition\":1,\"fly\":\"2\",\"flytime\":0,\"purpose\":\"na\",\"destin\":\"Madrid\",\"txtQ1\":\"I\'m flying to Madrid to catch up with friends.\"}"}
I want to parse it back for further use to its intended format:
`{
"id": 1,
"data": {
"unid": "wU6993",
"age": "21",
"origin": "Netherlands",
"biling": "2",
"langs": "Dutch",
"selfrating": "80",
"selfarrest": "20",
"condition": 1,
"fly": "2",
"flytime": 0,
"purpose": "na",
"destin": "Madrid",
"txtQ1": "I'm flying to Madrid to catch up with friends."
}
}`
Using jsonlite I can't read it in at all:
parsed = jsonlite::fromJSON(txt = 'exp1.json')
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
lexical error: inside a string, '\' occurs before a character which it may not.
in\":\"Madrid\",\"txtQ1\":\"I\'m flying to Madrid to catch u
(right here) ------^
I think the error tells me that some characters are escaped that should have been.
How can I solve this and read my file?

You have extra quotes around the nested braces defining "data", the value of which is actually stored as one huge string instead of valid JSON. Take them out, and
my_json <- '{"id": 1,"data": "{\"unid\":\"wU6993\",\"age\":\"21\",\"origin\":\"Netherlands\",\"biling\":\"2\",\"langs\":\"Dutch\",\"selfrating\":\"80\",\"selfarrest\":\"20\",\"condition\":1,\"fly\":\"2\",\"flytime\":0,\"purpose\":\"na\",\"destin\":\"Madrid\",\"txtQ1\":\"I\'m flying to Madrid to catch up with friends.\"}"}'
my_json <- sub('"\\{', '\\{', my_json)
my_json <- sub('\\}"', '\\}', my_json)
jsonlite::prettify(my_json)
# {
# "id": 1,
# "data": {
# "unid": "wU6993",
# "age": "21",
# "origin": "Netherlands",
# "biling": "2",
# "langs": "Dutch",
# "selfrating": "80",
# "selfarrest": "20",
# "condition": 1,
# "fly": "2",
# "flytime": 0,
# "purpose": "na",
# "destin": "Madrid",
# "txtQ1": "I'm flying to Madrid to catch up with friends."
# }
# }

Related

json file into lua table

How is it possible to get the input of an json file:
{ "name": "John",
"work": "chef",
"age": "29",
"messages": [
{
"msg_name": "Hello",
"msg": "how_are_you"
},
{ "second_msg_name": "hi",
"msg": "fine"
}
]
}
into a Lua table? All json.lua scripts I found did not work for JSON with new lines. Does someone know a solution?
So piglets solution works for the string in the same script.
But how does it work with a JSON file?
local json = require("dkjson")
local file = io.open("C:\\Users\\...\\Documents\\Lua_Plugins\\test_file_reader\\test.json", "r")
local myTable = json.decode(file)
print(myTable)
Here then comes the error "bad argument #1 to 'strfind' (string expected, got FILE*)" on. Does someone see my fault?
local json = require("dkjson")
local yourString = [[{ "name": "John",
"work": "chef",
"age": "29",
"messages": [
{
"msg_name": "Hello",
"msg": "how_are_you"
},
{ "second_msg_name": "hi",
"msg": "fine"
}
]
}]]
local myTable = json.decode(yourString)
http://dkolf.de/src/dkjson-lua.fsl/home
I found the solution:
local json = require("dkjson")
local file = io.open("C:\\Users\\...\\Documents\\Lua_Plugins\\test_file_reader\\test.json", "r")
local content = file:read "*a"
file:close()
local myTable = json.decode(content)

Parsing and cleaning text file in Python?

I have a text file which contains raw data. I want to parse that data and clean it so that it can be used further.The following is the rawdata.
"{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x221a0d4d6e-0c00-11e7-a16f-0242ac110002\x22,\x0A \x22device_id\x22: \x22423e49efa4b8b013\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-03-22T03:21:11+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x22423e49efa4b8b013\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
I want to remove all the hexadecimal characters,I tried parsing the data and storing in an array and cleaning it using re.sub() but it gives the same data.
for line in f:
new_data = re.sub(r'[^\x00-\x7f],\x22',r'', line)
data.append(new_data)
\x0A is the hex code for newline. After s = <your json string>, print(s) gives
>>> print(s)
{
"identifier": {
"company_code": "TSC",
"product_type": "airtime-ctg",
"host_type": "android"
},
"id": {
"type": "guest",
"group": "guest",
"uuid": "1a0d4d6e-0c00-11e7-a16f-0242ac110002",
"device_id": "423e49efa4b8b013"
},
"stats": [
{
"timestamp": "2017-03-22T03:21:11+0000",
"software_id": "A-ACTG",
"action_id": "open_app",
"values": {
"device_id": "423e49efa4b8b013",
"language": "en"
}
}
]
}
You should parse this with the json module load (from file) or loads (from string) functions. You will get a dict with 2 dicts and a list with a dict.

Save series of dictionaries to json file and retrieve it later

So I am doing some work in python where I have to generate a series of dictionaries. I want to write each of these dictionaries to a single file.
The code to write the dictionaries look like this
with open('some_name.json', 'w') as fh:
data = function_generate_dict() # returns a dictionary
json.dump(data, fh)
That works fine and I can view the outputted file and can even load its content like thus
with open('some_name.json', 'r+') as rh:
for line in rh.readlines():
print(line)
But when I try to reload each dictionary from the file by doing this
with open('some_name', 'r') as rh:
cont = rh.read()
js =json.loads(cont)
I always get a JSONDecodeError: Extra data: line 1 column 220 (char 219)
which I suspect is coming from where one dictionary ends and another begins.
If I do this (json.load() instead of json.loads())
with open('some_name', 'r') as rh:
cont = rh.read()
js =json.load(cont)
I get this error AttributeError: 'str' object has no attribute 'read'
I have even tried using jsonl as the file format. But it doesn't work.
Here is a sample of the dictionaries I'm generating
{"measure_no": "0", "divisions": "256", "fifths": "5", "mode": "major", "beats": "4", "beat-type": "4", "transpose": "-9", "step": ["G"], "alter": ["1"], "octave": ["6"], "duration": ["256"], "syllabic": [], "text": []}{"measure_no": "1", "divisions": "256", "fifths": "5", "mode": "major", "beats": "4", "beat-type": "4", "transpose": "-9", "step": ["G", "G", "G", "G"], "alter": ["1", "1", "1", "1"], "octave": ["6", "6", "6", "6"], "duration": ["384", "128", "256", "256"], "syllabic": [], "text": []}{"measure_no": "2", "divisions": "256", "fifths": "5", "mode": "major", "beats": "4", "beat-type": "4", "transpose": "-9", "step": ["C", "G", "G"], "alter": ["1", "1", "1"], "octave": ["7", "6", "6"], "duration": ["384", "128", "512"], "syllabic": [], "text": []}
Your code dumps your dictionary to the end of the existing line:
This is 1: hard to read for human beings.
2: hard to read for the "readlines" function.
Add a new line after you dump your dictionnary.
with open('some_name.json', 'a') as fh:
''' note the use of 'a' instead of 'w', you want to append your
dictionnaries, not overwrite them every time '''
data = function_generate_dict() # returns a dictionary
json.dump(data, fh)
fh.write('\n')
Then when you want to read: from the file to a dictionary : you can do it with a loop which reads every line as a different json dictionary:
The way I would see it is make an empty dictionnary first to store every line.
jsonlist = []
'''make a list to store the json dictionaries'''
with open('some_name','r') as rh:
for line in rh.readlines():
jsonlist.append(json.loads(line))
You now have variable jsonlist for which each index is one of your json dictionaries, all that is left for you is to manipulate those indexes.
>>> jsonlist[0]
{'measure_no': '0', 'divisions': '256', 'fifths': '5', 'mode': 'major', 'beats': '4', 'beat-type': '4', 'transpose': '-9', 'step': ['G'], 'alter': ['1'], 'octave': ['6'], 'duration': ['256'], 'syllabic': [], 'text': []}

Python pandas convert CSV upto level "n" nested JSON?

I want to convert the csv file into nested json upto n level. I am using below code to get the desired output from this link. But I am getting an error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-109-89d84e9a61bf> in <module>()
40 # make a list of keys
41 keys_list = []
---> 42 for item in d['children']:
43 keys_list.append(item['name'])
44
TypeError: 'datetime.date' object has no attribute '__getitem__'
Below is the code:
# CSV 2 flare.json
# convert a csv file to flare.json for use with many D3.js viz's
# This script creates outputs a flare.json file with 2 levels of nesting.
# For additional nested layers, add them in lines 32 - 47
# sample: http://bl.ocks.org/mbostock/1283663
# author: Andrew Heekin
# MIT License
import pandas as pd
import json
df = pd.read_csv('file.csv')
# choose columns to keep, in the desired nested json hierarchical order
df = df[['Group','year','quarter']]
# order in the groupby here matters, it determines the json nesting
# the groupby call makes a pandas series by grouping 'the_parent' and 'the_child', while summing the numerical column 'child_size'
df1 = df.groupby(['Group','year','quarter'])['quarter'].count()
df1 = df1.reset_index(name = "count")
#print df1.head()
# start a new flare.json document
flare = dict()
flare = {"name":"flare", "children": []}
#df1['year'] = [str(yr) for yr in df1['year']]
for line in df1.values:
the_parent = line[0]
the_child = line[1]
child_size = line[2]
# make a list of keys
keys_list = []
for item in d['children']:
keys_list.append(item['name'])
# if 'the_parent' is NOT a key in the flare.json yet, append it
if not the_parent in keys_list:
d['children'].append({"name":the_parent, "children":[{"name":the_child, "size":child_size}]})
# if 'the_parent' IS a key in the flare.json, add a new child to it
else:
d['children'][keys_list.index(the_parent)]['children'].append({"name":the_child, "size":child_size})
flare = d
# export the final result to a json file
with open('flare.json', 'w') as outfile:
json.dump(flare, outfile)
Expected Output in below format:
{
"name": "stock",
"children": [
{"name": "fruits",
"children": [
{"name": "berries",
"children": [
{"count": 20, "name": "blueberry"},
{"count": 70, "name": "cranberry"},
{"count": 96, "name": "raspberry"},
{"count": 140, "name": "strawberry"}]
},
{"name": "citrus",
"children": [
{"count": 20, "name": "grapefruit"},
{"count": 120, "name": "lemon"},
{"count": 50, "name": "orange"}]
},
{"name": "dried fruit",
"children": [
{"count": 25, "name": "dates"},
{"count": 10, "name": "raisins"}]
}]
},
{"name": "vegtables",
"children": [
{"name": "green leaf",
"children": [
{"count": 19, "name": "cress"},
{"count": 18, "name": "spinach"}]
},
{
"name": "legumes",
"children": [
{"count": 27, "name": "beans"},
{"count": 12, "name": "chickpea"}]
}]
}]
}
Could any one please help how to resolve this error.
Thanks

Unable to loop through JSON output from webservice Python

I have a web-service call (HTTP Get) that my Python script makes in which returns a JSON response. The response looks to be a list of Dictionaries. The script's purpose is to iterate through the each dictionary, extract each piece of metadata (i.e. "ClosePrice": "57.74",) and write each dictionary to its own row in Mssql.
The issue is, I don't think Python is recognizing the JSON output from the API call as a list of dictionaries, and when I try a for loop, I'm getting the error must be int not str. I have tried converting the output to a list, dictionary, tuple. I've also tried to make it work with List Comprehension, with no luck. Further, if I copy/paste the data from the API call and assign it to a variable, it recognizes that its a list of dictionaries without issue. Any help would be appreciated. I'm using Python 2.7.
Here is the actual http call being made: http://test.kingegi.com/Api/QuerySystem/GetvalidatedForecasts?user=kingegi&market=us&startdate=08/19/13&enddate=09/12/13
Here is an abbreviated JSON output from the API call:
[
{
"Id": "521d992cb031e30afcb45c6c",
"User": "kingegi",
"Symbol": "psx",
"Company": "phillips 66",
"MarketCap": "34.89B",
"MCapCategory": "large",
"Sector": "basic materials",
"Movement": "up",
"TimeOfDay": "close",
"PredictionDate": "2013-08-29T00:00:00Z",
"Percentage": ".2-.9%",
"Latency": 37.48089483333333,
"PickPosition": 2,
"CurrentPrice": "57.10",
"ClosePrice": "57.74",
"HighPrice": null,
"LowPrice": null,
"Correct": "FALSE",
"GainedPercentage": 0,
"TimeStamp": "2013-08-28T02:31:08 778",
"ResponseMsg": "",
"Exchange": "NYSE "
},
{
"Id": "521d992db031e30afcb45c71",
"User": "kingegi",
"Symbol": "psx",
"Company": "phillips 66",
"MarketCap": "34.89B",
"MCapCategory": "large",
"Sector": "basic materials",
"Movement": "down",
"TimeOfDay": "close",
"PredictionDate": "2013-08-29T00:00:00Z",
"Percentage": "16-30%",
"Latency": 37.4807215,
"PickPosition": 1,
"CurrentPrice": "57.10",
"ClosePrice": "57.74",
"HighPrice": null,
"LowPrice": null,
"Correct": "FALSE",
"GainedPercentage": 0,
"TimeStamp": "2013-08-28T02:31:09 402",
"ResponseMsg": "",
"Exchange": "NYSE "
}
]
Small Part of code being used:
import os,sys
import subprocess
import glob
from os import path
import urllib2
import json
import time
try:
data = urllib2.urlopen('http://api.kingegi.com/Api/QuerySystem/GetvalidatedForecasts?user=kingegi&market=us&startdate=08/10/13&enddate=09/12/13').read()
except urllib2.HTTPError, e:
print "HTTP error: %d" % e.code
except urllib2.URLError, e:
print "Network error: %s" % e.reason.args[1]
list_id=[x['Id'] for x in data] #test to see if it extracts the ID from each Dict
print(data) #Json output
print(len(data)) #should retrieve the number of dict in list
UPDATE
Answered my own question, here is the method below:
`url = 'some url that is a list of dictionaries' #GetCall
u = urllib.urlopen(url) # u is a file-like object
data = u.read()
newdata = json.loads(data)
print(type(newdata)) # printed data type will show as a list
print(len(newdata)) #the length of the list
newdict = newdata[1] # each element in the list is a dict
print(type(newdict)) # this element is a dict
length = len(newdata) # how many elements in the list
for a in range(1,length): #a is a variable that increments itself from 1 until a number
var = (newdata[a])
print(var['Correct'], var['User'])`