UnicodeDecodeError from urllib2 output from webpage with no non-unicode characters - json

I am trying to read data from an API web page using urllib2 in Python 2.7. I am using the following lines to read the page:
url = 'https://api.edamam.com/api/nutrition-data?app_id=<my_app_id>&app_key=<my_app_key>&ingr=1cheeseburger'
json_obj = urllib2.urlopen(url)
data = json.load(json_obj)
These lines give me this error (the error is on the last line in the above code):
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 0: invalid start byte
I understand that this error means that there are non-unicode characters in json_obj, but I am not sure why this is the case, because the same URL opens in a browser and the first few lines on the web page look like the following:
{
"uri" : "http://www.edamam.com/ontologies/edamam.owl#recipe_2a58ff3e1fec41d79da72f0be446baaa
"calories" : 312,
"totalWeight" : 119.0,
"dietLabels" : [ "BALANCED" ],
"healthLabels" : [ "PEANUT_FREE", "TREE_NUT_FREE", "ALCOHOL_FREE" ],
"cautions" : [ ],
"totalNutrients" : {
"ENERC_KCAL" : {
"label" : "Energy",
"quantity" : 312.96999999999997,
"unit" : "kcal"
},
As you can see, there are no non-unicode characters on this webpage, so I don't really follow what is going on.

Related

Reading data from the JSON object

I have JSON data in a file json_format.py as follows:
{
"name" : "ramu",
"place" : "hyd",
"height" : 5.10,
"list" : [1,2,3,4,5,6],
"tuple" : (0,1,2),
"colors" : {"mng":"white","aft" : "blue","night":"red"},
"car" : "None",
"bike" : "True",
}
I'm reading the above with this code:
import json
from pprint import pprint
with open(r'C:/PythonPrograms\Json_example/json_format.py') as jobj:
    fp = jobj.readlines()
b = json.dumps(fp) # ---> I get string
print(type(b))
c = json.loads(b)
print(type(c)) # ---> List
pprint(c)
print(c[0])
pprint(c["name"])
Now, I would like to access the JSON object as c['name'] and the output should be ramu.
Since c is a list, I can't do so. How can I read my JSON data so that I can access it with keys?
Thanks in advance!
You're effectively doing c = json.loads(json.dumps(jobj.readlines())) when you just need:
c = json.load(jobj)
print(c["name"]) # ramu
Also, your JSON is malformed.
There are no tuples in JSON: "tuple" : (0,1,2),
Your last item should not end with a comma: "bike" : "True",
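As a minimal sketch of the two fixes above, the data can be rewritten with the tuple as an array and without the trailing comma, and then parsed into a dict. The data is inlined as a string here only to keep the example self-contained; with a file you would call json.load on the open file object instead.

```python
import json

# Corrected data: "tuple" is a JSON array and the last item has no trailing comma.
text = '''
{
  "name": "ramu",
  "place": "hyd",
  "height": 5.10,
  "list": [1, 2, 3, 4, 5, 6],
  "tuple": [0, 1, 2],
  "colors": {"mng": "white", "aft": "blue", "night": "red"},
  "car": "None",
  "bike": "True"
}
'''

c = json.loads(text)   # parses to a dict, so key access works
print(type(c))
print(c["name"])
print(c["colors"]["aft"])
```

Note that "car" and "bike" remain strings here; if they were meant to be a null and a boolean, JSON spells those null and true without quotes.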

Looking for a JSON example with all allowed combinations of structure at max depth 2 or 3

I've written a program that processes JSON objects. Now I want to verify whether I've missed something.
Is there a JSON example of all allowed JSON structure combinations? Something like this:
{
"key1" : "value",
"key2" : 1,
"key3" : {"key1" : "value"},
"key4" : [
[
"string1",
"string2"
],
[
1,
2
],
...
],
"key5" : true,
"key6" : false,
"key7" : null,
...
}
As you can see at http://json.org/ (on the right-hand side), the grammar of JSON isn't very difficult, but I've hit several exceptions because I forgot to handle some structure combinations that are possible. E.g. inside an array there can be "string, number, object, array, true, false, null", but my program couldn't handle arrays inside an array until I ran into an exception. So everything was fine until I got a valid JSON object with arrays inside an array.
I want to test my program with such a JSON object. After this test I want to feel certain that my program handles every possible valid JSON structure on earth without an exception.
I don't need nesting to depth 5 or so. I only need nesting to depth 2 or at most 3, with every base type nested inside every base type that can contain it.
Have you thought of escaped characters and objects within an object?
{
"key1" : {
"key1" : "value",
"key2" : [
"String1",
"String2"
]
},
"key2" : "\"This is a quote\"",
"key3" : "This contains an escaped backslash: \\",
"key4" : "This contains accented characters: \u00eb \u00ef"
}
Note: \u00eb and \u00ef are the characters ë and ï, respectively.
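A quick way to see how a parser treats these escapes is to load a sample like the one above and inspect the decoded strings. The sketch below uses Python's json module and a raw string so the backslashes reach the parser unchanged. Trailing commas are left out because strict parsers reject them.

```python
import json

# Raw string: the backslash escapes are interpreted by the JSON parser, not by Python.
sample = r'''
{
  "key1": {"key1": "value", "key2": ["String1", "String2"]},
  "key2": "\"This is a quote\"",
  "key3": "This contains an escaped backslash: \\",
  "key4": "This contains accented characters: \u00eb \u00ef"
}
'''

data = json.loads(sample)
print(data["key2"])   # the surrounding quotes are part of the value
print(data["key3"])   # ends with a single backslash
print(data["key4"])   # the \u escapes are decoded to the accented letters
```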
Choose a programming language that supports JSON.
Try to load your JSON; on failure, the exception's message is descriptive.
Example:
Python:
import json
import sys
# Raises ValueError with a descriptive message if the file is not valid JSON.
with open(sys.argv[1]) as f:
    json.load(f)
Generate:
import json
import random
import string
# Leaf generators: each returns one random value of a JSON scalar type.
def json_null(depth=0):
    return None
def json_int(depth=0):
    return random.randint(-999, 999)
def json_float(depth=0):
    return random.uniform(-999, 999)
def json_string(depth=0):
    return ''.join(random.sample(string.printable, random.randrange(10, 40)))
def json_bool(depth=0):
    return random.randint(0, 1) == 1
# Container generators: each child is generated at a strictly smaller depth.
def json_list(depth):
    lst = []
    if depth:
        for i in range(random.randrange(8)):
            lst.append(gen_json(random.randrange(depth)))
    return lst
def json_object(depth):
    obj = {}
    if depth:
        for i in range(random.randrange(8)):
            obj[json_string()] = gen_json(random.randrange(depth))
    return obj
def gen_json(depth=8):
    # Produce a container while depth remains, a scalar at depth 0.
    if depth:
        return random.choice([json_list, json_object])(depth)
    else:
        return random.choice([json_null, json_int, json_float, json_string, json_bool])(depth)
print(json.dumps(gen_json(), indent=2))
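A random generator can miss a combination on any given run. As a deterministic alternative, the sketch below builds a document in which every JSON value type appears inside both an array and an object, down to a chosen depth. The names LEAVES and all_values are made up for this illustration, and the sample covers type combinations only, not edge cases such as number formats or unusual string escapes.

```python
import json

# One sample value per JSON scalar type.
LEAVES = {"string": "text", "number": 1.5, "true": True, "false": False, "null": None}

def all_values(depth):
    """Return label -> sample value covering every JSON type, nested up to `depth`."""
    values = dict(LEAVES)
    if depth > 0:
        inner = all_values(depth - 1)
        values["array"] = list(inner.values())   # array holding every type
        values["object"] = dict(inner)           # object holding every type
    else:
        values["array"] = []                     # empty containers at the bottom
        values["object"] = {}
    return values

doc = all_values(2)
text = json.dumps(doc, indent=2)
assert json.loads(text) == doc   # round-trips through the serializer unchanged
print(text)
```

Feeding the printed text to the program under test exercises arrays in arrays, objects in arrays, arrays in objects and objects in objects, plus empty containers.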

Importing a JSON file into a fresh MongoDB

I just want to ask how I can import this example.json file into a new MongoDB instance. I expect each session object to become a row in the table. I tried:
mongoimport --db foo --collection myCollections < dataBuys.json
2015-05-07T21:19:15.828+0300 connected to: localhost
2015-05-07T21:19:18.831+0300 foo.myCollections 168.5 MB
2015-05-07T21:19:21.826+0300 foo.myCollections 168.5 MB
2015-05-07T21:19:24.828+0300 foo.myCollections 168.5 MB
2015-05-07T21:19:27.828+0300 foo.myCollections 168.5 MB
2015-05-07T21:19:28.849+0300 warning: attempting to insert document with size 124.6 MB (exceeds 16.0 MB limit)
2015-05-07T21:19:28.986+0300 error inserting documents: write tcp 127.0.0.1:27017: broken pipe
2015-05-07T21:19:28.986+0300 imported 0 documents
and this
mongoimport -d mydb -c mycollection --jsonArray < dataBuys.json
2015-05-07T21:20:02.139+0300 connected to: localhost
2015-05-07T21:20:02.139+0300 Failed: error reading separator after document #1: bad JSON array format - found no opening bracket '[' in input source
2015-05-07T21:20:02.139+0300 imported 0 documents
The file I want to import has the following format. Its size is 170 MB for this one and 2.97 GB for the other one.
{
"Sessions": {
"420374" : {
"Purchases" : [
{
"Price" : "12462",
"Quantity" : "1",
"Timestamp" : "2014-04-06T18:44:58.314Z",
"ItemId" : "214537888"
},
{
"Price" : "10471",
"Quantity" : "1",
"Timestamp" : "2014-04-06T18:44:58.325Z",
"ItemId" : "214537850"
}
]
},
"281626" : {
"Purchases" : [
{
"Price" : "1883",
"Quantity" : "1",
"Timestamp" : "2014-04-06T09:40:13.032Z",
"ItemId" : "214535653"
}
]
},
"420368" : {
"Purchases" : [
{
"Price" : "6073",
"Quantity" : "1",
"Timestamp" : "2014-04-04T06:13:28.848Z",
"ItemId" : "214530572"
},
{
"Price" : "2617",
"Quantity" : "1",
"Timestamp" : "2014-04-04T06:13:28.858Z",
"ItemId" : "214835025"
}
]
}
}
}
Do I have to reformat the JSON? Is it possible to make it work like this?
The first error message says:
warning: attempting to insert document with size 124.6 MB (exceeds 16.0 MB limit)
This implies you are trying to insert a document that is 124.6 MB in size.
A JSON document starts with an open brace character "{" and ends with a closed brace character "}". The error message implies that you have 124.6 MB between such characters.
I think you need to examine your input file and verify that each session object is defined as a separate document, in other words that it starts and ends with a brace.
I suspect the problem is that the session objects are in fact embedded in a master document, a sort of container document. This would make mongoimport try to map the master container document to its collection, and not the session objects as you require.
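If the sessions are indeed nested inside one container document, one option is to rewrite the file so that each session is its own JSON document on its own line, which is the layout mongoimport reads by default. The following is a rough sketch, not a tested import pipeline. The function names and the "SessionId" field are assumptions made for illustration, and it loads the whole file into memory, which may be acceptable for the 170 MB file but probably calls for a streaming parser for the 2.97 GB one.

```python
import json

def session_documents(container):
    """Yield one document per entry of the top-level "Sessions" object."""
    for session_id, session in container["Sessions"].items():
        doc = {"SessionId": session_id}   # hypothetical field name for the session key
        doc.update(session)               # copies "Purchases" and any other fields
        yield doc

def split_sessions(src_path, dst_path):
    """Rewrite the container file as one JSON document per line."""
    with open(src_path) as src:
        container = json.load(src)        # loads the entire file into memory
    with open(dst_path, "w") as dst:
        for doc in session_documents(container):
            dst.write(json.dumps(doc) + "\n")

if __name__ == "__main__":
    sample = {"Sessions": {"420374": {"Purchases": [{"Price": "12462", "Quantity": "1"}]}}}
    for doc in session_documents(sample):
        print(json.dumps(doc))
```

The rewritten file can then be fed to the first mongoimport command from the question. Each session should be far below the 16 MB document limit.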
First of all, to verify that your GeoJSON file is accurate, you could use Geojsonlint, QGIS and so on.
After that, to import your data into your collection, use mongoimport:
mongoimport --db MY_DATABASE_NAME -c MY_COLLECTION_NAME --type json --file "MY_GEOJSON_FILENAME"
Replace the 3 variables above with your valid names. Obviously, make sure that your current directory contains the file.
Profit! :)

Parsing complex json in pig?

I have a JSON file in the following format:
{ "_id" : "foo.com", "categories" : [], "h1" : { "bar==" : { "first" : 1281916800, "last" : 1316995200 }, "foo==" : { "first" : 1281916800, "last" : 1316995200 } }, "name2" : [ "foobarl.com", "foobar2.com" ], "rep" : null }
So, how do I parse this JSON in Pig?
Also, categories and rep can contain characters and might not always be empty.
I made the following attempt:
a = load 'sample_json.json' using JsonLoader('id:chararray,categories:[chararray], hostt:{ (variable_a: {(first:int,last:int)})}, ns:[chararray],rep:chararray ');
But I get this error:
org.codehaus.jackson.JsonParseException: Unexpected character ('D' (code 68)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.ByteArrayInputStream#4795b8e9; line: 1, column: 50]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1291)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
at org.codehaus.jackson.impl.Utf8StreamParser._handleUnexpectedValue(Utf8StreamParser.java:1582)
at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:386)
at org.apache.pig.builtin.JsonLoader.readField(JsonLoader.java:173)
at org.apache.pig.builtin.JsonLoader.getNext(JsonLoader.java:157)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
You can use the Elephant Bird Pig jar for parsing JSON. It can parse all sorts of JSON data.
Here are some examples of parsing JSON in Pig with this jar:
https://github.com/twitter/elephant-bird/tree/master/examples/src/main/pig
It doesn't break even if an expected JSON tag isn't present.

Aptana gives error with JSON format

Format:
{
"lastUpdate" : "20/9/2012-12:12",
"data":[{
"user" : "_name_",
"username" : "_fullname_",
"photoURL" : "_url_"
}, {
"user" : "_name_",
"username" : "_fullname_",
"photoURL" : "_url_"
}, {
"user" : "_name_",
"username" : "_fullname_",
"photoURL" : "_url_"
}]
}
Aptana gives errors at each ":" character:
[Screenshot: Aptana JSON format]
Why is that? It seems I'm not having any problems receiving and processing the data.
[EDIT 1] Error given: Syntax Error: unexpected token ":"
In Aptana, JSON is parsed "as JSON" only when you create or open a file with the .json extension.
When you have a JSON object inside a .js file, only the JavaScript parser is used. That is why you see the error: ":" is not a valid token for JS at that position.