Feedparser SAXParseException, bozo:1 - exception

I'm using feedparser in a script that's generally working for RSS URLs, but there's one URL that's giving me a headache: tabbforum.com/feed.atom
I get a SAXParseException('not well-formed (invalid token)',).
import feedparser

def read_from_feed(self, rss_url):
    feed = feedparser.parse(rss_url)
    for entry in feed.entries:
        print('do stuff')

>>> feed
{'feed': {}, 'entries': [], 'bozo': 1, 'encoding': 'utf-8', 'version': '', 'bozo_exception': SAXParseException('not well-formed (invalid token)',), 'namespaces': {}}
I suspect something is wrong with the XML. Has anybody run into this before and found a workaround, or have an idea what the problem is?

I had a similar problem. In my case I had forgotten to put http:// in front of the URL, so feedparser treated the argument as raw RSS XML rather than as a URL to fetch.
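A guard along these lines can catch the missing scheme before the string reaches feedparser.parse. This is only a sketch: ensure_scheme is a hypothetical helper (not part of feedparser), and falling back to https is an assumption.

```python
from urllib.parse import urlparse


def ensure_scheme(url, default_scheme="https"):
    """Return url unchanged if it already starts with http(s), else prepend a scheme."""
    # Hypothetical helper: without a scheme, the string may be treated as feed content.
    if urlparse(url).scheme in ("http", "https"):
        return url
    return f"{default_scheme}://{url}"


fixed = ensure_scheme("tabbforum.com/feed.atom")
# The result would then be passed on, e.g. feedparser.parse(fixed)
```

If bozo is still 1 after that, the feed itself is probably malformed and bozo_exception is worth inspecting.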

Related

Python reqparse.Add_arguments(Type) won't give me the correct type in my json request

I have a problem: part of my JSON data arrives as a string when it should be an object. How do I send an object instead of a string through my request? My reqparse parser is set up like this:
search_parse = reqparse.RequestParser()
search_parse.add_argument('indexId', required=True, action='append')
search_parse.add_argument("pagination", required=False)
search_parse.add_argument("FilterCriteria", required=True)
and the JSON request I sent looks like this
{
    "indexId": [
        "testing"
    ],
    "pagination": {
        "Skip": 1,
        "Take": 4
    },
    "FilterCriteria": {
        "HasPatents": false,
        "IsAuthor": false
    }
}
and my payload is built up like this in my controller
sovren_payload = {
    "PaginationSettings": pagination,
    "IndexIdsToSearchInto": indexId,
    "FilterCriteria": test
}
The problem is that FilterCriteria ends up in the payload as a string. This is how it is supposed to look:
'FilterCriteria': {'coolguy': True, 'notcool': False}
But what I actually get is this
'FilterCriteria': "{'coolguy': True, 'notcool': False}"
How do I get rid of the quotes around the value in my request? When I print sovren_payload, it shows 'FilterCriteria': "{random data}".
I also realise I have the same problem with the "pagination" argument, but whatever fixes FilterCriteria should fix pagination as well.
Any insight would be greatly appreciated.
After three hours of struggling I figured it out: I should have put type=dict on the argument.
I feel so stupid. My main language is C# and I only use Python for AI stuff, so it's still new to me. Hope this helps some poor soul like me in the future :D
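To see why type=dict matters, here is a simplified stand-in for the conversion step. It assumes, as far as I understand reqparse, that each argument is passed through its type callable and that the default is a text conversion; convert is a hypothetical function, not the library's API.

```python
def convert(value, type=str):
    # Hypothetical stand-in for the parser's conversion step:
    # the value is simply passed through the `type` callable.
    return type(value)


criteria = {"HasPatents": False, "IsAuthor": False}

as_text = convert(criteria)             # default: the dict is stringified
as_dict = convert(criteria, type=dict)  # type=dict: the dict survives

# The corresponding parser setup from the answer would be:
# search_parse.add_argument("FilterCriteria", type=dict, required=True)
# search_parse.add_argument("pagination", type=dict, required=False)
```

The stringified version is exactly the quoted form shown in the question, which is why the payload carried a string.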

Decode http response in Angular - JSON.parse issue

After quite a bit of messing around, I've got to the point where I can receive the following body in a 404 error response from my backend. I'm struggling to parse the content in Angular so I can use it. I know this is simple stuff, so sorry to ask such a basic question. The _body looks like this:
_body: "{\"httpStatus\":404,\"errorType\":\"NotFound\",\"message\":\"Device does not exist!\"}"
These are fine:
console.log("Err = ", err);
Err = Response {_body: "
{\"httpStatus\":404,\"errorType\":\"NotFound\",\"message\":\"Device does not exist!\"}", status: 404, ok: false, statusText: "OK", headers: Headers, …}
and:
console.log("Err Body : ", err._body);
Err Body :
{\"httpStatus\":404,\"errorType\":\"NotFound\",\"message\":\"Device does not exist!\"}
But this doesn't work:
let errorObject = eval(errorString);
Uncaught (in promise): SyntaxError: Invalid or unexpected token
...
var errBody = JSON.parse(errorString);
console.log("JS err body", errBody);
Error: Uncaught (in promise): SyntaxError: Unexpected token \ in JSON at position 1
But I can't figure out how to get the individual fields out. I'm aware that the above efforts are naive and wrong. I'm sure anyone with JS or angular skills can solve this in a minute.
PS cut me some slack. I'm a hardware designer. I'm here because I don't know something, which is always the best reason to ask a question.
Edit:
Thanks for the answers. JSON.parse does not work for me!?
SyntaxError: Unexpected token \ in JSON at position 1
I looked more closely at what you had success with, and I agree it works fine in the console. But it does not work for me in Angular. What did work was:
let errBody = JSON.parse("\"" + err._body + "\"");
Although it seems ridiculous to do. Especially since, afterwards, the result is not quite right:
err body {"httpStatus":404,"errorType":"NotFound","message":"Device does not exist!"}
If I then try to get at errBody.message, it's undefined!... This is totally absurd. What am I doing wrong? How do you guys do this for a living? It's killing me!
I'm assuming errorString is err._body? Either way, parsing that string to JSON should be as simple as:
let error = JSON.parse(err._body);
I came back to this recently and finally managed to figure it out: I needed to remove some unwanted backslashes from the body before calling JSON.parse.
const errorStringReplaced = err._body.replace(/\\/g, '');
const errBody = JSON.parse(errorStringReplaced);
this.outcomeMessage = errBody.message;
Having done that, I could then grab the innards properly. I'd still prefer to send the object properly in the first place, but this will have to do for now.
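The symptoms above (wrapping the body in quotes yields a string that still looks like JSON) suggest the body was JSON-encoded twice. Under that assumption, decoding twice avoids stripping backslashes, which would also damage any legitimately escaped characters in a message. The sketch below illustrates the idea in Python rather than TypeScript; decode_double_encoded is a hypothetical name.

```python
import json


def decode_double_encoded(body):
    """Decode a body that holds JSON text with literally escaped quotes."""
    # First pass: treat the body as the inside of a JSON string literal.
    inner = json.loads('"' + body + '"')
    # Second pass: parse the recovered JSON text into an object.
    return json.loads(inner)


# Body as printed in the question, backslashes included.
raw_body = r'{\"httpStatus\":404,\"errorType\":\"NotFound\",\"message\":\"Device does not exist!\"}'
error = decode_double_encoded(raw_body)
```

In the Angular code, the equivalent would be calling JSON.parse a second time on the string returned by the first call.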

Deleting a file from a bucket

I'm trying to delete an object from a bucket. Reading the docs it all sounds super simple, but I just can't seem to get it working.
I'm following the instructions here to try and delete this object, which I can see using https://developer.api.autodesk.com/oss/v2/buckets/my-persistent-bucket/objects:
bucketKey => 'my-persistent-bucket'
objectKey => '--test2.dwg'
objectId => 'urn:adsk.objects:os.object:my-persistent-bucket/--test2.dwg'
sha1 => '477085439a60779064d91fd1971d53c77c7a163a'
size => (int) 188600
location => 'https://developer.api.autodesk.com/oss/v2/buckets/my-persistent-bucket/objects/--test2.dwg'
According to the docs we use this end point:
https://developer.api.autodesk.com/oss/v2/buckets/:bucketKey/objects/:objectName
Where
:bucketKey is url encoded 'my-persistent-bucket'
:objectName is url encoded 'urn:adsk.objects:os.object:my-persistent-bucket/--test2.dwg'
I've tried using PHP's urlencode() and the following base64 encode function:
private function _base64url_encode($data) {
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}
to encode the :bucketKey and :objectName but no matter how I try to encode it I always get:
404 : Object not found
Could anyone help me understand where I'm going wrong?
Thanks a lot
Of course, right after making an SO post, I found the answer.
For anyone having the same issue: :objectName is just the filename ('--test2.dwg' in my example), not the full objectId, and it must be encoded with PHP's rawurlencode() function rather than urlencode().
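The same distinction exists in Python, which makes it easy to see what changes: urllib.parse.quote with safe="" behaves roughly like rawurlencode() (spaces become %20), while quote_plus behaves like urlencode() (spaces become +). The sketch below builds the endpoint from the question; object_url is a hypothetical helper and the file name with a space is made up for illustration.

```python
from urllib.parse import quote, quote_plus

BASE = "https://developer.api.autodesk.com/oss/v2/buckets"


def object_url(bucket_key, object_name):
    """Build the object endpoint, percent-encoding each path segment."""
    # safe="" also encodes "/", so a name can never add extra path segments.
    return f"{BASE}/{quote(bucket_key, safe='')}/objects/{quote(object_name, safe='')}"


url = object_url("my-persistent-bucket", "--test2.dwg")

# Illustrative name containing a space, to show where the two encoders differ.
raw_style = quote("my file.dwg", safe="")
plus_style = quote_plus("my file.dwg")
```

For '--test2.dwg' both encoders give the same output, so the part of the fix that mattered here was passing the filename instead of the objectId.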

JSON.parse returning undefined object

Blizzard just shut down their old API, and made a change so you need an apikey. I changed the URL to the new api, and added the API key. I know that the URL is valid.
var toonJSON = UrlFetchApp.fetch("eu.api.battle.net/wow/character/"+toonRealm+"/"+toonName+"?fields=items,statistics,progression,talents,audit&apikey="+apiKey, {muteHttpExceptions: true})
var toon = JSON.parse(toonJSON.getContentText())
JSON.parse returns just an empty object:
return toon.toSource() // returned ({})
I spent a lot of time trying to find the problem and have come up empty. I think it has something to do with the response headers.
Response headers: http://pastebin.com/t30giRK1 (I got them from dev.battle.net, Blizzard's API site)
JSON: http://pastebin.com/CPam4syG
I think it is the code you're using.
I was able to Parse it by opening the raw url of your pastebin JSON http://pastebin.com/raw/CPam4syG
And using the following code
var text = document.getElementsByTagName('pre')[0].innerHTML;
var parse = JSON.parse(text);
So to conclude, I think it is the UrlFetchApp.fetch call that is returning {}.
So I found the problems:
I needed https:// in the URL; after some hours I found that I had an SSL error.
If you just use toString instead of getContentText, it works. Though why getContentText does not work, I am not sure.
I had the same problem; this works for me (don't forget to paste your key):
var toonJSON = UrlFetchApp.fetch("https://eu.api.battle.net/wow/character/"+toonRealm+"/"+toonName+"?fields=items%2Cstatistics%2Cprogression%2Caudit&locale=en_GB&apikey= ... ")

Proper way to deal with variations in JSON serialization

I have a web service that uses Python's SimpleJSON to serialize JSON, and a JavaScript client that uses Google's Visualization API. When I try to read in the JSON response using the Google DataTable Query method, I get an "invalid label" error.
I noticed that Google Spreadsheet outputs JSON without quotes around the object keys. I tried reading in JSON without the quotes and that works. What is the best way to get SimpleJSON output read into a Google DataTable using
query = new google.visualization.Query("http://www.myuri.com/api/").
I could use a regex to remove the quotes, but that seems sloppy. The javascript JSON parsing libraries I've tried won't read in JSON syntax without quotes around the object keys.
Here's some good background reading re: quotes around object keys:
http://simonwillison.net/2006/Oct/11/json/.
Are you certain the Google API is expecting JSON? In my experience Google's APIs tend not to be massively broken in a manner you're describing -- it could be that they're actually expecting a different format that merely resembles JSON.
Further poking around reveals instructions for retrieving data in the format Google expects:
For example, to get the dataSourceUrl from a Google Spreadsheet, do the following:
In your spreadsheet, select the range of cells.
Select 'Insert' and then 'Gadget' from the menu.
Open the gadget's menu by clicking on the top-right selector.
Select menu option 'Get data source URL'.
I did this and opened the URL in my browser. The data it was returning was certainly not JSON:
google.visualization.Query.setResponse(
{requestId:'0',status:'ok',signature:'1464883469881501252',
table:{cols: [{id:'A',label:'',type:'t',pattern:''},
{id:'B',label:'',type:'t',pattern:''}],
rows: [[{v:'a'},{v:'h'}],[{v:'b'},{v:'i'}],[{v:'c'},{v:'j'}],[{v:'d'},{v:'k'}],[{v:'e'},{v:'l'}],[{v:'f'},{v:'m'}],[{v:'g'},{v:'n'}]]}});
It looks like the result is intended to be directly executed by the browser. Try modifying your code to do something like this:
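# old
return simplejson.dumps({"requestId": 1, "status": "ok", ...})
# new: wrap the serialized object in the callback the client expects
# (%s rather than %r, so the JSON is not wrapped in an extra pair of quotes)
payload = simplejson.dumps({"requestId": 1, "status": "ok", ...})
return "google.visualization.Query.setResponse(%s);" % payload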
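As a self-contained sketch of that wrapping step, the following uses the standard library's json module, on the assumption that its dumps behaves like simplejson's here. wrap_response is a hypothetical name, and the payload contains only the two fields visible in the snippet above.

```python
import json


def wrap_response(payload):
    """Serialize payload and wrap it in the setResponse call shown above."""
    # The JSON text is inserted as-is, so the browser sees an object literal.
    return "google.visualization.Query.setResponse(%s);" % json.dumps(payload)


body = wrap_response({"requestId": 1, "status": "ok"})
```

Because the JSON is placed inside a function call, the browser parses it as an expression, which sidesteps the label ambiguity entirely.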
The "invalid label" error is usually due to a blind eval() on the JSON string, resulting in property names being mistaken as labels (because they have the same syntax -- "foo:").
eval("{ foo: 42, bar: 43 }"); // Results in invalid label
The quick remedy is to make sure your JSON string has parentheses enclosing the curly braces:
eval("({ foo: 42, bar: 43 })"); // Works
Try enclosing your JSON string in parentheses to see if the "invalid label" error goes away.
As it turns out, Python's json module also chokes on strings in single quotes. This will sort things out, though.
Parsing a JavaScript object as JSON in Python:
Solution:
>>> from re import sub
>>> import json
>>> js = "{ a: 'a' }"
>>> json.loads(sub("'", '"', sub('\s(\w+):', r' "\1":', js)))
{u'a': u'a'}
Edit: (edge cases reviewed)
So it was brought up that the suggested solution would not cope with all cases and specifically with something like
e.g. {foo: "a sentence: right here!"} will get changed to {"foo": "a "sentence": right here!"}
– Jason S Apr 12 at 18:03
To resolve that, we need to make sure we are working with a key and not just a colon inside a string. A lookbehind that requires a comma (,) or an opening curly brace ({) before the key does the job:
colon in string:
>>> js = "{foo: 'a sentence: right here!'}"
>>> json.loads(sub("'", '"', sub('(?<={|,)\s*(\w+):', r' "\1":', js)))
{u'foo': u'a sentence: right here!'}
Which of course is the same as doing:
>>> js = "{foo: 'a sentence: right here!'}"
>>> json.loads(sub('(?<={|,)\s*(\w+):', r' "\1":', js).replace("'",'"'))
{u'foo': u'a sentence: right here!'}
But this is not the only flaw. What about quotes inside strings?
If we are also concerned about quotes inside a string, we have to be more specific about what constitutes a string. The opening quote follows a curly brace ({), a space (\s) or a colon (:), and the closing quote comes before a comma (,) or a closing curly brace (}). Everything in between can then be treated as part of the same string, like so:
additional quotes in string:
>>> js = "{foo: 'a sentence: it\'s right here!'}"
>>> json.loads(
...     sub("(?<=\s|{|:)'(.*?)'(?=,|})",
...         r'"\1"',
...         sub('(?<={|,)\s*(\w+):', r' "\1":', js))
... )
{u'foo': u"a sentence: it's right here!"}
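The two substitutions can be collected into one Python 3 function. This is a sketch rather than a full parser: js_object_to_json is a hypothetical name, the patterns are written as raw strings with character classes, and the lookahead additionally allows whitespace before the closing brace (my adjustment, so that "{ a: 'a' }" is handled as well).

```python
import json
import re


def js_object_to_json(js):
    """Convert a simple JavaScript object literal to a Python object."""
    # Quote bare keys, but only those that directly follow "{" or ",".
    quoted_keys = re.sub(r'(?<=[{,])\s*(\w+):', r' "\1":', js)
    # Replace the single quotes delimiting a string with double quotes.
    # A quote only closes a string if "," or "}" follows (optionally after spaces).
    quoted_values = re.sub(r"(?<=[\s{:])'(.*?)'(?=\s*[,}])", r'"\1"', quoted_keys)
    return json.loads(quoted_values)


simple = js_object_to_json("{ a: 'a' }")
tricky = js_object_to_json("{foo: 'a sentence: it's right here!'}")
```

Inputs with double quotes inside single-quoted strings, or with newlines between a key and its value, are not covered by these patterns.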
Watch this space as more edge cases are revealed and solved. Can you spot another?
Or for something more complex perhaps, a real world example as returned by npm view:
From:
{ name: 'chuck',
description: 'Chuck Norris joke dispenser.',
'dist-tags': { latest: '0.0.3' },
versions: '0.0.3',
maintainers: 'qard ',
time: { '0.0.3': '2011-08-19T22:00:54.744Z' },
author: 'Stephen Belanger ',
repository:
{ type: 'git',
url: 'git://github.com/qard/chuck.git' },
version: '0.0.3',
dependencies: { 'coffee-script': '>= 1.1.1' },
keywords:
[ 'chuck',
'norris',
'jokes',
'funny',
'fun' ],
bin: { chuck: './bin/chuck' },
main: 'index',
engines: { node: '>= 0.4.1 < 0.5.0' },
devDependencies: {},
dist:
{ shasum: '3af700056794400218f99b7da1170a4343f355ec',
tarball: 'http://registry.npmjs.org/chuck/-/chuck-0.0.3.tgz' },
scripts: {},
directories: {},
optionalDependencies: {} }
To:
{u'author': u'Stephen Belanger ',
u'bin': {u'chuck': u'./bin/chuck'},
u'dependencies': {u'coffee-script': u'>= 1.1.1'},
u'description': u'Chuck Norris joke dispenser.',
u'devDependencies': {},
u'directories': {},
u'dist': {u'shasum': u'3af700056794400218f99b7da1170a4343f355ec',
u'tarball': u'http://registry.npmjs.org/chuck/-/chuck-0.0.3.tgz'},
u'dist-tags': {u'latest': u'0.0.3'},
u'engines': {u'node': u'>= 0.4.1 < 0.5.0'},
u'keywords': [u'chuck', u'norris', u'jokes', u'funny', u'fun'],
u'main': u'index',
u'maintainers': u'qard ',
u'name': u'chuck',
u'optionalDependencies': {},
u'repository': {u'type': u'git', u'url': u'git://github.com/qard/chuck.git'},
u'scripts': {},
u'time': {u'0.0.3': u'2011-08-19T22:00:54.744Z'},
u'version': u'0.0.3',
u'versions': u'0.0.3'}
Works for me =)
nJoy!