JSON Parse: The Famous Unexpected Token

When I make an async Ajax call using jQuery, it fails with the following message:
Syntax Error: Unexpected Token
So I captured the output, and it is:
{"formattedBasePrice":"<span class=\\"amount\\">$30,000<\/span>","formattedTotalPrice":"<span class=\\"amount\\">$30,000<\/span>","formattedVariationTotal":"<span class=\\"amount\\">$0<\/span>"}
The funny part is that if I copy/paste that JSON from the browser console and try to parse it, it WORKS!
So I changed my files to UTF-8 without BOM, but the problem still happened.
The next step is to remove any invalid invisible characters (the only reason I can think of why the JSON parses when I enter the string directly).
My question is: does anyone know what the most common invalid invisible characters are? I already tried null (\0), but nothing changed.
Thanks!

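The JSON is not valid: in \\" the first backslash escapes the second one, so the quote that follows is unescaped and ends the string early.
Can you try it like this: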
{"formattedBasePrice": "<span class=\"amount\">$30,000</span>", "formattedTotalPrice": "<span class=\"amount\">$30,000</span>", "formattedVariationTotal": "<span class=\"amount\">$0</span>"}
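To make the difference concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself uses jQuery). The helper name try_parse is made up, and the payloads are shortened copies of the ones above. It may also explain why pasting into the console worked: inside a JavaScript string literal, \\ collapses to a single \, so the parser would then see the valid \" form.

```python
import json

# Payload as received: \\" is an escaped backslash followed by a bare quote.
received = r'{"formattedBasePrice":"<span class=\\"amount\\">$30,000<\/span>"}'
# Corrected payload: \" escapes the quote itself.
corrected = r'{"formattedBasePrice":"<span class=\"amount\">$30,000<\/span>"}'


def try_parse(text):
    """Return the parsed object, or the decoder's error message as a string."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg} at char {exc.pos}"


print(try_parse(received))   # an error message
print(try_parse(corrected))  # a dict with the decoded HTML snippet
```

Note that \/ is a legal JSON escape, so it is not part of the problem.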

Related

How to decode an HTTP request with utf-8 and treat the surrogate keys (Emojis)

I'm having a hard time dealing with some parsing issues related to emojis.
I request JSON from the Brandwatch site using urllib (1). I then have to decode it as UTF-8; however, when I do so, I get surrogates and json.loads cannot deal with them (2).
I've tried BeautifulSoup4, which works great; however, when there's a &quot; in the site's result, it is converted to ", and json.loads then fails, saying that a , is missing. After tons of searching, I gave up trying to escape the ", which would be the ideal fix (3).
So now I'm stuck with both "solutions/problems". Any ideas on how to proceed?
Obs: This is a program that fetches data from Brandwatch and puts it into a MySQL database, so performance is an issue here.
Obs2: PyJQ is jq for Python, which does the request, and I can change the opener.
(1) - For the first approach, using urllib, these are the relevant parts of the code:
import json
import urllib.request

def downloader(url):
    return json.loads(urllib.request.urlopen(url).read().decode('utf8'))
...
parsed = pyjq.all(jqparser, url=url, vars={"today": start_date}, opener=downloader)
Error raised:
Exception ignored in: '_pyjq.pyobj_to_jv'
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 339: surrogates not allowed
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007f5f806303f0 ***
If I print the result of urllib.request.urlopen(url).read().decode('utf8') instead of sending it to json.loads, this is what appears. These escapes seem to be emojis.
"fullname":"Botinhas\uD83D\uDC62"
(2) - For the second approach, using BeautifulSoup4, here's the relevant part of the code (same as above, only the downloader function changed):
def downloader(url):
    return json.loads(BeautifulSoup(urllib.request.urlopen(url), 'lxml').get_text())
...
parsed = pyjq.all(jqparser,url=url, vars={"today" : start_date}, opener=downloader)
And this is the error raised:
Expecting ',' delimiter: line 1 column 4814765 (char 4814764)
Printing the result shows that the " before Diretas Já should be escaped:
"title":"Por "Diretas Já", manifestações pelo país ocorrem em preparação ao "Ocupa Brasília" - Sindicato dos Engenheiros no Estado do Rio de Janeiro"
I've thought of running a regex; however, I'm not sure whether that would be the most appropriate solution in this case, as performance is an issue.
(3) - Part of the Brandwatch result with the &quot; problem mentioned above
UPDATE:
As Martin suggested in the comments, I ran a replace that swaps &quot; for nothing. That brought back the earlier emoji problem:
Exception ignored in: '_pyjq.pyobj_to_jv'
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud83d' in position 339: surrogates not allowed
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007f5f806303f0 ***
UPDATE2:
I've added this to the downloader function:
re.sub(r'\\u(d|D)([a-z|A-Z|0-9]{3})', "", urllib.request.urlopen(url).read().decode('utf-8','ignore'))
It solved the issue; however, I don't think it's the best way to solve it. Does anybody know a better option?
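One possible alternative, sketched under the assumption that the crash comes from lone (unpaired) surrogates in the decoded data: json.loads already combines a valid pair such as \uD83D\uDC62 into one emoji, so only unpaired surrogates remain as characters that UTF-8 cannot encode. The function name scrub_surrogates is hypothetical, and whether this is fast enough for the Brandwatch payloads has not been measured here. Unlike deleting every \uDxxx escape, it keeps emojis that are encoded correctly.

```python
import json


def scrub_surrogates(value):
    """Recursively replace lone surrogates in parsed JSON with '?'."""
    if isinstance(value, str):
        # Encoding with errors="replace" turns each unencodable
        # character (a lone surrogate) into "?".
        return value.encode("utf-8", errors="replace").decode("utf-8")
    if isinstance(value, list):
        return [scrub_surrogates(item) for item in value]
    if isinstance(value, dict):
        return {scrub_surrogates(k): scrub_surrogates(v) for k, v in value.items()}
    return value


paired = json.loads('{"fullname":"Botinhas\\uD83D\\uDC62"}')
lone = json.loads('{"fullname":"Botinhas\\uD83D"}')
print(scrub_surrogates(paired))  # emoji kept
print(scrub_surrogates(lone))    # lone surrogate replaced
```

The downloader could then return scrub_surrogates(json.loads(...)) instead of editing the raw text with a regex.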

JSON get parsed in browser but not by node.js

I'm about to write some tests for my client UI.
The weird thing is that my JSON string:
{"match":"\s?5\.7\s?\<=\>\s?7","success":"null-coalesce-operator"}
is parsed by JSON.parse in the browser (Chrome) and looks like this:
{
match: "\s?5\.7\s?\<=\>\s?7",
success:"null-coalesce-operator"
}
Everything is fine there,
but when I run that part with mocha in a Node.js environment, I get:
{"match":"\s?5\.7\s?\<=\>\s?7","success":"null-coalesce-operator"}
^
SyntaxError: Unexpected token s
at Object.parse (native)
...
Has anyone experienced something like this? Thanks for any tips.
node version is v5.7.1
mocha version is 2.4.5
UPDATE: the HTML string that I test is:
<!doctype html><html><body><div data-meta="{"match":"\\s?5\\.7\\s?\\<=\\>\\s?7","success":"null-coalesce-operator"}"></div></body></html>
It is just a single-line string without any \n newlines or the like.
I think it is because Node also interprets special characters (e.g. \n => line feed, \r => carriage return, etc.), which Chrome did not. Because you want a backslash in your regex, you need to replace each \ with \\ before parsing in Node:
json_string = json_string.replace(new RegExp('\\\\', 'g'), '\\\\') // a regex is needed because replace() with a string pattern only replaces the first occurrence
Otherwise, when the parser reaches \s, it treats it as an escape sequence identified by s; since no such escape exists, it throws an error.
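The same effect can be reproduced outside Node. Below is a minimal sketch in Python (used only for illustration) with the JSON string from the question: \s is not a JSON escape, so a strict parser rejects it, and doubling the backslashes makes it parse. One caveat, stated as an assumption about other inputs: blindly doubling every backslash would also break legitimate escapes such as \" or \n, so it only fits strings like this one.

```python
import json

# The string as the parser sees it in the failing test: single backslashes.
raw = r'{"match":"\s?5\.7\s?\<=\>\s?7","success":"null-coalesce-operator"}'

try:
    json.loads(raw)
    parsed_raw = True
except json.JSONDecodeError as exc:
    parsed_raw = False
    print("rejected:", exc.msg)

# Double each backslash so every one becomes the valid JSON escape \\.
fixed = raw.replace("\\", "\\\\")
result = json.loads(fixed)
print(result["match"])
```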

invalid character in json response

In my CakePHP controller I send the JSON response with
$response = array('success' => 1);
return json_encode($response);
I always get invalid JSON in my view: SyntaxError: JSON.parse: unexpected character
I tested it with JSLint; the error is "unsafe character" at char 0, line 1.
The Firebug console prints 65279 for the following statement:
console.log(response.charCodeAt(0));
What can I do? Is this a UTF-8 issue?
You may have the character U+FEFF (ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark) at the start of your JSON string; 65279 is its code point in decimal. It may have been copied into your code via copy/paste without you realizing it. It is invisible, so it is hard to debug. Try copying the $response text into a text editor and erasing the invisible character.
Here is a post that may be related.
https://stackoverflow.com/a/9691839/2777098
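For illustration, here is a small Python sketch of the same symptom (the thread is about PHP and JavaScript; Python is used only to show the mechanics). The payload is a made-up copy of the response with a UTF-8 BOM in front. Decoding with utf-8-sig or stripping a leading U+FEFF are two ways to remove it.

```python
import codecs
import json

# Hypothetical response body: a UTF-8 BOM followed by the JSON text.
payload = codecs.BOM_UTF8 + b'{"success": 1}'

text = payload.decode("utf-8")   # plain utf-8 keeps the BOM as U+FEFF
first_code_point = ord(text[0])  # same value as charCodeAt(0) in the question

try:
    json.loads(text)
    bom_accepted = True
except json.JSONDecodeError:
    bom_accepted = False

# Option 1: let the codec drop the BOM while decoding.
via_codec = json.loads(payload.decode("utf-8-sig"))
# Option 2: strip the character from an already decoded string.
via_strip = json.loads(text.lstrip("\ufeff"))
print(first_code_point, bom_accepted, via_codec, via_strip)
```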

Why does JSON.parse choke on encoded characters in nodejs?

I'm attempting to look up the word "flower" in Google's dictionary semi-api. Source:
https://gist.github.com/DelvarWorld/0a83a42abbc1297a6687
Long story short, I'm calling JSONP with a callback parameter and then regexing it out.
But it hits this snag:
undefined:1
ple","terms":[{"type":"text","text":"I stopped to buy Bridget some \x3cem\x3ef
^
SyntaxError: Unexpected token x
at Object.parse (native)
Google is serving me escaped HTML characters, which is fine, but JSON.parse cannot handle them?? What's weirding me out is this works just fine:
$ node
> JSON.parse( '{"a":"\x3cem"}' )
{ a: '<em' }
I don't get why my thingle is crashing
Edit: These are all nice informational responses, but none of them help me get rid of the stack trace.
\xHH is not part of JSON, but is part of JavaScript. It is equivalent to \u00HH. Since the built-in JSON doesn't seem to support it and I doubt you'd want to go through the trouble of modifying a non-built-in JSON implementation, you might just want to run the code in a sandbox and collect the resulting object.
According to http://json.org, a character in the JSON representation of a string may be:
any Unicode character except " or \ or a control character
\"
\\
\/
\b
\f
\n
\r
\t
\u followed by four hex digits
So according to that list, the "JSON" you are getting is malformed at \x3
The reason it works in the REPL is that these two are equivalent:
JSON.parse( '{"a":"\x3cem"}' )
and
JSON.parse( '{"a":"<em"}' )
Your string is already decoded when it is passed to JSON.parse: since it's a literal, \x3cem is actually <em.
Now, \xHH is valid in JavaScript but not in JSON; according to http://json.org/, the only characters you can have after a \ are "\/bfnrtu.
This answer is correct but needs a couple of modifications. You might want to try this one: https://gist.github.com/Selmanh/6973863
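As one way to get rid of the stack trace without evaluating the response, the \xHH escapes can be rewritten to the equivalent \u00HH form before parsing. This is a minimal sketch in Python rather than Node, and it is not taken from the gist above; the function name hex_escapes_to_json and the sample text are made up. The regex consumes every backslash pair in turn so that an escaped backslash followed by x (\\x41) is left alone.

```python
import json
import re

# Match any backslash escape; capture the two hex digits only for \xHH.
_ESCAPE = re.compile(r"\\(?:x([0-9a-fA-F]{2})|.)", re.DOTALL)


def hex_escapes_to_json(text):
    """Rewrite JavaScript-only \\xHH escapes as JSON-compatible \\u00HH."""
    def fix(match):
        hex_digits = match.group(1)
        if hex_digits is None:
            return match.group(0)  # some other escape: keep unchanged
        return "\\u00" + hex_digits
    return _ESCAPE.sub(fix, text)


raw = r'{"text":"I stopped to buy some \x3cem\x3eflowers"}'
converted = hex_escapes_to_json(raw)
print(converted)
print(json.loads(converted)["text"])
```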

Json parsing with unicode characters

I have a JSON file with Unicode characters, and I'm having trouble parsing it. I've tried it in Flash CS5 with the JSON library, and I have tried it at http://json.parser.online.fr/, and I always get "unexpected token - eval fails".
I'm sorry, there really was a problem with the syntax; it came this way from the client.
Can someone please help me? Thanks
Quoth the RFC:
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
So a correctly encoded Unicode character should not be a problem. Which leads me to believe that it's not correctly encoded (maybe it uses latin-1 instead of UTF-8). How did you create the file? In a text editor?
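A quick way to check for this kind of mismatch, sketched in Python with a made-up one-line file: the same text saved as latin-1 is not valid UTF-8, so a strict UTF-8 decode fails before the JSON parser even runs.

```python
import json

text = '{"name": "Zoë"}'  # hypothetical file content

as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")

# Correctly encoded bytes decode and parse without trouble.
parsed = json.loads(as_utf8.decode("utf-8"))

# The latin-1 bytes are rejected when read as UTF-8.
try:
    as_latin1.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# Decoding with the encoding the file really uses recovers the data.
recovered = json.loads(as_latin1.decode("latin-1"))
print(parsed, latin1_is_valid_utf8, recovered)
```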
There might be an obscure Unicode whitespace character hidden in your string.
This URL contains more detail:
http://timelessrepo.com/json-isnt-a-javascript-subset
In ASP.NET you would think you could use System.Text.Encoding to convert a string like "Paul\u0027s" back to a string like "Paul's", but I tried for hours and found nothing that worked.
The trouble is that hardcoding a string as shown above already decodes it, as you will see if you put a breakpoint on it. So in the end I wrote a function that converts the hex 27 to decimal 39, so that I ended up with HTML encoding, and then decoded that.
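string Padding = "000";
// f is formatted in decimal, so only escapes whose four digits are all 0-9
// (such as \u0027) are matched; escapes containing a-f are left untouched.
for (int f = 1; f <= 256; f++)
{
    string Hex = "\\u" + Padding.Substring(0, 4 - f.ToString().Length) + f;
    string Dec = "&#" + Int32.Parse(f.ToString(), NumberStyles.HexNumber) + ";";
    HTML = HTML.Replace(Hex, Dec);
}
HTML = System.Web.HttpUtility.HtmlDecode(HTML);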
Ugly as sin, I know, but without the latest framework (not available on my ISP's server) it was the best I could do. Someone must know a better solution.
I had the same problem, and I just changed the file encoding from Mac-Roman/windows-1252 to UTF-8, and it worked.
I had the same problem with Twitter json files. I was parsing them in Python with json.loads(tweet) but it failed for half of the records.
I changed to Python3 and it works well now.
If you seem to have trouble with the encoding of a JSON file generated by Python with json.dumps() (i.e. escaped codes such as \u00fc aren't displayed correctly regardless of your editor's encoding setting): it emits ASCII by default and escapes non-ASCII characters! See "python json unicode - how do I eval using javascript", "python: json.dumps can't handle utf-8?" and "Why does json.dumps escape non-ascii characters with "\uxxxx"".
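A short sketch of that default, with a made-up value: both outputs are valid JSON and parse back to the same data, and only the on-disk representation differs. Passing ensure_ascii=False keeps the characters as they are.

```python
import json

data = {"city": "Zürich"}  # hypothetical sample value

escaped = json.dumps(data)                      # default: ASCII-only output
literal = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)
print(literal)

# Either form round-trips to the original data.
round_trip_ok = json.loads(escaped) == json.loads(literal) == data
```

When writing the second form to a file, open the file with an explicit encoding such as UTF-8 so the reader and writer agree.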