Encoding problem while reading JSON file from URL - json

So I am reading a JSON file on a remote server via standard node http.get. This JSON contains strings like this: NÖM Portionsbutter PG=20g 100er, but on my server this same string looks like this: N�M Portionsbutter PG=20g 100er.
The problem, I think, lies in the discrepancy between the header of the http request (content-type: text/plain;charset=UTF-8) and the actual encoding of the JSON file (ISO-8859-1).
I tried several versions of fixing this, like using iconv-lite's iconv.decode(data, 'latin1'), but again, these special chars ("Umlaute" in German) show up wrong.
Fun fact: Downloading the file via the browser, inspecting it via file -I file.json and getting text/plain; charset=iso-8859-1 and then using iconv.decode(data, 'latin1') works perfectly fine and the Umlaute are correct.
I am out of ideas right here ... what is the perfect way to properly parse a JSON file like this?

If the server uses an incorrect encoding, it's broken and should be fixed.
(It should use application/json, in which case the charset parameter is undefined/unused).

@Julian Reschke is obviously right, but I found a way around this and am now able to read the JSON properly, with all its Umlaute.
What I did was use request's encoding option and set it to null, so request returns the body as a raw Buffer and does not try to "parse" the response in any way. I then use iconv.decode(data, 'latin1') to apply the proper encoding to the data and then JSON.parse it. Works beautifully!
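A minimal sketch of that workaround, assuming Node's built-in http module and Buffer in place of request and iconv-lite (both substitutions are assumptions, and the helper names parseLatin1Json and fetchLatin1Json are hypothetical). The key point is to keep the body as raw bytes until it is decoded as ISO-8859-1. Once the bytes have been decoded as UTF-8 into a string, the invalid sequences have already been replaced by U+FFFD and cannot be recovered, which would explain why iconv.decode alone did not help.

```javascript
const http = require('http');

// Decode raw bytes as ISO-8859-1 (latin1), then parse the text as JSON.
function parseLatin1Json(buffer) {
  return JSON.parse(buffer.toString('latin1'));
}

// Hypothetical helper: fetch a URL and keep the body as raw bytes.
function fetchLatin1Json(url) {
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      const chunks = [];
      // No res.setEncoding() call here, so every chunk stays a Buffer.
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => {
        try {
          resolve(parseLatin1Json(Buffer.concat(chunks)));
        } catch (err) {
          reject(err);
        }
      });
      res.on('error', reject);
    }).on('error', reject);
  });
}

// Local demonstration without a network call.
// '\u00d6' is "Ö"; in ISO-8859-1 it is the single byte 0xD6.
const sample = Buffer.from('{"name":"N\u00d6M Portionsbutter"}', 'latin1');

// Decoding the same bytes as UTF-8 is lossy: 0xD6 becomes U+FFFD.
const wronglyDecoded = sample.toString('utf8');

// Decoding as latin1 first gives the intended text.
const parsed = parseLatin1Json(sample);
console.log(parsed.name);
```

With a real endpoint, fetchLatin1Json('http://example.invalid/file.json') would be awaited in place of the local sample; that URL is a placeholder only.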

Related

RAD Server Delphi - using SaveToStream and LoadFromStream does not work because of umlauts (mutated vowels) after JSON conversion

I am trying to exchange data between a RAD Server IIS package and a Delphi client using EMSEndpoint.
What I am trying to do looks simple to me, but I can't get it to work.
In the package there is a TFDConnection pointing to an MS SQL Server. A TFDQuery is connected to that connection.
With this code I create the JSON response (server side):
var lStream: TStringStream := TStringStream.create;
FDQuery.SaveToStream(lStream,sfJSON);
AResponse.Body.SetStream(lStream,'application/json' ,True);
With this code I try to load the dataset into a TFDMemTable (client side):
var lstrstream: TStringStream := TStringStream.create(EMSBackendEndpoint.Response.Content);
aMemtable.LoadFromStream(lstrstream, sfJSON);
The MemTable reports: [FireDac][Stan]-719 invalid JSON storage format
How can that be? I know where the problem is: there are äöü characters in my stream. But when I load it from one component into the other it should work, shouldn't it?
Any suggestions for what I can try? What I have tried so far:
Loading the JSON on the client via UTF8toUnicode. That lets me load the MemTable but results in missing letters like öäü.
Converting with UTF8toUnicode on the server side and back again on the client side. That leads to JSON the MemTable cannot read.
Loading the JSON into a JSON string and formatting it locally before loading it into the MemTable. That leads to unreadable JSON because the array and object characters are also escaped.
JSON is most commonly exchanged using UTF-8, but by default TStringStream does not use UTF-8 on Windows, only on Posix systems. Try using TStringStream.Create(..., TEncoding.UTF8) to force UTF-8.
This assumes that FDQuery.SaveToStream() saves using UTF-8, and aMemtable.LoadFromStream() loads using UTF-8, otherwise you will still have an encoding mismatch.

Binarized JSON which looks fine in browsers

I'm looking into an HTTP interface that returns (essentially) a JSON object.
When I access the URL with Chrome or Firefox, the JSON data is shown with appropriate indentation. However, when I download it with curl etc., the data is binary.
I think the browsers know this binary encoding method and show it in a pretty format. (If I save it as a file from the browsers, it is a text file with the indents.)
What do you think this binary encoding is?
(Unfortunately, I can not upload the binary data here...)
[SOLVED]
Browsers send their requests with additional headers that curl does not send by default. That is why I get different responses from the two methods. My API returns binarized (compressed) JSON when called without such a header.
You should have a look at the headers of the HTTP response message that contains the binary data. There should be values for encoding, content type and compression.
With these values you can decode the binary data.
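As a rough sketch of decoding by response header, the snippet below picks a decompressor based on Content-Encoding using Node's built-in zlib module. It assumes the API uses one of the standard codings (gzip, deflate or br), which the thread does not confirm; the helper name decodeBody is hypothetical.

```javascript
const zlib = require('zlib');

// Hypothetical helper: undo the compression named in Content-Encoding.
// `headers` uses lower-case keys, as Node's http module provides them.
function decodeBody(headers, body) {
  const encoding = (headers['content-encoding'] || 'identity').toLowerCase();
  switch (encoding) {
    case 'gzip':
      return zlib.gunzipSync(body);
    case 'deflate':
      return zlib.inflateSync(body);
    case 'br':
      return zlib.brotliDecompressSync(body);
    case 'identity':
      return body; // not compressed
    default:
      throw new Error(`Unsupported Content-Encoding: ${encoding}`);
  }
}

// Local demonstration: compress a JSON document, then decode it again.
const original = JSON.stringify({ status: 'ok', items: [1, 2, 3] });
const compressed = zlib.gzipSync(Buffer.from(original, 'utf8'));
const decoded = decodeBody({ 'content-encoding': 'gzip' }, compressed);
const parsed = JSON.parse(decoded.toString('utf8'));
console.log(parsed.status);
```

If the response carries no Content-Encoding header but is still binary, the Content-Type header is the next place to look, since the body may be in a binary format and not compressed text.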

getting percent ('%') characters and unicode codepoints in HTTP response

I'm connecting to an HTTP API through a simple GET request.
The expected response is a string representing a JSON document that may contain Hebrew (Unicode) characters, but I get something like this (only the beginning is pasted):
%u007b%u0020%u0022%u0053%u0074%u0061%u0074%u0075%u0073...
The result is the same whether I use Ajax or the browser navigation bar directly.
The only place I get the expected JSON string is in the Firefox console, by logging the response object, selecting it, and viewing the responseText property.
I can also replace the percent characters with backslashes, put the result into a Unicode parser and get the correct string.
Does anybody have any idea what is going on?
The response appears to be encoded with the deprecated JavaScript function escape(), which yields the %uXXXX encoding. If that is the case, the service should use encodeURIComponent() or encodeURI() instead.
Your current workaround of manually decoding the response is the right way to go until the service is updated.
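The manual decoding can be sketched as a single regular-expression pass that turns each %uXXXX sequence (and each plain %XX sequence) back into a character. The function name decodePercentU is hypothetical, and the sketch assumes every sequence has exactly four (or two) hexadecimal digits.

```javascript
// Hypothetical helper: decode %uXXXX and %XX sequences in one pass.
// A single pass avoids decoding a freshly produced "%" a second time.
function decodePercentU(input) {
  return input.replace(
    /%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})/g,
    (match, fourDigits, twoDigits) =>
      String.fromCharCode(parseInt(fourDigits || twoDigits, 16))
  );
}

// The beginning of the response quoted in the question.
const sample = '%u007b%u0020%u0022%u0053%u0074%u0061%u0074%u0075%u0073';
const decodedSample = decodePercentU(sample);
console.log(decodedSample); // prints: { "Status
```

Once the whole response has been decoded this way, the result can be passed to JSON.parse as usual.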

AFNetworking received non-English character: how to convert?

I am getting a JSON response from some web server, say the server returns:
"kən.grætju'leiʃən"
I use AFNetworking and JSONKit, but what I've received is:
"æm'biʃən"
I'm not sure whether it's AFNetworking's problem or JSONKit's problem, but anyway, how do I parse and convert the string so it looks the same as it does on the server?
Thanks
The server may be returning characters encoded in a way that violates the official JSON spec. If those characters are encoded as escaped Unicode code points (like \u1234), then JSONKit and NSJSONSerialization should both handle them fine.
If you can't change the server, you can work around the issue by URL-decoding the string - see https://stackoverflow.com/a/10691541/1445366 for some code to handle it. But if your server isn't following the correct specs, you're likely to run into other issues.
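To illustrate the \uXXXX form mentioned above, here is a small sketch of how a server could emit ASCII-only JSON so that the transport encoding no longer matters. It is written in JavaScript and not in Objective-C, the helper name toAsciiJson is hypothetical, and it only shows that a spec-compliant parser restores the original characters; it says nothing about what this particular server actually sends.

```javascript
// Hypothetical helper: serialize to JSON, escaping every non-ASCII
// UTF-16 code unit as \uXXXX so the output is pure ASCII.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(
    /[\u0080-\uffff]/g,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

// The phonetic string from the question, written with escapes.
const original = "k\u0259n.gr\u00e6tju'lei\u0283\u0259n";

const wire = toAsciiJson({ word: original });

// Any spec-compliant JSON parser turns the escapes back into characters.
const roundTripped = JSON.parse(wire).word;
console.log(wire);
```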

Why does SJCL report "this is not JSON" when trying to decode this JSON snippet?

I'm using SJCL, and it works fine with small ASCII strings.
But when I try to decode this piece of JSON (the result of the encryption of an HTML page) I get a "this is not JSON!" error.
The JSON has been produced by SJCL, and while I did encode it and decode it using LZW and base64 I don't get this error for small strings with the same workflow.
I tracked the error message origin to the decode function. I assume the regexes are failing but I don't understand why as this seems to be a perfectly formed JSON string to me.
However, I may be wrong: if I run a JavaScript eval on it, it fails with a syntax error. But if I dump it to a file, Python parses it fine.
The JSON at your "this piece of JSON" link starts and ends with a double-quote character. Is that actually part of the contents of the JSON? If it is, I believe that is your problem. Otherwise, it looks like valid JSON to me.
OK, I was doing base64 encoding in two passes: one before encryption and one after. It seems that removing the first pass makes it work.
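For the possibility raised in the answer, that the payload is a JSON string wrapping the real JSON object, a quick check could look like the sketch below. The helper name parseMaybeWrappedJson is hypothetical, the sample object is made up, and this is only a diagnostic aid; it is not part of SJCL.

```javascript
// Hypothetical helper: parse JSON, and if the result is itself a string
// that contains JSON, parse that inner document as well.
function parseMaybeWrappedJson(text) {
  let value = JSON.parse(text);
  if (typeof value === 'string') {
    try {
      value = JSON.parse(value);
    } catch (err) {
      // The string was ordinary text, not wrapped JSON; keep it as is.
    }
  }
  return value;
}

// Made-up sample object standing in for an encrypted payload.
const inner = JSON.stringify({ iv: 'abc', ct: 'xyz' });

// Serializing a second time wraps the object in double quotes.
const wrapped = JSON.stringify(inner);

console.log(wrapped[0]);                        // prints: "
console.log(parseMaybeWrappedJson(wrapped).ct); // prints: xyz
```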