json character encoding problem - json

When I encode an array to JSON I get "u00e1" instead of á.
How could I solve the character encoding?
Thanks

Your input data is not Unicode. 0xE1 is legacy latin1/ISO-8859-*/Windows-1252 for á. \u00e1 is the JSON/JavaScript to encode that. JSON must use a Unicode encoding.
Solve it by either fixing your input or converting it using something like iconv.

The browser's default encoding is probably Unicode UTF-8. Try
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">.

One problem can be if you check the response only (the response is only a text but the JSON must be an object).
You have to parse the response text to be a javascript object first (JSON.parse in javascript) and after that the characters will become the same as on the server side.
Example:
On the server in the php code:
$myString = "árvízrtűrő tükörfúrógép";
echo json_encode($myString); //this sends the encoded string via a protocol that maybe can handle only ascii characters, so the result on the client side is:
On the client side
alert(response); //check the text sent by the php
output: "\u00e1rv\u00edzrt\u0171r\u0151 t\u00fck\u00f6rf\u00far\u00f3g\u00e9p"
Make a js object from the respopnse
parsedResponse = JSON.parse(response);
alert(parsedResponse);
output: "árvízrtűrő tükörfúrógép"

Related

CICS TS(DFHJS2LS): Chinese characters are getting corrupted when received into MAINFRAME from POSTMAN tool

We have developed a webservice having CICS as the HTTP SERVER (service provider). This Webservice takes the input JSON (which has both English and Chinese characters) from any client/POSTMAN tool and will be processed in Mainframe (CICS).
DFHJS2LS: JSON schema to high-level language conversion for request-response services
We are using this proc - DFHJS2LS to enable webservices in Mainframe. ThisI BM provided procedure does the conversion of JSON to MF copybook and vice-versa. Also it converts the UTF-8 code unit into UTF-16 when it reaches mainframe copybook.
Issue:
The issue what we face now is on the Chinese characters. The Chinese characters which we pass in JSON are not getting converted properly and they are getting corrupted when it is received inside mainframe. The conversion from UTF-8 to UTF-16 is not happening (this is my suspect).
市 - this is the chinese character passed in JSON (POSTMAN).
Expected value in Mainframe copybook is 5E02(UTF-16 - hex value)
but we got 00E5 00B8 0082(UTF-8 hex value)
we have tried all header values and still no luck.....
content type = application/json
charset=UTF-8 / UTF-16
Your inputs are much appreciated in addressing this DBCS/unicode/chinese character issue.
In the COBOL are you declaring the filed that will receive the Chinese characters as Pic G :
01 China-Test-Message.
03 Msg-using-pic-x Pic X(10).
03 Msg-using-pic-g Pic G(4) Usage Display-1.
Try "USAGE NATIONAL" which shoul dmap to UTF-16 which is probably the code page for the chinese character.
RTFM here:-
https://www.ibm.com/support/knowledgecenter/SS6SG3_6.3.0/pg/concepts/cpuni01.html
The chinese conversion is resolved once we changed our HTTP header to this -
Content-Type = application/json;charset=UTF-8
thanks everyone for the support.

Strip backslashes from encoded JSON response

Building a Json respose with erlang. First I construct the data in terms and then use jsx to convert it to JSON:
Response = jsx:term_to_json(MealsListResponse),
The response actually is valid JSON according to the validators I have used:
The problem is when parsing the response in the front end. Is there a way to strip the backslashes from the Erlang side, so that the will not appear on the payload response?
The backslashes are not actually part of the string. They're just used when the string is printed as a term - that is, in the same way you'd write it in an Erlang source file. This works in the same way as character escapes in strings in C and similar languages: inside double quotes, double quotes that should be part of the string need to be escaped with backslashes, but the backslashes don't actually make it into the string.
To print the string without character escapes, you can use the ~s directive of io:format:
io:format("~s~n", [Response]).
If you're sending the response over a TCP socket, all you need to do is converting the string to binary with an appropriate Unicode conversion. Most of the time you'll want UTF-8, which you can get with:
gen_tcp:send(MySocket, unicode:characters_to_binary(Response)).

How to decode base64 unicode string using T-SQL

Can't decode turkish characters in base64 string.
Base64 string = "xJ/DvGnFn8Onw7bDlsOHxLDEnsOcw5w="
When I decode it must be like this : 'ğüişçöÖÇİĞÜÜ'
I try to decode like this :
SELECT CAST(
CAST(N'' AS XML).value('xs:base64Binary("xJ/DvGnFn8Onw7bDlsOHxLDEnsOcw5w=")' , 'VARBINARY(MAX)')
AS NVARCHAR(MAX)
) UnicodeEncoding ;
Based on this answer : Base64 encoding in SQL Server 2005 T-SQL
But have response like this : '鿄볃앩쎟쎧쎶쎖쒇쒰쎞쎜'
Base64 string is correct because when I try decode in Base64decode.org it works.
Is there any way to decode turkish characters?
Your base-64 encoded data contains an UTF-8 string. MS SQL doesn't support UTF-8, only UTF-16, so it fails for any characters outside of ASCII.
The solution is to either send the data as nvarchar right away, or to encode the string as UTF-16 (and send it as varbinary or base-64, as needed).
Based on Erlang documentation, this might require an external library, unicode: http://www.erlang.org/doc/apps/stdlib/unicode_usage.html
Basically, the default seems to be UTF-8, you need to specify UTF-16 manually. UTF-16 support seems a bit clunky, but it should be quite doable.

Scrambled umlauts since upgrade from JSON1 to JSON2 in Perl

I wondered why some german umlauts were scrambled on our page.
Then i found out that the recent version of JSON (i use 2.07) does convert strings in an other manner than JSON 1.5.
Problem here is that i have a hash with strings like
use Data::Dumper;
my $test = {
'fields' => 'überrascht'
};
print Dumper(to_json($test)); gives me
$VAR1 = "{ \"fields\" : \"\x{fc}berrascht\" } ";
Using the old module using
$json = JSON->new();
print Dumper ($json->to_json($test));
gives me (the correct result)
$VAR1 = '{"fields":[{"title":"überrascht"}]}';
So umlauts are scrammbled using the new JSON 2 module.
What do i need to get them correct?
Update: It might be bad to use Data::Dumper to show output, because Dumper uses its own encoding. Well, a difference in the result from Dumper shows that anything is treated differently here. It might be better to describe the backend as Brad mentioned:
The json string gets printed using Template-Toolkit and then gets assigned to a javascript variable for further use. The correct javascript shows something like this
{
"title" : "Geändert",
},
using the new module i get
{
"title" : "Geändert",
},
The target page is in 8859-1 (latin1).
Any suggestions?
\x{fc} is ü, at least in Latin-1, Latin-9 etc. Also, ü is codepoint U+00FC in Unicode. However, we want UTF-8 (I suppose). The easiest solution to get UTF-8 string literals is to save your Perl source code with this encoding, and put a use utf8; at the top of your script.
Then, encoding the string as JSON yields correct output:
use strict; use warnings; use utf8;
use Data::Dumper; use JSON;
print Dumper encode_json {fields => "nicht überrascht"};
The encode_json assumes UTF-8. Read the documentation for more info.
Output:
$VAR1 = '{"fields":"nicht überrascht"}';
(JSON module version: 2.53)
my $json_text = to_json($data);
is short for
my $json_text = JSON->new->encode($data);
This returns a string of Unicode Code Points. U+00FC is indeed the correct Unicode code point for "ü", so the output is correct. (As proof, the HTML source for that is actually "ü".)
It's hard to tell what your original output actually contained (since you showed non-ASCII characters), so it's hard to determine what your problem is actually.
But one thing you must do before outputing the string is to convert it from a string of code points into bytes, say, by using Encode's encode or encode_utf8.
my $json_cp1252 = encode('cp1252', to_json($data));
my $json_utf8 = encode_utf8(to_json($data));
If the appropriate encoding is UTF-8, you can also use any of the following:
my $json_utf8 = to_json($data, { utf8 => 1 });
my $json_utf8 = encode_json($data);
my $json_utf8 = JSON->new->utf8->encode($data);
Use encode_json instead. According to the manual it converts the given Perl data structure to a UTF-8 encoded, binary string.
Regarding your update: If you actually want to produce JSON in Latin1 (ISO-8859-1), you can try:
to_json($test, { latin1 => 1 })
Or
JSON->new->latin1->encode($test)
Note that if you dump the result, getting \x{fc} for ü is correct in this case. I guess that the root of your problem is that you receive text in Perl's UTF-8 format from somewhere. In this case, the latin1 option of the JSON module is needed.
You can also try to use ascii instead of latin1 as the safest option.
Another solution might be to specify an output encoding for Template-Toolkit. I don't know if that's possible. Or, you could encode your result as Latin1 in the final step before sending it to the client.
Strictly-speaking, Latin-1-encoded JSON is not valid JSON. The JSON spec allows UTF-8, UTF-16 or UTF-32 encodings.
If you want to be standards-compliant or you want to ensure your JSON will be compatible with both your current pages and future UTF-8-based pages, you need to use JSON->new->utf8->encode($str). Being strict about generated valid JSON could save you lots of headaches in the future.
You can translate UTF-8 JSON to Latin-1 using client-side Javascript if you need to, using this trick.
The ascii option also produces valid JSON, by escaping any non-ASCII characters using valid JSON unicode escapes. But the latin1 option does not, and therefore should be avoided IMHO. The utf8(0) option should be avoided too unless you specify an encoding when writing the data out to clients: utf8(0) is subtly different from the utf8 option in that it generates Perl character strings instead of byte strings. If you do any I/O using character strings without specifying an encoding, Perl will translate it on-the-fly back to Latin-1. The utf8 option generates raw UTF-8 bytes, which are perfect for doing raw I/O.

Json parsing with unicode characters

i have a json file with unicode characters, and i'm having trouble to parse it. I've tried in Flash CS5, the JSON library, and i have tried it in http://json.parser.online.fr/ and i always get "unexpected token - eval fails"
I'm sorry, there realy was a problem with the syntax, it came this way from the client.
Can someone please help me? Thanks
Quoth the RFC:
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
So a correctly encoded Unicode character should not be a problem. Which leads me to believe that it's not correctly encoded (maybe it uses latin-1 instead of UTF-8). How did you create the file? In a text editor?
There might be an obscure Unicode whitespace character hidden in your string.
This URL contains more detail:
http://timelessrepo.com/json-isnt-a-javascript-subset
In asp.net you would think you would use System.Text.Encoding to convert a string like "Paul\u0027s" back to a string like "Paul's" but i tried for hours and found nothing that worked.
The trouble is hardcoding a string as shown above already decodes the string as you will see if you put a break point on it so in the end i wrote a function that converts the Hex27 to Dec39 so that i ended up with HTML encodeing and then decoded that.
string Padding = "000";
for (int f = 1; f <= 256; f++)
{
string Hex = "\\u" + Padding.Substring(0, 4 - f.ToString().Length) + f;
string Dec = "&#" + Int32.Parse(f.ToString(), NumberStyles.HexNumber) + ";";
HTML = HTML.Replace(Hex, Dec);
}
HTML = System.Web.HttpUtility.HtmlDecode(HTML);
Ugly as sin, I know but without using the latest framework (Not on ISP's server) it was the best I could do and someone must know a better solution.
I had the same problem and I just change the file encoding type Mac-Roman/windows-1252 to UTF-8.. and it worked
I had the same problem with Twitter json files. I was parsing them in Python with json.loads(tweet) but it failed for half of the records.
I changed to Python3 and it works well now.
If you seem to have trouble with the encoding of a JSON file (i.e. escaped codes such as \u00fc aren't displayed correctly regardless of your editor's encoding setting) generated by Python with json.dump s(): it encodes ASCII by default and escapes the unicode characters! See python json unicode - how do I eval using javascript (and python: json.dumps can't handle utf-8? and Why does json.dumps escape non-ascii characters with "\uxxxx").