How To Convert JSON String That Contains Encoded Unicode - json

Could anyone tell me how to convert the following json object string, which contains encoded unicode characters (Chinese in this case) to human readable one using c# in asp.net?
records:[{"description":"\u849c\u8089","id":282}]
The string is submitted via Ajax from an Ext JS web application.
Any help is much appreciated.

There is no need to convert this string in any special manner. Any JSON decoder that more or less sticks to the specification will automatically create a correct string for the description attribute.
Update:
However, your current sample is not valid JSON. It's missing brackets or braces around the complete sample and it's missing double qutoes around records.
A correct JSON snippet would be:
{"records":[{"description":"\u849c\u8089","id":282}]}
Giving:
records:
[]
description: 蒜肉
id: 282

I am guessing it should be done as follows:
var bytes = Encoding.Unicode.GetBytes("<unicode string>");
// Return the Base64-encoded string.
string str = Convert.ToBase64String(b);

Related

Processing JSON from a .txt file and converting to a DataFrame in Julia

Cross posting from Julia Discourse in case anyone here has any leads.
I’m just looking for some insight into why the below code is returning a dataframe containing just the first line of my json file. If you’d like to try working with the file I’m working with, you can download the aminer_papers_0.zip from the Microsoft Open Academic Graph site, I’m using the first file in that group of files.
using JSON3, DataFrames, CSV
file_name = "path/aminer_papers_0.txt"
json_string = read(file_name, String)
js = JSON3.read(json_string)
df = DataFrame([js])
The resulting DataFrame has just one line, but the column titles are correct, as is the first line. To me the mystery is why the rest isn’t getting processed. I think I can rule out that read() is only reading the first JSON object, because I can index into the resulting object and see many JSON objects:
enter image description here
My first guess was maybe the newline \n was causing escape issues, and tried to use chomp to get rid of them, but couldn’t get it to work.
Anyway - any help would be greatly appreciated!
I think the problem is that the file is in JSON Lines format, and the JSON3 library only returns the first valid JSON value that it finds at the start of a string unless told otherwise.
tl;dr
Call JSON3.read with the keyword argument jsonlines=true.
Why?
By default, JSON3 interprets a string passed to its read function as a single "JSON text", defined by RFC 8259 section 1.3.2:
A JSON text is a serialized value....
(My emphasis on the use of the indefinite singular article "a.") A "JSON value" is defined in section 1.3.3:
A JSON value MUST be an object, array, number, or string, or one of the following three literal names: false, null, true.
A string with multiple JSON values in it is technically multiple "JSON texts." It is up to the parser to determine what part of the string argument you give it is a JSON text, and the authors of JSON3 chose as the default behavior to parse from the start of the string to the end of the first valid JSON value.
In order to get JSON3 to read the string as multiple JSON values, you have to give it the keyword option jsonlines=true, which is documented as:
jsonlines: A Bool indicating that the json_str contains newline delimited JSON strings, which will be read into a JSON3.Array of the JSON values. See jsonlines for reference. [default false]
Example
Take for example this simple string:
two_values = "3.14\n2.72"
Each one of these lines is a valid JSON serialization of a number. However, when passed to JSON3.read, only the first is parsed:
using JSON3
#assert JSON3.read(two_values) == 3.14
Using jsonlines=true, both values are parsed and returned as a JSON3.Array struct:
#assert JSON3.read(two_values, jsonlines=true) == [3.14, 2.72]
Other Packages
The JSON.jl library, which people might use by default given the name, does not implement parsing of JSON Lines strings at all, leaving it up to the caller to properly split the string as needed:
using JSON
JSON.parse(two_values)
# ERROR: Expected end of input
# Line: 1
# Around: ...3.14 2.72...
# ^
A simple way to implement reading multiple values is to use eachline:
#assert [JSON.parse(line) for line in eachline(IOBuffer(two_values))] == [3.14, 2.72]

Swift Json Single Quote Parse

I have a problem while parsing json which includes single quote. I am using JSONDecoder. I added response from API at below and I don't want to do any replacement or some regex operations. Are there any workaround for that?
"{\'value1\': true, \'value2\':\'2021-02-08\'}"
Your string is simply not valid JSON. Since it's not valid JSON there's no way to configure JSONDecoder to decode it.
If you aren't in charge of the service that outputs that string, the only thing you can do to use it with JSONDecoder is to modify the string to become valid JSON by, as you suggested, doing text replacement.

Packing an emoji as plain text unicode string php

I have a website and Unity project that communicate with one another through a web server using web sockets. I am encoding/decoding the messages I am sending using json. On the Unity side, I am using Newtonsoft for json and websocketsharp for WebSockets. Messages send fine and everything is working, but now I am trying to implement emojis in Unity to display correctly. I was able to create a sprite sheet of all emojis, create a dictionary with the key's being their Unicode and values being their position in the sprite sheet. The issue is that when I receive an emoji (for example the 🤐emoji Unicode: U+1F910), Unity receives it as "\uD83E\uDD10". Is there a way to send the emoji as a string literal of its Unicode? If not is there a way to parse the c# interpreted Unicode back to the original Unicode? I have found regex which converts more common symbols from the above format back to the corresponding symbol but does not give me back the Unicode as a string. Here is what I am currently using to do that:
var result = Regex.Replace(
arrivedMessages[0],
#"\\[Uu]([0-9A-Fa-f]{4})",
m => char.ToString(
(char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)));
With the above code, if the user were to send a symbol such as º, the decoded json will read \u00ba, but the above regex will convert it back to º. When I try to send an emoji, such as the 🤐symbol, the json will read "\ud83e\udd10" and the regex result will be blank. Is there an issue with the regex? Or is there a better way to go about doing this? Thanks!
Edit:
To simplify the overall question: Is there a way to convert "\uD83E\uDD10" back to a string literal of the Unicode "U+1F910"
Here is the function I ended up using to convert the surrogate pairs as #Mr Lister pointed out:
string returnValue = "";
for (var i = 0; i < SurrogatePairString.Length; i += char.IsSurrogatePair(SurrogatePairString, i) ? 2 : 1)
{
var codepoint = char.ConvertToUtf32(SurrogatePairString, i);
// keep it uppercase for the regex, then when it is found, .ToLower()
returnValue = String.Format("U+{0:X4}", codepoint);
}

Raw string field value in JSON file

In my JSON file, one of the fields has to carry the content of another file (a string).
The string has CRLFs, single/double quotes, tabs.
Is there a way to consider my whole string as a raw string so I don't have to escape special characters?
Is there an equivalent in JSON to the string raw delimiter in C++?
In C++, I would just put the whole file content inside : R"( ... )"
Put simply, no there isn't. Depending on what parser you use, it may have a feature that allows this and/or there may be a variant of JSON that allows this (examples of variants include JSONP and JSON-C, though I'm not aware of one specifically that allows for the features you are looking for), but the JSON standard ubiquitous on the web does not support multiline strings or unescaped special characters.
A workaround for the lack of raw string support in JSON is to Base64 encode your string before adding it to your JSON.

Does HTML need to be encoded when passed back in JSON?

When passing HTML back through a response in JSON format, does it need to get encoded?
Yes. You would pass the HTML code in a string, so any quotation marks and backslashes in the code would have to be encoded.
Example:
<div onclick="alert('Line 1\nLine2');">show</div>
would be encoded into a string like this:
"<div onclick=\"alert('Line 1\\nLine2');\">show</div>"
and for example put in a JSON object representation like this:
{"html":"<div onclick=\"alert('Line 1\\nLine2');\">show</div>"}
Simple answer "no" JSON does not need to be encoded when passed back in JSON. JSON object should be DIRECTLY parsable by a javascript engine. Check the following:
JSON Standard
JSON Lint