Packing an emoji as plain text unicode string php - json

I have a website and a Unity project that communicate with one another through a web server using WebSockets. I encode and decode the messages as JSON. On the Unity side, I am using Newtonsoft for JSON and websocketsharp for WebSockets. Messages send fine and everything is working, but now I am trying to get emojis to display correctly in Unity. I created a sprite sheet of all emojis and a dictionary whose keys are their Unicode code points and whose values are their positions in the sprite sheet. The issue is that when I receive an emoji (for example the 🤐 emoji, Unicode: U+1F910), Unity receives it as "\uD83E\uDD10". Is there a way to send the emoji as a string literal of its Unicode code point? If not, is there a way to parse the Unicode as C# interprets it back to the original code point? I have found a regex that converts more common symbols from the above format back to the corresponding symbol, but it does not give me back the Unicode code point as a string. Here is what I am currently using to do that:
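using System.Globalization;
using System.Text.RegularExpressions;

// Replace each \uXXXX escape with the single UTF-16 code unit it names.
var result = Regex.Replace(
    arrivedMessages[0],
    @"\\[Uu]([0-9A-Fa-f]{4})",
    m => char.ToString(
        (char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)));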
With the above code, if the user sends a symbol such as º, the JSON will read \u00ba, and the regex converts it back to º. When I try to send an emoji, such as the 🤐 symbol, the JSON will read "\ud83e\udd10" and the regex result will be blank. Is there an issue with the regex? Or is there a better way to go about doing this? Thanks!
Edit:
To simplify the overall question: Is there a way to convert "\uD83E\uDD10" back to a string literal of the Unicode code point, "U+1F910"?

Here is the function I ended up using to convert the surrogate pairs, as @Mr Lister pointed out:
string returnValue = "";
for (var i = 0; i < SurrogatePairString.Length; i += char.IsSurrogatePair(SurrogatePairString, i) ? 2 : 1)
{
    var codepoint = char.ConvertToUtf32(SurrogatePairString, i);
    // keep it uppercase for the regex, then when it is found, .ToLower()
    // Append (rather than overwrite) so every code point in the string is kept.
    returnValue += String.Format("U+{0:X4}", codepoint);
}
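The arithmetic behind the surrogate-pair conversion above can be sketched in a few lines. This is an illustration in Python rather than the C# used in the question, and the helper names (surrogate_pair_to_code_point, to_u_plus) are made up for this example. It assumes the input is well-formed UTF-16: a high surrogate in D800..DBFF followed by a low surrogate in DC00..DFFF.

```python
import json


def surrogate_pair_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def to_u_plus(text: str) -> list:
    """Format every code point of an already-decoded string as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]


# A JSON decoder joins the two escapes into a single character on its own.
decoded = json.loads('"\\ud83e\\udd10"')
print(to_u_plus(decoded))
print(hex(surrogate_pair_to_code_point(0xD83E, 0xDD10)))
```

The regex in the question fails for emojis because it converts each \uXXXX escape to a separate char; the two halves only mean something when they are combined as above.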

Related

Parse JSON in to Strings with escape characters for GWT Test Case

I've come up with a doubt around JSON files.
So, we're building a test case for a GWT application. The data it feeds from is in JSON files generated from a SQL database.
When testing the methods that work with data, we do it from sources held in String files, so to keep integrity with the original data, we just clone the original JSON values in to a String with escape sequences.
The result of this being that if a JSON entry shows like this:
{"country":"India","study_no":87}
The parsed result will come up like this in order for our tools to recognise them:
"[" + "{\"country\":\"India\",\"study_no\":87}" + "]"
The way we do it now is to take each JSON object and put it between "" in IntelliJ, which automatically converts all double quotes into escape sequences. This is OK if we only want a few objects, but what if we wanted a whole dataset?
So my question is: does anyone know of, or has anyone created, an open-source script to automate this tedious task?
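One thing you could do is to wrap window.escape() using JsInterop or JSNI. For example:
import jsinterop.annotations.JsPackage;
import jsinterop.annotations.JsType;

@JsType(isNative = true, namespace = JsPackage.GLOBAL, name = "window")
public class window {
    public static native String escape(String toBeEscaped);
}
and then apply it to your results.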
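As a separate option for the automation asked about in the question, a small offline script can produce the escaped literal. The sketch below is in Python and is only an illustration: the function name to_java_string_literal is hypothetical, and it assumes that the escapes produced by a JSON string encoder (\" and \\ here) are also valid inside a Java string literal.

```python
import json


def to_java_string_literal(json_text: str) -> str:
    """Return json_text as a double-quoted, escaped string literal."""
    # Parse first so invalid input fails early, then re-serialize compactly.
    compact = json.dumps(json.loads(json_text), separators=(",", ":"))
    # Encoding the JSON text as a JSON *string* adds the quotes and escapes.
    return json.dumps(compact)


print(to_java_string_literal('{"country": "India", "study_no": 87}'))
```

Running this over every object in a dataset and joining the results with commas between "[" and "]" would give the same shape as the hand-escaped example in the question.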

Weather Underground API is returning JSON with formatting characters (lots of \n\t and such). Is there any way to get unformatted JSON from them?

This is the response I get from Weather Underground:
"\n{\n \"response\": {\n \"version\":\"0.1\",\n \"termsofService\":\"http://www.wunderground.com/weather/api/d/terms.html\",\n \"features\": {\n \"geolookup\": 1\n }\n\t}\n\t\t,\t\"location\": {\n\t\t\"type\":\"INTLCITY\",\n\t\t\"country\":\"EG\",\n\t\t\"country_iso3166\":\"EG\",\n\t\t\"country_name\":\"Egypt\",\n\t\t\"state\":\"\",\n\t\t\"city\":\"Wadi El Natroon\",\n\t\t\"tz_short\":\"EET\",\n\t\t\"tz_long\":\"Africa/Cairo\",\n\t\t\"lat\":\"30.000000\",\n\t\t\"lon\":\"30.000000\",\n\t\t\"zip\":\"00000\",\n\t\t\"magic\":\"1\",\n\t\t\"wmo\":\"62357\",\n\t\t\"l\":\"/q/zmw:00000.1.62357\",\n\t\t\"requesturl\":\"global/stations/62357.html\",\n\t\t\"wuiurl\":\"http://www.wunderground.com/global/stations/62357.html\",\n\t\t\"nearby_weather_stations\": {\n\t\t\"airport\": {\n\t\t\"station\": [\n\t\t{ \"city\":\"Wadi El Natroon\", \"state\":\"\", \"country\":\"Egypt\", \"icao\":\"\", \"lat\":\"30.40250015\", \"lon\":\"30.36333275\" }\n\t\t,{ \"city\":\"Alexandria Borg El Arab\", \"state\":\"\", \"country\":\"EG\", \"icao\":\"HEBA\", \"lat\":\"30.91769981\", \"lon\":\"29.69639969\" }\n\t\t,{ \"city\":\"Alexandria\", \"state\":\"\", \"country\":\"EG\", \"icao\":\"HEAX\", \"lat\":\"31.18166733\", \"lon\":\"29.94638824\" }\n\t\t]\n\t\t}\n\t\t,\n\t\t\"pws\": {\n\t\t\"station\": [\n\t\t]\n\t\t}\n\t\t}\n\t}\n}\n"
As you can see there are a bunch of characters that aren't supposed to be there. Is there a different query to get unformatted JSON or do I have to parse all this garbage out before handing it off to a JSON parser? Am I in some sort of debug mode or something?
I think you are calling a RESTful web service and then encoding the result as JSON yourself. If you are using REST, don't encode it again: the response is already JSON by default.
The newline (\n) and tab (\t) characters are probably listed by your debugger while the actual response contains formatted data (so the newlines display as an actual newline). This would not pose any problem to a JSON parser, just feed the data to it.
Oops.. It was a Gson issue (or an issue with my use of Gson), not a Weather Underground issue. Need to use:
val jsonObj = JsonParser().parse(it).asJsonObject
Instead of:
val jsonObj = gson.toJsonTree(it)
Here, it is the JSON string. The code is in Kotlin.
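The difference between the two calls is the difference between parsing JSON text and serializing that text as a string value. The sketch below reproduces the effect with Python's json module as a stand-in; it is not Gson's API, and the sample payload is a shortened, made-up version of the response above.

```python
import json

# Formatted JSON text as it arrives over the wire (real newlines and tabs).
raw = '\n{\n\t"response": {"version": "0.1"}\n}\n'

# Serializing the text again treats it as one big string value, so every
# newline, tab and quote gets escaped. This is the "garbage" in the question.
double_encoded = json.dumps(raw)

# Parsing the text gives the actual object; the whitespace is ignored.
parsed = json.loads(raw)

print(double_encoded)
print(parsed["response"]["version"])
```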

JSON feed in UTF-8 without byte order marker

I have a WCF application written in C# that delivers my data in JSON or XML, depending on what the user asks for in the query string. Here is a quick snippet of my code that delivers the data:
Encoding utf8 = new System.Text.UTF8Encoding(false);
return WebOperationContext.Current.CreateTextResponse(data, "application/json", utf8);
When I deliver the data using the above method, the special characters are all messed up, so Chávez looks like ChÃ¡vez. On the other hand, if I create the utf8 variable above with the BOM, or use the static property (Encoding.UTF8), the special characters work fine. But then some of my consumers complain that their code throws an exception when consuming my API. This of course happens because of the BOM in the feed. Is there a way for me to correctly display the special characters without the BOM in the feed?
It looks like the output is correct, but whatever you are using to display it expects ANSI-encoded text. ChÃ¡vez is what you get when you encode Chávez in UTF-8 and interpret the result as if it were Latin-1.
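The effect described in this answer is easy to reproduce. The following is a small Python sketch (not the WCF code from the question) that encodes the name as UTF-8 without a BOM and then decodes the same bytes two ways.

```python
name = "Ch\u00e1vez"  # "Chávez", written with an escape to avoid source-encoding issues

utf8_bytes = name.encode("utf-8")        # no BOM; á becomes the two bytes C3 A1
misread = utf8_bytes.decode("latin-1")   # viewer wrongly assumes Latin-1
correct = utf8_bytes.decode("utf-8")     # viewer correctly assumes UTF-8

# With a BOM, the payload starts with the three bytes EF BB BF.
has_bom = name.encode("utf-8-sig").startswith(b"\xef\xbb\xbf")

print(misread)
print(correct)
```

The bytes are identical in both cases; only the reader's assumption about the encoding differs, which is why the fix belongs on the displaying side rather than in the feed.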

How To Convert JSON String That Contains Encoded Unicode

Could anyone tell me how to convert the following JSON object string, which contains encoded Unicode characters (Chinese in this case), to a human-readable one using C# in ASP.NET?
records:[{"description":"\u849c\u8089","id":282}]
The string is submitted via Ajax from an Ext JS web application.
Any help is much appreciated.
There is no need to convert this string in any special manner. Any JSON decoder that more or less sticks to the specification will automatically create a correct string for the description attribute.
Update:
However, your current sample is not valid JSON. It's missing braces around the complete sample, and it's missing double quotes around records.
A correct JSON snippet would be:
{"records":[{"description":"\u849c\u8089","id":282}]}
Giving:
records:
[]
description: 蒜肉
id: 282
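To illustrate the point that a conforming JSON decoder resolves the \uXXXX escapes by itself, here is a short Python sketch using the corrected snippet from this answer. In C#, a JSON library would play the same role; the Python json module is used here only as an example decoder.

```python
import json

# Raw string so the backslashes reach the JSON decoder unchanged.
payload = r'{"records":[{"description":"\u849c\u8089","id":282}]}'

data = json.loads(payload)
description = data["records"][0]["description"]

print(description)       # already a normal, human-readable string
print(len(description))  # two characters, not twelve
```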
I am guessing it should be done as follows:
var bytes = Encoding.Unicode.GetBytes("<unicode string>");
// Return the Base64-encoded string.
string str = Convert.ToBase64String(bytes);

DataContractJsonSerializer not deserializing html entities

I am receiving data from a web service, and some of the strings have html entities in them, for example:
{"prop": "htmlentity - &eacute;"}
The &eacute; is not being parsed to é.
My question is twofold:
Is this even supposed to happen?
I looked through the JSON spec the best I could, but couldn't find any reference to html entities.
What is the right way to do this with a DataContractJsonSerializer, if there is one?
You can call HttpUtility.HtmlDecode on the strings that contain HTML entities.
This is not the job of DataContractJsonSerializer, as the JSON spec only requires quotation mark, reverse solidus, and the control characters to be escaped.
This isn't a JSON serialization issue, this will be due to the data being sent over the web.
Serialization does not automatically encode HTML entities.
See:
var orig = new MyObj {prop = "htmlentity - é"};
var ser = new DataContractJsonSerializer(typeof(MyObj));
var ms = new MemoryStream();
ser.WriteObject(ms, orig);
var serialized = Encoding.UTF8.GetString(ms.GetBuffer(), 0, (int)ms.Length);
MessageBox.Show(serialized); // {"prop":"htmlentity - é"}
If you have control of the web service then you can verify this on the server side. If not, check with the provider of the web service.
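The two-step approach suggested in the answers (parse the JSON first, then decode HTML entities as a separate pass) can be sketched as follows. This is a Python illustration, with html.unescape standing in for HttpUtility.HtmlDecode; the payload assumes the service sends the entity &eacute; literally.

```python
import html
import json

payload = '{"prop": "htmlentity - &eacute;"}'

# Step 1: JSON parsing leaves the entity untouched, as the JSON spec expects.
parsed = json.loads(payload)

# Step 2: decode HTML entities explicitly on the string values that need it.
decoded = html.unescape(parsed["prop"])

print(parsed["prop"])
print(decoded)
```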