DataContractJsonSerializer not deserializing html entities - json

I am receiving data from a web service, and some of the strings have HTML entities in them, for example:
{"prop": "htmlentity - &eacute;"}
The &eacute; is not being parsed to é.
My question is twofold:
Is this even supposed to happen?
I looked through the JSON spec the best I could, but couldn't find any reference to html entities.
What is the right way to do this with a DataContractJsonSerializer, if there is one?

You can call HttpUtility.HtmlDecode on the strings that contain HTML entities.
This is not the job of DataContractJsonSerializer, as the JSON spec only requires quotation mark, reverse solidus, and the control characters to be escaped.

This isn't a JSON serialization issue; the entity is most likely already present in the data the web service sends.
Serialization does not automatically encode HTML entities.
See:
var orig = new MyObj {prop = "htmlentity - é"};
var ser = new DataContractJsonSerializer(typeof(MyObj));
var ms = new MemoryStream();
ser.WriteObject(ms, orig);
var serialized = Encoding.UTF8.GetString(ms.GetBuffer(), 0, (int)ms.Length);
MessageBox.Show(serialized); // {"prop":"htmlentity - é"}
If you have control of the web service then you can verify this on the server side. If not, check with the provider of the web service.

Related

How to display Blob /Json data field in Liferay 7.2?

I am using Service Builder, and it retrieves the form data from the MySQL database without problems. One field holds JSON data, and I tried to map it with com.fasterxml.jackson.databind.ObjectMapper to display the JSON content. However, the Blob data is printed as: com.mysql.cj.jdbc.Blob@4ca74f7f
How do I actually extract the data from the Blob reference above? Here is my code snippet:
for (ddmcontent MyTemp : myList) {
System.out.println("Content ID : "+myList.getContentId());
System.out.println("User Blob Data : "+myList.getData());
Blob responseBody =myList.getData();
ObjectMapper objectMapper = new ObjectMapper();
List<ModelData> myDat = objectMapper.readValue((DataInput)
responseBody,objectMapper.getTypeFactory().constructCollectionType
(List.class,ModelData.class));
for (ModelData dt : myDat) {
System.out.println("User Name : "+dt.Name);
System.out.println("Users Email : "+dt.Email);
}
}
Please note, I have defined my ModelData elements as all String.
Any suggestion? What am I missing?
Thanks in advance!
The toString() representation hints at the object's type com.mysql.cj.jdbc.Blob. If you look up its javadoc (or the interface it implements) you'll see the options that you have to decode the contents of the Blob, namely getting an InputStream or a byte[] representation, which you'd have to subject to the correct character set decoding to turn it into a String.
Make sure you nail the character set by testing it with all kinds of Unicode content, so that you don't have to fix badly encoded database content later when your table contains a lot of data in unknown encodings.
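As a minimal sketch of the byte[] route described above: the class and method names below are hypothetical, UTF-8 is an assumption about how the column was written, and SerialBlob is only an in-memory stand-in for the Blob that the MySQL driver would return.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

final class BlobText {

    // Reads the whole Blob and decodes its bytes with the given character set.
    static String blobToString(Blob blob, Charset charset) throws SQLException {
        // Blob positions are 1-based; this sketch assumes a non-empty Blob
        // whose content fits into a single byte array.
        byte[] bytes = blob.getBytes(1, (int) blob.length());
        return new String(bytes, charset);
    }

    // Stand-in for a database read: wraps UTF-8 bytes in an in-memory Blob,
    // then decodes it again. UTF-8 is an assumption, not a given.
    static String roundTrip(String text) {
        try {
            Blob blob = new SerialBlob(text.getBytes(StandardCharsets.UTF_8));
            return blobToString(blob, StandardCharsets.UTF_8);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String json = "[{\"Name\":\"Zo\u00eb\",\"Email\":\"zoe@example.com\"}]";
        System.out.println(roundTrip(json));
    }
}
```

The decoded String, rather than the Blob itself, is what would then be handed to ObjectMapper.readValue.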
As you're using Liferay's Service Builder, you might want to share the relevant parts of your service.xml (optional model-hints.xml) to check for an easier implementation.
I finally got this working by changing the field in question to String and adding a max length of a certain number of characters.
Thanks!

Packing an emoji as plain text unicode string php

I have a website and Unity project that communicate with one another through a web server using web sockets. I am encoding/decoding the messages I am sending using JSON. On the Unity side, I am using Newtonsoft for JSON and websocketsharp for WebSockets. Messages send fine and everything is working, but now I am trying to get emojis to display correctly in Unity. I was able to create a sprite sheet of all emojis and a dictionary whose keys are their Unicode code points and whose values are their positions in the sprite sheet. The issue is that when I receive an emoji (for example the 🤐 emoji, Unicode U+1F910), Unity receives it as "\uD83E\uDD10". Is there a way to send the emoji as a string literal of its Unicode? If not, is there a way to parse the C#-interpreted Unicode back to the original Unicode? I have found a regex which converts more common symbols from the above format back to the corresponding symbol, but it does not give me back the Unicode as a string. Here is what I am currently using to do that:
var result = Regex.Replace(
    arrivedMessages[0],
    @"\\[Uu]([0-9A-Fa-f]{4})",
    m => char.ToString(
        (char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)));
With the above code, if the user were to send a symbol such as º, the decoded json will read \u00ba, but the above regex will convert it back to º. When I try to send an emoji, such as the 🤐symbol, the json will read "\ud83e\udd10" and the regex result will be blank. Is there an issue with the regex? Or is there a better way to go about doing this? Thanks!
Edit:
To simplify the overall question: Is there a way to convert "\uD83E\uDD10" back to a string literal of the Unicode "U+1F910"
Here is the function I ended up using to convert the surrogate pairs as @Mr Lister pointed out:
string returnValue = "";
// Advance by 2 over a surrogate pair, by 1 otherwise.
for (var i = 0; i < SurrogatePairString.Length; i += char.IsSurrogatePair(SurrogatePairString, i) ? 2 : 1)
{
    var codepoint = char.ConvertToUtf32(SurrogatePairString, i);
    // keep it uppercase for the regex, then when it is found, .ToLower()
    // Note: each iteration overwrites returnValue, so only the last code point is kept.
    returnValue = String.Format("U+{0:X4}", codepoint);
}
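For comparison, the same idea can be sketched in Java, where a string is also a sequence of UTF-16 code units. The class and method names are hypothetical; unlike the function above, this version returns one label per code point instead of keeping only the last one.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointLabels {

    // Returns one "U+XXXX" label per Unicode code point in the input.
    static List<String> toLabels(String text) {
        List<String> labels = new ArrayList<>();
        // codePoints() joins each valid surrogate pair into a single code point.
        text.codePoints().forEach(cp -> labels.add(String.format("U+%04X", cp)));
        return labels;
    }

    public static void main(String[] args) {
        // The zipper-mouth face emoji, written as its two UTF-16 code units.
        System.out.println(toLabels("\uD83E\uDD10"));
    }
}
```

Each label can then be used directly as a key into the sprite-sheet dictionary.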

is string invalid json object?

I have a question, and I can't find any doc about it.
Is a bare string an invalid root value for JSON?
For an example, you can do this in any browser:
JS:
console.log(JSON.parse(JSON.stringify("asdf")));
Java (jackson):
ObjectMapper mapper = new ObjectMapper();
String string = mapper.writeValueAsString("asdf");
TextNode node = (TextNode)mapper.readTree(string);
System.out.println(node.getTextValue());
PHP:
echo json_decode(json_encode("asdf"));
But, as far as I can see, these parsers do not accept a string as the root object:
http://json.parser.online.fr
http://jsonparseronline.com
Also, from the Swift documentation:
The top level object is an NSArray or NSDictionary.
Given this, is it invalid to return a JSON-formatted string from your controller (endpoint)?
example.com/notes/2/title
According to https://jsonlint.com, "asdf" is valid JSON. Some parsers are stricter than others. You definitely can't use it as the root for any other data though, because it's just a string, not an object or array.
Having said that, if you want an absolute definition, try reading the relevant RFC rather than the documentation of a particular programming language. https://www.rfc-editor.org/rfc/rfc8259, dated December 2017, is the latest (at the time of writing this answer), as far as I know.
Specifically https://www.rfc-editor.org/rfc/rfc8259#section-2 says
A JSON text is a sequence of tokens. The set of tokens includes six
structural characters, strings, numbers, and three literal names.
A JSON text is a serialized value. Note that certain previous
specifications of JSON constrained a JSON text to be an object or an
array.
And later
Here are three small JSON texts containing only values:
"Hello world!"
42
true
So I would assume that the different parsers mentioned are implementing different versions of the spec.

Weather Underground API is returning JSON with formatting characters (lots of \n\t and such). Is there any way to get unformatted JSON from them?

This is the response I get from Weather Underground:
"\n{\n \"response\": {\n \"version\":\"0.1\",\n \"termsofService\":\"http://www.wunderground.com/weather/api/d/terms.html\",\n \"features\": {\n \"geolookup\": 1\n }\n\t}\n\t\t,\t\"location\": {\n\t\t\"type\":\"INTLCITY\",\n\t\t\"country\":\"EG\",\n\t\t\"country_iso3166\":\"EG\",\n\t\t\"country_name\":\"Egypt\",\n\t\t\"state\":\"\",\n\t\t\"city\":\"Wadi El Natroon\",\n\t\t\"tz_short\":\"EET\",\n\t\t\"tz_long\":\"Africa/Cairo\",\n\t\t\"lat\":\"30.000000\",\n\t\t\"lon\":\"30.000000\",\n\t\t\"zip\":\"00000\",\n\t\t\"magic\":\"1\",\n\t\t\"wmo\":\"62357\",\n\t\t\"l\":\"/q/zmw:00000.1.62357\",\n\t\t\"requesturl\":\"global/stations/62357.html\",\n\t\t\"wuiurl\":\"http://www.wunderground.com/global/stations/62357.html\",\n\t\t\"nearby_weather_stations\": {\n\t\t\"airport\": {\n\t\t\"station\": [\n\t\t{ \"city\":\"Wadi El Natroon\", \"state\":\"\", \"country\":\"Egypt\", \"icao\":\"\", \"lat\":\"30.40250015\", \"lon\":\"30.36333275\" }\n\t\t,{ \"city\":\"Alexandria Borg El Arab\", \"state\":\"\", \"country\":\"EG\", \"icao\":\"HEBA\", \"lat\":\"30.91769981\", \"lon\":\"29.69639969\" }\n\t\t,{ \"city\":\"Alexandria\", \"state\":\"\", \"country\":\"EG\", \"icao\":\"HEAX\", \"lat\":\"31.18166733\", \"lon\":\"29.94638824\" }\n\t\t]\n\t\t}\n\t\t,\n\t\t\"pws\": {\n\t\t\"station\": [\n\t\t]\n\t\t}\n\t\t}\n\t}\n}\n"
As you can see there are a bunch of characters that aren't supposed to be there. Is there a different query to get unformatted JSON or do I have to parse all this garbage out before handing it off to a JSON parser? Am I in some sort of debug mode or something?
I think you are using a RESTful web service and encoding the data as JSON yourself. If you are using REST, don't encode it again; the response is already JSON by default.
The newline (\n) and tab (\t) characters are probably listed by your debugger while the actual response contains formatted data (so the newlines display as an actual newline). This would not pose any problem to a JSON parser, just feed the data to it.
Oops.. It was a Gson issue (or an issue with my use of Gson), not a Weather Underground issue. Need to use:
val jsonObj = JsonParser().parse(it).asJsonObject
Instead of:
val jsonObj = gson.toJsonTree(it)
In both snippets, "it" is the JSON string. The code is in Kotlin.
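The escaped output in the question is what you get when text that is already JSON is serialized a second time as a plain string value, which is presumably what gson.toJsonTree did with the String it was given. The sketch below is not Gson's code: quote is a hypothetical, minimal JSON string encoder, written only to show where the \n, \t and \" sequences come from.

```java
final class DoubleEncoding {

    // Minimal JSON string encoder: escapes the quotation mark, the reverse
    // solidus and control characters, then wraps the result in quotes.
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\t': out.append("\\t"); break;
                case '\r': out.append("\\r"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the four-digit hex form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // A response body that is already formatted JSON text.
        String body = "\n{\n\t\"version\":\"0.1\"\n}\n";
        // Encoding it again turns every newline, tab and quote into an escape.
        System.out.println(quote(body));
    }
}
```

Parsing the body, as JsonParser does in the fix above, avoids the second encoding entirely.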

How to use JSON Sanitizer at Server Side?

I want to implement the 'JSON Sanitizer' validation as mentioned by OWASP.
My understanding is that this needs to be done in two places:
JSON data (in Request) received from Client or Other Systems - This needs to be sanitized at Server side before being processed
JSON data (in Response) to be sent to Client - This needs to be sanitized at Server side before being sent to client
Is it sufficient that I just call a sanitizing method in the JSON Sanitizer library on that JSON data?
Will that perform all sanitization, or are there any other validations to be done in this regard?
The OWASP JSON Sanitizer converts JSON-like input to syntactically valid & embeddable JSON.
It is typically used to take “JSON” produced by ad-hoc methods on the server like
"{ \"output\": " + stringOfJson + " }"
and make sure it's syntactically valid so that it can be passed to JSON.parse on the client, and embeddable so that it can be embedded in a larger HTML or XML response like
<script>var jsonUsedByScriptsOnPage = {$myJson};</script>
You can definitely use it on your server if your clients are likely to send dodgy JSON.
Note that your server still needs to treat the JSON as untrusted just as it would any other string it receives in a response that does not arrive with valid credentials.
https://github.com/OWASP/json-sanitizer#security explains
sanitizing JSON cannot protect an application from Confused Deputy attacks
var myValue = JSON.parse(sanitizedJsonString);
addToAdminstratorsGroup(myValue.propertyFromUntrustedSource);
The OWASP JSON Sanitizer doesn't cope with escaped quotes: it splits the string into several fields instead. So I've written my own sanitize method. It is quite primitive, so if you see any security caveats, I'm open to suggestions; please share.
/**
 * Helper methods to validate data.
 */
@UtilityClass
public class ValidationUtils {
    /**
     * Removes disallowed symbols from string to prevent input injection.
     * @param input User input with possible injection.
     * @return Value without injection-sensible symbols.
     */
    public String sanateInjection(String input){
        return input.replaceAll("[^A-Za-z0-9 ]", "");
    }
}
I want to know whether some JSON string contains <script> tags which can later be used to execute dynamic content. But since the return value of the sanitize() method would escape them, there is no way to detect from the output alone whether something like that was in there. So the following works for me:
public static String checkJsonForScripts(String input) {
    // If sanitizing changes the input, the input contained something the sanitizer had to rewrite.
    if (!JsonSanitizer.sanitize(input).equals(input)) {
        log.error("Problematic string found: " + input);
        throw new YourException(...);
    }
    return input;
}