Jackson: deserializing (parsing) null Unicode

I use Jackson to deserialize (parse) a simple JSON event, with code like this:
JsonParser parser = ... // Initialized via JsonFactory for simple JSON String
ObjectMapper mapper = new ObjectMapper();
HashMap<String, Object> attributes = mapper.readValue(parser,
new TypeReference<HashMap<String, Object>>() {});
The code works as expected for the several cases I have tested it against, apart from when the JSON input contains the Unicode null value (\u0000).
More specifically, if the JSON String above has a key-value pair whose value contains the null character, e.g.
{
... (start K-V pairs),
"UniKey":"\u0000...",
... (end K-V pairs)
}
the parser correctly reads all "start K-V pairs" (which contain no null Unicode) into the attributes HashMap but stops deserialization immediately on encountering the null Unicode value of "UniKey", returning an empty value and never parsing the rest of the JSON String (i.e., the "end K-V pairs").
Is there any way of telling Jackson to ignore null Unicode in deserialization?

A string containing the null character (\u0000) is printed correctly by some Java methods but not by others, so it may only be displayed truncated. The value may therefore be there in full, just not displayed by something like System.out.println().
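To check whether the value really was parsed (rather than only displayed truncated), one option is to look at the string's length or print it with control characters escaped. The following is a minimal sketch using only the JDK; the class and method names (`NulDebug`, `escapeControlChars`) are hypothetical helpers, not part of Jackson, and the sample value merely stands in for what was read from "UniKey".

```java
// Hypothetical debugging helper: makes invisible control characters such as
// U+0000 visible when printing, so truncated console output is not mistaken
// for a truncated value.
class NulDebug {

    // Replaces every ISO control character with a backslash-u escape of its code.
    static String escapeControlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: stands in for the value deserialized from the "UniKey" entry
        String value = "\u0000abc";
        System.out.println(value.length());            // length includes the NUL character
        System.out.println(escapeControlChars(value)); // NUL shown as an escape, then abc
    }
}
```

If the length is larger than what the console shows, the value was deserialized completely and only its display is affected.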


GSON | Extract JSON's Root Name | JsonPath Or JsonPointer

I want to extract the name of the root element of a JSON document. It looks like this is possible with neither JsonPointer nor JsonPath, as my attempts to find such an expression have been unsuccessful. Any tips would be appreciated. TIA.
Sample document:
{
"MESSAGE1_ROOT_INPUT": {
"CTRL_SEG": "test"
}
}
Using Gson 2.9.0, the following expression:
$.*~
produces:
{"CTRL_SEG": "test"}
while JSONPath Online produces this:
[
"MESSAGE1_ROOT_INPUT"
]
The goal is to get the text "MESSAGE1_ROOT_INPUT" using JsonPath/JsonPointer expression(s). Note that extracting it the traditional way (substring or regex on the stringified JSON text) would be my last resort.
Background: We are building an API service that accepts JSON documents with different roots, such as MESSAGE2_ROOT_INPUT, MESSAGE3_ROOT_INPUT, etc. Further routing of each message is based on this root name.
Supported/Employed Languages: Java/GSON Library/RegEx
Gson does not natively support JSONPath or JSON Pointer. However, you can quite efficiently obtain the name of the first property using JsonReader:
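import com.google.gson.stream.JsonReader;
import java.io.IOException;
import java.io.Reader;

public static String getFirstPropertyName(Reader reader) throws IOException {
    // No need to call JsonReader.close(); that would just close the provided reader
    JsonReader jsonReader = new JsonReader(reader);
    jsonReader.beginObject();         // consume the opening '{'
    return jsonReader.nextName();     // name of the first top-level property
}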
There are however two things to keep in mind:
This only reads the beginning of the JSON document; it neither verifies that the complete JSON document has valid syntax, nor checks whether there are more top-level properties
This consumes some data from the Reader; to process the data further you have to buffer it so that it can be re-read (you can also first store the JSON in a String and pass a StringReader to JsonReader)

How to split the data of NodeObject in Apache Flink

I'm using Flink to process the data coming from some data source (such as Kafka, Pravega, etc.).
In my case, the data source is Pravega, which provides a Flink connector.
My data source is sending me some JSON data as below:
{"key": "value"}
{"key": "value2"}
{"key": "value3"}
...
...
Here is my piece of code:
PravegaDeserializationSchema<ObjectNode> adapter = new PravegaDeserializationSchema<>(ObjectNode.class, new JavaSerializer<>());
FlinkPravegaReader<ObjectNode> source = FlinkPravegaReader.<ObjectNode>builder()
.withPravegaConfig(pravegaConfig)
.forStream(stream)
.withDeserializationSchema(adapter)
.build();
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<ObjectNode> dataStream = env.addSource(source).name("Pravega Stream");
dataStream.map(new MapFunction<ObjectNode, String>() {
@Override
public String map(ObjectNode node) throws Exception {
return node.toString();
}
})
.keyBy("word") // ERROR
.timeWindow(Time.seconds(10))
.sum("count");
As you see, I used the FlinkPravegaReader and a proper deserializer to get the JSON stream coming from Pravega.
Then I try to transform the JSON data into Strings, key them with keyBy and count them.
However, I get an error:
The program finished with the following exception:
Field expression must be equal to '*' or '_' for non-composite types.
org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:342)
org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:340)
myflink.StreamingJob.main(StreamingJob.java:114)
It seems that keyBy threw this exception.
Well, I'm not a Flink expert, so I don't know why. I've read the source code of the official WordCount example. In that example, there is a custom splitter, which is used to split the String data into words.
So I'm wondering whether I need to use some kind of splitter in this case too. If so, what kind of splitter should I use? Can you show me an example? If not, why did I get this error and how can I solve it?
I guess you have read the documentation on how to specify keys:
Specify keys
The example code uses keyBy("word") because word is a field of the POJO type WC.
// some ordinary POJO (Plain old Java Object)
public class WC {
public String word;
public int count;
}
DataStream<WC> words = // [...]
DataStream<WC> wordCounts = words.keyBy("word").window(/*window specification*/);
In your case, you put a map operator before keyBy, and the output of this map operator is a String, so there is no word field to key on. If you actually want to group this stream of strings, you can write it like this: .keyBy(String::toString)
Alternatively, you can implement a custom KeySelector to generate your own key:
Customized Key Selector

How can I define a ReST endpoint that allows json input and maps it to a JsonSlurper

I want to write an API ReST endpoint, using Spring 4.0 and Groovy, such that the @RequestBody parameter can be any generic JSON input, and it will be mapped to a Groovy JsonSlurper so that I can simply access the data via the slurper.
The benefit is that I can send various JSON documents to my endpoint without having to define a DTO object for every format that I might send.
Currently my method looks like this (and works):
@RequestMapping(value = "/{id}", method = RequestMethod.PUT, consumes = MediaType.APPLICATION_JSON_VALUE)
ResponseEntity<String> putTest(@RequestBody ExampleDTO dto) {
def json = new groovy.json.JsonBuilder()
json(
id: dto.id,
name: dto.name
);
return new ResponseEntity(json.content, HttpStatus.OK);
}
What I want is to get rid of the "ExampleDTO" object and have any JSON that is passed in mapped straight into a JsonSlurper, or into something that I can feed to a JsonSlurper, so that I can access the fields of the input object like so:
def json = new JsonSlurper().parseText(input);
String exampleName = json.name;
I initially thought I could just accept a String instead of ExampleDTO and then slurp the String, but I have been running into a plethora of issues in my AngularJS client when trying to send my JSON objects as strings to the API endpoint. I'm met with an annoying need to escape all of the double quotes and surround the entire JSON string with double quotes. Then I run into issues if any of my data contains quotes or other special characters. It just doesn't seem like a clean or reliable solution.
I'm open to anything that will cleanly translate my AngularJS JSON objects into valid Strings, or anything I can do in the ReST method that will allow JSON input without mapping it to a specific object.
Thanks in advance!
Tonya

Why is GSON not parsing these fields properly? (FieldNamingPolicy.LOWER_CASE_WITH_UNDERSCORES)

I have a JSONArray of JSONObjects that I'm trying to parse with GSON, using FieldNamingPolicy.LOWER_CASE_WITH_UNDERSCORES. Most fields parse correctly (so the FieldNamingPolicy is set correctly), but I'm getting null for
{
"image_sq_48x48_url": "url1",
"image_sq_64x64_url": "url2",
"image_sq_96x96_url": "url3"
}
with field names
imageSq48x48Url
imageSq64x64Url
imageSq96x96Url
Maybe a better question would be: what is the proper camelCase? I have also tried
imageSq48X48Url
imageSq48X48url
If I map with @SerializedName("image_sq_96x96_url") it parses/populates correctly.
Unfortunately, those field names in your JSON don't conform to what Gson looks for with that strategy.
If you create a POJO and serialize it, you can see what the issue is:
class MyPojo
{
String imageSq48x48Url = "hi";
}
The resulting JSON from Gson using that strategy is:
{"image_sq48x48_url":"hi"}
Gson doesn't treat digits as the start of a new "word"; it only inserts a separator before upper-case letters.
If you rename the field to:
String imageSq_48x48Url;
It would work with your JSON example and that strategy.
Basically, you either need to create your own class that implements FieldNamingStrategy and handles those JSON field names the way you want, or keep doing what you're doing with the @SerializedName annotation.
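The heart of such a custom strategy is a name-translation rule that also starts a new word at the first digit following a letter. Below is a minimal sketch of that rule using only the JDK; the class and method names (`DigitAwareNaming`, `toJsonName`) are hypothetical. To use it with Gson, a class implementing `FieldNamingStrategy` would return `toJsonName(field.getName())` from `translateName(Field)` and be registered via `GsonBuilder.setFieldNamingStrategy(...)`; that wiring is left out so the sketch has no dependencies.

```java
// Hypothetical naming rule for a custom Gson FieldNamingStrategy:
// like LOWER_CASE_WITH_UNDERSCORES, but the first digit following a letter
// also starts a new word, so "imageSq48x48Url" maps to "image_sq_48x48_url".
class DigitAwareNaming {

    static String toJsonName(String fieldName) {
        StringBuilder out = new StringBuilder();
        boolean inNumberWord = false; // true once a digit has started the current word
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            if (Character.isUpperCase(c)) {
                // camelCase boundary: separator, then the lower-cased letter
                if (out.length() > 0) {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
                inNumberWord = false;
            } else if (Character.isDigit(c) && !inNumberWord) {
                // first digit of this word: split it off from the preceding letters
                if (out.length() > 0 && out.charAt(out.length() - 1) != '_') {
                    out.append('_');
                }
                out.append(c);
                inNumberWord = true;
            } else {
                // lower-case letters, later digits and the "x" in "48x48" stay attached
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJsonName("imageSq48x48Url")); // image_sq_48x48_url
        System.out.println(toJsonName("imageSq96x96Url")); // image_sq_96x96_url
    }
}
```

Note that this rule is a design choice for these particular names: a field like "a1b2" becomes "a_1b2", so other naming schemes may need a different rule or @SerializedName.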

JSON accepted format according to Newtonsoft Json

I am trying to parse JSON objects made up only of (string, string) pairs, in order to emulate Resjson behaviour. The file I am parsing contains this:
{
"greeting":"Hello world",
"_greeting.comment":"Hello comment.",
"_greeting.source":"Original Hello",
}
Note that the last comma is incorrect; I also used http://jsonlint.com/ to check the JSON syntax, and it reports the JSON as invalid, as I expected. My (slightly modified) code is:
string path = @"d:\resjson\example.resjson";
string jsonText = File.ReadAllText(path);
IDictionary<string, string> dict;
try
{
dict = JsonConvert.DeserializeObject<IDictionary<string, string>>(jsonText);
}
catch(Exception ex)
{
// code never reaches here
}
The code above returns the IDictionary with the 3 keys, as if the formatting were correct. If I serialize it back, the resulting string has no trailing comma.
My questions are:
Is Newtonsoft.Json so permissive that it tolerates slight errors?
If so, can I configure it to be stricter?
Is there a way to check whether a string is valid JSON using Newtonsoft.Json, with and/or without the permissiveness?