Use Jackson To Stream Parse an Array of JSON Objects

I have a file that contains a json array of objects:
[
{
"test1": "abc"
},
{
"test2": [1, 2, 3]
}
]
I wish to use Jackson's JsonParser to take an InputStream from this file and, on every call to .next(), have it return one object from the array until it runs out of objects or fails.
Is this possible?
Use case:
I have a large file with a json array filled with a large number of objects with varying schemas. I want to get one object at a time to avoid loading everything into memory.
EDIT:
I completely forgot to mention: my input is a string that grows over time as JSON slowly accumulates in it. I was hoping to parse it object by object, removing each parsed object from the string.
But I suppose that doesn't matter! I can do this manually so long as the JsonParser will return the index into the string.

Yes, you can achieve this sort of part-streaming-part-tree-model processing style using an ObjectMapper:
ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser(new File(...));
if(parser.nextToken() != JsonToken.START_ARRAY) {
throw new IllegalStateException("Expected an array");
}
while(parser.nextToken() == JsonToken.START_OBJECT) {
// read everything from this START_OBJECT to the matching END_OBJECT
// and return it as a tree model ObjectNode
ObjectNode node = mapper.readTree(parser);
// do whatever you need to do with this object
}
parser.close();
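For the case in the question's edit, where the input is a string that keeps growing, the manual route the asker mentions can be sketched with the JDK alone. This is only an illustration, not part of Jackson: the class name JsonObjectSplitter and the method drainCompleteObjects are hypothetical, and the sketch assumes every element of the top-level array is a JSON object. It tracks string and nesting state, returns each complete object as raw text, and removes the consumed prefix from the buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Jackson class): pulls complete top-level objects
// out of a growing buffer that holds a possibly unfinished JSON array.
final class JsonObjectSplitter {

    // Returns every complete object found so far and deletes the consumed
    // prefix from the buffer; an unfinished trailing object stays in place.
    static List<String> drainCompleteObjects(StringBuilder buffer) {
        List<String> objects = new ArrayList<>();
        int depth = 0;            // nesting depth inside the current object
        int start = -1;           // start index of the current object, -1 if none
        int consumed = 0;         // everything before this index can be dropped
        boolean inString = false; // inside a JSON string literal
        boolean escaped = false;  // previous character was a backslash

        for (int i = 0; i < buffer.length(); i++) {
            char c = buffer.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (start < 0) {
                // Between elements: skip '[', ',', ']' and whitespace.
                if (c == '{') {
                    start = i;
                    depth = 1;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
                if (depth == 0) {
                    objects.add(buffer.substring(start, i + 1));
                    consumed = i + 1;
                    start = -1;
                }
            }
        }
        buffer.delete(0, consumed);
        return objects;
    }

    public static void main(String[] args) {
        StringBuilder buffer = new StringBuilder("[{\"test1\": \"abc\"}, {\"test2\": [1, 2");
        System.out.println(drainCompleteObjects(buffer)); // only the first object is complete
        buffer.append(", 3]}]");
        System.out.println(drainCompleteObjects(buffer)); // the second object is now complete
    }
}
```

Each returned string can then be passed to mapper.readTree(String) or mapper.readValue(String, ...) as in the code above.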

What you are looking for is called the Jackson Streaming API. Here is a code snippet using the Streaming API that could help you achieve what you need.
JsonFactory factory = new JsonFactory();
// createParser replaces createJsonParser, which is deprecated since Jackson 2.2
JsonParser parser = factory.createParser(new File(yourPathToFile));
JsonToken token = parser.nextToken();
if (token == null) {
// return or throw exception
}
// the first token is supposed to be the start of array '['
if (!JsonToken.START_ARRAY.equals(token)) {
// return or throw exception
}
// iterate through the content of the array
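while (true) {
token = parser.nextToken();
// stop at the end of the array or the end of input (a null token is not START_OBJECT either)
if (!JsonToken.START_OBJECT.equals(token)) {
break;
}
// parse your objects by means of parser.getXxxValue() and/or other parser's methods;
// consume tokens up to the matching END_OBJECT before the next iteration
}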

This example reads custom objects directly from a stream (source is a java.io.File):
ObjectMapper mapper = new ObjectMapper();
JsonParser parser = mapper.getFactory().createParser( source );
if ( parser.nextToken() != JsonToken.START_ARRAY ) {
throw new Exception( "no array" );
}
while ( parser.nextToken() == JsonToken.START_OBJECT ) {
CustomObj custom = mapper.readValue( parser, CustomObj.class );
System.out.println( "" + custom );
}

This is a late answer that builds on Ian Roberts' answer. You can also use a JsonPointer to find the start position when the array is nested inside a document. This avoids hand-coding the slightly cumbersome streaming-token logic needed to reach the start point. In this case, the basePath is "/", but it can be any path that JsonPointer understands.
Path sourceFile = Paths.get("/path/to/my/file.json");
// Point the basePath to a starting point in the file
JsonPointer basePath = JsonPointer.compile("/");
ObjectMapper mapper = new ObjectMapper();
try (InputStream inputSource = Files.newInputStream(sourceFile);
JsonParser baseParser = mapper.getFactory().createParser(inputSource);
JsonParser filteredParser = new FilteringParserDelegate(baseParser,
new JsonPointerBasedFilter(basePath), false, false);) {
// Call nextToken once to initialize the filteredParser
JsonToken basePathToken = filteredParser.nextToken();
if (basePathToken != JsonToken.START_ARRAY) {
throw new IllegalStateException("Base path did not point to an array: found "
+ basePathToken);
}
while (filteredParser.nextToken() == JsonToken.START_OBJECT) {
// Parse each object inside of the array into a separate tree model
// to keep a fixed memory footprint when parsing files
// larger than the available memory
JsonNode nextNode = mapper.readTree(filteredParser);
// Consume/process the node for example:
JsonPointer fieldRelativePath = JsonPointer.compile("/test1");
JsonNode valueNode = nextNode.at(fieldRelativePath);
if (!valueNode.isValueNode()) {
throw new IllegalStateException("Did not find value at "
+ fieldRelativePath.toString()
+ " after setting base to " + basePath.toString());
}
System.out.println(valueNode.asText());
}
}

Related

Why does UserAuthExtensions.PopulateFromMap(session, jwtPayload) not deserialize JSON values with escapes correctly in ServiceStack.Auth?

We want to get the UserName from the ServiceStack session, but we find that the backslashes in the UserName are not deserialized as expected. The UserName has this format 'domainname\username' and serialized in a jwt token this looks like:
{
"typ": "JWT",
"alg": "HS256"
}.{
"iss": "ssjwt",
"iat": 1635952233,
"exp": 1635955833,
"name": "Robin Doe",
"preferred_username": "domainname\\robindoe"
}.[Signature]
After calling:
var sessionFromJwt = JwtAuthProviderReader.CreateSessionFromJwt(req);
userName = sessionFromJwt.UserName;
The userName variable contains the value 'domainname\\robindoe' instead of 'domainname\robindoe'.
After digging in the ServiceStack code, we pin this down to the PopulateFromMap() method in https://github.com/ServiceStack/ServiceStack/blob/36df74a8b1ba7bf06f85262c1155e1425c082906/src/ServiceStack/Auth/UserAuth.cs#L388.
To demonstrate this problem we have written a small program to prove the point:
class Program
{
static void Main(string[] args)
{
var jwtPayload = JsonObject.Parse(@"{
""iss"": ""ssjwt"",
""iat"": 1635952233,
""exp"": 1635955833,
""name"": ""John Doe"",
""preferred_username"": ""domainname\\username""
}");
var session = new AuthUserSession();
// The PopulateFromMap implementation does not deserialize the json values according to json standards
UserAuthExtensions.PopulateFromMap(session, jwtPayload);
// Notice that the session.UserName still has the escape character 'domainname\\username' instead of the expected 'domainname\username'
Console.WriteLine(session.UserName);
// The PopulateFromMap should deserialize also the values, like in test Can_dynamically_parse_JSON_with_escape_chars()
Can_dynamically_parse_JSON_with_escape_chars();
}
private const string JsonCentroid = @"{""place"":{ ""woeid"":12345, ""placeTypeName"":""St\\a\/te"" } }";
// Source: https://github.com/ServiceStack/ServiceStack.Text/blob/master/tests/ServiceStack.Text.Tests/JsonObjectTests.cs
public static void Can_dynamically_parse_JSON_with_escape_chars()
{
var placeTypeName = JsonObject.Parse(JsonCentroid).Object("place").Get("placeTypeName");
if (placeTypeName != "St\\a/te")
throw new InvalidCastException(placeTypeName + " != St\\a/te");
placeTypeName = JsonObject.Parse(JsonCentroid).Object("place").Get<string>("placeTypeName");
if (placeTypeName != "St\\a/te")
throw new InvalidCastException(placeTypeName + " != St\\a/te");
}
}
The issue was that enumerating a JsonObject didn't return the same escaped string value as indexing it. This has been resolved in this commit.
The change is available from v5.12.1, which is now available on MyGet.

How can I deserialize invalid JSON? Truncated list of objects

My JSON file is essentially an array that contains objects, but the list is incomplete, so I can't use the last entry. I would like to deserialize the rest of the file while discarding the last, invalid entry:
[ { "key" : "value1" }, { "key " : "value2"}, { "key
Please tell me if there is a way using Newtonsoft.Json library, or do I need some preprocessing.
Thank you!
Looks like on Json.NET 8.0.3 you can stream your string from a JsonTextReader to a JTokenWriter and get a partial result by catching and swallowing the JsonReaderException that gets thrown when parsing the truncated JSON:
JToken root;
string exceptionPath = null;
using (var textReader = new StringReader(badJson))
using (var jsonReader = new JsonTextReader(textReader))
using (JTokenWriter jsonWriter = new JTokenWriter())
{
try
{
jsonWriter.WriteToken(jsonReader);
}
catch (JsonReaderException ex)
{
exceptionPath = ex.Path;
Debug.WriteLine(ex);
}
root = jsonWriter.Token;
}
Console.WriteLine(root);
if (exceptionPath != null)
{
Console.WriteLine("Error occurred with token: ");
var badToken = root.SelectToken(exceptionPath);
Console.WriteLine(badToken);
}
This results in:
[
{
"key": "value1"
},
{
"key ": "value2"
},
{}
]
You could then finish deserializing the partial object with JToken.ToObject. You could also delete the incomplete array entry by using badToken.Remove().
It would be better practice not to generate invalid JSON in the first place though. I'm also not entirely sure this is documented functionality of Json.NET, and thus it might not work with future versions of Json.NET. (E.g. conceivably Newtonsoft could change their algorithm such that JTokenWriter.Token is only set when writing is successful.)
You can use the JsonReader class and try to parse as far as you get. Something like the code below will parse as many properties as it gets and then throw an exception. This is of course if you want to deserialize into a concrete class.
public Partial FromJson(JsonReader reader)
{
while (reader.Read())
{
// Break on EndObject
if (reader.TokenType == JsonToken.EndObject)
break;
// Only look for properties
if (reader.TokenType != JsonToken.PropertyName)
continue;
switch ((string) reader.Value)
{
case "Id":
reader.Read();
Id = Convert.ToInt16(reader.Value);
break;
case "Name":
reader.Read();
Name = Convert.ToString(reader.Value);
break;
}
}
return this;
}
Code taken from the CGbR JSON Target.
The second answer above is really good and simple, and it helped me out! Here it is wrapped in a method:
static string FixPartialJson(string badJson)
{
JToken root;
string exceptionPath = null;
using (var textReader = new StringReader(badJson))
using (var jsonReader = new JsonTextReader(textReader))
using (JTokenWriter jsonWriter = new JTokenWriter())
{
try
{
jsonWriter.WriteToken(jsonReader);
}
catch (JsonReaderException ex)
{
exceptionPath = ex.Path;
}
root = jsonWriter.Token;
}
return root.ToString();
}
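The preprocessing route raised in the question can also be sketched without any JSON library. The following is a minimal illustration written in Java (the logic ports directly to C#); the class name TruncatedJsonArray and the method repair are hypothetical. It assumes the input starts with a top-level array whose elements are objects or arrays, cuts the text back to the last complete element, and closes the array, so the incomplete trailing entry is discarded rather than kept as an empty object.

```java
// Hypothetical preprocessing helper: trims a truncated JSON array back to its
// last complete top-level element so a normal parser can read the result.
final class TruncatedJsonArray {

    static String repair(String json) {
        int depth = 0;            // current nesting depth of { } and [ ]
        int lastComplete = -1;    // index just past the last complete element
        boolean inString = false; // inside a JSON string literal
        boolean escaped = false;  // previous character was a backslash

        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
                if (depth == 0) {
                    return json;          // the array is already closed
                }
                if (depth == 1) {
                    lastComplete = i + 1; // a direct element of the array just ended
                }
            }
        }
        if (lastComplete < 0) {
            return "[]";                  // no complete element was found
        }
        return json.substring(0, lastComplete) + "]";
    }

    public static void main(String[] args) {
        String badJson = "[ { \"key\" : \"value1\" }, { \"key \" : \"value2\"}, { \"key";
        System.out.println(repair(badJson));
    }
}
```

The repaired string can then be deserialized in the usual way. As noted above, it is still better not to produce truncated JSON in the first place.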

Antlr4 StringTemplate not compatible with Json.net dynamic items

I would like to read a dynamic object from a json file and then use this in a stringTemplate.
The following code works.
dynamic data = new { bcName = "Lixam B.V", periodName = "July 2013" };
var engine = new Template("<m.bcName> <m.periodName>");
engine.Add("m", data);
engine.Render().Should().Be("Lixam B.V July 2013");
The following code fails:
var json = "{bcName : 'Lixam B.V', periodName : 'July 2013'}";
dynamic data = JsonConvert.DeserializeObject(json);
string name = (data.bcName);
name.Should().Be("Lixam B.V"); // this passes
var engine = new Template("<m.bcName> <m.periodName>");
engine.Add("m", data);
engine.Render().Should().Be("Lixam B.V July 2013"); //fails
Is there another way to configure JsonConverter to be compatible with StringTemplate?
You need to create an IModelAdaptor for whatever the compiled type representing dynamic is, and register it using TemplateGroup.RegisterModelAdaptor.
Inspired by Mr. Harwell's answer, I've implemented an IModelAdaptor that enables the use of objects parsed by Newtonsoft.Json.
Here it is:
internal class JTokenModelAdaptor : Antlr4.StringTemplate.IModelAdaptor
{
public object GetProperty(
Antlr4.StringTemplate.Interpreter interpreter,
Antlr4.StringTemplate.TemplateFrame frame,
object obj,
object property,
string propertyName)
{
var token = (obj as JToken)?.SelectToken(propertyName);
if (token == null)
return null;
if (token is JValue)
{
var jval = token as JValue;
return jval.Value;
}
return token;
}
}
You just need to register the adaptor in your template group, like this:
template.Group.RegisterModelAdaptor(typeof(JToken), new JTokenModelAdaptor());

How do I pull out the JSON field I want using Jackson TreeNode and JsonNode?

I'm a little stumped why I can't pull the "Type" field out of my JSON stream to make a decision. It seems like this should be so easy.
I have the following JSON that I have as input:
[
{
"Institution":"ABC",
"Facility":"XYZ",
"Make":"Sunrise",
"Model":"Admission",
"SerialNumber":"",
"Revision":"1",
"Type":"ABC_Admission",
"ArchiveData":"<CSV file contents>"
}
]
In my Java I have a try-catch block with a JsonHolder class that implements Serializable to hold the JSON. Here's the Java I currently have:
try {
// Parse and split the input
JsonHolder data = JsonHolder.getField("text", input);
DataExtractor.LOG.info("JsonHolder data= " + data);
TreeNode node = data.getTreeNode();
DataExtractor.LOG.info("node size= " + node.size());
node = node.path("Type");
JsonNode json = (JsonNode) node;
DataExtractor.LOG.info("json= " + json.asText());
// code to decide what to do based on Type found
if (json.asText().equals("ABC_Admission")) {
// do one thing
} else {
// do something else
}
} catch (IOException iox) {
DataExtractor.LOG.error("Error extracting data", iox);
this.collector.fail(input);
}
When I run my code I get the following output (NOTE: I changed the package name of the class to <proprietary package name> just for this output display):
25741 [Thread-91-DataExtractor] INFO <proprietary package name>.DataExtractor - JsonHolder data= [
{
"Institution":"ABC",
"Facility":"XYZ",
"Make":"Sunrise",
"Model":"Admission",
"SerialNumber":"",
"Revision":"1",
"Type":"ABC_Admission",
"ArchiveData":"<CSV file contents>"
}
]
25741 [Thread-91-DataExtractor] INFO <proprietary package name>.DataExtractor - node size= 1
25741 [Thread-91-DataExtractor] INFO <proprietary package name>.DataExtractor - json=
As you can see I don't get anything out. I just want to extract the value of the field "Type", so I was expecting to get the value "ABC_Admission" in this case. I would have thought the node path would separate out just that field from the rest of the JSON tree.
What am I doing wrong?
After consulting with another developer I found out the issue is my JSON is inside an array. So, I need to iterate over that array and then pull out the Type field from the object.
The updated code to resolve this is below:
try {
// Parse and split the input
JsonHolder data = JsonHolder.getField("text", input);
DataExtractor.LOG.info("JsonHolder data= " + data);
TreeNode node = data.getTreeNode();
String type = null;
// if this is an array of objects, iterate through the array
// to get the object, and reference the field we want
if (node.isArray()){
ArrayNode ary = (ArrayNode) node;
for (int i = 0; i < ary.size(); ++i) {
JsonNode obj = ary.get(i);
if (obj.has("Type")) {
type = obj.path("Type").asText();
break;
}
}
}
if (type == null) {
// Do something with failure??
}
DataExtractor.LOG.info("json= " + type);
// constant-first equals() avoids a NullPointerException when no Type was found
if ("ABC_Admission".equals(type)) {
// do one thing
} else {
// do something else
}
} catch (IOException iox) {
DataExtractor.LOG.error("Error extracting data", iox);
this.collector.fail(input);
}

How can I JSON.stringify a Collection in Dart?

How can I make a JSON string out of a collection in Dart, as I can with Maps? The docs say I can pass a map or an array into the JSON.stringify() method. But there is no Array data type in Dart, and passing a collection gives me an exception.
I've a naive workaround, but I wonder if there will be a better way to do this:
String s = '[';
bool first=true;
_set.forEach(function(item){
if (first) {
first = false;
} else {
s+=',';
}
s += JSON.stringify(item);
});
s +=']';
print(s);
return s;
In Dart, you can get a JSON String out of an Object using the JsonEncoder's convert method. Here is an example:
import 'dart:convert';
void main() {
final jsonEncoder = JsonEncoder();
final collection1 = List.from([1, 2, 3]);
print(jsonEncoder.convert(collection1)); // prints [1,2,3]
final collection2 = List.from(['foo', 'bar', 'dart']);
print(jsonEncoder.convert(collection2)); // prints ["foo","bar","dart"]
final object = {'a': 1, 'b': 2};
print(jsonEncoder.convert(object)); // prints {"a":1,"b":2}
}
Passing a list works for me, both in the Dart VM (importing dart-sdk/lib/frog/server/dart_json.dart) and in Dartium (importing dart:json), using this code:
void main() {
var list = new List.from(["a","b","c"]);
print(JSON.stringify(list));
}
prints this JSON snippet:
["a","b","c"]
It doesn't work for new Set.from(...), which is expected, given that JSON only deals in maps and lists.