Issues parsing a 1GB JSON file using JSON.NET

I have an application whose input has grown from 50K location records to 1.1 million location records.
This has caused serious issues, because the entire file was previously deserialized into a single object.
That object is ~1GB for a production-like file with 1.1 million records.
Because of large object heap (LOH) GC issues, I want to keep each deserialized object below the 85K mark.
I'm trying to parse out a single location object at a time and deserialize it, so that I can control the number of objects
that get deserialized and, in turn, the size of each object. I'm using the Json.NET library to do this.
Below is a sample of the JSON file that I'm receiving as a stream in my application:
{
"Locations": [{
"LocationId": "",
"ParentLocationId": "",
"DisplayFlag": "Y",
"DisplayOptions": "",
"DisplayName": "",
"Address": "",
"SecondaryAddress": "",
"City": "",
"State": "",
"PostalCode": "",
"Country": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"LatLonQuality": 99,
"BusinessLogoUrl": "",
"BusinessUrl": "",
"DisplayText": "",
"PhoneNumber": "",
"VenueGroup": 7,
"VenueType": 0,
"SubVenue": 0,
"IndoorFlag": "",
"OperatorDefined": "",
"AccessPoints": [{
"AccessPointId": "",
"MACAddress": "",
"DisplayFlag": "",
"DisplayOptions": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"Status": "Up",
"OperatorDefined": "",
"RoamingGroups": [{
"GroupName": ""
},
{
"GroupName": ""
}],
"Radios": [{
"RadioId": "",
"RadioFrequency": "",
"RadioProtocols": [{
"Protocol": ""
}],
"WifiConnections": [{
"BSSID": "",
"ServiceSets": [{
"SSID": "",
"SSID_Broadcasted": ""
}]
}]
}]
}]
},
{
"LocationId": "",
"ParentLocationId": "",
"DisplayFlag": "Y",
"DisplayOptions": "",
"DisplayName": "",
"Address": "",
"SecondaryAddress": "",
"City": "",
"State": "",
"PostalCode": "",
"Country": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"LatLonQuality": 99,
"BusinessLogoUrl": "",
"BusinessUrl": "",
"DisplayText": "",
"PhoneNumber": "",
"VenueGroup": 7,
"VenueType": 0,
"SubVenue": 0,
"IndoorFlag": "",
"OperatorDefined": "",
"AccessPoints": [{
"AccessPointId": "",
"MACAddress": "",
"DisplayFlag": "",
"DisplayOptions": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"Status": "Up",
"OperatorDefined": "",
"RoamingGroups": [{
"GroupName": ""
},
{
"GroupName": ""
}],
"Radios": [{
"RadioId": "",
"RadioFrequency": "",
"RadioProtocols": [{
"Protocol": ""
}],
"WifiConnections": [{
"BSSID": "",
"ServiceSets": [{
"SSID": "",
"SSID_Broadcasted": ""
}]
}]
}]
}]
}]
}
I need to be able to pull out the individual Location objects, so that I would be looking at the following:
{
"LocationId": "",
"ParentLocationId": "",
"DisplayFlag": "Y",
"DisplayOptions": "",
"DisplayName": "",
"Address": "",
"SecondaryAddress": "",
"City": "",
"State": "",
"PostalCode": "",
"Country": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"LatLonQuality": 99,
"BusinessLogoUrl": "",
"BusinessUrl": "",
"DisplayText": "",
"PhoneNumber": "",
"VenueGroup": 7,
"VenueType": 0,
"SubVenue": 0,
"IndoorFlag": "",
"OperatorDefined": "",
"AccessPoints": [{
"AccessPointId": "",
"MACAddress": "",
"DisplayFlag": "",
"DisplayOptions": "",
"Latitude": 40.59485,
"Longitude": -73.96174,
"Status": "Up",
"OperatorDefined": "",
"RoamingGroups": [{
"GroupName": ""
},
{
"GroupName": ""
}],
"Radios": [{
"RadioId": "",
"RadioFrequency": "",
"RadioProtocols": [{
"Protocol": ""
}],
"WifiConnections": [{
"BSSID": "",
"ServiceSets": [{
"SSID": "",
"SSID_Broadcasted": ""
}]
}]
}]
}]
}
I'm trying to use the Json.NET JsonTextReader to accomplish this, but I cannot get the reader to contain an entire location in its buffer. Because of the size of the records in the stream, the reader initially only reaches as far as "RadioProtocols", midway through the object, and by the time the stream reaches the end of the object, the reader has discarded its start.
The code I'm using to try to get this working is:
var ser = new JsonSerializer();
using (var reader = new JsonTextReader(new StreamReader(stream)))
{
reader.SupportMultipleContent = true;
while (reader.Read())
{
if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
{
do
{
reader.Read();
} while (reader.TokenType != JsonToken.EndObject && reader.Depth == 2);
var singleLocation = ser.Deserialize<Locations>(reader);
}
}
}
Any information on this or an alternative to doing it would be greatly appreciated. As a side note, the way our customers send the information cannot change at this time.

When the reader is positioned at the beginning of the object you want to deserialize (an entry in the Locations array in your case), you can just call ser.Deserialize<T>(reader) and it will work, advancing to the end of the object at that level, and no further. Thus the following should iterate through the Location objects in your file, loading each one separately:
public static IEnumerable<T> DeserializeNestedItems<T>(TextReader textReader)
{
var ser = new JsonSerializer();
using (var reader = new JsonTextReader(textReader))
{
reader.SupportMultipleContent = true;
while (reader.Read())
{
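// Depth 2 = an object directly inside the root's "Locations" array
if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
{
// Deserialize consumes exactly this object and stops at its EndObject
var item = ser.Deserialize<T>(reader);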
yield return item;
}
}
}
}
And an example of usage with your test string:
Debug.Assert(DeserializeNestedItems<Location>(new StringReader(json)).Count() == 2); // No assert.
var list = DeserializeNestedItems<Location>(new StringReader(json)).SelectMany(l => l.AccessPoints).Select(a => new { a.Latitude, a.Longitude }).ToList();
Debug.WriteLine(JsonConvert.SerializeObject(list, Formatting.Indented));
Which outputs:
[
{
"Latitude": 40.59485,
"Longitude": -73.96174
},
{
"Latitude": 40.59485,
"Longitude": -73.96174
}
]
Note - the Location class comes from posting your JSON to http://json2csharp.com/.
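To make the depth check concrete, here is a sketch of the same streaming idea in Java. It is an illustration only, not Json.NET and not a replacement for the code above: it reads characters one at a time, counts open braces and brackets (ignoring those inside string literals), and hands each object that opens two containers deep (root object, then the "Locations" array) to a callback, which is the level that `reader.Depth == 2` picks out for this file's layout. The class and method names are hypothetical, and it assumes well-formed input and validates nothing.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Illustration only, not Json.NET: streams JSON text and passes each object that opens
 * two containers deep (root object -> "Locations" array -> entry) to a callback.
 * Assumes well-formed input; nothing is validated.
 */
final class DepthTwoSplitter {
    // Wrap large files in a BufferedReader; this reads one char per call.
    static void forEachItem(Reader in, Consumer<String> onItem) throws IOException {
        int depth = 0;               // number of { or [ currently open
        boolean inString = false;    // inside a "..." literal, where braces don't count
        boolean escaped = false;     // previous char was a backslash inside a string
        StringBuilder item = null;   // non-null while one record is being captured
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (item != null) {
                item.append(ch);
            }
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
            } else if (ch == '"') {
                inString = true;
            } else if (ch == '{' || ch == '[') {
                if (ch == '{' && depth == 2 && item == null) {
                    item = new StringBuilder("{"); // a new array entry begins
                }
                depth++;
            } else if (ch == '}' || ch == ']') {
                depth--;
                if (depth == 2 && item != null) {
                    onItem.accept(item.toString()); // entry complete; hand it off
                    item = null;                    // and drop it
                }
            }
        }
    }

    /** Convenience for small inputs and tests: gathers every entry into a list. */
    static List<String> collect(Reader in) {
        List<String> items = new ArrayList<>();
        try {
            forEachItem(in, items::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return items;
    }
}
```

Only one record's text is held at a time when `forEachItem` is used with a callback; `collect` defeats that on purpose and is meant for tests. In .NET the Json.NET reader already does this bookkeeping, so the answer's `Deserialize<T>(reader)` approach remains the practical route.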

Thanks for all the help. I've managed to get it doing what I want, which is deserializing individual location objects.
If the item is loaded into a JObject, the reader reads in the full object, which can then be deserialized; this can be looped to get the solution.
This is the code that was settled on:
while (reader.Read())
{
if (reader.TokenType == JsonToken.StartObject && reader.Depth == 2)
{
location = JObject.Load(reader).ToObject<Location>();
var lv = new LocationValidator(location, FootprintInfo.OperatorId, FootprintInfo.RoamingGroups, true);
var vr = lv.IsValid();
if (vr.Successful)
{
yield return location;
}
else
{
errors.Add(new Error(elNumber, location.LocationId, vr.Error.Field, vr.Error.Detail));
if (errors.Count >= maxErrors)
{
yield break;
}
}
++elNumber;
}
}

Related

Accessing specific JSON values in a deluge script

I have a JSON API response that contains multiple entries, with different types of subscriptions and multiple users.
I need to search the list for a "user_name" AND a "subscription", then return any matching "duration". In some cases, there will be more than one "duration" for a user and subscription. I would need the total (sum) of the duration when there is more than one.
For example, here is part of an example JSON I am working with:
[
{
"id": 139387026,
"user_name": "John Smith",
"note": "",
"last_modify": "2022-03-28 14:16:35",
"date": "2022-03-28",
"locked": "0",
"addons_external_id": "",
"description": "",
"info": [
{
"subscription": "basic",
"duration": "22016",
}
]
},
{
"id": 139387027,
"user_name": "John Smith",
"note": "",
"last_modify": "2022-03-28 14:16:35",
"date": "2022-03-28",
"locked": "0",
"addons_external_id": "",
"description": "",
"info": [
{
"subscription": "advanced",
"duration": "10537",
}
]
},
{
"id": 139387028,
"user_name": "Martin Lock",
"note": "",
"last_modify": "2022-03-28 14:16:35",
"date": "2022-03-28",
"locked": "0",
"addons_external_id": "",
"description": "",
"info": [
{
"subscription": "basic",
"duration": "908",
}
]
},
]
So for example, for user_name: "John Smith" and subscription: "advanced", I need to return duration: "10537".
I've used toJsonlist() to convert it, then used the code below, but it returns all values in the list. I can't figure out how to search for the specific values or add matching entries together.
rows = subscriptions.toJsonlist();
for each row in rows
{
info row;
user_name = row.getJson("user_name");
info "username: " + user_name;
subscription = row.getJson("subscription");
info "subscription: " + subscription;
subscriptionId = row.getJson("subscriptionId");
info "subscription Id: " + subscriptionId;
}
I'm fairly new to programming. Any help is appreciated!
As I understand it, you want to filter your JSON data by user_name and subscription and get the corresponding duration.
Here is the Deluge script for that. I use clear variable names so that they will not confuse you.
//Your input: change these values based on your filter
input_user_name = "John Smith";
input_subscription = "advanced";
//Your JSON data
json_string_data = '[ { "id": 139387026, "user_name": "John Smith", "note": "", "last_modify": "2022-03-28 14:16:35", "date": "2022-03-28", "locked": "0", "addons_external_id": "", "description": "", "info": [ { "subscription": "basic", "duration": "22016", } ] }, { "id": 139387027, "user_name": "John Smith", "note": "", "last_modify": "2022-03-28 14:16:35", "date": "2022-03-28", "locked": "0", "addons_external_id": "", "description": "", "info": [ { "subscription": "advanced", "duration": "10537", } ] }, { "id": 139387028, "user_name": "Martin Lock", "note": "", "last_modify": "2022-03-28 14:16:35", "date": "2022-03-28", "locked": "0", "addons_external_id": "", "description": "", "info": [ { "subscription": "basic", "duration": "908", } ] } ]';
//Declare the data as JSON
processed_json_data = json_string_data.toJsonlist();
initial_total_duration = 0; //Do not change this
list_of_duration = List();
total_duration_per_username_per_subscription = Map();
for each row in processed_json_data
{
if (row.get("user_name") == input_user_name )
{
info_list = row.get("info").toJSONList();
for each info_row in info_list
{
if (info_row.get("subscription") == input_subscription)
{
info_row_duration = info_row.get("duration").toLong(); // make it integer
list_of_duration.add(info_row_duration);
}
}
}
}
result_map = Map();
//Sum of list_of_duration
for each duration in list_of_duration
{
initial_total_duration = initial_total_duration + duration;
}
result_map.put("user_name",input_user_name);
result_map.put("subscription",input_subscription);
result_map.put("no_of_subscription",list_of_duration.size());
result_map.put("total_duration",initial_total_duration);
info result_map;
And the result should be:
{"user_name":"John Smith","subscription":"advanced","no_of_subscription":1,"total_duration":10537}
You can test this script at https://deluge.zoho.com/tryout.
Thanks,
Von
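For comparison, the same filter-then-sum logic can be sketched in Java, assuming the response has already been parsed into a List of Maps by some JSON library. The class and method names are hypothetical. As in the Deluge script, durations arrive as strings such as "10537", so they are converted to numbers before summing.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical port of the Deluge filter-and-sum; rows are already-parsed JSON. */
final class DurationTotals {
    /** Sums "duration" over every info entry matching both userName and subscription. */
    static long totalDuration(List<Map<String, Object>> rows, String userName, String subscription) {
        long total = 0;
        for (Map<String, Object> row : rows) {
            if (!userName.equals(row.get("user_name"))) {
                continue; // different user
            }
            Object info = row.get("info");
            if (!(info instanceof List)) {
                continue; // no info array on this row
            }
            for (Object entry : (List<?>) info) {
                if (!(entry instanceof Map)) {
                    continue;
                }
                Map<?, ?> m = (Map<?, ?>) entry;
                if (subscription.equals(m.get("subscription"))) {
                    // "duration" is a string in the response, like toLong() in Deluge
                    total += Long.parseLong(String.valueOf(m.get("duration")));
                }
            }
        }
        return total;
    }
}
```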

Structure data inside a Document database

I'm building an app that is going to serve as an agenda, or scheduling book, for some services. The app is planned to work in a "per customer" way, with Angular on the front end and the new Google Firestore for all the real-time and pseudo-backendless stuff.
The question is, as I have never worked with NoSQL databases before, how should I structure the data inside Firestore?
I thought about something that would look like this:
{
//Collection
"customer-a": {
//Document
"info": {
"name": "Customer A",
"Phones": [{
"contact-name": "",
"number": ""
}],
"address": {
"type": "street",
"name": "",
"number": 0,
"neighborhood": "",
"city": "",
"state": "",
"zipcode": "",
"country": "",
"coordinates": {
"latitude": "",
"longitude": ""
}
}
},
//Document
"config": {
"active-theme": "",
"themes": [{}]
},
//Document
"customer-data": {
//SubCollection
"employees": {
//Document
"employee-a": {
"name": "",
"phone": "",
"ratings": [],
"address": {
"type": "street",
"name": "",
"number": 0,
"neighborhood": "",
"city": "",
"state": "",
"zipcode": "",
"country": ""
},
//SubCollection
"schedulings": {
//Document
"2017": {
"total": 300,
"scheduling": [{
"client": {
"name": "",
"phone": "",
"rating": 5,
},
"price": 10,
"services": [{
"name": "",
"price": 10
}],
"datetime": "",
}]
},
"2018": {
"total": 300,
"scheduling": [{
"client": {
"name": "",
"phone": "",
"rating": 5,
},
"price": 10,
"services": [{
"name": "",
"price": 10
}],
"datetime": "",
}]
},
}
}
},
//SubCollection
"stock": {}
}
}
}
I'm thinking of splitting the scheduling into years because I think it may grow a lot in size, but as I said, I've never worked with this kind of database, so I don't really know how much data I could nest inside a document, for example...
As a deliberately exaggerated figure for a baseline calculation, I assumed around 200k "objects" stored for each employee per year. Is that a lot, or is it OK to store that much data nested?
Should I keep creating subcollections inside Firestore, or should there be one single collection that stores only different documents, with everything else nested?
Hope someone can help me,
Thanks.
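A rough size check bears on the nesting question. Firestore caps a single document at 1 MiB (1,048,576 bytes). Assuming, purely for illustration, about 200 bytes per scheduling entry (the real figure depends on field names and values), 200,000 entries come to roughly 40 MB, which would need on the order of 39 documents even before any per-document overhead. The class name below is hypothetical, and the result is a lower bound under that assumed entry size.

```java
/** Back-of-envelope check against Firestore's 1 MiB maximum document size. */
final class DocSizeEstimate {
    static final long MAX_DOC_BYTES = 1_048_576L; // 1 MiB

    /** Documents needed to hold the entries at an assumed size each (ceiling division). */
    static long documentsNeeded(long entries, long assumedBytesPerEntry) {
        long totalBytes = entries * assumedBytesPerEntry;
        return (totalBytes + MAX_DOC_BYTES - 1) / MAX_DOC_BYTES; // round up
    }
}
```

For example, `documentsNeeded(200_000, 200)` gives 39, while 5,000 entries of the same assumed size fit in one document.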

Create JSON.NET structure with JTokenWriter

Hey all, I have the following JSON output that I would like to create:
{
"scheduleName": "",
"firstName": "",
"lastName": "",
"theRole": "",
"linker": "",
"Schedule": {
"ID": "",
"totalHrs": "",
"Mon": "",
"Tue": "",
"Wed": "",
"Thu": "",
"Fri": "",
"Sat": ""
},
"empInfo": {
"ID": "",
"Email": "",
"Phone": "",
"Active": "",
"Img": "",
"Badge": ""
},
"availability": {
"ID": "",
"Mon": "",
"Tue": "",
"Wed": "",
"Thu": "",
"Fri": "",
"Sat": ""
},
"training": {
"name": "",
"id": ""
}
}
Using the Newtonsoft "Create JSON with JTokenWriter" example, I am wondering how to create the "Schedule", "empInfo", etc. objects in my JSON output, since there are no examples of those types on that page.
The only example it shows is structured like so:
{
"name1": "value1",
"name2": [
1,
2
]
}
The first few values are easy to create:
Dim jsonWriter As New JTokenWriter()
jsonWriter.WriteStartObject()
jsonWriter.WritePropertyName("scheduleName")
jsonWriter.WriteValue("value1")
jsonWriter.WritePropertyName("firstName")
jsonWriter.WriteValue("value2")
jsonWriter.WritePropertyName("lastName")
jsonWriter.WriteValue("value3")
jsonWriter.WritePropertyName("theRole")
jsonWriter.WriteValue("value4")
jsonWriter.WritePropertyName("linker")
jsonWriter.WriteValue("value5")
'"?": {
' "?": "?",
' "?": "?",
' etc....
'?
jsonWriter.WriteEndObject()
But that's where I have to stop since I do not know how to go about making the other structure.
To write a nested object as the value of a property, write the property name, then do a nested WriteStartObject(), followed by the properties to be written, and finally a nested WriteEndObject(). E.g.:
Dim jsonWriter As New JTokenWriter()
jsonWriter.WriteStartObject() 'Start the root object
jsonWriter.WritePropertyName("scheduleName")
jsonWriter.WriteValue("value1")
jsonWriter.WritePropertyName("Schedule") 'Write the "Schedule" property name
jsonWriter.WriteStartObject() 'Start the nested "Schedule" object
jsonWriter.WritePropertyName("ID")
jsonWriter.WriteValue("ID Value")
jsonWriter.WriteEndObject() 'End the Schedule object
jsonWriter.WriteEndObject() 'End the root object
Sample fiddle.
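The call sequence is what matters here. As an illustration only, here is a hypothetical minimal writer in Java (not Json.NET; its method names merely mirror JTokenWriter's) showing that a nested object is just another start/end pair written immediately after a property name. It tracks, for each open object, whether a comma is needed before the next property, and it does not escape strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch that mirrors JTokenWriter's call order; not part of Json.NET. */
final class MiniJsonWriter {
    private final StringBuilder out = new StringBuilder();
    // One flag per currently open object: has it written a property yet?
    private final Deque<Boolean> hasProperty = new ArrayDeque<>();

    void writeStartObject() {
        out.append('{');
        hasProperty.push(false);
    }

    void writePropertyName(String name) {
        if (hasProperty.pop()) {
            out.append(','); // separate from the previous property
        }
        hasProperty.push(true);
        out.append('"').append(name).append("\":");
    }

    void writeValue(String value) {
        out.append('"').append(value).append('"'); // no escaping in this sketch
    }

    void writeEndObject() {
        out.append('}');
        hasProperty.pop();
    }

    @Override
    public String toString() {
        return out.toString();
    }

    /** Same call sequence as the VB.NET answer: root plus a nested "Schedule" object. */
    static String demo() {
        MiniJsonWriter w = new MiniJsonWriter();
        w.writeStartObject();             // root object
        w.writePropertyName("scheduleName");
        w.writeValue("value1");
        w.writePropertyName("Schedule");  // property whose value is an object...
        w.writeStartObject();             // ...so start a nested object right away
        w.writePropertyName("ID");
        w.writeValue("ID Value");
        w.writeEndObject();               // end "Schedule"
        w.writeEndObject();               // end root
        return w.toString();
    }
}
```

`demo()` returns `{"scheduleName":"value1","Schedule":{"ID":"ID Value"}}`; "empInfo", "availability" and "training" follow the same property-name, start, properties, end pattern.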

Using RestTemplate to read an array of JSON objects where the array has no name

I get the response from the server shown below. How do I write the classes so that I can get this data using RestTemplate? The catch is that the array has no name.
[
{
"place_id": "",
"licence": "",
"osm_type": "",
"osm_id": "",
"boundingbox": [],
"lat": "",
"lon": "",
"display_name": "",
"class": "",
"type": "",
"icon": "",
"address": {
"suburb": "",
"village": "",
"county": "",
"state_district": "",
"state": "",
"country": "",
"country_code": ""
}
},
{
"place_id": "",
"licence": "",
"osm_type": "",
"osm_id": "",
"boundingbox": [],
"lat": "",
"lon": "",
"display_name": "",
"class": "",
"type": "",
"icon": "",
"address": {
"suburb": "",
"village": "",
"county": "",
"state_district": "",
"state": "",
"country": "",
"country_code": ""
}
}
]
and my controller class looks like this:
Addresses[] response = restTemplate.getForObject(url, Addresses[].class);
List<Addresses> objects = new ArrayList<Addresses>();
It does not matter what you call the outer class; I call it OuterType in my example:
package hello;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.PropertyNamingStrategy.LowerCaseWithUnderscoresStrategy;
import com.fasterxml.jackson.databind.annotation.JsonNaming;
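@JsonIgnoreProperties(ignoreUnknown = true)
@JsonNaming(LowerCaseWithUnderscoresStrategy.class)
public class OuterType {
// For simplicity properties are public
public String placeId;
public String licence;
public String osmType;
public String osmId;
// The JSON key is "boundingbox" (no underscore), so the field name stays all lower case;
// String[] assumes the array holds strings
public String[] boundingbox;
// Address: a similar class with suburb, village, county, state_district, state, country, country_code
public Address address;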
}
restTemplate call:
OuterType[] response = restTemplate.getForObject(url, OuterType[].class);
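One thing to watch with `@JsonNaming`: a lower-case-with-underscores strategy derives each JSON key from the Java field name by inserting an underscore before each upper-case letter, so a field named `boundingBox` is looked up as `bounding_box`, which does not match the response's `boundingbox` key. The simplified translation below is a hypothetical approximation for illustration, not Jackson's actual implementation (which may treat runs of capitals differently).

```java
/**
 * Simplified, hypothetical approximation of a camelCase -> lower_case_with_underscores
 * naming strategy. Not Jackson's code.
 */
final class SnakeCaseSketch {
    static String translate(String javaFieldName) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < javaFieldName.length(); i++) {
            char c = javaFieldName.charAt(i);
            if (Character.isUpperCase(c)) {
                if (key.length() > 0) {
                    key.append('_'); // an upper-case letter starts a new word
                }
                key.append(Character.toLowerCase(c));
            } else {
                key.append(c);
            }
        }
        return key.toString();
    }
}
```

So `placeId` maps to `place_id` as intended, while a key that has no underscore in the JSON needs a field name with no capitals (or an explicit `@JsonProperty`).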

How to Parse this Json using Gson and get the field I want?

{
"ws_result":
[
{
"token": "",
"norm_token": "",
"len": "",
"type": "",
"pos": "",
"prop": "",
"stag": "",
"child":
[
{
"token": "",
"norm_token":"",
"len": "",
"type": "",
"pos": "",
"prop": "",
"stag": "",
"child":
[
{
"token": "",
"norm_token":"",
"len": "",
"type": "",
"pos": "",
"prop": "",
"stag": "",
"child": [ ]
},
{
"token": "",
"norm_token":"",
"len": "",
"type": "",
"pos": "",
"prop": "",
"stag": "",
"child": [ ]
}
]
},
{
"token": "",
"norm_token":"",
"len": "",
"type": "",
"pos": "",
"prop": "",
"stag": "",
"child":
[
{
"token": "",
"norm_token":"",
"len": "",
"type": "",
"pos": "",
"prop": "",
"stag": "",
"child": [ ]
}
]
}
]
},
{
"token": "",
"norm_token":"",
"len": "",
"type": "",
"pos": "",
"prop": "",
"stag": "2",
"child": [ ]
},
{
"token": "",
"norm_token": "",
"len": "",
"type": "",
"pos": "",
"prop": "",
"stag": "",
"child": [ ]
}
]
}
Some children are empty, some are not, and some children contain more children. How do I actually parse this and get what I want? I am totally new to JSON, and I am trying to use Gson. What I want is to get the value of a token with a specific type in the nested JSON. Thanks a lot for any help and directions.
I tried using com.google.gson.stream.JsonReader, but it's not working:
JsonReader jsonReader = new JsonReader(new StringReader(result));
jsonReader.beginObject();
while(jsonReader.hasNext()){
String field = jsonReader.nextName();
if (field.equals("type")){
System.out.println(jsonReader.nextString());
} else if (field.equals("token")){
System.out.println(jsonReader.nextString());
} else {
jsonReader.skipValue();
}
}
jsonReader.endObject();
Parse your JSON recursively, like this:
http://snipplr.com/view/71742/java-reflection-and-recursive-json-deserializer-using-gson/
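import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import java.util.List;
import java.util.Map;
// Sketch: collects the "token" of every node whose "type" equals wantedType, at any depth.
private void parse(JsonElement el, String wantedType, List<String> tokens) {
if (el.isJsonArray()) {
for (JsonElement item : el.getAsJsonArray()) parse(item, wantedType, tokens); // "ws_result", "child"
} else if (el.isJsonObject()) {
JsonObject o = el.getAsJsonObject();
JsonElement type = o.get("type"), token = o.get("token");
if (type != null && token != null && wantedType.equals(type.getAsString())) {
tokens.add(token.getAsString());
}
for (Map.Entry<String, JsonElement> e : o.entrySet()) parse(e.getValue(), wantedType, tokens);
}
}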
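If you would rather not work with Gson's tree types, the same recursion can be sketched over plain Java Maps and Lists, which is, for instance, the shape Gson's `fromJson(json, Object.class)` produces. The class and method names are hypothetical. The key point is that the walk must descend into arrays as well as objects, because "ws_result" and every "child" are arrays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TokenFinder {
    /** Returns the "token" of every object whose "type" equals wantedType, at any depth. */
    static List<String> tokensOfType(Object node, String wantedType) {
        List<String> found = new ArrayList<>();
        walk(node, wantedType, found);
        return found;
    }

    private static void walk(Object node, String wantedType, List<String> found) {
        if (node instanceof Map) {
            Map<?, ?> obj = (Map<?, ?>) node;
            if (wantedType.equals(obj.get("type")) && obj.get("token") != null) {
                found.add(String.valueOf(obj.get("token")));
            }
            for (Object value : obj.values()) {
                walk(value, wantedType, found); // descends into "ws_result", "child", ...
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                walk(item, wantedType, found);
            }
        }
        // strings, numbers and null have no children: nothing to do
    }
}
```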