serialize JSON string of infinite size

Okay this one has me scratching my head.
I have an API returning a response body that is a JSON string.
The data is such that it will continue to grow every week. That is, I have a single JSON response that is going to potentially grow forever until either: 1) it consumes the world as we know it, or 2) someone on the API side learns about pagination.
So how do I serialize this if I don't know the size?
I'm thinking it has to be done in chunks, as I have already figured out how to chunk the response to file, but how do I serialize in chunks?
This is really a concept problem so responses in any language will work as long as they can get at the heart of the problem.
I've so far tried stream-reading character by character in C# to test for JSON structural characters ({ } [ ] etc.), but the performance was glacial: about 20 minutes to read 1000 KB.
I feel like I'm reinventing the wheel here...
EDIT: In response to stdunbar's link.
I've just tried the solution described in the link, using JSON.NET's JsonReader class. A couple of problems:
1) JSON.NET's documentation never explains what a "JSON token" is. Is it the node name and the node value? Is it just the node name?
2) If my data set is potentially infinite, that means my node values are potentially infinite. For example, there is a node in my JSON response containing a list, and that list grows as the data does. So if I load that list using JsonReader, it might be 1 TB some day and wreck my server.
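For what it's worth, a "token" in JSON.NET's reader model is one structural unit at a time — StartObject, PropertyName, String, EndArray, and so on — not a whole node with its value. That is exactly what makes chunked reading possible: you can materialize one array element at a time and never hold the whole list. A minimal sketch with JSON.NET's JsonTextReader (the file name and the { "items": [...] } response shape are assumptions for illustration):

using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class StreamingRead
{
    static void Main()
    {
        using (var file = File.OpenText("response.json"))   // source file; name assumed
        using (var reader = new JsonTextReader(file))
        {
            while (reader.Read())
            {
                // Each Read() advances exactly one token; reader.Path says
                // where in the document the current token sits.
                if (reader.TokenType == JsonToken.StartObject
                    && reader.Path.StartsWith("items["))
                {
                    // Materialize only the current element; the rest of the
                    // (potentially unbounded) array never lives in memory.
                    JObject item = JObject.Load(reader);
                    Console.WriteLine(item["Name"]);        // "Name" is a placeholder
                }
            }
        }
    }
}

Memory use stays proportional to a single element rather than to the whole array, so the list can keep growing without wrecking the server.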

Related

Utf8JsonReader from a Stream

I'm trying to read a sequence of JSON objects from a network stream. This involves finding complete JSON objects and returning them one by one. As soon as a complete JSON object has been received, I need to have it. Anything that follows belongs to the next object and must only be used once that next object is complete.
I would have thought that the Utf8JsonReader class could do that, but apparently it cannot accept a Stream of any kind. It even seems that possibility was deliberately left out.
Now I'm wondering if it's possible at all to use .NET's shiny new JSON parser to read from a stream when I don't know when data arrives or how much of it. Do I need to split the JSON object messages manually, or can the existing reader just stop when it has something and continue when the next piece is available? I mean, if it can do that on a predefined array of bytes, it could surely also do it with some waiting in between until more data is available. That just doesn't seem to be exposed in the public API. Or did I miss something?
The JsonDocument.ParseAsync(Stream) method cannot be used because it would read the stream to the end. That doesn't make sense for a network stream that stays open for a long time and only delivers some data from time to time.
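For the record, Utf8JsonReader does support resumable parsing over partial buffers, just not over a Stream directly: construct it with isFinalBlock: false, and when Read() returns false because it ran out of data, stitch the unconsumed bytes onto the next chunk and resume from CurrentState. A minimal sketch (the two-chunk arrival is simulated with a plain byte array):

using System;
using System.Text;
using System.Text.Json;

class PartialParsing
{
    static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("{\"Name\":\"Item1\",\"Cost\":5}");

        // Pretend only the first 10 bytes have arrived so far.
        var reader = new Utf8JsonReader(data.AsSpan(0, 10), isFinalBlock: false, state: default);
        while (reader.Read())
            Console.WriteLine(reader.TokenType);        // StartObject, PropertyName

        // Read() returned false mid-token instead of throwing. Stitch the
        // unconsumed bytes onto the next chunk and resume from the state.
        int consumed = (int)reader.BytesConsumed;
        byte[] rest = data.AsSpan(consumed).ToArray();
        reader = new Utf8JsonReader(rest, isFinalBlock: true, reader.CurrentState);
        while (reader.Read())
            Console.WriteLine(reader.TokenType);        // String, PropertyName, Number, EndObject
    }
}

This still leaves the buffer management (and the waiting for more data) to you; the reader itself stays synchronous and Stream-free.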

What do developers expect when receiving a JSON response from a server

I have a Java library that runs web services, and these return responses in XML. The web services all revolve around giving a list of details about items. Recently, changes were made to allow the services to return JSON by simply converting the XML to JSON. When looking at the responses, I saw they're not as easy to parse as I thought. For example, take a web service that returns details about items.
If there are no items, the returned JSON is as follows:
{"ItemResponse":""}
If there is 1 item, the response is as follows (now ItemResponse has an object as its value instead of a string):
{"ItemResponse":{"Items":{"Name":"Item1","Cost":"$5"}}}
If there are two or more items, the response is (now Items has an array as its value instead of an object):
{"ItemResponse":{"Items":[{"Name":"Item1","Cost":"$5"},{"Name":"Item2","Cost":"$3"}]}}
Parsing these requires several if/else branches, which I think is clunky.
Would it be an improvement if the responses were:
0 items: []
1 item: [{"Name":"Item1","Cost":"$5"}]
2 items: [{"Name":"Item1","Cost":"$5"},{"Name":"Item2","Cost":"$3"}]
This way there is always an array, and it contains the item data. An extra wrapper object is also possible:
0 items: {"Items":[]}
1 item: {"Items":[{"Name":"Item1","Cost":"$5"}]}
2 items: {"Items":[{"Name":"Item1","Cost":"$5"},{"Name":"Item2","Cost":"$3"}]}
I'm not experienced with JSON, so my question is: if you were a developer having to use these web services, how would you expect the JSON response to be formatted? Is it better to always return a consistent array, even if there are no items, or is this usually not important? Or is an array not enough, and do you really expect a wrapper object around the array?
What are the conventions/standards regarding this?
Don't switch result types: if more than one item is possible, always return an array. Do not mix an object for one item with an array for more; that's not a good idea.
Another best practice is to version your API. Use something like yoursite.com/api/v1/endpoint. If you don't do this and you later change the response of your API, all your client apps will break. So keep this in mind, together with documentation. (I've seen this happen a lot in the past.)
As a developer I personally like your second approach, but again, it's a preference. There is no standard for this.
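To make the trade-off concrete: with the consistent wrapper format, a statically typed client needs exactly one model and no branching. A minimal C# sketch with System.Text.Json (the DTO names are invented for illustration; the original format would force Items to be typed as object or handled by custom converters, since it is sometimes a string, sometimes an object, and sometimes an array):

using System;
using System.Text.Json;

// Hypothetical DTOs for the consistent {"Items":[...]} wrapper format.
public record Item(string Name, string Cost);
public record ItemResponse(Item[] Items);

class Client
{
    static void Main()
    {
        // One model covers 0, 1, or N items with no if/else branches.
        var none = JsonSerializer.Deserialize<ItemResponse>("{\"Items\":[]}");
        var two = JsonSerializer.Deserialize<ItemResponse>(
            "{\"Items\":[{\"Name\":\"Item1\",\"Cost\":\"$5\"},{\"Name\":\"Item2\",\"Cost\":\"$3\"}]}");

        Console.WriteLine(none!.Items.Length);   // 0
        Console.WriteLine(two!.Items.Length);    // 2
    }
}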
There are several reasons to use JSON:
It is much more dense and compact, so less data is sent.
In JavaScript you can directly access those properties without parsing anything; you can convert the response into an object and read its attributes (often used for AJAX).
In Java you usually don't need to parse the JSON yourself either; there are several nice libs, e.g. www.json.org/java/index.html.
If you need to know how JSON is built, use Google; there are tons of resources.
To your actual questions:
For web services you can often choose between XML and JSON; as a consumer, try:
https://maps.googleapis.com/maps/api/place/textsearch/json
and
https://maps.googleapis.com/maps/api/place/textsearch/xml
There is no need to format JSON visually; it is not meant to be read the way XML is.
Even if your response has no results, a JSON service usually still returns a response body. Look again at the Google Maps links above: they include a response status, which makes sense for a service.
Nevertheless, the question is whether it is worth converting from XML to JSON at all if there isn't a specific requirement. As Dieter mentioned: it depends on who is already using this service and how it is consumed, which means the surrounding environment is very important.

Debugging json4s read deserialization errors

I am attempting to consume an API that I do not control, which is somewhat poorly documented and somewhat inconsistent. This means that sometimes the API returns a different type than what is documented or what you would normally see. For this example, we'll look at a case where an array was returned in a place where I would normally see a string. That makes for a crappy API, but my real problem is: how can I more easily track those things down? Right now, the errors look something like this:
No usable value for identifier
Do not know how to convert JArray(List(JString(3c8723eceb1a), JString(cba8849e7a2f))) into class java.lang.String
After deciphering the problem (why JValue::toString doesn't emit a JSON string is utterly perplexing to me), I can figure out that the API returned an array where I made my case class only able to deal with Strings. Great. My issue is that finding this discrepancy between my object model and the contents of the JSON seems significantly more difficult than it should be.
Currently, this is my workflow for hunting down decoding errors:
Hope the bad data has some sort of identifying marker. If it does not, then it is much more guesswork, and you have to repeat the following steps for each entry that looks like the bad bits.
Go through the trouble of converting the JArray(List(JString(...), ...)) from the error message into valid JSON, hoping that I encode JSON the same way the API endpoint I got the data from does. If that is not true, I use a JSON formatter (jq) to format all the data consistently.
Locate the place in the source data where the decoding error originates from.
Backtrack through arrays and objects to discover how I need to change my object model to more accurately represent what data is coming back to me from the API.
Some background: I'm coming from C++, where I rolled my own JSON deserialization framework for this purpose. The equivalent error when using the library I built is:
Error decoding value at result.taskInstances[914].subtasks[5].identifier: expected std::string but found array value (["3c8723eceb1a","cba8849e7a2f"]) at 1:4084564
This is my process when using my hand-rolled library:
Look at the expected type (std::string) compared with the data that was actually found (["3c8723eceb1a","cba8849e7a2f"]), and alter my data model at the path given for the data in the source (result.taskInstances[914].subtasks[5].identifier).
As you can see, I get to jump immediately to the problem that I actually have.
My question is: Is there a way to more quickly debug inconsistencies between my data model and the results I'm getting back from the API?
I'm using json4s-native_2.10 version 3.2.8.
A simplified example:
{ "property": ["3c8723eceb1a", "cba8849e7a2f"] }
This does not mesh with the Scala case class:
case class Thing(property: String)
The best solution would be to use Try (http://www.scala-lang.org/api/current/#scala.util.Try) in Scala, but unfortunately the json4s API doesn't use it.
So I think you should use Scala's Option type (http://www.scala-lang.org/api/current/#scala.Option).
In Scala, and more generally in functional languages, Options are used to represent a value that may or may not be there (like a nil value).
To handle parsing failures, you can use parseOpt(str), which returns an Option[JValue] instead of throwing, and pattern match on the resulting value.
To handle extraction into case classes, you can use the extractOpt function and pattern match on the value.
You can read this answer: https://stackoverflow.com/a/15944506/2330361
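A minimal Scala sketch of the Option-based approach against the example above (parseOpt/extractOpt as found in json4s 3.x; names are illustrative):

import org.json4s._
import org.json4s.native.JsonMethods._

case class Thing(property: String)

object DebugExtraction extends App {
  implicit val formats: Formats = DefaultFormats

  val raw = """{ "property": ["3c8723eceb1a", "cba8849e7a2f"] }"""

  // parseOpt is None for unparseable input; extractOpt is None when the
  // JSON shape does not match the case class, instead of throwing.
  parseOpt(raw).flatMap(_.extractOpt[Thing]) match {
    case Some(thing) => println(s"extracted: $thing")
    case None        => println(s"model/JSON mismatch in: $raw")
  }
}

Note that this tells you that extraction failed, not where; for the path-level diagnostics the question asks about, you still have to narrow it down by hand (or extract sub-trees piece by piece).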

Logging Multiple JSON Objects to a Single File - File Format

I have a solution where I need to be able to log multiple JSON Objects to a file. Essentially doing one log file per day. What is the easiest way to write (and later read) these from a single file?
How does MongoDB handle this with BSON? What does it use as a separator between "records"?
Do Protocol Buffers, BSON, MessagePack, etc. offer compression and the record concept? Compression would be a nice benefit.
With Protocol Buffers you could define the messages as follows:

message JSONObject {
  required string JSON = 1;
}

message DailyJSONLog {
  repeated JSONObject JSON = 1;
}
This way you would just read the file into memory and deserialize it; serializing works essentially the same way. Once you have the file (a serialized DailyJSONLog) on disk, you can simply append serialized JSONObjects to the end of it, since the DailyJSONLog message is just a repeated field.
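The append-by-concatenation property is easy to demonstrate. A sketch using the protobuf-net library in C# (attribute-based, so no .proto codegen is needed; the class names mirror the messages above):

using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

[ProtoContract]
class JSONObject
{
    [ProtoMember(1)] public string JSON { get; set; } = "";
}

[ProtoContract]
class DailyJSONLog
{
    [ProtoMember(1)] public List<JSONObject> JSON { get; set; } = new List<JSONObject>();
}

class AppendDemo
{
    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // Writing two logs back to back is the same as appending to a
            // file: a protobuf message is just a sequence of fields, so
            // concatenation merges the repeated entries.
            Serializer.Serialize(ms, new DailyJSONLog { JSON = { new JSONObject { JSON = "{\"a\":1}" } } });
            Serializer.Serialize(ms, new DailyJSONLog { JSON = { new JSONObject { JSON = "{\"b\":2}" } } });

            ms.Position = 0;
            var merged = Serializer.Deserialize<DailyJSONLog>(ms);
            Console.WriteLine(merged.JSON.Count);   // 2
        }
    }
}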
The only issue with this is if you have a LOT of messages each day, or if you want to start at a certain location within the day's log (you're not able to easily seek to the middle, or an arbitrary point, of the repeated list).
I've gotten around this by taking a JSON object, serializing it, and then base64 encoding it. I store these in a file, one record per line. That makes it easy to see how many records are in each file, to access any arbitrary JSON object within the file, and to keep growing the file trivially (you can grow the 'repeated' message above just as trivially, but only as an append-at-the-end operation...).
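A minimal sketch of that newline-delimited base64 format in C# (the file name is an assumption):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

class JsonLog
{
    const string LogPath = "json-log-2019-01-01.txt";   // one file per day; name assumed

    // Append one JSON object as a single base64 line. Base64 output never
    // contains '\n', so the separator cannot collide with record content.
    public static void Append(string json) =>
        File.AppendAllText(LogPath,
            Convert.ToBase64String(Encoding.UTF8.GetBytes(json)) + "\n");

    // Read the records back lazily, one line per record; record N is
    // simply line N, which is what makes arbitrary access easy.
    public static IEnumerable<string> ReadAll() =>
        File.ReadLines(LogPath)
            .Select(line => Encoding.UTF8.GetString(Convert.FromBase64String(line)));

    static void Main()
    {
        Append("{\"event\":\"start\"}");
        Append("{\"event\":\"stop\"}");
        foreach (var record in ReadAll())
            Console.WriteLine(record);
    }
}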
Compression is a different topic. Protocol Buffers will not compress strings. If you were to define a pb message that mirrors the structure of your JSON data, you would get the benefit of pb possibly 'compressing' any integers into their varint encoded format. You will also get less compression if you take the base64 encoding route above.

JSON vs HTML Ajax response

Which is faster: returning the Ajax response as JSON and then processing the JSON response to render the HTML, or just having the Ajax response return the raw HTML in a bunch of <li></li>'s?
Depends. In both cases, the server is simply returning a response with text. If the JSON version of the response requires more characters than the HTML version, that response will take longer to be transmitted back to the client, and vice versa.
But of course there is also the server-side script which must do its work. Perhaps in your case generating JSON is faster than HTML from your server-side script. No way for me to know.
And then there is the client-side processing. You'd have to parse the response to turn it into a true object, and then you'd need to iterate over the resulting object in order to generate the HTML. This will definitely take longer than just taking an HTML response and injecting it into the DOM.
However, I doubt that the performance difference will be noticeable, meaning that your decision about providing a JSON response vs. HTML response should be based on other factors.
As already mentioned, that depends. From a server-side point of view it makes a lot of sense to let the client generate the HTML, because just serializing JSON is faster and takes a lot of strain off the server, which doesn't have to deal with all the HTML generation. An additional benefit is that by returning JSON you offer an API that can be used for more than just outputting HTML.
If you want to take the work off the client it makes sense to generate the HTML on the server side.
In the end the speed of it depends a lot on the technologies used. Both ways can perform extremely well but when done wrong either one will be slow.
Here I produced the same response as both HTML and JSON (the measurement screenshots this refers to are not included). The JSON response weighs about half as many kilobytes as the HTML response, which means a faster server-side response. But in this case you have to rebuild the HTML from the JSON, so let's measure the JSON rebuild time as well and see.
The first measurement is the HTML case, and it takes more time to produce the server response.
Now let's look at appending the result to the HTML document.
The first one is HTML again, and the HTML processing lasts longer than the JSON.