Utf8JsonReader from a Stream

I'm trying to read a sequence of JSON objects from a network stream. This involves finding complete JSON objects and returning them one by one. As soon as a complete JSON object has been received, I need to have it. Anything that follows it belongs to the next object and must only be used once that next object has been received in full.
I would have thought that the Utf8JsonReader class could do that, but apparently it cannot accept a Stream of any kind. It even seems that this possibility is deliberately not offered.
Now I'm wondering whether it's possible at all to use .NET's shiny new JSON parser to read from a stream when I don't know when data arrives or how much of it. Do I need to split the JSON object messages manually, or can the existing reader just stop when it has something and continue when the next piece is available? I mean, if it can do that on a predefined array of bytes, it could surely also do it with some waiting in between until more data is available. That just doesn't seem to be exposed in the public API. Or did I miss something?
The JsonDocument.ParseAsync(Stream) method cannot be used because it would read the stream to the end. That doesn't make sense for a network stream that stays open for a long time and only receives some data from time to time.
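
Here's a rough sketch of the kind of manual buffering I imagine I'd have to do myself (assuming System.Text.Json; handleCompleteObject, the buffer size, buffer growth and error handling are all placeholders): accumulate the incoming bytes, run Utf8JsonReader over whatever has arrived so far with isFinalBlock: false, and hand off each top-level object as soon as its closing brace has been read.

using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

static class JsonFraming
{
    public static async Task ReadObjectsAsync(Stream stream, Action<byte[]> handleCompleteObject)
    {
        var buffer = new byte[16 * 1024];            // growing the buffer for oversized objects is omitted
        int buffered = 0;

        while (true)
        {
            int read = await stream.ReadAsync(buffer, buffered, buffer.Length - buffered);
            if (read == 0) break;                    // remote side closed the stream
            buffered += read;

            int consumed = ExtractObjects(buffer, buffered, handleCompleteObject);

            // Keep any incomplete tail at the front of the buffer for the next read.
            Array.Copy(buffer, consumed, buffer, 0, buffered - consumed);
            buffered -= consumed;
        }
    }

    // Utf8JsonReader is a ref struct and cannot live across an await,
    // so all parsing happens synchronously over the bytes received so far.
    private static int ExtractObjects(byte[] buffer, int length, Action<byte[]> handleCompleteObject)
    {
        var data = new ReadOnlySpan<byte>(buffer, 0, length);
        int totalConsumed = 0;

        while (true)
        {
            // isFinalBlock: false means more data may follow, so an incomplete
            // object just makes Read() return false instead of throwing.
            var reader = new Utf8JsonReader(data.Slice(totalConsumed), isFinalBlock: false, new JsonReaderState());

            int objectEnd = -1;
            while (reader.Read())
            {
                if (reader.TokenType == JsonTokenType.EndObject && reader.CurrentDepth == 0)
                {
                    objectEnd = (int)reader.BytesConsumed;   // one complete top-level object
                    break;
                }
            }

            if (objectEnd < 0)
                return totalConsumed;                // need more bytes; try again on the next read

            handleCompleteObject(data.Slice(totalConsumed, objectEnd).ToArray());
            totalConsumed += objectEnd;
        }
    }
}

So the isFinalBlock/BytesConsumed combination does let the reader stop cleanly on a partial object; the part I'd have to add myself is keeping the unconsumed tail around between reads.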

Related

Unable to parse JSON from StreamTransformer on unending stream

New to Dart. It seems that .transform(JsonDecoder()) will hang until the stream is closed, or throw an error if it starts to see a new JSON object. I could buffer the entire string and parse it that way, but I would like to take advantage of the stream and not store more than is needed in memory.
Is there a way to get the JsonDecoder to push an object to the sink as soon as it gets a complete, valid JSON object? I've tried extending some of the internal classes, but only got a private library error.
https://github.com/dart-lang/sdk/blob/1278bd5adb6a857580f137e47bc521976222f7b9/sdk/lib/_internal/vm/lib/convert_patch.dart#L1500 . This seems to be the relevant code and it's really a pain in my butt. Would I need to create a dummy stream or something?
If the input is newline separated, you can do:
Stream jsonObjects = inputStream
    .transform(utf8.decoder) // if the incoming stream is bytes
    .transform(const LineSplitter())
    .map(jsonDecode);
The JsonDecoder converter only works on a single JSON value, because the JSON grammar doesn't allow more than one value in a JSON source text.
The LineSplitter will buffer until it has an entire line, then emit one line at a time, so if each JSON message is on a line by itself, each event from the line-split stream is a complete JSON value.

serialize JSON string of infinite size

Okay this one has me scratching my head.
I have an API returning a response body that is a JSON string.
The data is such that it will continue to grow every week. That is, I have a single JSON response that is going to potentially grow forever until either: 1) it consumes the world as we know it, or 2) someone on the API side learns about pagination.
So how do I serialize this if I don't know the size?
I'm thinking it has to be done in chunks, as I have already figured out how to chunk the response to file, but how do I serialize in chunks?
This is really a concept problem so responses in any language will work as long as they can get at the heart of the problem.
I've so far tried stream-reading character by character in C# to test for JSON artifacts ({}, [], etc.), but the performance was glacial: about 20 minutes to read 1000 KB.
I feel like I'm reinventing the wheel here...
EDIT: In response to @stdunbar's link.
I've just tried the solution described in the link with JSON.NET's JsonReader class. A couple of problems:
1) There is never any info in JSON.NET's documentation about what a "JSON token" is. Is it the node name and the node value? Is it just the node name?
2) If my data set is potentially infinite, that means my node values are potentially infinite. For example, there is a node in my JSON response containing a list. That list grows as the data does. So if I load that list using JsonReader it might just be 1 TB some day and wreck my server.
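
For reference, here is the kind of token-by-token reading I understood the link to mean (a sketch only; "items" and MyRecord are made-up names for the growing list and its element type). In JSON.NET a "token" is one syntactic element reported by the reader: StartObject, PropertyName, a primitive value, EndArray, and so on. Reading token by token never pulls the whole list into memory; only the single element currently being deserialized is materialized.

using System.IO;
using Newtonsoft.Json;

static class HugeResponseReader
{
    public static void ReadItems(Stream responseStream)
    {
        using var streamReader = new StreamReader(responseStream);
        using var jsonReader = new JsonTextReader(streamReader);
        var serializer = new JsonSerializer();

        while (jsonReader.Read())                    // advances exactly one token at a time
        {
            // Find the property holding the ever-growing list ("items" is a made-up name).
            if (jsonReader.TokenType == JsonToken.PropertyName && (string)jsonReader.Value == "items")
            {
                jsonReader.Read();                   // move onto the StartArray token
                while (jsonReader.Read() && jsonReader.TokenType != JsonToken.EndArray)
                {
                    // Deserialize one element; the rest of the array stays on the stream.
                    var record = serializer.Deserialize<MyRecord>(jsonReader);
                    // ... process record, then let it go out of scope ...
                }
            }
        }
    }
}

class MyRecord
{
    // Shape of a single list element goes here.
}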

Send custom property with value as JSON array

We want to send some events to Application Insights with data showing which features a user owns and which are available for the session. These are variable, and the list of items will probably grow/change as we continue deploying updates. Currently we do this by building a list of properties dynamically at start-up, with values of Available/True.
Since AI formats each event's data as JSON, we thought it would be interesting to send through custom data as JSON so it can be processed in a similar fashion. Having tried to send data as JSON, though, we bumped into an issue where AI seems to store escape characters in the strings:
Eg. if we send a property through with JSON like:
{"Property":[{"Value1"},..]}
It gets saved in AI as:
{\"Property\":[{\"Value1\"},..]} ).
Has anyone successfully sent custom JSON to AI, or is the platform specifically trying to safeguard against such usage? In our case, where we parse the data out in Power BI, being able to send a JSON array would simplify and speed up some of our queries considerably.
AI treats custom properties as strings, so you'd have to stringify any JSON you want to send (and keep it under the length limit for custom property values), and then re-parse it on the other side.
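Something like this (a sketch only, assuming the Microsoft.ApplicationInsights SDK; the event name, property name and feature list are made up):

using System.Collections.Generic;
using System.Text.Json;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

var features = new[] { "FeatureA", "FeatureB" };     // hypothetical feature list built at start-up

var client = new TelemetryClient(TelemetryConfiguration.CreateDefault());
client.TrackEvent("SessionFeatures", new Dictionary<string, string>
{
    // To AI this is just a string; the structure only comes back when you
    // parse it again on the query side (e.g. parse_json() in Analytics, or in Power BI).
    ["Features"] = JsonSerializer.Serialize(features)
});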

Logging Multiple JSON Objects to a Single File - File Format

I have a solution where I need to be able to log multiple JSON Objects to a file. Essentially doing one log file per day. What is the easiest way to write (and later read) these from a single file?
How does MongoDB handle this with BSON? What does it use as a separator between "records"?
Do Protocol Buffers, BSON, MessagePack, etc. offer compression and the record concept? Compression would be a nice benefit.
With protocol buffers you could define the message as follows:
message JSONObject {
  required string JSON = 1;
}

message DailyJSONLog {
  repeated JSONObject JSON = 1;
}
This way you would just read the file into memory and deserialize it. It's essentially the same for serializing them as well. Once you have the file (a serialized DailyJSONLog) on disk, you can easily append serialized JSONObjects to the end of that file (since the DailyJSONLog message is simply a repeated field).
The only issue with this is if you have a LOT of messages each day, or if you want to start at a certain location during the day (you can't easily seek to the middle, or to an arbitrary position, of the repeated list).
I've gotten around this by taking a JSONObject, serializing it and then base64-encoding it. I'd store these in a file, one per line. This allows you to very easily see how many records are in each file, get at any arbitrary JSON object within the file, and trivially keep appending to the file (you can grow the above 'repeated' message pretty trivially as well, but that's only easy as an append operation...).
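A sketch of that read/write logic in C#, assuming JSONObject is the class protoc generates for the message above (Google.Protobuf runtime):

using System;
using System.Collections.Generic;
using System.IO;
using Google.Protobuf;

static class DailyLog
{
    public static void AppendRecord(string path, JSONObject message)
    {
        // One base64-encoded record per line keeps the file append-only and line-addressable.
        string line = Convert.ToBase64String(message.ToByteArray());
        File.AppendAllLines(path, new[] { line });
    }

    public static IEnumerable<JSONObject> ReadRecords(string path)
    {
        foreach (string line in File.ReadLines(path))
            yield return JSONObject.Parser.ParseFrom(Convert.FromBase64String(line));
    }
}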
Compression is a different topic. Protocol Buffers will not compress strings. If you were to define a pb message to match your JSON message, then you would get the benefit of pb possibly 'compressing' any integers into their varint encoded format. You will get 'less' compression if you take the base64 encoding route above as well.

Stream objects from MongoDB cursor into nodejs HTTP response

NOTE: I don't believe this question is a duplicate of this similar question because it is more specific.
I'm attempting to retrieve multiple objects from Mongo with the nodejs-mongodb-driver and write the objects to an HTTP response as JSON. The objects should be in the form of an array, but I don't want to call toArray() on the cursor because of the memory overhead, and I try to avoid large JSON.stringify calls whenever possible.
var response = ... // an http response
collection.find().stream(JSON.stringify).pipe(response); // causes a malformed JSON string
The object in the browser appears as follows.
{"obj", "obj"}{"obj", "obj"} // clearly malformed
Is there an efficient way to do this?
I will explain the code you wrote so that you understand why it returns malformed JSON and why you probably need toArray() or the JSONStream library from the answer you posted.
First, collection.find() returns a Cursor object. At that point no data has been read. Then, the .stream(JSON.stringify) call returns a readable Stream with JSON.stringify as the transformation function. Still no data has been read.
The .pipe(response) call then reads the entire Stream to the end and calls the JSON.stringify function for every object. Note that it really does call it for every single object separately and therefore does not create an array. Instead you get your malformed JSON, object after object.
Now the answer in the question you posted as a possible duplicate (Stream from a mongodb cursor to Express response in node.js) would work for you, but it requires an additional library, JSONStream. JSONStream properly handles the cursor stream for JSON output. I don't know whether that really reduces the overhead, but you could try it.
Without an additional library you will have to use toArray().