Unable to parse JSON from StreamTransformer on unending stream - json

New to Dart. It seems that .transform(JsonDecoder()) will hang until the stream is closed, or throw an error if it starts to see a new JSON object. I could cache the entire strings and parse them that way, but I would like to take advantage of the stream and not store more than is needed in memory.
Is there a way to get the JsonDecoder to push an object to the sink as soon as it gets a complete, valid JSON object? I've tried extending some of the internal classes, but only got a private library error.
https://github.com/dart-lang/sdk/blob/1278bd5adb6a857580f137e47bc521976222f7b9/sdk/lib/_internal/vm/lib/convert_patch.dart#L1500 . This seems to be the relevant code and it's really a pain in my butt. Would I need to create a dummy stream or something?

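If the input is newline-separated, you can do:
import 'dart:convert';

Stream<dynamic> jsonObjects = inputStream
    .transform(utf8.decoder) // only needed if the incoming data is bytes
    .transform(const LineSplitter()) // emits one complete line per event
    .map(jsonDecode); // parses each line as its own JSON value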
The JsonDecoder converter only works on a single JSON value, because the JSON grammar doesn't allow more than one value in a JSON source text.
The LineSplitter will buffer until it has an entire line, then emit one line at a time. If each JSON message is on a line by itself, every event from the line-split stream is therefore a complete JSON value.
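To make the buffering behaviour concrete, here is a minimal sketch of the same idea in JavaScript rather than Dart. It is not how LineSplitter is implemented; createLineParser and its callback are hypothetical names, and it assumes the chunks are already decoded strings with "\n" between messages.

```javascript
// Hypothetical helper: buffers text until a full line is available,
// then parses that line as one JSON value.
function createLineParser(onValue) {
  let buffer = ""; // holds only the current, still-incomplete line

  return function push(chunk) {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line.length > 0) onValue(JSON.parse(line)); // one JSON value per line
    }
  };
}

const received = [];
const push = createLineParser((value) => received.push(value));
push('{"id":1}\n{"id"'); // the first object is complete, the second is not
push(':2}\n');           // the second object completes here
```

Only the unfinished tail of the input is kept in memory, which is the property the question asks for.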

Related

Utf8JsonReader from a Stream

I'm trying to read a sequence of JSON objects from a network stream. This involves finding complete JSON objects and returning them one by one. As soon as a complete JSON object has been received, I need to have it. Anything that follows that JSON object belongs to the next object and must only be used once the next complete object has been received.
I would have thought that the Utf8JsonReader class could do that but apparently it cannot accept a Stream of any kind. It even seems to be unwanted to have that possibility.
Now I'm wondering if it's possible at all to use .NET's shiny new JSON parser to read from a stream when I don't know when data arrives and how much of it. Do I need to split the JSON object messages manually or can the already existing reader just stop when it has something and continue when the next thing is available? I mean, if it can do that on a predefined array of bytes, it could surely also do it with some waiting in between until more data is available. That just doesn't seem to be exposed in the public API. Or did I miss something?
The JsonDocument.ParseAsync(Stream) method cannot be used because it would read the stream to the end. That doesn't make sense for a network stream that stays open for a long time and just reads some data from time to time.
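For illustration, splitting the messages manually can be sketched as below. This is JavaScript, not .NET, and it does not use Utf8JsonReader. createObjectFramer is a hypothetical helper that assumes every top-level value is an object and that chunks arrive as decoded text. It tracks brace depth while ignoring braces inside string literals.

```javascript
// Hypothetical helper: splits concatenated JSON objects by tracking nesting depth.
function createObjectFramer(onObject) {
  let pending = "";     // text of the object currently being received
  let depth = 0;        // nesting depth of { } and [ ]
  let inString = false; // true while inside a JSON string literal
  let escaped = false;  // true right after a backslash inside a string

  return function feed(chunk) {
    for (const ch of chunk) {
      // Between objects, ignore everything (assumed to be whitespace) until "{".
      if (depth === 0 && ch !== "{") continue;
      pending += ch;
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
      } else if (ch === "{" || ch === "[") {
        depth++;
      } else if (ch === "}" || ch === "]") {
        depth--;
        if (depth === 0) {
          onObject(JSON.parse(pending)); // a complete top-level object
          pending = "";
        }
      }
    }
  };
}

const objects = [];
const feed = createObjectFramer((obj) => objects.push(obj));
feed('{"a":"}{","b":[1,{"c":2}]}{"d"'); // one full object plus the start of another
feed(':"x\\"y"} ');                     // the rest arrives later
```

Each object is handed over as soon as its closing brace arrives, and the bytes after it stay buffered for the next object.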

Reading Large JSON file into variable in C#.net

I am trying to parse JSON files and insert them into a SQL DB. My parser worked perfectly fine as long as the files were small (less than 5 MB).
I am getting an "Out of memory exception" when trying to read large (> 5 MB) files.
if (System.IO.Directory.Exists(jsonFilePath))
{
    string[] files = System.IO.Directory.GetFiles(jsonFilePath);
    foreach (string s in files)
    {
        var jsonString = File.ReadAllText(s);
        fileName = System.IO.Path.GetFileName(s);
        ParseJSON(jsonString, fileName);
    }
}
I tried the JSONReader approach, but had no luck getting the entire JSON into a string or variable. Please advise.
Use 64 bit, check RredCat's answer on a similar question:
Newtonsoft.Json - Out of memory exception while deserializing big object
Newtonsoft JSON Performance Tips
Read the article by David Cox about tokenizing:
"The basic approach is to use a JsonTextReader object, which is part of the Json.NET library. A JsonTextReader reads a JSON file one token at a time. It, therefore, avoids the overhead of reading the entire file into a string. As tokens are read from the file, objects are created and pushed onto and off of a stack. When the end of the file is reached, the top of the stack contains one object — the top of a very big tree of objects corresponding to the objects in the original JSON file"
Parsing Big Records with Json.NET
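The stack mechanism described in the quote can be sketched roughly as follows. This is illustrative JavaScript, not Json.NET code. The token shape ({ type, value }) and the buildTree name are hypothetical, and the type names only loosely mirror those of a token-based reader. The sketch also keeps a separate root reference instead of leaving the result on the stack.

```javascript
// Hypothetical sketch: builds a tree from a flat token sequence using a stack.
function buildTree(tokens) {
  const stack = [];      // containers that have been opened but not yet closed
  let pendingKey = null; // property name waiting for its value
  let root;

  // Attach a value to the innermost open container, or make it the root.
  const attach = (value) => {
    const parent = stack[stack.length - 1];
    if (parent === undefined) root = value;
    else if (Array.isArray(parent)) parent.push(value);
    else parent[pendingKey] = value;
  };

  for (const token of tokens) {
    switch (token.type) {
      case "StartObject":
      case "StartArray": {
        const container = token.type === "StartObject" ? {} : [];
        attach(container);     // link it into its parent first
        stack.push(container); // then make it the current container
        break;
      }
      case "EndObject":
      case "EndArray":
        stack.pop();
        break;
      case "PropertyName":
        pendingKey = token.value;
        break;
      default: // scalar tokens such as String, Integer, Boolean
        attach(token.value);
    }
  }
  return root;
}

const tree = buildTree([
  { type: "StartObject" },
  { type: "PropertyName", value: "name" },
  { type: "String", value: "x" },
  { type: "PropertyName", value: "tags" },
  { type: "StartArray" },
  { type: "Integer", value: 1 },
  { type: "StartObject" },
  { type: "PropertyName", value: "k" },
  { type: "Boolean", value: true },
  { type: "EndObject" },
  { type: "EndArray" },
  { type: "EndObject" },
]);
```

Reading token by token avoids holding the raw text of the file in a string, but the finished tree still lives in memory.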
The JSON file is too large to fit in memory, in any form.
You must use a JSON reader that accepts a filename or stream as input. It's not clear from your question which JSON reader you are using. From which library?
If your JSON reader builds the whole JSON tree, you will still run out of memory. As you read the JSON file, either cherry-pick the data you are looking for, or write data structures to another on-disk format that can be easily queried, for example an SQLite database.

Logging Multiple JSON Objects to a Single File - File Format

I have a solution where I need to be able to log multiple JSON Objects to a file. Essentially doing one log file per day. What is the easiest way to write (and later read) these from a single file?
How does MongoDB handle this with BSON? What does it use as a separator between "records"?
Do Protocol Buffers, BSON, MessagePack, etc. offer compression and the record concept? Compression would be a nice benefit.
With protocol buffers you could define the message as follows:
message JSONObject {
    required string JSON = 1;
}
message DailyJSONLog {
    repeated JSONObject JSON = 1;
}
This way you would just read the file into memory and deserialize it. Serializing works essentially the same way. Once you have the file (a serialized DailyJSONLog) on disk, you can easily append serialized JSONObjects to the end of that file (since the DailyJSONLog message is simply a repeated field).
The only issue with this is if you have a LOT of messages each day, or if you want to start at a certain location during the day (you can't easily get to the middle, or to an arbitrary position, of the repeated list).
I've gotten around this by taking a JSONObject, serializing it and then base64 encoding it. I'd store these in a file, separated by newlines. This lets you very easily see how many records are in each file, access any arbitrary JSON object within the file, and trivially keep expanding the file (you can expand the above 'repeated' message pretty trivially as well, but that is easy in one direction only...)
Compression is a different topic. Protocol Buffers will not compress strings. If you were to define a pb message to match your JSON message, then you would get the benefit of pb possibly 'compressing' any integers into their varint-encoded format. You will get 'less' compression if you take the base64 encoding route above as well.
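As a rough sketch of the newline-separated base64 layout described above: this version encodes the output of JSON.stringify rather than a serialized protobuf message, keeps the log in a string instead of a file, and uses hypothetical helper names (encodeRecord, countRecords, readRecord). It relies on Node's global Buffer.

```javascript
// Hypothetical helpers for a "one base64 record per line" log format.
function encodeRecord(record) {
  // Base64 output contains no newlines, so "\n" is a safe record separator.
  return Buffer.from(JSON.stringify(record), "utf8").toString("base64") + "\n";
}

function splitRecords(logText) {
  return logText.split("\n").filter((line) => line.length > 0);
}

function countRecords(logText) {
  return splitRecords(logText).length;
}

function readRecord(logText, index) {
  const line = splitRecords(logText)[index];
  return JSON.parse(Buffer.from(line, "base64").toString("utf8"));
}

// Appending is plain concatenation; a real log would append to a file instead.
let log = "";
log += encodeRecord({ event: "start", note: "line one\nline two" });
log += encodeRecord({ event: "stop" });
```

Because a newline inside a record is encoded away, counting lines gives the record count and any line can be decoded on its own.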

AS3 URLVariables Unescaping Data

I have a PHP file that is queried for information, and it passes a couple of variables back. One variable contains a JSON string with a variable in the object called message, which comes escaped to prevent it from causing issues if the message has an ampersand, single quote, etc in it.
&data={"message":"star%27s"}
Obviously the data sent is more complicated, this is just an example. After I take the data passed back by the PHP file and use URLVariables to decode it and access the "data" variable, it ends up looking like:
{"message":"star's"}
At this point I can't parse the JSON string, it will throw an error because of the single quote. Encoding it wouldn't work, it would encode more than just everything after the colon.
Is there a way to keep it from converting it? I was thinking I could manually parse the PHP returned string, but it seems unnecessary and I don't want risk running into issues later on because of it. I looked at the AS3 API and I couldn't find anything documenting this or how to disable it.
Any ideas or suggestions?
Try the ActionScript API functions escape() and unescape(); for more details, see Escape and unescape.
Also look at JSON.parse and JSON.stringify: working-with-native-json-in-flash-player-11.
For JSON decoding in ActionScript, see decode-json.

Stream objects from MongoDB cursor into nodejs HTTP response

NOTE: I don't believe this question is a duplicate of this similar question because it is more specific.
I'm attempting to retrieve multiple objects from Mongo with the nodejs-mongodb-driver and write the objects to an HTTP response as JSON. The objects should be in the form of an array, but I don't want to call toArray() on the cursor because of the memory overhead, and I try to avoid large JSON.stringify calls whenever possible.
var response = ... // an http response
collection.find().stream(JSON.stringify).pipe(response); // causes a malformed JSON string
The object in the browser appears as follows.
{"obj", "obj"}{"obj", "obj"} // clearly malformed
Is there an efficient way to do this?
I will explain the code you wrote so that you understand why it returns malformed JSON and why you probably need toArray() or the JSONStream library from the answer you posted.
First collection.find() returns a Cursor object. At that point no data was read. Then, the .stream(JSON.stringify) call returns a readable Stream with the transformation function JSON.stringify. Still no data read.
The .pipe(response) call then reads the entire Stream to the end and calls the JSON.stringify function for every object. Note that it really does call it for every single object separately and therefore does not create an array. Instead you get your malformed JSON, object after object.
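The effect can be reproduced without MongoDB. In the sketch below, docs is a plain array standing in for the cursor's documents, and asJsonArray is a hypothetical generator that shows the framing (brackets and commas) the per-object output is missing. It illustrates the problem and is not the driver's or JSONStream's implementation.

```javascript
// Hypothetical generator: emits the pieces of a JSON array one chunk at a time.
function* asJsonArray(objects) {
  yield "[";
  let first = true;
  for (const obj of objects) {
    if (!first) yield ",";
    yield JSON.stringify(obj); // still one small stringify call per object
    first = false;
  }
  yield "]";
}

const docs = [{ a: 1 }, { b: 2 }];

// What per-object stringification produces: values back to back, no array.
const concatenated = docs.map((doc) => JSON.stringify(doc)).join("");

// The same objects with array framing added around and between them.
const framed = [...asJsonArray(docs)].join("");
```

Here concatenated is '{"a":1}{"b":2}', which JSON.parse rejects, while framed is '[{"a":1},{"b":2}]'.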
Now the answer in the question you posted as a possible duplicate (Stream from a mongodb cursor to Express response in node.js) would work for you, but it requires an additional library, JSONStream. JSONStream properly handles the CursorStream for JSON output. I don't know if that really reduces the overhead, but you could try it.
Without an additional library you will have to use toArray().