Reading Large JSON file into variable in C#.net - json

I am trying to parse the JSON files and insert into the SQL DB.My parser worked perfectly fine as long as the files are small (less than 5 MB).
I am getting "Out of memory exception" when trying to read the large(> 5MB) files.
if (System.IO.Directory.Exists(jsonFilePath))
{
string[] files = System.IO.Directory.GetFiles(jsonFilePath);
foreach (string s in files)
{
var jsonString = File.ReadAllText(s);
fileName = System.IO.Path.GetFileName(s);
ParseJSON(jsonString, fileName);
}
}
I tried the JSONReader approach, but no luck on getting the entire JSON into string or variable.Please advise.

Use 64 bit, check RredCat's answer on a similar question:
Newtonsoft.Json - Out of memory exception while deserializing big object
NewtonSoft Jason Performance Tips
Read the article by David Cox about tokenizing:
"The basic approach is to use a JsonTextReader object, which is part of the Json.NET library. A JsonTextReader reads a JSON file one token at a time. It, therefore, avoids the overhead of reading the entire file into a string. As tokens are read from the file, objects are created and pushed onto and off of a stack. When the end of the file is reached, the top of the stack contains one object — the top of a very big tree of objects corresponding to the objects in the original JSON file"
Parsing Big Records with Json.NET

The json file is too large to fit in memory, in any form.
You must use a JSON reader that accepts a filename or stream as input. It's not clear from your question which JSON Reader you are using. From which library?
If your JSON reader builds the whole JSON tree, you will still run out of memory. As you read the JSON file, either cherry pick the data you are looking for, or write data structures to another on-disk format that can be easily queried, for example, an sqlite database.

Related

Utf8JsonReader from a Stream

I'm trying to read a sequence of JSON objects from a network stream. This involves finding complete JSON objects and returning them one by one. As soon as a complete JSON object was received, I need to have it. Anything else that follows that JSON object is for the next object and must only be used when the next complete object was received.
I would have thought that the Utf8JsonReader class could do that but apparently it cannot accept a Stream of any kind. It even seems to be unwanted to have that possibility.
Now I'm wondering if it's possible at all to use .NET's shiny new JSON parser to read from a stream when I don't know when data arrives and how much of it. Do I need to split the JSON object messages manually or can the already existing reader just stop when it has something and continue when the next thing is available? I mean, if it can do that on a predefined array of bytes, it could surely also do it with some waiting in between until more data is available. That just doesn't seem to be exposed in the public API. Or did I miss something?
The JsonDocument.ParseAsync(Stream) method cannot be used because it would read the stream to the end. That doesn't make sense for a network stream that stays open for a long time and just reads some data from time to time.

Unable to parse JSON from StreamTransformer on unending stream

New to Dart. It seems that .transform(JsonDecoder()) will hang until the stream is closed or throw an error if it starts to see a new Json object. I could cache the entire strings and parse them that way, but I would like to take advantage of the stream an not store more than is needed in memory.
Is there a way to get the JsonDecoder to push an object to the sink as soon as it gets a complete valid Json Object? I've tried extending some of the internal classes, but only got a private library error.
https://github.com/dart-lang/sdk/blob/1278bd5adb6a857580f137e47bc521976222f7b9/sdk/lib/_internal/vm/lib/convert_patch.dart#L1500 . This seems to be the relevant code and it's really a pain in my butt. Would I need to create a dummy stream or something?
If the input is newline separated, you can do:
Stream jsonObjects = inputStream
.transform(utf8.decoder) // if incoming is bytes.
.transform(const LineSplitter())
.map(jsonDecode);
The JsonDecoder converter only works on a single JSON value, because the JSON grammar doesn't allow more than one value in a JSON source text.
The LineSplitter will buffer until it has an entire line, then emit one line at a time, so if each JSON message is on a line by itself, that makes each event from the line-splitted stream a complete JSON value.

Parsing JSON in VBA, without any external library

I'm really at my wit's end here... I'm using VB-JSON Parser (http://www.ediy.co.nz/vbjson-json-parser-library-in-vb6-xidc55680.html) and I have the following array :
[{"timestamp":1410001952,"tid":2834225,"price":"483.77"}]
The documentation is really minimal and I have no clue whatsoever of how to access the array, been searching for several hours now on how to resolve this.
How can I get the "price" value? I know that i can use .item("price") when there is no array but I don't know what to do when there's an array and there is no name before it.
First have a look at Parsing JSON in Excel VBA
It explains the JScript way of parsing JSON string.
Browsing through the net, I found it really hard to get a complete VBA based JSON parser.
Some options are available in the VB version and then there are few online parsers who promise to parse JSON and convert them in Excel. These ones work fine with simple JSON data structure. But once you feed in a complex data set with nested arrays and structures, they simply fail.
Using JavaScript features of parsing JSON, on top of ScriptControl, we can create a parser in VBA which will list each and every data point inside the JSON. No matter how nested or complex the data structure is, as long as we provide a valid JSON, this parser will return a complete tree structure.
JavaScript’s Eval, getKeys and getProperty methods provide building blocks for validating and reading JSON.
Coupled with a recursive function in VBA we can iterate through all the keys (up to nth level) in a JSON string. Then using a Tree control (used in this article) or a dictionary or even on a simple worksheet, we can arrange the JSON data as required.
Here, you can find a complete VBA example.
There is a JSON serializer in .NET: http://msdn.microsoft.com/en-us/library/system.runtime.serialization.json

Javascript in place of json input step

I am loading data from a mongodb collection to a mysql table through Kettle transformation.
First I extract them using MongodbInput and then I use json input step.
But since json input step has very low performance, I wanted to replace it with a
javacript script.
I am a beginner in Javascript and even though i tried somethings, the kettle javascript script is not recognizing any keywords.
can anyone give me sample code to convert Json data to different columns using javascript?
To solve your problem you need to see three aspects:
Reading from MongoDB
Reading from JSON
Reading from (probably) String
Reading from MongoDB Except if you changed the interface, MongoDB returns not JSON but BSON files (~binary JSON). You need to see the MongoDB documentation about reading and writing BSON: probably something like BSON.to() and BSON.from() but I don't know it by heart.
Reading from JSON Once you have your BSON in JSON format, you can read it using JSON.stringify() which returns a String.
Reading from (probably) String If you want to use the capabilities of JSON (why else would you use JSON?), you also want to use JSON.parse() which returns a JSON object.
My experience is that to send a JSON object from one step to the other, using a String is not a bad idea, i.e. at the end of a JavaScript step, you write your JSON object to a String and at the beginning of the next JavaScript step (can be further down the stream) you parse it back to JSON to work with it.
I hope this answers your question.
PS: writing JavaScript steps requires you to learn JavaScript. You don't have to be a master, but the basics are required. There is no way around it.
you could use the json input step to get the values of this json and put in common rows

Logging Multiple JSON Objects to a Single File - File Format

I have a solution where I need to be able to log multiple JSON Objects to a file. Essentially doing one log file per day. What is the easiest way to write (and later read) these from a single file?
How does MongoDB handle this with BSON? What does it use as a separator between "records"?
Does Protocol Buffers, BSON, MessagePack, etc... offer compression and the record concept? Compression would be a nice benefit.
With protocol buffers you could define the message as follows:
Message JSONObject {
required string JSON = 1;
}
Message DailyJSONLog {
repeated JSONObject JSON = 1;
}
This way you would just read the file from memory and deserialize it. Its essentially the same way for serializing them as well. Once you have the file (serialized DailyJSONLog) on disk, you can easily just append serialized JSONObjects to the end of that file (since the DailyJSONLog message is very simply a repeated field).
The only issue with this is if you have a LOT of messages each day or if you want to start at a certain location during the day (you're not able to easily get to the middle (or arbitrary) of the repeated list).
I've gotten around this by taking a JSONObject, serializing it and then base64 encoding it. I'd store these to a file separating by a new line. This allows you to very easily see how many records are in each file, gain access to any arbitrary JSON object within the file and to trivially keep expanding the file (you can expand the above 'repeated' message as well pretty trivially but it is a one way easy operation...)
Compression is a different topic. Protocol Buffers will not compress strings. If you were to define a pb message to match your JSON message, then you will get the benefit of having pb possibly 'compress' any integers into their [varint][1] encoded format. You will get 'less' compression if you try above base64 encoding route as well.