Logging Multiple JSON Objects to a Single File

I have a solution where I need to be able to log multiple JSON objects to a file, essentially one log file per day. What is the easiest way to write (and later read) these from a single file?
How does MongoDB handle this with BSON? What does it use as a separator between "records"?
Do Protocol Buffers, BSON, MessagePack, etc. offer compression and a record concept? Compression would be a nice benefit.

With protocol buffers you could define the message as follows:
message JSONObject {
  required string JSON = 1;
}

message DailyJSONLog {
  repeated JSONObject JSON = 1;
}
This way you would just read the file into memory and deserialize it. It's essentially the same for serializing. Once you have the file (a serialized DailyJSONLog) on disk, you can easily append serialized JSONObjects to the end of that file (since the DailyJSONLog message is simply a repeated field).
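A minimal sketch of that append pattern (C# with Google.Protobuf; the generated class and property names are assumptions based on the .proto above):

using System.IO;
using Google.Protobuf;

// Serializing a one-element DailyJSONLog yields exactly the bytes of one
// field-1-tagged JSONObject, so appending keeps the file a valid DailyJSONLog.
var entry = new DailyJSONLog();
entry.JSON.Add(new JSONObject { JSON = "{\"event\":\"login\"}" });
using (var stream = File.Open("2013-01-01.pb", FileMode.Append))
{
    entry.WriteTo(stream);
}
// Later: var log = DailyJSONLog.Parser.ParseFrom(File.ReadAllBytes("2013-01-01.pb"));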
The only issue with this is if you have a LOT of messages each day, or if you want to start at a certain location within the day (you cannot easily seek to the middle, or to an arbitrary position, of the repeated list).
I've gotten around this by taking a JSONObject, serializing it, and then base64-encoding it. I store these in a file, one per line. This lets you very easily see how many records are in each file, access any arbitrary JSON object within the file, and trivially keep appending to the file (you can also append to the 'repeated' message above fairly trivially, but appending is about the only operation it makes easy).
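A minimal sketch of that line-oriented format (C#; assumes the same generated JSONObject class as above):

using System;
using System.IO;
using System.Linq;
using Google.Protobuf;

// Append one record: serialize, base64-encode, and write it as its own line.
static void AppendRecord(string path, JSONObject record) =>
    File.AppendAllText(path, Convert.ToBase64String(record.ToByteArray()) + Environment.NewLine);

// Random access: read record N (zero-based) without parsing the whole file.
static JSONObject ReadRecord(string path, int index) =>
    JSONObject.Parser.ParseFrom(Convert.FromBase64String(File.ReadLines(path).Skip(index).First()));

Counting records is then just counting lines.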
Compression is a different topic. Protocol Buffers will not compress strings. If you were to define a pb message to match your JSON message, then you would get the benefit of pb possibly 'compressing' any integers into their varint-encoded format. You will get less compression with the base64-encoding route above as well, since base64 inflates the payload by about 33%.
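Actual compression would have to come from a separate layer; one common option (not something protobuf gives you) is to gzip the finished daily file, for example in C#:

using System.IO;
using System.IO.Compression;

// Compress yesterday's log once nothing will be appended to it anymore.
using (var input = File.OpenRead("2013-01-01.log"))
using (var output = File.Create("2013-01-01.log.gz"))
using (var gzip = new GZipStream(output, CompressionMode.Compress))
{
    input.CopyTo(gzip);
}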

Related

Utf8JsonReader from a Stream

I'm trying to read a sequence of JSON objects from a network stream. This involves finding complete JSON objects and returning them one by one. As soon as a complete JSON object has been received, I need to have it. Anything that follows belongs to the next object and must only be used once that next object is complete.
I would have thought that the Utf8JsonReader class could do that, but apparently it cannot accept a Stream of any kind. It even seems that this possibility is deliberately not offered.
Now I'm wondering whether it's possible at all to use .NET's shiny new JSON parser to read from a stream when I don't know when data will arrive or how much of it. Do I need to split the JSON object messages manually, or can the existing reader just stop when it has something and continue when the next piece is available? If it can do that on a predefined array of bytes, it could surely also do it with some waiting in between until more data is available. That just doesn't seem to be exposed in the public API. Or did I miss something?
The JsonDocument.ParseAsync(Stream) method cannot be used because it would read the stream to the end. That doesn't make sense for a network stream that stays open for a long time and only delivers some data from time to time.
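For what it's worth, Utf8JsonReader does support incremental input, just not via Stream directly: you feed it chunks with isFinalBlock: false and carry its JsonReaderState across calls. A hedged sketch of that pattern (buffer stitching simplified, chunk size arbitrary):

using System;
using System.IO;
using System.Text.Json;

static void ReadJsonTokens(Stream stream)
{
    var state = new JsonReaderState();
    var buffer = new byte[4096];
    int leftover = 0;
    int read;
    while ((read = stream.Read(buffer, leftover, buffer.Length - leftover)) > 0)
    {
        var span = new ReadOnlySpan<byte>(buffer, 0, leftover + read);
        var reader = new Utf8JsonReader(span, isFinalBlock: false, state);
        while (reader.Read())
        {
            // Tokens surface here as soon as they are complete; tracking
            // StartObject/EndObject depth detects whole top-level objects.
        }
        state = reader.CurrentState;                          // resume point for next chunk
        leftover = span.Length - (int)reader.BytesConsumed;
        span.Slice((int)reader.BytesConsumed).CopyTo(buffer); // keep partial token bytes
    }
}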

How to read a CSV from a string?

I have some data in CSV format. However, it is already a string, since I got it from an HTTP request.
I would like to use DataFrames in order to view the data.
However, I don't know how to parse it, because the CSV package only accepts files, not strings.
One solution would be to write the content of the String into a file, and then to read it out again. But there has to be a better way!
Use IOBuffer(your_string) to wrap the string in an in-memory I/O object that CSV.read can consume:
using CSV, DataFrames
CSV.read(IOBuffer(your_string), DataFrame)

Reading Large JSON file into variable in C#.net

I am trying to parse JSON files and insert them into the SQL DB. My parser works perfectly fine as long as the files are small (less than 5 MB).
I am getting an "Out of memory" exception when trying to read the larger (> 5 MB) files.
if (System.IO.Directory.Exists(jsonFilePath))
{
    string[] files = System.IO.Directory.GetFiles(jsonFilePath);
    foreach (string s in files)
    {
        var jsonString = File.ReadAllText(s);
        fileName = System.IO.Path.GetFileName(s);
        ParseJSON(jsonString, fileName);
    }
}
I tried the JSONReader approach, but had no luck getting the entire JSON into a string or variable. Please advise.
Use 64-bit, and check RredCat's answer on a similar question:
Newtonsoft.Json - Out of memory exception while deserializing big object
Newtonsoft Json Performance Tips
Read the article by David Cox about tokenizing:
"The basic approach is to use a JsonTextReader object, which is part of the Json.NET library. A JsonTextReader reads a JSON file one token at a time. It, therefore, avoids the overhead of reading the entire file into a string. As tokens are read from the file, objects are created and pushed onto and off of a stack. When the end of the file is reached, the top of the stack contains one object — the top of a very big tree of objects corresponding to the objects in the original JSON file"
Parsing Big Records with Json.NET
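A minimal sketch of that tokenizing approach (Json.NET; the array file layout, the Record type, and InsertIntoDb are assumptions for illustration):

using System.IO;
using Newtonsoft.Json;

// Stream the file token by token and deserialize one record at a time,
// assuming the file is one large JSON array of record objects.
using (var file = File.OpenText(fileName))
using (var reader = new JsonTextReader(file))
{
    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            var record = serializer.Deserialize<Record>(reader); // hypothetical type
            InsertIntoDb(record);                                // hypothetical insert
        }
    }
}

This keeps memory bounded to one record at a time instead of the whole file.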
The JSON file is too large to fit in memory, in any form.
You must use a JSON reader that accepts a filename or stream as input. It's not clear from your question which JSON reader you are using, or from which library.
If your JSON reader builds the whole JSON tree, you will still run out of memory. As you read the JSON file, either cherry-pick the data you are looking for, or write the data to another on-disk format that can be easily queried, for example an SQLite database.

Is using multipart/form-data any better than JSON + Base64?

I have a server and I need to upload files along with some fields from the client to the server. I have currently been using standard multipart/form-data.
I have found, however, that using multipart/form-data is not ideal. Objects on my server may have other objects nested within them, and are thus represented as JSON objects with other JSON objects embedded within them.
I would like the client to start making POST/PUT requests using a JSON representation exactly like the one it would expect from a GET request to the server, in a RESTful manner. This way I don't have to flatten fields that might be nested a couple of layers deep within the JSON object in order to use multipart/form-data.
The problem is that JSON doesn't represent binary data, and multipart/form-data doesn't seem to have a way to represent fields nested within the values of other fields. But it does have much better handling of file uploads.
I am at a loss for how to design this. Should I just have the client upload JSON with the fields encoded in base64, and take the 25% hit? Or should I have the JSON object being represented as some sort of "json" variable in a Multipart/form-data request, and have the binary files to be uploaded as another variable?
Should I just have the client upload JSON with the fields encoded in base64, and take the 25% hit?
The hit will be 33%, since 4/3 = 1.33.
Or should I have the JSON object represented as some sort of "json" variable in a multipart/form-data request, and have the binary files uploaded as another variable?
This should work.
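A minimal sketch of that second shape (C# HttpClient purely for illustration; the part names "json" and "file" are placeholders your server would define):

using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

static async Task UploadAsync(string uploadUrl, string jsonPayload, string filePath)
{
    using var client = new HttpClient();
    using var form = new MultipartFormDataContent();
    // One part carries the nested JSON document, the other the raw binary.
    form.Add(new StringContent(jsonPayload, Encoding.UTF8, "application/json"), "json");
    form.Add(new StreamContent(File.OpenRead(filePath)), "file", Path.GetFileName(filePath));
    var response = await client.PostAsync(uploadUrl, form);
    response.EnsureSuccessStatusCode();
}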
You might also consider this approach: send all files using multipart, then get back identifiers for the files in the response. Put those identifiers in your JSON and send it any way you like. This approach can be beneficial if you have many scenarios in which you send files: you can always send them to the server with the same kind of request, get their identifiers, and after that do with them what you like.

Send PDF as byte[] / JSON problem

I am trying to send a generated PDF file (Apache FOP) to the client. I know this can be done by writing the array to the response stream and by setting the correct content type, length and so on in the servlet. My problem is that the whole app was built based on the idea that it will only receive/send JSON. In the servlet's service() method, I have this:
response.setContentType("application/json");
reqBroker.process(request, response);
RequestBroker is the class that processes the JSON (with the Jackson processor); everything is generic and I cannot change it. On top of this, I have to receive the JSON from the request correctly, to access the data and generate my PDF. So those two lines are necessary. But when I send the response, I need a different content type so that the PDF is displayed correctly in the browser.
So far, I am able to send the byte array as part of the JSON, but then I don't know how to display the array as a PDF on the client (if something like this is even possible).
I would like some suggestions on how I can send my PDF and set the right header, without messing with the JSON. Thanks.
JSON and byte arrays don't mix.
Instead, you should create an <iframe> and point it to a URL that returns a raw PDF.
Take a look here: How to send pdf in json. It lists a couple of approaches that you can consider. The easiest way is to convert the binary data into a string using Base64 encoding. In C#, this would mean a call to Convert.ToBase64String (and Convert.FromBase64String on the receiving side). However, this has a space overhead, as Base64 encoding means around 33% more memory. If you can get away with it, this is the least complicated solution. If the additional size is an issue, you can think about zipping it up.
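A minimal sketch of that Base64 route (C# with Json.NET purely for illustration; pdfBytes, the field names, and the JSON shape are assumptions):

using System;
using Newtonsoft.Json;

// Wrap the generated PDF bytes in a JSON envelope the client can decode.
string json = JsonConvert.SerializeObject(new
{
    fileName = "report.pdf",         // hypothetical field
    contentType = "application/pdf", // hypothetical field
    data = Convert.ToBase64String(pdfBytes)
});

A browser client can then decode the data field (e.g. with atob and a Blob) and show it in an <iframe> or trigger a download.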