How to parse "streamed" json objects with json4s? - json

I have a streaming source that produces many JSON objects without separators (or with only whitespace in between). If I pass that to json4s's parse function, it only produces an AST for the first object.
As a workaround, I could parse it manually and either turn it into a JSON array by adding brackets and commas as appropriate or chunk it and call parse on each chunk.
However, this is a rather common format, so I'm sure the problem has already been solved. I just cannot find the API for it in the json4s documentation.

If you are reading it from an InputStream, then use a BufferedInputStream wrapper with mark(), read(), and reset() calls to skip the whitespace between parse() calls:
val in = new BufferedInputStream(new FileInputStream("/tmp/your.json"))
try {
  var continue = true
  in.mark(1)
  do {
    in.reset()
    // <-- here should be the call to parse
    // skip whitespace, or exit if EOF is found
    var b = 0
    do {
      in.mark(1)
      b = in.read()
      if (b < 0) continue = false
    } while (Character.isWhitespace(b))
  } while (continue)
} finally in.close()
EDIT: Today I released version 0.11.0 of jsoniter-scala, which adds the ability to parse streamed JSON values or JSON arrays without holding all the values in memory.

Related

Why does one form of file iteration work but the other throws a "%" exception? (working with JSON.parse in Google Apps Script)

I was trying to use the method found here (see the most up-voted answer):
Google Apps Script Fastest way to find a row?
I currently use the code below; while it does work, I wanted to try the method linked above, yet the error appears when I replace the following code
function AutoPopulate(evalue) {
  // uses a Google Drive file iterator to read in the JSON file and parse it
  // into a JavaScript object that we can work with
  var iter = DriveApp.getFilesByName("units.json");
  // iterate through all the files named units.json
  while (iter.hasNext()) {
    // define a File object variable and get the file contents as a string
    var file = iter.next();
    var jsonFile = file.getBlob().getDataAsString();
    // log the contents of the file
    //Logger.log(jsonFile);
  }
  var UnitDatabase = JSON.parse(jsonFile);
  //Logger.log(UnitDatabase);
  //Logger.log(UnitDatabase[1027]);
  return UnitDatabase[evalue];
}
WITH THIS CODE:
function AutoPopulate(evalue) {
  // this method did not work for me, but should have according to the Stack Overflow
  // answer linked above; I am trying to understand why it threw an error
  var jsonFile = DriveApp.getFilesByName("units.json").next(),
      UnitDatabase = UnitDatabase.getBlob().getDataAsString();
  return UnitDatabase[evalue];
}
I get an error in the execution indicating that there is a "%" at position 0 in the JSON. Between the methods I don't alter the JSON file in any way, so I don't understand why the top method works but the bottom one does not.
For further information, the idea behind the code is that I have a list of unit numbers and model numbers in a spreadsheet. I then convert this to a JSON file; this, however, is only done when a new unit is added to the fleet. As I learned, one can parse a whole JSON file into a JavaScript object, which makes working with the data set much faster. This JavaScript object is used so that when a user enters a UNIT# the MODEL# is auto-populated based on the JSON file.
I cannot share the JSON file as it contains client information.
Your code does not work for two reasons:
You have a typo in the line UnitDatabase = UnitDatabase.getBlob()... - it should be UnitDatabase = jsonFile.getBlob()...
If you want to retrieve a nested object from a JSON file, you need to parse the JSON - otherwise it is considered a string and you cannot access the nested structure.
Modified working code:
function AutoPopulate2(evalue) {
  var jsonFile = DriveApp.getFilesByName("units.json").next();
  var UnitDatabase = JSON.parse(jsonFile.getBlob().getDataAsString());
  return UnitDatabase[evalue];
}
Mind that this code will only work if you have a "units.json" file on your Drive and if evalue is a valid first-level key of this JSON.

JSON.parse and JSON.stringify are not idempotent and that is bad

This question is multipart-
(1a) JSON is fundamental to JavaScript, so why is there no JSON type? A JSON type would be a string that is formatted as JSON. It would be marked as parsed/stringified until the data was altered. As soon as the data was altered, it would no longer be marked as JSON and would need to be re-parsed/re-stringified.
(1b) In some software systems, isn't it possible to (accidentally) attempt to send a plain JS object over the network instead of a serialized JS object? Why not make an attempt to avoid that?
(1c) Why can't we call JSON.parse on a straight up JavaScript object without stringifying it first?
var json = { // JS object in proper JSON format
  "baz": {
    "1": 1,
    "2": true,
    "3": {}
  }
};
var json0 = JSON.parse(json); //will throw a parse error...bad...it should not throw an error if json var is actually proper JSON.
So we have no choice but to do this:
var json0 = JSON.parse(JSON.stringify(json));
However, there are some inconsistencies, for example:
JSON.parse(true); //works
JSON.parse(null); //works
JSON.parse({}); //throws error
(2) If we keep calling JSON.parse on the same object, eventually it will throw an error. For example:
var json = { // same object as above
  "baz": {
    "1": 1,
    "2": true,
    "3": {}
  }
};
var json1 = JSON.parse(JSON.stringify(json));
var json2 = JSON.parse(json1); //throws an error...why
(3) Why does JSON.stringify keep adding more and more slashes to its input? Not only is the result hard to read when debugging, it actually puts you in a dangerous state: one JSON.parse call won't give you back a plain JS object; you have to call JSON.parse several times to get the plain JS object back. This is bad, and it means it is quite dangerous to call JSON.stringify more than once on a given JS object.
var json = {
  "baz": {
    "1": 1,
    "2": true,
    "3": {}
  }
};
var json2 = JSON.stringify(json);
console.log(json2);
var json3 = JSON.stringify(json2);
console.log(json3);
var json4 = JSON.stringify(json3);
console.log(json4);
var json5 = JSON.stringify(json4);
console.log(json5);
(4) What is the name for a function that we should be able to call over and over without changing the result (IMO how JSON.parse and JSON.stringify should behave)? The best term for this seems to be "idempotent" as you can see in the comments.
(5) Considering JSON is a serialization format that can be used for networked objects, it seems totally insane that you can't call JSON.parse or JSON.stringify twice or even once in some cases without incurring some problems. Why is this the case?
If you are someone who is inventing the next serialization format for Java, JavaScript or whatever language, please consider this problem.
IMO there should be two states for a given object: a serialized state and a deserialized state. In languages with stronger type systems, this isn't usually a problem. But with JSON in JavaScript, if we call JSON.parse twice on the same object, we run into fatal exceptions. Likewise, if we call JSON.stringify twice on the same object, we can get into an unrecoverable state. Like I said, there should be two states and two states only: plain JS object and serialized JS object.
1) JSON.parse expects a string; you are feeding it a JavaScript object.
2) Similar issue to the first one, but mirrored: you feed an object to a function that needs a string.
3) Stringify can take a string, but it then treats it as a string value and escapes the quotes and slashes inside it, just as it would for any string, so that the result is itself valid JSON. Each repeated call escapes the output of the previous one, which is where the extra slashes come from.
4) You can write your own function for this.
5) Because you are trying to do a conversion that is illegal. This is related to the first and second questions. As long as the correct types are fed in, you can call them as many times as you want. The only problem is the extra slashes, but that is in fact the standard.
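A short demonstration of why the three examples above behave inconsistently: JSON.parse coerces a non-string argument to a string before parsing, so the result depends entirely on what that coercion yields:
// JSON.parse coerces its argument to a string before parsing:
JSON.parse(true); // String(true) === "true" -> valid JSON -> true
JSON.parse(null); // String(null) === "null" -> valid JSON -> null
JSON.parse({});   // String({}) === "[object Object]" -> SyntaxError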
We'll start with this nightmare of your creation: string input and integer output.
IJSON.parse(IJSON.stringify("5")); //=> 5
The built-in JSON functions would not fail us this way: string input and string output.
JSON.parse(JSON.stringify("5")); //=> "5"
JSON must preserve your original data types
Think of JSON.stringify as a function that wraps your data up in a box, and JSON.parse as the function that takes it out of a box.
Consider the following:
var a = JSON.stringify;
var b = JSON.parse;
var data = "whatever";
b(a(data)) === data; // true
b(b(a(a(data)))) === data; // true
b(b(b(a(a(a(data)))))) === data; // true
That is, if we put the data in 3 boxes, we have to take it out of 3 boxes. Right?
If I put my data in 2 boxes and take it out of 1, I'm not holding my data yet, I'm holding a box that contains my data. Right?
b(a(a(data))) === data; // false
Seems sane to me...
JSON.parse unboxes your data. If it is not boxed, it cannot unbox it. JSON.parse expects a string input and you're giving it a JavaScript object literal.
The first valid call to JSON.parse would return an object. Calling JSON.parse again on this object output would result in the same failure as #1.
Repeated calls to JSON.stringify will "box" our data multiple times, so of course you then have to use repeated calls to JSON.parse to get your data out of each "box".
Idempotence
No, this is perfectly sane. You can't triple-stamp a double-stamp.
You'd never make a mistake like this, would you?
var json = IJSON.stringify("hi");
IJSON.parse(json);
//=> "hi"
OK, that's idempotent, but what about
var json = IJSON.stringify("5");
IJSON.parse(json);
//=> 5
UH OH! We gave it a string each time, but the second example returns an integer. The input data type has been lost!
Would the JSON functions have failed us here?
var json = JSON.stringify("hi");
JSON.parse(json);
//=> "hi"
All good. And what about the "5" ?
var json = JSON.stringify("5");
JSON.parse(json);
//=> "5"
Yay, the types have been preserved! JSON works; IJSON does not.
Maybe a more real-life example:
OK, so you have a busy app with a lot of developers working on it. It makes reckless assumptions about the types of your underlying data. Let's say it's a chat app that makes several transformations on messages as they move from point to point.
Along the way you'll have:
IJSON.stringify
data moves across a network
IJSON.parse
Another IJSON.parse because who cares? It's idempotent, right?
String.prototype.toUpperCase — because this is a formatting choice
Let's see the messages
bob: 'hi'
// 1) '"hi"', 2) <network>, 3) "hi", 4) "hi", 5) "HI"
Bob's message looks fine. Let's see Alice's.
alice: '5'
// 1) '5'
// 2) <network>
// 3) 5
// 4) 5
// 5) Uncaught TypeError: message.toUpperCase is not a function
Oh no! The server just crashed. You'll notice it's not even the repeated calling of IJSON.parse that failed here. It would've failed even if you called it once.
Seems like you were doomed from the start... Damned reckless devs and their careless data handling!
It would fail if Alice used any input that happened to also be valid JSON
alice: '{"lol":"pwnd"}'
// 1) '{"lol":"pwnd"}'
// 2) <network>
// 3) {lol:"pwnd"}
// 4) {lol:"pwnd"}
// 5) Uncaught TypeError: message.toUpperCase is not a function
OK, unfair example maybe, right? You're thinking, "I'm not that reckless, I wouldn't call IJSON.stringify or IJSON.parse on user input like that!"
It doesn't matter. You've fundamentally broken JSON because the original types can no longer be extracted.
If I box up a string using IJSON, and then unbox it, who knows what I will get back? Certainly not you, and certainly not the developer using your reckless function.
"Will I get a string type back?"
"Will I get an integer?"
"Maybe I'll get an object?"
"Maybe I will get cake. I hope it's cake"
It's impossible to tell!
You're in a whole new world of pain because you've been careless with your data types from the start. Your types are important so start handling them with care.
JSON.stringify expects your unserialized data, and JSON.parse expects a string.
Now do you see the light?
I'll try to give you one reason why JSON.parse cannot be called multiple times on the same data without us having a problem.
You might not know it, but a JSON document does not have to be an object.
This is a valid JSON document:
"some text"
Let's store the representation of this document inside a JavaScript variable:
var JSONDocumentAsString = '"some text"';
and work on it:
var JSONdocument = JSON.parse(JSONDocumentAsString);
JSONdocument === 'some text';
This will cause an error, because this string is not the representation of a JSON document:
JSON.parse(JSONdocument);
// SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data
In this case, how could JSON.parse have guessed that JSONdocument (being a string) was already a parsed JSON document and that it should have been returned untouched?

cordova readAsText returns a JSON string that can't be parsed to a JSON object

I read my JSON file using HTTP and the cordova-file readAsText functions.
The HTTP request returns an object, which is OK.
The cordova-file readAsText function returns a string which contains extra "\r\n" characters. This makes it impossible to use JSON.parse(evt.target.result).
function readJson(absPath, success, failed) {
  window.resolveLocalFileSystemURL(absPath, function (entry) {
    entry.file(function (file) {
      var reader = new FileReader();
      reader.onloadend = function (evt) {
        success(evt.target.result);
      };
      reader.readAsText(file);
    }, failed);
  }, failed);
}
readJson(cordova.file.dataDirectory + 'my.json', function (res) {
  console.log(JSON.parse(res)); // here I get a parsing error due to the presence of \r\n characters
}, failed);
How to read JSON files using cordova?
UPDATE:
Funny thing: the following works:
a = '{\r\n"a":"1",\r\n"b":"2"\r\n}';
b = JSON.parse(a);
So the problem is not only with \r\n... there is something else that is added by cordova readAsText.
UPDATE 2:
As a workaround I now use var object = eval("(" + res + ")")
I'm still searching for a general way to load JSON objects...
No one has answered this and I just had to solve it for my project, so I will post my solution.
The readAsText method outputs a string, so you CAN actually run a replace on it, but what you need to do is use a RegExp to find the newline character. Here's my solution:
var sanitizerRegex = new RegExp(String.fromCharCode(10), 'g');
var sanitizedData = JSON.parse(result.replace(sanitizerRegex, ''));
I've used the String method fromCharCode to get the specific newline character, and the "g" flag to match all instances in the entire string. The problem with a plain string replace is that you can't match the newline using the two characters backslash and "n": the issue is the actual newline character, which is merely represented as "\n".
I do not know the reason JSON.parse can't handle the newline character, or why the file plugin introduces this problem, but this solution seems to work for me.
Also, NEVER use eval like this if you can avoid it, especially on input from a source like a JSON file. Even in a cordova app, using eval is potentially very unsafe.
I found the solution after debugging deeply. The text returned by the readAsText function has one extra character at the first position.
Example:
{"name":"John"} => ?{"name":"John"} (the API didn't literally return "?"; it is a placeholder for one invisible character)
I confirmed this via the length of the result, so we need to use substr(1) before parsing the JSON.
fileContent = fileContent.substr(1);
var jData = jQuery.parseJSON(fileContent);
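The extra character at position 0 is most likely a UTF-8 byte-order mark (BOM), which reads as the invisible character U+FEFF; that is an assumption, but it matches the symptoms. If so, a more targeted fix than a blind substr(1) is to strip the character only when it is actually present:
// strip a leading BOM (U+FEFF) only if it is present, then parse
if (fileContent.charCodeAt(0) === 0xFEFF) {
  fileContent = fileContent.substr(1);
}
var jData = JSON.parse(fileContent);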

Map to String/String to Map conversion in Groovy

I have a JSON object that gets passed into a save function as
{
    "markings": {
        "headMarkings": "Brindle",
        "leftForeMarkings": "",
        "rightForeMarkings": "sock",
        "leftHindMarkings": "sock",
        "rightHindMarkings": "",
        "otherMarkings": ""
    }
}
** EDIT **
The system parses it and passes it to my function as a mapping. I don't actually have the JSON; although it wouldn't be difficult to build up the JSON myself, it just seems like overkill.
** END EDIT **
The toString() function ends up putting the results into the database as
"[rightForeMarkings:, otherMarkings:, leftForeMarkings:sock, leftHindMarkings:sock, rightHindMarkings:, headMarkings:brindle]"
I then want to save that as a string (fairly easy) by calling
params.markings.toString()
From here, I save the info and return the updated information.
My issue is that since I am storing the object in the DB as a string, I can't seem to get the markings back out as a map (to then be converted to JSON).
I have tried a few different things to no avail, although it is completely possible that I went about something incorrectly with these...
Eval.me(Item.markings)
evaluate(Item.markings)
Item.markings.toList()
Thanks in advance for the help!
Throwing in my tests.
Using JSON converters in Grails, I think this should be the approach (similar to @JamesKleeh's and @GrailsGuy's answers):
def json = '''{
    "markings": {
        "headMarkings": "Brindle",
        "leftForeMarkings": "",
        "rightForeMarkings": "sock",
        "leftHindMarkings": "sock",
        "rightHindMarkings": "",
        "otherMarkings": ""
    }
}'''
def jsonObj = grails.converters.JSON.parse(json)
// This is your JSON object that should be passed in to the method
print jsonObj // [markings:[rightForeMarkings:sock, otherMarkings:, leftForeMarkings:, leftHindMarkings:sock, rightHindMarkings:, headMarkings:Brindle]]

def jsonStr = jsonObj.toString()
// This is the string which should be persisted in the db
assert jsonStr == '{"markings":{"rightForeMarkings":"sock","otherMarkings":"","leftForeMarkings":"","leftHindMarkings":"sock","rightHindMarkings":"","headMarkings":"Brindle"}}'

// Get back a json obj from the json str
def getBackJsonObj = grails.converters.JSON.parse(jsonStr)
assert getBackJsonObj.markings.leftHindMarkings == 'sock'
If I understand correctly, you want to convert a String to a JSON object? You can actually bypass converting it to a map and parse it directly as a JSON object:
import grails.converters.JSON
def json = JSON.parse(Item.markings)
This will give you your entire JSON object, and then you can just reference the values as you would a map.
Edit #2:
So apparently there is no "safe" way to convert that string back to a map without something custom. I would recommend saving the structure in the database as it originally comes in. If you can do that, then all you would need is JSON.parse()

How to parse a large, newline-delimited JSON file with the JSONStream module in node.js?

I have a large JSON file; it is newline-delimited JSON, where multiple standard JSON objects are delimited by extra newlines, e.g.
{"name":"1","age":5}
{"name":"2","age":3}
{"name":"3","age":6}
I am now using JSONStream in node.js to parse the large JSON file; the reason I use JSONStream is that it is based on streams.
However, neither parse syntax from the examples helps me parse this file, where there is a separate JSON object on each line:
var parser = JSONStream.parse(['rows', true]);
var parser = JSONStream.parse([/./]);
Can someone help me with that?
Warning: since this answer was written, the author of the JSONStream library removed the emit-root-event functionality, apparently to fix a memory leak.
Future users of this library: if you need the emit-root functionality, you can use the 0.x.x versions.
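For example, pinning the dependency to the old major line with a standard npm semver range (assuming, per the warning, that any 0.x release still emits the root event):
npm install JSONStream@0.x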
Below is the unmodified original answer:
From the readme:
JSONStream.parse(path)
path should be an array of property names, RegExps, booleans, and/or functions. Any object that matches the path will be emitted as 'data'.
A 'root' event is emitted when all data has been received. The 'root' event passes the root object & the count of matched objects.
In your case, since you want to get back the JSON objects as opposed to specific properties, you will be using the 'root' event and you don't need to specify a path.
Your code might look something like this:
var fs = require('fs'),
    JSONStream = require('JSONStream');

var stream = fs.createReadStream('data.json', {encoding: 'utf8'}),
    parser = JSONStream.parse();

stream.pipe(parser);

parser.on('root', function (obj) {
  console.log(obj); // whatever you will do with each JSON object
});
JSONStream is intended for parsing a single huge JSON object, not many separate JSON objects. You want to split the stream at newlines and then parse each line as JSON.
The npm package split claims to do this splitting, and it even has a feature to parse the JSON lines for you; see the sketch below.
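A minimal sketch of that approach (the JSON.parse mapper argument is the parsing feature mentioned above, as described in split's README; the error handler matters because a blank line would make JSON.parse throw):
var fs = require('fs'),
    split = require('split');

fs.createReadStream('data.json')
  .pipe(split(JSON.parse)) // split at newlines and parse each line
  .on('data', function (obj) {
    console.log(obj); // each line is now a parsed JSON object
  })
  .on('error', function (err) {
    console.error(err); // blank or malformed lines end up here
  });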
If your file is not too large, here is an easy but not performant solution:
const fs = require('fs');

let rawdata = fs.readFileSync('fileName.json');
let convertedData = String(rawdata)
  .replace(/\n/gi, ',')
  .slice(0, -1);
let JsonData = JSON.parse(`[${convertedData}]`);
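Note that the replace-with-commas trick produces invalid JSON (two consecutive commas) if the file contains blank lines between the objects, and the trailing .slice(0, -1) assumes the file ends with a newline. A sketch of a variant that tolerates both, under the same read-the-whole-file assumption:
const fs = require('fs');

const jsonData = String(fs.readFileSync('fileName.json'))
  .split('\n')
  .filter((line) => line.trim() !== '') // drop blank separator lines
  .map((line) => JSON.parse(line));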
I created a package @jsonlines/core which parses jsonlines as an object stream.
You can try the following code:
npm install @jsonlines/core
const fs = require("fs");
const { parse } = require("#jsonlines/core");
// create a duplex stream which parse input as lines of json
const parseStream = parse();
// read from the file and pipe into the parseStream
fs.createReadStream(yourLargeJsonLinesFilePath).pipe(parseStream);
// consume the parsed objects by listening to data event
parseStream.on("data", (value) => {
console.log(value);
});
Note that parseStream is a standard node duplex stream.
So you can also consume it with for await ... of or in other ways, as in the sketch below.
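For example, a minimal consumption loop (a sketch assuming the parseStream set up above; a duplex stream's readable side is async-iterable in modern Node):
// consume the parsed objects with for await ... of
async function consume() {
  for await (const value of parseStream) {
    console.log(value);
  }
}
consume();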
Here's another solution for when the file is small enough to fit into memory. It reads the whole file in one go, converts it into an array by splitting it at the newlines (removing the blank line at the end), and then parses each line.
import fs from "fs";
const parsed = fs
.readFileSync(`data.jsonl`, `utf8`)
.split(`\n`)
.slice(0, -1)
.map(JSON.parse)