How to parse a large, newline-delimited JSON file with the JSONStream module in Node.js?

I have a large JSON file; it is newline-delimited JSON, where multiple standard JSON objects are delimited by newlines, e.g.
{"name":"1","age":5}
{"name":"2","age":3}
{"name":"3","age":6}
I am now using JSONStream in Node.js to parse this large JSON file; the reason I use JSONStream is that it is stream-based.
However, neither parse syntax from the examples helps me parse this file, where each line holds a separate JSON object:
var parser = JSONStream.parse(['rows', true]);
var parser = JSONStream.parse([/./]);
Can someone help me with that?

Warning: Since this answer was written, the author of the JSONStream library removed the emit root event functionality, apparently to fix a memory leak.
Future users of this library, you can use the 0.x.x versions if you need the emit root functionality.
Below is the unmodified original answer:
From the readme:
JSONStream.parse(path)
path should be an array of property names, RegExps, booleans, and/or functions. Any object that matches the path will be emitted as 'data'.
A 'root' event is emitted when all data has been received. The 'root' event passes the root object & the count of matched objects.
In your case, since you want to get back the JSON objects as opposed to specific properties, you will be using the 'root' event and you don't need to specify a path.
Your code might look something like this:
var fs = require('fs'),
    JSONStream = require('JSONStream');

var stream = fs.createReadStream('data.json', { encoding: 'utf8' }),
    parser = JSONStream.parse();

stream.pipe(parser);

parser.on('root', function (obj) {
  console.log(obj); // whatever you will do with each JSON object
});

JSONStream is intended for parsing a single huge JSON object, not many separate JSON objects. You want to split the stream at newlines, then parse each line as JSON.
The npm package split claims to do this splitting, and it even has a feature to parse the JSON lines for you.
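A minimal sketch of that approach, assuming the same newline-delimited data.json file as in the question (install split with npm install split):

var fs = require('fs');
var split = require('split');

fs.createReadStream('data.json', { encoding: 'utf8' })
  .pipe(split()) // emits one 'data' event per line
  .on('data', function (line) {
    if (!line) return; // skip blank lines, e.g. a trailing newline
    var obj = JSON.parse(line);
    console.log(obj); // whatever you will do with each JSON object
  });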

If your file is not too large, here is an easy, but not performant, solution:
const fs = require('fs');

let rawdata = fs.readFileSync('fileName.json');
let convertedData = String(rawdata)
  .replace(/\n/g, ',') // turn each newline into a comma
  .slice(0, -1);       // drop the trailing comma (assumes the file ends with a newline)
let jsonData = JSON.parse(`[${convertedData}]`);

I created a package @jsonlines/core which parses JSON Lines as an object stream.
You can try the following code:
npm install @jsonlines/core
const fs = require("fs");
const { parse } = require("@jsonlines/core");

// create a duplex stream which parses input as lines of JSON
const parseStream = parse();

// read from the file and pipe into the parseStream
fs.createReadStream(yourLargeJsonLinesFilePath).pipe(parseStream);

// consume the parsed objects by listening to the data event
parseStream.on("data", (value) => {
  console.log(value);
});
Note that parseStream is a standard Node duplex stream, so you can also consume it with for await ... of or in other ways.
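For illustration, a sketch of the same thing using async iteration instead of the 'data' event (same yourLargeJsonLinesFilePath placeholder as above):

const fs = require("fs");
const { parse } = require("@jsonlines/core");

async function main() {
  // pipe the file through the jsonlines parser and iterate over the parsed objects
  const parseStream = fs.createReadStream(yourLargeJsonLinesFilePath).pipe(parse());

  for await (const value of parseStream) {
    console.log(value);
  }
}

main().catch(console.error);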

Here's another solution for when the file is small enough to fit into memory. It reads the whole file in one go, converts it into an array by splitting it at the newlines (removing the blank line at the end), and then parses each line.
import fs from "fs";

const parsed = fs
  .readFileSync(`data.jsonl`, `utf8`)
  .split(`\n`)
  .slice(0, -1)
  .map(JSON.parse);

Related

Can you use 'require' in react to import a library?

In my React project, I'm trying to convert XML data from an API call into JSON (using a library called xml-js).
As per the documentation, I'm importing the library in my parent component as follows:
const convert = require('xml-js')
and then attempting to convert the API data as follows:
const beerList = `
  <Product>
    <Name>Island Life IPA</Name>
    <Volume>300ml/473ml</Volume>
    <Price>$10/$13</Price>
    <ABV>6.3%</ABV>
    <Handpump>No</Handpump>
    <Brewery>Eddyline</Brewery>
    <IBU/>
    <ABV>6.3%</ABV>
    <Image>islandlife.png</Image>
    <Country>New Zealand</Country>
    <Description>Fruited IPA</Description>
    <Pouring>Next</Pouring>
    <IBU/>
    <TapBadge/>
    <Comments/>
  </Product>`
const beerJs = convert(beerList, {compact: true, spaces: 4})
The errors are telling me that 'convert' is not a function, which tells me that the library isn't being imported. So is the issue with using 'require' syntax, and if so, what alternative would work in React?
"which tells me that the library isn't imported"
No. If that were the case, you wouldn't even get that far; your require call would throw an error.
Instead, it tells you that convert is not a function - which it isn't! Look at it in a debugger or log it, and you'll see it's an object with several functions inside. You can't call an object like a function.
Take a look at the xml-js docs again:
This library provides 4 functions: js2xml(), json2xml(), xml2js(), and xml2json(). Here are the usages for each one (see more details in the following sections):
var convert = require('xml-js');
result = convert.js2xml(js, options); // to convert javascript object to xml text
result = convert.json2xml(json, options); // to convert json text to xml text
result = convert.xml2js(xml, options); // to convert xml text to javascript object
result = convert.xml2json(xml, options); // to convert xml text to json text
So the solution is to call convert.xml2json and not convert:
const beerJs = convert.xml2json(beerList, {compact: true, spaces: 4})
Or maybe you want an actual object and not a JSON string; then you'd use convert.xml2js (in which case the spaces option is useless):
const beerJs = convert.xml2js(beerList, {compact: true})
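For illustration, here is roughly how you could then read values off the resulting object, reusing the beerList string from the question; the _text wrapper is how xml-js represents text nodes in compact mode, but double-check the shape against your own output:

const convert = require('xml-js')

// beerList is the XML string from the question above
const beerJs = convert.xml2js(beerList, {compact: true})
console.log(beerJs.Product.Name._text)    // 'Island Life IPA'
console.log(beerJs.Product.Brewery._text) // 'Eddyline'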

Reading a FASTA file

I want to convert the following lines of a file into JSON and save that into a mongoose schema.
>HWI-ST700660_96:2:1101:1455:2154#5#0/1
GAA…..GAATG
Should become:
{">HWI-ST700660_96:2:1101:1455:2154#5#0/1": "GAA…..GAATG"}
I have tried several options (one sample below), but with no success. Any suggestions?
const parser = require("csv-parse/lib/sync"); // import the parser
const fs = require("fs"); // import the file reader
const path = require("path"); // for joining paths
const sourceData = fs.readFileSync(path.join(__dirname, "Reads.txt"), "utf8"); // read the locally stored file
console.log(sourceData); // print out for checking
const documents = parser(sourceData); // parsing; it works for other, column-like data I have tested
console.log(documents); // printing out
This code gives me output like the following:
[ [ '>HWI-ST700660_96:2:1101:1455:2154#5#0/1' ],
[ 'GAATGGAATGAAATGGATAGGAATGGAATGGAATGGAATGGATTGGAATGGATTAGAATGGATTGGAATGGAATGAAATTAATTTGATTGGAATGGAATG' ],...
Similar question: fasta file reading python
Because you are using the parser's default config, it simply outputs arrays of arrays in that configuration.
If you want to receive objects, you will need to give the parser some options (columns) first. Take a look at the docs.
When using the sync parsing mode (as you are), you can provide options like this:
const documents = parse(sourceData, { columns: true })
columns: true will infer the column names from the first line of the input CSV.
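For illustration, a sketch of what that might produce for the two-line input from the question (same csv-parse sync API and Reads.txt path as above; the exact output shape depends on your csv-parse version):

const parse = require("csv-parse/lib/sync");
const fs = require("fs");
const path = require("path");

const sourceData = fs.readFileSync(path.join(__dirname, "Reads.txt"), "utf8");

// columns: true takes the first line (the ">..." header) as the column name,
// so each following sequence line becomes an object keyed by that header
const documents = parse(sourceData, { columns: true });
console.log(documents);
// e.g. [ { '>HWI-ST700660_96:2:1101:1455:2154#5#0/1': 'GAATGGAATGAAATGGATAGG...' } ]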

How do I save the data in a local JSON file as a variable in node?

The code above works; however, I would like to load the searchStrings array from a JSON file.
My goal is to have the JSON file on a shared drive so my coworkers are able to edit the names.
You can use the following:
var someObject = require('./somefile.json')
JSON can be imported via require just like Node modules. (See Ben Nadel's explanation.)
You would generally want to store it as a global variable, rather than re-loading it on every keyup event. So if the JSON is saved as watchlist.json, you could add
var watchlist = require('./watchlist');
at the top of your code. Then the search command could be written (without needing the for loop) as:
kitten = kitten.toLowerCase();
if (watchlist.entities.indexOf(kitten) !== -1) {
  alert('potential watchlist');
}
In the latest versions of Node you can simply use imports!
CommonJS:
const jsonContent = require('path/to/json/file.json')
ES6:
import jsonContent from 'path/to/json/file.json'
You can also load JSON files dynamically; take the following example:
if (condition) {
  const jsonContent = require('path/to/json/file.json')
  // Use the JSON content as you prefer!
}
That way, you only load your JSON file if you really need it, and your code will perform better!
Do you prefer an old-school approach?
ES6:
import fs from 'fs' // or const fs = require('fs') in CommonJS
const jsonFile = fs.readFileSync('path/to/json/file.json', 'utf8')
const toObject = JSON.parse(jsonFile)
// read properties from the `toObject` constant!
Hope it helps :)

NodeJS JSON file Read without newline characters

I am reading a JSON file using fs.readFileSync(fileName, 'utf8'), but the result includes newline characters, and the output looks like:
"{\r\n \"name\":\"Arka\",\r\n \"id\": \"13\"\r\n}"
How do I avoid these characters?
My local file looks like:
{
  "name": "Arka",
  "id": "13"
}
It's unnecessary to read JSON in using fs.readFileSync(). That approach requires you to write a try/catch block around the fs.readFileSync() call and then use JSON.parse() on the file data. Instead, you can require JSON files in Node as if they were packages. They get parsed as if you had read the file in as a string and then used JSON.parse(), which simplifies reading JSON to one line.
let data = require(fileName)
console.log(data) // { name: 'Arka', id: '13' }
If you want to serialize the parsed JS object in data back to a file without the newline and carriage-return characters, you can write the JSON string to a file using JSON.stringify(), passing in data.
const { promisify } = require('util')
const writeFile = promisify(require('fs').writeFile)
const data = require(fileName)
const serializeJSON = (dest, toJson) => writeFile(dest, JSON.stringify(toJson))
serializeJSON('./stringify-data.json', data)
  .then(() => console.log('JSON written successfully'))
  .catch(err => console.log('Could not write JSON', err))
You could read the file and then remove them with a regex:
var rawJson = fs.readFileSync(fileName, 'utf8');
rawJson = rawJson.replace(/\r|\n/g, '');
Keep in mind, though, that for parsing the JSON with JSON.parse you don't need to do this. The result will be the same with and without the newlines.
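For comparison, a minimal sketch of parsing the raw string directly (using the same fileName as in the question); JSON.parse treats the \r\n sequences as ordinary whitespace:

const fs = require('fs');

const data = JSON.parse(fs.readFileSync(fileName, 'utf8')); // newlines are just whitespace to JSON.parse
console.log(data); // { name: 'Arka', id: '13' }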

How can I make a new single JSON object by extracting particular fields from real-time JSON data using Node.js?

I have the following code, which publishes the JSON data from the specified URL using MQTT. The initial data is retrieved over HTTP.
var request = require('request');
var JSONStream = require('JSONStream');
var es = require('event-stream');
var mqtt = require('mqtt');

request({ url: 'http://isaacs.couchone.com/registry/_all_docs' })
  .pipe(JSONStream.parse('rows.*'))
  .pipe(es.mapSync(function (data) {
    console.info(data);
    var client = mqtt.createClient(1883, 'localhost');
    client.publish('NewTopic', JSON.stringify(data));
    client.end();
    return data;
  }));
The following is the subscriber code, which subscribes to the data published (in the above code) through MQTT:
var mqtt = require('mqtt');
var client = mqtt.createClient();

client.subscribe('NewTopic');
client.on('message', function (topic, message) {
  console.info(message);
});
In the above code, I get all the JSON data from the specified URL in 'message'. I need to extract 'id' and 'value' from the received data, make them into a single JSON object, and publish it to MQTT, so that another client can subscribe to just the 'id' and 'value' as JSON data.
To convert a JSON text into an object, you can use the eval() function. eval() invokes the JavaScript compiler. Since JSON is a proper subset of JavaScript, the compiler will correctly parse the text and produce an object structure. The text must be wrapped in parens to avoid tripping on an ambiguity in JavaScript's syntax.
var myObject = eval('(' + message + ')');
The eval function is very fast. However, it can compile and execute any JavaScript program, so there can be security issues. The use of eval is indicated when the source is trusted and competent. It is much safer to use a JSON parser. In web applications over XMLHttpRequest, communication is permitted only to the same origin that provides the page, so it is trusted. But it might not be competent. If the server is not rigorous in its JSON encoding, or if it does not scrupulously validate all of its inputs, then it could deliver invalid JSON text carrying dangerous script. The eval function would execute the script, unleashing its malice.
To defend against this, a JSON parser should be used. A JSON parser will recognize only JSON text, rejecting all scripts. In browsers that provide native JSON support, JSON parsers are also much faster than eval.
var myObject = JSON.parse(message);
And use it as an object:
myObject.id;
myObject.value;
Create an object with just id and value:
var idAndValueObj = {};
idAndValueObj.id = myObject.id;
idAndValueObj.value = myObject.value;
Convert it to a JSON string:
var jsonStr = JSON.stringify(idAndValueObj);
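Putting those pieces together, a minimal sketch of a subscriber that extracts id and value and republishes them; the 'IdValueTopic' topic name is made up for this example, and it reuses the older mqtt.createClient API from the question:

var mqtt = require('mqtt');
var client = mqtt.createClient(); // same older mqtt.js API as in the question

client.subscribe('NewTopic');
client.on('message', function (topic, message) {
  var myObject = JSON.parse(message); // parse the received JSON text

  // keep only the fields we care about
  var idAndValueObj = { id: myObject.id, value: myObject.value };

  // republish the reduced object so other clients can subscribe to just these fields
  client.publish('IdValueTopic', JSON.stringify(idAndValueObj));
});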