I am learning about streaming with Node.js. I understand the examples shown for the request npm module:
request(url).pipe(fs.createWriteStream('./filename.json'))
But there are two parts of my problem.
Case 1:
function fetchSitemaps() {
  return requestAsync(url).then(data => {
    const $ = cheerio.load(data);
    let urls = [];
    $("loc").each((i, e) => urls.push($(e).text()));
    fs.writeFileSync('./sitemaps.json', JSON.stringify(urls))
  })
}
I want to convert the above from writeFileSync to createWriteStream, but how do I keep appending data to an array which is in JSON format?
Case 2:
function fetchLyricUrls() {
  let sitemaps = JSON.parse(fs.readFileSync('./sitemaps.json'));
  sitemaps.forEach((sitemap, i) => {
    let fileName = i + '.json';
    if (url_pat.exec(sitemap)) {
      fileName = url_pat.exec(sitemap)[1] + '.json';
    }
    requestAsync(sitemap).then(data => {
      const $ = cheerio.load(data);
      let urls = [];
      $("loc").each((i, e) => urls.push($(e).text()));
      return urls;
    }).then(urls => {
      let allUrls = [];
      urls.map(u => {
        return requestAsync(u).then(sm => {
          const $ = cheerio.load(sm);
          $("loc").each((i, e) => allUrls.push($(e).text()))
          fs.writeFileSync('./lyrics.json', JSON.stringify(allUrls))
          return allUrls;
        });
      });
    });
  });
}
The first part of the problem is the same, appending to JSON data using a write stream, but this time I want to parse the HTML data and get some text, which I want to send using a stream, not the HTML data as a whole.
So let's split up the answers.
Case 1
First of all I'd try to keep the data as a stream and not accumulate it. So in essence, instead of loading the whole sitemap and then parsing it, I'd use something like the xml-nodes module so that the nodes form a separate stream. Then my module scramjet comes in to transform it:
const fs = require('fs');
const request = require('request');
const xmlNodes = require('xml-nodes');
const cheerio = require('cheerio');
const scramjet = require('scramjet');

const writable = fs.createWriteStream('./sitemaps.json');

writable.write('[');

let first = 0;
request('http://example.com/sitemap.xml')
    // this fetches your sitemap
    .on('end', () => writable.end("]"))
    // when the stream ends, this will end sitemaps.json
    .pipe(xmlNodes('loc'))
    // this extracts your "loc" nodes
    .pipe(new scramjet.DataStream())
    // this creates a mappable stream
    .map((nodeString) => cheerio('loc', nodeString).text())
    // this extracts the text as in your question
    .map((url) => (first++ ? ',' : '') + JSON.stringify(url))
    // this makes sure that strings are nicely escaped
    // and prepends a comma to every node but the first one
    .pipe(writable, {end: false})
    // and this will push all your entries to the writable stream
Case 2
Here you'll need to do something similar, although if case 1 is an intermediate step, I'd suggest storing the entries as lines of JSON (one JSON document per line), not as one array. It's easier to stream that way.
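For illustration, a minimal sketch of that newline-delimited variant (the file name lyrics.ndjson and the entry shape are just assumptions for the example):

const fs = require('fs');

// a minimal sketch: write one JSON document per line (NDJSON)
// instead of maintaining a single growing JSON array
const writable = fs.createWriteStream('./lyrics.ndjson'); // assumed file name

function appendEntry(url) {
  // each line is valid JSON on its own, so entries can be appended forever
  // and read back later line by line as a stream
  writable.write(JSON.stringify({ url }) + '\n');
}

appendEntry('http://example.com/lyrics/1');
appendEntry('http://example.com/lyrics/2');
writable.end();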
Related
I am currently tasked with finding the number of times a specific email has contacted us. The contacts are stored in JSON files and the key should be "email".
The thing is there are potentially infinite JSON files, so I would like to merge them into a single object and iterate over it to count the email frequency.
So to be clear, I need to read in the JSON content, produce it as a log, consume the message, and transform that message into a tally of logs per email used.
My thought process may be wrong, but I am thinking I need to merge all JSON files into a single object that I can then iterate over and manipulate if needed. However I believe I am having issues with the asynchronicity of it.
I am using fs to read in (I think in this case 100 JSON files), running a forEach and attempting to push each into an array, but the array comes back empty. I am sure I am missing something simple, but upon reading the documentation for fs I think I may just be missing it.
const fs = require('fs');

let consumed = [];

const fConsume = () => {
  fs.readdir(testFolder, (err, files) => {
    files.forEach(file => {
      let rawData = fs.readFileSync(`${testFolder}/${file}`);
      let readable = JSON.parse(rawData);
      consumed.push(readable);
    });
  })
}

fConsume();
console.log(consumed);
For reference this is what each JSON object looks like, and there are several per imported file.
{
  id: 'a7294140-a453-4f3c-91b6-210819e2c43e',
  email: 'ethan.hernandez#microsoft.com',
  message: 'successfully handled skipped operation.'
},
fs.readdir() is async, so your function returns before it executes the callback. If you want to use synchronous code here, you need to use fs.readdirSync() instead:
const fs = require('fs');

let consumed = [];

const fConsume = () => {
  const files = fs.readdirSync(testFolder);
  files.forEach(file => {
    let rawData = fs.readFileSync(`${testFolder}/${file}`);
    let readable = JSON.parse(rawData);
    consumed.push(readable);
  });
}

fConsume();
console.log(consumed);
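To get the tally the question actually asks for, one possible follow-up is below (a minimal sketch, assuming each parsed file is an array of objects shaped like the sample shown above, with an email field):

// a minimal sketch, assuming each parsed file is an array of
// objects shaped like { id, email, message }
const tally = {};

consumed.flat().forEach(entry => {
  if (entry && entry.email) {
    tally[entry.email] = (tally[entry.email] || 0) + 1;
  }
});

console.log(tally); // e.g. { 'ethan.hernandez#microsoft.com': 3, ... }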
await page.on("response", async (response) => {
  const request = await response.request();
  if (
    request.url().includes("https://www.jobs.abbott/us/en/search-results")
  ) {
    const text = await response.text();
    const root = await parse(text);
    root.querySelectorAll("script").map(async function (n) {
      if (n.rawText.includes("eagerLoadRefineSearch")) {
        const text = await n.rawText.match(
          /"eagerLoadRefineSearch":(\{.*\})\,/,
        );
        const refinedtext = await text[0].match(/\[{.*}\]/);
        //console.log(refinedtext);
        console.log(JSON.parse(refinedtext[0]));
      }
    });
  }
});
In the snippet I have posted, the data is in text format. I want to extract eagerLoadRefineSearch : { } (and its content) as text with a regex and then perform JSON.parse on the extracted text, so that I finally get a JSON object of "eagerLoadRefineSearch" : {}.
I am using Puppeteer for intercepting the response. I just want a correct regex which can get me the whole object text of "eagerLoadRefineSearch" : {} (with its content).
I am sharing the response text from the server in this link: https://codeshare.io/bvjzJA. I want to extract "eagerLoadRefineSearch" : {} from the data, which is in text format, in that link.
Context
Silly mistakes
The text you are parsing has no flanking double quotes around eagerLoadRefineSearch. The object to match spans several lines, so the m flag is required. Also, . does not match newlines, so the alternative is to use [\s\S]. Refer to how-to-use-javascript-regex-over-multiple-lines.
Also, don't use await on the string method match.
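A quick illustration of the newline point (the sample string is made up):

const sample = 'eagerLoadRefineSearch: {\n  "country": "us"\n}';

// "." stops at the newline, so this never reaches the closing brace
console.log(/\{.+\}/.test(sample));      // false

// "[\s\S]" matches any character, including newlines
console.log(/\{[\s\S]+\}/.test(sample)); // true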
Matching the closing brace
A quick search on this topic led me to this link, and as I suspected, this is complicated. To ease this problem I made the assumption that the text is correctly indented. We can match on the indentation level to find the closing brace with this pattern.
/(?<indent>[\s]+)\{[\s\S]+\k<indent>\}/gm
This works if both the opening and the closing braces are at the same level of indentation. They are not in our case, since "eagerLoadRefineSearch: " sits between the indent and the opening brace, but we can account for this.
const reMatchObject = /(?<indent>[\s]+)eagerLoadRefineSearch: \{[\s\S]+?\k<indent>\}/gm
Valid JSON
As mentioned earlier, the keys lack flanking double quotes, so let's replace all keys with "key"s.
const reMatchKeys = /(\w+):/gm
const impure = 'hello: { name: "nammu", age: 18, subjects: { first: "english", second: "mythology"}}'
const pure = impure.replace(reMatchKeys, '"$1":')
console.log(pure)
Then we get rid of the trailing commas. Here's the regex that worked for this example.
const reMatchTrailingCommas = /,(?=\s+[\]\}])/gm
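A quick check of this pattern, in the same style as the key example above (the sample string is made up):

const reMatchTrailingCommas = /,(?=\s+[\]\}])/gm
const withTrailing = '{ "subjects": [ "english", "mythology", ], }'
console.log(withTrailing.replace(reMatchTrailingCommas, ''))
// { "subjects": [ "english", "mythology" ] }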
Once we pipe these replace functions, the data is ready for JSON.parse.
Code
await page.on('response', async (response) => {
  const request = await response.request();
  if (
    request
      .url()
      .includes('https://www.jobs.abbott/us/en/search-results')
  ) {
    const text = await response.text();
    const root = await parse(text);
    root.querySelectorAll('script').map(async function (n) {
      const data = n.rawText;
      if (data.includes('eagerLoadRefineSearch')) {
        const reMatchObject = /(?<indent>[\s]+)eagerLoadRefineSearch: \{[\s\S]+?\k<indent>\}/gm;
        const reMatchKeys = /(\w+):\s/g;
        const reMatchTrailingCommas = /,(?=\s+[\]\}])/gm;
        const parsedStringArray = data.toString().match(reMatchObject);
        for (const parsed of parsedStringArray) {
          const noTrailingCommas = parsed.replace(reMatchTrailingCommas, '');
          const validJSONString = '{' + noTrailingCommas.replace(reMatchKeys, '"$1":') + '}';
          console.log(JSON.parse(validJSONString));
        }
      }
    });
  }
});
Two-part question.
Part 1:
I'm uploading an image to my server and want to save it to my database.
So far:
table:
resolver:
registerPhoto: inSequence([
  async (obj, { file }) => {
    const { filename, mimetype, createReadStream } = await file;
    const stream = createReadStream();
    const t = await db.images.create({
      Name: 'test',
      imageData: stream,
    });
  },
])
executing query:
Executing (default): INSERT INTO `images` (`Id`,`imageData`,`Name`) VALUES (DEFAULT,?,?);
But nothing is saved.
I'm new to this and I'm probably missing something, but I don't know what.
Part 2:
This follows on from Part 1: let's say I manage to save the image, how do I read it and send it back to my FE?
An edit: I've read a lot of guides that save the image name to the DB and the actual image in a folder. This is NOT what I'm after; I want to save the image to the DB and then be able to fetch it from the DB and present it.
This took me some time but I finally figured it out.
First step (saving to the DB):
I have to get the entire stream data and read it like this:
export const readStream = async (stream, encoding = 'utf8') => {
  stream.setEncoding('base64');
  return new Promise((resolve, reject) => {
    let data = '';
    // eslint-disable-next-line no-return-assign
    stream.on('data', chunk => (data += chunk));
    stream.on('end', () => resolve(data));
    stream.on('error', error => reject(error));
  });
};
Use it like this:
const streamData = await readStream(stream);
Before saving I turn the stream into a buffer:
const buff = Buffer.from(streamData);
Finally, the save part:
db.images.create(
  {
    Name: filename,
    imageData: buff,
    Length: stream.bytesRead,
    Type: mimetype,
  },
  { transaction: param }
);
Note that I added the Length and Type parameters; these are needed if you want to return a stream when you return the image.
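For reference, the columns used above might map to a Sequelize model roughly like this (a sketch only; the original table definition isn't shown in the question, so the types are assumptions):

// an assumed Sequelize model matching the fields used above;
// the table definition was not posted, so the types are guesses
const images = sequelize.define('images', {
  Name: DataTypes.STRING,
  imageData: DataTypes.BLOB('long'), // the image bytes themselves
  Length: DataTypes.INTEGER,         // byte length, used when streaming the image back
  Type: DataTypes.STRING,            // mime type, e.g. 'image/jpeg'
});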
Step 2 (Retrieving the image).
As @xadm said multiple times, you cannot return an image from GraphQL, and after some time I had to accept that fact; hopefully GraphQL will remedy this in the future.
So what I needed to do is set up a route on my Fastify backend, send an image Id to this route, fetch the image and then return it.
I had a few different approaches to this, but in the end I simply returned a binary and on the frontend I encoded it to base64.
Backend part:
const handler = async (req, reply) => {
  const p: postParams = req.params;
  const parser = uuIdParserT();
  const img = await db.images.findByPk(parser.setValueAsBIN(p.id));
  const binary = img.dataValues.imageData.toString('binary');
  const b = Buffer.from(binary);
  const myStream = new Readable({
    read() {
      this.push(Buffer.from(binary));
      this.push(null);
    },
  });
  reply.send(myStream);
};

export default (server: FastifyInstance) =>
  server.get<null, any>('/:id', opts, handler);
Frontend part:
useEffect(() => {
  // axiosState is the obj that holds the image
  if (!axiosState.loading && axiosState.data) {
    // @ts-ignore
    const b64toBlob = (b64Data, contentType = '', sliceSize = 512) => {
      const byteCharacters = atob(b64Data);
      const byteArrays = [];
      for (let offset = 0; offset < byteCharacters.length; offset += sliceSize) {
        const slice = byteCharacters.slice(offset, offset + sliceSize);
        const byteNumbers = new Array(slice.length);
        // @ts-ignore
        // eslint-disable-next-line no-plusplus
        for (let i = 0; i < slice.length; i++) {
          byteNumbers[i] = slice.charCodeAt(i);
        }
        const byteArray = new Uint8Array(byteNumbers);
        byteArrays.push(byteArray);
      }
      const blob = new Blob(byteArrays, { type: contentType });
      return blob;
    };

    const blob = b64toBlob(axiosState.data, 'image/jpg');
    const urlCreator = window.URL || window.webkitURL;
    const imageUrl = urlCreator.createObjectURL(blob);
    setimgUpl(imageUrl);
  }
}, [axiosState]);
And finally, in the HTML:
<img src={imgUpl} alt="NO" className="imageUpload" />
OTHER:
For anyone who is attempting the same, NOTE that this is not a best-practice thing to do.
Almost every article I found saves the images on the server and stores an image Id and other metadata in the database. For the exact pros and cons of this approach I found the following helpful:
Storing Images in DB - Yea or Nay?
I was focusing on finding out how to do it if, for some reason, I want to save an image in the database, and I finally solved it.
There are two ways to store images in your SQL database. You either store the actual image on your server and save the image path inside your MySQL DB, OR you create a BLOB from the image and store it in the DB.
Here is a handy read https://www.technicalkeeda.com/nodejs-tutorials/nodejs-store-image-into-mysql-database
You should save the image in a directory and save the link to this image in the database.
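A minimal sketch of that approach (the uploads directory, file naming and model fields are made up for illustration):

const fs = require('fs/promises');
const path = require('path');

// "file on disk, path in DB": write the bytes to a directory,
// store only the resulting path in the database
async function saveImage(filename, imageBuffer) {
  const storedPath = path.join('uploads', `${Date.now()}-${filename}`); // assumed naming
  await fs.writeFile(storedPath, imageBuffer);
  return db.images.create({ Name: filename, imagePath: storedPath });   // assumed model fields
}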
I have a gulp task that uses Nightmare to visit a series of URLs, extract SVGs from them, process them and output them.
gulp.task('export', done => {
  const path = require('path');
  const Nightmare = require('nightmare');
  const nightmare = new Nightmare();
  const urls = ['http://one.com', 'http://two.org', 'http://three.net'];

  async function exportPDFs (items) {
    for (url of items) {
      const filename = path.parse(url).name;
      const selector = 'svg';
      await nightmare
        .goto(url)
        .wait(selector)
        .evaluate(selector => {
          let content;
          // Extract SVG from the page
          return content;
        }, selector)
        .then(
          svg => {
            // Heavy operation that takes long
            // How do I wait for this properly?
            processThing(filename);
            outputThing(filename);
          },
          err => console.error('Page evaluation failed', err)
        );
    }
    await nightmare.end().then(() => done()); // ???
  }

  exportPDFs(urls);
});
How can I make it wait for the processing and outputting on each iteration, and at the end of all of them end the gulp task with done()?
Currently it ends before saving the last PDF:
Starting 'export'...
one.pdf saved
two.pdf saved
Finished 'export' after 3.2 s
three.pdf saved
Convert processThing and outputThing into promise-returning functions, then chain them like this:
.evaluate(() => { /* the code */ })
.then(processThing)
.then(outputThing)
.catch(e => { /* deal with errors */ })
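Put back into the loop, it might look roughly like this (a sketch, assuming processThing and outputThing are rewritten to return Promises and to accept the filename):

// a sketch, assuming processThing/outputThing now return Promises
async function exportPDFs(items) {
  for (const url of items) {
    const filename = path.parse(url).name;
    await nightmare
      .goto(url)
      .wait('svg')
      .evaluate(() => { /* extract the SVG from the page */ })
      .then(svg => processThing(filename, svg))
      .then(() => outputThing(filename))
      .catch(err => console.error('Page evaluation failed', err));
  }
  await nightmare.end(); // every iteration has finished by now
  done();                // so the gulp task can end here
}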
I'm working with Angular 2 and a Node.js REST API. I have to make one or more HTTP requests for the same task, so I'm using Observable.forkJoin() to wait for all of them to finish.
I map the result with the JSON parsing method and then subscribe to this result, but I can't get any JSON properties from the result the way I used to.
My service method returns the Observable.forkJoin() itself:
public rename(file:MyFile, newName:string){
  let requests = new Array();
  for(let i=0; i<file.sources.length; i++){
    let url:string = this.serverUrl;
    if(src.name === "src1"){
      url += "rename/src1";
    } else if (src.name === "src2" ){
      url += "rename/src2";
    }
    requests[i] = this.http.get(url)
      .map((res:Response) => res.json())
      .catch(this.handleError);
  }
  return Observable.forkJoin(requests);
}
Then I subscribe to it in another method elsewhere:
this.api.rename(this.selectedFile, newFileName).subscribe(
  rep => {
    // The editor tells me "Property 'name' doesn't exist on type '{}'."
    console.log(rep[0].name);
  },
  err => { console.error(err); }
);
The server correctly responds with the data I asked for. rep[0] is correctly set; it looks like this:
Object {name: "res.png", id: "HyrBvB6H-", size: 0, type: "", isShared: false…}
I suppose it's a typing problem. Usually, with a simple http.get request, it returns an 'any' object. Here it returns a '{}[]' object: rep[0] is a '{}' object and I can't get the JSON properties on it.
Am I using Observable.forkJoin() correctly? Am I missing something?
Thanks in advance for help :)
If it is the editor complaining and it is not an error when the code executes, it is likely a typing problem. You can set the return type of rename() to:
public rename(file:MyFile, newName:string): Observable<any[]> { }
This should allow you to access properties of the inner results, such as name.
Or you can type the rep array in subscribe() as any[]:
this.api.rename(this.selectedFile, newFileName).subscribe(
  (rep: any[]) => {
    console.log(rep[0].name);
  },
  err => { console.error(err); }
);
If all else fails or doesn't work for your solution you can use Type Assertion to treat rep as any[]:
this.api.rename(this.selectedFile, newFileName).subscribe(
  rep => {
    const responses = rep as any as any[];
    console.log(responses[0].name);
  },
  err => { console.error(err); }
);
If the results structure is consistent across the different endpoints, it would be best practice to create an interface/class to replace any[] with.
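For instance, a small sketch of what that could look like (the interface name and fields are assumptions based on the sample response shown in the question):

// assumed shape, based on the sample response in the question
interface RenameResponse {
  name: string;
  id: string;
  size: number;
  type: string;
  isShared: boolean;
}

public rename(file: MyFile, newName: string): Observable<RenameResponse[]> { ... }

this.api.rename(this.selectedFile, newFileName).subscribe(
  (rep: RenameResponse[]) => console.log(rep[0].name),
  err => console.error(err)
);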
Hopefully that helps!
http.get is an asynchronous process, so you can't just use a for loop like that.
Syntactically you have to nest the gets inside forkJoin, so you have something like this. You can use the for loop to build an array of urls first:
return Observable.forkJoin([
  this.http.get(url[1]).map(res => res.json()),
  this.http.get(url[2]).map(res => res.json()),
  this.http.get(url[3]).map(res => res.json())
])
  .map((data: any[]) => {
    this.part1 = data[0];
    this.part2 = data[1];
    this.part3 = data[2];
  });
I wonder if you may be able to do something like this. I'll have a try tomorrow. It's late..
let req = [];
for (let i = 0; i < file.sources.length; i++) {
  req[i] = this.http.get(url[i]).map(res => res.json());
}
return Observable.forkJoin(req);