How do I parse an HTML page using Node.js to find a QR code?

I want to parse a web page, searching for QR codes in the page. When I find them, I am going to read them using the QRcode npm module.
The hard part is that I don't know how to parse the HTML page in a way that detects only the image tags that contain a QR code.
I tried finding some kind of pattern in the images that contain a QR code; the URL usually starts with "?qr", but I think the ending is different every time.
I'm using the request-promise module to get the raw HTML, and then I parse through it:
const rp = require('request-promise');
const url = 'https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States';
rp(url)
  .then(function(html){
    //success!
    console.log(html);
  })
  .catch(function(err){
    //handle error
  });
I want to be able to download the image of the QRcode.

You need to pass the HTML returned into something like https://www.npmjs.com/package/node-html-parser
const rp = require('request-promise');
const parser = require('node-html-parser');
const url = 'https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States';
rp(url)
  .then(function(html){
    const data = parser.parse(html);
    console.log(JSON.stringify(data));
  })
  .catch(function(err){
    //handle error
  });
Then you can access things off the data object to find the QR code.
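For example, a rough sketch along those lines (assuming, as the question suggests, that the QR images can be recognized by a "?qr" fragment in their src attribute; the URL and output file name are illustrative):
const rp = require('request-promise');
const parser = require('node-html-parser');
const fs = require('fs');
const url = 'https://example.com/page-with-qr-codes'; // hypothetical page containing QR images
rp(url)
  .then(function (html) {
    const root = parser.parse(html);
    // Collect the src of every <img> whose URL looks like a QR code
    const qrSrcs = root.querySelectorAll('img')
      .map(img => img.getAttribute('src'))
      .filter(src => src && src.includes('?qr'));
    if (qrSrcs.length === 0) return;
    // Note: a relative src would need to be resolved against the page URL first.
    // encoding: null makes request-promise return the raw image bytes as a Buffer.
    return rp({ uri: qrSrcs[0], encoding: null });
  })
  .then(function (imageBuffer) {
    if (imageBuffer) {
      // Save it so the QRcode module can decode it later
      fs.writeFileSync('qr.png', imageBuffer);
    }
  })
  .catch(function (err) {
    // handle error
  });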

Related

Can't parse <content:encoded> from RSS

This is what RSS looks like: https://reddit.0qz.fun/r/dankmemes/top.json
My script perfectly parses "title", "description" and other item tags from the RSS. But it doesn't parse "content:encoded".
I tried this:
item.getChild("content:encoded").getText();
And this:
item.getChild("encoded").getText();
And this (found on Stackoverflow):
item.getChild("http://purl.org/rss/1.0/modules/content/","encoded").getText();
But nothing works... Could you help me?
The namespace is important for the getChild and similar methods to parse the content successfully.
Your third example is close, but you have the parameter order backwards, and you need to use the XmlService.getNamespace method, not a raw string. (The signature is getChild(string, namespace), not getChild(string, string).)
This one is tricky as the namespace should be included for some of the elements, and not for others. I am not an XML expert, so I don't know if this is expected behavior or not. The minimal example script below does find and log the text of the <content:encoded> elements using getChild, but I was only able to figure out when to include or exclude the namespace through trial and error. (If anyone has further info on why this is, please let me know in the comments.)
function logContentEncoded() {
  const result = UrlFetchApp.fetch("https://reddit.0qz.fun/r/dankmemes/top.json");
  const document = XmlService.parse(result.getContentText());
  const root = document.getRootElement();
  const namespace = XmlService.getNamespace("http://purl.org/rss/1.0/modules/content/");
  const channel = root.getChild("channel"); // fails if namespace is included
  const item = channel.getChild("item"); // fails if namespace is included
  const encoded = item.getChild("encoded", namespace); // fails if namespace is EXCLUDED
  console.log(encoded.getText());
}
Add this library (Parser) to the project: 1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw
Then you can scrape the page. With the code below, you can get the content of the first <content:encoded> tag.
function getDataFromJson() {
  var url = "https://reddit.0qz.fun/r/dankmemes/top.json";
  var fromText = '<content:encoded>';
  var toText = '</content:encoded>';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser
    .data(content)
    .from(fromText)
    .to(toText)
    .build();
  Logger.log(scraped);
  return scraped;
}

How would you create a downloadable pdf in a client side app?

One of our requirements for an admin tool is to create a form that can be filled in and translated into a downloadable PDF file (a terms-and-conditions document with blank input fields, to be exact).
I did some googling and tried creating a form in HTML and CSS, converting it into a canvas using the html2canvas package, and then using the jspdf package to convert that into a PDF file. The problem is that I cannot get it to fit and resize correctly to an A4 format with correct margins. I'm sure I can get to a somewhat working solution if I spend some time on it.
However, my real question is: how would you approach this? Is there a third-party app/service that does this exact thing? Or would you do all this on the server side? Our current app is using Angular 7 with Firebase as our backend.
Cheers!
I was able to use the npm package pdfmake to create a dynamic PDF based on the information the user provided while interacting with my form (I was using React). It opened the PDF in a new tab and the user was able to save it.
In another application (still React), I used the same package to create a receipt, so you can customize the size of the "page". We created the PDF, used the getBase64() method, and sent the PDF as an email attachment.
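A minimal sketch of that approach with pdfmake (the document contents and field names are illustrative, and the vfs wiring follows the commonly documented pattern; adjust it for your pdfmake version):
import pdfMake from 'pdfmake/build/pdfmake';
import pdfFonts from 'pdfmake/build/vfs_fonts';
pdfMake.vfs = pdfFonts.pdfMake.vfs;

function downloadTermsPdf(formValues) {
  // pageSize / pageMargins address the A4 sizing concern from the question
  const docDefinition = {
    pageSize: 'A4',
    pageMargins: [40, 60, 40, 60],
    content: [
      { text: 'Terms and Conditions', style: 'header' },
      { text: 'Name: ' + formValues.name }, // formValues fields are illustrative
      { text: 'Date: ' + formValues.date },
    ],
    styles: { header: { fontSize: 16, bold: true, margin: [0, 0, 0, 10] } },
  };

  const pdf = pdfMake.createPdf(docDefinition);
  pdf.download('terms.pdf'); // or pdf.open() to show it in a new tab
  // pdf.getBase64(b64 => { /* attach to an email on the server side */ });
}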
My service function:
getEvidenceFile(id: number, getFileContent: boolean) {
  return this.http.get(environment.baseUrl + 'upload' + '/' + id, { responseType: 'blob' as 'json' })
    .map(res => res);
}
My component function called from the selected item of a FileDownload…
FileDownload(event: any) {
  // const blob = await this.callService.getEvidenceFile(event.target.value, true);
  // const url = window.URL.createObjectURL(blob);
  this.callService.getEvidenceFile(event.target.value, true).subscribe(data => {
    var binaryData = [];
    binaryData.push(data);
    var downloadLink = document.createElement('a');
    downloadLink.href = window.URL.createObjectURL(new Blob(binaryData));
    document.body.appendChild(downloadLink);
    downloadLink.click();
  });
}

How to parse newline-delimited JSON in Angular 2

I am writing an Angular 2 app (built with angular cli), and trying to use AWS Polly text-to-speech API.
According to the API you can request audio output as well as "Speech Marks", which can describe word timing, visemes, etc. The audio is delivered in "mp3" format, and the speech marks as "application/x-json-stream", which I understand to be newline-delimited JSON. It cannot be parsed with JSON.parse() due to the newlines. I have so far been unable to read/parse this data. I have looked at several libs for "json streaming", but they are all built for node.js and won't work with Angular 2. My code is as follows...
onClick() {
  AWS.config.region = 'us-west-2';
  AWS.config.accessKeyId = 'xxxxx';
  AWS.config.secretAccessKey = 'yyyyy';
  let polly = new AWS.Polly();
  var params = {
    OutputFormat: 'json',
    Text: 'Hello world',
    VoiceId: 'Joanna',
    SpeechMarkTypes: ['viseme']
  };
  polly.synthesizeSpeech(params, (err, data) => {
    if (err) {
      console.log(err, err.stack);
    } else {
      var uInt8Array = new Uint8Array(data.AudioStream);
      var arrayBuffer = uInt8Array.buffer;
      var blob = new Blob([arrayBuffer]);
      var url = URL.createObjectURL(blob);
      this.audio.src = url;
      this.audio.play(); // works fine
      // speech marks info displays "application/x-json-stream"
      console.log(data.ContentType);
    }
  });
}
Strangely enough Chrome browser knows how to read this data and displays it in the response.
Any help would be greatly appreciated.
I had the same problem. I saved the file so I could then read it line by line, accessing the JSON objects when I need to highlight the words being read. Mind you, this is probably not the most effective way, but it is an easy way to move on and get working on the fun stuff.
I am trying out different ways to work with Polly and will update this answer if I find a better way.
You can do it with:
https://www.npmjs.com/package/ndjson-parse
That worked for me.
But I can't play the audio. I tried your code and it says:
DOMException: Failed to load because no supported source was found.
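For the speech-marks part, a minimal sketch without an extra library (assuming data.AudioStream holds the newline-delimited JSON when OutputFormat is 'json') could look like:
// Decode the returned stream to text, then parse each non-empty line as JSON
const text = new TextDecoder('utf-8').decode(new Uint8Array(data.AudioStream));
const speechMarks = text
  .split('\n')
  .filter(line => line.trim().length > 0)
  .map(line => JSON.parse(line));
console.log(speechMarks); // e.g. [{ time: 0, type: 'viseme', value: 'p' }, ...]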

Handlebars.js Not Loading My Content

I am trying to parse some JSON with Handlebars on my website. I don't get any errors, but I also don't get any content. I've developed my own REST endpoint to return a JSON response, and I think my problem might be there somewhere, but you can see the response in the code.
http://codepen.io/anon/pen/Czdxh
$(document).ready(function(){
  var raw_template = $('#post-template').html();
  // Compile that into a Handlebars template
  var template = Handlebars.compile(raw_template);
  // Retrieve the placeholder where the posts will be displayed
  var placeHolder = $("#all-posts");
  // Fetch all blog post data from the server in JSON
  $.getJSON("https://instapi-motleydev.rhcloud.com/liked", function(data){
    $.each(data, function(index, element){
      // Generate the HTML for each post
      var html = template(element);
      // Render the posts into the page
      placeHolder.append(html);
    });
  });
});
Thanks for any help!
The problem was that I was getting an array response from the server and needed to adapt my template to use the {{#each this}} syntax. I also switched my getJSON to a simple get, looped over the response that way, and dropped the $.each handler.
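A rough sketch of that template change (the field names and markup are illustrative):
// The template iterates over the array response with {{#each this}}
var source =
  '{{#each this}}' +
  '<div class="post">' +
  '  <h2>{{title}}</h2>' + // "title" and "description" are illustrative fields
  '  <p>{{description}}</p>' +
  '</div>' +
  '{{/each}}';
var template = Handlebars.compile(source);

$.get("https://instapi-motleydev.rhcloud.com/liked", function (data) {
  // Pass the whole array once; the template loops internally
  $("#all-posts").append(template(data));
});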

How to write and immediately read a file in Node.js

I have to obtain some JSON that is embedded inside a script tag on a certain page, so I can't use regular scraping techniques like cheerio.
The easy way out is to write the file (download the page) to the server and then read it, using string manipulation to extract the JSON blocks (there are several), work on them, and save them to my db happily.
The thing is that I'm too new to Node.js and can't get the code to work. I think I'm trying to read the file before it is fully written, and if I read it too soon I get [Object Object]...
Here's what I have so far...
var http = require('http');
var fs = require('fs');
var request = require('request');
var localFile = 'tmp/scraped_site_.html';
var url = "siteToBeScraped.com/?searchTerm=foobar";
// writing
var file = fs.createWriteStream(localFile);
var request = http.get(url, function(response) {
  response.pipe(file);
});
// reading
var readedInfo = fs.readFileSync(localFile, function (err, content) {
  callback(url, localFile);
  console.log("READING: " + localFile);
  console.log(err);
});
So first of all I think you should understand what went wrong.
The http request operation is asynchronous. This means that the callback code in http.get() will run sometime in the future, but fs.readFileSync, due to its synchronous nature, will execute and complete even before the http request is actually sent to the background thread that will execute it, since both are invoked in what is commonly known as the same tick. Also, fs.readFileSync returns a value and does not take a callback.
Even if you replace fs.readFileSync with fs.readFile instead the code still might not work properly since the readFile operation might execute before the http response is fully read from the socket and written to the disk.
I strongly suggest reading: stackoverflow question and/or Understanding the node.js event loop
The correct place to invoke the file read is when the response stream has finished writing to the file, which would look something like this:
var request = http.get(url, function(response) {
  response.pipe(file);
  file.once('finish', function () {
    fs.readFile(localFile, 'utf8' /* adjust the encoding if needed */, function(err, data) {
      // do something with the data if there is no error
    });
  });
});
Of course this is a very raw and not recommended way to write asynchronous code but that is another discussion altogether.
Having said that, if you download a file, write it to disk and then read it all back into memory for manipulation, you might as well forgo the file part and just read the response into a string right away. Your code will then look something like this (it can be implemented in several ways):
var request = http.get(url, function(response) {
  var data = '';
  function read() {
    var chunk;
    while ( chunk = response.read() ) {
      data += chunk;
    }
  }
  response.on('readable', read);
  response.on('end', function () {
    console.log('[%s]', data);
  });
});
What you really should do, IMO, is create a transform stream that extracts just the data you need from the response while not consuming too much memory, yielding this more elegant-looking code:
var request = http.get(url, function(response) {
  response.pipe(yourTransformStream).pipe(file)
});
Implementing this transform stream, however, might prove slightly more complex. So if you're a Node beginner and you don't plan on downloading big files or lots of small files, then maybe loading the whole thing into memory and doing string manipulation on it is simpler.
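For illustration, a minimal sketch of such a stream (this toy version just uppercases the text passing through; the real logic for extracting the script-tag JSON would go in its place):
var Transform = require('stream').Transform;

// A pass-through style transform: receive chunks, push modified chunks on
var yourTransformStream = new Transform({
  transform: function (chunk, encoding, callback) {
    this.push(chunk.toString().toUpperCase()); // replace with real extraction logic
    callback();
  }
});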
For further information about transformation streams:
node.js stream api
this wonderful guide by substack
this post from strongloop
Lastly, see if you can use any of the million node.js crawlers already out there :-) take a look at these search results on npm
According to the http module documentation, 'get' does not return the response body.
This is modified from the request example on the same page.
What you need to do is process the response within the callback (function) passed into http.request, so it can be called when it is ready (async).
var http = require('http')
var fs = require('fs')
var localFile = 'tmp/scraped_site_.html'
var file = fs.createWriteStream(localFile)
var req = http.request('http://www.google.com.au', function(res) {
  res.pipe(file)
  res.on('end', function(){
    file.end()
    fs.readFile(localFile, function(err, buf){
      console.log(buf.toString())
    })
  })
})
req.on('error', function(e) {
  console.log('problem with request: ' + e.message)
})
req.end();
EDIT
I updated the example to read the file after it is created. This works by having a callback on the end event of the response which closes the pipe and then it can reopen the file for reading. Alternatively you can use
res.on('data', function(chunk){...})
to process the data as it arrives without putting it into a temporary file
My impression is that you are serializing a JS object into JSON by reading it from a stream that's downloading a file containing HTML. This is doable yet hard. It's difficult to know when your search expression has been found, because if you parse as the chunks come in, you never know if you received only part of the content, and you could miss what you're looking for because it was split into two or more parts that were never analyzed as a whole.
You could try something like this:
http.request('u/r/l', function(res) {
  res.on('data', function(data) {
    // parse data as it comes in
  });
}).end();
This allows you to read the data as it comes in. You can handle it to save to disk or a db, or even parse it, if you accumulate the contents within the script tags into a single string and then parse the objects out of that.