Puppeteer is stuck on a big XML URL

I am trying to open a big sitemap XML file with Puppeteer and parse its content, but it seems to wait forever. Below is what I am doing:
const response = await page.goto("xmlurl")
await response.text()
The XML is 15 MB with 20k entries in it. I just want to read its contents, but I am guessing Puppeteer is trying to parse it like an HTML file. Is there a way to get only the raw text without all the parsing?
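One workaround (a sketch, not from the thread; assumes Node 18+ with a global fetch and that the goal is just the list of URLs): fetch the sitemap as plain text instead of navigating to it with page.goto(), so no browser ever tries to render 15 MB of XML as a document.

```javascript
// Sketch: read the sitemap without page.goto(), so Chromium never tries
// to render the XML as a document. Assumes Node 18+ (global fetch).

// Pull every <loc>…</loc> entry out of the sitemap text.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map(m => m[1]);
}

async function readSitemap(url) {
  const res = await fetch(url); // plain HTTP request, no browser involved
  return extractLocs(await res.text());
}
```

If the request really has to go through Puppeteer (e.g. for cookies), the same fetch call can run inside page.evaluate() instead.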

Related

How to render HTML in the backend and save to PDF?

I am creating a document for my users that is prefilled/customized with each user's information, and I would like to save a copy of the document to my database/filesystem.
To show the document to the user, in the frontend I have a React page with a few blanks. I pull info from the backend to fill in those blanks, and I allow the user to print the finished document out. I would like to save a pdf for myself in the backend too, though, and I'm not sure how to do it.
Is it possible to render and populate React in my backend and convert that into a pdf, all in the backend?
I've tried Googling different solutions, but I haven't found anything helpful.
Use a headless browser, such as Puppeteer:
const puppeteer = require('puppeteer')
async function printPDF(url) {
  const browser = await puppeteer.launch({ headless: true })
  const page = await browser.newPage()
  await page.goto(url)
  const pdf = await page.pdf({ format: 'A4' })
  await browser.close()
  return pdf
}
Depending on which programming language your backend uses, you can likely follow these steps with the appropriate library:
On the backend you should already have access to your customer information, since you're sending it to the frontend.
With this information, use a template system to render the variables into the HTML; you may need to edit your React code a little to match the template scheme.
Then, with the rendered template, use a library to generate a PDF file and save it to the proper place (depending on your architecture, e.g. a folder on the same system, an S3 bucket, etc.).
Finally, after saving the PDF and getting its URL, save the URL string to your user table, if any.
For example, in Python you can use the following libraries:
jinja (template renderer)
pdfkit (HTML-to-PDF renderer)
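The template step above can also be sketched in plain JavaScript (the placeholder syntax and field names here are hypothetical, not from any particular library):

```javascript
// Minimal template render sketch: fill {{key}} placeholders with user
// data before handing the HTML to a PDF renderer. Names are made up.
function renderTemplate(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) =>
    key in vars ? String(vars[key]) : '');
}

const html = renderTemplate(
  '<h1>Invoice for {{name}}</h1><p>Total: {{total}}</p>',
  { name: 'Ada', total: '42.00' }
);
// html can now be passed to page.setContent(html) followed by page.pdf()
```

In a real app you would use a proper template engine; this only illustrates where the user data enters the pipeline.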

Get JSON data from a public JSON file (not a REST API) with React

I have a URL similar to https://www.nonexistentsite.com/fubar.json, where fubar.json is a public JSON file that will download to your file system if you navigate to the URL in your browser. I have a React web app where I want to read that file directly so as to display some of its data.
I don't want to bother with any kind of backend that would download the file to the app's file system so that it can read it. I want the React frontend to read it directly in its client-side code with a fetch or an axios call or something like that.
I'm familiar with the typical situation where I have a REST URL like https://www.nonexistentsite.com/fubar which I can call to get the data. I'm failing to find info on how to handle this situation.
You could use axios to load the data from the JSON file. Note that axios hands your callback a response object; the parsed JSON is on its .data property.
Example usage:
axios.get('https://www.nonexistentsite.com/fubar.json')
  .then(response => {
    console.log(response.data);
    // do stuff with response.data here
  });
Maybe I'm misunderstanding your question, but if you just need to fetch the JSON from a hosted file, you should be able to do so with axios.get(url). Keep in mind that the file's host must allow cross-origin requests for browser code on another domain to read it.
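If you'd rather not add a dependency, the built-in fetch API does the same job. A sketch (fetch is passed in as a parameter only so it is easy to stub out):

```javascript
// Sketch: fetch a public JSON file with the built-in fetch API
// (browsers and Node 18+). fetchImpl is injectable for testing.
async function getJson(url, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error('HTTP ' + res.status);
  return res.json(); // parsed JSON body
}
```

As with axios, this only works if the file's host sends the appropriate CORS headers.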

API return data in CSV format

I'm creating an API which should return data in CSV format. I set the Content-Type header to text/csv, but this forces a download of the contents as a CSV file.
I'm using Node.js and the Express framework. This may be standard behaviour; however, I would like to know how you solved this issue.
This is a sample of the code that I'm using:
var toCsv = require('to-csv');
// obj is just a standard JavaScript object.
res.set('Content-Type', 'text/csv');
res.send(toCsv(obj));
I would like the person using the API to be able to retrieve the data in CSV format without actually downloading a file.
Maybe have a look at this question:
How does browser determine whether to download or show
It's your browser that decides that content of type text/csv should be downloaded.
If you just want the CSV to show in the browser as plain text, simply use another content type.
Try this instead:
res.set('Content-Type', 'text/plain');
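For completeness, the conversion itself is small enough to do by hand if you don't want the to-csv dependency. A sketch (the quoting follows the common RFC 4180 convention; field names are made up):

```javascript
// Sketch: turn an array of flat objects into CSV text by hand.
// Quote any field containing a comma, quote, or newline (RFC 4180 style).
function toCsv(rows) {
  const esc = v => /[",\n]/.test(String(v))
    ? '"' + String(v).replace(/"/g, '""') + '"'
    : String(v);
  const headers = Object.keys(rows[0]);
  const lines = [headers.map(esc).join(',')];
  for (const row of rows) {
    lines.push(headers.map(h => esc(row[h])).join(','));
  }
  return lines.join('\n');
}
```

Sending the result with res.set('Content-Type', 'text/plain') then renders inline in the browser instead of triggering a download.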

Download Client-Side JSON as CSV

I am using the AngularJS frontend framework and a Node.js/Express backend server to send and receive JSON. The backend sent a large JSON object to the frontend, and I was wondering if I could download the JSON object from the frontend in CSV format.
The data is stored as JSON in a scope variable, $scope.data, in an Angular controller. I then converted the data to a CSV-format string in the variable $scope.CSVdata. How do I get the CSV data to download from the client browser?
I know Node.js can be set up to send a file in CSV format, but it would be nice to keep the backend a clean JSON API.
Referencing this post, I've thrown together a quick demonstration of how this may be done using AngularJS:
JavaScript Demo (Plunker)
I've wrapped the referenced Base64 code in a service, and use it in the following way:
$scope.downloadCSV = function() {
  var data = Base64.encode($scope.CSVdata);
  window.location.href = "data:text/csv;base64," + data;
};
There are some disadvantages to this method, however, as mentioned in the comments. I've pulled out some bullet points from the Wikipedia page on this subject; head over there for the full list.
Data URIs are not separately cached from their containing documents (e.g. CSS or HTML files), so the encoded data is downloaded every time the containing documents are re-downloaded.
Internet Explorer 8 limits data URIs to a maximum length of 32 KB. (Internet Explorer 9 does not have this limitation.)
In IE 8 and 9, data URIs can only be used for images, not for navigation or JavaScript-generated file downloads.
Base64-encoded data URIs are about one third larger than their binary equivalent. (However, this overhead is reduced to 2-3% if the HTTP server compresses the response using gzip.)
Data URIs do not carry a filename as a normal linked file would; when saving, a default filename for the specified MIME type is generally used.
[ . . . ]
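For reference, the data URI the snippet builds can be sketched in plain code (Buffer does the Base64 step in Node; in the browser, btoa or the answer's Base64 service plays that role):

```javascript
// Sketch: build the same text/csv data URI by hand.
function csvDataUri(csv) {
  return 'data:text/csv;base64,' + Buffer.from(csv, 'utf8').toString('base64');
}
```

In current browsers, URL.createObjectURL(new Blob([csv], { type: 'text/csv' })) together with a download attribute on an anchor element avoids the IE size limits above and also lets you choose a filename.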

Send PDF as byte[] / JSON problem

I am trying to send a generated PDF file (Apache FOP) to the client. I know this can be done by writing the array to the response stream and by setting the correct content type, length and so on in the servlet. My problem is that the whole app was built based on the idea that it will only receive/send JSON. In the servlet's service() method, I have this:
response.setContentType("application/json");
reqBroker.process(request, response);
RequestBroker is the class that processes the JSON (Jackson processor); everything is generic and I cannot change it. On top of this, I have to read the JSON from the request correctly to access the data and generate my PDF, so those two lines are necessary. But when I send the response, I need a different content type so that the PDF is displayed correctly in the browser.
So far, I am able to send the byte array as part of the JSON, but then I don't know how to display the array as a PDF on the client (if something like this is even possible).
I would like some suggestions on how I can send my PDF and set the right header without messing with the JSON. Thanks.
JSON and byte arrays don't mix.
Instead, you should create an <iframe> and point it to a URL that returns a raw PDF.
Take a look here: How to send pdf in json. It lists a couple of approaches you can consider. The easiest way is to convert the binary data into a string using Base64 encoding; in C#, this would mean a call to Convert.ToBase64String (and Convert.FromBase64String to decode on the way back). Note that Base64 is an encoding, not compression: it carries a space overhead of around +33%. If you can get away with it, this is the least complicated solution; if the additional size is an issue, you can think about zipping the data up first.
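In a JavaScript backend the Base64-in-JSON approach looks like this (a sketch; the "pdf" field name is made up):

```javascript
// Sketch: wrap raw PDF bytes in JSON via Base64, and unwrap them again.
// The "pdf" field name is hypothetical.
function pdfToJson(pdfBytes) {
  return JSON.stringify({ pdf: Buffer.from(pdfBytes).toString('base64') });
}

function jsonToPdf(json) {
  return Buffer.from(JSON.parse(json).pdf, 'base64');
}
```

The client decodes the field and hands the bytes to a viewer, e.g. via a Blob URL in an iframe, which matches the iframe suggestion above.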