JS apache-arrow tableFromIPC's supported compression method/level? - pyarrow

I'm using file systems (local, Google Cloud Storage, and maybe S3) to exchange data between the web front end (JS) and back end (Python).
After writing the Arrow IPC file format to the file system from the Python back end like below:
with self.file_system.open_output_stream(f'{self.bucket}/{file_name}') as sink:
    with ipc.new_file(sink, schema=table.schema, options=ipc.IpcWriteOptions(compression='lz4')) as writer:
        writer.write(table)
    sink.flush()
the front-end JS package apache-arrow fails to read the LZ4- or ZSTD-compressed files:
import { readFileSync } from 'fs';
import { tableFromIPC } from 'apache-arrow';
const arrow = readFileSync('xx.arrow');
const table = tableFromIPC(arrow);
console.table(table.toArray());
I got this error:
Uncaught (in promise) Error: Record batch compression not implemented
I couldn't find anything in the JS package documentation about which compression codecs it supports when reading, or how to customize the read options. Does anyone have an idea?

Related

CSV file read issue in firebase functions

I am trying to read a CSV file in a Firebase function so that I can process the file and perform the remaining operations on the data.
import * as csv from "csvtojson";

const csvFilePath = "<gdrive shared link>";
try {
  console.log("First Method...");
  csv()
    .fromFile(csvFilePath)
    .then((jsonObj: any) => {
      console.log("jsonObj....", JSON.stringify(jsonObj));
    });

  console.log("Second Method...");
  const jsonArray = await csv().fromFile(csvFilePath);
  console.log("jsonArray...", JSON.stringify(jsonArray));
} catch (e) {
  console.log("error", JSON.stringify(e));
}
Those are the two methods I have tried for reading the CSV, but both show the Firebase error
'Error: File does not exist. Check to make sure the file path to your csv is correct.'
For csvFilePath I have tried two approaches:
Just added the CSV file in the same folder as the function and referenced it like
const csvFilePath = "./student.csv"
Added the same file to Google Drive, changed the access permissions so that anyone with the link can read and edit, and passed that link as the path:
const csvFilePath = "<gdrive shared link>"
Both show the same error. In the Google Drive case I don't want to use any sort of Google credential, because I only intended to read a simple CSV file in a Firebase function.
I will start by proposing that you convert your CSV to JSON locally, or at least outside the function, and see if it works. I mention this because you are using ES6 imports, which might be causing an issue since all the documentation uses require. You can also try CSV Parse or some of the solutions provided in this question as an alternative, again trying them outside the function first to check whether they actually work and rule them out. You could even upload the JSON once you have converted it from the CSV, but that depends on what you are trying to do.
I think the best way to achieve this is to follow the approach given in this question, which first uploads the file into Cloud Storage and uses onFinalize() to trigger the conversion.
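A minimal sketch of that storage-triggered approach, assuming firebase-functions v1-style triggers and firebase-admin (the function name and the .csv filter are hypothetical):

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const csv = require('csvtojson');

admin.initializeApp();

exports.convertCsv = functions.storage.object().onFinalize(async (object) => {
  if (!object.name || !object.name.endsWith('.csv')) return null;
  // Download the uploaded file to the function's writable temp directory
  const tempPath = `/tmp/${object.name.split('/').pop()}`;
  await admin.storage().bucket(object.bucket).file(object.name).download({ destination: tempPath });
  // Convert it to JSON
  const jsonArray = await csv().fromFile(tempPath);
  console.log('converted rows:', jsonArray.length);
  return null;
});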
I will also point to these three questions that went through similar issues with the path; they were able to fix it by adding __dirname (a minimal sketch follows after this list). Each one has some extra useful information.
Context for "relative paths" seems to change to the calling module if a module is imported
The csvtojson converter ignores my file name and just puts undefined
How to avoid the error which throws a csvtojson
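For the bundled-file case specifically, a minimal sketch of the __dirname fix (assuming student.csv is deployed alongside the function code):

const path = require('path');
const csv = require('csvtojson');

// Resolve the CSV relative to the deployed function's directory,
// not the process working directory
const csvFilePath = path.join(__dirname, 'student.csv');

csv()
  .fromFile(csvFilePath)
  .then((jsonArray) => console.log('jsonArray...', JSON.stringify(jsonArray)));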

S3 ManagedUpload equivalent in aws javascript sdk v3?

In the older version of the JavaScript SDK I was using the ManagedUpload class for uploading big files to S3, which handled the queueing and multiparting of files. But in V3 this function is nowhere in the documentation. Has it been removed, or is there an alternative? Please help...
In V3 the high-level abstractions are moved to functionality-specific lib packages, while the client packages offer a one-to-one mapping of the low-level public APIs.
For S3 the client is in the @aws-sdk/client-s3 package and the high-level operations are in the @aws-sdk/lib-storage package.
Sample upload code for a managed upload would look like the following:
const { S3Client } = require("@aws-sdk/client-s3");
const { Upload } = require("@aws-sdk/lib-storage");

const multipartUpload = new Upload({
  client: new S3Client({}),
  params: { Bucket: 'bucket', Key: 'key', Body: stream },
});
More information here.
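To run the upload you await its done() method; the Upload class also emits progress events. A brief usage sketch continuing the example above (handler contents are just placeholders):

multipartUpload.on('httpUploadProgress', (progress) => {
  // progress.loaded / progress.total are byte counts (total may be undefined for streams)
  console.log('uploaded', progress.loaded, 'of', progress.total);
});

multipartUpload.done().then(() => console.log('upload complete'));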

Convert HTML to PDF or PNG without headless browser instance in NodeJS

TL;DR:
Any suggestions in NodeJS for converting HTML to PDF or PNG without any headless browser instances?
Also, does anyone use puppeteer in a production environment? I would like to know about the resource utilisation and performance of running a headless browser in prod.
Longer version:
In a NodeJS server we need to convert an HTML string to a PDF or PNG based on the request params. We are using puppeteer, deployed in a Google Cloud Function, to generate the PDF and PNG (screenshot). Locally I run this application in Docker with memory usage restricted to 100MB, and it seems to work. But in the Cloud Function it throws a memory limit exception when we set the function to 250MB of memory. As a temporary solution we upgraded the Cloud Function to 1 GB.
We would like to try an alternative to puppeteer that doesn't involve a headless browser. Another library, PDF-Kit, looks good, but it takes a canvas-API kind of input; we can't feed it HTML directly.
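For context on the PDF-Kit point, its API is imperative drawing commands rather than markup, which is why an HTML string can't be fed to it directly. A minimal sketch (the content drawn here is illustrative):

const PDFDocument = require('pdfkit');
const fs = require('fs');

const doc = new PDFDocument();
doc.pipe(fs.createWriteStream('example.pdf'));
doc.fontSize(18).text('Invoice #123', 72, 72); // explicit drawing calls
doc.moveTo(72, 100).lineTo(400, 100).stroke(); // instead of HTML/CSS layout
doc.end();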
Any thoughts or input on this?
Any suggestions in NodeJS for converting HTML to PDF or PNG without any headless browser instances?
Yes, you can try jsPDF. I have never used it before.
The syntax is simple.
Under the hood it looks like no headless browser libraries are used, and it seems to be a 100% pure JavaScript implementation.
You can feed the library directly with an HTML string.
BUT there is no PNG option. For images there are anyway a lot of solutions that could be combined with jsPDF (so, HTML to PDF to PNG), or other direct HTML-to-PNG solutions. Take a look here.
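A minimal sketch of the jsPDF HTML path. Hedged: the .html() helper relies on html2canvas and a DOM, so it runs most naturally in the browser; on a server you would need something like jsdom in front of it:

import { jsPDF } from 'jspdf'; // html2canvas must also be available for .html()

const doc = new jsPDF();
doc.html('<h1>Hello</h1><p>Rendered from an HTML string.</p>', {
  callback: (pdf) => {
    pdf.save('output.pdf'); // triggers a download in the browser
  },
});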
Also, does anyone use puppeteer in a production environment? I would like to know about the resource utilisation and performance of running a headless browser in prod.
When you want to use puppeteer, I suggest splitting services: a simple HTTP server that just handles the HTTP communication with your clients, and a separate puppeteer service. Both services must be scalable, but of course the second will require more resources to run. To optimize resources, I suggest using puppeteer-cluster to create a cluster of puppeteer workers (a rough sketch follows). You can better handle errors, flow and concurrency, and at the same time save memory by using a single instance of Chromium (with the CONCURRENCY_PAGE or CONCURRENCY_CONTEXT model).
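A rough sketch of that setup with puppeteer-cluster (the concurrency model, worker count and task body are illustrative):

const { Cluster } = require('puppeteer-cluster');

(async () => {
  // Share one Chromium instance across pages to save memory
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 4,
  });

  // Each task renders an HTML string and returns a PDF buffer
  await cluster.task(async ({ page, data: html }) => {
    await page.setContent(html, { waitUntil: 'networkidle0' });
    return page.pdf({ format: 'A4' });
  });

  const pdf = await cluster.execute('<h1>Hello</h1>');
  console.log('pdf bytes:', pdf.length);

  await cluster.idle();
  await cluster.close();
})();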
If you can use Docker, then a great solution for you may be Gotenberg.
It's an incredible service that can convert a lot of formats (HTML, Markdown, Word, Excel, etc.) into PDF.
If your page's rendering depends on JavaScript, no problem: it will run it and wait (you can even configure the maximum wait time) for the page to be completely rendered before generating your PDF.
We are using it for an application that generates 3000 PDFs per day and never had any issue with it.
Demo:
Take a look at this sample HTML invoice: https://sparksuite.github.io/simple-html-invoice-template/
Now let's convert it to PDF. The request breaks down like this:
1: the Gotenberg URL (here using a demo endpoint provided by the Gotenberg team, with some limitations such as 2 requests per second per IP and a 5MB body limit)
2: pass a url form parameter with the URL of the webpage you want to convert
3: you get the PDF back as the HTTP response, with Content-Type application/pdf
Curl version:
curl --location --request POST 'https://demo.gotenberg.dev/forms/chromium/convert/url' \
--form 'url="https://sparksuite.github.io/simple-html-invoice-template/"' \
-o myfile.pdf
Node.js version:
const fetch = require('node-fetch');
const FormData = require('form-data');
const fs = require('fs');

async function main() {
  const formData = new FormData();
  formData.append('url', 'https://sparksuite.github.io/simple-html-invoice-template/');

  const res = await fetch('https://demo.gotenberg.dev/forms/chromium/convert/url', {
    method: 'POST',
    body: formData,
  });

  const pdfBuffer = await res.buffer();
  // You can do whatever you like with the pdfBuffer, such as writing it to the disk:
  fs.writeFileSync('/home/myfile.pdf', pdfBuffer);
}

main();
Using your own Docker instance instead of the demo endpoint, here is what you need to do:
1. Create the Gotenberg Docker container:
docker run -p 3333:3000 gotenberg/gotenberg:7 gotenberg
2. Call the http://localhost:3333/forms/chromium/convert/url endpoint:
Curl version:
curl --location --request POST 'http://localhost:3333/forms/chromium/convert/url' \
--form 'url="https://sparksuite.github.io/simple-html-invoice-template/"' \
-o myfile.pdf
Node.js version:
const fetch = require('node-fetch');
const FormData = require('form-data');
const fs = require('fs');

async function main() {
  const formData = new FormData();
  formData.append('url', 'https://sparksuite.github.io/simple-html-invoice-template/');

  const res = await fetch('http://localhost:3333/forms/chromium/convert/url', {
    method: 'POST',
    body: formData,
  });

  const pdfBuffer = await res.buffer();
  // You can do whatever you like with the pdfBuffer, such as writing it to the disk:
  fs.writeFileSync('/home/myfile.pdf', pdfBuffer);
}

main();
Gotenberg homepage: https://gotenberg.dev/
If you have access to the wkhtmltopdf command, I recommend it.
We use it successfully on our production website to generate PDFs.
First generate the {file_name} HTML file, then:
wkhtmltopdf --encoding utf8 --disable-smart-shrinking --dpi 100 -s {paper_size} -O {orientation} '{file_name}'
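If you are calling it from Node rather than a shell, a minimal sketch is to shell out with child_process (the file names here are placeholders):

const { execFile } = require('child_process');

execFile(
  'wkhtmltopdf',
  ['--encoding', 'utf8', '--disable-smart-shrinking', 'input.html', 'output.pdf'],
  (error) => {
    if (error) {
      console.error('wkhtmltopdf failed:', error);
      return;
    }
    console.log('PDF written to output.pdf');
  }
);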

JSZip read downloaded data (Angular 2)

I am trying to use JSZip to unzip a JSON file, but due to my lack of understanding of how JSZip works, I get the response in a format that I do not know how to use.
So far this is my code:
this.rest.getFile(this.stlLocation).subscribe(
  data => {
    let JSONFIle = new JSZIP();
    JSONFIle.file(data.url, data._body, { binary: true, compression: 'DEFLATE' });
    console.log(JSONFIle);
  },
  err => {
    this.msgs.push({ severity: 'error', summary: 'Error Message', detail: err });
  }
);
So I download a file using an Angular 2 service and use an observable to get the response. When the data is received, I call JSZip and try to unzip the file, but the result of the operation is an intricate object with my data scattered all over the place and buried inside several layers. All I want is the unzipped JSON file that I can open and process.
Thank you for your help,
Dino
After a bit of reading I realized I was going down the wrong path. If you are downloading the file to a browser, you shouldn't have to do anything: browsers add the Accept-Encoding: 'deflate' header automatically, and it is both unnecessary and bad practice to do this at a DOM/JS level. If you are using NGINX, the following link may help you out:
NGINX COMPRESSION AND DECOMPRESSION
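That said, if the downloaded payload really is a .zip archive (rather than just an HTTP-compressed response), the usual JSZip reading path is loadAsync instead of adding the bytes to a new archive with .file(...). A sketch, where zipData is the downloaded binary body (as an ArrayBuffer or Blob) and the entry name data.json is a placeholder:

import * as JSZip from 'jszip';

JSZip.loadAsync(zipData)
  .then((zip) => zip.file('data.json').async('string'))
  .then((text) => {
    const json = JSON.parse(text);
    console.log(json);
  });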

write a file using FileSystem API

I am trying to create a file using the FileSystem API. I googled and found this code:
function onFs(fs) {
  fs.root.getFile('log.txt', { create: true, exclusive: true },
    function (fileEntry) {
      fileEntry.getMetadata(function (md) {
        // md contains the file's size and modification time
      }, onError);
    },
    onError
  );
}

window.requestFileSystem(TEMPORARY, 1024 * 1024 /* 1MB */, onFs, onError);
Can anyone explain what the fs that is passed as a function argument is?
Please point me to a good example.
fs is a JavaScript object that allows you to make "system-like" calls against a virtual filesystem.
So, for instance, you can use the fs object to create or get a reference to a file in the virtual filesystem with fs.root.getFile(...). The third argument of the .getFile(...) method (in your case, the following lines from your snippet above) happens to be the callback for successfully obtaining a file reference.
function (fileEntry) {
  fileEntry.getMetadata(function (md) {
    // success callback: md holds the metadata
  }, onError);
}
That file reference (in your case it is called fileEntry) has various methods that can be called, such as .createWriter(...) for writing to files, .file(...) for reading files and .remove(...) for removing files. Your snippet calls .getMetadata(...), which returns metadata containing the file size and modification date.
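Since the question is about writing a file, here is a brief sketch of the .createWriter(...) path (this is the old, Chrome-prefixed FileSystem API, so treat it as illustrative):

fs.root.getFile('log.txt', { create: true }, function (fileEntry) {
  fileEntry.createWriter(function (writer) {
    writer.onwriteend = function () { console.log('write complete'); };
    writer.onerror = function (e) { console.error('write failed', e); };
    writer.write(new Blob(['Hello, world'], { type: 'text/plain' }));
  }, onError);
}, onError);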
For more specifics, as well as some good examples of the HTML5 FileSystem API, you may find the following article helpful: Exploring the File-System API
The location of the files differs depending on the browser, operating system and storage type (persistent vs. temporary), but the following link has proved quite useful as well: Chrome persistent storage locations