Cannot download file from Google Storage inside Cloud Function? - google-cloud-functions

I'm trying to perform a simple download of a .docx file into a buffer so I can handle it later inside my Cloud Function. I've been using the whole Google Cloud Platform for multiple projects, but I never needed to download a file on the server side before, and now that I do, I just can't get it to work.
The following piece of code is not working; the function just times out as a response (I don't even get an error if I try to catch one):
var bucket = admin.storage().bucket("gs://myBucket.com");
return bucket.file("001Lineales/4x3-1/1000.docx").download().then((contents) => {
    var buffer = contents[0];
    // I never get to this point
}).catch((error) => {
    // No error
});
I tried the same code in a local Node.js script and it worked as expected. I also tried to download via a read stream, but no luck; the function hangs on any attempt to download the file.
return new Promise((resolve, reject) => {
    var archivo = bucket.file(selectedCategory).createReadStream();
    var array = [];
    // Nothing below here ever happens
    archivo.on('data', (d) => { array.push(d); }).on("end", () => {
        var newbuff = Buffer.concat(array);
        resolve(newbuff);
    });
});
The file's read/write permissions are public. The main problem is that debugging is difficult because I'm not able to run this function in the local emulator.
What can I do? Thanks in advance.
EDIT:
Double-checking a local call with the emulator, I get the following error:
Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.

Double-check the service account that you've assigned to the Cloud Function and make sure you've given it the permission it needs.
I think Storage Object Viewer will give you what you need to read a file into the buffer.
By default, if you haven't changed it, the App Engine default service account gets used, which I don't think has access to Storage.
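For the local emulator case specifically, the "Anonymous caller" error usually means no credentials are being picked up at all. Below is a minimal sketch of initializing the Admin SDK with an explicit service-account key for local testing; the key file path and the bucket name are placeholders, not details from the original post:
const admin = require('firebase-admin');

// Hypothetical key file downloaded for a service account that has the
// Storage Object Viewer role; adjust the path and bucket to your project.
const serviceAccount = require('./serviceAccountKey.json');

admin.initializeApp({
    credential: admin.credential.cert(serviceAccount),
    storageBucket: 'myBucket.com'
});

const bucket = admin.storage().bucket();
bucket.file('001Lineales/4x3-1/1000.docx').download()
    .then(([contents]) => console.log('Downloaded', contents.length, 'bytes'))
    .catch((error) => console.error(error));
Once the same service account (or the function's own account) has the Storage Object Viewer role, the original download call should complete in the deployed function as well.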

Related

How to read CSV data stored in Google Cloud Storage with Cloud Functions

As part of a communications effort to a large user base, I need to send upwards of 75,000 emails per day. The emails of the users I'm contacting are stored in a CSV file. I've been using Postman Runner to send these requests via SendGrid (Email API), but with such a large volume, my computer either slows way down or Postman completely crashes before the batch completes. Even if it doesn't crash, it takes upwards of 3 hours to send this many POST requests via Runner.
I'd like to upload the CSV containing the emails into a Cloud Storage bucket and then access the file using Cloud Functions to send a POST request for each email. This way, all the processing can be handled by GCP and not by my personal machine. However, I can't seem to get the Cloud Function to read the CSV data line-by-line. I've tried using createReadStream() from the Cloud Storage NodeJS client library along with csv-parser, but can't get this solution to work. Below is what I tried:
const sendGridMail = require('@sendgrid/mail');
const { Storage } = require('@google-cloud/storage');
const fs = require('fs');
const csv = require('csv-parser');

exports.sendMailFromCSV = (file, context) => {
    console.log(` Event: ${context.eventId}`);
    console.log(` Event Type: ${context.eventType}`);
    console.log(` Bucket: ${file.bucket}`);
    console.log(` File: ${file.name}`);
    console.log(` Metageneration: ${file.metageneration}`);
    console.log(` Created: ${file.timeCreated}`);
    console.log(` Updated: ${file.updated}`);

    const storage = new Storage();
    const bucket = storage.bucket(file.bucket);
    const remoteFile = bucket.file(file.name);
    console.log(remoteFile);

    let emails = [];
    fs.createReadStream(remoteFile)
        .pipe(csv())
        .on('data', function (row) {
            console.log(`Email read: ${row.email}`);
            emails.push(row.email);
            // send email using the SendGrid helper library
            const msg = {
                to: [{
                    "email": row.email
                }],
                from: "fakeemail@gmail.com",
                template_id: "fakeTemplate",
            };
            sendGridMail.send(msg).then(() =>
                context.status(200).send(file.body))
                .catch(function (err) {
                    console.log(err);
                    context.status(400).send(file.body);
                });
        })
        .on('end', function () {
            console.table(emails);
        });
};
The Cloud Function is currently triggered by an upload to the Cloud Storage bucket.
Is there a way to build a solution to this problem without loading the file into memory? Is Cloud Functions the right path to be going down, or would it be better to use App Engine or some other tool? I'm willing to try any GCP solution that moves this process to the cloud.
A Cloud Function's memory can be used as a temporary directory, /tmp. Thus, you can download the csv file from the Cloud Storage bucket into that directory as a local file, and then process it as if it were handled from the local drive (see the sketch after the restrictions below).
At the same time, you should keep in mind two main restrictions:
Memory - up to 2 GB for everything
Timeout - no more than 540 seconds per invocation.
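A minimal sketch of that /tmp approach, assuming the same storage-triggered signature and csv-parser dependency as the code in the question; what you do with each row (sending or queueing the email) is left out:
const { Storage } = require('@google-cloud/storage');
const csv = require('csv-parser');
const fs = require('fs');
const path = require('path');

const storage = new Storage();

exports.processCsv = async (file, context) => {
    const localPath = path.join('/tmp', path.basename(file.name));

    // Download the uploaded CSV into the function's writable /tmp directory.
    await storage.bucket(file.bucket).file(file.name).download({ destination: localPath });

    // Now it can be read like any local file.
    return new Promise((resolve, reject) => {
        fs.createReadStream(localPath)
            .pipe(csv())
            .on('data', (row) => {
                console.log(`Email read: ${row.email}`);
                // ...send or queue the email for this row here...
            })
            .on('end', () => {
                // Clean up /tmp, since it counts against the function's memory.
                fs.unlinkSync(localPath);
                resolve();
            })
            .on('error', reject);
    });
};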
I personally would create a solution based on a combination of a few GCP resources.
The first cloud function is triggered by a 'finalize' event - when the csv file is saved in the bucket. This cloud function reads the file and, for every record, composes a Pub/Sub message with relevant details (enough to send an email). That message is posted into a Pub/Sub topic.
The Pub/Sub topic is used to transfer all messages from the first cloud function to trigger the second cloud function.
The second cloud function is triggered by a Pub/Sub message, which contains all necessary details to process and send an email. As there may be 75K records in the source csv file (for example), you should expect 75K invocations of the second cloud function.
That may be enough at a high level. The Pub/Sub paradigm guarantees at-least-once delivery (a message may be delivered more than once), so if you need no more than one email per address, some additional resources may be required to achieve idempotent behaviour.
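A rough sketch of that two-function pipeline; the topic name 'send-email', the message payload shape, and the SendGrid details are placeholders rather than anything from the original question:
const { PubSub } = require('@google-cloud/pubsub');
const { Storage } = require('@google-cloud/storage');
const csv = require('csv-parser');
const sendGridMail = require('@sendgrid/mail');
// sendGridMail.setApiKey(...) must be called somewhere before sending.

const pubsub = new PubSub();
const storage = new Storage();

// First function: triggered by the storage object 'finalize' event.
// Streams the CSV and publishes one Pub/Sub message per row.
exports.fanOutCsv = (file, context) => {
    const publishes = [];
    return new Promise((resolve, reject) => {
        storage.bucket(file.bucket).file(file.name).createReadStream()
            .pipe(csv())
            .on('data', (row) => {
                const payload = Buffer.from(JSON.stringify({ email: row.email }));
                publishes.push(pubsub.topic('send-email').publish(payload));
            })
            .on('end', () => resolve(Promise.all(publishes)))
            .on('error', reject);
    });
};

// Second function: triggered by the 'send-email' topic.
// Each invocation handles exactly one recipient.
exports.sendOneMail = async (message, context) => {
    const { email } = JSON.parse(Buffer.from(message.data, 'base64').toString());
    await sendGridMail.send({
        to: [{ email }],
        from: 'fakeemail@gmail.com',
        template_id: 'fakeTemplate',
    });
};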
Basically, you will have to download the file locally onto the Cloud Function machine to be able to read it this way.
Now there are multiple options to work around this.
The most basic/simplest is to provision a Compute Engine machine and run this operation from it, if it is a one-off event.
If you need to do this more frequently (e.g. daily), you can use an online tool to convert your csv file into JSON and import it into Firestore; then you can read the emails from Firestore a lot faster.
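If the data does end up in Firestore, reading the addresses back from a function is short. A minimal sketch, assuming a hypothetical 'emails' collection where each document has an 'email' field:
const { Firestore } = require('@google-cloud/firestore');

const firestore = new Firestore();

async function listEmails() {
    // Fetch every document in the (assumed) 'emails' collection.
    const snapshot = await firestore.collection('emails').get();
    return snapshot.docs.map((doc) => doc.get('email'));
}

listEmails().then((emails) => console.log(`Loaded ${emails.length} addresses`));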

Interrupted downloads when downloading a file from Web Api (remote host closed error 0x800704CD)

I have read nearly 20 other posts about this particular error, but most seem to be issues with the code calling Response.Close or similar, which is not our case. I understand that this particular error typically means that a user browsed away from the web page or cancelled the request midway, but in our case we are getting this error without cancelling a request. I can observe the error after just a few seconds; the download simply fails in the browser (both Chrome and IE, so it's not browser-specific).
We have a web api controller that serves a file download.
[HttpGet]
public HttpResponseMessage Download()
{
    // Enumerates a directory and returns a read-only FileStream of the download
    var stream = dataProvider.GetServerVersionAssemblyStream(configuration.DownloadDirectory, configuration.ServerVersion);
    if (stream == null)
    {
        return new HttpResponseMessage(HttpStatusCode.NotFound);
    }

    var response = new HttpResponseMessage(HttpStatusCode.OK)
    {
        Content = new StreamContent(stream)
    };
    response.Content.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment");
    response.Content.Headers.ContentDisposition.FileName = $"{configuration.ServerVersion}.exe";
    response.Content.Headers.ContentType = new MediaTypeHeaderValue(MediaTypeNames.Application.Octet);
    response.Content.Headers.ContentLength = stream.Length;
    return response;
}
Is there something incorrect we are doing in our Download method, or is there something we need to tweak in IIS?
This happens sporadically. I can't observe a pattern, it works sometimes and other times it fails repeatedly.
The file download is about 150MB
The download is initiated from a hyperlink on our web page, there is no special calling code
The download is over HTTPS (HTTP is disabled)
The Web Api is hosted on Azure
It doesn't appear to be timing out; it can happen after just a second or two, so it's not hitting the default 30-second timeout values
I also noticed I can't seem to initiate multiple file downloads from the server at once, which is concerning. This needs to be able to serve 150+ businesses and multiple simultaneous downloads, so I'm concerned there is something we need to tweak in IIS or the Web Api.
I was able to finally fix our problem. For us it turned out to be a combination of two things: 1) we had several memory leaks and CPU intensive code in our Web Api that was impacting concurrent downloads, and 2) we ultimately resolved the issue by changing MinBytesPerSecond (see: https://blogs.msdn.microsoft.com/benjaminperkins/2013/02/01/its-not-iis/) to a lower value, or 0 to disable. We have not had an issue since.

Trouble with getting script to recognize a JSON file in the directory (Google API)

So I am attempting to learn how to use the Google Sheets API with Node.js. In order to get an understanding, I followed along with the Node.js quickstart guide supplied by Google. I attempted to run it, nearly line for line a copy of the guide, just without documentation, and I wind up encountering an error (the original post includes a screenshot of the cmd console output).
Just in case anyone wants to see if I am not matching the guide, which is entirely possible since I am fairly new to this, here is a link to the Google page and my code.
https://developers.google.com/sheets/api/quickstart/nodejs
var fs = require('fs');
var readline = require('readline');
var google = require('googleapis');
var googleAuth = require('google-auth-library');

var SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly'];
var TOKEN_DIR = (process.env.HOME || process.env.HOMEPATH ||
    process.env.USERPROFILE) + '/.credentials/';
var TOKEN_PATH = TOKEN_DIR + 'sheets.googleapis.com-nodejs-quickstart.json';

fs.readFile('client_secret.json', function processClientSecrets(err, content) {
    if (err) {
        console.log('Error loading client secret file: ' + err);
    }
    authorize(JSON.parse(content), listMajors);
});
I have tried placing the JSON file in every part of the directory, but the script still won't see it. I've been pulling my hair out all day, and a poke in the right direction would be immensely appreciated.
From your command output:
Error loading client secret file
So your if (err) line is being triggered. But since you don't throw the error, the script continues anyway (which is dangerous in general).
SyntaxError: Unexpected token u in JSON at position 0
This means that the data you are passing to JSON.parse() is undefined. It is not a valid JSON string.
You could use load-json-file (or the thing it uses, parse-json) to get more helpful error messages. But the root cause is that your content variable contains nothing, since the client_secret.json you tried to read could not be found.
As for why the file could not be found, there could be a typo in either the script or the filename you saved the JSON in. Or it may have to do with the current working directory. You may want to use something like this to ensure you end up with the same path regardless of the current working directory.
path.join(__dirname, 'client_secret.json')
Resources
path.join()
__dirname
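Putting those two suggestions together, a minimal sketch of the file-loading part, assuming authorize and listMajors are defined as in the quickstart:
var fs = require('fs');
var path = require('path');

// Resolve the secret file relative to this script, not the current working directory.
var CLIENT_SECRET_PATH = path.join(__dirname, 'client_secret.json');

fs.readFile(CLIENT_SECRET_PATH, function processClientSecrets(err, content) {
    if (err) {
        // Bail out here instead of letting JSON.parse(undefined) blow up later.
        return console.error('Error loading client secret file:', err);
    }
    authorize(JSON.parse(content), listMajors);
});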

Trying to get Google Drive to work with PCL Xamarin Forms application

I'm using Xamarin Forms to do some cross-platform applications and I'd like to offer DropBox and Google Drive as places where users can do backups, cross-platform data sharing and the like. I was able to get DropBox working without doing platform-specific shenanigans just fine, but Google Drive is really giving me fits. I have my app set up properly with Google and have tested it with a regular CLI .NET application using their examples that read the JSON file off the drive and create a temporary credentials file - all fine and well, but getting that to fly without access to the file system is proving elusive, and I can't find any examples of how to go about it.
I'm currently just using Auth0 as a gateway to allow users to provide creds/access to my app for their account, which works dandy, and the proper scope items are requested (I'm just using read-only file access for testing). I get a bearer token and refresh token from them; however, when trying to actually use that data and just do a simple file listing, I get a 400 bad request error.
I'm sure this must be possible, but I can't find any examples anywhere that deviate in the slightest from using the JSON file downloaded from Google and creating a credentials file. Surely you can create an instance of the DriveService object armed with only the bearer token...
Anyway, here's a chunk of test code where I'm trying to get the DriveService object configured. If anyone has done this or has suggestions as to what to try here, I'd very much appreciate your thoughts.
public bool AuthenticationTest(string pBearerToken)
{
    try
    {
        var oInit = new BaseClientService.Initializer
        {
            ApplicationName = "MyApp",
            ApiKey = pBearerToken,
        };
        _googleDrive = new DriveService(oInit);

        FilesResource.ListRequest listRequest = _googleDrive.Files.List();
        listRequest.PageSize = 10;
        listRequest.Fields = "nextPageToken, files(id, name)";

        // All is well until this call to list the files...
        IList<Google.Apis.Drive.v3.Data.File> files = listRequest.Execute().Files;
        foreach (var file in files)
        {
            Debug.WriteLine(file.Name);
        }
        return true;
    }
    catch (Exception ex)
    {
        RaiseError(ex);
        return false;
    }
}

node.js - repeatedly updating webpage from mysql database

I am trying to create a node.js app to automatically update a webpage every few seconds with new data from a mysql database. I have followed the information on this site: http://www.gianlucaguarini.com/blog/push-notification-server-streaming-on-a-mysql-database/
The code on this site does indeed work, but upon further testing it keeps running the "handler" function and therefore executing the readFile function for each row of the database processed.
I am in the process of learning node.js but cannot understand why the handler function keeps getting called. I would only like it to get called once per connection. Constantly reading the index.html file like this seems very inefficient.
The reason that I know the handler function keeps getting called is that I placed a console.log("Hello"); statement in the handler function and it keeps outputting that line to the console.
Do you provide the image URLs that the client.html is looking for? Here's what I think is happening:
The client connects to your server via Socket.IO and retrieves the user information (user_name, user_description, and user_img). The client then immediately tries to load an image using the user_img URL. The author's server code, however, doesn't appear to support serving these pictures. Instead, it just returns the same client.html file for every request. This is why it appears to be calling handler over and over again: it's trying to load a picture for every user.
I would recommend using the express module in node to serve static files instead of trying to do it by hand. Your code would look something like this:
var express = require('express');
var app = express();
var http = require('http').Server(app);
var io = require('socket.io')(http);

// Serve any static files (client.html, the user photos, etc.) from the "public" folder.
app.use(express.static(__dirname + "/public"));
That essentially says to serve any static files they request from the public folder. In that folder you will put client.html as well as the user photos.
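For completeness, here is a sketch of how the rest of the server could be wired up with that change; the route, the port number, and the MySQL polling loop are assumptions rather than details from the blog post:
// client.html and the user images now live in ./public and are served statically,
// so there is no hand-written handler function being hit for every image request.
app.get('/', function (req, res) {
    res.sendFile(__dirname + '/public/client.html');
});

io.on('connection', function (socket) {
    console.log('a user connected');
    // ...emit rows from the MySQL polling loop here, as in the original blog post...
});

http.listen(3000, function () {
    console.log('listening on *:3000');
});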