How to process files uploaded to Google Cloud Storage for validation in a Cloud Function (Java)

I have to unzip all the files that are uploaded to Cloud Storage inside a Google Cloud Function.
Basically, I need to check the file formats of the files within the zip archive.
Is there a way to download the object inside the Cloud Function? Is this the right way to do it?
Thanks in advance

Using Cloud Functions triggered by Cloud Storage (GCS) events is a common and good practice:
https://cloud.google.com/functions/docs/calling/storage#functions-calling-storage-java
When an object is uploaded to GCS, the Cloud Function is triggered and can use the Cloud Storage client library:
https://cloud.google.com/storage/docs/reference/libraries
To:
unzip the object
enumerate its content
process it (check file types etc.)
do something with the result
NOTE: events are only raised on changes, so you'll likely want a mechanism that reconciles the source bucket's existing content, e.g. for zip files that existed prior to the trigger's deployment.
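For illustration, here is a minimal sketch of such a function, assuming the Functions Framework for Java (com.google.cloud.functions) and the google-cloud-storage client library; the hand-rolled GcsEvent payload class and the .csv check are placeholders for your own validation logic:

import com.google.cloud.functions.BackgroundFunction;
import com.google.cloud.functions.Context;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.ByteArrayInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ValidateZipUpload implements BackgroundFunction<ValidateZipUpload.GcsEvent> {

  // Minimal event payload; GCS notifications include the bucket and object name.
  public static class GcsEvent {
    public String bucket;
    public String name;
  }

  private static final Storage storage = StorageOptions.getDefaultInstance().getService();

  @Override
  public void accept(GcsEvent event, Context context) throws Exception {
    // Download the uploaded object into memory (fine for reasonably small archives).
    byte[] content = storage.readAllBytes(BlobId.of(event.bucket, event.name));

    // Enumerate the zip's entries and check each file's format/extension.
    try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(content))) {
      ZipEntry entry;
      while ((entry = zis.getNextEntry()) != null) {
        if (!entry.isDirectory() && !entry.getName().toLowerCase().endsWith(".csv")) {
          System.out.println("Unexpected file type in archive: " + entry.getName());
        }
      }
    }
  }
}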


Cloud Functions methods for writing files to Storage?

A crucial difference between Firestore and Storage is that Firestore stores data (objects, arrays, strings, etc.) whereas Storage stores files. How we handle files in Cloud Functions is very different from how we handle data.
I use Node to call APIs, get files, and write the files to Storage. Node is hard. Are there easy methods to write files to Storage?
I found an easy way to write files to Storage from Cloud Functions. [Sounds of thin ice creaking.]
index.ts
import { getStorage, ref, uploadBytes } from "firebase/storage";
import got from "got";
const storageRef = ref(getStorage(), "winter.mp3");
const metadata = { contentType: "audio/mpeg" };
let file = await got('https://audio.oxforddictionaries.com/en/mp3/winter__us_2.mp3');
await uploadBytes(storageRef, file['rawBody'], metadata);
Passing the response's ['rawBody'] property works around the requirement that uploadBytes only uploads files.
That's cringy, right? Is there a better way?
Google says:
"Cloud Client Libraries are the recommended option for accessing Cloud APIs programmatically, where available."
In terms of Cloud Storage (and every other GCP product), they provide a whole bunch of working examples that you can practically copy/paste into your code.
Node.js Cloud Client Libraries
Specifically, see Google Cloud Storage: Node.js Client
The methods for accessing Cloud Storage in Cloud Functions for Firebase come from the Firebase Admin SDK for Node.js and are a thin wrapper around the Cloud Storage SDK for Node.js. There is no friendly Firebase wrapper for these, so my usual approach is to search for a (non-Cloud-Functions-specific, non-Firebase-specific) Node.js example.

How to run a Cloud Function on old files in the bucket (trigger: Cloud Storage create/finalize)

I have a Cloud Function:
What it does: loads a CSV file into BigQuery
Trigger: Cloud Storage (create/finalize)
GCS bucket status: already has hundreds of files
More files are uploaded to the bucket daily
I tested my function and it works perfectly: whenever I upload a new file, it goes into BigQuery straight away.
QUESTION:
How can I process the files that were already in the bucket before I deployed the function?
Posting as community wiki to help other community members that will encounter this issue. As stated by @Sana and @guillaume blaquiere:
The easiest solution is to copy all the files to a temporary bucket and then move them back to the original bucket. It may seem a bit silly, but the old files will then trigger the function and get loaded into BigQuery. The only way to generate the events is to recreate the files, i.e. write them again so that a new finalize event fires.
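A rough sketch of that copy-out/copy-back loop with the google-cloud-storage Java client; the bucket names are placeholders, and you would run this once after the function has been deployed:

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class RetriggerOldObjects {
  public static void main(String[] args) {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    String sourceBucket = "my-source-bucket"; // placeholder
    String tempBucket = "my-temp-bucket";     // placeholder

    for (Blob blob : storage.list(sourceBucket).iterateAll()) {
      // Copy the object out to the temp bucket...
      blob.copyTo(BlobId.of(tempBucket, blob.getName())).getResult();
      // ...then copy it back, which writes (finalizes) it again and fires the trigger.
      Blob tempCopy = storage.get(BlobId.of(tempBucket, blob.getName()));
      tempCopy.copyTo(BlobId.of(sourceBucket, blob.getName())).getResult();
      // Clean up the temporary copy.
      tempCopy.delete();
    }
  }
}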

hosting a JSON file for a 3rd party app/service to use

We currently use Jive Cloud N, which can use the REST API and allows the use of Custom Apps. Our UI devs have created an app which uses a JS GET to pull data from a JSON file for our "Birthdays and Anniversaries" tile.
At the moment, the JSON file is hosted on our UI dev's Google Cloud Apps account, but we wish to host it internally so we don't have to keep contacting them for changes.
I uploaded the file to our OneDrive for Business storage and created a public URL with full read permissions but the Jive platform is throwing an error trying to load the custom app.
The error is that the file "has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present".
Our dev said that to get it working on his Google Cloud App storage, he had to specify the Access-Control-Allow-Origin field in the server's app.yaml file. I don't know what this is and whether there is an equivalent for ODfB/SharePoint.
To get to my question: How can I host this JSON file on ODfB or even somewhere on our Azure tenancy so that it can be used? Or am I better off trying to set up a Google Cloud App storage location and replicate our dev's setup? FYI, I'd prefer the former because we're using M$ for a number of cloud-hosted services already.
Thanks in advance
Per my understanding, you could leverage Azure Blob Storage to store your JSON file, and you could use Microsoft Azure Storage Explorer to easily manage/share your files.
Moreover, you could manage anonymous read access to your containers and blobs; refer to this tutorial for more details. Also, you could leverage SAS to grant limited access to your storage account for other clients; you could follow this tutorial for getting started with SAS.
For a simple approach, you could create your storage account and leverage Microsoft Azure Storage Explorer to manage/share your file as follows.
For cross-domain access, you need to configure the CORS settings on the storage account.
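If you would rather set the CORS rule programmatically than through Storage Explorer, a rough sketch with the (legacy) com.microsoft.azure.storage Java SDK might look like the following; the connection string and allowed origin are placeholders, and the exact values depend on your Jive instance:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.CorsHttpMethods;
import com.microsoft.azure.storage.CorsRule;
import com.microsoft.azure.storage.ServiceProperties;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import java.util.Arrays;
import java.util.EnumSet;

public class SetBlobCors {
  public static void main(String[] args) throws Exception {
    CloudStorageAccount account = CloudStorageAccount.parse(
        "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"); // placeholder
    CloudBlobClient blobClient = account.createCloudBlobClient();

    // Allow the Jive instance's origin to GET the JSON blob.
    CorsRule rule = new CorsRule();
    rule.setAllowedOrigins(Arrays.asList("https://your-jive-instance.example.com")); // placeholder
    rule.setAllowedMethods(EnumSet.of(CorsHttpMethods.GET));
    rule.setAllowedHeaders(Arrays.asList("*"));
    rule.setExposedHeaders(Arrays.asList("*"));
    rule.setMaxAgeInSeconds(3600);

    // Append the rule to the blob service's CORS properties and save them.
    ServiceProperties props = blobClient.downloadServiceProperties();
    props.getCors().getCorsRules().add(rule);
    blobClient.uploadServiceProperties(props);
  }
}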
For sharing your file (blob), you could set the container's public access level or leverage SAS to grant limited access to your file for other clients.
Right-click your container and select "Set Public Access Level".
Sample file for share: https://brucechen.blob.core.windows.net/brucechen/index.json
Also, you could right click your JSON file, click "Get Shared Access Signature":
Sample file for share: https://brucechen.blob.core.windows.net/brucechen/index.json?st=2017-02-28T08%3A04%3A00Z&se=2017-09-01T08%3A04%3A00Z&sp=r&sv=2015-12-11&sr=b&sig=rVkorHeNOd4j2YhkmmxZ6DfXVLf1FoN2smY6mNRIoWs%3D

How to retrieve my Install Statistics from the Google Developer's Console

I am trying to programmatically retrieve my company's app data from the Google Developer's Console, specifically the daily installs. I have found that Google recommends the gsutil tool to access the data programmatically through the Google Cloud Storage SDK. However, I believe they charge for this service. I want a free way to programmatically retrieve the data, preferably as a JSON stream to avoid dealing with file downloads. I have found the "direct reporting" links, but I have problems authenticating when I try to use them, and I would also then have to do something with the actual files.
Is there a way to get a JSON version of the data through OAuth2 or something without downloading an Excel file? Has anyone had to do this?
You should look into using the Core Reporting API.
There are client libraries available in a number of languages.
You should work through the Hello Analytics APIs to get started.
JavaScript
PHP
Python
Java
A quick solution for building a dashboard would also be the Embed API.
Using the gsutil tool to access the company's storage bucket that Google provides is a free service. I wrote code that runs gsutil as a process through the command line and parses the downloaded .csv files into a database for storage. OAuth2 was not necessary.
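A minimal sketch of that approach in Java, assuming gsutil is installed and authenticated on the machine; the gs:// report path is a placeholder for your own Play reporting bucket and file, and the CSV parsing is deliberately naive:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class DownloadInstallStats {
  public static void main(String[] args) throws IOException, InterruptedException {
    // Placeholder path; the real one comes from your own Play Console reporting bucket.
    String report = "gs://pubsite_prod_rev_XXXXXXXX/stats/installs/installs_com.example.app_202401_overview.csv";

    // Run gsutil as an external command-line process to copy the report locally.
    Process p = new ProcessBuilder("gsutil", "cp", report, "installs.csv")
        .inheritIO()
        .start();
    if (p.waitFor() != 0) {
      throw new IOException("gsutil exited with code " + p.exitValue());
    }

    // Parse the downloaded CSV line by line for further storage/processing.
    try (BufferedReader reader = new BufferedReader(new FileReader("installs.csv"))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] fields = line.split(",");
        System.out.println(String.join(" | ", fields));
      }
    }
  }
}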

How to get an accurate file list through the Google Drive API

I'm developing an application using the Google Drive API (Java version). The application saves files on a Google Drive mimicking a file system (i.e. it has a folder tree). I started by using the files.list() method to retrieve all the existing files on the Google Drive, but the response got slower as the number of files increased (after a couple of hundred).
The Java Google API client hardcodes the response timeout to 20 seconds. I changed the code to load one folder at a time recursively instead (using files.list().setQ("'folderId' in parents")). This method beats the timeout problem, but it consistently misses about 2% of the files in my folders (the same files are missing each time). I can see those files through the Google Drive web browser interface and even through the Google Drive API if I search for the file name directly with files.list().setQ("title='filename'").
I'm assuming that the "in parents" search uses some inexact indexing which may only be updated periodically. I need a file listing that's more robust and accurate.
Any ideas?
Could you utilize the paging mechanism to run multiple queries, where each query only asks for a small number of results?
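For example, something along these lines with the Drive API v2 Java client; it assumes drive is an already-authorized com.google.api.services.drive.Drive instance and folderId is the folder whose children you want:

import com.google.api.services.drive.Drive;
import com.google.api.services.drive.model.File;
import com.google.api.services.drive.model.FileList;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FolderListing {
  // Page through a folder's children instead of fetching everything in one request.
  static List<File> listChildren(Drive drive, String folderId) throws IOException {
    List<File> result = new ArrayList<>();
    String pageToken = null;
    do {
      FileList page = drive.files().list()
          .setQ("'" + folderId + "' in parents and trashed = false")
          .setMaxResults(100) // keep each request small so it stays well under the timeout
          .setPageToken(pageToken)
          .execute();
      result.addAll(page.getItems());
      pageToken = page.getNextPageToken();
    } while (pageToken != null && !pageToken.isEmpty());
    return result;
  }
}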