How to run a CLOUD FUNCTION on old files in the bucket (trigger: Cloud Storage create/finalize) - google-cloud-functions

I have a Cloud Function:
What it does: loads a CSV file into BigQuery
Trigger: Cloud Storage (create/finalize)
GCS bucket status: already contains hundreds of files
More files are uploaded to the bucket daily.
I tested my function and it works perfectly: whenever I upload a new file, it goes into BigQuery straight away.
QUESTION:
How can I load the files that were already in the bucket before I deployed the function?

Posting as community wiki to help other community members that encounter this issue. As stated by @Sana and @guillaume blaquiere:
The easiest solution is to copy all the files to a temporary bucket and then move them back to the original bucket. It may seem a bit silly, but the old files will then trigger the function and be loaded into BigQuery. To generate the events, you have to recreate the files and write them again so that they are finalized.
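As a rough illustration of that round trip, here is a minimal sketch using the google-cloud-storage Java client that copies every existing object to a temporary bucket and back again; the bucket names are hypothetical, and the copy back is what emits the create/finalize event that invokes the function.

```java
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class RetriggerExistingObjects {
  public static void main(String[] args) {
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // Hypothetical bucket names for illustration only.
    String sourceBucket = "my-csv-bucket";
    String tempBucket = "my-csv-bucket-temp";

    for (Blob blob : storage.list(sourceBucket).iterateAll()) {
      String name = blob.getName();

      // 1. Copy the object to the temporary bucket.
      storage.copy(Storage.CopyRequest.newBuilder()
          .setSource(BlobId.of(sourceBucket, name))
          .setTarget(BlobId.of(tempBucket, name))
          .build()).getResult();

      // 2. Copy it back: this rewrites the object in the source bucket,
      //    which emits a create/finalize event and invokes the function.
      storage.copy(Storage.CopyRequest.newBuilder()
          .setSource(BlobId.of(tempBucket, name))
          .setTarget(BlobId.of(sourceBucket, name))
          .build()).getResult();

      // 3. Clean up the temporary copy.
      storage.delete(BlobId.of(tempBucket, name));
    }
  }
}
```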

Related

How to download files from AWS S3 storage on a schedule using GitHub Actions?

I have some small files to download from AWS S3 storage on a specific schedule. The limitation is that I need to use GitHub Actions to get them, so no external scripts. I realise it's possible to send a file from GitHub to AWS S3 using a GitHub Action, but what about the other way around?
The files are just monthly generated metadata.json files, nothing massive. Each month a new one is created and I need to grab the 3 most recent for reporting. Ultimately I'd also want to automatically convert them to .csv and possibly join them into one larger .json, but I don't need to do anything else special with them.
I can find dozens of examples of sending files to S3 on a schedule but not receiving.
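This question has no accepted answer here, but as one possible direction (a sketch only, with a hypothetical bucket name, prefix, and region): GitHub Actions workflows can be run on a cron schedule, and such a workflow could execute a small program like the following, which uses the AWS SDK for Java v2 to list the metadata files and download the three most recent ones. The CSV conversion and joining are not covered.

```java
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.S3Object;

public class FetchLatestMetadata {
  public static void main(String[] args) {
    // Hypothetical bucket/prefix; credentials are expected in the environment
    // (e.g. repository secrets exposed as AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
    String bucket = "my-reporting-bucket";
    String prefix = "metadata/";

    try (S3Client s3 = S3Client.builder().region(Region.EU_WEST_1).build()) {
      List<S3Object> objects = s3.listObjectsV2(
              ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build())
          .contents();

      // Sort by last-modified time, newest first, and download the three most recent files.
      objects.stream()
          .filter(o -> o.key().endsWith("metadata.json"))
          .sorted(Comparator.comparing(S3Object::lastModified).reversed())
          .limit(3)
          .forEach(o -> s3.getObject(
              GetObjectRequest.builder().bucket(bucket).key(o.key()).build(),
              Paths.get(Paths.get(o.key()).getFileName().toString())));
    }
  }
}
```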

How to process files which are uploaded in google cloud storage for validation in cloud function

How to process files which are uploaded in Google Cloud Storage (Java)
I have to unzip all the files that are uploaded to Cloud Storage in a Google Cloud Function.
Basically I need to check the file formats of the files within the zip archive.
Is there any way to download the archive in the Cloud Function? Is this the right way to do it?
Thanks in advance
Using Cloud Functions triggered by Cloud Storage (GCS) events is a common and good practice:
https://cloud.google.com/functions/docs/calling/storage#functions-calling-storage-java
When an object is uploaded to GCS, one or more Cloud Functions are triggered; each can use the Cloud Storage client library:
https://cloud.google.com/storage/docs/reference/libraries
To:
Unzip the object
Enumerate its contents
Process them (check file types, etc.)
Do something with the result
NOTE: events only arise on changes, so you'll likely want a mechanism that reconciles the source bucket's content for zipped files that, for example, existed prior to the trigger's deployment.
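For example, a background function along these lines (a minimal sketch: the class name, the allowed extensions, and the assumption that the archive fits in memory are all illustrative) downloads the finalized object with the Cloud Storage client and walks the zip entries to check their file types:

```java
import com.google.cloud.functions.BackgroundFunction;
import com.google.cloud.functions.Context;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.ByteArrayInputStream;
import java.util.logging.Logger;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipValidator implements BackgroundFunction<UnzipValidator.GcsEvent> {
  private static final Logger logger = Logger.getLogger(UnzipValidator.class.getName());
  private static final Storage storage = StorageOptions.getDefaultInstance().getService();

  @Override
  public void accept(GcsEvent event, Context context) throws Exception {
    if (!event.name.endsWith(".zip")) {
      return; // only process zip archives
    }
    // Download the uploaded object into memory (fine for small archives).
    byte[] zipBytes = storage.readAllBytes(BlobId.of(event.bucket, event.name));
    try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
      ZipEntry entry;
      while ((entry = zis.getNextEntry()) != null) {
        if (entry.isDirectory()) {
          continue;
        }
        // "Validation" here is just a file-extension check on each entry.
        String name = entry.getName();
        boolean allowed = name.endsWith(".csv") || name.endsWith(".json");
        logger.info(String.format("entry=%s allowed=%s", name, allowed));
      }
    }
  }

  // Minimal event payload; the Functions Framework deserializes the GCS event into it.
  public static class GcsEvent {
    public String bucket;
    public String name;
  }
}
```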

Monitor and automatically upload local files to Google Cloud Bucket

My goal is to make a website (hosted on Google's App Engine through a Bucket) that includes an upload button much like
<p>Directory: <input type="file" webkitdirectory mozdirectory /></p>
that prompts users to select a main directory.
In the main directory, the machine software first creates a subfolder and writes discrete files into it every few seconds, up to ~4000 per subfolder, at which point it creates another subfolder and continues, and so on.
I want the Cloud Storage bucket to automatically create a folder based on metadata (e.g. user login ID and time) in the background, and the website should monitor the main directory and its subfolders and automatically upload every file, in the order they finish being written locally, into that Cloud Storage folder. Each 'session' is expected to run for ~2-5 days.
Creating separate Cloud folders is meant to separate user data in case of multiple parallel users.
Does anyone know how this can be achieved? Would be good if there's sample code to adapt into existing HTML.
Thanks in advance!
As per @JohnHanley, this is not really feasible using a web application. I also do not understand the use case entirely, but I can provide some insight into monitoring Cloud Storage buckets.
GCP provides Cloud Functions:
Respond to change notifications emerging from Google Cloud Storage. These notifications can be configured to trigger in response to various events inside a bucket—object creation, deletion, archiving and metadata updates.
The Cloud Storage triggers will help you avoid having to monitor the buckets yourself; you can instead leave that to GCF.
Maybe you could expand on what you are trying to achieve with that many folders? Are you trying to create ~4,000 sub-folders per user? There may be a better path forward if we knew more about the intended use of the data. It seems you want to hold data, and perhaps a database is better suited?
- Application
  |-- Accounts
  |---- User1
  |------ Metadata
  |---- User2
  |------ Metadata
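On the storage side, note that those "folders" are really just object name prefixes; a minimal sketch with the google-cloud-storage Java client (bucket name, user ID, and path layout are hypothetical) would build the per-user/per-session prefix at upload time:

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;

public class SessionUploader {
  private static final Storage storage = StorageOptions.getDefaultInstance().getService();

  /**
   * Uploads a local file under a per-user "folder" (really just an object name prefix),
   * e.g. accounts/user1/2024-05-01T12-00-00Z/data_0001.bin
   */
  public static void uploadForUser(String bucket, String userId, String sessionStart, Path localFile)
      throws Exception {
    String objectName = String.format("accounts/%s/%s/%s", userId, sessionStart, localFile.getFileName());
    BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(bucket, objectName)).build();
    storage.createFrom(blobInfo, localFile); // the prefix shows up as nested folders in the console
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical bucket, user, and file names for illustration only.
    uploadForUser("my-app-bucket", "user1",
        Instant.now().toString().replace(":", "-"),
        Paths.get("/tmp/data_0001.bin"));
  }
}
```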

BigQuery auto-detect schema cause load of Google Drive CSV to fail

I've been using BigQuery for a while and load my data by fetching a CSV from an http address, uploading this to Google Drive using the Drive API, then attaching this to BigQuery using the BigQuery API.
I always specified auto-detect schema via the API, and it worked perfectly on a cron until March 16, 2017.
On March 16 it stopped working. The CSV still loads to Google Drive fine, but BigQuery won't pick it up.
I started troubleshooting by attempting to load the same CSV manually using the BigQuery UI, and noticed something strange: using auto-detect schema seems to prevent the loading of the CSV, because when I enter the schema manually it loads fine.
I thought maybe some rogue data might be the problem, but auto-detect schema isn't working for me now even with incredibly basic test tables, like...
id name
1 Paul
2 Peter
Has anyone else found that auto-detect schema has suddenly stopped working?
Maybe some default setting has changed on the API?
I could not get it to work from Google Drive today either (23 March).
Note: this was my first time ever using BigQuery/Google Cloud Storage.
I had a large CSV of bus stops (134 MB).
I tried uploading it to Google Drive but couldn't get it to import into BigQuery.
I just tried a Google Cloud Storage bucket and it worked OK.
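For reference, a load from a Cloud Storage URI with schema auto-detection can be submitted with the BigQuery Java client roughly like this (a sketch only; the dataset, table, and gs:// URI are placeholders):

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class AutodetectLoad {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Hypothetical dataset, table, and source URI for illustration.
    TableId tableId = TableId.of("my_dataset", "bus_stops");
    String sourceUri = "gs://my-bucket/bus_stops.csv";

    CsvOptions csvOptions = CsvOptions.newBuilder().setSkipLeadingRows(1).build();
    LoadJobConfiguration loadConfig =
        LoadJobConfiguration.newBuilder(tableId, sourceUri)
            .setFormatOptions(csvOptions)
            .setAutodetect(true) // let BigQuery infer the schema
            .build();

    Job job = bigquery.create(JobInfo.of(loadConfig));
    job = job.waitFor();
    if (job.getStatus().getError() != null) {
      System.err.println("Load failed: " + job.getStatus().getError());
    } else {
      System.out.println("Loaded " + sourceUri + " into " + tableId);
    }
  }
}
```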

How to get an accurate file list through the google drive api

I'm developing an application using the Google Drive API (Java version). The application saves files on Google Drive, mimicking a file system (i.e. it has a folder tree). I started by using the files.list() method to retrieve all the existing files on the Google Drive, but the response got slower as the number of files increased (after a couple of hundred).
The Java Google API hardcodes the response timeout to 20 seconds. I changed the code to load one folder at a time recursively instead (using files.list().setQ("'folderId' in parents")). This method beats the timeout problem, but it consistently misses about 2% of the files in my folders (the same files are missing each time). I can see those files through the Google Drive web interface and even through the Google Drive API if I search for the file name directly with files.list().setQ("title='filename'").
I'm assuming that the "in parents" search uses some inexact indexing which may only be updated periodically. I need a file listing that's more robust and accurate.
Any ideas?
Could you utilize the paging mechanism to run the query multiple times, where each request asks for only a small amount of results?
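As a sketch of that paging approach (using the Drive v3 Java client, where the v2 "title" field is called "name"; the page size and requested fields are arbitrary choices), walking nextPageToken until it is exhausted returns the complete child list of a folder:

```java
import com.google.api.services.drive.Drive;
import com.google.api.services.drive.model.File;
import com.google.api.services.drive.model.FileList;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DriveLister {
  /** Lists every child of a folder by following nextPageToken until it is exhausted. */
  public static List<File> listChildren(Drive drive, String folderId) throws IOException {
    List<File> all = new ArrayList<>();
    String pageToken = null;
    do {
      FileList page = drive.files().list()
          .setQ("'" + folderId + "' in parents and trashed = false")
          .setPageSize(100)                             // small pages keep each request fast
          .setFields("nextPageToken, files(id, name)")  // only fetch the fields we need
          .setPageToken(pageToken)
          .execute();
      all.addAll(page.getFiles());
      pageToken = page.getNextPageToken();
    } while (pageToken != null);
    return all;
  }
}
```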