Parse JSON file uploaded to S3 using Lambda

I'm trying to parse a JSON file that gets uploaded to S3. I invoke the Lambda function using an S3 PUT/POST trigger.
I'm using the following code; however, I'm not able to parse the JSON file. Can someone please help me?
var aws = require('aws-sdk');
var s3 = new aws.S3();

exports.handler = async (event, context, callback) => {
    var srcBucket = event.Records[0].s3.bucket.name;
    var srcKey = event.Records[0].s3.object.key;
    console.log("Params: srcBucket: " + srcBucket + " srcKey: " + srcKey + "\n");

    var getParams = {
        Bucket: srcBucket,
        Key: srcKey,
    };

    s3.getObject(getParams, function (err, data) {
        if (err) console.log(err, err.stack);
        else {
            console.log(JSON.stringify(data.Body.toString()));
        }
    });
};

Your code looks correct. However, I'd suggest taking the example from the AWS docs as your starting point. Since your Lambda handler is an async handler, you have to await the promise returned by s3.getObject(getParams).promise(); otherwise your function will complete before the callback executes (see the example code from the link).
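For example, a minimal sketch of the handler using the promise form (assuming the object really is plain JSON, and keeping the bucket/key lookup from your code):

const aws = require('aws-sdk');
const s3 = new aws.S3();

exports.handler = async (event) => {
    const srcBucket = event.Records[0].s3.bucket.name;
    // object keys arrive URL-encoded in the S3 event, per the AWS docs example
    const srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));

    // await the promise instead of passing a callback
    const data = await s3.getObject({ Bucket: srcBucket, Key: srcKey }).promise();
    const parsed = JSON.parse(data.Body.toString('utf-8'));
    console.log('Parsed JSON:', parsed);
    return parsed;
};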
Since you mention that your Lambda function cannot parse the file, I assume the function does get invoked by the S3 trigger (i.e. you can see the console.log('Params: ...') line in CloudWatch Logs). If that's not the case, first check that the S3 trigger is configured correctly and that S3 has permission to invoke the Lambda function. If you created the function via the AWS Console, this permission would have been set automatically.
The next step I'd suggest is to check the Lambda function's IAM role. Check whether the role has the s3:GetObject permission for your bucket and all objects under it, or for the specific prefix for which you have configured the S3 notification (e.g. <bucket>/* or <bucket>/prefix/*).
If the Lambda IAM permissions are correct, you'll have to check the S3 bucket policy as well; based on what you describe, I suspect it hasn't been set up to allow this access.

Related

How to get around previously declared json body-parser in Nodebb?

I am writing a private plugin for NodeBB (open forum software). In NodeBB's webserver.js file there is a line that seems to be hogging all incoming JSON data:
app.use(bodyParser.json(jsonOpts));
I am trying to get the raw body of all incoming JSON data for one of my endpoints. However, the challenge is that I cannot remove or modify the line above.
The following code works ONLY if I temporarily remove the line above.
var rawBodySaver = function (req, res, buf, encoding) {
    if (buf && buf.length) {
        req.rawBody = buf.toString(encoding || 'utf8');
    }
}

app.use(bodyParser.json({ verify: rawBodySaver }));
However, as soon as I put the app.use(bodyParser.json(jsonOpts)); middleware back into the webserver.js file, it stops working. So it seems like body-parser only applies the first parser that matches the incoming content type and then skips all the rest?
How can I get around that? I could not find any information in their official documentation.
Any help is greatly appreciated.
Update:
The problem I am trying to solve is to correctly handle an incoming Stripe webhook event. In the official Stripe documentation they suggest the following:
// Match the raw body to content type application/json
app.post('/webhook', bodyParser.raw({type: 'application/json'}), (request, response) => {
    const sig = request.headers['stripe-signature'];
    let event;

    try {
        event = stripe.webhooks.constructEvent(request.body, sig, endpointSecret);
    } catch (err) {
        return response.status(400).send(`Webhook Error: ${err.message}`);
    }
    // ...
});
Both methods, the original at the top of this post and the official Stripe-recommended way, construct the Stripe event correctly, but only if I remove the middleware in webserver.js. So my understanding now is that you cannot have multiple middlewares handle the same incoming data. I don't have much wiggle room when it comes to the first middleware, except that I can modify the argument (jsonOpts) that is passed to it, which comes from a .json file. I tried adding a verify field but I couldn't figure out how to add a function as its value. I hope this makes sense, and sorry for not stating what problem I am trying to solve initially.
The only solution I can find without modifying the NodeBB code is to insert your middleware from a convenient hook (which will run later than you want) and then reach into the layer list in the app router and move that middleware earlier in the list, so it runs in front of the middleware you need to get ahead of.
This is a hack so if Express changes their internal implementation at some future time, then this could break. But, if they ever changed this part of the implementation, it would likely only be in a major revision (as in Express 4 ==> Express 5) and you could just adapt the code to fit the new scheme or perhaps NodeBB will have given you an appropriate hook by then.
The basic concept is as follows:
1. Get the router you need to modify. It appears it's the app router you want for NodeBB.
2. Insert your middleware/route as you normally would, so Express does all the normal setup for it and adds it to the internal Layer list in the app router.
3. Then reach into the list, take it off the end of the list (where it was just added) and insert it earlier in the list.
4. Figure out where to put it earlier in the list. You probably don't want it at the very start of the list, because that would put it before some helpful system middleware that makes things like query parameter parsing work. So, the code skips the built-in middleware it recognizes ("query" and "expressInit") and inserts yours right before the first layer whose name it doesn't recognize.
Here's the code for a function to insert your middleware.
function getAppRouter(app) {
    // History:
    //   Express 4.x throws when accessing app.router and the router is on app._router
    //   But, the router is lazy initialized with app.lazyrouter()
    //   Express 5.x again supports app.router
    //   And, it handles the lazy construction of the router for you
    let router;
    try {
        router = app.router; // Works for Express 5.x, Express 4.x will throw when accessing
    } catch(e) {}
    if (!router) {
        // Express 4.x
        if (typeof app.lazyrouter === "function") {
            // make sure router has been created
            app.lazyrouter();
        }
        router = app._router;
    }
    if (!router) {
        throw new Error("Couldn't find app router");
    }
    return router;
}
// insert a method on the app router near the front of the list
function insertAppMethod(app, method, path, fn) {
    let router = getAppRouter(app);
    let stack = router.stack;

    // allow function to be called with no path
    // as insertAppMethod(app, method, fn);
    if (typeof path === "function") {
        fn = path;
        path = null;
    }

    // add the handler to the end of the list
    if (path) {
        app[method](path, fn);
    } else {
        app[method](fn);
    }

    // now remove it from the stack
    let layerObj = stack.pop();

    // now insert it near the front of the stack,
    // but after a couple pre-built middlewares installed by Express itself
    let skips = new Set(["query", "expressInit"]);
    for (let i = 0; i < stack.length; i++) {
        if (!skips.has(stack[i].name)) {
            // insert it here before this item
            stack.splice(i, 0, layerObj);
            break;
        }
    }
}
You would then use this, from any NodeBB hook that provides you the app object sometime during startup, to insert your method like this. It will create your /webhook route handler and then move it earlier in the layer list (before the other body-parser middleware).
let rawMiddleware = bodyParser.raw({type: 'application/json'});

insertAppMethod(app, 'post', '/webhook', (request, response, next) => {
    rawMiddleware(request, response, (err) => {
        if (err) {
            next(err);
            return;
        }
        const sig = request.headers['stripe-signature'];
        let event;
        try {
            event = stripe.webhooks.constructEvent(request.body, sig, endpointSecret);
            // you need to either call next() or send a response here
        } catch (err) {
            return response.status(400).send(`Webhook Error: ${err.message}`);
        }
    });
});
The bodyParser.json() middleware does the following:
1. Check the Content-Type of an incoming request to see if it is application/json.
2. If it is that type, read the body from the incoming stream to get all the data from the stream.
3. When it has all the data from the stream, parse it as JSON and put the result into req.body so follow-on request handlers can access the already-read and already-parsed data there.
Because it reads the data from the stream, there is no longer any data left in the stream. Unless it saves the raw data somewhere (I haven't looked to see if it does), the original raw data is gone - it's been read from the stream already. This is why you can't have multiple different middlewares all trying to process the same request body. Whichever one goes first reads the data from the incoming stream, and then the original data is no longer there in the stream.
To help you find a solution, we need to know what end-problem you're really trying to solve. You will not be able to have two middlewares both looking for the same content type and both reading the request body. You could, however, replace bodyParser.json() with your own middleware that does both what it does now and whatever else you need, in the same middleware, but not in separate middlewares.
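For illustration only: if you could change that single line in webserver.js (which you say you can't, hence the layer-list workaround above), "doing both in one middleware" would look roughly like this, using body-parser's verify option to keep the raw bytes while it still parses req.body:

// sketch only: assumes you control the options passed to bodyParser.json()
const bodyParser = require('body-parser');

app.use(bodyParser.json({
    // body-parser hands verify the raw buffer before parsing,
    // so req.body is still the parsed JSON and req.rawBody keeps the original bytes
    verify: function (req, res, buf, encoding) {
        if (buf && buf.length) {
            req.rawBody = buf.toString(encoding || 'utf8');
        }
    }
}));

// the webhook route can then verify the Stripe signature against req.rawBody:
// stripe.webhooks.constructEvent(req.rawBody, req.headers['stripe-signature'], endpointSecret);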

Cloud Functions for Firebase: Could not handle the request after a successful request

TLDR: After writing a JSON document (successfully) to my Firestore, the next request will give me Internal Server Error (500). I have a suspicion that the problem is that the insert is not yet complete.
So basically, I have this code:
const jsonToDb = express();

exports.jsondb = functions.region('europe-west1').https.onRequest(jsonToDb);

jsonToDb.post('', (req, res) => {
    let doc;
    try {
        doc = JSON.parse(req.body);
    } catch(error) {
        res.status(400).send(error.toString()).end();
        return;
    }
    myDbFuncs.saveMyDoc(doc);
    res.status(201).send("OK").end();
});
The database functions are in another JS file.
module.exports.saveMyDoc = function (myDoc) {
    let newDoc = db.collection('insertedDocs').doc(new Date().toISOString());
    newDoc.set(myDoc).then().catch();
    return;
};
So I have several theories; maybe one of them is not wrong, but please help me with this. (Also, if I made some mistakes in this little snippet, just tell me.)
Reproduction:
I send the first request => everything is OK, the JSON is in the database.
I send a second request after the first request gives me an OK status => it does not do anything for a few seconds, then 500: Internal Server Error.
Logs: Function execution took 4345 ms, finished with status: 'connection error'.
I just don't understand. Let's imagine I'm using this as an API with several requests arriving simultaneously. Can't it handle that? (I suppose it can; I'm just doing something stupid.) Deliberately, I'm sending the second request only after the first has finished, and this still occurs.
Should I make the saveMyDoc async?
saveMyDoc isn't returning a promise that resolves when all the async work is complete. If you lose track of a promise, Cloud Functions will shut down the work and clean up before the work is complete, making it look like it simply doesn't work. You should only send a response from an HTTP type function after all the work is fully complete.
Minimally, it should look more like this:
module.exports.saveMyDoc = function (myDoc) {
    let newDoc = db.collection('insertedDocs').doc(new Date().toISOString());
    return newDoc.set(myDoc);
};
Then you would use the promise in your main function:
myDbFuncs.saveMyDoc(doc).then(() => {
    res.status(201).send("OK").end();
});
See how the response is only sent after the data is saved.
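Putting the two pieces together, a minimal sketch of the whole route with async/await (keeping the JSON.parse call from the original snippet) could look like this:

jsonToDb.post('', async (req, res) => {
    let doc;
    try {
        doc = JSON.parse(req.body);
    } catch (error) {
        return res.status(400).send(error.toString());
    }
    try {
        // wait for Firestore to finish before sending the response,
        // so Cloud Functions doesn't shut the instance down mid-write
        await myDbFuncs.saveMyDoc(doc);
        res.status(201).send("OK");
    } catch (error) {
        res.status(500).send(error.toString());
    }
});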
Read more about async programming in Cloud Functions in the documentation. Also watch this video series that talks about working with promises in Cloud Functions.

Trigger a cloud build pipeline using Cloud Function

I'm trying to create a Cloud Function that listens to the cloudbuilds topic and makes an API call to trigger the build. I think I'm missing something in my index.js file (I'm new to Node.js). Can you provide a sample of a Cloud Function making an API call to the Cloud Build API?
Here is my function:
const request = require('request')

const accessToken = '$(gcloud config config-helper --format='value(credential.access_token)')';

request({
    url: 'https://cloudbuild.googleapis.com/v1/projects/[PROJECT_ID]/builds',
    auth: {
        'bearer': accessToken
    },
    method: 'POST',
    json: {"steps": [{"name": "gcr.io/cloud-builders/gsutil", "args": ['cp', 'gs://adolfo-test-cloudbuilds/cloudbuild.yaml', 'gs://adolfo-test_cloudbuild/cloudbuild.yaml']}]},
},
module.exports.build = (err, res) => {
    console.log(res.body);
});
I was executing the command gcloud config config-helper --format='value(credential.access_token)', copying the token, and putting it in as the value of the accessToken variable. But this didn't work for me.
Here is the error: { error: { code: 403, message: 'The caller does not have permission', status: 'PERMISSION_DENIED' } }
I had the exact same problem and solved it by writing a small package; you can use it or read the source code:
https://github.com/MatteoGioioso/google-cloud-build-trigger
With this package you can run a pre-configured trigger from Cloud Build.
You can also extend it to call other Cloud Build API endpoints.
As I understand it, the Cloud Build API requires either OAuth2 or a service account. Make sure you granted the right permissions to Cloud Build in the GCP console under IAM. After that you should be able to download the service-account.json file.
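For reference, here is a rough sketch of making that API call from a Cloud Function with the googleapis client library instead of a hard-coded token. It is untested and assumes the googleapis package is in your dependencies and the function's service account is allowed to create builds:

const { google } = require('googleapis');

// background function subscribed to the cloud-builds topic
exports.build = async (event, context) => {
    // authenticate as the function's own service account (no copied gcloud token)
    const auth = await google.auth.getClient({
        scopes: ['https://www.googleapis.com/auth/cloud-platform']
    });
    const cloudbuild = google.cloudbuild({ version: 'v1', auth });

    const result = await cloudbuild.projects.builds.create({
        projectId: '[PROJECT_ID]', // your project ID
        requestBody: {
            steps: [{
                name: 'gcr.io/cloud-builders/gsutil',
                args: ['cp', 'gs://adolfo-test-cloudbuilds/cloudbuild.yaml', 'gs://adolfo-test_cloudbuild/cloudbuild.yaml']
            }]
        }
    });
    console.log(result.data);
};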

Stackdriver export to .txt or PDF on drive/mail

I've set up a script which reads data from a spreadsheet and sends emails according to this data.
Now, I've also set it up to do some simple logging via Stackdriver.
What I'd like to do is to export these logs (after/at the end of every execution of the mail script) to a .txt or .pdf file which then gets saved to a specific Google Drive folder or sent by mail.
Unfortunately I can't seem to find out how to do this, or if it's even possible.
There is no way to edit a Google Docs file, if this is what you were thinking of doing. You're going to have to create your .txt or .pdf file locally, then upload the file to Google Drive or send it as an email. Technically, if you upload the file as a .txt, I think that Google Drive will allow you to export it as a PDF, but I haven't tried with the new version of Google Drive.
var fileId = '1ZdR3L3qP4Bkq8noWLJHSr_iBau0DNT4Kli4SxNc2YEo';
var dest = fs.createWriteStream('/tmp/resume.pdf');

drive.files.export({
    fileId: fileId,
    mimeType: 'application/pdf'
})
    .on('end', function () {
        console.log('Done');
    })
    .on('error', function (err) {
        console.log('Error during download', err);
    })
    .pipe(dest);
Downloading google Documents
I also don't think that you will be able to email a file directly from Google Drive; you will have to download the file locally and then send your email.
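For the "create the file yourself, then store or mail it" route, a minimal Apps Script sketch could look like the following. The folder ID and recipient address are placeholders, and logText is whatever log string you collect during the mail-script run:

function exportLog(logText) {
    // write the log text to a .txt file in a specific Drive folder
    var folder = DriveApp.getFolderById('YOUR_FOLDER_ID');
    var file = folder.createFile('mail-script-log.txt', logText, MimeType.PLAIN_TEXT);

    // and/or send the same file by mail as an attachment
    MailApp.sendEmail('you@example.com', 'Mail script log', 'Log attached.', {
        attachments: [file.getBlob()]
    });
}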
Stackdriver has an error reporting API (see the documentation for Stackdriver). The API has REST capability, which means that you can call it from Apps Script using UrlFetchApp.fetch(url), where url is the URL needed to get error reporting information. The base URL for the Stackdriver API is https://clouderrorreporting.googleapis.com, and the API must be enabled.
There are multiple methods that can be used with the API.
The method that you probably need is the list method, which requires the url:
https://clouderrorreporting.googleapis.com/v1beta1/{projectName=projects/*}/events
where the projectName parameter must be a Google Cloud Platform project ID.
See documentation on list at: projects.events.list
The return value for that HTTPS Request, if successful, is a "response body" with the following structure and data:
{
    "errorEvents": [
        {
            object (ErrorEvent)
        }
    ],
    "nextPageToken": string,
    "timeRangeBegin": string
}
The ErrorEvent is a JSON object with the following structure and data:
{
    "eventTime": string,
    "serviceContext": {
        object (ServiceContext)
    },
    "message": string,
    "context": {
        object (ErrorContext)
    }
}
So, if you want to send an email with error data from Stackdriver, it won't be sent directly from Stackdriver, you need to make a request to Stackdriver from Apps Script, get the error information, and then send an email from Apps Script.
Of course, you could have your own error handling system that logs error information to some external target (e.g. your spreadsheet, or a database) using UrlFetchApp.fetch(url).
To make the request to the Stackdriver API you would need code something like this:
var projectID = "Enter project ID";
var url = 'https://clouderrorreporting.googleapis.com/v1beta1/projects/' + projectID + '/events';
var tkn = ScriptApp.getOAuthToken();

var options = {};
options.headers = {Authorization: 'Bearer ' + tkn};
options.muteHttpExceptions = true;

var rtrnObj = UrlFetchApp.fetch(url, options);
Logger.log(rtrnObj.getContentText());
I haven't used this API and I haven't tested this code. If anyone uses it and has information or finds an error, please make a comment.
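To complete the flow described above (fetch the errors, then send an email from Apps Script), a hedged continuation of that sketch could parse the response body and mail it; the recipient address is a placeholder:

var result = JSON.parse(rtrnObj.getContentText());
var events = result.errorEvents || [];
var lines = events.map(function (e) {
    // one line per error event: when it happened and its message
    return e.eventTime + ' - ' + e.message;
});
MailApp.sendEmail('you@example.com', 'Stackdriver error report', lines.join('\n') || 'No errors returned.');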

How to use update function to upload attachment in CouchDB

I would like to know what I can do to upload attachments in CouchDB using an update function.
Here is an example of my update function to add documents:
function(doc, req) {
    if (!doc) {
        if (!req.form._id) {
            req.form._id = req.uuid;
        }
        req.form['|edited_by'] = req.userCtx.name;
        req.form['|edited_on'] = new Date();
        return [req.form, JSON.stringify(req.form)];
    }
    else {
        return [null, "Use POST to add a document."];
    }
}
And an example for removing documents:
function(doc, req) {
    if (doc) {
        for (var i in req.form) {
            doc[i] = req.form[i];
        }
        doc['|edited_by'] = req.userCtx.name;
        doc['|edited_on'] = new Date();
        doc._deleted = true;
        return [doc, JSON.stringify(doc)];
    }
    else {
        return [null, "Document does not exist."];
    }
}
Thanks for your help.
It is possible to add attachments to a document using an update function by modifying the document's _attachments property. Here's an example of an update function which will add an attachment to an existing document:
function (doc, req) {
    // skipping the create document case for simplicity
    if (!doc) {
        return [null, "update only"];
    }

    // ensure that the required form parameters are present
    if (!req.form || !req.form.name || !req.form.data) {
        return [null, "missing required post fields"];
    }

    // if there isn't an _attachments property on the doc already, create one
    if (!doc._attachments) {
        doc._attachments = {};
    }

    // create the attachment using the form data POSTed by the client
    doc._attachments[req.form.name] = {
        content_type: req.form.content_type || 'application/octet-stream',
        data: req.form.data
    };

    return [doc, "saved attachment"];
}
For each attachment, you need a name, a content type, and body data encoded as base64. The example function above requires that the client sends an HTTP POST in application/x-www-form-urlencoded format with at least two parameters: name and data (a content_type parameter will be used if provided):
name=logo.png&content_type=image/png&data=iVBORw0KGgoA...
To test the update function:

1. Find a small image and base64 encode it:
   $ base64 logo.png | sed 's/+/%2b/g' > post.txt
   The sed script encodes + characters so they don't get converted to spaces.
2. Edit post.txt and add name=logo.png&content_type=image/png&data= to the top of the document.
3. Create a new document in CouchDB using Futon.
4. Use curl to call the update function with the post.txt file as the body, substituting in the ID of the document you just created:
   curl -X POST -d @post.txt http://127.0.0.1:5984/mydb/_design/myddoc/_update/upload/193ecff8618678f96d83770cea002910
This was tested on CouchDB 1.6.1 running on OSX.
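If you'd rather script the upload than build post.txt by hand, here is a rough Node.js sketch of the same POST (assuming Node 18+ for the built-in fetch, and the same database, design document, and document ID as the curl example):

const fs = require('fs');

async function uploadAttachment(docId) {
    // base64-encode the image, which is what the update function expects in the data field
    const data = fs.readFileSync('logo.png').toString('base64');

    // URLSearchParams percent-encodes '+', '/' and '=' so the base64 payload
    // survives application/x-www-form-urlencoded encoding
    const body = new URLSearchParams({
        name: 'logo.png',
        content_type: 'image/png',
        data: data
    });

    const res = await fetch(
        'http://127.0.0.1:5984/mydb/_design/myddoc/_update/upload/' + docId,
        {
            method: 'POST',
            headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
            body: body.toString()
        }
    );
    console.log(res.status, await res.text());
}

uploadAttachment('193ecff8618678f96d83770cea002910');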
Update: @janl was kind enough to provide some details on why this answer can lead to performance and scaling issues. Uploading attachments via an update handler has two main problems:
1. The update handlers are written in JavaScript, so the CouchDB server may have to fork() a couchjs process to handle the upload. Even if a couchjs process is already running, the server has to stream the entire HTTP request to the external process over stdin. For large attachments, the transfer of the request can take significant time and system resources. For each concurrent request to an update function like this, CouchDB will have to fork a new couchjs process. Since the process runtime will be rather long because of what is explained next, you can easily run out of RAM, CPU or the ability to handle more concurrent requests.
2. After the _attachments property is populated by the update handler and streamed back to the CouchDB server (!), the server must parse the response JSON, decode the base64-encoded attachment body, and write the binary body to disk. The standard method of adding an attachment to a document -- PUT /db/docid/attachmentname -- streams the binary request body directly to disk and does not require the two processing steps.
The function above will work, but there are non-trivial issues to consider before using it in a highly-scalable system.