Google Cloud Functions 2nd gen: await a stream | file parser - google-cloud-functions

Based on the Google docs, I came up with this snippet, but I'm missing the part where I need to block the function while the stream is being processed by the file parser.
functions.cloudEvent('fileParser', async (cloudEvent) => {
  const file = cloudEvent.data;
  const bucketObject = storage.bucket(file.bucket);
  const fileHandler = bucketObject.file(file.name);
  let chunks = [];
  await fileHandler
    .createReadStream()
    .pipe(parse({ delimiter: ",", from_line: 2 }))
    .on("data", function (row) {
      chunks.push({ obj: row });
      if (chunks.length == MAX_CHUNK) {
        // do something with row
        chunks = [];
      }
    })
    .on("end", function () {
      // do something with data
      console.log("FINISHED DESERIALIZATION");
    })
    .on("error", function (err) {
      console.error(err.message);
    });
  console.log("FINISHED PROCESSING");
});
The basic idea is to process a huge CSV file located in a GCP bucket and then do something with it.
For obvious reasons, the function should wait for the whole process, but that's not happening.
I get "FINISHED PROCESSING" as output and only seconds later "FINISHED DESERIALIZATION", which is not what I expected, since the processing is aborted after a few seconds due to a timeout or SIGKILL from GCF.
Ideas?
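For reference, awaiting the value returned by .pipe() does nothing here, because pipe() returns a stream, not a Promise. A minimal sketch of how the handler could actually wait (reusing the storage, parse and MAX_CHUNK from the snippet above, which aren't shown) is to wrap the stream in a Promise that settles on "end"/"error":

functions.cloudEvent('fileParser', async (cloudEvent) => {
  const file = cloudEvent.data;
  const fileHandler = storage.bucket(file.bucket).file(file.name);
  let chunks = [];

  // Wrap the stream in a Promise so the async handler really waits for it.
  await new Promise((resolve, reject) => {
    fileHandler
      .createReadStream()
      .pipe(parse({ delimiter: ",", from_line: 2 }))
      .on("data", (row) => {
        chunks.push({ obj: row });
        if (chunks.length === MAX_CHUNK) {
          // do something with the accumulated rows
          chunks = [];
        }
      })
      .on("end", () => {
        console.log("FINISHED DESERIALIZATION");
        resolve();
      })
      .on("error", reject);
  });

  console.log("FINISHED PROCESSING");
});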

Related

Puppeteer Google Cloud Function Pub/Sub Trigger can't open browser

I'm trying to create a Puppeteer function in GCP which can be triggered by Pub/Sub messages. The function is callable, but doesn't behave as expected and throws a TimeoutError once the browser tries to initialize. Could the trigger possibly be using a Node.js environment different from the HTTP trigger?
I'm also very new to Node.js, so I apologize ahead of time if the issue is blatantly obvious.
I've created an HTTP trigger for the function which behaves as expected. I copy/paste the Puppeteer function below into index.js when creating the Cloud Function, but I've separated it in the example for clarity, since both triggers run the identical function.
Puppeteer Function
const puppeteer = require('puppeteer');

scrapeUglyWebsite = () => {
  return new Promise(async (resolve, reject) => {
    await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox']
    })
      .then(async (browser) => {
        const page = await browser.newPage();
        await page.goto('http://suzannecollinsbooks.com/', { waitUntil: 'load', timeout: 0 })
          .then(async () => {
            // Wait for content to load
            await page.waitForFunction('document.body !== null && document.body.innerText.includes(\'Jon Scieszka\')');
            // Evaluate page contents
            const dom_eval = await page.evaluate(() => document.body.innerText.includes("Here’s a picture of me with a rat"));
            await browser.close();
            resolve(dom_eval);
          });
      }).catch((err) => {
        reject(err);
      });
  });
};
HTTP Trigger - index.js
exports.cloudFunctionTest = (req, res) => {
  scrapeUglyWebsite()
    .then((results) => {
      if (results) {
        res.send('Suzanne Collins takes pictures with rats.');
      } else {
        res.send("Suzzane Collins doesn't take pictures with rats.");
      }
    })
    .catch((err) => {
      res.send(err.toString());
    });
};
Pub/Sub Trigger - index.js
exports.cloudFunctionTest = (data, context) => {
  scrapeUglyWebsite()
    .then((results) => {
      if (results) {
        console.log('Suzanne Collins takes pictures with rats.');
      } else {
        console.log("Suzzane Collins doesn't take pictures with rats.");
      }
    })
    .catch((err) => {
      console.log(err.toString());
    });
};
package.json
{
  "name": "test",
  "version": "0.0.1",
  "engines": {
    "node": "8"
  },
  "dependencies": {
    "puppeteer": "^1.6.0"
  }
}
HTTP Trigger behaves correctly with the expected result
Suzanne Collins takes pictures with rats.
Pub/Sub Trigger throws the following error with no output
TimeoutError: Timed out after 30000 ms while trying to connect to Chrome! The only Chrome revision guaranteed to work is r662092
I know this is late, but the TimeoutError occurs because Cloud Functions do not automatically wait for async tasks to finish. So in exports.cloudFunctionTest, scrapeUglyWebsite() is called, but the function does not wait for the promise to be fulfilled, so the program terminates. Hence the error.
More info here on how background functions work in Node.js.
In order for the function to wait for scrapeUglyWebsite(), you need to return a Promise that completes when scrapeUglyWebsite() and the resulting code are complete.
Personally, I got it to work by simply wrapping the code currently in the exported function in another async function and then returning the promise of the wrapper function.
async function wrapper() {
  try {
    const result = await scrapeUglyWebsite();
    if (result) {
      console.log('Suzanne Collins takes pictures with rats.');
    } else {
      console.log("Suzzane Collins doesn't take pictures with rats.");
    }
  } catch (err) {
    console.log(err.toString());
  }
}
Then in the function you want to export:
exports.cloudFunctionTest = (data, context) => {
  return wrapper();
};
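Equivalently (just a sketch of the same idea), the exported handler can itself be declared async so that it implicitly returns a Promise:

exports.cloudFunctionTest = async (data, context) => {
  try {
    const result = await scrapeUglyWebsite();
    if (result) {
      console.log('Suzanne Collins takes pictures with rats.');
    } else {
      console.log("Suzanne Collins doesn't take pictures with rats.");
    }
  } catch (err) {
    console.log(err.toString());
  }
};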

Firebase functions RangeError: Maximum call stack size exceeded

I have a callable function that uploads an image and updates Firestore and Storage accordingly. The function does what it should do, but I still get this error:
Unhandled error RangeError: Maximum call stack size exceeded
Here is the function:
export const uploadImageToStripe = functions.https.onCall(async (data, context) => {
  let businessDoc: DocumentSnapshot
  try {
    if (!fireStoreDB) {
      fireStoreDB = admin.firestore();
      fireStoreDB.settings(settings);
    }
    businessDoc = await fireStoreDB.collection('businesses').doc(data.business_id).get()
    const bucketName = functions.config().storage.default_bucket;
    const tempLocalFile = path.join(os.tmpdir(), 'img.jpg').trim();
    const tempLocalDir = path.dirname(tempLocalFile);
    const bucket = admin.storage().bucket(bucketName);
    // Create the temp directory where the storage file will be downloaded.
    await mkdirp(tempLocalDir);
    console.log('Temporary directory has been created', tempLocalDir);
    // Download file from bucket.
    await bucket.file(data.photo_location).download({ destination: tempLocalFile });
    console.log('The file has been downloaded to', tempLocalFile);
    // Downloads the file
    console.log(`gs://${bucketName}/${data.photo_location} downloaded to ${tempLocalDir}.`)
    const uploadedFile: stripeM.fileUploads.IFileUpdate = await stripe.fileUploads.create({
      file: {
        data: fs.readFileSync(tempLocalFile),
        name: 'img.jpg',
        type: 'application.octet-stream',
      }
    });
    if (!businessDoc.exists) {
      throw new functions.https.HttpsError('not-found', `Couldn't find business document ` + data.business_id);
    }
    await stripe.accounts.update(businessDoc.data().stripeId,
      { document: uploadedFile.id });
    await businessDoc.ref.update({ "photoNeeded": false })
    return await bucket.file(data.photo_location).delete()
  } catch (error) {
    console.error(error);
    await businessDoc.ref.update({ "photoNeeded": true })
    throw new functions.https.HttpsError('unavailable', `failed to upload photo to stripe`);
  }
})
Any ideas why I get this error?
This line throws the error:
return await bucket.file(data.photo_location).delete()
Splitting it into:
await bucket.file(data.photo_location).delete()
return "Success"
solves it, most likely because a callable function serializes its return value to send it back to the client, and the response object that delete() resolves to isn't a plain, serializable value.
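In context, the end of the try block could look like this (a sketch only; returning a plain object also works, since a callable function just needs a JSON-serializable return value):

await businessDoc.ref.update({ "photoNeeded": false });
// Await the delete, but hand back a plain, JSON-serializable value to the caller.
await bucket.file(data.photo_location).delete();
return { success: true };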

MySQL nodejs crash upon selecting data from big table

I'm attempting to convert data from one database to another, but I'm running into issues trying to fetch data from a big table, save it to an object, and insert it into the other database. This is my code:
let sql;
let resultsToFetch = true;
while (resultsToFetch) {
  sql = `SELECT X FROM Y LIMIT ${index}, 1000`;
  DB1.query(sql, (err, result) => {
    if (err) {
      resultsToFetch = false;
      throw err;
    } else if (result.length == 0) {
      resultsToFetch = false;
    } else {
      result.forEach(res => {
        const obj = {
          id: res.id,
          name: res.name
        };
        sql = "INSERT INTO X SET ?";
        DB2.query(sql, obj, (err, result) => {
          if (err) throw err;
        });
      });
    }
  });
  index += 1000;
}
I'm trying to use LIMIT so I'm not selecting all 6 million entries right away but I still get a Javascript heap out of memory error. I think I misunderstood something related to Node.js, but I'm not quite sure what it is. This is the error:
<--- Last few GCs --->
[11256:000002A5D2CBB600] 22031 ms: Mark-sweep 1418.5 (1482.0) -> 1418.5 (1451.5) MB, 918.3 / 0.0 ms last resort GC in old space requested
[11256:000002A5D2CBB600] 22947 ms: Mark-sweep 1418.5 (1451.5) -> 1418.5 (1451.5) MB, 915.2 / 0.0 ms last resort GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 000000B356525529 <JSObject>
1: /* anonymous */ [\index.js:~1] [pc=00000042DA416732](this=000000C326B04AD1 <Object map = 0000027D35B023B9>,exports=000000C326B04AD1 <Object map = 0000027D35B023B9>,require=000000C326B04A89 <JSFunction require (sfi = 00000229888651E9)>,module=000000C326B04A39 <Module map = 0000027D35B44F69>,__filename=000002298886B769 <String[52]\
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::DecodeWrite
2: node_module_register
3: v8::internal::FatalProcessOutOfMemory
4: v8::internal::FatalProcessOutOfMemory
5: v8::internal::Factory::NewUninitializedFixedArray
6: v8::internal::WasmDebugInfo::SetupForTesting
7: v8::internal::interpreter::BytecodeArrayRandomIterator::UpdateOffsetFromIndex
8: 00000042DA2843C1
Edit: @Grégory NEUT
let query = DB1.query("SELECT * FROM X");
let index = 0;
query
  .on("error", function(err) {
    // Handle error, an 'end' event will be emitted after this as well
  })
  .on("fields", function(fields) {
    // the field packets for the rows to follow
  })
  .on("result", function(row) {
    // Pausing the connection is useful if your processing involves I/O
    DB1.pause();
    const obj = {
      id: row.id,
    };
    console.log(obj);
    const sql = `INSERT INTO X SET ?`;
    DB2.query(sql, obj, (err, result) => {
      if (err) {
        throw err;
      }
      DB1.resume();
    });
    console.log(index);
    index++;
  })
  .on("end", function() {
    // all rows have been received
  });
I don't know how the mysql driver is implemented in Node.js, but maybe it loads everything and then limits the data. Or maybe 1000 entries are too many.
Anyway, the solution is to use streams:
var query = connection.query('SELECT * FROM posts');
query
  .on('error', function(err) {
    // Handle error, an 'end' event will be emitted after this as well
  })
  .on('fields', function(fields) {
    // the field packets for the rows to follow
  })
  .on('result', function(row) {
    // Pausing the connection is useful if your processing involves I/O
    connection.pause();
    processRow(row, function() {
      connection.resume();
    });
  })
  .on('end', function() {
    // all rows have been received
  });
This way only the rows currently being processed are held in memory at any time. Whatever the amount of data you have, you won't hit an allocation failure.
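If you prefer plain Node streams, the same driver can expose the query as a readable stream that you pipe into a Writable with backpressure. A sketch, assuming the mysql package's query.stream() and the DB1/DB2 connections from the question:

const { Writable } = require('stream');

// Each row is inserted before more rows are pulled from DB1,
// so memory usage stays flat regardless of the table size.
const inserter = new Writable({
  objectMode: true,
  write(row, encoding, callback) {
    DB2.query('INSERT INTO X SET ?', { id: row.id, name: row.name }, callback);
  }
});

DB1.query('SELECT X FROM Y')
  .stream({ highWaterMark: 5 })
  .pipe(inserter)
  .on('finish', () => console.log('All rows copied'))
  .on('error', (err) => console.error(err));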

Nodejs, Cloud Firestore Upload Tasks - Auth error:Error: socket hang up

I'm coding a function that runs API calls and requests JSON from a huge database in sequence via offsets. The JSON response is parsed and then the subsequent data within is uploaded to our Cloud Firestore server.
Nodejs (Node 6.11.3) & Latest Firebase Admin SDK
The information is parsed as expected and prints to the console perfectly. When the data is uploaded to our Firestore database, however, the console is spammed with the error message:
Auth error:Error: socket hang up
(node:846) UnhandledPromiseRejectionWarning: Unhandled promise rejection
(rejection id: -Number-): Error: Getting metadata from plugin failed with
error: socket hang up
and occasionally:
Auth error:Error: read ECONNRESET
The forEach function collects the items from the downloaded JSON and processes the data before uploading to the Firestore database. Each JSON has up to 1000 items of data (1000 documents worth) to pass through the forEach function. I understand that this might be a problem if the function repeats before the upload set finishes?
I'm a coding newbie and understand that the control flow of this function isn't the best. However, I can't find any information on the error that the console prints. I can find plenty of information on socket hang ups, but none on the Auth error section.
I'm using a generated service account JSON as a credential to access our database, which uses the firebase-adminsdk account. Our read/write rules for the database are currently open to allow any access (as we're in development with no real users).
Here's my function:
Firebase initialisation & offset zero-ing
const admin = require('firebase-admin');
var serviceAccount = require("JSON");
admin.initializeApp({
  credential: admin.credential.cert(serviceAccount),
  databaseURL: "URL"
});
var db = admin.firestore();
var offset = 0;
var failed = false;
Running the function & setting HTTP Headers
var runFunction = function runFunction() {
  var https = require('https');
  var options = {
    host: 'website.com',
    path: (path including an offset and 1000 row specifier),
    method: 'GET',
    json: true,
    headers: {
      'content-type': 'application/json',
      'Authorization': 'Basic ' + new Buffer('username' + ':' + 'password').toString('base64')
    }
  };
Running the HTTP Request & Re-running the function if we haven't reached the end of the response from the API
  if (failed === false) {
    var req = https.request(options, function (res) {
      var body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) {
        body += chunk;
      });
      res.on('end', () => {
        console.log('Successfully processed HTTPS response');
        body = JSON.parse(body);
        if (body.hasOwnProperty('errors')) {
          console.log('Body ->' + body)
          console.log('API Call failed due to server error')
          console.log('Function failed at ' + offset)
          req.end();
          return
        } else {
          if (body.hasOwnProperty('result')) {
            let result = body.result;
            if (Object.keys(result).length === 0) {
              console.log('Function has completed');
              failed = true;
              return;
            } else {
              result.forEach(function (item) {
                var docRef = db.collection('collection').doc(name);
                console.log(name);
                var upload = docRef.set({
                  thing: data,
                  thing2: data,
                })
              });
              console.log('Finished offset ' + offset)
              offset = offset + 1000;
              failed = false;
            }
            if (failed === false) {
              console.log('Function will repeat with new offset');
              console.log('offset = ' + offset);
              req.end();
              runFunction();
            } else {
              console.log('Function will terminate');
            }
          }
        }
      });
    });
    req.on('error', (err) => {
      console.log('Error -> ' + err)
      console.log('Function failed at ' + offset)
      console.log('Repeat from the given offset value or diagnose further')
      req.end();
    });
    req.end();
  } else {
    req.end();
  }
};
runFunction();
Any help would be greatly appreciated!
UPDATE
I've just tried changing the number of JSON rows that I pull and subsequently upload at a time with the function, from 1000 down to 100. The socket hang-up errors are less frequent, so the problem is definitely due to overloading the database.
Ideally it would be perfect if each forEach array iteration waited for the previous iteration to complete before commencing.
UPDATE #2
I've installed the async module and I'm currently using the async.eachSeries function to perform one document upload at a time. All errors mid-upload disappear - however the function will take an insane amount of time to finish (roughly 9 hours for 158,000 documents). My updated loop code is this, with a counter implemented:
async.eachSeries(result, function (item, callback) {
  // result.forEach(function (item) {
  var docRef = db.collection('collection').doc(name);
  console.log(name);
  var upload = docRef.set({
    thing: data,
    thing2: data,
  }, { merge: true }).then(ref => {
    counter = counter + 1
    if (counter == result.length) {
      console.log('Finished offset ' + offset)
      offset = offset + 1000;
      console.log('Function will repeat with new offset')
      console.log('offset = ' + offset);
      failed = false;
      counter = 0
      req.end();
      runFunction();
    }
    callback()
  });
});
Also, after a period of time the database returns this error:
(node:16168) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: -Number-): Error: The datastore operation timed out, or the data was temporarily unavailable.
It seems as if now my function is taking too long... instead of not long enough. Does anyone have any advice on how to make this run faster without stated errors?
The write requests as part of this loop were simply exceeding Firestore's quota - thus the server was rejecting the majority of them.
To solve this issue I converted my requests to upload in chunks of 50 or so items at a time, with Promises confirming when to move onto the next chunk upload.
The answer was posted here -> Iterate through an array in blocks of 50 items at a time in node.js, and the template for my working code is as below:
async function uploadData(dataArray) {
  try {
    const chunks = chunkArray(dataArray, 50);
    for (const [index, chunk] of chunks.entries()) {
      console.log(` --- Uploading ${index + 1} chunk started ---`);
      await uploadDataChunk(chunk);
      console.log(` --- Uploading ${index + 1} chunk finished ---`);
    }
  } catch (error) {
    console.log(error)
    // Catch an error here
  }
}

function uploadDataChunk(chunk) {
  return Promise.all(
    chunk.map((item) => new Promise((resolve, reject) => {
      setTimeout(
        () => {
          console.log(`Chunk item ${item} uploaded`);
          resolve();
        },
        Math.floor(Math.random() * 500)
      );
    }))
  );
}

function chunkArray(array, chunkSize) {
  return Array.from(
    { length: Math.ceil(array.length / chunkSize) },
    (_, index) => array.slice(index * chunkSize, (index + 1) * chunkSize)
  );
}
Pass the data array to uploadData using uploadData(data); and put your upload code for each item into uploadDataChunk, inside the setTimeout block (before the resolve() line) within the chunk.map function.
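For example, a sketch only (assuming the same db handle and that each item carries a name plus the fields to write), uploadDataChunk with a real Firestore write in place of the setTimeout placeholder could look like this:

function uploadDataChunk(chunk) {
  // Resolve only once every document in this chunk has been written,
  // so the await in uploadData moves on to the next chunk afterwards.
  return Promise.all(
    chunk.map((item) =>
      db.collection('collection').doc(item.name).set({
        thing: item.thing,
        thing2: item.thing2,
      }, { merge: true })
    )
  );
}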
I got around this by chaining the promises in the loop with a wait of 50 milliseconds in between each.
function Wait() {
  return new Promise(r => setTimeout(r, 50))
}
function writeDataToFirestoreParentPhones(data) {
  let chain = Promise.resolve();
  for (let i = 0; i < data.length; ++i) {
    chain = chain.then(() => {
      // Build the doc ref inside the chained callback so each step writes its own document,
      // and return the set() promise so the chain actually waits for the write.
      const docRef = db.collection('parent_phones').doc(data[i].kp_ID_for_Realm);
      return docRef.set({
        parent_id: data[i].kf_ParentID,
        contact_number: data[i].contact_number,
        contact_type: data[i].contact_type
      }).then(ref => {
        console.log(i + ' - Added parent_phones with ID: ', data[i].kp_ID_for_Realm);
      }).catch(function(error) {
        console.error("Error writing document: ", error);
      });
    })
    .then(Wait);
  }
  return chain;
}
For me this turned out to be a network issue.
Uploading 180,000 documents in batches of 10,000 was no trouble for me before, but today, using a slower public Wi-Fi connection, I received that error.
Switching back to my 4G mobile connection sorted the problem for me. Not sure whether it's a speed issue; it could have been a security issue, but I'll go with that assumption.

GraphQL: fulfill query from JSON file source

I've just started messing about with GraphQL, and I'd like a resolver that uses a JSON file on disk as the data source. What I've got so far causes GraphQL to return null.
How do I do this and why doesn't the approach below work?
var schema = buildSchema(`
  type Experiment {
    id: String
    trainData: String
    goldData: String
    gitCommit: String
    employee: String
    datetime: String
  }

  type Query {
    # Metadata for an individual experiment
    experiment: Experiment
  }

  schema {
    query: Query
  }`);

var root = {
  experiment: () => {
    fs.readFile('./data/experimentExample.json', 'utf8', function(err, data) {
      if (err) throw err;
      console.log(data);
      return JSON.parse(data);
    });
  }
};

const app = express();
app.use('/graphql', graphqlHTTP({
  rootValue: root,
  schema: schema,
  graphiql: true
}));
app.listen(4000);
console.log('Running a GraphQL API server at localhost:4000/graphql');
The callback function you're passing to readFile runs asynchronously, which means returning a value from it doesn't do anything -- the function the readFile call is inside is done executing and has returned a value (null) by the time your callback is done.
As a rule of thumb, when dealing with GraphQL, you should stay away from callbacks -- your resolvers should always return a value or a Promise that will eventually resolve to a value.
Luckily, fs has a synchronous method for reading files, so you can just do:
const root = {
  experiment: () => {
    const file = fs.readFileSync('./data/experimentExample.json', 'utf8')
    return JSON.parse(file)
  }
};

// or even cleaner:
const root = {
  experiment: () => JSON.parse(fs.readFileSync('./data/experimentExample.json', 'utf8'))
};
As an additional example, here's how you would do that with a Promise:
// using Node 8's new promisify for our example
const readFileAsync = require('util').promisify(fs.readFile)

const root = {
  experiment: () => readFileAsync('./data/experimentExample.json', {encoding: 'utf8'})
    .then(data => JSON.parse(data))
};

// Or with async/await:
const root = {
  experiment: async () => JSON.parse(await readFileAsync('./data/experimentExample.json', {encoding: 'utf8'}))
};
Of course there's no need to promisify readFile since you already have a synchronous method available, but this gives you an idea of how to work with Promises, which GraphQL is happy to work with.
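On newer Node versions, fs also exposes a built-in promise API (fs.promises), so, as a sketch, the same resolver can skip promisify entirely:

const { readFile } = require('fs').promises;

const root = {
  // graphql-js will wait for the returned Promise before responding.
  experiment: async () => JSON.parse(await readFile('./data/experimentExample.json', 'utf8'))
};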