How can I call the google-bigquery delete and insert APIs synchronously?

I maintain a database of transaction records whose data changes periodically.
I have a cron job running every half hour that pulls the latest transactions from the main database and feeds them to my Express Node app (I am pretty new to Node). I first delete the old transactions that match an incoming transaction's order number, then insert the latest ones into a BigQuery table.
After running the app for a day, I am getting duplicate transactions in my database. Even after checking the logs I don't see the delete API failing anywhere, and I have no idea how or where the duplicates are coming from.
I am using @google-cloud/bigquery: ^2.0.2, and I delete and insert data in BigQuery tables using the query API.
I have tried streaming inserts, but they won't let me delete recently inserted rows for up to 90 minutes, which won't work in my case.
My index.js:
let orderNumbers = '';
rows.map(function (value) {
orderNumbers += "'" + value.Order_Number+ "',";
});
orderNumbers = orderNumbers.slice(0, -1);
await functions.deleteAllWhere('Order_Number', orderNumbers);
let chunkedRowsArray = _.chunk(rows, CONSTANTS.chunkSize);
let arrSize = chunkedRowsArray.length;
for (var i = 0; i < arrSize; i++) {
let insertString = '';
chunkedRowsArray[i].forEach(element => {
let values = '(';
Object.keys(element).forEach(function (key) {
if (typeof element[key] == 'string') {
values += '"' + element[key] + '",';
} else {
values += element[key] + ",";
}
});
values = values.slice(0, -1);
values += '),';
insertString += values;
});
insertString = insertString.slice(0, -1);
let rs = await functions.bulkInsert(insertString,i);
}
The delete function call and its definition:
await functions.deleteAllWhere('Order_Number', orderNumbers);
module.exports.deleteAllWhere = async (conditionKey, params) => {
const DELETEQUERY = `
DELETE FROM
\`${URI}\`
WHERE ${conditionKey}
IN
(${params})`;
const options = {
query: DELETEQUERY,
timeoutMs: 300000,
useLegacySql: false, // Use standard SQL syntax for queries.
};
// // Runs the query
return await bigquery.query(options);
};
The insert function similarly builds an insert query with values in chunks of 200.
I need to write a synchronous Node program that deletes some rows first and, only after the rows have been deleted successfully, inserts the new ones.
I have no idea whether this is caused by the async nature of the code, by something in BigQuery, or by a bug in the stored procedure from which I am getting the data.
Sorry for this long post; I am new to Node and Stack Overflow.
Any help is appreciated.
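One thing that may be worth ruling out is the hand-built IN (...) list. Below is a minimal sketch, assuming the Node client accepts named query parameters through the `params` option. The helper name `buildDeleteOptions` and the table name in the usage comment are hypothetical, not part of the original code.

```javascript
// Build the options object for a parameterized DELETE.
// The table identifier cannot be a query parameter, so it is still
// interpolated; the order numbers are passed as an array parameter.
function buildDeleteOptions(table, orderNumbers) {
  // Drop repeated order numbers so the parameter list stays small.
  const unique = Array.from(new Set(orderNumbers));
  return {
    query: 'DELETE FROM `' + table + '` WHERE Order_Number IN UNNEST(@orderNumbers)',
    params: { orderNumbers: unique },
    useLegacySql: false, // standard SQL, as in the original options
  };
}

// Hypothetical usage inside an async function. Awaiting the delete before
// starting the inserts keeps the two steps strictly in order:
//   await bigquery.query(buildDeleteOptions('project.dataset.table', ids));
//   for (let i = 0; i < chunks.length; i++) {
//     await functions.bulkInsert(chunks[i], i);
//   }
```

This does not by itself explain the duplicates, but it removes quoting problems in the order numbers as a possible cause.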

Regarding BigQuery integration, you should architect your data flow so that every new row is simply appended to the BigQuery table. Then have queries that return only the newest row per record, which is easy to do if you have a field to order by to find the most recent row.
You can schedule BigQuery queries that maintain a materialized table of this cleaned-up data. In the end you have two tables: one into which you stream all rows, and one that is materialized to retain only the newest.
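As a rough illustration of the "newest row only" query, here is a sketch that builds the SQL text in Node. The function name `latestRowsQuery`, the table name and the `updated_at` column are assumptions for the example; the approach needs some column that increases with every new copy of a row.

```javascript
// Build a standard-SQL query that keeps one row per key: the newest one.
// `orderColumn` is assumed to be a timestamp (or version) column that
// increases with every new copy of a row.
function latestRowsQuery(table, keyColumn, orderColumn) {
  return [
    'SELECT * EXCEPT(rn)',
    'FROM (',
    '  SELECT *,',
    '    ROW_NUMBER() OVER (PARTITION BY ' + keyColumn +
      ' ORDER BY ' + orderColumn + ' DESC) AS rn',
    '  FROM `' + table + '`',
    ')',
    'WHERE rn = 1',
  ].join('\n');
}

// Hypothetical table and column names, for illustration only.
console.log(latestRowsQuery('project.dataset.transactions_raw', 'Order_Number', 'updated_at'));
```

The same text could be saved as a scheduled query that overwrites the cleaned-up table, so the insert path never has to delete anything.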

Related

Complex transpose exceeding run time in Google Apps Script

I am receiving data in a single column and must transpose that into individual records. Some records will be 12 characters long, others 10, and the remainder 9. Furthermore, the latter 2 values in the 10 and 9-character-long records must be shifted 1 and 2 fields to the right, respectively. The first value in a given record is always a date. I have created the following code which works well, except that it times out after about 6 minutes and 77 records. I need to be able to handle 15 times as many if not more.
I embedded the calculation of the date objects in the else section of each if statement and nested the subsequent if statements in an effort to reduce unnecessary calculations. This got me from about 48 records to 77.
Very grateful for any clever insight 🙂
function transposeNew(){
let ss = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
let lr = ss.getRange("A13").getDataRegion().getLastRow();
let sr = 13
// get the data column
let data = ss.getRange(sr,1,lr-sr,1).getValues();
// set up the rows loop
let pasteRow = 2;
let arrayField = 0;
while (arrayField < data.length){
//use the new Date() constructor to create a date object with the date value passed
let isDate12 = new Date(data[arrayField+12]).getFullYear(); //processed; size input;record should include 12 rows & 13th should be a date to begin the next row
if (isDate12 === 2020) {
let record = data.slice(arrayField, arrayField+12);
let recordTr = transposeSub(record);
ss.getRange(pasteRow, 5, 1, 12).setValues(recordTr);
arrayField = arrayField + 12;
}
else {
let isDate10 = new Date(data[arrayField+10]).getFullYear(); //unprocessed;size input
if (isDate10 === 2020) {
let record = data.slice(arrayField, arrayField+10);
let record1 = record.slice(0,8);
let record1Tr = transposeSub(record1);
let record2 = record.slice(8,10);
let record2Tr = transposeSub(record2);
ss.getRange(pasteRow, 5, 1, 8).setValues(record1Tr);
ss.getRange(pasteRow, 14, 1, 2).setValues(record2Tr);
arrayField = arrayField + 10;
}
else {
let isDate9 = new Date(data[arrayField+9]).getFullYear(); //unprocessed;no size input
if (isDate9 === 2020) {
let record = data.slice(arrayField, arrayField+9);
let record1 = record.slice(0,7);
let record1Tr = transposeSub(record1);
let record2 = record.slice(7,9);
let record2Tr = transposeSub(record2);
ss.getRange(pasteRow, 5, 1, 7).setValues(record1Tr);
ss.getRange(pasteRow, 14, 1, 2).setValues(record2Tr);
arrayField = arrayField + 9;
}
}
}
pasteRow ++;
}
}
function transposeSub(a)
{
return Object.keys(a[0]).map(function (c) { return a.map(function (r) { return r[c]; }); });
}
I see you create a loop, and inside this while statement you call SpreadsheetApp functions several times. Each call opens a connection to the spreadsheet, reads or changes its data and closes the connection, which is why your code takes so long to run. Please check the batch operations section of the GAS Best Practices.
You should avoid any getValue()/setValue() calls inside the while loop. Instead, call getValues() before the loop to hold all values in a JavaScript array, and then call setValues() after the loop to write all outputs at once. The described concept is explored in this answer.
So it turns out that the problem was a flaw in the loop criteria; rookie mistake. There was a data anomaly such that one record did not meet the criteria in any of the if statements and so the loop was continuing in perpetuity. I discovered this by inserting the values for pasteRow and arrayField next to each record on the sheet so that I could see where it was breaking. Interestingly, the records stopped, but the pasteRow and arrayField values continued into the 20,000s before the app quit.
I do note that the feedback provided by @Bruno Polo and @Cooper is correct. Shortly after posting this I reworked the code to push the records into a new array and paste the array once complete. That failed for the same reason noted above. I think I will go back to that version now that I understand the problem.
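For reference, both points (build everything in memory, and stop when no branch matches) can be sketched in plain JavaScript. This is only a sketch under assumptions: dates are detected with `instanceof Date` rather than by checking for the year 2020, and the names `buildRows`, `recordLengthAt` and `ROW_WIDTH` are made up for the example.

```javascript
// Width of one output row (a full record has 12 fields).
const ROW_WIDTH = 12;

function isDate(value) {
  return value instanceof Date && !Number.isNaN(value.getTime());
}

// Return the length (12, 10 or 9) of the record starting at `start`,
// or 0 when no boundary (next date or end of data) is found.
function recordLengthAt(values, start) {
  for (const len of [12, 10, 9]) {
    const next = start + len;
    if (next === values.length || (next < values.length && isDate(values[next]))) {
      return len;
    }
  }
  return 0;
}

// Turn the single data column ([[v], [v], ...]) into an array of rows.
function buildRows(column) {
  const values = column.map(row => row[0]);
  const rows = [];
  let i = 0;
  while (i < values.length) {
    const len = recordLengthAt(values, i);
    if (len === 0) {
      // Fail loudly instead of looping forever on an anomalous record.
      throw new Error('No record boundary found at index ' + i);
    }
    const rec = values.slice(i, i + len);
    const row = new Array(ROW_WIDTH).fill('');
    const head = len === 12 ? 12 : len - 2;
    for (let k = 0; k < head; k++) row[k] = rec[k];
    if (len < 12) {
      // The last two values of short records always land in fields 10 and 11.
      row[9] = rec[len - 2];
      row[10] = rec[len - 1];
    }
    rows.push(row);
    i += len;
  }
  return rows;
}

// In Apps Script the result would then be written with a single call, e.g.:
//   ss.getRange(2, 5, rows.length, ROW_WIDTH).setValues(rows);
```

With this shape there is one getValues() call before the loop and one setValues() call after it, and a malformed record raises an error that names the index where parsing stopped.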
Thank you for looking at this with me. This is an extraordinary community of experts from whom I have learned so much! 😁

Google Sheets SQL Queries Timing Out [duplicate]

I am trying to fetch data from a MySQL database on Google Cloud SQL using JDBC from Google Apps Script. However, I get this error:
Exception: Statement cancelled due to timeout or client request
Some queries fetch data successfully, but others fail.
I ran one successful query and one failing query in MySQL Workbench. The failing query runs there without any problem.
I compared the durations.
Duration / Fetch
-------------------------------------------
Successful query: 0.140 sec / 0.016 sec
Unsuccessful query: 0.406 sec / 0.047 sec
The unsuccessful query seems to take longer. So, I set query timeout with:
stmt.setQueryTimeout(0);
intending to disable the timeout (a value of zero means that execution has no time limit). Then I ran it in Google Apps Script.
However, it doesn't help; I get the same error. Could you tell me a solution for this?
This seems to be a known issue. Star (★) and comment on the issue to get Google developers to prioritise it. Until it is fixed, you can switch back to the Rhino runtime.
Update to add 2nd fix
After some trial and error I figured out what solved this for me. Some queries worked; others returned this error.
Fix 1
The common denominator was that the failing queries were the ones that had been converted to a multi-line format by the V8 engine / new editor.
As an example, switching to the new editor / V8 converted long text strings to be similar to the following:
var query = "select Street_Number, street_name, street_suffix, street_dir_prefix, postal_code, city, mls, address, unitnumber "
+"as unit, uspsid,latitude,longitude from properties.forsale where status = 'active' and zpid is null and property_type = 'residential'"
This query resulted in the error as described. The fix is changing longer queries to be continuous strings like the following:
var query = "select Street_Number, street_name, street_suffix, street_dir_prefix, postal_code, city, mls, address, unitnumber as unit, uspsid,latitude,longitude from properties.forsale where status = 'active' and zpid is null and property_type = 'residential'"
Fix 2
This one was a bit more frustrating. V8 is not as forgiving when it comes to database connections. Before V8, lingering connections were closed automatically; V8 does not seem to do that. I had originally written my scripts to share as few connections as possible, but I noticed that the error occurred in the places where a connection might be 'split'. For example:
var conn = getconnection() //this is a connection function I have written
var date = date || Utilities.formatDate(new Date(), "America/Chicago", "yyyy-MM-dd");
var keyobj = getkeys(undefined,undefined,conn);
conn = keyobj.conn;
var results = sqltojson(query, true, conn);
The code above was causing this error. My assumption (and it is a wild one) is that the connection cannot be referenced from two places at once: it was returned from the function and remained both a property of the object 'keyobj' and the variable 'conn'. Adding delete keyobj.conn immediately after assigning the conn variable did the trick:
var conn = getconnection() //this is a connection function I have written
var date = date || Utilities.formatDate(new Date(), "America/Chicago", "yyyy-MM-dd");
var keyobj = getkeys(undefined,undefined,conn);
conn = keyobj.conn;
delete keyobj.conn;
var results = sqltojson(query, true, conn);
Doing both of these fixes stopped this error and allowed the script to continue without problems.
I had the same issue as described above; it started failing today. Switching back to the old version worked for me. FYI.
@TheMaster: I tried connecting to MySQL and hit the same timeout issue, even with the example at https://developers.google.com/apps-script/guides/jdbc > Write 500 rows of data to a table in a single batch.
Even worse, when I reverted to the Rhino runtime, the result was the same. That wrecks my whole development approach! :(
[Edit] FWIW, it seems to me that neither Rhino nor V8 likes keeping a connection (or is it the statement?) open long enough for the above prepareStatement to complete.
So I tried inserting the 500 records from the linked example with a single multi-row INSERT built as a string (the prepared-statement version is commented out below), which worked OK:
var conn = sqlGetConnection();
var start = new Date();
// conn.setAutoCommit(false);
// var stmt = conn.prepareStatement('INSERT INTO entries '
// + '(guestName, content) values (?, ?)');
// for (var i = 0; i < 500; i++) {
// stmt.setString(1, 'Name ' + i);
// stmt.setString(2, 'Hello, world ' + i);
// stmt.addBatch();
// }
// var batch = stmt.executeBatch();
// conn.commit();
var sql = 'INSERT INTO ' +
'entries (guestName, content) ' +
'values ';
for (var i = 0; i < 500; i++) {
var col1 = "'" + 'Name ' + i + "'"; // Note that the strings had to be encapsulated
var col2 = "'" + 'Hello, world ' + i + "'"; // in quotes to work with this method
sql = sql + '(' + col1 + ', ' + col2 + '),';
}
sql = sql.substr(0,sql.length-1);
var stmt = conn.createStatement();
var response = stmt.executeUpdate(sql); // executeQuery is only for SELECT statements
conn.close();
var end = new Date();
Logger.log('Time elapsed: %sms, response: %s rows', end - start, response);
}
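If the single multi-row statement is what gets past the timeout, the quoting could still be left to the driver. The following is only a sketch: `buildMultiRowInsert` is a hypothetical helper, and I have not verified that a prepared multi-row INSERT avoids the timeout described above.

```javascript
// Build one multi-row INSERT with "?" placeholders, plus the flat list of
// values to bind, in order.
function buildMultiRowInsert(table, columns, rows) {
  if (rows.length === 0) {
    throw new Error('rows must not be empty');
  }
  const group = '(' + columns.map(() => '?').join(', ') + ')';
  const sql = 'INSERT INTO ' + table + ' (' + columns.join(', ') + ') VALUES ' +
    rows.map(() => group).join(', ');
  const params = [];
  rows.forEach(row => {
    if (row.length !== columns.length) {
      throw new Error('row length does not match column count');
    }
    row.forEach(value => params.push(value));
  });
  return { sql: sql, params: params };
}

// Hypothetical Apps Script usage (not run here):
//   var built = buildMultiRowInsert('entries', ['guestName', 'content'], rows);
//   var stmt = conn.prepareStatement(built.sql);
//   built.params.forEach(function (v, i) { stmt.setString(i + 1, v); });
//   stmt.executeUpdate();
```

This keeps the one-statement shape of the workaround while avoiding manual quote handling for the string values.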
I got this error when two processes submitted an UPDATE statement for the same records in a MySQL table.
I set a breakpoint before the UPDATE statement in my program and started two processes running it. The first process updated the MySQL table correctly, and the second process got this exception later.
You need to add 'FOR UPDATE' to your SELECT statement. Then the second process gets zero instead of the exception when it updates the record in the transaction.

Inserting Null values to Mysql database from Sheets using Google Apps Script

I'm currently working on a data ingestion add-on using Google Apps Script. The main idea is that the users of an application can insert data from Sheets into a database. To do so, I'm using the JDBC API that Apps Script provides.
The problem I'm currently having is that when I read an empty cell from the sheet, Apps Script gives me a value of type undefined, which produces an error at the moment of insertion. How can I insert a NULL value instead?
My current insert function:
function putData(row, tableName) {
var connectionName = '****';
var user = '****';
var userPwd = '*****';
var db = '******';
var dbUrl = 'jdbc:google:mysql://' + connectionName + '/' + db;
var conn = Jdbc.getCloudSqlConnection(dbUrl, user, userPwd);
var stmt = conn.createStatement();
var data = row
var query = "INSERT INTO "+ db + '.' + tableName +" VALUES (" ;
var i = 0
//The following loop is just to build the query from the rows taken from the sheet
// if the value is a String I add quotation marks
for each (item in row){
if ((typeof item) == 'string'){
if (i == row.length-1){
query += "'" + item + "'";
} else {
query += "'" + item + "',";
}
}else {
if (i == row.length-1){
query += item;
} else {
query += item + ",";
}
}
i++
}
query += ")"
results = stmt.executeUpdate(query)
stmt.close();
conn.close();
}
When I try to insert the word "NULL", in some cases it is treated as a string, which produces an error on other fields.
When trying to get the data from the Spreadsheet, more precisely from a cell, the value will be automatically parsed to one of these types: Number, Boolean, Date or String.
According to the Google getValues() documentation:
The values may be of type Number, Boolean, Date, or String, depending on the value of the cell. Empty cells are represented by an empty string in the array.
So essentially, the undefined type may be an issue present in the way you pass the row parameter (for example, trying to access cells which are out of bounds).
If you want to solve your issue, you should add an if statement right after the for each (item in row) { line:
if (typeof item == 'undefined')
item = null;
The if statement checks whether the row content is of type undefined and, if so, replaces it with null. The content is then null and you should be able to insert it into the database.
The recommended way to do this is actually to use JDBC prepared statements, which are basically precompiled SQL statements and make it easier to insert the necessary data. More precisely, you wouldn't have to prepare the data for insertion manually, as you did in the code above. They are also safer, making your data less prone to attacks such as SQL injection.
Also, the for each...in statement is a deprecated one and you should consider using something else instead such as the for loop or the while loop.
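To make the two suggestions concrete, here is a small sketch of the value handling on its own, with the JDBC calls left as comments. The helpers `normalizeRow` and `bindPlan` are hypothetical names, and treating empty strings as NULL is an assumption about what the add-on should do with empty cells.

```javascript
// Map spreadsheet cell values to values that can be bound as SQL
// parameters: empty strings and undefined both become null.
function normalizeRow(row) {
  return row.map(function (value) {
    return (value === '' || value === undefined) ? null : value;
  });
}

// Describe how each value would be bound to a prepared statement.
// In this sketch anything that is neither null nor a number is sent as text.
function bindPlan(row) {
  return normalizeRow(row).map(function (value, i) {
    var kind = value === null ? 'null'
      : typeof value === 'number' ? 'number'
      : 'string';
    return { index: i + 1, kind: kind, value: value };
  });
}

// Hypothetical Apps Script usage (not run here), with one "?" per column:
//   var stmt = conn.prepareStatement('INSERT INTO ' + tableName + ' VALUES (?, ?, ?)');
//   bindPlan(row).forEach(function (p) {
//     if (p.kind === 'null') stmt.setNull(p.index, 0); // 0 assumed to be java.sql.Types.NULL
//     else if (p.kind === 'number') stmt.setDouble(p.index, p.value);
//     else stmt.setString(p.index, String(p.value));
//   });
//   stmt.executeUpdate();
```

Dates and booleans would need their own branches (for example a date setter) rather than being converted to strings; they are left out to keep the sketch short.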
Furthermore, I suggest you take a look at these links, since they might be of help:
Class JdbcPreparedStatement;
Class Range Apps Script - getValues().