I'm writing a script for an insurance agency that will work like this:
When an employee makes a sale, they enter information about each policy they sold (name, policy number, etc.) in their own Google Spreadsheet, one policy per row.
The script takes each employee's spreadsheet as input and writes the rows to a table in a database.
When policies get paid, they get written to a separate table.
I use an OUTER JOIN to see which employee sold each paid policy.
The issue I'm having is that in step 2, I don't want to write policies to the database that have already been written (i.e. because they were there last time I ran the script). I can think of a few ways to solve this...
Clear the table every time I run the script, so it's being written fresh every time.
Loop through and check if a given policy is already in the database before writing it to DB.
Add a boolean column called "copied to DB"; when adding rows to the DB, check whether "copied to DB" equals "Yes": if true, skip the row; if false, set it to "Yes" and write the row to the DB.
I think any of the above methods would work, but they all seem pretty inefficient. Is there anything in SQL or Google Apps Script that would do this more efficiently and minimize database writes?
Currently, the way I'm doing step 2 is by copying all the employee sheets to a single "master sheet" containing all the employees' policies; every time I run the script, I clear the master spreadsheet and copy all the data back in, so there are no duplicate rows. This is basically equivalent to method #1 above, but again, it seems like there should be a better way than clearing the spreadsheet every time. (And I'd rather use a database table than write all the data to a spreadsheet.)
Thank you!
It's easy if the spreadsheet rows are (1) always appended and (2) never changed or deleted.
Loop for each spreadsheet:
Remember in a script property the last row written to the DB (one property per spreadsheet; base the property name on the spreadsheet id).
Start from the row after that last one, write the new rows to the DB, and afterwards update the property.
Make your DB row primary key something like 'spreadsheet id + row number', and use INSERT OR IGNORE or the equivalent. This is a must for integrity, as a script could fail after a DB write but before writing the script property for that spreadsheet.
I have a popular Chrome extension that does this well for thousands of users.
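The steps above can be sketched as a pure function (the helper name `newRowsSince` is mine, not an API): the script property stores the last row already copied, and the composite key makes a repeated insert harmless.

```javascript
// Given all data rows from one employee's spreadsheet and the last row
// index already written (read from a script property keyed by the
// spreadsheet id), return only the unwritten rows. Each row carries a
// 'spreadsheetId:rowNumber' key to use as the DB primary key, so a
// re-run after a partial failure can use INSERT OR IGNORE safely.
function newRowsSince(spreadsheetId, rows, lastWrittenRow) {
  var pending = [];
  for (var i = lastWrittenRow; i < rows.length; i++) {
    pending.push({ key: spreadsheetId + ':' + (i + 1), data: rows[i] });
  }
  return pending;
}
```

In Apps Script you would read and update `lastWrittenRow` via `PropertiesService.getScriptProperties()`, using a property name based on the spreadsheet id as described above.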
I want to be able to upload new records into a BigQuery table from Google Sheets. I know how to attach a sheet through Connected Sheets and have the table mirror it; however, I don't want an exact mirror. I want to upload the sheet's rows into the BigQuery table and then clear the sheet without deleting anything from the table. Is there a way to do that through Connected Sheets, or do I need a script?
Thanks.
Everything I find on the topic only shows how to make a table from scratch in BigQuery and have it mirror a Google Sheet. This doesn't fit my application, because my sheet will get too big, too quickly, if I keep all the records around. I want to be able to upload the records on a sheet into a database, then clear the sheet without it affecting the database.
If you want to keep everything in BigQuery, and you want to work with exactly one Sheet, I think a saved query using the BigQuery DML (data manipulation language) would be the way to go.
Let's imagine that you have a Google Sheet that you've added to BigQuery as an external table called my_gsheet in a dataset called external. For the sake of argument let's say it has three columns, id, name, and created_at.
We are going to use an INSERT ... SELECT statement to append this to a table called all_records in a dataset called big_data, like this:
INSERT
big_data.all_records
(id, name, created_at)
SELECT
id, name, created_at
FROM
external.my_gsheet
Now you can save this query and run it whenever you want to bring the latest spreadsheet-based data into your warehouse table.
The table big_data.all_records should be created by you in advance; if it's going to be very large you should probably think about making it a partitioned table, and if it's going to be queried a lot you may want to cluster certain columns too.
You could even schedule the query to run regularly, although beware: if you run the query above multiple times, duplicate rows will be created. If you have a unique identifier (the id column in my example), this can be avoided with a MERGE:
MERGE INTO
big_data.all_records AS target
USING
external.my_gsheet AS sheet
ON
target.id = sheet.id
WHEN
NOT MATCHED
THEN
INSERT (id, name, created_at)
VALUES(id, name, created_at)
This query, unlike the previous one, will only insert rows if they have an id that doesn't already exist in the table. If the id does exist, it will do nothing (you could add a WHEN MATCHED THEN UPDATE clause if you want the ability to overwrite instead).
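If you end up with several sheet-backed external tables, the MERGE statement can be generated rather than hand-written each time. A small sketch (the `buildMergeSql` helper is my own, not a BigQuery API; from Apps Script the resulting string could be run via the BigQuery advanced service):

```javascript
// Build the de-duplicating MERGE statement above from a target table,
// a source (sheet-backed external) table, the unique id column, and
// the full column list. In a BigQuery MERGE, unqualified column names
// in the VALUES clause refer to the source table.
function buildMergeSql(target, source, idCol, cols) {
  var colList = cols.join(', ');
  return 'MERGE INTO ' + target + ' AS target\n' +
    'USING ' + source + ' AS sheet\n' +
    'ON target.' + idCol + ' = sheet.' + idCol + '\n' +
    'WHEN NOT MATCHED THEN\n' +
    '  INSERT (' + colList + ') VALUES (' + colList + ')';
}
```

Calling `buildMergeSql('big_data.all_records', 'external.my_gsheet', 'id', ['id', 'name', 'created_at'])` reproduces the query above.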
At present this is a hypothetical question and I haven't begun writing the code for this, mainly because I'm not sure if it is possible to achieve what I need to in Google Forms.
We have an employee contact form that is completed when people join the organisation. Obviously, people go through changes in life and either get a new mobile or change address; when this happens, they update their contact details via a Google Form, which adds a new row of data to the responses spreadsheet, meaning we have a duplicate entry for the same employee.
What I want to be able to do, is have the sheet look for duplicate data and overwrite the fields that have changed. I am fairly confident that this is possible, however, I am struggling with the logic.
If it is possible, could you provide a hint as to how this might work?
It is possible to look for duplicates in a spreadsheet and delete them; however, if you do it in the destination spreadsheet of a Google Form, the deleted duplicates will come back at every update.
So you need a workaround:
Create a new spreadsheet (slave) that will be synchronized with the destination spreadsheet (master) on every form submit
The form data from each form submit will be checked for duplicates before being written into the slave
You need a unique identifier, e.g. employee number
You can check whether an entry with this identifier already exists in the slave, e.g. with indexOf()
If the identifier does not exist yet - append the new data to the last row of the slave
If the identifier exists already - find the row containing it and overwrite it with new data
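The decision in the last two steps can be sketched as a pure function (the name `classifySubmission` is illustrative, not an Apps Script API); an on-form-submit trigger would then either append a row to the slave sheet or overwrite the matched row:

```javascript
// Given the identifier column of the slave sheet (as a flat array of
// employee numbers, excluding the header) and the identifier from the
// new form submission, decide what to do. Returns {action: 'append'}
// or {action: 'overwrite', row: n}, where n is the 1-based sheet row
// (assuming row 1 is the header row).
function classifySubmission(slaveIds, submittedId) {
  var idx = slaveIds.indexOf(submittedId);
  if (idx === -1) {
    return { action: 'append' };
  }
  return { action: 'overwrite', row: idx + 2 }; // +2: skip header, 1-based
}
```

In the trigger itself you would load `slaveIds` once with something like `sheet.getRange(2, 1, sheet.getLastRow() - 1, 1).getValues()` flattened, then call `appendRow()` or `setValues()` on the matched row accordingly.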
Here is my issue. I have a spreadsheet with multiple sheets, and each sheet has about 300-500 rows. I am using ScriptDb to store the data for each sheet.
What I am currently doing is calling a custom function in 300-500 cells in each sheet to populate certain cells with data; some cells populate and the rest error out, saying I've queried the database too many times in a short period. Obviously, querying the database for each cell isn't the best solution.
How would I go about querying all the data for the current sheet and then having that data available for each cell? From what I've read, you can't really have "global" variables in GAS; you have to use things such as CacheService or ScriptDb, which is what I'm trying to do. I'm just querying it too much.
Is there some way to populate all the cells from 1 function call instead of 1 call for each cell? What am I missing or what other solutions are there?
Just realized a similar question was asked earlier today: Google Spreadsheet Script invoked too many times per second for this Google user account
Yes, it's possible. Simply return an array from your function; it will work like an ARRAYFORMULA.
Of course your cells would need to be contiguous.
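For example, one custom function call in a single cell can fill a whole column (the lookup source below is a plain object standing in for whatever you load once from ScriptDb or CacheService; names are illustrative):

```javascript
// Custom function: takes a range of keys (as a 2D array, the way Apps
// Script passes ranges to custom functions) and returns a 2D array of
// results. Put =LOOKUPALL(A2:A500) in one cell and the results spill
// down like an ARRAYFORMULA: one function call instead of one per cell.
function LOOKUPALL(keys) {
  var data = loadAllRecords(); // one load for the whole sheet
  return keys.map(function (row) {
    var key = row[0];
    return [data.hasOwnProperty(key) ? data[key] : 'NOT FOUND'];
  });
}

// Stand-in for a single ScriptDb/CacheService load (illustrative only).
function loadAllRecords() {
  return { a: 1, b: 2 };
}
```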
At the moment I have a worksheet with prior students' names and ID#s, currently about 3000 records.
In a second worksheet I have new names listed (which are entered via a form).
At the moment I am using the following to check if there is a prior record:
=IFERROR(INDEX(Students!A:A, MATCH(K2, Students!B:B, 0)), "NEW STUDENT")
But the spreadsheet is getting quite slow.
Would a google script be better? And if so how?
Thanks
You can try a different formula. For example, in this example spreadsheet I have added 10,000 rows of fake data, in Sheet1. I have joined the first name column with the second name column in a cell, in Sheet2!A1, as below:
=arrayformula(join(";", Sheet1!B:B & " " & Sheet1!C:C))
Then I can perform a text search for a new name in this long string, e.g. the following:
=find("Peter Piper"&";", Sheet2!A1)
It is fast.
"Better"? That's just too subjective. Let's only talk performance.
You should be able to write a script that will perform these specific look-ups much faster than your current spreadsheet formulas. The dominant factor is the number of calculations being performed:
Spreadsheet formulas are recalculated every time there is a change in the sheet, so your "new student" checks on students already in the second worksheet are wasting time.
A form submission trigger function would ideally do the lookup in response to a form, and only for the new student data.
A script could use ScriptDB to store all the Prior student names & ids, thereby eliminating the need to read the Prior student spreadsheet over and over, and providing fast lookups.
// Form-submission trigger: look up the submitted student ID in ScriptDb
// and flag new students. (The column offsets here are illustrative.)
function onFormSubmit(event) {
  var statusColumn = 2;        // offset from the submission range to the status cell
  var newId = event.values[1]; // assuming the ID is the second form answer
  var db = ScriptDb.getMyDb();
  var result = db.query({id: newId});
  if (!result.hasNext()) {
    event.range.offset(0, statusColumn).setValue("NEW STUDENT");
  }
}
Counter-points do exist; here are just a couple.
You need to write JavaScript code rather than spreadsheet functions. It's a different skill set.
If you're using ScriptDb for performance, you need to include some maintenance capability; at the very least, you need to load the database the first time.
Maybe this will work:
=DGET(Students!A:B, Students!$A$1, {Students!$B$1; $K$2})
given that K2 is the search text.
If you get #N/A, you have a new student; if you get #NUM!, you have multiple matches. DGET is a database function, so it can handle large data quite well. It is not as flexible as other lookups, but it is efficient.
I have a two-part question about master/child relationships in workbooks. I have beginner experience with writing code for excel & google spreadsheets so any extra detail would be truly appreciated.
Here is what I'm trying to achieve:
I want to make a Google Form to collect a set of data for potentially hundreds of people. The option to make changes to the form after submission will be enabled, so the data flow will be pretty dynamic. I've gotten as far as setting this up and creating the master spreadsheet where I can view all of the responses. But there's too much information in one spreadsheet, and I'd like to make some child workbooks to simplify the viewable data for various needs. So here are my questions:
1) How would I write the script to create a child worksheet from the master worksheet with these conditions: on run, create a new worksheet called e.g. "Child 1 - Basic Info", then delete all the columns (shifting left) except the ones I explicitly want to keep (based on the cell value), e.g. "Name", "Age" & "Interests". Bear in mind I would eventually want to create multiple child workbooks that do basically the same job each time, just with different column parameters, e.g. "Child 2 - Education Info".
2) Along with this, I want to make sure that these children will be automatically updated every time someone submits a new response from my form or updates one they have already submitted. Essentially, the goal is to have any changes in the master ripple into all of the children. Also keep in mind that every time someone submits a new form, the row numbers will change. So the children will need to also recognize this change and update accordingly.
Thank you all in advance!
With the QUERY() function, you can have secondary sheets that will dynamically update, with no need to use scripts at all. See more here.
Here's an example: suppose a spreadsheet has rows of form-submitted data.
On a secondary sheet in the same spreadsheet, cell A1 contains a query formula that selects only the columns you asked for, "Name, Age, and Interests".
Every new form submission or update will result in recalculation of the query, so it will be kept up-to-date with no further intervention.
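For instance, assuming the responses land in a sheet named "Form Responses 1" with Name, Age, and Interests in columns B, C, and D (adjust the sheet name and columns to match your form), the child sheet's A1 formula could be:

```
=QUERY('Form Responses 1'!A:Z, "select B, C, D", 1)
```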