I have a script which adds a record to a ScriptDB datastore, then immediately queries all records and displays a count. 90% of the time, the displayed count is incremented as one would expect. However, 10% of the time the count is NOT incremented; then on the next invocation the count is +2.
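Simplified, the flow is something like this (not my exact code; the object shape is just illustrative):

    function addAndCount() {
      var db = ScriptDb.getMyDb();
      // Save one new record (the fields here are illustrative).
      db.save({type: 'entry', created: new Date().getTime()});
      // Immediately query everything back and show the count.
      var results = db.query({type: 'entry'});
      Logger.log('Count: ' + results.getSize());
    }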
ScriptDb is still in 'experimental' mode; I suggest you raise an issue on the issue tracker if you have experienced this repeatedly.
tl;dr: My custom-formula cells all go Loading... and they start filling in results, but they get stuck. A reload immediately provides the completed results. I don't want to reload every time I change a value.
In my sheet almost all cells are interconnected, so changing one cell triggers a recalc. Each row is a year, each column is an identical formula across all rows. Each cell can refer to earlier columns in its row or anything on the prior row, plus a few absolute locations containing the "inputs". Changing an input triggers a recalc of everything.
The column with my custom function will go Loading..., then one cell at a time it returns the value, almost one second per cell. OK, slow, but fine. But often it just stops completely partway down. Sometimes it starts up again, but often it never does.
But the cells WERE all recalculated. My custom function was called and returned values promptly, execution time of 0.125 secs or less usually. If I reload in the browser, I immediately get the fully recalculated sheet. It looks like some link is being severed between Sheets in my browser and Google's servers, so I stop seeing updates.
This is the first time I've ever used Apps Script -- or JavaScript for that matter -- but I have been programming in other ways for decades. If it matters, the custom function is purely mathematical, calling no services except Math, not even Spreadsheet, getting everything it needs in its arguments, using a few functions in the same file. It's a detailed tax calculator.
I'm using a recent Chromebook.
To avoid having to reload your spreadsheet, it's very likely that you will have to follow the guidelines for custom function optimization:
Summary: Instead of using one formula to calculate a single value, use your formula to calculate multiple values and return them as an array.
This will reduce the number of formulas and improve your spreadsheet's performance, but bear in mind that custom functions have a maximum execution time of 30 seconds.
Reducing the size of the data ranges in your spreadsheet, and the number of blank rows and columns at the bottom and right of your data, will also help improve performance.
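A minimal sketch of the array approach (the function name and the per-row math are placeholders, not taken from your sheet):

    /**
     * Instead of =MYCALC(A2:D2) copied into every row,
     * call =MYCALC(A2:D100) once and return one result per input row.
     */
    function MYCALC(inputs) {
      // When passed a multi-row range, `inputs` arrives as a 2D array of rows.
      if (inputs.map) {
        return inputs.map(function (row) {
          return [computeRow(row)];  // one-element array per row = a column of results
        });
      }
      return computeRow([inputs]);   // single-cell fallback
    }

    function computeRow(row) {
      return row[0] * 2;             // placeholder for the real per-row math
    }

One call then fills the whole column, so the sheet waits on a single custom-function execution instead of one per row.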
Related
How to use a custom function with an ArrayFormula
Room For Optimization?
I'm having a hard time wrapping my head around an Elo-score-like calculation for a large number of users on our platform.
For example: for every user in a large set of users, a complex formula, based on varying amounts of "things done", produces a score for each user for a match-making-like system.
In our case, it's based on the number of posts made, connections accepted, messages sent, sessions in a one-month period, and other actions.
I had two ideas to go about doing this:
Real-time: On every post, message, .. run the formula for that user
Once a week: Run the script to calculate everything for all users.
The concerns I have about these two:
Real-time: It would be an overkill of queries and calculations for each action a user performs. If, let's say, 500 users are active and all of them are performing actions, the database would have a hard time, I think. We would then also need to run a script to re-calculate the scores of inactive users (to lower their scores).
Once a week: If we have, for example, 5,000 users (for our first phase), that would mean running the calculation formula 5,000 times, which could take a long time and will only grow as more users join.
The calculation queries for a single variable in the formula (about 12 variables in total) are mostly a simple 'COUNT FROM table', but a few, like counting "all connections of my connections", take a few joins.
I started by "logging" every action into a table for this purpose: just the counter values, increased/decreased with every action, with the formula run against those values (one record per week). This works, but it can't be applied to every variable (like the connections of connections).
Note: Our server-side is based on PHP with MySQL.
We're also running Redis, but I'm not sure if this could improve those bits and pieces.
We have the option to export/push data to other servers/databases if needed.
My main example is the app Tinder, which uses a similar kind of algorithm for match-making (maybe with less complex variables, because they don't have groups and communities that you can join).
I'm wondering if they run that in real time on every swipe and every setting change, or if they have a script that runs continuously over a small batch of users each time.
What it all comes down to: what would be the most efficient, non-table-locking way to do this, keeping in mind that at some point we'll have, for example, 50,000 users?
The way I would handle this:
Implement the realtime algorithm.
Measure. Is it actually slow? Try optimizing.
Still slow? Move the algorithm to a separate asynchronous process. Have the process run whenever there's an update. Really this is the same thing as 1, but it doesn't slow down PHP requests and if it gets busy, it can take more time to catch up.
Still slow? Now you might be able to optimize by batching several changes.
If you have 5,000 users right now, make sure it runs well with 5,000 users. You're not going to grow to 50,000 overnight, so adjust and invest in this as your problem changes. You might be surprised where your performance problems are.
Measuring is key though. If you really want to support 50K users right now, simulate and measure.
I suspect you should use the database as the "source of truth" aka "persistent storage".
Then fetch whatever is needed from the dataset when you update the ratings. Even lots of games by 5,000 players should not take more than a few seconds to fetch and compute on.
Bottom line: Implement "realtime"; come back with table schema and SELECTs if you find that the table fetching is a significant fraction of the total time. Do the "math" in a programming language, not SQL.
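For illustration only (sketched in JavaScript for brevity; your stack is PHP/MySQL, and the counters and weights below are placeholders, not your actual formula), the "fetch counters, do the math in code, write the score back" shape is roughly:

    function recalculateScores(fetchCounters, saveScore) {
      // fetchCounters() is assumed to return one row of pre-aggregated counters per user,
      // e.g. [{id: 1, posts: 12, connections: 30, messages: 5}, ...]
      var users = fetchCounters();
      users.forEach(function (u) {
        var score = 1.0 * u.posts       // placeholder weights -- plug the real
                  + 2.0 * u.connections // 12-variable formula in here
                  + 0.5 * u.messages;
        saveScore(u.id, score);         // one UPDATE per user, or batch them
      });
    }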
I have a spreadsheet with a matrix set up to count how many times a student has had a lesson with a particular tutor.
The matrix works fine with this formula:
=ARRAYFORMULA(SUM(IF(TERM4!$B$6:$B$2398=B$1,IF(TERM4!$C$6:$C$2398=$A2,1,IF(TERM4!$D$6:$D$2398=$A2,1,FALSE()))),FALSE()))
However, due to the number of students/tutors, the matrix is 7,000 cells, which slows the working sheet down considerably.
Is there a better way to do this? Can I run a Google Apps Script on a trigger (e.g. once a week) to count the matrix, so the formulas are not slowing the sheet down?
I would also like the formula to return a blank rather than a 0 if the result is FALSE.
Thanks for your help!
Yes, it's possible to do it with GAS. The only part that gets a little complex is that if your script takes over 5 minutes it won't process all the rows. To avoid that, process the data in chunks (say 100 rows at a time) and use script properties to remember which spreadsheet and row you last processed. Each trigger run will process as much as it can until all spreadsheets and rows are processed.
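Roughly like this (the sheet name, columns, and the 100-row chunk size are guesses based on your formula above; the actual tallying is left out):

    function processChunk() {
      var props = PropertiesService.getScriptProperties();
      var startRow = Number(props.getProperty('lastRow')) || 6;          // data starts at row 6
      var sheet = SpreadsheetApp.getActive().getSheetByName('TERM4');
      var lastRow = sheet.getLastRow();

      if (startRow > lastRow) {
        props.deleteProperty('lastRow');                                 // finished; next run starts over
        return;
      }

      var numRows = Math.min(100, lastRow - startRow + 1);
      var values = sheet.getRange(startRow, 2, numRows, 3).getValues();  // columns B:D

      // ... tally tutor/student pairs from `values` into your matrix sheet here ...

      props.setProperty('lastRow', String(startRow + numRows));
    }

Attach processChunk to a time-driven trigger and each run picks up where the previous one left off.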
I have a script that generates about 20,000 small objects with about 8 simple properties. My desire was to toss these objects into ScriptDb for later processing of the data.
What I'm experiencing, though, is that even with a saveBatch operation the process takes much longer than desired and then silently stops. By too long, I mean it's often greater than the 5 min execution limit, though without throwing any error. The script runs so long that I've not attempted to check a mutation result to see what didn't make it, but from a check after execution it appears that most of the objects do not get saved.
So, though I'm quite certain that my collection of objects is below the storage size limit, is there a lesser-known limit or throttle on accesses that is causing me problems? Is the number of objects the culprit here? Should I instead be attempting to save one big object that is a collection of the smaller ones?
I think it's the amount of data you're writing. I know you can store 20,000 small objects; you just can't write that many in 5 minutes. Write 1,000, then quit. Write the next thousand, etc. Run your function 20 times and the data is loaded. If you need to do this more often or automate it, use ScriptApp.
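Something along these lines (buildAllObjects() is a stand-in for whatever generates your 20,000 objects):

    function loadNextBatch() {
      var db = ScriptDb.getMyDb();
      var props = PropertiesService.getScriptProperties();
      var start = Number(props.getProperty('nextIndex')) || 0;
      var objects = buildAllObjects();               // stand-in for your object generator

      var batch = objects.slice(start, start + 1000);
      if (batch.length === 0) return;                // all done

      var results = db.saveBatch(batch, false);
      if (db.allOk(results)) {
        props.setProperty('nextIndex', String(start + batch.length));
        // Queue the next run instead of clicking Run 20 times
        // (you'd also want to clean up the spent triggers).
        ScriptApp.newTrigger('loadNextBatch').timeBased().after(60 * 1000).create();
      }
    }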
I have a database table of items; let's call them games. Each game has a release date.
I run a script that selects a game at random and updates various bits of information, such as price, from my source data. This script is on a cron job that fires at regular intervals throughout the day.
There are 20,000-odd game records and growing, so obviously keeping some of these games up to date is more important than keeping others up to date. This is mostly based on the release date, but could include data from other fields too.
Is there any way I can get my batch processing script to select a record based on this importance, without having to run through all results until each one has been updated and then start at the top?
So the frequency of updating the more important games would be higher than that of the less important ones?
As @Usman mentioned, you need to define a way of measuring importance that works properly. Then my suggestion would be to have your script update two records each time it runs. You'd choose one of those records at random from among the "important" records, and the other at random from among all records.
That way you would not reduce your probability of updating any given record, and at the same time you'd increase the probability of updating the important ones.
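A JavaScript-flavoured sketch of the selection step (importantIds and allIds are assumed to be arrays of ids fetched by hypothetical queries, e.g. "released in the last few months" versus the whole table):

    function pickRecordsToUpdate(importantIds, allIds) {
      // Pick one id at random from the "important" subset...
      var a = importantIds[Math.floor(Math.random() * importantIds.length)];
      // ...and one at random from the full set.
      var b = allIds[Math.floor(Math.random() * allIds.length)];
      return [a, b];   // update both on this cron run
    }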
But, you know, even if you ran your random update script once a second, there's no guarantee you'd get to all 20,000 records daily. The fan of the game you don't update for a week might become annoyed that your data was stale. It might be better to update things on a fixed schedule, or when you get new data for them, rather than randomly.