I'm developing a web application that will use a Google spreadsheet as a database.
This will mean parsing up to 30,000 rows (a guesstimate) in regular operations, for searching IDs etc.
I'm worried about the response times I will be looking at. Does anybody have experience with this? I don't want to waste my time on something that will hit a dead end at an issue like that.
Thanks in advance
Using spreadsheets as a database for this data set is probably not a good idea. Do you already have this spreadsheet set up?
30K rows will allow you only 66 columns; is that enough for you? Check the Google Docs size limits help page for more info.
Anyway, Google Spreadsheets has a "live concurrent editing" nature to it that makes it a much slower database than just about any other option. You should probably consider something else.
Do you intend to use the spreadsheet to display data, or only as a storage place?
In the second case the relative slowness of the spreadsheet will not be an issue, since you'll only have to read it once to get its data into an array and play with that...
This implies, of course, that you build every aspect of data reading and writing into a dedicated UI and never show the spreadsheet itself. The speed will then depend only on the JavaScript engine working on arrays, the speed of the UI and the speed of your internet connection... all 3 factors being not very high-performing compared to a 'normal' application, but with the advantage of being easily shareable and available anywhere. :-)
That said, I have written such a database app with about 20 columns of data and 1000 rows, and it is perfectly usable, although it has some latency even for simple next / previous requests. On the other hand the app can send mails and create docs... the advantages of Google service integration :-)
You could have a look at this example to see what I'm talking about
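As a rough illustration of the read-once approach, here is a minimal Apps Script sketch; the spreadsheet ID, the sheet name 'Data' and the ID-in-column-A layout are placeholders for the example:

    // Read the whole sheet into a JavaScript array once, then search in memory.
    // 'YOUR_SPREADSHEET_ID', the sheet name 'Data' and the ID-in-column-A
    // layout are assumptions -- adapt them to your own spreadsheet.
    function loadRows() {
      var sheet = SpreadsheetApp.openById('YOUR_SPREADSHEET_ID').getSheetByName('Data');
      return sheet.getDataRange().getValues(); // one server call for all rows
    }

    function findRowById(rows, id) {
      for (var i = 0; i < rows.length; i++) {
        if (rows[i][0] === id) { // column A assumed to hold the ID
          return rows[i];
        }
      }
      return null;
    }

    function demo() {
      var rows = loadRows();                     // the slow part: one bulk read
      var match = findRowById(rows, 'ID-12345'); // the fast part: pure array work
      Logger.log(match);
    }

On ~30,000 rows the bulk read is the expensive step; everything after that is in-memory array work.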
I'm writing google apps script web apps that share data between end users who are using the app at the same time.
I can write the data to a spreadsheet and allow others to read it, or put the data into script cache.
Either way I need a server call. The data is not large... I was just wondering if the cache is more server-efficient / faster / better practice?
Thanks
If you use the Cache Service, the maximum time to live for data in a key is 6 hours if you set the expiration; otherwise it lives in the cache for 10 minutes. Also, the maximum length of a key is 250 characters.
So it really depends on the architecture of your app, but using sheets purely as a database perhaps isn't the best solution either, although it may be convenient in many cases.
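For what it's worth, a minimal Apps Script sketch of the cache side, assuming the data fits comfortably in one cached string; the key name, the 10-minute expiration and the use of the script-wide cache are just example choices:

    // Keep a small shared value in the script cache so other users' requests
    // can read it without another spreadsheet read.
    function saveShared(data) {
      var cache = CacheService.getScriptCache();          // shared across users
      cache.put('sharedData', JSON.stringify(data), 600); // expires after 10 minutes
    }

    function loadShared() {
      var cache = CacheService.getScriptCache();
      var raw = cache.get('sharedData');
      return raw ? JSON.parse(raw) : null; // null means expired or never written
    }

Bear in mind the cache stores strings only, so anything structured has to go through JSON, and a cache miss still means falling back to the sheet (or wherever the data originally lives).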
I want to create a (nearly) live dashboard from MySQL databases. I tried Power BI, SSRS and other similar tools, but they were not as fast as I wanted. What I have in mind is the data being updated every minute or even more often. Is it possible? And are there any free (or inexpensive) tools for this?
Edit: I want to build a wallboard to show some data on a big TV screen. I need it to be real-time. I tried SSRS auto-refresh as well, but it shows a loading sign and is very slow; plus Power BI uses Azure, which is very complex to configure and is blocked in my country.
This topic has many more layers than just asking which tool is best for this case.
You have to consider
Velocity
Veracity
Variety
Kind
Use Case
of the data. Sure, these points are usually only brought up when talking about Big Data, but they will give you a feeling for the size and complexity of your data.
Loading
Is the data already loaded so that you "just" use it? Or do you also need to load it in real time or near real time (for clarification, read this answer here)?
Polling/Pushing
Do you want to poll data every x seconds or minutes? Or do you want to work event-based? What are the requirements that make you need to show data this fast?
Use case
Do you want to show financial data? Do you need to show data about errors and system logs from servers and applications? Do you want to generate insights as soon as a visitor of a webpage makes a request?
Conclusion
When thinking about those questions, keep in mind this should just be a hint to go into one direction or another. Depending on the data and the use case, you might use an ELK stack (for logs), Power BI (for financial data) or even some scripts (for billing).
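If simple polling turns out to be enough for the wallboard, the browser side can be as small as a timer that re-fetches a JSON endpoint and updates the page in place, avoiding the full-page reload and loading sign mentioned in the question. A sketch, where the '/dashboard-data' URL, the one-minute interval and the 'openTickets' field are all assumptions:

    // Poll a small JSON endpoint every 60 seconds and patch the DOM in place,
    // so the TV screen never shows a reload or a loading spinner.
    async function refreshBoard() {
      try {
        const res = await fetch('/dashboard-data', { cache: 'no-store' });
        const data = await res.json();
        document.getElementById('openTickets').textContent = data.openTickets;
      } catch (err) {
        console.error('refresh failed, keeping last values', err); // keep stale data on error
      }
    }

    refreshBoard();                   // initial fill
    setInterval(refreshBoard, 60000); // then once a minute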
I have an application I am working on that basically takes the data from Active Collab and creates reports / graphs out of it. The API itself is insufficient to get the proper data on a per-request basis, so I resorted to pulling the data down into a separate data set that can be queried more efficiently.
So, in order to avoid needing to query the entire API constantly, I decided to make use of webhooks to make the transformations to the relevant data and lower the need to resync the data.
However, I notice not all events are sent, notably the following:
TaskListUpdated
MemberUpdated
TimeRecordUpdated
ProjectUpdated
There are probably more, but these are the main ones I have noticed so far.
Time records are probably the most important; the fact that they are missing from webhooks means that almost any application that needs time record data has a good chance of holding incorrect data. It's fairly common to make a typo in a time record and then adjust it later.
So am I missing anything here? Is there some way to see these events reliably?
EDIT:
In order to avoid a long comment to Ilija I am putting the bulk here.
Webhooks apart, what information do you need to pull? API that powers
time tracking reports can do all sorts of cross project filtering, so
your approach to keep a separate database may be an overkill.
Basically we are doing a multi-variable tiered time report. It can be sorted / grouped by any conceivable method you may want to look at.
http://www.appsmagnet.com/product/time-reports-plus/
This is the closest to what we are trying to do. Back when we used Active Collab 4 this did the job, but even with it we had to consolidate the results in our own spreadsheets.
So the idea of this is to better integrate our Active Collab data into our own workflow.
So the main data we are looking for in this case is:
Job Types
Projects
Task Lists
Tasks
Time Records
Categories
Members / Clients
Companies
These items can feed not only our reports, but many other aspects of our company as well. For us Active Collab is the point of truth, so we want the data quickly accessible and fully queryable.
So I have set up a sync system that initially grabs all the data it can from Active Collab and then uses a mix of crons and webhooks to keep it up to date.
Cron jobs work well for all aspects that do not have "sub-items"; for the ones that do (projects / tasks / task lists / time records) I need to rely on the webhook, since syncing them takes too much time to keep up to date in real time.
For the webhook I noticed the events above do not carry through. For time records I figured out a way around it, listed in my answer, and members can be handled through the cron. However, task list and project updates are the only two of real concern. Project is fairly important, as the budget can change and that is used in reports; task lists have the start / end dates that could be used as well. Since going through every project / task list constantly to see if there is a change is really not a great idea, I am looking for a way to reliably see updates for them.
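A simplified sketch of the cron-plus-webhook split described above (shown here as Node/Express; the route, the assumed payload shape and the helpers applyEventToLocalDb / resyncCollection are placeholders rather than the actual implementation):

    const express = require('express');
    const app = express();
    app.use(express.json());

    // Webhook receiver: apply small incremental updates as events arrive.
    app.post('/activecollab-webhook', (req, res) => {
      const event = req.body;     // assumed shape: { type: '...', payload: {...} }
      applyEventToLocalDb(event); // hypothetical helper that updates our local copy
      res.sendStatus(200);
    });

    // Cron side (for example daily): full resync of the flat, cheap-to-list items.
    function dailyResync() {
      ['job-types', 'companies', 'users'].forEach(resyncCollection); // hypothetical helper hitting the API
    }

    app.listen(3000);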
I have based this system on https://developers.activecollab.com/api-documentation/ but I know there are at least a few end points that are not listed.
Cross-project time-record filtering using Active Collab 5 API
This question is actually from another developer on the same system (and it also shows a TrackingFilter report not listed in the docs). Due to issues with maintaining an accurate set of data we had to adapt it. I actually notice that you (Ilija) are the person who replied there and recommended we move over to this style of system.
This is not a complete answer, but it is a way to solve the issue with TimeRecordUpdated not going through the webhook.
There is another API endpoint, /whats-new. This endpoint describes changes from the last day or so, and it has a category called TrackingObjectUpdatedActivityLog, which refers to an updated time record.
So I set up a cron job to check this fairly frequently and manually push the TimeRecordUpdated event through my system to keep it consistent.
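A simplified sketch of that cron job (shown here in Node; the base URL, the auth header and the shape of the /whats-new response are placeholders, and flattenActivity / handleTimeRecordUpdated stand in for the real handlers; only the endpoint name and the TrackingObjectUpdatedActivityLog category come from what is described above):

    // Poll /whats-new and re-emit TimeRecordUpdated through our own pipeline
    // for every time-record change Active Collab reported there.
    const BASE_URL = 'https://your-account.example.com/api/v1';            // placeholder
    const AUTH_HEADERS = { 'X-Angie-AuthApiToken': process.env.AC_TOKEN }; // assumed auth scheme

    async function checkWhatsNew() {
      const res = await fetch(BASE_URL + '/whats-new', { headers: AUTH_HEADERS });
      const activity = await res.json();

      for (const entry of flattenActivity(activity)) {          // hypothetical helper
        if (entry.type === 'TrackingObjectUpdatedActivityLog') {
          handleTimeRecordUpdated(entry);                       // same code path the webhook would use
        }
      }
    }

    setInterval(checkWhatsNew, 5 * 60 * 1000); // for example every 5 minutes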
For MemberUpdated, since a member's data being updated is unlikely to affect much, having a daily cron to check the users seems good enough.
ProjectUpdated could technically be handled the same way, but combined with the absence of TaskListUpdated that leads to far too many API calls to sync the data. Unfortunately I have not found a solution for this yet.
I'm doing some testing with Google Apps Script's ContactsApp and loading in contacts. It looks like it takes as much time to run ContactsApp.getContacts() (loading all contacts) as it does to run ContactsApp.getContact('email') (a specific contact): about 14 seconds for each method with my contacts.
My assumption is that both methods fetch all contacts and the second one just matches on email. This drags quite a bit.
Has anyone confirmed this, and is there any way to keep the loaded contacts in memory between pages (a session variable?)?
You've got a few options for storing per-user data:
If it's a small amount of data, you can use User Properties
You can store much more data using ScriptDb, but this will be global, so you'll have to segment off user data yourself
If you only need the data for a short amount of time, say, between function calls, you can use the Cache Service. You'll want to use getPrivateCache()
It sounds like for your use case getPrivateCache() is your best option for user specific session-like data storage.
(Just make sure your intended use fits within the terms of service.)
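A minimal sketch of that approach, assuming the slimmed-down contact list fits in a single cached value (the cache stores strings of up to roughly 100 KB, so very large contact lists may still not fit); the key name and the 25-minute expiration are just example choices:

    // Load contacts once, keep a slimmed-down copy in the user's private cache,
    // and reuse it on later calls instead of hitting ContactsApp again.
    function getCachedContacts() {
      var cache = CacheService.getPrivateCache(); // per-user cache (newer releases name this getUserCache())
      var raw = cache.get('contacts');
      if (raw) {
        return JSON.parse(raw); // cache hit: skip the slow ContactsApp call
      }

      var contacts = ContactsApp.getContacts();   // the ~14 second call, done once
      var slim = contacts.map(function (c) {
        var emails = c.getEmails();
        return {
          name: c.getFullName(),
          email: emails.length ? emails[0].getAddress() : ''
        };
      });

      cache.put('contacts', JSON.stringify(slim), 1500); // keep for 25 minutes
      return slim;
    }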
I tried searching on Stack Overflow as well as googling around a lot, but I am not able to find answers to my problem (I guess I'm searching for the wrong keywords / terms).
We are in the process of building a recommendation engine, and while we are initially logging all user activity in custom logs (we use Ruby / Rails), we need to do an end-of-day scan of that file and arrange it according to the user. We also have some other user data coming in from other places (Facebook activity, Twitter timeline, etc.), so by the end of the day we want all data for a particular user to be saved somewhere, and then run our analyzer code on all of the user's data to generate the recommendations.
The problem is that we are generating a lot of data, and while for the time being we are using a MySQL table to store it all, we are not sure how long we can continue to do this as our user base grows (we are still testing internally with about 10 users who generate a lot of activity). Plus, as eager developers, we would like to try out something new that can satisfy our needs.
Any pointers in this direction will be very helpful.
Check out Amazon Elastic Map Reduce. It was built for this very type of thing.
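If you try EMR, one low-effort way in is Hadoop Streaming, where the mapper and reducer are ordinary scripts that read stdin and write tab-separated key/value lines. A sketch in Node that counts activity events per user, assuming each log line starts with the user ID as its first tab-separated field:

    #!/usr/bin/env node
    // mapper.js -- emit "userId <TAB> 1" for every activity line.
    // Assumes the user ID is the first tab-separated field of each log line.
    const readline = require('readline');
    readline.createInterface({ input: process.stdin }).on('line', function (line) {
      const userId = line.split('\t')[0];
      if (userId) process.stdout.write(userId + '\t1\n');
    });

    #!/usr/bin/env node
    // reducer.js -- Hadoop Streaming sorts mapper output by key, so all lines
    // for one user arrive consecutively; sum the counts per user.
    const readline = require('readline');
    let currentUser = null;
    let count = 0;
    const rl = readline.createInterface({ input: process.stdin });
    rl.on('line', function (line) {
      const parts = line.split('\t');
      if (parts[0] !== currentUser) {
        if (currentUser !== null) console.log(currentUser + '\t' + count);
        currentUser = parts[0];
        count = 0;
      }
      count += Number(parts[1]);
    });
    rl.on('close', function () {
      if (currentUser !== null) console.log(currentUser + '\t' + count);
    });

The same pattern extends to emitting the full activity record as the value instead of a count, so the reducer ends up with all of a user's events grouped together for your analyzer.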