I'm looking into using Google Sheets as a kind of aggregation solution for different data sources. It's reasonably easy to configure those data sources to output to a common Google Sheet, and it needs to be online for sharing. This sheet would act as my raw, untreated data source. I would then build some dashboards/sub-tables based on that data.
Now, early tests suggest I'm going to have to be careful about efficiency, as I'm pushing against the maximum of 2 million cells per spreadsheet (we're talking about 15-20k rows of data and 100 or so columns). Handling the data also seems to be pretty slow (regardless of the cell limit), at least using formulas, even when using arrays and avoiding VLOOKUPs etc.
My plan would be to create other documents (separate documents, not just additional tabs) and refer to the source data through IMPORTRANGE, using the spreadsheet key. Those would use only the subsets of the data required for each dashboard. This should allow me to create dashboards that run faster than if they were set up directly on my big raw data file, or at least that's my thinking.
Am I embarking on a fool's errand here? Has anyone looked into similarly large datasets on Google Docs? Basically I'm trying to see whether what I have in mind is even practical. If you have better ideas in terms of architecture, please do share.
I ran into a similar issue once.
Using a multi-layer approach like the one you suggested is indeed one way to work around this.
The spreadsheets themselves have no problem storing those two million cells, it's the displaying of all the data that is problematic, so accessing it via Import or scripts can be worthwhile.
Some other things I would consider:
How up to date does the data need to be? IMPORTRANGE is slow and can make the dashboards you create sluggish; a scheduled import with the aggregation happening in Google Apps Script may be a viable option (a minimal sketch follows below).
At that point you might even want to consider using BigQuery for the data storage (and aggregation): whether you keep pulling the data from another spreadsheet in this project or move to a proper database, a store that does not run into any issues once you exceed 2 million elements would be more future-proof (see the second sketch below).
Alternatively, you can use Fusion Tables* for the storage, which are Drive-based, although I think you cannot run sophisticated SQL queries on them.
*: You probably need to enable them in Drive via right click > more > Connect more apps
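For the scheduled-import idea, here is a minimal sketch, assuming the raw data lives in its own spreadsheet and each dashboard only needs a handful of columns; the spreadsheet key, sheet names and column positions are placeholders:

    // Copies a subset of the raw data into the dashboard spreadsheet on a
    // schedule, so the dashboard never has to run IMPORTRANGE itself.
    function importDashboardSubset() {
      var RAW_SPREADSHEET_KEY = 'your-raw-spreadsheet-key'; // placeholder
      var raw = SpreadsheetApp.openById(RAW_SPREADSHEET_KEY).getSheetByName('Raw');
      // Keep only the columns this dashboard actually needs (here: A, C and F).
      var subset = raw.getDataRange().getValues().map(function(row) {
        return [row[0], row[2], row[5]];
      });
      var target = SpreadsheetApp.getActive().getSheetByName('DashboardData');
      target.clearContents();
      target.getRange(1, 1, subset.length, subset[0].length).setValues(subset);
    }

    // Run once to schedule the import every hour.
    function createHourlyTrigger() {
      ScriptApp.newTrigger('importDashboardSubset')
          .timeBased()
          .everyHours(1)
          .create();
    }

With the import decoupled like this, the dashboard formulas only ever see the small DashboardData sheet, which keeps them responsive.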
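And for the BigQuery route, a rough sketch of pulling an aggregate into a dashboard via the BigQuery advanced service (it has to be enabled for the script; the project, dataset, table and sheet names are made up for illustration):

    // Runs an aggregation in BigQuery and writes only the summary rows
    // into the dashboard sheet.
    function refreshFromBigQuery() {
      var projectId = 'my-gcp-project'; // placeholder
      var request = {
        query: 'SELECT source, COUNT(*) AS n FROM [my_dataset.raw_data] GROUP BY source'
      };
      var response = BigQuery.Jobs.query(request, projectId);
      var rows = (response.rows || []).map(function(row) {
        return row.f.map(function(cell) { return cell.v; });
      });
      if (rows.length === 0) return;

      var sheet = SpreadsheetApp.getActive().getSheetByName('Summary');
      sheet.clearContents();
      sheet.getRange(1, 1, rows.length, rows[0].length).setValues(rows);
    }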
Related
We have used a spreadsheet for our client management so far.
Now we are growing and want more users to work on the data simultaneously. We don't want to introduce a CRM because we want to store the data locally, and we already have a pretty sophisticated script that takes the data from the rows of the spreadsheet via the clipboard and runs operations on it.
We have a Synology NAS that supports MariaDB.
I thought this should be a pretty simple database use case for an HTML website in a browser:
Filtering for existing entries
Manipulating existing entries by row
Adding new entries
Also, it'd be great to restrict the visible entries to a certain number of rows.
Does anyone know of existing templates which support that?
Thank you!
I'd like to push some data from MySQL into Google Sheets. Once I've edited my data in Google Sheets, I'd like to push the edited data back into MySQL. Ideally, I'd even like to schedule it to update every hour, so my data is always live and matches what's in MySQL.
I've looked into Google Sheets scripting, and it seems it enables you to type a SQL query into a cell in Google Sheets and retrieve your queried data. However, even if I find a proper way to export my data to SQL, the main issue is that I have hundreds of tabs across multiple spreadsheets, and I'd like to find a way to avoid repeating this job manually for every tab.
Please keep in mind that this is for someone on my team who can't figure out SQL queries, has a hard time navigating MySQL, and whom I don't want to train in SQL. I would just like this person to edit Google Sheets and have those edits reflected back in MySQL, without ever having to go into my SQL database.
I think you can also use Google Apps Script to push the data back into MySQL. However, I don't know how scalable this solution would be.
Some tools exist to export data from SQL to Google Sheets, like Zapier and add-ons such as Kloud and Blockspring. The thing with Blockspring is that it's targeted at people who are familiar with SQL queries. And none of those solutions lets you push the edited data back to your database (at least, as far as I know... I would be very interested if it were otherwise).
So an option would be to use Actiondesk to sync your SQL database with your Google Sheets. You can schedule the synchronisation every hour (even every ten minutes, actually), and it would be easy to add new sheets/tabs anytime you need to (it's just a matter of a few clicks).
Hope this helps!
Disclaimer: I am a back-end engineer at Actiondesk and personally implemented the Google Sheets integration, so I might be kind of biased (but at the same time, I might be the best person to answer your wildest questions in that regard, so feel free to shoot them)!
It's possible to connect to MySQL with Apps Script, but you need to disable your firewall or whitelist all of Google's IP addresses (which are subject to change). As you mentioned, you'll also need to set up the script for every sheet or release the script as an add-on. You are also likely to run into difficulty writing back to the database (e.g. handling date formats).
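If you do go the Apps Script route, a rough sketch with the built-in Jdbc service might look like the following; the host, credentials, table and column layout are placeholders, and the database has to be reachable from Google's IP ranges as noted above:

    var DB_URL = 'jdbc:mysql://your-db-host:3306/your_db'; // placeholders
    var DB_USER = 'user';
    var DB_PASS = 'password';

    // Pull: replace the sheet contents with the current table contents.
    function pullFromMySql() {
      var conn = Jdbc.getConnection(DB_URL, DB_USER, DB_PASS);
      var results = conn.createStatement().executeQuery('SELECT id, name, amount FROM orders');
      var rows = [['id', 'name', 'amount']];
      while (results.next()) {
        rows.push([results.getString(1), results.getString(2), results.getString(3)]);
      }
      results.close();
      conn.close();
      var sheet = SpreadsheetApp.getActive().getSheetByName('orders');
      sheet.clearContents();
      sheet.getRange(1, 1, rows.length, rows[0].length).setValues(rows);
    }

    // Push: write every sheet row back, e.g. from an hourly time-driven trigger.
    function pushToMySql() {
      var values = SpreadsheetApp.getActive().getSheetByName('orders')
          .getDataRange().getValues().slice(1); // skip the header row
      var conn = Jdbc.getConnection(DB_URL, DB_USER, DB_PASS);
      var stmt = conn.prepareStatement(
          'REPLACE INTO orders (id, name, amount) VALUES (?, ?, ?)');
      values.forEach(function(row) {
        stmt.setString(1, String(row[0]));
        stmt.setString(2, String(row[1]));
        stmt.setString(3, String(row[2]));
        stmt.addBatch();
      });
      stmt.executeBatch();
      conn.close();
    }

To avoid repeating the setup for every tab, the same two functions can loop over SpreadsheetApp.getActive().getSheets() and derive the table name from each sheet's name.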
SeekWell lets you automatically send data from SQL to Sheets and can also sync data from Sheets back to a database. It's built specifically to handle this use case, so it will get you up and running faster, but it's a commercial / paid product.
Disclaimer: I built this.
I'm programming a Google Apps Script store for TiddlyWiki (tiddlywiki.com). It receives files and stores them in Google Drive. It never overwrites any file; it just creates a new one on every upload. For performance reasons I want to maintain an index of the latest version of each file and its ID. Currently I'm using the Properties service to achieve this: when a file is uploaded, I store its name:ID pair. That way, retrieving a file by name does not require searching the full folder or checking which version is the latest. I'm worried about how many entries I can store in the script properties store. I'm thinking about using a spreadsheet to save the index, but I don't know how it compares to the Properties service in terms of performance.
Here is the question: can I stick with the Properties service for this task, or should I switch to a Google spreadsheet? Will the performance be much worse? Any alternatives for the index?
Thanks in advance.
EDIT:
Since this will store only a few hundred entries, what about using a JSON file as the index? Will it take much time to load and parse the text?
It depends on the number of files you're expecting. For a few thousand files, the Properties service might do and is surely easier to use than a spreadsheet, but it has a tighter limitation of 500 KB per store.
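If you stay with the Properties service, the index can be as simple as one property per file, keyed by name, with the Drive file ID as the value; a minimal sketch (function names are made up):

    function indexFile(name, fileId) {
      PropertiesService.getScriptProperties().setProperty(name, fileId);
    }

    function getLatestFileByName(name) {
      var id = PropertiesService.getScriptProperties().getProperty(name);
      return id ? DriveApp.getFileById(id) : null;
    }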
If you think you'll have more files, then it's probably best not to index at all and do a fast Google Drive search to retrieve your latest file. The search criteria can be very specific and fast (filter by title only, or by any timestamp criterion). I think it'll be much less trouble in your script than trying to build and fit a big index into memory.
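A hedged sketch of that search-based approach, assuming all versions live in one store folder (the folder ID is a placeholder):

    // Asks Drive for the newest file with the given name inside the store folder.
    function findLatestByName(name) {
      var FOLDER_ID = 'your-store-folder-id'; // placeholder
      var query = 'title = "' + name + '" and "' + FOLDER_ID + '" in parents and trashed = false';
      var files = DriveApp.searchFiles(query);
      var latest = null;
      while (files.hasNext()) {
        var file = files.next();
        if (!latest || file.getLastUpdated() > latest.getLastUpdated()) {
          latest = file;
        }
      }
      return latest; // null if nothing matched
    }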
We are going to need to update data in a Google Fusion Table hourly and are looking into possibly using an SSIS package to accomplish this.
Has anyone had any experience with updating Google Fusion Tables automatically? Any methods work better than others?
I would decompose your problem into smaller problems until you reach one you can solve.
How do I perform a task at X interval
In a pure Windows world, I'd use the native Task Scheduler. It's free and works fine for your scenario of "every hour."
Since SSIS is in the mix, that means you would also have access to SQL Agent. It too is a good fit for your scenario and at this point, I would examine your organization and determine what scheduling tool is predominantly used. It might be "neither."
How can I programmatically update Google Fusion Tables
There is a full Fusion Tables API published. They even have a DML syntax for working with data in a table. However, do observe the warning about using the query syntax for more than 500 rows / 10k cells / 1 MB:
Note: You can list up to 500 INSERT statements, separated by semicolons, in one request as long as the total size of the data does not exceed 1 MB and the total number of table cells being added does not exceed 10,000 cells. If you are inserting a large number of rows, use the import method instead, which will be faster and more reliable than using many SQL INSERT statements.
How can I use SSIS to update Google Fusion Tables
For anything that's not out of the box with SSIS, I usually re-ask the question as "how do I do X in .NET?" because that's what it will boil down to. Since it's a web destination, note that while SSIS has a Web Service Task, it's not as useful as writing your own .NET caller.
I would envision an SSIS package with at least a Data Flow Task. Depending on where your data is coming from, it would have a source (OLE DB, flat file, etc.) and any transformations you need between that and the destination. Your destination will be a Script Component configured as a destination; there you'll use C# or VB.NET to send your INSERT/UPDATE commands to the web server. I found this sample of C# that sounds logical. I've never actually used the Google Fusion Tables API, so I can't comment on whether there's a better route.
Copying sanya's comment in here
Warning: the attached sample C# script uses ClientLogin to authenticate against Google. This auth method has been deprecated since April 20, 2012. Usage of OAuth2 is supported.
I have implemented a time-booking system based on spreadsheets which the users fill out and which are then consolidated into one central (and big) spreadsheet.
After a few initial performance issues, the whole application has now been running perfectly for several months. However, I will soon run into the size limitation of spreadsheets (400k cells).
In the consolidated spreadsheet I basically do not need more data than the current month. However, for statistical purposes I would appreciate being able to make the data easily accessible to the domain's users.
Basically, the BigQuery service would be perfect, but I did not find an API to write data to it from a spreadsheet. I hesitate to use the Google-provided MySQL database for cost reasons.
Are there any other ideas around?
There's a built-in Google BigQuery API for Apps Script; you just have to enable it manually under Resources > Use Google APIs. There's also Fusion Tables, which does not have a built-in API but is somewhat simple to use via UrlFetch.
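For writing from the spreadsheet into BigQuery, a rough sketch using the advanced BigQuery service's streaming insert; the project, dataset, table, sheet name and column layout are assumptions:

    // Streams the consolidated rows into an existing BigQuery table.
    function archiveToBigQuery() {
      var projectId = 'my-gcp-project';   // placeholders
      var datasetId = 'timebooking';
      var tableId = 'bookings';

      var values = SpreadsheetApp.getActive().getSheetByName('Consolidated')
          .getDataRange().getValues().slice(1); // skip the header row

      var rows = values.map(function(row) {
        return {json: {user: row[0], date: String(row[1]), hours: row[2]}};
      });

      BigQuery.Tabledata.insertAll({rows: rows}, projectId, datasetId, tableId);
    }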
Anyway, if it's for statistical purposes, why don't you just "compile" the data into another spreadsheet? E.g. month, number of entries, total price, averages, etc.
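A minimal sketch of that "compile" idea, rolling the detail rows up to one summary row per month in a separate stats spreadsheet (sheet names, column positions and the stats spreadsheet ID are placeholders):

    function compileMonthlyStats() {
      var values = SpreadsheetApp.getActive().getSheetByName('Consolidated')
          .getDataRange().getValues().slice(1); // skip the header row

      var byMonth = {}; // 'yyyy-MM' -> {entries, total}
      values.forEach(function(row) {
        var month = Utilities.formatDate(new Date(row[1]), 'GMT', 'yyyy-MM');
        if (!byMonth[month]) byMonth[month] = {entries: 0, total: 0};
        byMonth[month].entries++;
        byMonth[month].total += Number(row[2]);
      });

      var out = [['month', 'entries', 'total']];
      Object.keys(byMonth).sort().forEach(function(month) {
        out.push([month, byMonth[month].entries, byMonth[month].total]);
      });

      var stats = SpreadsheetApp.openById('stats-spreadsheet-id').getSheets()[0];
      stats.clearContents();
      stats.getRange(1, 1, out.length, out[0].length).setValues(out);
    }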