Automatically Update Data in a Google Fusion Table hourly? - ssis

We are going to need to update data in a Google Fusion Table hourly and are looking into possibly using an SSIS package to accomplish this.
Has anyone had any experience with updating Google Fusion Tables automatically? Any methods work better than others?

I would decompose your problem into smaller problems until you reach the point you have found one you can solve.
How do I perform a task at X interval
In a pure Windows world, I'd use the native Task Scheduler. It's free and works fine for your scenario of "every hour."
Since SSIS is in the mix, that means you would also have access to SQL Agent. It too is a good fit for your scenario and at this point, I would examine your organization and determine what scheduling tool is predominantly used. It might be "neither."
How can I programmatically update Google Fusion Tables
There is a full Fusion API published. They even have a DML syntax for working with data in a table. However, do observe the warning about using the query syntax for more than 500 rows/10k cells/1MB.
Note: You can list up to 500 INSERT statements, separated by
semicolons, in one request as long as the total size of the data does
not exceed 1 MB and the total number of table cells being added does
not exceed 10,000 cells. If you are inserting a large number of rows,
use the import method instead, which will be faster and more reliable
than using many SQL INSERT statements.
How can I use SSIS to update Google Fusion Tables
For anything that's not Out Of the Box with SSIS, I usually re-ask the question as "how do I do X in .NET" because that's what it will boil down to. Since it's a web destination, while SSIS has a web service task, it's not as useful as writing your own .NET caller.
I would envision an SSIS package with at least a Data Flow Task. Depending on where your data is coming from, it'd have a source (OLE DB, flat file, etc), any transformations you need between that and the destination. Your destination will be a Script Component configured as a Destination. There you'll use C# or VB.NET to send your Insert/Update commands to the web server. I found this sample of C# that sounds logical. I've never actually used GFT API so I can't comment on whether there's a better route of doing this.
Copying sanya's comment in here
Warning: the attached sample c# script uses Client Login to
authenticate against Google. This auth method has been deprecated
since April 20, 2012. Usage of OAuth2 is supported.

Related

SQL Server & Active Directory

What’s the best practice for integrating SQL Server with Active Directory (AD)?
NB. I’m using SQL Server 2016
Crux of the issue: I'm using SSRS 2016 and have several reports that need to be filtered based on the user accessing the reports. Originally I created a table of users that would need to access the reports. Then in the report builder I passed the UserID as a parameter within the query so that the resulting dataset would be limited to the data the user needed to see.
The problem this created is that the User table would have to be maintained, and Active Directories are dynamic. Now that I have some time to develop a better option, I’d like to link the LDAP data with SQL Server.
I’m wondering what the best practice for doing this is.
One way I pursued this was through an SSIS package ADO.Net connection. Then convert the data. Then load it into a table. Then schedule a job to run the package however often I needed it. This was problematic because for whatever reason I couldn’t get the data conversion process to work.
The second way I’ve been approaching this is to create a linked server instance for the AD. My research has indicated that I’ll need to create a function that overcomes the string limitation of the xp_sprintf Function. Then leverage temp tables and loop through LDAP data to get around the 1000 record limitation from the AD. I've been able to accomplish all this.
At this point though, there appears to be some other issues.
This ultimately increases the code necessary in the views for my reports which may make it harder for other database users to update if & when the time comes. To the point that I'd need to abandon the views and create stored procedures for the reports to pull from.
This also increases transaction counts beyond the SQL Server to include LDAP every time a user accesses a report.
So to resolve that I could wrap the original query of the LDAP data to create a table and then create a job to run that stored procedure every so often.
Either option solves the problem of maintaining the users table which is good, but it isn't perfect because AD changes can take place at any time.
Which option is better here?
If the SSIS package is the better route, I’m curious as to why that is the better route. I’m not opposed to going back and figuring out what it is I’m missing on the SSIS package to make it work.
Are there additional options I should consider if I want to get the most up-to-date Active Directory listing?
Thanks.

Largest practical datasets in Google spreadsheets?

I'm looking into using google sheets as some sort of aggregation solution for different data sources. It's reasonably easy to configure those data sources to output to a common google sheets and it's need to online for sharing. This sheet would act as my raw, un-treated data source. I would then have some dashboards/sub-tables based on that data.
Now, early tests seem to show I'm going to have to be careful about efficiency as it seems I'm pushing against the maximum 2 millions cells for spreadsheets (we're talking about 15-20k rows of data & 100 or so columns). Handling the data also seems to be pretty slow (regardless of cells limits), at least using formulas, even considering using arrays & avoiding vlookups etc...
My plan would be to create other documents (separate documents, not just adding tabs) & refer to the source data through import-range & using spreadsheet-key. Those would be using subsets of the data only required for each dashboards. This should allow me to create dashboard that would run faster than if setup directly off my big raw data file, or at least that's my thinking.
Am I embarking on a fool's errand here? Anyone has been looking into similarly large dataset on google docs? Basically trying to see if what I have in mind is even practical or not... If you have better ideas in terms of architecture please do share...
I ran into a similar issue once.
Using a multi layer approach like the one you suggested is indeed one method to work around this.
The spreadsheets themselves have no problem storing those two million cells, it's the displaying of all the data that is problematic, so accessing it via Import or scripts can be worthwhile.
Some other things I would consider:
How up to date does the data need to be? Import range is slow and can make the dashboard you create sluggish, maybe a scheduled import with the aggregation happening in Google Apps Script is a viable option.
At that point you might even want to consider using BigQuery for the data storage (and aggregation), whether you pull the data from another spreadsheet in this project or a database that will not run into any issues once you exceed 2 million elements would be future proof.
Alternatively you can use fusion tables* for the storage which are drive based , although I think you cannot run sophisticated SQL queries on it.
*: You probably need to enable them in Drive via right click > more > Connect more apps

Dynamically changing Report's Shared Data Source at Runtime

I'm looking to use SSRS for multi-tenant reporting and I'd like the ability to have runtime-chosen Shared Data Sources for my reports. What do I mean by this? Well, I could be flexible but I think the two most likely possibilities are (however, I'm also open to other possibilities):
The Shared Data Source is dictated by the client's authentication. In my case, the "client" is a .NET application and not the user, so if this is a viable path then I'd like to somehow have the MainDB (that's what I'm calling it) Shared Data Source selected by the Service Account that the client logs in as.
Pass the name of the Shared Data Source as a parameter and let that dictate which one to use. Given that all of my clients are "trusted players", I am comfortable with this approach. While each client will have its own representative Service Account, it's just for good measure and should not be important. So instead of just calling the data source MainDB, we could instead have Client1DB and Client2DB, etc. It's okay if a new data source means a new deployment but I need this to scale easily enough as well to ~50 different data sources over time.
Why? Because we have multiple/duplicate copies of our production application for multiple customers but we don't want to duplicate everything, just the web apps and databases. We're fine with some common "back-end" things. And for SSRS, because of how expensive licenses are (and how rarely reports are ran by our users), we really want to have just a single back-end for all of our customers (I actually have a second one on standby for manual disaster recovery situations - we don't need to be too fancy here as reports are the least important DR concern we have).
I have seen this question which points to this post but I was really hoping there was a better way than this. Because of all of those additional steps/efforts/limitations/etc, I'd rather just use PowerShell to script duplicate deployments of the reports with tweaked hardcoded data sources instead of standardizing on the steps in that post. That solution feels WAY too hacky to me and doesn't seem to scale very well at all.
I've done this a bunch of terrible ways (usually hardcoded in a dynamic script), and then I discovered its actually quite simple.
Instead of using Shared Connection, use the Embedded Connection and create your Connection string based on params (or any string manipulation code)....

Refreshing a reporting database

We are currently having an OLTP sql server 2005 database for our project. We are planning to build a separate reporting database(de-normalized) so that we can take the load off from our OLTP DB. I'm not quite sure which is the best approach to sync these databases. We are not looking for a real-time system though. Is SSIS a good option? I'm completely new to SSIS, so not sure about the feasibility. Kindly provide your inputs.
Everyone has there own opinion of SSIS. But I have used it for years for datamarts and my current environment which is a full BI installation. I personally love its capabilities to move data and it still is holding the world record for moving 1.13 terabytes in under 30 minutes.
As for setup we use log shipping from our transactional DB to populate a 2nd box. Then use SSIS to de-normalize and warehouse the data. The community for SSIS is also very large and there are tons of free training and helpful resources online.
We build our data warehouse using SSIS from which we run reports. Its a big learning curve and the errors it throws aren't particularly useful, and it helps to be good at SQL, rather than treating it as a 'row by row transfer' - what I mean is you should be creating set based queries in sql command tasks rather than using lots of SSIS component and dataflow tasks.
Understand that every warehouse is difference and you need to decide how to do it best. This link may give you some good idea's.
How we implement ours (we have a postgres backend and use PGNP provider, and making use of linked servers could make your life easier ):
First of all you need to have a time-stamp column in each table so you can when it was last changed.
Then write a query that selects the data that has changed since you last ran the package (using an audit table would help) and get that data into a staging table. We run this as a dataflow task as (using postgres) we don't have any other choice, although you may be able to make use of a normal reference to another database (dbname.schemaname.tablename or somthing like that) or use a linked server query. Either way the idea is the same. You end up with data that has change since your query.
We then update (based on id) the data that already exists then insert the new data (by left joining the table to find out what doesn't already exist in the current warehouse).
So now we have one denormalised table that show in this case jobs per day. From this we calculate other tables based on aggregated values from this one.
Hope that helps, here are some good links that I found useful:
Choosing .Net or SSIS
SSIS Talk
Package Configurations
Improving the Performance of the Data Flow
Trnsformations
Custom Logging / Good Blog

Migration strategies for SQL 2000 to SQL 2008

I've perused the threads here on migration from SQL 2000 to SQL 2008 but haven't really run into my question, so here we go with another one.
I'm building a strategy to move specific SQL 2000 databases to a new SQL 2008 R2 instance. My question comes with regards to the best method for transferring the schema and data. One way I know of is to do the quick 'n' dirty detach - copy - attach method, which should work so long as I've done my homework wrt compatibility and code and such.
What if, though, I wrote the schema and logins via script and then copied the data via SSIS? I'm thinking of trying that so I can more easily integrate some of my test cases into the package (error handling and whatnot). What would I be setting myself up for if I did this?
Since you are moving the data between servers or instances, I would recommend moving the data via data flows. If you don't expect to run the code more than once, then you can let the wizard generate your code for this move. However, when I did this once 2+ years ago, the wizard code generated combined execute sql tasks that combined many "create table" commands into one task and created a few data flow tasks that had multiple source and destinations in them to insert data in the destination. This was good to get up and running, but it was inadequate when I wanted to refresh the tables one more time after I modified the schema of the new target tables. If you expect to run the refresh more than once, then you may want to take the time to create the target schema first and then manually create the data flows.
Once you have moved the data, then you can enable full-text search on the new server. I don't believe you will need to have this enabled on your first load.
One reason I recommend against the detach-attach method for migration is that you bring all the dirty laundry from the 2000 database to the 2008 R2 database. If you had too lax security on the 2000 server or many ancient users that shouldn't exist, it could be easier to clean this up by starting from scratch. If you use the detach-attach method, then you have to worry about users.