I have a data warehousing solution formed of a series of databases, SSIS packages and an SSAS database. The SSIS packages and SSAS database all sit within source control using Team Foundation Server.
What I'd like to be able to do is branch the SSAS and SSIS projects to enable us to work on multiple streams of work and then be able to merge the projects back in prior to release to a production environment.
TFS allows me to branch my projects with little effort; however, merging them back together afterwards results in trawling through pages and pages of difficult-to-consume XML.
How are other people dealing with this situation? Are there any tools available on the market to deal with exactly these situations?
As documented in this blog post by Jamie Thomson, SSIS files are effectively binary files, so they should be treated as non-mergeable.
http://consultingblogs.emc.com/jamiethomson/archive/2007/08/06/SSIS_3A00_-Team-Development-Experiences.aspx
He also recommends making packages as modular as possible if you want to have multiple team members working on the same project - this is something we've adopted.
There is a tool called BIDS Helper that provides a 'Smart Diff' for SSIS files, which can be useful for determining the changes between versions.
http://bidshelper.codeplex.com/wikipage?title=Smart%20Diff&referringTitle=Documentation
But, generally, SSIS files should be treated as non-mergeable if you want to avoid hours of pain - we've switched on exclusive check-out on all .dtsx files in TFS so that people don't tread on each other's toes.
I have hundreds of Biml scripts and I have to convert each one into an SSIS package. The only process I have figured out is to manually right-click each Biml file and click Generate SSIS Packages (please follow the link below to visualize it). How do I automate this process? In other words, how can I programmatically convert all the Biml scripts into their corresponding SSIS packages?
http://www.erikhudzik.com/tag/ssis/
You should be able to select multiple files, right click and generate them all at the same time.
You can also reference one biml script from another. So you can have your main entry point which contains a <packages> element and then reference other scripts within that which define each package.
Finally, if you have BimlStudio, it comes with a command-line utility that would allow you to do this programmatically.
There are three tools for transforming Biml into .dtsx packages: BimlExpress, BimlStudio (formerly known as Mist), and BIDS Helper, the precursor to BimlExpress. That last product no longer exists as such; its Biml bits have been rebranded.
Under the covers, BimlStudio invokes Bimlc.exe, the Biml compiler, and that is how scripts become packages. Buy it outright or rent it monthly, depending on your needs. This is your only choice for unattended/automated builds.
BimlExpress is the free tool that can also transform scripts into packages. It requires mouse clicks to build packages.
The big difference between the two, for the beginner at least, is convenience. Say I have ScriptedPackageA and ScriptedTablesB, which together make Package1. In BimlStudio, I can set the properties so that one is 'live' (the tables) and just evaluate/expand the package script. In BimlExpress, I need to Shift/Ctrl-click the scripts I want to be compiled/referenced together.
Also, if you have hundreds of Biml scripts... you might not have understood the idea behind Biml. For reference, I have about 7,000 .biml files on my machine, but I bet fewer than 30 of them describe package patterns. The only reason I'm at such a large number of Biml files is that I have scripted out a number of databases with a file per table.
Generally speaking, you want to distill your approach down to distinct patterns and then throw your metadata against it. How many ways can you have a package that loads from file to database?
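To make that concrete, here is a hypothetical metadata query of the kind a single 'file to database' Biml pattern could iterate over; the etl.SourceFiles and etl.TargetTables tables are invented for illustration, not part of any standard:

-- One row per file-to-table load; a Biml pattern would emit one
-- package (or one data flow) per row returned by this query.
SELECT
    src.FilePath,           -- flat file to load
    tgt.SchemaName,         -- target schema
    tgt.TableName           -- target table
FROM etl.SourceFiles AS src -- hypothetical metadata tables
JOIN etl.TargetTables AS tgt
    ON tgt.TargetTableId = src.TargetTableId
WHERE src.IsActive = 1;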
We do a lot of TFS analysis and reporting in Excel, using PowerPivot to map to 'executive friendly' terms (like dates vs. iterations), and they are quite pleased with all that. However, we also now have several TFS 'projects' (instances, versions, what have you), and an Excel workbook cannot link to more than one project at a time. Also, as the projects grow larger and larger, my machine struggles to process them. I'd like to put the mapping tables AND the TFS data from several instances into an Access database.
Question 1
Is there any way to link Access to TFS? If not, I'm fine having TFS data in Excel and linking to Excel, but with the 'header' TFS insists on putting in your export, the linked table has issues (like not having headers in the first row, and always having two erroneous records after every refresh).
Question 2
Any thoughts on how to get around the funky header?
There is no native way to link MS Access to TFS, or to easily get rid of the header row that's added in Excel when linking a table. I'm afraid that the direction you're seeking will not help you in the end; it will only put all the computational pressure on Access instead of Excel.
The way to go here would be to set up the TFS reporting features, which offload a lot of the processing to SQL Server Analysis Services (which can be used as a source for Excel); reporting in that case is provided by SQL Server Reporting Services.
Unfortunately, the reporting part of TFS hasn't had a decent update since 2010 and can be a bit archaic to use. For VSTS (the cloud-based version of TFS), it's now possible to link the account to Power BI, which does everything you're after. This feature is not available on-premises, though; it may be a great reason to move to the cloud. Power BI can handle large amounts of data, can be connected to one or more VSTS accounts, and can slice and dice data from multiple accounts and projects with ease.
I am responsible for 196 different RDL files in a couple of dozen different folders and sub-folders and there are some key missing functionalities in SSRS that make maintaining report servers kind of a nightmare.
We get caught out all the time by differences between dev, test, and production. The project in Visual Studio doesn't allow you to deploy folder structures, and there seems to be no easy way to compare everything deployed on the various report servers.
What if any ways are there to make this a bit more maintainable?
You have a few options.
The first is to compare your ReportServer databases between your environments - specifically the Catalog table, which houses your folder structure and reports (reports have Type = 2, I believe).
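As a rough sketch of that comparison - the ReportServer schema is undocumented, and the linked server [DEV] pointing at the dev instance is an assumption of this example - a query for reports deployed in production but missing from dev might look like:

-- Reports (Type = 2) present in production but absent from dev.
-- [DEV] is an assumed linked server; verify the Catalog column names
-- against your ReportServer version before relying on this.
SELECT prod.[Path], prod.[Name]
FROM ReportServer.dbo.[Catalog] AS prod
WHERE prod.[Type] = 2
  AND NOT EXISTS (
        SELECT 1
        FROM [DEV].ReportServer.dbo.[Catalog] AS dev
        WHERE dev.[Path] = prod.[Path]
      );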
The second is the Reporting Services Scripter tool, which can be found here:
http://www.sqldbatips.com/showarticle.asp?ID=62
This is an awesome tool that allows you to manage all your different environments and script them out for backup/restore and comparing.
We are doing a huge data migration project using SSIS packages. We have been told not to use stored procedures in the SSIS packages. Can you please suggest whether we should be using stored procedures in SSIS packages or not? What are the advantages of using stored procedures?
It is correct that MERGE statements can easily be used in SSIS, and the directive to encapsulate everything in SSIS is not necessary - SQL processes aggregations faster than SSIS, for example. Further, if you are not deploying to the SSISDB, or do not have proper logging wrappers or email alerts, then troubleshooting your ETL via SQL Agent is going to be more difficult than it needs to be, as the errors are frequently more cryptic - thus the SSISDB and its reports in 2012. SSIS can be extremely powerful, however.
Here is a fairly blatant benchmark that will tell you never to use the out-of-the-box SCD component in SSIS. Task Factory, however, does have a nice deployable component that basically does merges behind the scenes.
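For instance, a minimal MERGE of the kind you could run from an Execute SQL Task instead of the SCD component - the dimension and staging tables here are purely illustrative:

-- Set-based upsert from a staging table into a dimension table.
-- Table and column names are made up for the example.
MERGE dbo.DimCustomer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerKey = src.CustomerKey
WHEN MATCHED AND tgt.CustomerName <> src.CustomerName THEN
    UPDATE SET tgt.CustomerName = src.CustomerName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerKey, CustomerName)
    VALUES (src.CustomerKey, src.CustomerName);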
SSIS has more powerful functions than Stored Procedures.
However you can easily use Execute T-SQL Statement tasks in SSIS for existing tasks, and then build out from there.
SSIS is superior for the vast majority of ETL work.
Below, via Microsoft:
Microsoft Integration Services is a platform for building enterprise-level data integration and data transformations solutions. You use Integration Services to solve complex business problems by copying or downloading files, defining business logic, sending e-mail messages in response to events, updating data warehouses, cleaning and mining data, and managing SQL Server objects and data. The packages can work alone or in concert with other packages to address complex business needs. Integration Services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations.
Integration Services includes a rich set of built-in tasks and transformations; tools for constructing packages; and the Integration Services service for running and managing packages. You can use the graphical Integration Services tools to create solutions without writing a single line of code; or you can program the extensive Integration Services object model to create packages programmatically and code custom tasks and other package objects.
A stored procedure in SQL Server is a group of one or more Transact-SQL statements, or a reference to a Microsoft .NET Framework common language runtime (CLR) method. Stored procedures can be called from within SSIS just the same as an unencapsulated SQL statement. For more information, please see: http://msdn.microsoft.com/en-us/library/ms190782(v=sql.110).aspx
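As a minimal sketch, the statement an Execute SQL Task would run over an OLE DB connection could be as simple as the following; the procedure name is hypothetical:

-- SQLStatement property of an Execute SQL Task (OLE DB connection).
-- The ? placeholder is bound to an SSIS variable on the task's
-- Parameter Mapping page.
EXEC dbo.usp_LoadCustomerStage ?;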
I've hit the inevitable state where I have to do a diff on the code within two versions of an SSIS package.
What have you used successfully, other than what I'm about to do now: open up two instances of VS and go over it box by box and variable by variable?
Note: The things that are important to compare in my case are:
Variables
Code in Execute SQL Tasks
Order of Tasks
Data Flows
Order of data flow components
Try BIDS Helper. It prepares both versions by normalizing whitespace and splitting long lines before making the comparison. Most changes can be easily recognized.
A Visual Studio extension called SSIS Compare and Merge Tool was published to the Visual Studio Marketplace in March 2017. You can install it from the Tools menu, then Extensions and Updates, by searching the online extensions, or download and install the .VSIX file from the Visual Studio Marketplace:
https://marketplace.visualstudio.com/items?itemName=TamasTIPost.SSISCompareMergeTool-18170
There are some utilities that will do this:
http://www.microsoft.com/communities/newsgroups/en-us/default.aspx?dg=microsoft.public.sqlserver.dts&tid=0619e97f-4dd4-4946-bd41-888e751a5d72&cat=en_US_2b8e81a3-be64-42fa-bd81-c6d41de5a219&lang=en&cr=US&sloc=&p=1
ApexSQL Diff
I use Notepad++ to compare .dtsx (XML) files. Sometimes I even write some code to extract components - for example, to extract all the SQL stored in a large SSIS package with dozens of Execute SQL Tasks.
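Here is one hedged sketch of that extraction idea in T-SQL; it assumes the SSIS 2012+ .dtsx format, the file path is a placeholder, and the namespace URI should be verified against your own packages:

-- A .dtsx file is just XML, so load it and pull out the statement
-- stored in every Execute SQL Task.
DECLARE @pkg xml = (
    SELECT CAST(BulkColumn AS xml)
    FROM OPENROWSET(BULK 'C:\Packages\MyPackage.dtsx', SINGLE_BLOB) AS f
);

WITH XMLNAMESPACES ('www.microsoft.com/sqlserver/dts/tasks/sqltask' AS SQLTask)
SELECT n.task.value('@SQLTask:SqlStatementSource', 'nvarchar(max)') AS SqlStatement
FROM @pkg.nodes('//SQLTask:SqlTaskData') AS n(task);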
UPDATE: Just found this: Bi xPress. Here is a video explaining how it works.
Updated with the latest locations and consolidated the different answers.
There are a number of tools that try to help with managing, comparing, and merging SSIS packages and the other difficult-to-merge file formats favored by SQL Server Analysis Services, Integration Services, and Reporting Services.
Putting SQL code in stored procedures and managing those using SSDT and Git/TFVC is a useful first step. For the more exotic file formats, extend your toolbelt with additional tools like:
BI Developer Extensions (formerly BIDS Helper) free!
Apex SQL Diff Pro
Bi xPress
SSIS Compare Merge Tool
Each works slightly differently and the cost varies, but they all apply normalization and visualization to help you understand and potentially merge the differences between these files.
Aside from that, many changes may be possible with enough understanding of the XML, using tools like Notepad++ or Araxis Merge. Many merge tools now have special XML compare/merge capabilities where you can configure how files should be normalized prior to comparison.