SSRS 2008 - Create a chart of a directed graph to visualise ETL jobs

I can't find anything that hints at native support for charting graph data structures (otherwise known as "network maps"), and in my case a directed graph. I want to create a visualisation of our ETL dependency chain at work, showing the steps that each 'job' relies on before it can proceed.
Questions:
Has anybody been able to 'simulate/hack/workaround' this lack of out-of-the-box functionality in SSRS?
Any ideas on how this might be achieved if no one has tried it before?
EDIT - 2014-10-30
Two years and no answer, so I've accepted the most promising advice on a workaround to get what is needed, as no direct functionality has been found.

From left field:
You could wrap an SSIS package around your "ETL jobs". The SSIS Control Flow surface has a GUI for expressing task dependencies. It's functional, if not visually outstanding. Your "ETL jobs" could be Execute SQL Task or Execute Process Task objects. You can connect the precedence constraints to show dependencies.
This could either be for real use or just for documentation purposes. If you use it for real, you'll find it's a great way to control ETL dependencies and parallelism.
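If you'd rather generate such a package in code than draw it in the designer (a variation on the above, not part of the original answer), here is a minimal sketch using the SSIS object model in Microsoft.SqlServer.Dts.Runtime. The task names and output path are purely illustrative:

    using Microsoft.SqlServer.Dts.Runtime;

    class BuildEtlDependencyChain
    {
        static void Main()
        {
            var pkg = new Package { Name = "EtlDependencyChain" };

            // Each "ETL job" becomes a task; Execute SQL Task is used here as a stand-in.
            Executable extractJob = pkg.Executables.Add("STOCK:SQLTask");
            Executable transformJob = pkg.Executables.Add("STOCK:SQLTask");
            ((TaskHost)extractJob).Name = "Extract job";
            ((TaskHost)transformJob).Name = "Transform job";

            // The edge of the directed graph: Transform only runs after Extract succeeds.
            PrecedenceConstraint edge = pkg.PrecedenceConstraints.Add(extractJob, transformJob);
            edge.Value = DTSExecResult.Success;

            // Save the package so it can be opened (and viewed) in the SSIS designer.
            new Application().SaveToXml(@"C:\temp\EtlDependencyChain.dtsx", pkg, null);
        }
    }

Opening the saved .dtsx in the designer gives you the directed-graph view of the dependency chain for free.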

Related

Integration between Foundry Workshop and Code Repositories

I ran into this issue when developing applications for what-if scenarios using Palantir Workshop. Our users generally make changes in Workshop. When all the changes are complete, users can trigger a build in Code Repositories from Workshop to execute a Python transform function. Multiple datasets are taken as inputs to the transform functions. What would be the recommended way to implement this integration?
Thank you for your attention!
I tried using the build schedules manager together with "apply scenario". Although it can work, the biggest drawbacks are:
the build pipeline is manual and ad hoc rather than automated.
"apply scenario" in Workshop works against our expectations, because the data was not supposed to override the ontology objects. However, for the trigger to fire, I have to apply the changes made in Workshop back to the ontology so that the build is triggered in a chain.

Is there a function in SSIS to extract functionality from a package to be added to other packages?

I just added error handling functionality to an SSIS package that I am upgrading, and I need to add this same error handling to about 30 more packages. Is there a way to extract the error handling control flow, parameters, variables, etc. so that I can easily add them to the rest of the packages?
I am using Visual Studio Enterprise 2019 and SSIS 15.0.
I found a bunch of articles on Biml, but it looks like that is only for creating new packages. I am aware that copy and paste exists, but I would like to find a solution that is easy to apply across future packages as well as the current packages being updated. Apologies if this question has already been asked; I searched, but I'm not sure I even know which search terms would be applicable.
Yes, Biml is an excellent choice for creating consistent packages going forward. Even if you're only generating empty packages with error handling logic, that's a pattern and that's the power of Biml.
With the change to BimlExpress and the now-free ability to reverse engineer packages, one approach would be to reverse engineer the packages to Biml. That would all be static-tier Biml, but you'll need to select it all and then, in a new BimlScript file, add the error handling like so:
<#
    foreach (AstPackageNode apn in this.RootNode.Packages)
    {
        // Only touch packages that don't already have an OnError event handler
        if (!apn.Events.Where(x => x.EventType == EventType.OnError).Any())
        {
            AstTaskEventHandlerNode onError = new AstTaskEventHandlerNode(null);
            onError.EventType = EventType.OnError;
            onError.Name = "OnError";
            // TODO: add tasks and such
            apn.Events.Add(onError);
        }
        //WriteLine(apn.GetBiml());
    }
#>
Once that's looking good, you right click on everything at once and generate packages.
A non-Biml approach is going to test your C# (or VB.NET) skills. I've not touched this type of SSIS dev in more than a decade, but the concept remains the same. https://billfellows.blogspot.com/2016/10/what-packages-still-use-configuration.html
You'll need to find all the SSIS packages. For each one, use a reference to the DTS runtime to load it. Then look at the package's event handlers collection; if there isn't an OnError handler, you'll have to add one, add all the associated tasks, configure them, and then save.
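As a rough sketch of that approach (assuming a reference to Microsoft.SqlServer.Dts.Runtime; the folder path is illustrative and the handler body is left as a TODO, as in the answer):

    using System;
    using System.IO;
    using Microsoft.SqlServer.Dts.Runtime;

    class AddOnErrorHandlers
    {
        static void Main()
        {
            var app = new Application();
            foreach (string path in Directory.GetFiles(@"C:\MySsisProject", "*.dtsx"))
            {
                Package pkg = app.LoadPackage(path, null);

                // Does the package already have an OnError event handler?
                bool hasOnError = false;
                foreach (DtsEventHandler handler in pkg.EventHandlers)
                {
                    if (string.Equals(handler.Name, "OnError", StringComparison.OrdinalIgnoreCase))
                    {
                        hasOnError = true;
                        break;
                    }
                }

                if (!hasOnError)
                {
                    DtsEventHandler onError = (DtsEventHandler)pkg.EventHandlers.Add("OnError");
                    // TODO: add and configure tasks via onError.Executables here
                }

                app.SaveToXml(path, pkg, null);
            }
        }
    }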

Informatica PowerCenter pipelines to Azure Data Factory

I am trying to move my Informatica pipelines in PowerCenter 10.1 to Azure Data Factory / Synapse pipelines. Other than rewriting them from scratch, is there a way to migrate them? I am not finding any tools to achieve this either. Has anyone faced this problem? Any leads on how to proceed?
Thanks
There are no out-of-the-box solutions available to complete this migration. Unfortunately, you will have to author them again.
Informatica PowerCenter pipelines are a physical implementation of an Extract Transform Load (ETL) process. Each provider has different approaches to the implementations and they do not necessarily map well from one to another. Core Azure Data Factory (ADF) is actually more suited to Extract, Load and Transform (ELT), unless of course you use Data Flows.
So what you have to do is:
map out physically what your current pipeline is doing, if you don't have that documentation already. A simple spreadsheet template mapping out the components of the existing pipeline, tracking source and target plus any transformations, will suffice
logically map out what the pipeline is doing; i.e. without using PowerCenter-specific terminology, lay out what the "as is" pipeline is doing. A data flow diagram is a great way to do this
logically map out what the "to be" pipeline should do; i.e. without using any ADF-specific terminology, attempt to refine the "as is" pipeline to its simplest form
using expert knowledge of the ADF components (e.g. Copy, Lookup, Notebook, Stored Proc, to name but a few), map from the logical "to be" to the physical (in the loosest sense of the word, it's all cloud now, right?): e.g. move data from place to place with the Copy activity, transform data in a SQL database using the Stored Proc activity, implement a repeated activity with a For Each loop (bear in mind these execute in parallel), do sophisticated transformations or processing using Databricks notebooks if required, and so on. If you require a low-code approach, consider Data Flows.
So you can see it's just a few simple steps. Good luck!

SSIS; Is it better than writing code?

I have an ETL project with a lot of data that needs cleaning. We're talking about a lot of complex transformations. The process needs to take place nightly and has to finish within a certain amount of time (10 hours). To this end, it is best that the ETL use all the processor cores on the system.
Which would be better to use to perform complex ETL transforms in a multi processor environment:
SSIS
or
Dot Net Framework 4 (let me qualify that: I can write an application using Entity Framework and parallel tasks to do the complex data transforms that are required. Writing an application to do the ETL isn't a problem; however, I'm trying to use the best tool for the job.)
I know it's an unfair question, in that SSIS is a technology and .NET is a framework, but still...
Yes, working with SSIS is a chore and every project for which I have used it has amazed me by how much longer it took than expected. To be fair, I suppose that a solution to most any problem eventually could be fashioned using either one given enough time.
Using either tool usually involves doing some research and learning in each project. Learning about .NET leaves me edified. Struggling with patchy work-arounds and arcane code hacks to make SSIS work leaves me deflated.
What could be more elementary in software coding than reading from and writing to variables in memory? How complex could it possibly be in any language? How many restrictions on what, when, and where could there be on performing such a rudimentary task? To find the answer, search the internet for the phrase "ssis write to variables in script". SSIS takes complexity to a whole new level, even for the simplest of operations! God help you if you have to write to a package variable within a data flow task.
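For reference, writing to a package variable from a Script Task looks roughly like this (a sketch only: User::RowCount is a made-up variable, and the method goes inside the ScriptMain class that SSIS generates for the task):

    public void Main()
    {
        // Option 1: list User::RowCount in the task's ReadWriteVariables property,
        // then assign to it directly.
        Dts.Variables["User::RowCount"].Value = 42;

        // Option 2: lock the variable explicitly through the VariableDispenser.
        Variables vars = null;
        Dts.VariableDispenser.LockOneForWrite("User::RowCount", ref vars);
        vars["User::RowCount"].Value = 42;
        vars.Unlock();

        Dts.TaskResult = (int)ScriptResults.Success;
    }

Writing to a variable from inside a data flow's Script Component is more restricted still: read/write variables are only accessible in PostExecute, which is the "God help you" part.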
I'll say no.
I started to write an ETL job and got stymied by the first column of data: a formatted date-time. SSIS was unable to make heads or tails of it.
Perhaps you can spend weeks trying to figure out how to convince SSIS to do what you want - but it's much easier just to get it done.
SSIS is a tool specifically for doing the job you mention. It's ideal for ETL processing and has a lot of common tasks built-in; in a custom .Net framework you'd have to code these from scratch.

What do you think of developing for the command line first?

What are your opinions on developing for the command line first, then adding a GUI on after the fact by simply calling the command line methods?
eg.
W:\> todo AddTask "meeting with John, re: login peer review" "John's office" "2008-08-22" "14:00"
loads todo.exe and calls a function called AddTask that does some validation and throws the meeting in a database.
Eventually you add in a screen for this:
============================================================
Event: [meeting with John, re: login peer review]
Location: [John's office]
Date: [Fri. Aug. 22, 2008]
Time: [ 2:00 PM]
[Clear] [Submit]
============================================================
When you click submit, it calls the same AddTask function.
Is this considered:
a good way to code
just for the newbies
horrendous!
Addendum:
I'm noticing a trend here for "shared library called by both the GUI and CLI executables." Is there some compelling reason why they would have to be separated, other than maybe the size of the binaries themselves?
Why not just call the same executable in different ways:
"todo /G" when you want the full-on graphical interface
"todo /I" for an interactive prompt within todo.exe (scripting, etc)
plain old "todo <function>" when you just want to do one thing and be done with it.
Addendum 2:
It was mentioned that "the way [I've] described things, you [would] need to spawn an executable every time the GUI needs to do something."
Again, this wasn't my intent. When I mentioned that the example GUI called "the same AddTask function," I didn't mean the GUI called the command line program each time. I agree that would be totally nasty. I had intended (see first addendum) that this all be held in a single executable, since it was a tiny example, but I don't think my phrasing necessarily precluded a shared library.
Also, I'd like to thank all of you for your input. This is something that keeps popping back in my mind and I appreciate the wisdom of your experience.
I would go with building a library with a command line application that links to it. Afterwards, you can create a GUI that links to the same library. Calling a command line from a GUI spawns external processes for each command and is more disruptive to the OS.
Also, with a library you can easily do unit tests for the functionality.
But as long as your functional code is separate from your command-line interpreter, you can just reuse it from a GUI without having to go through the command-line program to perform an operation.
Put the shared functionality in a library, then write a command-line and a GUI front-end for it. That way your layer transition isn't tied to the command-line.
(Also, this way adds another security concern: shouldn't the GUI first have to make sure it's the RIGHT todo.exe that is being called?)
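As a concrete sketch of this "shared library plus thin front ends" layout, using the AddTask example from the question (all names here are illustrative, not from the post):

    using System;

    // The shared core. Both the CLI entry point and a GUI's Submit handler call this.
    public static class TodoCore
    {
        public static void AddTask(string summary, string location, DateTime when)
        {
            if (string.IsNullOrWhiteSpace(summary))
                throw new ArgumentException("A task needs a summary.", nameof(summary));

            // Stub: a real implementation would write the task to the database.
            Console.WriteLine($"Added task '{summary}' at {location} on {when:g}");
        }
    }

    // Thin CLI front end: todo AddTask "<event>" "<location>" "<date>" "<time>"
    public static class Program
    {
        public static void Main(string[] args)
        {
            if (args.Length == 5 && args[0].Equals("AddTask", StringComparison.OrdinalIgnoreCase))
            {
                TodoCore.AddTask(args[1], args[2], DateTime.Parse(args[3] + " " + args[4]));
            }
            else
            {
                Console.WriteLine("usage: todo AddTask \"<event>\" \"<location>\" \"<date>\" \"<time>\"");
            }
        }
    }

    // A GUI front end would call TodoCore.AddTask(...) with the values from its
    // text boxes instead of parsing argv - same function, no process spawning.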
Joel wrote an article contrasting this ("unix-style") development to the GUI first ("Windows-style") method a few years back. He called it Biculturalism.
I think on Windows it will become normal (if it hasn't already) to wrap your logic into .NET assemblies, which you can then access from both a GUI and a PowerShell provider. That way you get the best of both worlds.
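To illustrate that idea (with a cmdlet rather than a full provider, and reusing the hypothetical TodoCore sketched above), exposing the same .NET assembly to PowerShell could look something like:

    using System;
    using System.Management.Automation;

    // Compile into a class library, load with Import-Module, then call:
    //   Add-TodoTask -Summary "meeting with John" -Location "John's office" -When "2008-08-22 14:00"
    [Cmdlet(VerbsCommon.Add, "TodoTask")]
    public class AddTodoTaskCommand : PSCmdlet
    {
        [Parameter(Mandatory = true)] public string Summary { get; set; }
        [Parameter] public string Location { get; set; }
        [Parameter] public DateTime When { get; set; }

        protected override void ProcessRecord()
        {
            // Same shared logic the GUI would call.
            TodoCore.AddTask(Summary, Location, When);
        }
    }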
My technique for programming backend functionality first, without needing an explicit UI (especially when the UI isn't my job yet, e.g., I'm designing a web application that is still in the design phase), is to write unit tests.
That way I don't even need to write a console application to mock the output of my backend code -- it's all in the tests, and unlike your console app I don't have to throw the code for the tests away because they still are useful later.
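For instance, a minimal xUnit-style test against the hypothetical TodoCore sketched earlier (names again illustrative):

    using System;
    using Xunit;

    public class TodoCoreTests
    {
        [Fact]
        public void AddTask_rejects_an_empty_summary()
        {
            // Exercises the backend directly - no console app or GUI needed.
            Assert.Throws<ArgumentException>(
                () => TodoCore.AddTask("", "John's office", DateTime.Now));
        }
    }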
I think it depends on what type of application you are developing. Designing for the command line puts you on the fast track to what Alan Cooper refers to as "Implementation Model" in The Inmates are Running the Asylum. The result is a user interface that is unintuitive and difficult to use.
37signals also advocates designing your user interface first in Getting Real. Remember, for all intents and purposes, in the majority of applications, the user interface is the program. The back end code is just there to support it.
It's probably better to start with a command line first to make sure you have the functionality correct. If your main users can't (or won't) use the command line then you can add a GUI on top of your work.
This will make your app better suited for scripting as well as limiting the amount of upfront Bikeshedding so you can get to the actual solution faster.
If you plan to keep your command-line version of your app then I don't see a problem with doing it this way - it's not time wasted. You'll still end up coding the main functionality of your app for the command-line and so you'll have a large chunk of the work done.
I don't see working this way as a barrier to a nice UI - you've still got the time to add one and make it usable, etc.
I guess this way of working would only really work if you intend for your finished app to have both command-line and GUI variants. It's easy enough to mock a UI and build your functionality into that and then beautify the UI later.
Agree with Stu: your base functionality should be in a library that is called from the command-line and GUI code. Calling the executable from the UI is unnecessary overhead at runtime.
#jcarrascal
I don't see why this has to make the GUI "bad?"
My thought would be that it would force you to think about what the "business" logic actually needs to accomplish, without worrying too much about things being pretty. Once you know what it should/can do, you can build your interface around that in whatever way makes the most sense.
Side note: Not to start a separate topic, but what is the preferred way to address answers to/comments on your questions? I considered both this, and editing the question itself.
I did exactly this on one tool I wrote, and it worked great. The end result is a scriptable tool that can also be used via a GUI.
I do agree with the sentiment that you should ensure the GUI is easy and intuitive to use, so it might be wise to even develop both at the same time... a little command line feature followed by a GUI wrapper to ensure you are doing things intuitively.
If you are true to implementing both equally, the result is an app that can be used in an automated manner, which I think is very powerful for power users.
I usually start with a class library and a separate, really crappy and basic GUI. Since a command line involves parsing the command line, I feel like I'm adding a lot of unnecessary overhead.
As a bonus, this gives an MVC-like approach, as all the "real" code is in a class library. Of course, at a later stage, refactoring the library together with a real GUI into one EXE is also an option.
If you do your development right, then it should be relatively easy to switch to a GUI later on in the project. The problem is that it's kinda difficult to get it right.
It kinda depends on your goal for the program, but yeah, I do this from time to time - it's quicker to code, easier to debug, and easier to write quick and dirty test cases for. And as long as I structure my code properly, I can go back and tack on a GUI later without too much work.
To those suggesting that this technique will result in horrible, unusable UIs: You're right. Writing a command-line utility is a terrible way to design a GUI. Take note, everyone out there thinking of writing a UI that isn't a CLUI - don't prototype it as a CLUI.
But, if you're writing new code that does not itself depend on a UI, then go for it.
A better approach might be to develop the logic as a library with a well-defined API and, at the dev stage, no interface (or a hard-coded interface); then you can write the CLI or GUI later.
I would not do this for a couple of reasons.
Design:
A GUI and a CLI are two different interfaces used to access an underlying implementation. They are generally used for different purposes (GUI is for a live user, CLI is usually accessed by scripting) and can often have different requirements. Coupling the two together is not a wise choice and is bound to cause you trouble down the road.
Performance:
The way you've described things, you need to spawn an executable every time the GUI needs to do something. This is just plain ugly.
The right way to do this is to put the implementation in a library that's called by both the CLI and the GUI.
John Gruber had a good post about the concept of adding a GUI to a program not designed for one: Ronco Spray-On Usability
Summary: It doesn't work. If usability isn't designed into an application from the beginning, adding it later is more work than anyone is willing to do.
#Maudite
The command-line app will check params up front and the GUI won't - but they'll still be checking the same params and inputting them into some generic worker functions.
Still the same goal. I don't see the command-line version affecting the quality of the GUI one.
Write a program that you expose as a web service, then have the GUI and the command line call that same web service. This approach also allows you to build a web GUI, to provide the functionality as SaaS to extranet partners, and/or to better secure the business logic.
This also allows your program to more easily participate in an SOA environment.
For the web service, don't go overboard. Do YAML or XML-RPC. Keep it simple.
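A minimal sketch of that shape, swapping the YAML/XML-RPC suggestion for an ASP.NET Core minimal API with plain JSON (the endpoint, request type, and TodoCore reuse are all hypothetical):

    using System;
    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Http;

    var builder = WebApplication.CreateBuilder(args);
    var app = builder.Build();

    // Both the CLI and any GUI (desktop or web) post to the same endpoint.
    app.MapPost("/addtask", (TaskRequest req) =>
    {
        TodoCore.AddTask(req.Summary, req.Location, req.When);
        return Results.Ok();
    });

    app.Run();

    public record TaskRequest(string Summary, string Location, DateTime When);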
In addition to what Stu said, having a shared library will allow you to use it from web applications as well. Or even from an IDE plugin.
There are several reasons why doing it this way is not a good idea. A lot of them have been mentioned, so I'll just stick with one specific point.
Command-line tools are usually not interactive at all, while GUIs are. This is a fundamental difference, and it is painful, for example, for long-running tasks.
Your command-line tool will at best print out some kind of progress information - newlines, a textual progress bar, a bunch of output... Any kind of error it can only output to the console.
Now you want to slap a GUI on top of that; what do you do? Parse the output of your long-running command-line tool? Scan for WARNING and ERROR in that output to throw up a dialog box?
At best, most UIs built this way throw up a pulsating busy bar for as long as the command runs, then show you a success or failure dialog when the command exits. Sadly, this is how a lot of UNIX GUI programs are thrown together, making for a terrible user experience.
Most repliers here are correct in saying that you should probably abstract the actual functionality of your program into a library, then write a command-line interface and the GUI at the same time for it. All your business logic should be in your library, and either UI (yes, a command line is a UI) should only do whatever is necessary to interface between your business logic and your UI.
A command line is too poor a UI to make sure you develop your library well enough for later GUI use. You should start with both from the get-go, or start with the GUI programming. It's easy to add a command-line interface to a library developed for a GUI, but it's a lot harder the other way around, precisely because of all the interactive features the GUI will need (reporting, progress, error dialogs, i18n, ...).
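One way the shared library can serve both front ends is to report progress through callbacks rather than console output, so the GUI never has to scrape stdout. A rough sketch (the operation and names are made up):

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    public static class TaskLibrary
    {
        // The long-running operation reports progress via IProgress<T> and signals
        // failure via exceptions - nothing for a GUI to parse out of console text.
        public static async Task ReindexTasksAsync(IProgress<int> progress, CancellationToken ct)
        {
            for (int percent = 0; percent <= 100; percent += 10)
            {
                ct.ThrowIfCancellationRequested();
                await Task.Delay(100, ct);   // stand-in for real work
                progress.Report(percent);    // GUI: update a progress bar; CLI: print text
            }
        }
    }

    public static class CliFrontEnd
    {
        public static async Task Main()
        {
            var progress = new Progress<int>(p => Console.Write($"\r{p,3}%"));
            await TaskLibrary.ReindexTasksAsync(progress, CancellationToken.None);
            Console.WriteLine(" done");
        }
    }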
Command-line tools generate fewer events than GUI apps and usually check all the parameters before starting. This will limit your GUI, because for a GUI it could make more sense to ask for the parameters as your program works, or afterwards.
If you don't care about the GUI, then don't worry about it. If the end result will be a GUI, make the GUI first, then do the command-line version. Or you could work on both at the same time.
--Massive edit--
After spending some time on my current project, I feel as though I have come full circle from my previous answer. I think it is better to do the command line first and then wrap a GUI around it. If you need to, I think you can make a great GUI afterwards. By doing the command line first, you get all of the arguments down first, so there are no surprises (until the requirements change) when you are doing the UI/UX.
That is exactly one of my most important realizations about coding and I wish more people would take such approach.
Just one minor clarification: The GUI should not be a wrapper around the command line. Instead one should be able to drive the core of the program from either a GUI or a command line. At least at the beginning and just basic operations.
When is this a great idea?
When you want to make sure that your domain implementation is independent of the GUI framework. You want to code around the framework, not into it.
When is this a bad idea?
When you are sure your framework will never die