Using SSIS tools? - ssis

I want to import data from a relational database in order to integrate it and load it into a transit database that I will use to build OLAP cubes. I've seen many tutorials about SSIS, but they're all very basic and work with just one data flow task.
Now I wonder if I have to use one data flow for each table that I'm going to bring over, or one for each group of tables that are related to each other. Many details concerning BI tools are still unclear to me.
I really appreciate your help, and if you can propose some advanced tutorials for me, that would be great too. :)
I also have another question concerning the transit database: is it going to be multidimensional, and do I have to create an empty one first?

Now I wonder if I have to use one data flow for each table
Most developers do it this way, mostly for ease of logging, but also because having multiple sources and targets in a single data flow task means it multi-threads and you lose control over it. In my case, in each data flow task I'm grabbing the row counts for source / inserted / updated / deleted / failed validation / no action rows, and that's not possible if there are multiple sources and destinations.
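To show what I mean by that per-load audit, here is a minimal sketch of the kind of record each data flow task would write. The table, columns and counts are placeholders for illustration only, not part of any SSIS template; in the package itself the numbers would come from Row Count transformations feeding variables.

    import sqlite3
    from datetime import datetime

    # Placeholder audit table: one row per data flow execution.
    conn = sqlite3.connect("etl_audit.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS load_audit (
            package_name   TEXT,
            table_name     TEXT,
            run_started_at TEXT,
            rows_source    INTEGER,
            rows_inserted  INTEGER,
            rows_updated   INTEGER,
            rows_deleted   INTEGER,
            rows_failed    INTEGER,
            rows_no_action INTEGER
        )
    """)

    def log_load(package, table, counts):
        """Record the row counts captured by one data flow task."""
        conn.execute(
            "INSERT INTO load_audit VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (package, table, datetime.utcnow().isoformat(),
             counts["source"], counts["inserted"], counts["updated"],
             counts["deleted"], counts["failed"], counts["no_action"]),
        )
        conn.commit()

    # Example: counts you would normally capture with Row Count transformations.
    log_load("LoadCustomers.dtsx", "dim_customer",
             {"source": 1000, "inserted": 40, "updated": 12,
              "deleted": 0, "failed": 3, "no_action": 945})

With one source and one destination per data flow, each row in that audit table maps cleanly to one table load, which is exactly what gets lost when several flows share a task.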
I really appreciate your help, and if you can propose some advanced tutorials for me, that would be great too.
I recommend searching for Pragmatic Works' free SSIS training webinars, and while you're at it, see if they are offering an SSIS workshop in your area. Of course other people will have other preferences, and your mileage may vary.
Also, define 'advanced'.
I also have another question concerning the transit database: is it going to be multidimensional, and do I have to create an empty one first?
There is not enough information here to give you an answer. How about splitting this off as a separate question with details, rather than bundling multiple questions (with an unknown number of follow-up questions) into a single SO question?
Good luck.

Related

Best practice to use several APIs or data sources for one application

I want to build an application that uses data from several endpoints.
Let's say I have:
JSON API for getting cinema data
XML Export for getting data about ???
Another JSON API for something else
A CSV file for some more stuff ...
In my application I want to bring all this data together and build views for it and so on ...
My idea was to set up a database and create schemas for all these data sources, so I can write some kind of "import scripts" that I can call whenever I want to get the latest data.
I thought of schemas because I want to be able to easily adapt to a new API with any kind of schema.
Please enlighten me about the possibilities and best practices out there (theory and practice if possible :P)
You are totally right about making a database. But the real problem is probably not going to be how to store your data; it's going to be how to make it fit together logically and semantically.
I suggest you first take a good look at what your endpoints can provide. Get several samples from every source and analyze them if you can. How will you know which data is new? How can you match it against existing data and against data from other sources? If existing data changes or gets deleted, how will you detect and handle that? What if sources disagree on something? How and when should you run the synchronization? What will you do if one of your sources goes down? Etc.
It is extremely difficult to make data consistent if your data sources are not. As a rule, if the sources are different, they are not consistent; thus the proverb "garbage in, garbage out". We humans have no problem dealing with small inconsistencies, but algorithms cannot work correctly if there are discrepancies. Even if everything fits together on paper, one usually forgets that data can change over time...
At least that's my experience in such cases.
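One concrete way to answer the "which data is new or changed?" question above is to keep a content hash per source record and compare it on every import. Here is a minimal sketch of that idea; the feed, field names and keys are made up purely for illustration and would need to match your actual sources:

    import hashlib
    import json

    def record_hash(record: dict) -> str:
        """Stable hash of a record's content, independent of key order."""
        canonical = json.dumps(record, sort_keys=True, default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def classify(incoming: list[dict], existing: dict[str, str], key: str):
        """Split incoming records into new / changed / unchanged,
        given a map of business key -> previously stored hash."""
        new, changed, unchanged = [], [], []
        for rec in incoming:
            h = record_hash(rec)
            prev = existing.get(rec[key])
            if prev is None:
                new.append(rec)
            elif prev != h:
                changed.append(rec)
            else:
                unchanged.append(rec)
        return new, changed, unchanged

    # Example with a made-up "cinema" feed keyed on an id field.
    previous = {"c1": record_hash({"id": "c1", "name": "Odeon", "seats": 120})}
    feed = [
        {"id": "c1", "name": "Odeon", "seats": 150},   # changed
        {"id": "c2", "name": "Rex",   "seats": 80},    # new
    ]
    print(classify(feed, previous, key="id"))

It does not solve the harder problem of sources disagreeing with each other, but it gives you a reliable answer to "what changed since last time?" per source.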
I'm not sure whether you want to display all the data in the same view in your application, or whether you are going to create different views for each of the sources. If you want to display the data in the same view, like a grid, I would recommend using inheritance or an interface, depending on your data and needs. I would recommend setting this structure up in the database too, using different tables for the different sources and a parent table related to all of them that has a type associated with it.
Here's a good thread with discussion about choosing an interface or inheritance.
Inheritance vs. interface in C#
And here are some examples of representing inheritance in a database.
How can you represent inheritance in a database?
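To make the "parent table related to all of them, with a type" idea a bit more concrete, here is a small sketch of the table-per-source layout; the table and column names are invented for the example:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Parent table: one row per item, whatever its source.
        CREATE TABLE source_item (
            id          INTEGER PRIMARY KEY,
            source_type TEXT NOT NULL,          -- 'cinema_api', 'xml_export', ...
            imported_at TEXT NOT NULL
        );

        -- Child table for one specific source, 1:1 with the parent row.
        CREATE TABLE cinema_item (
            id    INTEGER PRIMARY KEY REFERENCES source_item(id),
            name  TEXT,
            seats INTEGER
        );
    """)

    # Insert a cinema record: parent row first, then the type-specific row.
    cur = conn.execute(
        "INSERT INTO source_item (source_type, imported_at) VALUES (?, datetime('now'))",
        ("cinema_api",),
    )
    conn.execute(
        "INSERT INTO cinema_item (id, name, seats) VALUES (?, ?, ?)",
        (cur.lastrowid, "Odeon", 120),
    )
    conn.commit()

A combined grid view can then join the parent table to whichever child tables it needs, while each import script only ever touches its own child table.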

How do I set up the architecture for a "big data" analysis project?

A friend of mine and I are in our senior year and will be starting a senior project soon. We had the idea to do a data analysis and data visualization project for it. Our project involves reading a CSV file that is updated every 2 minutes, parsing that data, then storing it in a database. Once that data is stored we want to run some analysis on it and provide an API through which we could access that data to visualize it in some way. Our end goal would be to build an Android app that displays some of the raw data from the CSV and the analysis in a user-friendly format.
I talked to another CS major and he explained that I would need a few different servers to accomplish this: one for storage, another for analysis, and another for some type of queue that would make sure things don't get screwy while we are doing scraping and analysis.
The problem is, I don't really know where to start with this. I've done some work with a SQL database before and a PHP front end, but nothing with multiple servers. I've heard of tools to use with big data projects like Hadoop, but I'm not exactly sure where it fits in. If someone could point me to a resource of some kind to explain, or explain themselves, how I would start to structure this kind of project, that would be awesome!
Since you don't have much experience with these things, you'll probably want to look at projects like Cloudera. Specifically, their resources page has a nice set of videos and articles.
Another source of solid information (that I personally use) is to click on a Stack Overflow tag and sort by votes. Many good questions on a plethora of big data topics already exist.
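As a starting point for the "read the CSV, parse it, store it" step described in the question, here is a minimal single-machine sketch; the URL and table layout are placeholders, and only the 2-minute interval comes from the question. It is nowhere near a multi-server architecture, but it is the core loop everything else would be built around:

    import csv
    import sqlite3
    import time
    import urllib.request

    SOURCE_URL = "https://example.com/feed.csv"   # placeholder URL
    POLL_SECONDS = 120                            # the feed updates every 2 minutes

    conn = sqlite3.connect("measurements.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS readings (
            fetched_at TEXT,
            raw_line   TEXT
        )
    """)

    def ingest_once():
        """Download the CSV and append every row; analysis runs on the table later."""
        with urllib.request.urlopen(SOURCE_URL) as resp:
            text = resp.read().decode("utf-8")
        rows = list(csv.reader(text.splitlines()))
        now = time.strftime("%Y-%m-%dT%H:%M:%S")
        conn.executemany(
            "INSERT INTO readings (fetched_at, raw_line) VALUES (?, ?)",
            [(now, ",".join(r)) for r in rows],
        )
        conn.commit()

    if __name__ == "__main__":
        while True:
            ingest_once()
            time.sleep(POLL_SECONDS)

Once something like this is working end to end, the separate storage, queue and analysis servers (or Hadoop) become optimizations you add when the data volume actually demands them.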

How to migrate existing database from Domino Server to Relational database (MySQL)

Is there any good way to migrate an existing database from Domino Server to a relational database like MySQL without using any tool?
I've explored this a bit and learned that it's possible using XML, but I don't know how, or what the procedure would be.
Any help would be appreciated.
Without using any tool: NO.
There are two big difficulties in exporting data:
First is the Notes Richtext, which is a proprietary format that has to be "transcoded" somehow. This is not an easy thing to do "manually" and needs either a lot of coding or some kind of tool.
Second is the fact that there is no "forced" structure in Notes documents. There can be several forms that "define" how the documents look, and there can be different versions of these forms that have been used over the past. A document may or may not contain any number of fields of any thinkable type (a field may even be a number in one document and text in another).
You have to KNOW the structure of your documents to get them out. Of course you can simply export them as "Structured Text" or as "Comma separated values" to get -most- of it, but then you need views that show the documents in the order you need them. Exporting them as XML is another "standard" way to get the data, but then you need to understand the XML to get it into your relational database.
Short: Without (at least very little) coding knowledge OR a tool (that costs money) there is no chance for getting the data out.
Ah yes, there is an "ODBC driver" for Lotus Notes / Domino, but that will not help you much either: if you do not know the structure of your documents and how Notes databases work, it will not work for you.
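If you do go the XML export route, the downstream step is ordinary XML parsing plus relational inserts, and that is where the "very little coding" comes in. Here is a rough sketch of that step; it assumes a simplified export where each document is an element with named item children, so the element and attribute names are assumptions to adjust against your real export, not the actual DXL schema:

    import sqlite3
    import xml.etree.ElementTree as ET

    def documents_from_export(path):
        """Yield one dict of field name -> text value per exported document."""
        tree = ET.parse(path)
        for doc in tree.getroot().iter("document"):
            fields = {}
            for item in doc.iter("item"):
                name = item.get("name")
                if name is not None:
                    fields[name] = "".join(item.itertext()).strip()
            yield fields

    # Load into a simple relational staging table (MySQL would work the same way).
    conn = sqlite3.connect("staging.db")
    conn.execute("CREATE TABLE IF NOT EXISTS notes_doc (doc_no INTEGER, field TEXT, value TEXT)")
    for doc_no, fields in enumerate(documents_from_export("export.xml")):
        conn.executemany(
            "INSERT INTO notes_doc (doc_no, field, value) VALUES (?, ?, ?)",
            [(doc_no, k, v) for k, v in fields.items()],
        )
    conn.commit()

Note that this only moves plain field values; rich text, attachments and embedded images still need the extra handling described above.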
As Torsten said above, you can't do it without a tool; either you buy one or write one yourself.
I wrote a tool like that several years ago to export Notes databases as XML. It is a bit of work, especially the rich text fields. You may also want to export/detach attachments and embedded images.
You can read more about my export tool here: http://www.texasswede.com/websites/texasswede.nsf/Page/Notes%20XML%20Exporter

Generate schema for an analysis services database project

I was given the task of getting a better understanding of several ETL packages that were created in a database project using Business Intelligence Development Studio (SQL 2005).
Currently I have to open each master package, then each package, and then each data flow and so on to discover the relationships that exist between the source tables and the destination tables.
I realized that a good way to get that information more easily would be a tool similar to what SchemaSpy does with a normal database. That would give me a high-level view of the relationships that exist.
Does anyone know of an application/script that could help me achieve this result?
I tried to search, but I must admit I got the feeling that I wasn't really searching in the right direction, as most of my searches ended up pointing to database comparisons.
As it turned out, the only way I found to do this was to parse the XML inside the packages and extract the relationships, and then create the diagrams using Graphviz (the same visualization component used by SchemaSpy).
Unfortunately this was an expensive thing to do and I never finished the project, mainly due to a lack of knowledge about the XML structure, but it is definitely achievable.
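For anyone attempting the same thing, the general shape of it is: walk each package's XML, pick out the component names, and write them out as a DOT file for Graphviz to render. A very rough sketch follows; the element and attribute names vary between SSIS versions, so treat them as assumptions to check against your actual .dtsx files rather than a working parser:

    import xml.etree.ElementTree as ET
    from pathlib import Path

    def component_names(dtsx_path):
        """Collect component names from a package, ignoring XML namespaces."""
        names = []
        for elem in ET.parse(dtsx_path).iter():
            tag = elem.tag.split("}")[-1]          # strip any namespace prefix
            if tag == "component":
                name = elem.get("name")
                if name:
                    names.append(name)
        return names

    def write_dot(packages, out_path="packages.dot"):
        """Emit a Graphviz DOT file: one edge per package -> component."""
        lines = ["digraph ssis {"]
        for pkg in packages:
            for comp in component_names(pkg):
                lines.append(f'  "{Path(pkg).stem}" -> "{comp}";')
        lines.append("}")
        Path(out_path).write_text("\n".join(lines))

    write_dot(sorted(Path(".").glob("*.dtsx")))
    # Render with: dot -Tpng packages.dot -o packages.png

Getting from component names to actual source-table-to-destination-table edges means also reading each component's connection and column metadata, which is exactly the part that makes this expensive.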

Multiple tables import using a single data flow in SSIS

I have 10 tables I am importing into another SQL Server database using SSIS.
Do I have to create 10 different data flow tasks, or can I proceed with one data flow task and add the 10 tables to it?
I have tried to use a single data flow task, but it only allows a single table.
Do all the source tables share one common schema?
Do all the destination tables share one common schema (which doesn't have to be the same as the common schema for the source tables)?
If the answer to both questions is "yes", then you can in fact write a single Data Flow Task (whose connection managers are parameterized) and put it in a Foreach Loop container.
If the answer to either (or both) of those questions is "no", then you'll have to have separate sources and destinations. You might want to investigate Business Intelligence Markup Language as a way to generate those data flows automatically, although it's probably overkill for "only" ten tables.
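To make the "one parameterized flow in a Foreach Loop" idea concrete outside the designer, here is the shape of it in plain code; the table names, databases and column handling are placeholders, and in SSIS the loop would be a Foreach Loop container feeding expressions on the connection managers and the source/destination components:

    import sqlite3

    SOURCE = sqlite3.connect("source.db")
    DEST = sqlite3.connect("destination.db")

    # The ten tables, all sharing the same column layout in source and destination.
    TABLES = ["customers", "orders", "products"]   # ... and so on

    def copy_table(table):
        """One generic 'data flow': read every row from source, write to destination."""
        rows = SOURCE.execute(f"SELECT * FROM {table}").fetchall()
        if not rows:
            return 0
        placeholders = ",".join("?" * len(rows[0]))
        DEST.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        DEST.commit()
        return len(rows)

    for table in TABLES:            # the Foreach Loop equivalent
        print(table, copy_table(table), "rows copied")

The point of the sketch is that the loop body never mentions a specific table; as soon as the tables stop sharing a schema, that generic body is no longer possible and you are back to one data flow per table (or to generating them with Biml).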
The answer depends on you, your best practices, and how many developers you will have working on projects at the same time.
It is entirely possible to put more than one set of tables in a single dataflow. You can simply add additional sources and destinations to your dataflow. However, this is almost never a good idea as it adds to the maintenance effort later in the lifecycle of your project. It makes it more difficult to find and debug errors. It makes the entire project more complex.
If you are working alone and you will be building and maintaining this project's full lifecycle by yourself, then by all means do whatever you feel most comfortable with.
If you are in a group that may all maintain this project, I would suggest that you, at a minimum, break out the data flows for the different tables into different data flow tasks.
If you are in a larger group, and for more flexibility in maintenance, I would suggest that each data flow be broken out into a different package (assuming 2008 or below; I have not played with the 2012 project models yet, so I won't comment on them here), so that each can be worked on by different developers simultaneously. (I would actually recommend coding this way even if you are the only one on the project, but that is just the style I have developed over my career.)