I was given the task of getting a better understanding of several ETL packages that were created in a database project using Business Intelligence Development Studio (SQL 2005).
Currently I have to open each master package, then each package, then each data flow and so on to discover the relationships that exist between the source tables and the destination tables.
I realized that a good way to get that information more easily would be a tool similar to what SchemaSpy does for a normal database. That would give me a high-level view of the relationships that exist.
Does anyone know of an application or script that could help me achieve this?
I tried to search, but I must admit I had the feeling that I wasn't really searching in the right direction, as most of my searches ended up pointing to database comparison tools.
It turned out that the only way I found to do this was to parse the XML inside the packages and extract the relationships, and then use Graphviz (the same rendering component used by SchemaSpy) to create the diagrams.
Unfortunately this was an expensive thing to do and I never finished the project, mainly because I did not know the XML structure well enough, but it is definitely achievable.
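For anyone attempting the same thing, here is a minimal sketch of the parse-then-Graphviz idea in Python. It rests on an assumption I have not checked against every package: that data flow components show up as `component` elements holding a `property` named `OpenRowset` with the table name. The helpers `local`, `table_refs` and `to_dot` are hypothetical names, and the sample XML is invented purely for illustration.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace so namespaced tags compare by local name."""
    return tag.rsplit("}", 1)[-1]


def table_refs(dtsx_xml):
    """Return (component name, table name) pairs found in one package's XML text.

    Assumption: a table is referenced through a <property name="OpenRowset">
    nested under a <component> element. Adjust to the real layout as needed.
    """
    refs = []
    root = ET.fromstring(dtsx_xml)
    for elem in root.iter():
        if local(elem.tag) != "component":
            continue
        for prop in elem.iter():
            text = (prop.text or "").strip()
            if local(prop.tag) == "property" and prop.get("name") == "OpenRowset" and text:
                refs.append((elem.get("name", "?"), text))
    return refs


def to_dot(packages):
    """Build Graphviz DOT text with one edge per (package, table) reference."""
    lines = ["digraph etl {"]
    for pkg, xml_text in sorted(packages.items()):
        for comp, table in table_refs(xml_text):
            lines.append(f'  "{pkg}" -> "{table}" [label="{comp}"];')
    lines.append("}")
    return "\n".join(lines)


# Invented sample, only to show the shape of the output.
SAMPLE = """<Executable><pipeline><components>
  <component name="OLE DB Source"><properties>
    <property name="OpenRowset">[dbo].[Orders]</property>
  </properties></component>
  <component name="OLE DB Destination"><properties>
    <property name="OpenRowset">[dw].[FactOrders]</property>
  </properties></component>
</components></pipeline></Executable>"""

print(to_dot({"LoadOrders.dtsx": SAMPLE}))
```

The printed text can be saved to a file and rendered with Graphviz, for example with `dot -Tpng`. This sketch does not tell sources from destinations; that would need another property or the component type, which I am leaving open.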
Related
We have a Rails app with a MySQL backend; each client has one DB and the schema is identical. We use a custom gem to switch the DB based on the URL of the request. (This is legacy code that we are trying to move away from.)
We need to capture some changes from those MySQL databases (changes in inventory, some order information, etc.), transform them, and store them in a single MongoDB database (a multitenant data store). This data will be used for analytics at first, but our idea is to move everything there.
There was something in place to do this, using AR callbacks and Rabbit, but to be honest it wasn't working correctly and it looked like it was more trouble to fix it than to start over with a fresh approach.
We did some research and found some tools to do ETL but they are overkill for our needs.
Does anyone have some experience with a similar problem?
Any recommendations on how to architect and implement this simple ETL?
Pentaho provides a change-data-capture option, which can solve data-synchronization problems.
If by overkill you mean setup and configuration, then yes, that is the common problem with ETL tools, and Pentaho is the easiest among them.
If you can provide more details, I'll be glad to provide an elaborate answer.
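To make the change-capture idea concrete, here is a rough sketch of the simplest variant: polling on a timestamp with a high-water mark. This is not Pentaho's mechanism, just an illustration of the concept. It assumes each table has an `updated_at` column, uses `sqlite3` as a stand-in for one client's MySQL database, and the table, column and tenant names are all made up.

```python
import sqlite3


def extract_changes(conn, tenant, last_seen):
    """Return (documents, new high-water mark) for rows changed after last_seen."""
    rows = conn.execute(
        "SELECT id, sku, qty, updated_at FROM inventory "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Tag every document with its tenant so one target store can hold all clients.
    docs = [
        {"tenant": tenant, "source_id": r[0], "sku": r[1], "qty": r[2], "updated_at": r[3]}
        for r in rows
    ]
    new_mark = rows[-1][3] if rows else last_seen
    return docs, new_mark


# Stand-in for one client's database (sqlite3 here, MySQL in the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER, sku TEXT, qty INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [(1, "A-1", 5, "2024-01-01 10:00:00"), (2, "B-2", 0, "2024-01-02 09:30:00")],
)

docs, mark = extract_changes(conn, "client_a", "2024-01-01 12:00:00")
print(docs, mark)
```

The resulting documents would then be written to the MongoDB side, and the returned mark stored for the next run. Note that plain timestamp polling does not see deleted rows, which is one reason people reach for a real CDC option.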
I want to import data from a relational database in order to integrate it and load it into a transit database that I will use to build OLAP cubes. I've seen many tutorials about SSIS, but they're all very basic and work with just one data flow task.
Now I wonder if I have to use one data flow for each table that I am going to bring in, or one for each group of related tables. Many details concerning BI tools are still unclear to me.
I really appreciate your help, and if you can suggest some advanced tutorials for me that will be great too. :)
I also have another question concerning the transit database: is it going to be multidimensional, and do I have to create an empty one first?
Now I wonder if I have to use one data flow for each table
Most developers do it this way, mostly for ease of logging, but also because having multiple sources and targets in a single data flow task means they run multi-threaded and you lose control over the execution. In my case, in each data flow task I'm capturing the row counts for source / inserted / updated / deleted / failed validation / no action, and that's not possible if there are multiple sources and destinations.
I really appreciate your help and if you can propose some advanced tutorials for me that will be great too.
I recommend searching for Pragmatic Works' free SSIS training webinars, and while you're at it, see if they are offering an SSIS workshop in your area. Of course other people will have other preferences, and your mileage may vary.
Also, define 'advanced'.
I also have another question concerning the transit database: is it going to be multidimensional, and do I have to create an empty one first?
There is not enough information here to give you an answer. How about splitting this off as a separate question with details, rather than bundling multiple questions (with an unknown number of follow-up questions) into a single SO question?
Good luck.
I come from a Microsoft background and have mostly worked with SSIS on ETL-type projects.
Now I have another project on hand that involves loading .csv files into a MySQL database. While loading, the data has to go through some transformations before it reaches the destination tables. It is very much an ETL project.
The client doesn't have SSIS (BIDS), so I am compelled to use open source tools.
I did a bit of research and found that the Talend Data Integration tool seems to fit my situation best.
As I am new to this environment, and I am sure there are experts in this area, I need some advice on the best tools for this type of ETL and on best practices.
If you need any further information, please let me know.
If I remember correctly, phpMyAdmin can import CSV into MySQL, and this question is about a similar topic too, but these don't come close to what SSIS can offer...
Yes, you are right: Talend Open Studio is a pretty good tool with hundreds of connectors.
In your case, just create a job that takes the CSV as its source and MySQL as its destination, apply any transformations if required, and load it.
You can find more information on loading CSV into MySQL, with examples, on the Talend forum.
If you have a basic plan, share it with me and I can guide you on how to transfer the CSV to a MySQL table.
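As a rough picture of what such a job does (source, transformation, destination), here is a small hand-written Python sketch. It is not Talend output. It uses `sqlite3` as a stand-in for MySQL, and the `sales` table, the column names, and the `transform` and `load_csv` helpers are all hypothetical.

```python
import csv
import io
import sqlite3


def transform(row):
    """Example transformation: trim text, normalise case, cast the amount."""
    return (row["name"].strip().title(), row["country"].strip().upper(), float(row["amount"]))


def load_csv(conn, csv_file):
    """Read a CSV file object, transform each row, and bulk-insert the result."""
    reader = csv.DictReader(csv_file)
    rows = [transform(r) for r in reader]
    conn.executemany("INSERT INTO sales (name, country, amount) VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Stand-in destination (sqlite3 here, MySQL in the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, country TEXT, amount REAL)")

# In-memory sample instead of a real .csv file.
sample = io.StringIO("name,country,amount\n alice smith ,us,10.50\nBOB JONES,de,3\n")
loaded = load_csv(conn, sample)
print(loaded, conn.execute("SELECT * FROM sales").fetchall())
```

With a real MySQL driver the overall shape should stay the same, though the placeholder style usually differs (`%s` rather than `?`).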
Basically, I have many huge delimited files that I know I can import as a table, but I need to map that data to an existing relational multi-table MySQL database. There should not be any conflict with datatypes, but I'm super new to this, so please point out anything I should be watching for. Clearly I'm not going to run this in production until I know it works.
I'm not 100% sure Stack Overflow is the right place to ask a database question, but I couldn't find any other Stack Exchange site that was a better fit.
I posted this question on Super User looking for a GUI to do this, but I am up for coding it if that gets the job done. As such there is no target language, just the requirement that the database be MySQL.
Also, I found this Stack Overflow Q/A that deals with MS SQL Server's SSIS (which I'm not planning to use due to cost, but the content and issues faced appear to be of the same nature):
Loading Multiple Tables using SSIS keeping foreign key relationships
I'd suggest using the ETL (extract, transform, load) tool from the Pentaho Business Intelligence package. It's got a bit of a learning curve but it'll do exactly what you're looking for. Their ETL tool is called Kettle and it's extremely powerful once you get the hang of it.
There are two versions of Pentaho: an enterprise version that has a free trial, and a free community version. The community version is more than capable, but you might give the enterprise version a test ride too.
Here are some links:
Pentaho Community Edition Site
Kettle Site
Pentaho Enterprise Site
Update: Multiple table outputs
One of the key steps in your transformation is going to be a Combination lookup/update. This step checks a given table to see whether a record from your data stream exists and inserts a new record if it does not. Regardless of whether it's a new or old record, it appends the key field from that record to your data stream. As you keep going, you'll use these keys as foreign keys when you import data into related tables.
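To show the logic of that step outside of Kettle, here is a minimal Python sketch of the same lookup-or-insert pattern. It is only an illustration of the idea, not how Kettle implements it. It uses `sqlite3` as a stand-in for MySQL, and the `customer` and `orders` tables and the `lookup_or_insert` helper are hypothetical.

```python
import sqlite3


def lookup_or_insert(conn, name):
    """Return the key for a customer name, inserting a new row only if it is missing."""
    found = conn.execute("SELECT id FROM customer WHERE name = ?", (name,)).fetchone()
    if found:
        return found[0]
    return conn.execute("INSERT INTO customer (name) VALUES (?)", (name,)).lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(id), item TEXT)"
)

# Rows as they might arrive from a flat delimited file: (customer name, item).
stream = [("Acme", "bolt"), ("Globex", "nut"), ("Acme", "washer")]
for name, item in stream:
    customer_id = lookup_or_insert(conn, name)  # key looked up or newly created
    # The key is then used as the foreign key in the related table.
    conn.execute("INSERT INTO orders (customer_id, item) VALUES (?, ?)", (customer_id, item))

print(conn.execute("SELECT customer_id, item FROM orders ORDER BY id").fetchall())
```

The repeated "Acme" row reuses the existing key instead of creating a second customer, which is what keeps the foreign keys consistent across the related tables.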
After looking at a lot of questions, I found no real answer for this.
I redesigned a database for our customer.
In Microsoft Access I found a good tool for moving the old table data into my new, well-formed database structure. It is really easy but takes a lot of time (because the old data has to be handled with a lot of care).
Are there any open source tools that offer facilities like those in Microsoft Access?
To clear it up: I "just" want to reorganize the old Firebird database data in a new "best-practice" way.
Edit:
It would be really nice if I could get a log file or something similar to have some documentation of the changes.
Update:
After checking some of the tools on that Wikipedia page, I found no real logging mechanism.
How do you document the changes to a database? Simply by writing them down?
Result:
So I didn't get a real answer, and I am still searching for a nice tool. Thank you all for the hints and your thoughts regarding this question. I want to reward Kenneth Cochran with the bounty because he pointed me to ETL. Thank you!
Talend's open source ETL supports Firebird. Very cool tool.
http://www.talend.com/download.php?src=DataGovernanceBlog
It sounds like what you're asking for is an ETL (extract, transform, load) tool.
Wikipedia has a list of open source tools that may help with this. I've not used any of them personally.
Well, I used the Pentaho suite for doing ETL using their Kettle tool.
It's quite easy to use and should be more than enough for what you need.
And it's open source.
Give it a look.
I advise you to use a tool like IBExpert or Database Workbench, which are the best tools for Firebird.
To migrate from Firebird 1.5 to Firebird 2.1, you just have to back up your database with the Firebird 1.5 server and restore it with the Firebird 2.1 server.
I've used Excel in the past to document data model changes. Each worksheet was named after the application version in order to sync with our tags in CVS. Everything was logged in it: columns that were removed as well as minor alterations to datatypes, like varchar(10) to varchar(20), along with a note describing why the change was made.
Personally, I've only ever scripted things like these as DDL/DML scripts, broken into separate scripts that dealt with table creation, constraint dropping, index drops, the DML itself, constraint application, index application, and removing orphaned tables.
If you want a basic ETL tool that is client-based (and cheap at $300), look at Advanced Query Tool. It mainly queries any type of ODBC connection (including Excel files set up that way), but it also has some extended features, including moving data, and it has a command line interface. http://www.querytool.com/
I've used it instead of Informatica for one-off jobs, but I've also used it to extract from Excel to another file for business users, for a few months, scheduled from my desktop.