As an SSIS newbie, I am trying to figure out the best way to import multiple tables from one database into another. I could write a separate parallel data flow for each table, but I want to be smart about it.
Generalizing, for each table I need to:
Transfer rows from one table to a table in another database
Count the number of rows transferred
Record the start and finish time of the data transfer
Record any errors
I am trying not to use stored procedures, since I don't want people to have to dig deep into the DB to find the rules for this transformation. Ideally I would like to do this at the SSIS level, using components that can be seen visually and understood.
Are there any best practices that people have used before?
I would ideally want to do something like:
foreach (table in list of tables to transfer)
    transfer (table name)
To make a generic table handler you would have to programmatically construct the data flow. AFAIK SSIS has no auto-introspection facility. A script task will allow you to do this, and you can get the table metadata from the source, but it means fiddling with the API.
I have worked on a product where this was done, although I didn't develop that component, so I can't offer words of wisdom off the top of my head as to how to do it. However, you can find resources on the web that explain how to do it.
You can find the table structure and the column types by querying the system data dictionary. See this posting for some links to resources describing how to do this, including a link to a code sample.
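As a rough illustration of the data dictionary lookup, here is a minimal Python sketch. It uses SQLite as a stand-in so that it runs anywhere; on SQL Server the same information would typically come from a SELECT against INFORMATION_SCHEMA.COLUMNS. The describe_table helper and the customer table are made up for the example.

```python
import sqlite3

def describe_table(conn, table):
    """Return (column_name, declared_type) pairs for one table."""
    # PRAGMA table_info is SQLite's data dictionary lookup (stand-in for
    # INFORMATION_SCHEMA.COLUMNS on SQL Server).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]

# Hypothetical table, only here to have something to inspect.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")

columns = describe_table(conn, "customer")
print(columns)
```

A generic handler would feed this column list into whatever builds the source and destination mappings.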
What is your destination database doing with this info? Is it simply reading it?
Perhaps you would be best served by replicating the tables.
You could create a config table that lists the tables you want to move and then use a for loop to do something for each one... but the question is what that something would be.
http://blogs.conchango.com/jamiethomson/archive/2005/02/28/SSIS_3A00_-Dynamic-modification-of-SSIS-packages.aspx
Below the bullet points, he states that an SSIS package cannot be modified to change its metadata at run time. And if the goal is to make this easy to maintain, you're going in the wrong direction.
I'd keep it simple and use the wizard and then customize with logging/notifications etc.
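For what it's worth, the "list of tables plus a loop" idea from the question is easy to express outside SSIS. The following is only a sketch in Python with SQLite in-memory databases, not an SSIS solution: it assumes each destination table already exists with the same columns as its source, and the function names and log layout are invented for the example. In practice the table list would be read from the config table.

```python
import sqlite3
from datetime import datetime, timezone

def transfer_table(src, dst, table):
    """Copy all rows of `table` from src to dst and return the row count."""
    cur = src.execute(f'SELECT * FROM "{table}"')
    rows = cur.fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in cur.description)
        dst.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)

def run_transfers(src, dst, tables):
    """Loop over the table list, recording timing, row count and errors."""
    log = []
    for table in tables:
        entry = {"table": table, "rows": 0, "error": None,
                 "started": datetime.now(timezone.utc)}
        try:
            with dst:  # commit on success, roll back on failure
                entry["rows"] = transfer_table(src, dst, table)
        except sqlite3.Error as exc:
            entry["error"] = str(exc)
        entry["finished"] = datetime.now(timezone.utc)
        log.append(entry)
    return log

# Demo data: two real tables and one name that does not exist.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE a (id INTEGER, name TEXT)")
    db.execute("CREATE TABLE b (id INTEGER)")
src.executemany("INSERT INTO a VALUES (?, ?)", [(1, "x"), (2, "y")])
src.execute("INSERT INTO b VALUES (10)")

log = run_transfers(src, dst, ["a", "b", "missing"])
for entry in log:
    print(entry["table"], entry["rows"], entry["error"])
```

A failure on one table is logged and the loop moves on to the next, which is usually what you want from a batch transfer.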
Maybe you can call a stored procedure from inside your SSIS package. Here is an example of how you might be able to use the stored procedure:
http://blog.sqlauthority.com/2012/10/31/sql-server-copy-data-from-one-table-to-another-table-sql-in-sixty-seconds-031-video/
I want to build an application that uses data from several endpoints.
Let's say I have:
JSON API for getting cinema data
XML Export for getting data about ???
Another JSON API for something else
A CSV file for some more shit ...
In my application I want to bring all this data together and build views for it and so on ...
My idea was to set up a database and create schemas for all these data sources, so that I can write some kind of "import scripts" which I can call whenever I want to get the latest data.
I thought of schemas because I want to be able to easily adapt to a new API with any kind of schema.
Please enlighten me about the possibilities and best practices out there (theory and practice if possible :P)
You are totally right about making a database. But the real problem is probably not going to be how to store your data. It's going to be how to make it fit together logically and semantically.
I suggest you first take a good look at what your endpoints can provide. Get several samples from every source and analyze them if you can. How will you know which data is new? How can you match it against existing data and against data from other sources? If existing data changes or gets deleted, how will you detect and handle that? What if sources disagree on something? How and when should you run the synchronization? What will you do if one of your sources goes down? Etc.
It is extremely difficult to make data consistent if your data sources are not. As a rule, if the sources are different, they are not consistent. Thus the proverb "garbage in, garbage out". We humans have no problem dealing with small inconsistencies, but algorithms cannot work correctly if there are discrepancies. Even if everything fits together on paper, one usually forgets that data can change over time...
At least that's my experience in such cases.
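To make the "which data is new, and how do I match it against existing data" questions a bit more concrete, here is a small Python sketch of re-runnable import scripts. Everything in it is an assumption for illustration: the item table, the source names, the payloads, and the choice to key rows on (source, source_id) so that a repeated import overwrites rather than duplicates.

```python
import csv
import io
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        source    TEXT NOT NULL,   -- which endpoint the row came from
        source_id TEXT NOT NULL,   -- the id used by that endpoint
        title     TEXT,
        PRIMARY KEY (source, source_id)
    )
""")

def upsert(records):
    """Insert new rows and overwrite rows that were imported before."""
    conn.executemany(
        "INSERT OR REPLACE INTO item (source, source_id, title) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()

def import_json(source, payload):
    upsert([(source, str(r["id"]), r["title"]) for r in json.loads(payload)])

def import_csv(source, text):
    upsert([(source, r["id"], r["title"]) for r in csv.DictReader(io.StringIO(text))])

import_json("cinema_api", '[{"id": 1, "title": "Alien"}, {"id": 2, "title": "Brazil"}]')
import_csv("csv_export", "id,title\n1,Alien (1979)\n")
# Re-running an import with changed data updates the existing row.
import_json("cinema_api", '[{"id": 2, "title": "Brazil (1985)"}]')

rows = conn.execute(
    "SELECT source, source_id, title FROM item ORDER BY source, source_id"
).fetchall()
print(rows)
```

Note what this does not solve: it never notices deletions at the source, and it keeps "Alien" and "Alien (1979)" as two unrelated rows, which is exactly the cross-source matching problem described above.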
I'm not sure whether you want the application to display all the data in the same view or to have a different view for each source. If you want to display the data in the same view, such as a grid, I would recommend using inheritance or an interface, depending on your data and needs. I would also recommend setting this structure up in the database, using a separate table for each source and a parent table, related to all of them, that carries a type column.
Here's a good thread with discussion about choosing an interface or inheritance.
Inheritance vs. interface in C#
And here are some examples of representing inheritance in a database.
How can you represent inheritance in a database?
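A minimal sketch of the parent-table-with-type layout described above, using Python and SQLite. The table and column names (item, cinema_item, csv_item, screen_count, note) are invented for the example; the shared grid reads only from the parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (              -- parent: fields shared by every source
        id    INTEGER PRIMARY KEY,
        type  TEXT NOT NULL,         -- tells you which child table has the details
        title TEXT NOT NULL
    );
    CREATE TABLE cinema_item (       -- child: fields only the cinema source has
        item_id      INTEGER PRIMARY KEY REFERENCES item(id),
        screen_count INTEGER
    );
    CREATE TABLE csv_item (          -- child: fields only the CSV source has
        item_id INTEGER PRIMARY KEY REFERENCES item(id),
        note    TEXT
    );
""")

def add_item(item_type, title, child_table, column, value):
    """Insert the shared part into the parent and the rest into one child."""
    cur = conn.execute(
        "INSERT INTO item (type, title) VALUES (?, ?)", (item_type, title)
    )
    conn.execute(
        f"INSERT INTO {child_table} (item_id, {column}) VALUES (?, ?)",
        (cur.lastrowid, value),
    )

add_item("cinema", "Odeon", "cinema_item", "screen_count", 8)
add_item("csv", "Imported row", "csv_item", "note", "from file")

# The common grid only needs the parent table.
grid = conn.execute("SELECT id, type, title FROM item ORDER BY id").fetchall()
print(grid)
```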
I have 10 tables that I am importing into another SQL Server database using SSIS.
Do I have to create 10 different Data Flow tasks, or can I use one Data Flow task and add all 10 tables to it?
I have tried to use a single Data Flow task, but it only allows a single table.
Do all the source tables share one common schema?
Do all the destination tables share one common schema (which doesn't have to be the same as the common schema for the source tables)?
If the answer to both questions is "yes", then you can in fact write a single Data Flow Task (whose connection managers are parameterized) and put it in a Foreach Loop container.
If the answer to either (or both) of those questions is "no", then you'll have to have separate sources and destinations. You might want to investigate Business Intelligence Markup Language as a way to generate those data flows automatically, although it's probably overkill for "only" ten tables.
The answer depends upon you and your best practices and how many developers you will have working on projects at the same time.
It is entirely possible to put more than one set of tables in a single dataflow. You can simply add additional sources and destinations to your dataflow. However, this is almost never a good idea as it adds to the maintenance effort later in the lifecycle of your project. It makes it more difficult to find and debug errors. It makes the entire project more complex.
If you are working alone and you will be building and maintaining this project's full lifecycle by yourself, then by all means do whatever you feel most comfortable with.
If you are in a group whose members may all maintain this project, I would suggest that you, at a minimum, break the data flows for the different tables out into separate Data Flow tasks.
If you are in a larger group, or want more flexibility in maintenance, I would suggest breaking each data flow out into its own package so that different developers can work on them simultaneously. (This assumes SSIS 2008 or below; I have not played with the 2012 project models yet, so I won't comment on them here.) I would actually recommend coding this way even if you are the only one on the project, but that is just the style I have developed over my career.
I am working on an economics project where I need to manage a large amount of data in linked tables with many foreign keys.
I have a few years of experience as an Oracle DBA, so I can manage all of that without a problem,
but I need to be able to share the data with others who have no knowledge of databases.
Therefore I need to give them some graphical way to view the data.
Toad does everything I need and much more, but the program is too complicated for my needs.
Instead of wasting time writing a C# program to manage the data, I am looking for a good program that:
Shows main table/view data with the option to filter/group/order using drop-down options
Selects a row and shows data in linked tables by the selected row's data
Makes report/data pages of my selected rows that I can adjust and print. (I know it's a bigger request)
In addition, what is the most suitable and easy DB for this? I have only worked with Oracle, but it is surely too much for this, so MySQL? Access? (I first tried to do all of this in Access, but it is just too hard to adjust forms and so on, and in the end you cannot make a published version.)
Also, the tables will have start_date and finish_date columns, with rows that follow one another for the same ID.
I don't think that will be a problem, since at most I can make a view that shows only the last one. (BTW, what is the name for such a table? I never knew.)
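The "view that shows only the last one" could look roughly like the sketch below, written in Python against SQLite so that it is runnable. The price_history table, its columns and the sample rows are assumptions, as is the convention that the current row has a NULL finish_date; the view itself just picks the row with the greatest start_date per ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price_history (       -- hypothetical table for the example
        id          INTEGER NOT NULL,
        start_date  TEXT NOT NULL,     -- ISO dates, so text comparison works
        finish_date TEXT,
        price       REAL
    );
    INSERT INTO price_history VALUES
        (1, '2013-01-01', '2013-06-30', 10.0),
        (1, '2013-07-01', NULL,         12.0),
        (2, '2013-03-01', NULL,          7.5);

    -- Keep only the row with the latest start_date for each id.
    CREATE VIEW current_price AS
        SELECT h.*
        FROM price_history AS h
        WHERE h.start_date = (SELECT MAX(start_date)
                              FROM price_history
                              WHERE id = h.id);
""")

latest = conn.execute(
    "SELECT id, start_date, price FROM current_price ORDER BY id"
).fetchall()
print(latest)
```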
Take a look at Tableau http://www.tableausoftware.com
It will let users view the data many different ways, makes great visualizations and works with most databases. It's a read-only tool, so it's perfect for safe reporting, but you'll need to couple it with something else if you want your users to make changes to the data.
You can define a connection in Tableau that sets up the relationships for users that don't understand foreign keys and the like. Or make views that hide those details, of course, regardless of the tool you choose.
Does anyone have any pointers on how to go about creating a "wizard interface" using Access 2010? I need a sequential set of forms that will be capable of branching the flow based on answers from the user and data found in the database. I have used Access before for some CRUD/Reports type of applications, but in this case I can't seem to wrap my head around how to get started on such a complex machine.
Before anyone suggests it, I cannot use anything but Access due to client requirements.
I feel your pain ... working with Access gets so difficult when there are complex requirements.
1. Gather and document the requirements
Make sure you've teased out every possible wrinkle and contingency from the client, and put it into a flow chart or something.
2. Extract the models
Figure out what models are being used -- customers, addresses, vendors, products, etc. These will have to be created as tables or adapted to existing ones.
3. Extract other variables
What could potentially change over time and/or what will the client want to be able to change via an admin screen? You'll have to decide which of these variables to put into tables, and which are ok in the code (form logic and/or VBA).
4. Design the tables for the wizard views
I imagine you'll want a wizard screens table, where each row corresponds to a step; each should have (other than an id column) a previous screen column, and a form name or form template name column. You'll need a second table choices with a many-to-one foreign key linking back to screens; each row here will correspond to a possible outcome of the view, and the target next step in the wizard.
5. Design the forms
Finally, design the forms corresponding to each wizard step or template, pulling data from the structures in 1-4 as needed.
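To show how the screens and choices tables drive the branching, here is a small sketch of the lookup logic. It is written in Python purely for illustration (in Access this would be VBA reading the two tables), the form names and answers are invented, and the previous screen column is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Screen:
    id: int
    form_name: str

@dataclass(frozen=True)
class Choice:
    screen_id: int                  # foreign key back to the screens table
    answer: str                     # one possible outcome of that screen
    next_screen_id: Optional[int]   # None means the wizard is finished

# Stand-ins for the rows of the two tables (hypothetical form names).
SCREENS = {
    1: Screen(1, "frmCustomerType"),
    2: Screen(2, "frmBusinessDetails"),
    3: Screen(3, "frmSummary"),
}
CHOICES = [
    Choice(1, "business", 2),
    Choice(1, "individual", 3),
    Choice(2, "next", 3),
    Choice(3, "finish", None),
]

def next_screen(current_id, answer):
    """Look up the choice row for this screen and answer."""
    for choice in CHOICES:
        if choice.screen_id == current_id and choice.answer == answer:
            if choice.next_screen_id is None:
                return None
            return SCREENS[choice.next_screen_id]
    raise ValueError(f"no choice {answer!r} defined for screen {current_id}")

def walk(answers, start_id=1):
    """Return the form names visited for a sequence of answers."""
    current = SCREENS[start_id]
    path = [current.form_name]
    for answer in answers:
        current = next_screen(current.id, answer)
        if current is None:
            break
        path.append(current.form_name)
    return path

path_business = walk(["business", "next", "finish"])
path_individual = walk(["individual", "finish"])
print(path_business)
print(path_individual)
```

Because the flow lives in table rows rather than in form code, adding a branch means adding rows to choices instead of editing each form.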
I recently inherited a website which has a simple back-end area that was created using PHPMaker. The back-end displays various MySQL database tables.
There are two tables which hold registration information related to promotions/contests the company runs online. The client wants to begin archiving the registration data monthly, but still have the data accessible for future export or review.
So, can anyone tell me what the best approach would be to achieve this? I read about partitioning and Maatkit, but I'm not sure which - if either - would be a smart choice.
I would prefer to keep the table names the same because the table name is referenced in several instances within the PHP code running the promo/contest applications. I would also like for everything to be 'automatic' or at least executed at the click of a button; though I realize that might not be completely realistic.
I should note that I do not have the phpmaker project file and have been unable to obtain it.
Any help on this matter would be greatly appreciated.
mk-archiver (part of Maatkit) is a good way to archive live MySQL database tables.
It archives rows from a table to another table and/or a file.
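To be clear about what "archive rows from a table to another table" amounts to, here is a hand-rolled sketch of the idea in Python with SQLite. It is not mk-archiver and does not use its options; the table names, the registered_on column and the cutoff date are assumptions. The live table keeps its name, so the PHP code that references it would not need to change.

```python
import sqlite3

def archive_rows(conn, table, archive_table, date_column, cutoff):
    """Move rows older than `cutoff` from `table` into `archive_table`."""
    with conn:  # copy and delete commit together or not at all
        conn.execute(
            f"INSERT INTO {archive_table} "
            f"SELECT * FROM {table} WHERE {date_column} < ?",
            (cutoff,),
        )
        cur = conn.execute(
            f"DELETE FROM {table} WHERE {date_column} < ?", (cutoff,)
        )
    return cur.rowcount  # number of rows removed from the live table

# Hypothetical live and archive tables with identical columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrations (
        id INTEGER PRIMARY KEY, email TEXT, registered_on TEXT);
    CREATE TABLE registrations_archive (
        id INTEGER PRIMARY KEY, email TEXT, registered_on TEXT);
    INSERT INTO registrations VALUES
        (1, 'a@example.com', '2013-01-15'),
        (2, 'b@example.com', '2013-02-20'),
        (3, 'c@example.com', '2013-03-05');
""")

moved = archive_rows(conn, "registrations", "registrations_archive",
                     "registered_on", "2013-03-01")
print(moved)
```

Run once a month from a scheduled job, something along these lines gives the "automatic" behaviour asked for, with the archived rows still available for export from the archive table.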