The Slowly Changing Dimension task is not working in my data flow task, but if I remove it and use the OLE DB destination, the data is inserted. I have no idea why it is not inserting data into the table. The SCD task and the downstream (leaf) tasks are also not turning yellow or green.
I need to do inserts and updates on the table based on two columns.
This is what I have tried:
When I used only an OLE DB destination, everything worked fine.
When I used a lookup and sent the matched output to an OLE DB Command (where I run the update statement) and the no-match output to an OLE DB destination, nothing worked.
When I used the SCD with two columns as the business key and another as a changing attribute, nothing worked.
By "nothing works" I mean that these tasks do not even turn yellow or green.
Any idea what could be the reason for this?
Even when you get it to work, you will realise that it's slow as hell and missing lots of configuration options.
Therefore I recommend this handy component initiated and recommended by the godfather of ETL & data warehousing, Ralph Kimball.
I personally will never return to the SSIS standard component. Try it, it's awesome.
Related
I have a package that is essentially trying to copy 26 tables from Oracle to SQL Server.
It's not a complete table copy; we are looking for records that belong to certain 'Regions' of our company.
I pull the data from Oracle.
I started just doing this with elbow grease, but each of the 26 tables required several variables to do the deletes, the fetches, etc.
Long story short, I decided to use variables to represent the table names (source, temp and target).
This allowed me to copy/paste one sequence and effectively bypass a lot of clicking in BIDS.
The problem I am running into is that the metadata seems to be very fragile. The sequences all seem to run fine on their own, but when I run the whole package, it breaks, and never in the same place.
Is this approach just a bad idea with SSIS?
So just to take this off the board....
Each sequence container had the following operations:
Script task - set variables
Execute SQL task - delete from temp
Data flow SourceToTemp:
OLE DB source - a generic SELECT * FROM tbl into temp_tbl
Derived Column - add a timestamp column
OLE DB destination - map all the columns into a temp table (THIS IS THE BIG PROBLEM CHILD)
Execute SQL task - delete from target
Execute SQL task - insert into target, select from temp (both statements sketched below)
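For reference, those last two statements were essentially of this shape; the table names here are simplified placeholders, since in the package they were assembled from the table-name variables:

    -- Clear the target, then reload it from the temp table.
    -- Table and column names are placeholders; the real ones came from package variables.
    DELETE FROM dbo.Target_Tbl;

    INSERT INTO dbo.Target_Tbl (Col1, Col2, LoadTimestamp)
    SELECT Col1, Col2, LoadTimestamp
    FROM dbo.Temp_Tbl;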
The OLE DB destination is the piece that kept breaking.
Since it references variables, I had to be very careful at design time to set the variables correctly before opening one of the data flows.
I am pretty sure this is the problem. Since I cannot say with certainty when SSIS refreshes metadata in the design environment, I can't be sure if/when sequence X refreshed while the variables were set to support sequence Y.
So while it conceptually should work at run time, dev time is a change-control nightmare.
I have changed all the OLE DB destinations to point to a hard-coded table name. This is really a small concession, since there are four SQL statements that are still driven by variables (saving me a lot of clicking and typing).
This small change has eliminated the 'shifting sands' problem.
Take-away lesson: don't have an OLE DB destination based on a variable.
Thanks for the comments.
I'm trying to figure out how to convert my existing DTS files to DTSX hosted on a SQL 2005 server.
In my first try (first DTS) I'm already stuck.
I don't want to look at how things were done in DTS; I want to focus on the new DTSX techniques.
What has to be done:
Check if the input file exists, else exit (not done yet).
Truncate the destination table.
Import the file into the database.
Report whether everything went alright.
The import step is where I'm stuck. I have a fixed-width flat text file where the house number and extension are in a single column. The database has two columns for them.
I first tried a Derived Column, but couldn't find a way to split off the (leading) numeric part.
When searching for a way to use regex, I read about the Script Component, which I read isn't compatible with SQL 2005.
Is there another possibility?
This brings me to a second question: is it possible to use SQL Server Data Tools (SSDT) with SQL Server 2005?
You have a job ahead of you. You will need to use Derived Column transformations, so look them up. It will be helpful to add Data Viewers to your dataflows so you can see what data is moving through the flow, and what SSIS thinks is in there.
In your case, you are going to have to manipulate strings. There are two string data types, DT_STR and DT_WSTR (i.e., VARCHAR and NVARCHAR). SSIS is very particular about data types, and you may have to convert one type into the other using cast operations, e.g. (DT_WSTR, 50)Blah converts fifty characters from column Blah into DT_WSTR. The DT_STR type also needs to know the code page, e.g. 1252.
You will likely need the SSIS version of Immediate If, which checks whether a condition is true and returns either one result or the other: Blah==100?1:0. The result of this is 1 for those records where column Blah equals 100, and 0 for all other records. You can nest these Immediate If statements, one inside the next, and so forth.
You will need at least two new columns being created in the Derived Column widget, one for the numbers, and one for what follows. So here is one very painful way to do what you want. Use a string function to check whether the first character is a number, and then either return it, or don't. Do the same for the second character, etc.
Of course, I'm sure there's a better way to do this. Your toolbox consists of the functions in the Derived Column widget, so that's what you've got to work with. (Or, alternatively, you might do better in this case with a SQL UPDATE statement, which you would execute as a subsequent task in the Control Flow above, not in the Data Flow below.)
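If you go the UPDATE route, a rough sketch of that statement could look like the following; the table and column names (StagingAddress, CombinedColumn, HouseNumber, Extension) are only placeholders for whatever your staging table actually uses:

    -- Sketch only: copy the leading digits into HouseNumber and the remainder into Extension.
    -- PATINDEX finds the first non-digit character; the appended 'X' guards against all-digit values.
    UPDATE dbo.StagingAddress
    SET HouseNumber = LEFT(CombinedColumn,
                           PATINDEX('%[^0-9]%', CombinedColumn + 'X') - 1),
        Extension   = LTRIM(SUBSTRING(CombinedColumn,
                                      PATINDEX('%[^0-9]%', CombinedColumn + 'X'),
                                      LEN(CombinedColumn)));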
Fair warning: the SQL Server 2005 version of SSIS has many, many, many bugs and frustrations which were fixed or improved in later versions of SQL Server. Even if you just go to SQL Server 2008 you will save yourself many headaches.
I have a bit of a problem. When I set up an SSIS package and fire it off, it shows me the number of rows going into the SQL table, but when I query the table, almost 40,000 rows are missing compared to the last count after the conditional split I have in the package.
What causes this problem? Even if I load into a normal table or view, it still does the same thing, but here I have to use the fast load option because a lot of source files are being loaded. This is only testing before sending it to production, and I am stuck at the moment. Is there a way I can work around this problem and get all the data that is supposed to be pumped into the table? Please also note that the conditional split removes any NULL values, as seen in the first picture.
Check the Error Output (under Connection Manager and Mappings) within the destination component. If the error setting is set to Ignore Failure or Redirect Row, the component will succeed, but only the successful rows will be inserted.
What is the data source? Try checking your data and make sure you don't have any terminators stored in one of the rows.
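If the source data is somewhere you can query, a quick check along these lines (table and column names are hypothetical) will show whether any rows contain embedded carriage returns or line feeds:

    -- Sketch only: find rows with CR or LF characters embedded in a text column.
    SELECT *
    FROM dbo.SourceTable
    WHERE SomeTextColumn LIKE '%' + CHAR(13) + '%'
       OR SomeTextColumn LIKE '%' + CHAR(10) + '%';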
I have no idea whether this can be done or not, but basically, I have the following data flow:
Extracts the data from an XML file (works fine)
Simply splits the records based on an enclosed condition (works fine)
Had to add a Derived Column object due to some character set issues (there might be better methods, but it works)
Now "Step 4" is where I'm running into a scenario where I'd only like to insert the values that have a corresponding match in my database, for instance, the XML has about 6000 records, and from those, I have maybe 10 of them that I need to match back against and insert them instead of inserting all 6000 of them and doing the compare after the fact (which I could also do, but was hoping there'd be another method). I was thinking that I might be able to perform a sql insert command within the OLE DB DESTINATION object where the ID value in the file matches, but that's what I'm not 100% clear on or if it's even possible for that matter. Should I simply go the temp table route and scrub the data after the fact, or can I do this directly in the destination piece? Any suggestions would be greatly appreciated.
EDIT
Thanks to the last comment from billinkc, I managed to get a bit closer, where I can identify the matches and use that result set, but somehow it seems to be running the data flow twice, which is strange. I took the Lookup object out to see whether it was causing it, and somehow that seems to be the case. Any reason why it would run this entire flow twice with the addition of the Lookup? I should have a total of 8 matches, which I confirmed with the data viewer output, but then it seems to run a second time for the same file.
Is there a reason you can't use a Lookup transformation to find existing records? Configure it so that it routes non-matching records to the no-match output, and then only connect the match output to "Navigator Staging Manager Funds".
I believe that answers what you've asked, but I wonder if you're expressing the right desire. My assumption is that the lookup goes against the existing destination, so the lookup returns the id 10 for a row. All of the out-of-the-box destinations in SSIS only perform inserts, so a row that found a match would now get doubled. As you are looking for existing rows, that usually implies you'd want to perform an update to an existing row. If that's the case, there is a specially designed transformation, the OLE DB Command. It is the component that allows for updates. There is a performance problem with that component: it issues a single update statement per row flowing through it. For 10 rows, I think it'd be fine. Otherwise, the pattern you'd use is to write all the new rows (inserts) into your destination table and all of your changed rows (updates) into a second, staging-type table. After the data flow is complete, use an Execute SQL Task to perform a set-based update statement.
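That final Execute SQL Task would run a set-based update along these lines (the destination and staging table names and columns are placeholders):

    -- Sketch only: apply all staged changes to the destination in one statement.
    UPDATE d
    SET d.Col1 = s.Col1,
        d.Col2 = s.Col2
    FROM dbo.Destination AS d
    INNER JOIN dbo.Staging_Updates AS s
        ON s.BusinessKey = d.BusinessKey;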
There are third party options that handle combined upserts. I know Pragmatic Works has an option and there are probably others on the tasks and components site.
I would like to make a package that copies data from a table only if the table is not empty. I know how to do a count and how to make a package for copying data, but the problem is that a source can't have any inputs, so I don't know how to do it. Any suggestions?
I don't understand your comment about dragging a "green line from a package to a source", but instead of trying to determine in advance whether the table is empty, just do your copy anyway and then see how many rows were copied:
Create a package variable for the rowcount
Populate the variable using the Row Count transformation
Use an expression in the precedence constraint to check the variable: if it's greater than zero then continue executing the rest of your package
@Pondlife I don't think you can use a precedence constraint on the data flow task, can you?
I believe you can use it only on the control flow.
I would add a "Execute SQL Task" with the count, sending the result to a variable and from this task, I would drag the green arrow to the Data Flow task that makes the copy and on this arrow I would add the expression on the precedence constraint.
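The Execute SQL Task would just run a count like the one below, with the single-row result mapped to a package variable (User::SourceRowCount is only an example name) on the Result Set page; the precedence constraint expression then becomes @[User::SourceRowCount] > 0:

    -- Sketch only: count the rows to copy; map SourceRowCount to an SSIS variable.
    SELECT COUNT(*) AS SourceRowCount
    FROM dbo.SourceTable;   -- placeholder table name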
As you have correctly noted, a data flow source does not accept input, so one cannot perform logic in the data flow to determine whether this task should run.
Cannot create connector.
The destination component does not have any available inputs for use in creating a path.
However, there's nothing stopping you from setting up this logic in your control flow. I would use a query that hits the DMVs for a fast rowcount on the destination system, filtered to only the tables I wished to replicate.
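A sketch of such a DMV query (the table names in the filter are just examples):

    -- Sketch only: fast approximate row counts from the DMVs,
    -- filtered to the tables you care about.
    SELECT s.name AS schema_name,
           t.name AS table_name,
           SUM(ps.row_count) AS row_count
    FROM sys.dm_db_partition_stats AS ps
    INNER JOIN sys.tables AS t ON t.object_id = ps.object_id
    INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
    WHERE ps.index_id IN (0, 1)              -- heap or clustered index
      AND t.name IN ('Table1', 'Table2')     -- tables you wish to replicate
    GROUP BY s.name, t.name;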
Armed with the list of empty tables, how I'd handle it would probably depend on the number of tables. For a small number of tables, I'd define N data flows, all with a do-nothing Script Task as a precedent, and then use an expression on the table name to enable a path, much like I did in this question.
If there are many tables, I'd define a package per table and then invoke an Execute Package Task with the package name built dynamically based on the empty table name.