I have a bit of a problem. When I set up a SSIS package and i fire it off it shows me the amount of rows that is going into the SQL table, but when I query the table there is almost 40000 rows missing from what the last count was after the conditional split that I have in the package.
What causes this problem? Even if I have it on normal table or view it still does the same thing. But here I have to use the fastload option as it is a lot of source files being loaded. This is only testing before sending it to production and I am stuck at the moment. Is there a way I can work around this problem and get all the data that is supposed to be pumped into the table. please also take note that in the conditional split it removes any NULL values as seen in first picture.
Check the Error Output (under Connection Manager and Mappings) within Destination Component. If the Error setting is set to Ignore Failure or Redirect Row, the component will succeed, but only the successful rows will be inserted.
What is the data source? Try checking your data and make sure you don't have any terminators stored in one of the rows.
Related
Imagine that you want to save in a variable the number of rows the were updated or deleted in a table.
This is the steps that i did:
First, in the Control flow i created a Data Flow Task.
Them, in the Data Flow, i created a source(in my case is a excel file), then i proceeded to create two variables to count those rows- countDeleted and countUpdated, then connected the variables to two row count transformations, and them connected my destination (OLE DB).
Now in the control flow, what do i do??
Create a SQL execute task?? or a Script task?? What is the best way to do it?? What is the piece of code to use??
Thanks for youy help.
PS: i only have 4 weeks off SSIS, sorry for my noobieness :)
An OLD DB destination only inserts. It can't UPDATE or DELETE
What's your logic for updating or deleting?
If you're just starting out and reading about doing things in SSIS you will eventually find advice to use the OLE DB Command to perform row by row delete and inserts.
In my opinion this is to be avoided. It does not scale (works fine for small recorsets then fails for large recordsets), and it is difficult to maintain parameter mappings in the OLE DB Command. Although you should try it anyway to familiarise yourself with it.
My advice is to load the Excel data into a staging table, perform batch DELETE and UPDATE statements to load the data and use ##ROWCOUNT to capture the records updated.
For example;
Your existing described dataflow can be used to load into a table called StagingTable
Before your dataflow you should run an Execute SQL Task (This is in the Control Flow pane, not the Data Flow pane) that clears the staging table:
TRUNCATE TABLE StagingTable;
So first get that working - repeatedly running your package clears the staging table then loads Excel into it without creating duplicates
This in itself is a challenge as Excel is a terrible data interchange format.
Once you have that working, you add an execute SQL task to the end that runs some SQL that deletes the records you want and captures the count. For example:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
SELECT ##ROWCOUNT;
Then you follow the instructions here to load that back to your SSIS variable
http://microsoft-ssis.blogspot.com/2011/03/rowcount-for-execute-sql-statement.html
What are you doing with this row count? Are you writing it to a logging table? Save
yourself the bother of pulling it back into an SSIS variable and just write it directly:
DELETE FROM MyFinalTable WHERE PriamryKey IN (SELECT PrimaryKey FROM StagingTable);
INSERT INTO LogTable(Table,Operation,Type)
SELECT 'MyFinalTable','Delete', ##ROWCOUNT;
In my experience it is not a good idea to build convoluted logic into SSIS packages if you can instead do in a database. Although it does depend on the person who has to eventually maintain it. Hopefully you can appreciate that this T-SQL approach is a more straightforward code based approach as opposed to having to dig around in property pages and events and other places inside SSIS packages.
I assume that you're using an Execute SQL Task for the updates and deletes? As #Nick.McDermaid mentioned, using an OLE DB Command within a Data Flow presents various issues when performing DML. You can find the number of rows updated, inserted, or deleted in a table through an Execute SQL Task by using the ExecValueVariable property of this task. Set the variable that will hold the row count to this property and it will return the number of affected rows. Note that is will only return the number of rows impacted by the last statement in the Execute SQL Task, regardless of batches (i.e. GO separators) are in the component.
I've been searching for a quick way to do this after my first few thoughts have failed me, but I haven't found anything.
My Issue
I'm importing raw client data into an Access database where the flat file they provide is parsed and converted into a standardized format for our organization. I do this for all of our clients, but this particular client's software gives us a file that puts "(NULL)" in every field that should be NULL. lol as a result, I have a ton of strings rather than a null field!
My goal is to do a data cleanse of the entire TABLE, rather than perform the cleanse at the FIELD level (as I do in my temporary solution below).
Data Cleanse
Temporary Solution:
I can't add those strings to our datawarehouse, so for now, I just have a query with an IIF statement check that replaces "(NULL)" with "" for each field (which took awhile to setup since the client file has roughly 96 fields). This works. However, we work with hundreds of clients, so I'd like to make a scale-able solution that doesn't require many changes if another client has a similar file; not to mention that if this client changes something in their file, I might have to redo my field specific statements.
Long-term Solution:
My first thought was an UPDATE query. I was hoping I could do something like:
UPDATE [ImportedRaw_T]
SET [ImportedRaw_T].* = ""
WHERE ((([ImportedRaw_T].* = "(NULL)"));
This would be easily scale-able, since for further clients I'd only need to change the table name and replace "(NULL)" with their particular default. Unfortunately, you can't use SELECT * with an update query.
Can anyone think of a work-around to the SELECT * issue for the update query, or have a better solution for cleansing an entire table, rather doing the cleanse at the field level?
SIDE NOTES
This conversion is 100% automated currently (Access is called via a watch folder batch), so anything requiring manual data manipulation / human intervention is out.
I've tried using a batch script to just cleanse the data in the .txt file before importing to Access - however, this caused an issue with the fixed-width format of the .txt, which has caused even larger issues with the automatic import of the file to Access. So I'd prefer to do this in Access if possible.
Any thoughts and suggestions are greatly appreciated. Thanks!
Unfortunately it's impossible to implement this in SQL using wildcards instead of column names, there is no such kind syntax.
I would suggest VBA solution, where you need to cycle thru all table fields and if field data type is string, generate and execute SQL UPDATE command for updating current field.
Also use Null instead of "", if you really need Nulls in the field instead of empty strings, they may work differently in calculations.
I have a simple DataFlow with two objects the source which is a mdb file and the destination which is an MSSQL database.
The idea is to migrate the data from one to another.
The problem is that the data is extracted from an Access query, and one column has ~1000 characters, and in SSIS in advanced properties the external column has the default 255 length so when i execute the task it tries to truncate it. To disable the throw error on truncate is not an option, and modifying the Length of the external column cannot be done, it throws and error regarding the metadata.
First of all can anyone explain WHY?
Second of all i need a resolution and i need it fast because it's kinda driving me crazy.
This kind of problem occours, because the ssis task "guesses" the length of the column by inspecting the first 100(afaik) rows. So if all rows from 1 to 100 have a length of 10 and the row 101 has the legnth of 11, the task will fail, because the length was "guessed" to 10.
Modifying throws an error, because you have validateExternalMetadata set to true. To solve this problem, go to advanced options of your import task (access) and set the value to false.
This means, the task will accept modified values you entered without checking it.
Did you try to SSIS Import and Export Wizard to import the data, from within the BI development environment? That is the easiest way with MsAccess as this not only imports the data but also saves the package. If you get an error during the import ( using the wizard), please post it, as this helps in further investigation. Also, as #stb suggested, try having the first record over 1000 characters.
Access supports queries which are the equivalent to views in MSSQL.
The column size is defined not by looking at a few results but by the default column length of the column data type.
I created another table with the desired data types and before the data flow i've put in the package 2 sql scripts: one to delete all the data in the table and one to execute the query against the table, as to treat it as a temporary table.
Then the actual data flow is executed against this pseudo-temporary table.
This solved my problem.
I have no idea whether this can be done or not, but basically, I have the following data flow:
Extracts the data from an XML file (works fine)
Simply splits the records based on an enclosed condition (works fine)
Had to add a derived column object due to some character set issues (might be better methods, but it works)
Now "Step 4" is where I'm running into a scenario where I'd only like to insert the values that have a corresponding match in my database, for instance, the XML has about 6000 records, and from those, I have maybe 10 of them that I need to match back against and insert them instead of inserting all 6000 of them and doing the compare after the fact (which I could also do, but was hoping there'd be another method). I was thinking that I might be able to perform a sql insert command within the OLE DB DESTINATION object where the ID value in the file matches, but that's what I'm not 100% clear on or if it's even possible for that matter. Should I simply go the temp table route and scrub the data after the fact, or can I do this directly in the destination piece? Any suggestions would be greatly appreciated.
EDIT
Thanks to the last comment from billinkc, I managed to get bit closer, where I can identify the matches and use that result set, but somehow it seems to be running the data flow twice, which is strange.... I took the lookup object out to see whether it was causing it and somehow it seems to be the case, any reason why it would run this entire flow twice with the addition of the lookup? I should have a total of 8 matches, which I confirmed with the data viewer output, but then it seems to be running it a second time for the same file.
Is there a reason you can't use a Lookup transformation to find existing records. Configure it so that it routes non-match records to the no match output and then only connect the match found connector to the "Navigator Staging Manager Funds"
I believe that answers what you've asked but I wonder if you're expressing the right desire? My assumption is the lookup would go against the existing destination and so the lookup returns the id 10 for a row. All of the out of the box destinations in SSIS only perform inserts, so that row that found a match would now get doubled. As you are looking for existing rows, that usually implies you'd want to perform an update to an existing row. If that's the case, there is a specially designed transformation, the OLE DB Command. It is the component that allows for updates. There is a performance problem with that component, it issues a single update statement per row flowing through it. For 10 rows, I think it'd be fine. Otherwise, the pattern you'd use is to write all the new rows (inserts) into your destination table and then write all of your changed rows (updates) into a second staging-type table. After the data flow is complete, then use an Execute SQL Task to perform a set based update statement.
There are third party options that handle combined upserts. I know Pragmatic Works has an option and there are probably others on the tasks and components site.
I'm quite new to SSIS - using 2008 version.
I have a job that uses a few data flow tasks. On the third one I'm getting a primary key violation on the last row that it needs to insert, but only sometimes!
I'd like to ignore this problem for now and let the job continue. I have set the MaximumErrorCount property to 10 for the DataFlowTaks, the SequenceContainer and for the Package but still taks fails and this causes the package to stop.
Could anyone please advise how I can get the package to ignore the error?
Thanks
Rob.
That error count refers to the number of Tasks that SSIS will allow to error before it stops the package. You're wanting to allow a set number of rows to error - and that's not what it's counting.
Instead, you should go into your Destination and configure the Error Output on that destination to either ignore errors, or redirect errors (better). You can then pull a red arrow off the bottom of the destination component to a Derived Column (or any other type of component that doesn't need to attach its output to anything), and put a Data Viewer on that red link. Now all the rows that fail will go to the Derived Column, and show up in a Data Viewer for you to see (while in BIDS).
The other thing you'll have to do is change the Batch Size on the OLE DB Destination (if that's what you're using) to 1 so that it only inserts one row at a time. Otherwise, it will fail the whole batch that contains the error...