I am stuck on what is ultimately a simple task.
I have a process which loads files.
The process loads these files inside a for each container.
I need to get a row count of the file that is currently being processed inside the for each container, and if it is over a certain number of rows, fail the file.
I have tried a control flow task but that would ultimately bypass the for each loop.
The file currently being processed is determined via a variable in the for each container, and that is the one I need to count.
Any help would be appreciated.
Cheers
I would add a separate Data Flow in the For Each to count the records and then have an Expression and Constraint linking to your main process so that you only process record counts > 0. Roughly: ForEach Loop → counting Data Flow → precedence constraint (Expression and Constraint) → your main process.
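If you'd rather skip the extra Data Flow, a Script Task inside the loop can do the count as well. A minimal sketch, assuming hypothetical variables User::CurrentFile (your ForEach variable), User::RowCount and User::TooManyRows, and an arbitrary 1000-row limit:

```csharp
// Script Task (C#) inside the ForEach Loop.
// ReadOnlyVariables: User::CurrentFile
// ReadWriteVariables: User::RowCount, User::TooManyRows
public void Main()
{
    string path = Dts.Variables["User::CurrentFile"].Value.ToString();
    int rowCount = 0;

    // Count every line in the current file (including any header row).
    using (var reader = new System.IO.StreamReader(path))
    {
        while (reader.ReadLine() != null)
        {
            rowCount++;
        }
    }

    Dts.Variables["User::RowCount"].Value = rowCount;
    Dts.Variables["User::TooManyRows"].Value = rowCount > 1000;   // example threshold

    Dts.TaskResult = (int)ScriptResults.Success;
}
```

Either way, the precedence constraint into your main process is set to Expression and Constraint, with an expression such as @[User::RowCount] > 0, or @[User::TooManyRows] == FALSE if you go the script route.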
I've been using LOAD CSV with neo4j for some time now to import data, but I think (not sure) I noticed that LOAD CSV will start importing rows from the bottom of the CSV file.
Or is it completely random?
I'm trying to create an (org)-[:has_suborg]->(subOrg) relationship while I'm processing each row, but I want to make sure that the parent orgs are created first, to avoid exceptions/errors when a sub-org is related to a parent org that is not present yet.
If rows are processed from the top or the bottom I can then make sure that my csv records are already sorted the way I want them processed.
Thanks in advance
The CSV will be processed from top to bottom - in that order. What might be worth considering is doing a double load of your data.
First pass just CREATE/MERGE your org nodes. Second pass, MATCH the org nodes, then create the rest of the data.
Using this approach you will avoid any potential order issues, as well as dodging eager queries.
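Here's a rough sketch of that double load driven from C# with the official Neo4j .NET driver; the bolt URI, credentials, file name and the orgId / subOrgId headers are just placeholder assumptions:

```csharp
// Two-pass LOAD CSV: pass 1 creates all org nodes, pass 2 only creates relationships,
// so row order in the CSV no longer matters.
using System.Threading.Tasks;
using Neo4j.Driver;

class TwoPassOrgImport
{
    static async Task Main()
    {
        using var driver = GraphDatabase.Driver("bolt://localhost:7687",
            AuthTokens.Basic("neo4j", "password"));
        var session = driver.AsyncSession();
        try
        {
            // Pass 1: make sure every org node exists, parents and sub-orgs alike.
            await session.RunAsync(@"
                LOAD CSV WITH HEADERS FROM 'file:///orgs.csv' AS row
                MERGE (:Org {id: row.orgId})
                MERGE (:Org {id: row.subOrgId})");

            // Pass 2: every node is now present, so just MATCH and relate them.
            await session.RunAsync(@"
                LOAD CSV WITH HEADERS FROM 'file:///orgs.csv' AS row
                MATCH (o:Org {id: row.orgId})
                MATCH (s:Org {id: row.subOrgId})
                MERGE (o)-[:has_suborg]->(s)");
        }
        finally
        {
            await session.CloseAsync();
        }
    }
}
```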
I have an Excel spreadsheet with multiple sheets. Inside a For Each Loop container I have a script which reads the sheet names and saves them to a variable. A Data Flow, still inside the For Each Loop container, leads to a Recordset Destination that saves all the columns to another variable. Then, outside the For Each Loop container, another Data Flow has to read all the rows from that variable, check for duplicates (the second and third sheets contain a duplicate product ID), remove the duplicates, and upload the data into the database. I have been searching everywhere and cannot find how to set up the Recordset Destination to append the values to the variable instead of replacing it, because I end up with only the last sheet of data.
I can't change the For Each Loop container's settings because of the looping through the sheets.
Thank you in advance for any advice.
Hopefully someone wiser in SSIS will chime in here, but I don't think your current approach will work.
Generally speaking, you can use Expressions within SSIS to get dynamic behaviour. However, the VariableName of the Recordset Destination does not support that.
You might be able to have a Script Task after the Data Flow that copies from rsCurrent into rsIteration1, rsIteration2, etc. based on the current loop iteration, but at that point you're double-copying data for no real value.
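If you do go that route, the copy itself is only a few lines. A rough sketch, assuming hypothetical Object variables User::rsCurrent (filled by the Recordset Destination each iteration) and User::rsAll (the accumulator the later Data Flow reads):

```csharp
// Script Task (C#) placed after the Data Flow, still inside the ForEach Loop.
// ReadOnlyVariables: User::rsCurrent   ReadWriteVariables: User::rsAll
public void Main()
{
    // The Recordset Destination stores an ADO recordset; Fill converts it to a DataTable.
    var adapter = new System.Data.OleDb.OleDbDataAdapter();
    var current = new System.Data.DataTable();
    adapter.Fill(current, Dts.Variables["User::rsCurrent"].Value);

    // First iteration: seed the accumulator. Later iterations: append the new rows.
    var all = Dts.Variables["User::rsAll"].Value as System.Data.DataTable;
    if (all == null)
    {
        Dts.Variables["User::rsAll"].Value = current;
    }
    else
    {
        all.Merge(current);
        Dts.Variables["User::rsAll"].Value = all;
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
```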
Since you're doing a duplicate check, perhaps the read of sheet 1 goes into a Cache Connection Manager, and then the reads of the subsequent sheets use the CCM as a Lookup. For rows that have matches, you know you have duplicates (or maybe you only import what doesn't match; I don't quite follow your logic).
Debugging some of this is going to be quite challenging. If at all possible, I would stage the data to tables. There you could load all the data + the tab name and then you can test your deduplication and refer back to your inputs and outputs.
The tooling for SSIS variables of type Object is pretty limited, which is a pity.
I have a process that outputs multiple instances of the same report, something like packing slips for instance. Right now it is relatively slow because the whole process is repeated for each iteration: create the data, call the SSRS report, output it, clean up the report instance and the data, and start again. It has a lot of overhead and unnecessary repetition.
An idea I have is to create all the data beforehand and add a document number to the dataset to differentiate between documents, and then call the SSRS report once to output all the documents in one big document. So all the documents are separate, with their own header and footer, but created in one go and in one file.
The thing I am looking for is a way to iterate at the document level.
Is something like this possible with SSRS?
I have been googling; is it something I could solve with subreports?
Yes, you can do this with SSRS. The way I usually go about this style of report is to first create a table with a single cell in it. Set the row to be grouped by document ID. Set the group to page break between instances. Now place a large rectangle inside the cell. Place all the elements you want to have on each page inside the rectangle. Note that you would not use actual headers and footers, just textboxes, tables, etc. No subreports needed.
Now when you run the report, you will get one copy of the entire layout per page and each page is naturally filtered to the containing document ID. Since you are letting the report split up the data for you, you can let your dataset be one large query instead of many smaller ones. This will drastically improve your efficiency.
I have a folder with a lot of data files in it. I want to be able to loop through the files, look at the headers, and sort them into folders if they have the same headers. Is that possible to do in SSIS? If so, would anyone be able to point me in the direction of how to do this?
I am going to try and explain this as best I can without writing a book, as this is a multi-step process that isn't too complex but might be hard to explain with just text. My apologies, but I do not have access to SSDT at the moment, so I cannot provide images to aid here.
I would use the TextFieldParser class from Microsoft.VisualBasic.dll in a Script Task. This will allow you to read the header from the file into a string array. You can then build the string array into a delimited column and load an object variable with a DataTable populated with two columns: the first being the file name and the second being the delimited headers.
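A rough sketch of that Script Task in C# (the folder variable, the *.csv mask and the comma delimiter are assumptions; reference Microsoft.VisualBasic in the script project):

```csharp
// Script Task (C#): read only the header row of each file and collect
// file name + delimited header list into a DataTable.
// ReadOnlyVariables: User::SourceFolder   ReadWriteVariables: User::FileHeaders
public void Main()
{
    var headerTable = new System.Data.DataTable();
    headerTable.Columns.Add("FileName", typeof(string));
    headerTable.Columns.Add("HeaderColumns", typeof(string));

    string folder = Dts.Variables["User::SourceFolder"].Value.ToString();

    foreach (string file in System.IO.Directory.GetFiles(folder, "*.csv"))
    {
        using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(file))
        {
            parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
            parser.SetDelimiters(",");

            // ReadFields returns the first row as a string array: the header.
            string[] header = parser.ReadFields();
            headerTable.Rows.Add(file, string.Join("|", header));
        }
    }

    Dts.Variables["User::FileHeaders"].Value = headerTable;
    Dts.TaskResult = (int)ScriptResults.Success;
}
```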
Once you have this variable, you can load a SQL table with this information. (Optional: skip this if you want to load the columns directly into SQL as you read them; your call.)
Once you have your SQL table, you can create an enumerator for that dataset based on the unique headers column.
Then use a Foreach Loop container with a Script Task to enumerate through the unique header sets. Use an Execute SQL Task to retrieve the file names that belong to the current header set.
Within the script, loop through the returned file names and apply the necessary logic to move the files to their respective folders.
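The move itself is only a few lines; a sketch, assuming the Execute SQL Task writes its Full result set into a hypothetical Object variable User::FileList through an ADO.NET connection, and User::TargetFolder names the destination for the current header set:

```csharp
// Inner Script Task (C#): move every file that belongs to the current header group.
// ReadOnlyVariables: User::FileList, User::TargetFolder
public void Main()
{
    string target = Dts.Variables["User::TargetFolder"].Value.ToString();
    System.IO.Directory.CreateDirectory(target);   // no-op if it already exists

    // With an ADO.NET connection, a Full result set lands in the variable as a DataSet.
    var files = ((System.Data.DataSet)Dts.Variables["User::FileList"].Value).Tables[0];

    foreach (System.Data.DataRow row in files.Rows)
    {
        string source = row["FileName"].ToString();   // column name is an assumption
        string destination = System.IO.Path.Combine(target, System.IO.Path.GetFileName(source));
        System.IO.File.Move(source, destination);
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
```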
This is sort of a high-level overview, as I am assuming you are familiar enough with SSIS to understand what is necessary to complete each step. If not, I can elaborate later in the day when I am able to get to my SSIS rig.
I have a file that contains a header record in the 1st row, and I need to evaluate only that one row. The rest of the rows are detail records. I have it set up now as a flat file source component into a conditional split. The job of the conditional split is to look for the string "header" and then read the next column to get the count. I send the header record on and discard the rest.
This works fine, but for my bigger files (300k-800k rows, 100MB-900MB) I have to wait for the conditional split to evaluate every row, and this takes time.
Is there a better approach? I guess I could use a script component and break after the 1st row, but I'd like to see if there's a built-in solution before I script it up.
Wondering if anyone knows of an alternative.
Go for the script component. This is the simplest solution for this task.
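Whether you do it as a source-style Script Component or as a Script Task ahead of the data flow, the core is the same: open the file, read one line, stop. A Script Task sketch, with the variable names and the pipe delimiter purely as assumptions:

```csharp
// Script Task (C#): read only the header record; the detail rows are never touched.
// ReadOnlyVariables: User::HeaderFilePath   ReadWriteVariables: User::ExpectedCount
public void Main()
{
    string path = Dts.Variables["User::HeaderFilePath"].Value.ToString();

    string headerLine;
    using (var reader = new System.IO.StreamReader(path))
    {
        headerLine = reader.ReadLine();   // only the first line is read
    }

    // Example layout: "header|12345" where the second column carries the record count.
    string[] fields = headerLine.Split('|');
    Dts.Variables["User::ExpectedCount"].Value = int.Parse(fields[1]);

    Dts.TaskResult = (int)ScriptResults.Success;
}
```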