SSIS Loop container: rename file name on a Monday

I am busy setting up an SSIS package. I use a Foreach Loop container to pick up a CSV and load it into a staging SQL table, then do my normalization with joins, push to production, archive my files with another loop container and variables, and clean up my staging environment.
Everything works perfectly except on a Monday (or after a day off). My files have the same name every day, and I add a timestamp when archiving.
The issue is that after a day off I need to pick up two or more files. Windows appends an incrementing number in brackets, and although I can get these files imported, I cannot get the rename loop to find them (note that my move to staging and my rename live in different dtsx steps). As a result these files are never renamed and never moved to archive.
I am guessing the issue is my filename variable, as it is static; I likely need a wildcard. FileImport.csv is my normal file name and my variable value, but I sometimes have FileImport (1).csv, so I need something equivalent to FileImport*.csv.

Source Connection
Figured it out: first, fix the connection to point at the folder; then, instead of using a variable for the source connection, use the connection manager. Working as intended now.
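If you do still need the wildcard route instead, here is a minimal Script Task sketch (User::SourceFolder is a hypothetical variable name, not from the original package) that stamps every file matching FileImport*.csv, including the bracketed Windows duplicates:

    // Hypothetical rename loop: timestamp every FileImport*.csv in the folder,
    // so "FileImport.csv" and "FileImport (1).csv" are both picked up.
    using System;
    using System.IO;

    public void Main()
    {
        string folder = Dts.Variables["User::SourceFolder"].Value.ToString();
        foreach (string path in Directory.GetFiles(folder, "FileImport*.csv"))
        {
            string stamped = Path.GetFileNameWithoutExtension(path)
                           + "_" + DateTime.Now.ToString("yyyyMMdd_HHmmss") + ".csv";
            File.Move(path, Path.Combine(folder, stamped));
        }
        Dts.TaskResult = (int)ScriptResults.Success;
    }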

Related

SSIS empty Excel columns causing error

Using Microsoft Visual Studio Community 2015.
Goal of project
-create "*\temp\email" directory
-start a program to extract all xls attachments from emails into the previously created folder
-use a Foreach Loop to cycle through each file in the folder, process it, and load it into a SQL table.
The problem I am running into is caused either by a blank Excel document (which is occasionally sent from a remote location) or by some of the original xls reports containing only 5 columns instead of the 6 I have mapped. Is there any way to separate files that include the correct columns from those that do not match?
** As long as these two problems do not occur, the SSIS package runs without issue.
Control flow:
File System Task (creates directory) ---> Execute Process Task (xls extraction) --> ForEach Loop (Data Flow Task "email2Sql")
Data flow:
Excel Source (connection's ExcelFilePath property set by the expression @[User::FilePath]), DelayValidation == true
(Columns are initially set to F1-F6 and are mapped to, for example, a, b, c, d, e, f. The older files that get mixed in only include a, b, c, d, e.) This is where I want to be able to separate the xls files.
Conditional Split transformation (column names are not in row 1; this helps remove "null" values)
OLE DB Destination (SQL table)
Sorry for the amount of reading, but for the first post I tried to include anything that I thought may be relevant.
There are some tools out there that would allow you to open the Excel doc and read it. However, I think the simplest thing to do would be to use SSIS out of the box:
1 - add a File System Task after the data flow that reads the file.
2 - make the precedence constraint from the data flow to the File System Task "failure". This will cause it to fire only when the data flow task fails.
3 - set the File System Task to move the "bad" files to another folder.
This will allow you to loop through all the files and move the failed ones. Ultimately, the package will end in failure. If you don't want that behavior, you can change the ForceExecutionResult property to Success. However, it might be good to know that there were problems with some files so that they can be addressed.
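If you would rather inspect each file up front instead of relying on the failure path, a minimal Script Task sketch could check the column count before the data flow runs. This assumes the ACE OLE DB provider is installed; the helper name and the expected count of 6 are illustrative, not the OP's exact setup:

    // Hypothetical pre-check: open the first worksheet and count its columns.
    using System.Data.OleDb;

    public static bool HasExpectedColumns(string filePath, int expectedColumns)
    {
        string connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath
                       + ";Extended Properties=\"Excel 8.0;HDR=NO\"";
        using (var conn = new OleDbConnection(connStr))
        {
            conn.Open();
            // The first row of GetSchema("Tables") holds the first worksheet's name.
            string sheet = conn.GetSchema("Tables").Rows[0]["TABLE_NAME"].ToString();
            using (var cmd = new OleDbCommand("SELECT * FROM [" + sheet + "]", conn))
            using (var reader = cmd.ExecuteReader())
            {
                return reader.FieldCount == expectedColumns; // e.g. 6 mapped columns
            }
        }
    }

Files where this returns false can then be routed to the "bad" folder, and the rest flow into the data flow as usual.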

Adding static files to Talend jobs

I'm using Talend Open Studio for Big Data and I have a job where I use tFileInputDelimited to load a CSV file and use it as a lookup with a tMap.
Currently the file is loaded from the disk using an absolute path (C:\work\jobs\lookup.csv) and everything works fine locally.
The issue is that when I deploy the task, it obviously doesn't take the lookup.csv file with it.
Which raises the question:
Is there any way to "bundle" this file (lookup.csv) into the job so I can later deploy them together?
With static data such as this, your best bet is to hard-code the data into the job using a tFixedFlowInput instead.
As an example, if we want to use a list of country names with their ISO2 and ISO3 codes, you might have these in a CSV that you'd normally access with a tFileInputDelimited. However, to save bundling this CSV with every build (which could be done with Ant/Maven), you can just hard-code this data into a tFixedFlowInput:
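For illustration, the component's inline content would hold delimited rows like these (hypothetical values):

    France;FR;FRA
    Germany;DE;DEU
    Japan;JP;JPN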
You then just need to make sure your schema is set up the same as your delimited file would have been (so in this case we have 3 columns: Country_Name, ISO2 and ISO3).

The connection "C:\\<path>\\*.txt" is not found. This error is thrown by Connections collection when the specific conn element is not found

I developed an SSIS package that creates several .txt files. These files are zipped, and then the .txt files need to be removed. Using a Foreach File Enumerator, I loop through all the .txt files in a specific folder. The folder is retrieved from a variable in the configuration and looks something like: C:\Folder\
The foreach loop uses *.txt to gather all .txt files, does not traverse subfolders and uses the fully qualified name.
In the Variable Mappings, the "FileName" variable gets filled with index 0.
Within the foreach loop I use a File System Task.
This task removes the .txt files generated earlier, using the FileName variable that is filled by the loop.
On the development machine this runs like a charm. All green, no problem at all. Then I copied the package and the configuration file to the test environment. A basic version without the file removal had been running perfectly fine there. I replaced the package. Nothing big.
Now I run the SQL Server Agent job and it starts running. I can see all the text files appearing, and disappearing after the zip files are created. However, once all files are removed, the package ends with errors. Namely, the error shown in the title.
I tried looking for a connection manager that might have been removed, and looked for connection managers named in the config that don't exist in the package.
No such thing found. The annoying part is that the package is fully functional, but it still ends with the error.
EDIT: I noticed that if I run the package using the Execute Package Utility with the dev config, it gives the same errors.
Hopefully someone is able to help me out.
Thanks in advance!
I managed to "fix" the issue: remove the File System Task responsible for deleting the files, then add it again and configure it again.
I think this happens if you accidentally change the General parameters before changing the Operation parameter. The task holds on to metadata for now-irrelevant parameters and upon execution says: "Wait, you defined this parameter, but I don't need it; I'm checking for it anyway, and it's not there!"
It's a bug for sure.
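As a workaround, the delete itself can also be done from a Script Task inside the loop, which sidesteps the File System Task's cached metadata entirely. A minimal sketch, assuming User::FileName is the loop variable from the question:

    // Hypothetical alternative: delete the loop's current file directly.
    using System.IO;

    public void Main()
    {
        string fileName = Dts.Variables["User::FileName"].Value.ToString();
        if (File.Exists(fileName))
        {
            File.Delete(fileName);
        }
        Dts.TaskResult = (int)ScriptResults.Success;
    }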

Force delete a file using SSIS on a network location

I am facing a problem deleting a file on a network location using SSIS. It is a zip file containing a monthly SQL database backup, so I need to delete last month's file before copying the current month's file.
Maybe some app was using the file, I am not sure, but I want to get rid of it so that I can copy the new file.
Thanks
Use a File System Task and you should be able to delete pretty much anything in any location, as long as you have the rights to do so.
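If the delete still fails because the file is marked read-only, a minimal Script Task sketch can clear the attribute first (the share path below is hypothetical). Note that a file held open by another process cannot be deleted until that lock is released:

    // Hypothetical sketch: clear the read-only attribute before deleting,
    // in case that is what blocks the File System Task on the share.
    using System.IO;

    public void Main()
    {
        string path = @"\\server\share\backup.zip"; // hypothetical network path
        if (File.Exists(path))
        {
            File.SetAttributes(path, FileAttributes.Normal); // drop read-only flag
            File.Delete(path);
        }
        Dts.TaskResult = (int)ScriptResults.Success;
    }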

How do I load Excel files selectively?

I have an SSIS package that needs to look up two different types of Excel files, type A and type B, and load the data within into two different staging tables, tableA and tableB. The formats of these Excel sheets are different, and they match their respective tables.
I have thought of putting typeA.xls and typeB.xls in two different folders for simplicity (folder paths to be configurable). The required Excel files will then be put there by some other application or manually.
What I want is for my dtsx package to scan the folder, pick the latest unprocessed file, load it while ignoring the others, and then postfix the file name with '-loaded' (typeAxxxxxx-loaded.xls). The '-loaded' in the filename is how I plan to differentiate the already loaded files from the ones yet to be loaded.
I need advice on:
a) How do I check the configured folder for the latest file, i.e. the one without '-loaded' in the filename, and load it? And then, after loading it, how do I rename that same file in the configured folder with '-loaded' postfixed?
b) Is this the best approach to doing this or is there a better way?
Thanks.
You can do it this way, but it might require several complex string expressions.
E.g. create a Foreach Loop over .xls files; inside the loop add an empty Script Task, then a data flow to load the file. Connect them with a precedence constraint and make it conditional: the precedence constraint expression will check that the file name does not end with -loaded.xls. You may do this either in the Script Task or purely with an SSIS expression on the precedence constraint. Finally, add a File System Task to rename the file. You may need to build the new file name with another expression.
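A minimal sketch of that precedence constraint expression, assuming the Foreach Loop maps the current file name into a hypothetical @[User::FileName] variable:

    RIGHT(@[User::FileName], 11) != "-loaded.xls"

The rename step could then build the new name with another expression along the lines of REPLACE(@[User::FileName], ".xls", "-loaded.xls").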
It might be easier to create two folders: Incoming for new unprocessed files, and Loaded for the files you've processed, and just move the .xls to the Loaded folder after processing, without renaming. This avoids the first conditional expression (and the dummy Script Task), and simplifies the configuration of the File System Task.
You can get the SQL File Watcher Task and add it to your SSIS package. I think this is a cleaner way to do what you want.
SQL File Watcher