Data flow task not completing and the generated flat file is left in a locked state - SSIS

I have an SSIS package deployed on a 64-bit machine. The package runs fine when there is a small number of records to be extracted and written to a file. We are using a data flow task to write to the file. However, when we run the package for a large data extract, the data flow task does not complete and the file stays locked. Please suggest a solution for this.

Are you logging the progress of your package? Do you see anything in there? If not, can you log the progress?
If you are writing to a UNC, I would suggest writing locally and then moving the file where you want.

Some debugging directions: Check whether you get the same problem with a different type of target, such as Excel or a SQL table. Check which process is currently using the file (use a tool like Process Explorer for this) and check whether the file shows any contents at any intermediate stage. By the way, are you using transactions?
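If you would rather check from code than with Process Explorer, a small helper along these lines (not part of the original answer; the path in the usage comment is an assumption) can tell you whether some process still holds the file open:

```csharp
using System;
using System.IO;

// Illustrative helper: returns true if another process still holds the file open,
// detected by attempting an exclusive open.
static bool IsFileLocked(string path)
{
    try
    {
        using (File.Open(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            return false; // exclusive open succeeded, so nothing else has the file
        }
    }
    catch (IOException)
    {
        return true; // the open failed, most likely because another handle is still open
    }
}

// Example usage (hypothetical path): Console.WriteLine(IsFileLocked(@"D:\Extracts\output.txt"));
```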

Related

BIML scripts fail package generation on second attempt with no changes to files

First time posting a question, please forgive if I don't have enough information.
I have a tiered BIML Script that has the following tiers:
10 - Connection – create the connection nodes
20 - Model – loop through the connections to build database and table nodes
30 - Create/Drop Staging Tables – this is included as the package(s) need to be run prior to the remainder of the creation process
30 - Flat File – loop through the table objects to create flat file formats and connections
40 - Packages – loop through the table objects and create extract and load packages
45 - Project Params & Connections – attaches the project params and connections (using named connections and GUIDs from 10 - Connections). Project params manually created in SSIS.
The process successfully connects to the source SQL Server database, generates the Create/Drop Staging Tables packages with correct metadata, and will create the extract packages successfully the first time.
Upon a second attempt to process the same BIML scripts with no changes made to the files, the process fails with “Object reference not set to an instance of an object.” & “Unable to Query on Connection” on the OleDBSource Node.
The BIML files generated in preview and output debugging have valid queries and source metadata that indicate a positive connection and a proper model. I have run the emitted queries in SSMS without error. When I move the BIML files to a new project, the process is successful the first time and fails subsequently.
I have tried the following:
Connection Managers
Delete project connection managers prior to package re-generation
GUIDs annotated and used in PackageProject and Packages Nodes.
Delay Validation/Validate External Metadata – I have tried with both true and false on Package, DFT and OleDBSource
Project
Delete .proj files from directory
Direct PackageProject to new ProjectSubpath
I also tried simply hard-coding the BimlScript to simplify it and remove any variables, with the same result.
The most maddening point is that the metadata and queries all indicate the process can connect and query this exact table and it functions, but only on initial creation. Adding or re-generating during testing fails. Anyone ever come across this before?
Great thanks and shout out to cathrine-wilhelmsen, billinkc, whose posts and tutorials have been very helpful. Any and all help would be greatly appreciated.
I changed the driver from SQLNCLI11 to SQLOLEDB with no other changes to the code. I tested different drivers after seeing a few example connection strings that used different drivers.
I wish I could explain why this works.
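For illustration only (the server and database names are placeholders, not from the original post), the change amounts to swapping the Provider keyword in the OLE DB connection string:

```csharp
// Illustrative connection strings; only the Provider keyword differs.
string before = "Provider=SQLNCLI11;Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;";
string after  = "Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;";
```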

Feedback requested for SSIS Master package design - Running a bunch of Sub-Packages

Overall, I am looking for feedback regarding two different design options of running a master package.
I have one package that SQL Server Agent calls, which runs a bunch of packages that process data (I think we are up to about 50 now).
The original design was to group packages into smaller chunks called directorates which call the actual packages. Sample below:
A few perceptions I see (and have experienced) with this approach are that:
1. Every package has to open (even if it does not need to run, i.e. no file is present)
2. #1 adds a lot of time to the overall process
3. It definitely runs in parallel
So I developed a new approach which only runs the packages that have the necessary files and logs the attempt if they don't. It is so much cleaner, and you don't need all the file connections for each package to run since you are iterating through them.
I am not sure it runs in parallel (I actually doubt it).
I am adding the data flow that populates the ADO object that is iterated in the Foreach Loop to demonstrate the files being processed.
Note: Usually in DEV environment there are not many files to be processed, however, when deploying to TEST and PROD there will be most files present to be processed.
Can I get some feedback on these two different approaches?
Anyone who provides productive feedback will receive upvotes!!!
I would go with a modified first approach, i.e. something like this: inside each package, use a Script Task to check whether the files are present at the destination or not.
For instance:
Create a Script Task and a Boolean variable.
Inside the Script Task, write code that sets the flag (the logic is: if the file is found, flag it as true, else flag it as false); a sketch follows these steps.
Now constrain the execution of the DFT with a precedence constraint expression on this flag.
The only con is that you'll have to make changes in 50 packages, but this is a one-time activity. Your parallel execution will remain intact.
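A minimal sketch of the Script Task body (C#), assuming a read-only string variable User::FilePath and a read-write Boolean variable User::IsFileFound; both names are illustrative, not from the original answer:

```csharp
// Inside the SSIS Script Task. User::FilePath and User::IsFileFound must be listed in the
// task's ReadOnlyVariables / ReadWriteVariables respectively.
public void Main()
{
    string filePath = Dts.Variables["User::FilePath"].Value.ToString();

    // Flag true if the expected file exists, false otherwise.
    Dts.Variables["User::IsFileFound"].Value = System.IO.File.Exists(filePath);

    Dts.TaskResult = (int)ScriptResults.Success;
}
```

The precedence constraint between the Script Task and the DFT can then be set to "Expression and Constraint" with an expression that evaluates the flag, e.g. @[User::IsFileFound] == TRUE.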
I would go with the 2nd approach as it's cleaner and easier to debug.
Here are some suggestions to improve the 2nd approach:
Create a control table with all package names, an Enable/Disable flag, and a FileAvailable flag.
Create a poll package which goes through the files and sets the FileAvailable flag for each package accordingly.
Loop through this control table and run only the packages that are enabled and have a file available (a sketch of the polling step follows).
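A minimal sketch of the polling step in C# (e.g. inside a Script Task); the table dbo.PackageControl, its columns, and the connection string are assumptions for this example, not a prescribed schema:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;

// Reads the control table, checks for each enabled package whether its file exists,
// and records the result in the FileAvailable flag.
static void PollFiles(string connectionString)
{
    var expectedFiles = new Dictionary<string, string>(); // package name -> expected file path

    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        using (var cmd = new SqlCommand(
            "SELECT PackageName, FilePath FROM dbo.PackageControl WHERE IsEnabled = 1", conn))
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                expectedFiles[reader.GetString(0)] = reader.GetString(1);
        }

        foreach (var entry in expectedFiles)
        {
            using (var update = new SqlCommand(
                "UPDATE dbo.PackageControl SET FileAvailable = @available WHERE PackageName = @name", conn))
            {
                update.Parameters.AddWithValue("@available", File.Exists(entry.Value));
                update.Parameters.AddWithValue("@name", entry.Key);
                update.ExecuteNonQuery();
            }
        }
    }
}
```

The master package then loops only over rows where IsEnabled = 1 and FileAvailable = 1 and calls each package with an Execute Package Task.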

SSIS 14 issue - Not enough disk space

I was running my SSIS package to perform some transformations in the staging area when I got the following error:
Error: The buffer manager cannot extend the
file "C:\...\AppData\Local\Temp\3\DTS{CFB0A22A-23C4-4111-BE80-84643E8E579E}.tmp" to length 1046308 bytes.
There was insufficient disk space.
This error happens in a Lookup Object before the OLE DB Destination.
I'm not the technical guy here, but I need to load my DW until next week! What can I do in this situation? Can I point SSIS to another drive on my machine? How can I do that?
Thanks!
This issue is addressed here, but in summary: the temporary files are created in the directory specified by the BufferTempStoragePath property of the data flow task. By default it uses the user's temporary folder (%TEMP%), but it can be overridden for each data flow task. There is a problem with this path, and the data flow can't create temporary files there. Check that the path is correct, that you have permission to write to it, and that there is enough free space during package execution.
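If you cannot free up space on the default drive, one option is to point BufferTempStoragePath (and its companion BLOBTempStoragePath) at another drive. You can do this in the data flow task's properties in the designer, or, as a hedged sketch, programmatically through the SSIS runtime API (the package path, task name, and D:\SSISTemp folder below are assumptions):

```csharp
using Microsoft.SqlServer.Dts.Runtime;

// Load the package, redirect the data flow's temp buffer location, and save it back.
var app = new Application();
Package package = app.LoadPackage(@"C:\Packages\LoadStaging.dtsx", null);

var dataFlow = (TaskHost)package.Executables["Data Flow Task"];
dataFlow.Properties["BufferTempStoragePath"].SetValue(dataFlow, @"D:\SSISTemp");
dataFlow.Properties["BLOBTempStoragePath"].SetValue(dataFlow, @"D:\SSISTemp");

app.SaveToXml(@"C:\Packages\LoadStaging.dtsx", package, null);
```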

SSIS - File system task, Create directory error

I got an error after running a SSIS package that has worked for a long time.
The error was thrown in a task used to create a directory (like this http://blogs.lessthandot.com/wp-content/uploads/blogs/DataMgmt/ssis_image_05.gif) and says "Cannot create because a file or directory with the same name already exists", but I am sure a directory or a file with the same name didn't exist.
Before throwing the error, the task created a file with no extension named as the expected directory. The file has a modified date more than 8 hours prior to the created date, which is weird.
I checked the date in the server and it is correct. I also tried running the package again and it worked.
What happened?
It sounds like some other process or person made a mistake in that directory and created a file that then blocked your SSIS package's directory create command, not a problem within your package.
Did you look at the security settings of the created file? It might have shown an owner that wasn't the credentials your SSIS package runs under. That won't help if you have many packages or processes that all run under the same credentials, but it might provide useful information.
What was in the file? The contents might provide a clue how it got there.
Did any other packages/processes have errors or warnings within a half day of your package's error? Maybe it was the result of another error that you could locate through the logs of the other process.
Did your process fail to clean up after itself on the last run?
Does that directory get deleted at the start of your package run, at the end of your package run, or at the end of the run of the downstream consumer of the directory contents? If your package deletes it at the beginning, then something that slows the delete could present a race condition that normally resolves satisfactorily (the delete finishes before the create starts) but once in a while goes the wrong way.
Were you (or anyone) making a copy or scan of the directory in question? Sometimes copy programs (i.e. FTP) or scanning programs (anti virus, PII scans) can make a temporary copy of a large item being processed (i.e. that directory) and maybe it got interrupted and left the temp copy behind.
If it's not repeatable then finding out for sure what happened is tough, but if it happens again try exploring the above. Also, if you can afford to, you might want to increase logging. It takes more CPU and disk space and makes reviewing logs slower, but temporarily increasing log details can help isolate a problem like that.
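If it does recur and you want a programmatic guard (this is a hedged sketch, not something the original answer prescribes), a Script Task run just before the Create Directory step could detect and move aside a stray same-named file; the path is illustrative and would normally come from a package variable:

```csharp
using System.IO;

// Illustrative target; in a real package this would come from a variable such as User::TargetDir.
string targetDir = @"\\server\share\ExportFolder";

// If something left behind a *file* with the directory's name, the directory create will fail,
// so move the stray file aside (for later inspection) before creating the directory.
if (File.Exists(targetDir) && !Directory.Exists(targetDir))
{
    File.Move(targetDir, targetDir + ".stray");
}

if (!Directory.Exists(targetDir))
{
    Directory.CreateDirectory(targetDir);
}
```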
Good luck!

SSIS package failing

I am using SSIS for data warehousing to import data from different sources like flat files, .xls files, and other SQL Server instances.
In my scenario I have 50 data flow tasks which execute in parallel in one package (control flow). These data flows are independent, meaning they fetch data from different tables and files into my warehouse DB.
In my case, sometimes the structure of a source table or file changes and then my package fails with a validation error.
I need a solution by which I can skip only the broken data flow task so that the other data flow tasks can complete. I don't want to make a separate package for each data flow task.
Please advise what to do in such a situation.
Regards
Shakti
I highly advise putting each of these into a separate package, and then using a scheduling tool or master package to call each one individually. It will make the maintainability of this solution much better.
If you insist on having them all in one package, you can use the "FailParentOnFailure", "FailPackageOnFailure", and "MaximumErrorCount" properties to let a data flow fail while its container ignores the error, allowing everything else to run. You really probably shouldn't do that, though - failures can happen for any number of reasons, and having separate packages that run in parallel makes finding the error during a scheduled run much easier...