I've got an SSIS job that pulls GL transactions from a Jade database via ODBC. If I have a large date range of transactions, I get a read timeout from Jade. Is there a way to structure an SSIS job so that it pulls a few days at a time in separate reads from the source, so that it avoids this timeout? I'm using a For Loop and only asking for a few days at a time, but it still fails, so I've obviously not avoided the issue.
To be clear, we're going to up the timeout on the server from 3 to 10 minutes. We won't use a 0 timeout for obvious reasons, but there is always a chance that, if we need to pull a large range of data for a new project, we'd hit whatever reasonable timeout we set.
I'm looking for a way to structure the job to incrementally pull smaller ranges until it's complete.
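For reference, here is a minimal sketch of how a chunked pull is usually wired up. The table, column, and variable names are invented for illustration, and the approach assumes the date boundaries can be injected into the source statement (for example by building the SQL string with an expression, or via whatever parameter mechanism the Jade dataflow component exposes). Two package variables, say User::RangeStart and User::RangeEnd, drive the query, and the For Loop advances them by a few days each iteration until the overall end date is reached.

```sql
-- Hypothetical per-iteration source query (table and column names invented).
-- The date boundaries come from the For Loop's variables, typically injected
-- by an expression that rebuilds this statement each iteration.
SELECT  TransactionId,
        AccountCode,
        PostingDate,
        Amount
FROM    GLTransaction
WHERE   PostingDate >= '2024-01-01'   -- User::RangeStart for this iteration
  AND   PostingDate <  '2024-01-04'   -- User::RangeEnd (exclusive), e.g. 3 days later
ORDER BY PostingDate;
```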
I found the timeout setting; it's on the Jade dataflow component, not on the ODBC connection object where I thought it would be. I probably put a value in there when I created the package about 18 months ago and forgot about it.
Good day!
I have an SSIS package that retrieves data from a database and exports it to a flat file (a simple process). The issue I am having is that the data my package retrieves each morning depends on a separate process loading the data into a table before my package retrieves it.
Now, the process which initially loads the data inserts metadata into a table showing the start and end date/time. I would like to set up something in my package that checks the metadata table for an end date/time for the current date. If the current date exists, then the process continues... IF no date/time exists, then the process stops (here is the kicker) BUT the package re-triggers itself automatically an hour later to check whether the initial data load is complete.
I have done research on checkpoints, etc., but all that seems to cover is that if the package fails, it picks up where it left off when the package is restarted. I don't want to manually re-trigger the process; I'd like it to check the metadata and restart itself if possible. I could even put in processing so that if it checks the metadata 3 times, it stops completely.
Thanks so much for your help
What you want isn't possible exactly the way you describe it. When a package finishes running, it's inert. It can't re-trigger itself, something has to re-trigger it.
That doesn't mean you have to do it manually. The way I would handle this is to have an Agent job scheduled to run every hour for X number of hours a day. The job would call the package every time, and the metadata would tell the package whether it needs to do anything or simply do nothing.
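As a rough sketch of that scheduling side, the hourly trigger can live entirely in Agent; the job name, schedule name, and time window below are placeholders:

```sql
-- Hypothetical Agent schedule: run the existing job hourly between 06:00 and 12:00.
EXEC msdb.dbo.sp_add_jobschedule
     @job_name             = N'Load_MorningExtract',   -- placeholder job name
     @name                 = N'Hourly retry window',
     @freq_type            = 4,       -- daily
     @freq_interval        = 1,
     @freq_subday_type     = 8,       -- hours
     @freq_subday_interval = 1,       -- every 1 hour
     @active_start_time    = 060000,  -- 06:00:00
     @active_end_time      = 120000;  -- 12:00:00
```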
There would be a couple of ways to handle this.
They all start by setting up the initial check, just as you've outlined above. See if the data you need exists. Based on that, set a boolean variable (I'll call it DataExists) to TRUE if your data is there or FALSE if it isn't. Or 1 or 0, or what have you.
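A minimal sketch of that initial check as an Execute SQL Task, assuming a metadata table and column I've made up (dbo.LoadMetadata, LoadEndTime); map the single-row result to User::DataExists:

```sql
-- Hypothetical metadata check: returns 1 if today's load has an end time, else 0.
SELECT CASE
           WHEN EXISTS ( SELECT 1
                         FROM   dbo.LoadMetadata        -- hypothetical table
                         WHERE  CAST(LoadEndTime AS date) = CAST(GETDATE() AS date) )
           THEN 1 ELSE 0
       END AS DataExists;
```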
Create two precedence constraints coming off that task: one that requires DataExists==TRUE and, obviously enough, another that requires DataExists==FALSE.
The TRUE path is your happy path. The package executes your code.
On the FALSE path, you have options.
Personally, I'd have the FALSE path lead to a forced failure of the package. From there, I'd set up the job scheduler to wait an hour, then try again. BUT I'd also set a limit on the retries. After X retries, go ahead and actually raise an error. That way, you'll get a heads up if your data never actually lands in your table.
If you don't want to (or can't) get that level of assistance from your scheduler, you could mimic the functionality in SSIS, but it's not without risk.
On your FALSE path, trigger an Execute SQL Task with a simple WAITFOR DELAY '01:00:00.00' command in it, then have that task call the initial check again when it's done waiting. This will consume a thread on your SQL Server and could end up getting dropped by the SQL Engine if it gets thread starved.
Going the second route, I'd set up another Iteration variable, increment it with each try, and set a limit in the precedence constraint to, again, raise an actual error if your data doesn't show up within a reasonable number of attempts.
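A sketch of that wait-and-retry plumbing; the variable names and the 3-attempt cap are illustrative:

```sql
-- Execute SQL Task on the FALSE path: sleep for an hour before looping back.
-- Keep the task's TimeOut at 0 (the default) so the hour-long wait isn't cancelled.
WAITFOR DELAY '01:00:00.00';
```

The loop-back precedence constraint would then carry an expression along the lines of @[User::DataExists] == 0 && @[User::Iteration] < 3, while a second constraint with @[User::Iteration] >= 3 leads to a task that raises a genuine failure (for example a RAISERROR in another Execute SQL Task), so the job reports the problem instead of silently giving up.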
Thanks so much for your help! With some additional research I found the following article, which I was able to reference to create a solution for my needs. Although my process doesn't require a failure to attempt a retry, I set the process to force-fail after 3 attempts.
http://microsoft-ssis.blogspot.com/2014/06/retry-task-on-failure.html
Much appreciated
Best wishes
The hardware, infrastructure, and redundancy are not in the scope of this question.
I am building an SSIS ETL solution that needs to import ~600,000 small, simple files per hour. With my current design, SQL Agent runs the SSIS package, and it takes 'n' files and processes them.
The number of files per batch, 'n', is configurable
The SQL Agent execution of the SSIS package is configurable
I wonder if the above approach is the right choice? Or, alternatively, should I have an infinite loop in the SSIS package that keeps taking and processing the files?
So the question boils down to a choice between an infinite loop vs. batch + schedule. Is there any better option?
Thank you
In a similar situation, I run an agent job every minute and process all files present. If the job takes 5 minutes to run because there are a lot of files, the agent skips the scheduled runs until the first one finishes, so there is no worry that two processes will conflict with each other.
Is SSIS the right tool?
Maybe. Let's start with the numbers:
600000 files / 60 minutes = 10,000 files per minute
600000 files / (60 minutes * 60 seconds) = 167 files per second.
Regardless of what technology you use, you're looking at some extremes here. Windows NTFS starts to choke at around 10k files in a folder, so you'll need to employ some folder strategy to keep that count down, in addition to regular maintenance.
In 2008, the SSIS team managed to load 1TB in 30 minutes which was all sourced from disk so SSIS can perform very well. It can also perform really poorly which is how I've managed to gain ~36k SO Unicorn points.
Six years is a lifetime in the world of computing, so you may not need to take such drastic measures as the SSIS team did to set their benchmark, but you will need to look at their approach. I know you've stated the hardware is outside the scope of discussion, but it is very much part of the problem. If the file system (SAN, NAS, local disk, flash, or whatever) can't serve 600k files per hour, then you'll never be able to clear your work queue.
Your goal is to get as many workers as possible engaged in processing these files. The Work Pile Pattern can be pretty effective to this end. Basically, a process asks: Is there work to be done? If so, I'll take a bit and go work on it. And then you scale up the number of workers asking and doing work. The challenge here is to ensure you have some mechanism to prevent workers from processing the same file. Maybe that's as simple as filtering by directory or file name or some other mechanism that is right for your situation.
I think you're headed down this approach based on your problem definition with the agent jobs that handle N files but wanted to give your pattern a name for further research.
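If the work pile is tracked in a SQL table, a common way to keep two workers off the same file is a claim query along these lines; the table, columns, and status values are hypothetical:

```sql
-- Hypothetical work-pile claim: each worker atomically takes up to @BatchSize
-- unclaimed files. READPAST lets concurrent workers skip rows that are already
-- locked, so no two workers process the same file.
DECLARE @BatchSize int = 100;

UPDATE TOP (@BatchSize) q
SET    q.Status    = 'Processing',
       q.ClaimedBy = @@SPID,
       q.ClaimedAt = SYSUTCDATETIME()
OUTPUT inserted.FileId, inserted.FilePath
FROM   dbo.FileWorkQueue AS q WITH (READPAST, UPDLOCK, ROWLOCK)
WHERE  q.Status = 'Pending';
```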
I would agree with Joe C's answer - schedule the SQL Agent job to run as frequently as needed. If it's already running, it won't spawn a second process. Perhaps you're going to have multiple agents that all start every minute - AgentFolderA, AgentFolderB... AgentFolderZZH and they are each launching a master package that then has subprocesses looking for work.
Use a WMI Event Watcher Task to detect when a new file arrives; as the next step, you can call the job scheduler or execute the SSIS package directly.
More details on WMI events:
https://msdn.microsoft.com/en-us/library/ms141130%28v=sql.105%29.aspx
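For what it's worth, the WMI Event Watcher Task is configured with a WQL query rather than T-SQL; a typical file-arrival query (the folder path is a placeholder, and WITHIN 10 is the polling interval in seconds) looks something like this:

```
SELECT * FROM __InstanceCreationEvent WITHIN 10
WHERE TargetInstance ISA "CIM_DirectoryContainsFile"
  AND TargetInstance.GroupComponent = "Win32_Directory.Name=\"C:\\\\DropFolder\""
```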
My SSIS package is running very slowly. The package works like this: using an FTP task, we collect files from a server and then load that data into a SQL Server table. It is scheduled to run every night as a job. When I run it in the IDE it is very fast, and when I run it as a SQL Server Agent job it is fast for some days, but as the days go on the package takes more and more time to execute. What should I do to overcome this issue? Please explain in detail.
Implementing logging makes sense. Then you can find out which part(s) of your ETL slow down the entire package.
As for some possible variants:
- What else is running on the SQL or FTP servers at the same time as the package execution? Other scheduled tasks (backups, defrags, etc.) take server resources from time to time. Is there any repeating pattern to the performance decrease?
- The amount of data being processed. Let's say the files represent sales, which have a drastically different volume on weekends.
- Something in the code. For example, manual execution lets the server truncate some temporary tables, but the automatic version uses the same tables without truncation, which slows things down a little more every day.
But first of all: logging can give you a starting point for fixing this.
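As a sketch of what that starting point can look like if you enable the SSIS log provider for SQL Server (which writes to dbo.sysssislog; SSIS 2005 uses dbo.sysdtslog90 instead), here is a rough duration-per-task query:

```sql
-- Rough duration-per-task report from the SSIS SQL Server log provider.
-- Assumes the default dbo.sysssislog table and OnPreExecute/OnPostExecute events.
SELECT   pre.executionid,
         pre.source                                    AS task_name,
         DATEDIFF(SECOND, pre.starttime, post.endtime) AS duration_seconds
FROM     dbo.sysssislog AS pre
JOIN     dbo.sysssislog AS post
           ON  post.executionid = pre.executionid
           AND post.sourceid    = pre.sourceid
           AND post.event       = 'OnPostExecute'
WHERE    pre.event = 'OnPreExecute'
ORDER BY pre.starttime, duration_seconds DESC;
```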
I have an older machine, running Windows Server 2003 and SQLServer 2005. It is nearing its end of life, but until all the users have migrated to the new server, we need to keep an SSAS cube on that machine up to date.
The problem is the network connection is sometimes flaky. The cube on this server points to a SQL database on another server, and if the connection experiences a problem while the cube is being rebuilt, the process stops with an error message.
Since 99 out of 100 times a processing error is caused by the network, I'm looking for a way to repeat the processing of the cube if it fails. The perfect solution would stop after x tries (just in case this is the 100th time and it's not a network issue).
I've looked at SQL Server Agent jobs and SSIS, and don't see a way to loop until it succeeds or reaches x tries. Isn't there a command-line option to run the cube processing, so that I can include it in a PowerShell loop or something? Or is there an option in SQL or SSIS that I have missed?
We ran into a similar situation with our processes and were able to have the Agent jobs retry 3 times after a 10 minute wait period. Only if all 3 tries failed did our process alert us that "this is a real failure and not a transient issue."
In any SQL Agent job step type, the Advanced tab lets you specify the number of Retry attempts and a Retry interval, in minutes, between attempts.
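If you script the job rather than use the GUI, the same two settings are the @retry_attempts and @retry_interval arguments of sp_add_jobstep; the job name, step name, and command below are placeholders:

```sql
-- Equivalent of the Advanced tab: retry the step 3 times, 10 minutes apart.
EXEC msdb.dbo.sp_add_jobstep
     @job_name       = N'Process SSAS Cube',            -- placeholder job name
     @step_name      = N'Process cube',
     @subsystem      = N'ANALYSISCOMMAND',
     @command        = N'<XMLA process command here>',  -- placeholder command
     @retry_attempts = 3,
     @retry_interval = 10;    -- minutes between retries
```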
You could use some sort of loop where you have three objects in your package:
SSAS Processing Task
Script component that increments a counter
Execute Package Task that calls the current package again if the counter is less than the number of tries allowed for your processing task.
I'm using R to read some data from a MySQL database using the RODBC package. The data is then processed and some results are sent back to the database. The problem is that the server closes the connection after about a minute due to inactivity, which is the time needed to process the data locally. It's a shared server, so the host won't bump up the timeout time.
I think there are two possibilities to get around this:
Open a connection before every database transaction and close it immediately after
Send some small 'ping' command to the server every 30 seconds or so to let the server know that I'm still there.
I can implement the first fairly easily, but it seems pretty slow to constantly open and close connections. Does anyone know an efficient command for the second? Or is there a better way altogether?
The first solution is the one I prefer. It's really hard to do the latter with a single-threaded program like R: if R is busy running an analysis, there's no way for it to handle the ping. Unless you are doing hundreds of reads/writes, the method of opening and closing the connection should not introduce an extreme amount of overhead.