I'm trying to debug some legacy Integration Services code, and really want some confirmation on what I think the problem is:
We have a very large data task inside a control flow container. This control flow container is set up with TransactionOption = supported - i.e. it will 'inherit' transactions from parent containers, but none are set up here.
Inside the data flow there is a call to a stored proc that writes to a table with pseudo code something like:
"If a record doesn't exist that matches these parameters then write it"
Now, the issue is that there are three records being passed into this proc all with the same parameters, so logically the first record doesn't find a match and a record is created. The second record (with the same parameters) also doesn't find a match and another record is created.
My understanding is that the first 'record' passed to the proc in the data flow is uncommitted and therefore can't be 'read' by the second call. The upshot is that all three records create a row, when logically only the first should.
In this scenario, am I right in thinking that it is the uncommitted transaction that stops the second call from seeing the first? Even setting the isolation level on the container doesn't help, because it isn't being wrapped in a transaction anyway.
Hope that makes sense, and any advice gratefully received. Work-arounds confer god-like status on you.
Is the flow too large to stream all these rows through an aggregate first, to eliminate the duplicates?
If the changes are inside the same transaction they should be visible to each other. And I don't think that SSIS would create a transaction per statement / SP call, so my opinion is that the problem is elsewhere.
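You can check this on a single connection: an uncommitted change is visible to later statements in the same transaction. A quick illustration, reusing the made-up table from the sketch above:

    BEGIN TRANSACTION;

    INSERT INTO dbo.TargetTable (ColA, ColB)
    VALUES (1, 1);

    -- Returns 1: the uncommitted insert IS visible to this statement,
    -- because it runs on the same connection inside the same transaction.
    SELECT COUNT(*)
    FROM dbo.TargetTable
    WHERE ColA = 1 AND ColB = 1;

    ROLLBACK TRANSACTION;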
I have two SSIS packages within the same project folder - let's call them Parent.dtsx and Child.dtsx.
The Child package has many SQL Tasks split across different Sequence Containers.
What I need to do is execute, from Parent, a SQL Task that sits in Child. I don't want to execute the whole Child package, only part of it.
I have been searching for a proper solution for a while, but I haven't found a proper answer yet.
Every Parent-Child solution I've seen shows how to execute the whole Child package from within the Parent.
I tried to execute selected tasks from the Child package by passing the SQL Task ID to the Execute Package Task, but it failed. To be clear, I don't want to pass any variables from Child to Parent - I just need to execute selected SQL Tasks from Child.
I'm a beginner when it comes to SSIS.
Thanks,
Karol
Every Parent-Child solution I've seen shows how to execute the whole Child package from within the Parent.
That's because that's the only way it works. There is no way to call only some elements of a child package from a parent package; you can only execute the entire child package, unless you want to get into some extremely complicated low-level coding in a script task.
You need to decide where your tipping point is, and do one of the following (whichever is more desirable in your case):
Copy the SQL Task from your Child package and paste it into your parent package, and just have everything in one package.
Modify your child package so that you can pass it a variable, and only execute certain tasks based on the variable that is passed.
Make your solution even more modular: take the Task you want to execute out of the child package, and put it in its own package all by itself. Then you can call that third package from the child package, and/or call it directly from the parent package.
Those are your best options.
EDIT: An idea of how to do option 2 - Add a variable to the child package. In the precedence constraints before each task, check the variable, and if it isn't a certain value, then skip that step.
In other words, from your first step (which may have to be a "dummy" script task, because it is going to get executed every time the package starts no matter what), you have multiple constraints coming out. One says that if the first step is complete and the variable equals some value, go to step 2. Another says that if the first step is complete and the variable equals some other value, go to step 3, and so on.
And then from your parent package, you pass whatever variable value will tell the child package to only execute the task you want to execute.
It ends up looking pretty ugly, because you have precedence constraints all over the place, but we have used it in the past and it works. It won't be too bad if you only have two possible paths you want the execution to take.
Friends,
I have a table that contains data on the two parents of students at a college. Each parent will be sent an email with a link to a web page that will display the parent data that we currently have on record (names, email addresses, mailing addresses, employment information, etc.), and will be able to edit the data in order to update our records.
Since each parent will receive a link to the same data, and will be able to update the same fields, there is the potential for both parents opening the data at the same time, and then one parent submitting changes, then the other submitting changes which would overwrite those submitted by the first parent.
In order to avoid this, I have thought of using the method I've read about in which a timestamp field exists in the parent data record, and that timestamp is used as a hidden field on the form. Then, if both parents load the form, they'll both have the same timestamp stored in the form. When the first parent submits her/his updates, though, the timestamp field will update, and when the form is submitted by the second parent, the timestamp from her/his form will not be the same as the timestamp in the table, and the program (a Perl CGI) would alert the 2nd parent to this fact, and tell them to reload the form or risk overwriting the data submitted by the first parent.
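In SQL terms, I picture the check looking something like this (just a rough, MySQL-flavoured sketch; the table and column names are made up):

    -- The hidden form field carries the last_updated value that was read
    -- when the form was rendered.
    UPDATE parent_info
    SET    email           = ?,
           mailing_address = ?,
           last_updated    = NOW()
    WHERE  parent_id    = ?
      AND  last_updated = ?;   -- the timestamp from the hidden field

    -- If this affects 0 rows, the other parent got there first, and the
    -- CGI would tell this parent to reload instead of overwriting.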
That will work, but the person for whom I'm creating this form has asked if, instead, there's a way to lock the record in the table as soon as the first parent loads the form, and if the second parent tries to load the form while the lock exists, the form will tell them to wait until later (or words to that effect). The lock would be in place either until the form is submitted by parent one, or until one hour (or some specific period of time) has passed. Is this even do-able? I've been Googling, and don't see specific examples of this having been done.
Is there some better solution to this issue of needing to prevent two people from updating the same record, with the second submitter overwriting data submitted by the first?
Thanks for any help you can provide!
Doug
EDIT: To address the comment by "inspiredcoder", here are some more details about what I'm concerned with here:
What I'm trying to avoid having happen is that parent 1 opens the form and starts making changes to the data. Before parent 1 submits those changes, parent 2 opens the form and also starts making different changes to the same fields being edited by parent 1. Parent 1 then submits her/his changes. Parent 2 then submits her/his changes, overwriting the changes made by parent 1.
What I would prefer is that parent 2 would not be able to even begin making changes if parent 1 has opened the form. The changes made by both parents need to be captured, and not overwritten.
The method of using the timestamp as I describe in my initial post can be used to prevent parent 2 from overwriting the data, but it also will mean that they'd have to reload the form to see the changes submitted by parent 1, and in doing so, would lose any of the edits they'd made in the form prior to them trying to submit it and getting the notice to reload. I'd like to avoid them having to re-enter their changes, and the only way to accomplish this seems to be to prevent them from even opening the form if it is already being edited, but I'd want that "lock" on the form/data to timeout after an hour or so in case parent 1 walks away with the form open but unsubmitted.
EDIT: To answer a question by "ThisSuitIsBlackNot": Each parent can edit the same fields. One field asks for activities in which the parents are involved. Let's say Parent 1 enters five activities. If Parent 2 sees the form before Parent 1's edits have been submitted, he/she may enter completely different items, which upon submission would overwrite the activities submitted by Parent 1. If, on the other hand, Parent 2 could be stopped from accessing the form until after Parent 1 has finished her/his edits, then when Parent 2 can load the form, she/he will see everything that Parent 1 entered, rather than an empty form field, and may choose to modify what Parent 1 submitted, overwrite it completely, or not make any changes.
There's a reason you're not finding any info on how to do this. It is a very tough problem that no one has a good solution for regardless of which tech stack you're using. In your case, I'm not convinced that it is actually a terribly important issue to solve because the data does not seem crucial or mission critical. And besides, if there are changes they will likely be the same.
I've been in many design discussions where this issue came up. After hours of arguing the result is always the same: Last one in wins.
That said, here are a couple of simpler ideas you could try:
Simply email both parents (or whoever's registered as a guardian) whenever data on that page changes. This solution is stupid simple and easy to implement. If you're already using email services in other parts of the app then it becomes nearly trivial.
Not so simple: Whenever a request is made to edit the data, create a hash of the data as is to send back with the response to the client. When the edited data is sent in to update the row, check the data against the hash. If the hashes don't match it means that someone else has modified the data while the other parent was looking at it. The trouble with this solution is that you have to create these hashes and lug them around through several layers of the app making your programming non-trivial.
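For illustration, the hash itself could be computed in the database (a rough MySQL-flavoured sketch; the column names are hypothetical):

    -- Sent to the client along with the form when it is first rendered.
    SELECT SHA2(CONCAT_WS('|', first_name, last_name, email,
                          mailing_address, employer), 256) AS row_hash
    FROM   parent_info
    WHERE  parent_id = ?;

    -- At submit time, run the same SELECT again; if the fresh hash no
    -- longer matches the one that came back with the form, the row was
    -- changed while the form was open.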
This statement caught my eye in a later edit of your OP:
The changes made by both parents need to be captured, and not overwritten.
That single business rule actually makes things quite simple for you. All you need to do is to ALWAYS create objects when they do not have a unique identifier (probably 0 or -1). When objects do have an ID, meaning they have already been created, you simply update.
There is an assumption here that edits will likely be performed non-destructively on the same data. e.g. One parent creating an activity and the other parent editing it. There is a chance of duplicate activities but that's a situation easily resolved with a delete.
This way, no one parent can overwrite the other's data blindly and unknowingly.
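As a rough sketch of that rule (MySQL-flavoured; the activities table and columns are hypothetical):

    -- Submitted row has no id: it is a new object, so create it.
    INSERT INTO parent_activity (parent_id, activity_name)
    VALUES (?, ?);

    -- Submitted row carries an id: it already exists, so update it.
    UPDATE parent_activity
    SET    activity_name = ?
    WHERE  activity_id = ?;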
Regardless of what you do though, do not try to find a perfect solution. It just doesn't happen. I know - I've been writing line-of-business apps for over 15 years. Apply your time and talents to something that you can get right, which is the application and its business rules.
I would suggest reading up on database isolation levels. I believe MySQL defaults to REPEATABLE READ. You can confirm your isolation level at the DB level by running "SHOW GLOBAL VARIABLES LIKE 'tx_isolation';". Each transaction in this configuration is already placing locks; whether it takes row-level locks or escalates depends on factors such as how indexes are hit by the query. If you fire off transaction A to update a record and then fire off transaction B, transaction B is already in a holding pattern until transaction A completes its work in this configuration. If you set the level to READ COMMITTED, reads no longer block each other with locking (updates, etc. still place locks). In lieu of implicit locks on reads, you can be explicit and use SELECT ... FOR UPDATE to force a lock on the read.
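For example, the explicit form looks like this (MySQL; the table name is a placeholder):

    START TRANSACTION;

    -- Takes row locks; any other transaction that tries to lock or update
    -- the same row waits until this transaction commits or rolls back.
    SELECT *
    FROM   parent_info
    WHERE  parent_id = ?
    FOR UPDATE;

    -- ... apply the changes ...

    COMMIT;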
I mention brushing up on locking mechanics because trying to brute-force locking without a solid knowledge of the back-end DB mechanics can lend itself to deadlock central.
It seems like in your scenario this is more about user perception - that what they are reading is up to date when they submit their changes. The DB is really doing its job as designed. I have seen architectures that address this user-perception issue by only allowing one user into a record at a time (locking other users out of the record while someone is in it), handled in some middleware code, or by using an SOA architecture to push notifications to users in the record that a change was made by another user.
I need to prevent loading data into my fact table if any of the incoming data has a [DateId] that already exists in the fact table. The field [DateId] is an integer value.
The Lookup transformation in SSIS allows you to fail on non-matches, but I actually need a failure if any match is found. How can I get the package to fail when there's a match?
If you just want non-matches to flow through the lookup, just use the "Lookup No Match Output" to connect to the next component in your data flow.
Since the Lookup Match Output isn't hooked up to anything, all that data will just "stop" there. This is the equivalent of the SQL pattern LEFT JOIN ... WHERE <lookup table column> IS NULL.
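For reference, that pattern looks like this (the staging and fact table names are just placeholders):

    SELECT s.*
    FROM   StagingTable AS s
    LEFT JOIN FactTable AS f
           ON f.DateId = s.DateId
    WHERE  f.DateId IS NULL;   -- keep only the rows with no match in the fact table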
Use either a merge-join (with a conditional split) or a lookup with a nomatch output (without hooking the match to anything).
No answer after 8 years. I had the same issue today and I can't think of anything other than this workaround.
Context:
Writing from Excel to SQL Server. I need to fail if the date is already present. This is also the parent in the hierarchy of data flow tasks in the control flow, so I wanted the child data flow tasks not to execute if the parent fails.
Workaround:
Created a dummy table with one row and 2 columns for date and description. Entered the value "1900-01-02 12:34:56.000" in date and put a description in the other column.
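The dummy table is nothing more exotic than this (a sketch; the names here are made up):

    CREATE TABLE dbo.LookupFailDummy
    (
        DummyDate   DATETIME,
        Description VARCHAR(100)
    );

    -- A sentinel value that the incoming dates will never match.
    INSERT INTO dbo.LookupFailDummy (DummyDate, Description)
    VALUES ('1900-01-02 12:34:56.000', 'Sentinel row used only to force the lookup to fail');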
Passed the Lookup Match Output to another Lookup that tries to match the incoming Excel dates against this dummy table. Since they will never match the sentinel value, that second lookup fails, which fails the data flow.
At least I'm able to fail this and avoid writing duplicate data to the 6 other tables that are written by the 6 other data flow tasks, which would otherwise execute if the parent task did not fail.
I have an int variable User::FileLineCount scoped to a For Loop container, and from a task within the loop I want to proceed depending on this variable's value.
In the Precedence Constraint Editor I have chosen the Evaluation Operation as Expression and the Expression as @FileLineCount != 0. There is another version on some other task with @FileLineCount == 0. When I debug, I can see that the User::FileLineCount value is 0, but when I Step Over the task I get an "Unable to step. Not Implemented." error.
Thanks for the help
EDIT: Apparently the debugger simply could not step over the task, which explains the error, but the conditions still do not work properly.
EDIT2:
The other one is @FileLineCount == 0. It doesn't work unless the constraints are OR'ed, as in the picture.
I had two paths leaving a data flow task: one went to a SQL task and the other went to another task. I was struggling with this until I realized that two paths leaving the same data flow task act as an OR if they are separate paths. I assume you would use AND if you had multiple tasks going into one task and you needed them all to be true for it to proceed. I'm not sure if this is exactly what you are asking.
This would be OR, because I want one or the other.
Whereas below, I want all three to be true in order to continue on and send the email.
I've been creating imports that use SSIS to load data into a temp table, then a stored procedure steps through the data row by row with a cursor to process it and insert information into 3 different tables. The inserts into the first 2 tables are complicated: if a record already exists with the same data, a new record is not created. Whether or not a record is inserted into the first 2 tables, the ID of the new or matching record is returned to be used for the 3rd table. Is there an alternative to using the cursor?
Without seeing your current code it is difficult to know whether this would be suitable, but I'd look at:
the MERGE statement (allows actions to be specified for the different cases "when matched", "when not matched by target", "when not matched by source") and
the OUTPUT clause (allows you to capture the newly inserted or updated rows for processing) - a rough sketch of combining the two follows below.
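A rough sketch of combining the two (the staging table, target table and column names are hypothetical, not taken from your import):

    DECLARE @Ids TABLE (MergeAction NVARCHAR(10), Id INT, SourceKey INT);

    MERGE dbo.Table1 AS tgt
    USING #StagingTable AS src
          ON tgt.NaturalKey = src.NaturalKey
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (NaturalKey, SomeValue)
        VALUES (src.NaturalKey, src.SomeValue)
    WHEN MATCHED THEN
        -- Effectively a no-op when the data is identical, but it makes the
        -- matched rows appear in the OUTPUT clause so their IDs are captured.
        UPDATE SET tgt.SomeValue = src.SomeValue
    OUTPUT $action, inserted.Id, src.NaturalKey
    INTO @Ids (MergeAction, Id, SourceKey);

    -- @Ids now holds the Id of every inserted or matched row, keyed back to
    -- the source, ready to drive the insert into the third table.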
Hopefully some ideas from this will help. If you still need help avoiding a cursor, we need to see a better example of the processing you are doing in the cursor.
http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them
This sounds like a good candidate for replacing the cursor with a combination of table variables and a WHILE loop, which many people have found to perform at least as well as a cursor in their own testing.
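A bare-bones sketch of that pattern (the staging table and columns are placeholders):

    DECLARE @Work TABLE (RowId INT IDENTITY(1,1), NaturalKey INT, SomeValue INT);
    DECLARE @i INT = 1, @max INT, @Key INT, @Value INT;

    INSERT INTO @Work (NaturalKey, SomeValue)
    SELECT NaturalKey, SomeValue
    FROM   #StagingTable;

    SELECT @max = MAX(RowId) FROM @Work;

    WHILE @i <= @max
    BEGIN
        SELECT @Key = NaturalKey, @Value = SomeValue
        FROM   @Work
        WHERE  RowId = @i;

        -- ... the per-row processing the cursor used to do goes here ...

        SET @i = @i + 1;
    END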