SSIS checkpoints are not restarting correctly, skipping NON-checkpointed tasks

I have an SSIS package where the checkpoints are not behaving as I understand that they should.
To simplify, this is the kind of setup:
Imagine a package with two containers in a serial flow (Container 1 executes, then Container 2). Checkpoints are configured, and ONLY Container 2 is set up with "Fail Package on Failure". The desired behavior is that Container 1 is ALWAYS completely re-run, even on a job restart, unless the job made it to Container 2; in that case it should restart from a checkpoint within Container 2.
Container 1 has a major table copy step that obviously can't be restarted using checkpoints, since checkpoints do not track db updates. So Container 1 has logic that allows it to restart the "old-fashioned way": by interrogating the source and destination, determining where the copy left off when it failed, and updating package variables accordingly, which drive the main query so it resumes at the row where it left off. This works flawlessly if I delete the checkpoint file after the failure and restart.
I intentionally cause a failure during the table copy in the non-checkpointed Container 1. But with the checkpoint file left in place after the failure, it doesn't work! For some reason, upon restart it acts as if Container 1 is also checkpointed. It does NOT run the steps in Container 1 that interrogate the source and destination, nor the steps that update the variables; instead it goes straight to the table copy step in Container 1, using variable values from the failed run, which causes the copy to start at an incorrect position (and end up with PK violations on insert). Every task in Container 1 has "Fail Package on Failure" set to FALSE. They also have "Fail Parent on Failure" set to FALSE, in case that matters. Why does it act as if Container 1 is checkpointed? Why doesn't it run all steps in Container 1 from the beginning, since the package never even got to the first container that is checkpointed (Container 2)? What behavior am I missing with these checkpoints?

SSIS forcing task successful but parent fails

I have a sequence container with two tasks. The first task is OK to fail, but if the second task fails the container should fail. I set the properties for the first task as
ForcedExecutionValue=0
ForceExecutionResult=Success
ForceExecutionValue=True
If both tasks are successful - regardless of the method - why is the container (and therefore the package) failing?
I tried to set a breakpoint on the first task for OnError, OnTaskFailed and OnWarning, but none of them fire, which seems strange.
I think I was able to work this out. For the first task, in addition to the settings in the OP, the task needed an OnError event handler with the system variable Propagate set to False, so the error would not be passed up to the container. Also make sure DisableEventHandlers = False.
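For what it's worth, the three Force* properties can also be set through the SSIS runtime API. A minimal C# sketch, assuming you are editing a saved package; the package path and task name are placeholders, not values from the question:

using Microsoft.SqlServer.Dts.Runtime;

class ForceSuccessSetup
{
    static void Main()
    {
        Application app = new Application();
        // Load the package from disk (path is a placeholder).
        Package pkg = app.LoadPackage(@"C:\packages\MyContainer.dtsx", null);

        // First task: report Success no matter what happens inside it.
        TaskHost first = (TaskHost)pkg.Executables["First Task"];
        first.ForceExecutionValue = true;                         // use the forced execution value
        first.ForcedExecutionValue = 0;                           // the value to force
        first.ForceExecutionResult = DTSForcedExecResult.Success; // the result to force

        app.SaveToXml(@"C:\packages\MyContainer.dtsx", pkg, null);
    }
}

Note that the Propagate part is not a task property: it is the System::Propagate variable inside the task's OnError event handler, changed in the Variables grid while that event handler is selected in the designer.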

How to set up SSIS parent package such that 4 child packages can run at the same time with different parameter values passed in?

I have created a child SSIS package that executes according to the "ProcessName" variable value that is specified initially. Now I wish to create a parent package such that I can execute 4 child package tasks in parallel, with different ProcessName values passed in. How can I maintain my child package and pass different values to each of the 4 Execute Package Tasks so that the ProcessName variable values are different for each of them? I am new to SSIS and would deeply appreciate it if someone could advise or give a direction on how I could go about doing so.
I would see this as a pattern like the following
The "trick" here is that within each Sequence Container, SEQC, I need to define my variable that holds my parameter value. That variable needs to be scoped to the container - otherwise, there is only one SSIS variable and the 4 processes that attempt to initialize that value will be in conflict.
In the SSIS Variables menu, there is a Move Variable icon (second one listed)
Here you can see that I have ParameterValue defined in both "SEQC Opt 1a" and "SEQC Opt 1b" and they're initialized with different values.
The first step within the Sequence container is an Execute SQL Task where I pull back the intended parameter value. Maybe that is not needed in your case but it can be helpful to have a repository of run-time values. In the case of 1b, this is much more what my execution pattern looks like. I have a query that pulls back any packages to be run within the scope of this container and the starting value. e.g.
ContainerName|PackageName|StartingValue
SEQC Opt 1a |Child0.dtsx|100
SEQC Opt 1a |Child1.dtsx|200
SEQC Opt 1a |Child2.dtsx|300
SEQC Opt 1b |Child5.dtsx|600
SEQC Opt 1b |Child6.dtsx|700
SEQC Opt 1b |Child7.dtsx|800
This table pattern allows me to dynamically run packages both in parallel and in serial. Assume Child7 and Child2 in the above set are very slow but the other 4 packages are relatively fast. The fast ones would start up, do their work and complete, and then the next one runs. There are limits to how many parallel operations can fire at once, so you can't scale infinitely across processes; a balance of serial and parallel operations makes sense.
Once you have your pattern working for one sequence container: copy, paste, rename, and assuming you look values up in a table based on the container name as I show above (see the sketch just below), it's ready to go.
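Under those assumptions, the Execute SQL Task at the top of each sequence container might run a lookup like the following sketch (the table and column names mirror the example rows above but are otherwise made up):

-- Return the child packages and starting values for one container.
SELECT PackageName, StartingValue
FROM dbo.PackageRunConfig
WHERE ContainerName = ?    -- parameter mapped to the container's name
ORDER BY StartingValue;

Each container can then loop over its own result set, assigning PackageName and StartingValue to its container-scoped variables before its Execute Package Task fires.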
NOTE for everyone reading this answer: This answer is not full/complete with examples/full steps. Based on the comment above, I am posting it now so the requestor can see it and get started.
This is from notes I wrote for myself a long while back on how to do this. I am posting it as an answer because it is helpful and too large to post as a comment. I have not rewritten anything between what I wrote for myself and what I am posting.
I currently cannot find my full code to post complete details/steps. If/when I do, I will post it here, but this should be good detail on what to do and how. It also covers how to handle child package error trapping.
-- my notes I saved for myself posting as answer:
Steps for creating child packages:
Create any variables needed in the child package
Create the corresponding variable in the parent package (the name does not have to be the same, and you may want to name it something that identifies it as a child package variable)
Child Package:
Need to set up: Package Configurations
a. Right click on the package and click Package Configurations
b. Click the checkbox to Enable Package Configurations
Click Add and set the parameters:
a. Configuration Type: Parent package variable
b. Specify the configuration setting directly: put the parent variable name in here that the child package is going to access
c. Click "Next"
d. In the "Objects" window, scroll down to the variable you are setting from the parent variable name you selected above and check the "Value" option under Properties for that variable name
e. Click "Next"
f. Under Configuration Name: set a descriptive name for what this variable is/does.
Error Handling (NOTE: This is not required, but you won't capture the child error messages if you don't do this):
a. Go to the Event Handlers tab
b. In the drop down (top right) select OnError
c. Add a Script Task
d. Pass as read only variables:
System::ErrorDescription
System::SourceName
System::PackageName
e. Copy/paste the code below into the script task in the Main() function.
----- this is for the error handling
public void Main()
{
    // Build out the error message.
    string ErrorMessageToPassToParent = "Package Name: " + Dts.Variables["System::PackageName"].Value.ToString() + Environment.NewLine + Environment.NewLine +
        "Step Failed On: " + Dts.Variables["System::SourceName"].Value.ToString() + Environment.NewLine + Environment.NewLine +
        "Error Description: " + Dts.Variables["System::ErrorDescription"].Value.ToString();

    // Have to do this FIRST so you can access the variable without passing it into the script task from the SSIS toolbox.
    // Populate the collection of variables. This will include parent package variables.
    Variables vars = null;
    Dts.VariableDispenser.GetVariables(ref vars);

    // Check if this variable exists in the parent first, and only then set it to the value of the child variable
    // (do this so that if the parent package does not have the variable, it will not error out trying to set a non-existent variable).
    if (Dts.VariableDispenser.Contains("OnError_ErrorDescription_FromChild"))
    {
        // Lock the parent variable for writing.
        Dts.VariableDispenser.LockForWrite("User::OnError_ErrorDescription_FromChild");
        // Need to call GetVariables again after locking. Not sure why - perhaps to get a clean post-lock set of values.
        Dts.VariableDispenser.GetVariables(ref vars);
        // Set parent variable = child error message.
        vars["User::OnError_ErrorDescription_FromChild"].Value = ErrorMessageToPassToParent;
        vars.Unlock();
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
Parent Package:
Add this variable to properly capture the child error messages (not required, but you won't capture child error messages if you don't):
variable: OnError_ErrorDescription_FromChild
Error Handling (NOTE: This is not required, but you won't capture the child error messages if you don't do this):
a. Go to the Event Handlers tab
b. In the drop down (top right) select OnError
c. Add a Script Task
d. Pass as read only variables:
User::OnError_ErrorDescription_FromChild
e. Copy/paste the code below into the script task in the Main() function.
----- this is for the error handling
public void Main()
{
    // Get the variable from the parent package holding the error.
    string ErrorFromChildPackage = Dts.Variables["User::OnError_ErrorDescription_FromChild"].Value.ToString();

    // Check whether the value is empty or not (so we know if the error came from the child package or occurred in the parent package itself).
    if (ErrorFromChildPackage.Length > 0)
    {
        // Then raise the error that was created in the child package.
        Dts.Events.FireError(0, "Capture Error From Child Package Failure",
            ErrorFromChildPackage, String.Empty, 0);
        //Dts.TaskResult = (int)ScriptResults.Failure;
    } // end if the length of the error variable is > 0
    Dts.TaskResult = (int)ScriptResults.Success;
}
NOTES:
For error handling:
a. The child package error handling is written so it won't fail if the variable or error handling does not exist in the parent package.
b. If you include the error handling (and variable) in the parent package, though, it MUST exist in the child package.

Operation not allowed after ResultSet closed in solr import

I encountered an error while doing a full-import in solr-6.6.0.
I am getting the exception below.
This happens when I set
batchSize="-1" in my db-config.xml
If I change this value to, say, batchSize="100", then the import runs without any error.
But the recommended value for this is "-1".
Any suggestions as to why Solr is throwing this exception?
By the way, the data I am trying to import is not huge: just 250 documents.
Stack trace:
org.apache.solr.handler.dataimport.DataImportHandlerException: java.sql.SQLException: Operation not allowed after ResultSet closed
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:464)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:377)
at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:133)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:75)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:516)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:415)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:745)
By the way, I am getting one more warning:
Could not read DIH properties from /configs/state/dataimport.properties :class org.apache.zookeeper.KeeperException$NoNodeException
This happens when the config directory is not writable.
How can we make the config directory writable in SolrCloud mode?
We are using ZooKeeper as a watchdog. Can we go ahead and change the permissions of the config files that are in ZooKeeper?
Your help is greatly appreciated.
Using batchSize="-1" is only recommended if you have problems running without it. Its behaviour is up to the JDBC driver, but the cause of people assuming it's recommended is this sentence from the old wiki:
DataImportHandler is designed to stream row one-by-one. It passes a fetch size value (default: 500) to Statement#setFetchSize which some drivers do not honor. For MySQL, add batchSize property to dataSource configuration with value -1. This will pass Integer.MIN_VALUE to the driver as the fetch size and keep it from going out of memory for large tables.
Unless you're actually seeing issues with the default values, leave the setting alone and assume your JDBC driver does the correct thing (.. which it might not do with -1 as the value).
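For reference, a minimal sketch of the dataSource definition in db-config.xml that this setting lives on; the driver, url and credentials here are placeholders, not values from the question:

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            user="dbuser"
            password="dbpass"
            batchSize="100"/>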
The reason for dataimport.properties having to be writable is that the handler writes a property with the time the import last ran to the file, so that you can perform delta updates by referencing the time of the last update in your SQL statement.
You'll have to make the directory writable for the client (solr) if you want to use this feature. My guess would be that you can ignore the warning if you're not using delta imports.
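To illustrate that feature, a delta import references the stored timestamp in the entity definition roughly like this (a sketch with made-up table and column names):

<entity name="item"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item WHERE last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE id = '${dih.delta.id}'">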

Neo4j server hangs every 2 hours consistently. Please help me understand if something is wrong with the configuration

We have a neo4j graph database with around 60 million nodes and an equivalent number of relationships.
We have been facing consistent packet drops, delays in processing, and a completely hung server after 2 hours. We have had to shut down and restart our servers every time this happens, and we are having trouble understanding where we went wrong with our configuration.
We are seeing the following kind of exceptions in the console.log file -
java.lang.IllegalStateException: s=DISPATCHED i=true a=null o.e.jetty.server.HttpConnection - HttpConnection#609c1158{FILLING}
java.lang.IllegalStateException: s=DISPATCHED i=true a=null o.e.j.util.thread.QueuedThreadPool
java.lang.IllegalStateException: org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException
o.e.j.util.thread.QueuedThreadPool - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$3#59d5a975 in qtp1667455214{STARTED,14<=21<=21,i=0,q=58}
org.eclipse.jetty.server.Response - Committed before 500 org.neo4j.server.rest.repr.OutputFormat$1#39beaadf
o.e.jetty.servlet.ServletHandler - /db/data/cypher java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1253) ~[jetty-server-9.2.
org.eclipse.jetty.server.HttpChannel - /db/data/cypher java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1253) ~[jetty-server-9.2.
org.eclipse.jetty.server.HttpChannel - Could not send response error 500: java.lang.IllegalStateException: Committed
o.e.jetty.server.ServerConnector - Stopped
o.e.jetty.servlet.ServletHandler - /db/data/cypher org.neo4j.graphdb.TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
We are using neo4j enterprise edition 2.2.5 server in SINGLE/NON-CLUSTER mode on an Azure D-series 8 core CPU, 56 GB RAM, UBUNTU 14.04 LTS machine with an attached 500GB data disk.
Here is a snapshot of the sizes of neostore files
8.5G Oct 2 15:48 neostore.propertystore.db
15G Oct 2 15:48 neostore.relationshipstore.db
2.5G Oct 2 15:48 neostore.nodestore.db
6.9M Oct 2 15:48 neostore.relationshipgroupstore.db
3.7K Oct 2 15:07 neostore.schemastore.db
145 Oct 2 15:07 neostore.labeltokenstore.db
170 Oct 2 15:07 neostore.relationshiptypestore.db
The Neo4j configuration is as follows -
Allocated 30GB to file buffer cache (dbms.pagecache.memory=30G)
Allocated 20GB to JVM heap memory (wrapper.java.initmemory=20480, wrapper.java.maxmemory=20480)
Using the default hpc (high performance) type cache.
Forcing the RULE planner by default (dbms.cypher.planner=RULE)
Maximum threads processing queries is 16 (twice the number of cores) - org.neo4j.server.webserver.maxthreads=16
Transaction timeout of 60 seconds - org.neo4j.server.transaction.timeout=60
Guard timeout if query execution time is greater than 10 seconds - org.neo4j.server.webserver.limit.executiontime=10000
Rest of the settings are default
We actually want to set up a cluster of 3 nodes, but before that we want to be sure our basic configuration is correct. Please help us.
--------------------------------------------------------------------------
EDITED to ADD Query Sample
Typically our cypher query frequency is 18K queries in an hour, an average of roughly 5-6 queries a second. There are also times when there are about 80 queries per second.
Our typical queries look like the ones below:
match (a:TypeA {param:{param}})-[:RELA]->(d:TypeD) with distinct d,a skip {skip} limit 100 optional match (d)-[:RELF]->(c:TypeC)<-[:RELF]-(b:TypeB)<-[:RELB]-(a) with distinct d,a,collect(distinct b.bid) as bids,collect(distinct c.param3) as param3Coll optional match (d)-[:RELE]->(p:TypeE)<-[:RELE]-(b1:TypeB)<-[:RELB]-(a) with distinct d as distD,bids+collect(distinct b1.bid) as tbids,param3Coll,collect(distinct p.param4) as param4Coll optional match (distD)-[:RELC]->(f:TypeF) return id(distD),distD.param5,exists((distD)<-[:RELG]-()) as param6, tbids,param3Coll,param4Coll,collect(distinct id(f)) as fids
match (a:TypeA {param:{param}})-[:RELB]->(b) return count(distinct b)
MATCH (a:TypeA{param:{param}})-[r:RELD]->(a1)-[:RELH]->(h) where r.param1=true with a,a1,h match (h)-[:RELL]->(d:TypeI) where (d.param2/2)%2=1 optional match (a)-[:RELB]-(b)-[:RELM {param3:true}]->(c) return a1.param,id(a1),collect(b.bid),c.param5
match (a:TypeA {param:{param}}) match (a)-[:RELB]->(b) with distinct b,a skip {skip} limit 100 match (a)-[:RELH]->(h1:TypeH) match (b)-[:RELF|RELE]->(x)<-[:RELF|RELE]-(h2:TypeH)<-[:RELH]-(a1) optional match (a1)<-[rd:RELD]-(a) with distinct a1,a,h1,b,h2,rd.param1 as param2,collect(distinct x.param3) as param3s,collect(distinct x.param4) as param4s optional match (a1)-[:RELB]->(b1) where b1.param7 in [0,1] and exists((b1)-[:RELF|RELE]->()<-[:RELF|RELE]-(h1)) with distinct a1,a,b,h2,param2,param3s,param4s,b1,case when param2 then false else case when ((a1.param5 in [2,3] or length(param3s)>0) or (a1.param5 in [1,3] or length(param4s)>0)) then case when b1.param7=0 then false else true end else false end end as param8 MERGE (a)-[r2:RELD]->(a1) on create set r2.param6=true on match set r2.param6=case when param8=true and r2.param9=false then true else false end MERGE (b)-[r3:RELM]->(h2) SET r2.param9=param8, r3.param9=param8
MATCH (a:TypeA {param:{param}})-[:RELI]->(g:TypeG {type:'type1'}) match (g)<-[r:RELI]-(a1:TypeA)-[:RELJ]->(j)-[:RELK]->(g) return distinct g, collect(j.displayName), collect(r.param1), g.gid, collect(a1.param),collect(id(a1))
match (a:TypeA {param:{param}})-[r:RELD {param2:true}]->(a1:TypeA)-[:RELH]->(b:TypeE) remove r.param2 return id(a1),b.displayName, b.firstName,b.lastName
match (a:TypeA {param:{param}})-[:RELA]->(b:TypeB) return a.param1,count(distinct id(b))
MATCH (a:TypeA {param:{param}}) set a.param1=true;
match (a:TypeE)<-[r:RELE]-(b:TypeB) where a.param4 in {param4s} delete r return count(b);
MATCH (a:TypeA {param:{param}}) return id(a);
Adding a few more strange things I have been noticing...
I have stopped all my webservers, so currently there are no incoming requests to neo4j. However, I see that there are about 40K open file handles in TCP CLOSE_WAIT state, implying the clients have closed their connections because of timeouts and Neo4j has not processed and responded to those requests. I also see (from messages.log) that the Neo4j server is still processing queries, and as it does, the count of 40K open file handles slowly reduces. By the time I write this post there are about 27K open file handles in TCP CLOSE_WAIT state.
I also see that the queries are not processed continuously. Every once in a while I see a pause in messages.log, with messages about log rotation because of some out-of-order sequence, as below:
Rotating log version:5630
2015-10-04 05:10:42.712+0000 INFO [o.n.k.LogRotationImpl]: Log Rotation [5630]: Awaiting all transactions closed...
2015-10-04 05:10:42.712+0000 INFO [o.n.k.i.s.StoreFactory]: Waiting for all transactions to close... committed: out-of-order-sequence:95494483 [95494476] committing: 95494483 closed: out-of-order-sequence:95494480 [95494246]
2015-10-04 05:10:43.293+0000 INFO [o.n.k.LogRotationImpl]: Log Rotation [5630]: Starting store flush...
2015-10-04 05:10:44.941+0000 INFO [o.n.k.i.s.StoreFactory]: About to rotate counts store at transaction 95494483 to [/datadrive/graph.db/neostore.counts.db.b], from [/datadrive/graph.db/neostore.counts.db.a].
2015-10-04 05:10:44.944+0000 INFO [o.n.k.i.s.StoreFactory]: Successfully rotated counts store at transaction 95494483 to [/datadrive/graph.db/neostore.counts.db.b], from [/datadrive/graph.db/neostore.counts.db.a].
I also see these messages once in a while
2015-10-04 04:59:59.731+0000 DEBUG [o.n.k.EmbeddedGraphDatabase]: NodeCache array:66890956 purge:93 size:1.3485746GiB misses:0.80978173% collisions:1.9829895% (345785) av.purge waits:13 purge waits:0 avg. purge time:110ms
or
2015-10-04 05:10:20.768+0000 DEBUG [o.n.k.EmbeddedGraphDatabase]: RelationshipCache array:66890956 purge:0 size:257.883MiB misses:10.522135% collisions:11.121769% (5442101) av.purge waits:0 purge waits:0 avg. purge time:N/A
All of this is happening while there are no incoming requests and neo4j is processing the old backlog of 40K pending requests mentioned above.
Since it is a dedicated server, shouldn't the server be processing the queries continuously without such a large pending queue? Am I missing something here? Please help me
I didn't go over your queries completely. You should examine each of the queries you send frequently by prefixing it with PROFILE or EXPLAIN to see the query plan and get an idea of how many accesses it causes.
E.g. the second match in the following query looks expensive, since the two patterns are not connected with each other:
MATCH (a:TypeA{param:{param}})-[r:RELD]->(a1)-[:RELH]->(h) where r.param1=true with a,a1,h match (m)-[:RELL]->(d:TypeI) where (d.param2/2)%2=1 optional match (a)-[:RELB]-(b)-[:RELM {param3:true}]->(c) return a1.param,id(a1),collect(b.bid),c.bPhoto
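For comparison, anchoring the second match at (h), as in the question's version of this query, connects the two patterns and avoids a cartesian product between them:
match (h)-[:RELL]->(d:TypeI) where (d.param2/2)%2=1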
Also enable garbage collection logging in neo4j-wrapper.conf and check whether you're suffering from long pauses. If so, consider reducing the heap size.
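A minimal sketch of what enabling GC logging in neo4j-wrapper.conf might look like (the exact flags depend on your JVM, and the log path here is an assumption):
wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps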
It looks like this issue requires more research on your side, but there are some things from my experience.
TL;DR - I had a similar issue with my own unmanaged extension, where transactions were not properly handled.
Language/connector
What language/connector is used in your application?
You should verify that:
If some popular open-source library is used - your application is using the latest version. There may be a bug in your connector.
If you have your own, hand-written solution that works with the REST API - verify that ALL HTTP requests are closed at the client side.
Extension/plugins
It's quite easy to mess things up if custom-written extensions/plugins are used.
What should be checked:
All transactions are always closed (try-with-resources is used)
Neo4j settings
Verify your server configuration. For example, if you have a large value for org.neo4j.server.transaction.timeout and you don't handle transactions properly at the client side, you can end up with a lot of running transactions.
Monitoring
You are using the Enterprise version. That means you have access to JMX. It's a good idea to check information about active Locks & Transactions.
Another Neo4j version
Maybe you can try another Neo4j version, for example 2.3.0-M03.
This will give answers to questions like:
Is this a Neo4j 2.2.5 bug?
Is this a misconfiguration of the existing Neo4j installation?
Linux configuration
Check your Linux configuration.
What is in your /etc/sysctl.conf? Are there any invalid/unrelated settings?
Another server
You can try to spin up another server (i.e. a VM at DigitalOcean), deploy the database there, and load it with Gatling.
Maybe your server has some invalid configuration?
Try to get rid of everything that could be a cause of the problem, to make the problem easier to find.

SSIS package 2012

I have attached the screenshot.
How do I use checkpoints in an SSIS 2012 package in order to restart the package from the point of failure, not from the beginning?
I think it's not allowing me to attach the image...
You can use Checkpoints, which basically execute the package from the point where it failed. If your package has 5 or 6 steps and the 2nd step fails, then the next time you execute the package it will start from the 2nd step.
You can refer to these blogs: 31 Days of SSIS and articles from Simple Talk.
Update:-
Step 1: Right click on the Control Flow and select Properties.
Step 2: Specify the information for CheckpointFileName, CheckpointUsage and SaveCheckpoints:
CheckpointFileName = points to the XML file location where checkpoint details will be stored by SSIS
CheckpointUsage = one of 3 values (Never, IfExists, Always):
Never signifies that the checkpoints will never be used.
IfExists = checkpoints will be used if a file exists
Always = a checkpoint file will always be used; if the file is missing, an error will be thrown
SaveCheckpoints = set it to True to save the checkpoints
Now, for all the containers or tasks that you want to participate in checkpoints, set the FailPackageOnFailure property to True. This property indicates whether the package fails when that task's execution fails.
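If you prefer to apply these settings programmatically (for example when generating many packages), here is a minimal C# sketch using the SSIS runtime API; the package path and task name are placeholders:

using Microsoft.SqlServer.Dts.Runtime;

class CheckpointSetup
{
    static void Main()
    {
        Application app = new Application();
        // Load the package from disk (path is a placeholder).
        Package pkg = app.LoadPackage(@"C:\packages\MyPackage.dtsx", null);

        // The three package-level checkpoint properties described above.
        pkg.CheckpointFileName = @"C:\packages\MyPackage.chk";
        pkg.CheckpointUsage = DTSCheckpointUsage.IfExists; // restart from the file if it exists
        pkg.SaveCheckpoints = true;

        // Opt a task into checkpoint restarts (name is a placeholder).
        TaskHost task = (TaskHost)pkg.Executables["Copy Table Task"];
        task.FailPackageOnFailure = true;

        app.SaveToXml(@"C:\packages\MyPackage.dtsx", pkg, null);
    }
}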