SnappyData Job API - snappydata

We have created a job by extending SnappySqlJob and overriding runSnappyJob and isValidJob.
In runSnappyJob we create a connection to Kafka and poll for messages every second.
After terminating the job using:
bin/snappy-job.sh stop --lead lead:8090 --job-id <job_id>
we can see in the logs that the Kafka consumer is still polling for data.
Is there any API to check the status of a running job so that we can stop the Kafka consumer?

I'm not sure about the behavior you noticed, but you should use a SnappyStreamingJob instead of SnappyJob for managing streaming jobs.
See https://snappydatainc.github.io/snappydata/programming_guide/snappydata_jobs/ ... there is a link to the examples from there.
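For completeness, whichever job class is used, the Kafka polling loop itself can be made stoppable with the standard consumer shutdown pattern, so the consumer goes away when the job does. A minimal sketch (broker address, group id, topic name and the Duration-based poll() of newer Kafka clients are assumptions, not details from the question):

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;

public class StoppablePoller {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private KafkaConsumer<String, String> consumer;

    public void run() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // assumption: local broker
        props.put("group.id", "snappy-job-consumer");         // illustrative group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("events"));   // illustrative topic
        try {
            while (!stopped.get()) {
                // poll roughly once per second, as in the question
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                records.forEach(r -> System.out.println(r.value()));
            }
        } catch (WakeupException expected) {
            // thrown when shutdown() interrupts a blocked poll(); safe to ignore here
        } finally {
            consumer.close();
        }
    }

    // Call this from the job's termination path so the consumer stops with the job.
    public void shutdown() {
        stopped.set(true);
        consumer.wakeup();
    }
}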

Openshift AMQ6 - message order - queue

I use AMQ 6 (ActiveMQ) on OpenShift, and I use a queue with re-delivery with exponentialBackoff (set in connection query params).
When I have one consumer and two messages, and the first message gets processed by my single consumer but does NOT get an ACK:
Will the broker deliver the 2nd message to the single consumer?
Or will the broker wait for the re-delivery to preserve message order?
This documentation states:
...Typically a consumer handles redelivery so that it can maintain message order while a message appears as inflight on the broker. ...
I don't want to have my consumer wait for re-delivery. It should consume other messages. Can I do this without multiple consumers? If so, how?
Note: In my connection query params I don't have the ActiveMQ exclusive consumer set.
I have read the Connection Configuration URI docs, but jms.nonBlockingRedelivery isn't mentioned there.
Can the resource adapter use it via a query param?
If you set jms.nonBlockingRedelivery=true on your client's connection URL then messages will be delivered to your consumer while others are in the process of redelivery. This is false by default.
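For illustration, a minimal sketch of setting the flag on the client side (the broker address is an assumption; in an OpenShift setup it would be your AMQ service or route):

import javax.jms.Connection;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class NonBlockingRedeliveryExample {
    public static void main(String[] args) throws Exception {
        // jms.nonBlockingRedelivery=true lets the client keep delivering other
        // prefetched messages to the consumer while a rolled-back message waits
        // out its exponential-backoff redelivery delay.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
                "failover:(tcp://broker:61616)?jms.nonBlockingRedelivery=true");

        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
        // ... create the queue consumer and process messages as before ...
    }
}

Note that with this setting the broker no longer holds later messages back behind a redelivering one, so strict message order is traded away.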

Messages stuck or lost in ActiveMQ cluster

I've set up a small ActiveMQ Network of Brokers to increase reliability. It consists of 3 nodes with the following properties (full config template file is available here):
ActiveMQ Version 5.13.3 (latest as of July 16)
Local LevelDB persistence adapter
NetworkConnector uri="static:(tcp://${OTHER_NODE1}:61616,tcp://${OTHER_NODE2}:61616)" with the two variables set (e.g. for node2) to node1 and node3 (uni-directional connections between all nodes).
Clients connect with failover:(tcp://node1:61616,tcp://node2:61616,tcp://node3:61616), send and retrieve messages as needed.
The failover protocol randomizes the target machine, so messages might be sent back and forth inside the cluster.
There are two (failing) scenarios:
As it is described now, some messages are not delivered because they are not allowed to go "back". This is done to avoid loops and is described in this blog post.
Activating the replayWhenNoConsumers flag as described in the blog and in NoB: Stuck Messages causes those messages to be recognized as duplicates. With enableAudit enabled, I get "cursor got duplicate send ID"; disabling it gives me "<MSG> paged in, is cursor audit disabled? Removing from store and redirecting to dlq".
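For reference, both of those flags live in the broker's destination policy. A hedged sketch of the equivalent programmatic configuration (an embedded BrokerService is used purely for illustration; the XML config template referenced above carries the same settings on a policyEntry):

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.broker.region.policy.PolicyEntry;
import org.apache.activemq.broker.region.policy.PolicyMap;
import org.apache.activemq.network.ConditionalNetworkBridgeFilterFactory;

public class NetworkPolicyExample {
    public static void main(String[] args) throws Exception {
        // replayWhenNoConsumers lets a networked broker send messages "back"
        // when the broker that holds them has no local consumer.
        ConditionalNetworkBridgeFilterFactory filterFactory =
                new ConditionalNetworkBridgeFilterFactory();
        filterFactory.setReplayWhenNoConsumers(true);

        PolicyEntry policy = new PolicyEntry();
        policy.setQueue(">");                  // apply to all queues
        policy.setEnableAudit(false);          // the audit flag toggled above
        policy.setNetworkBridgeFilterFactory(filterFactory);

        PolicyMap policyMap = new PolicyMap();
        policyMap.setDefaultEntry(policy);

        BrokerService broker = new BrokerService();
        broker.setDestinationPolicy(policyMap);
        broker.start();
    }
}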
Maybe this is trivial to fix - does anybody have an idea?

Spinnaker Jenkins Integration unable to fetch jobs from Jenkins

We have completed all the steps described in the hello-spinnaker example below. We used the AWS Spinnaker image to configure Spinnaker directly in AWS.
www.spinnaker.io/docs/hello-spinnaker.
I am trying to create a sample pipeline as noted in the above example. But when I create the trigger in the first step and select Jenkins, the jobs are not populated and I get the error below in the browser:
GET http://localhost:8084/v2/builds/Jenkins/jobs 429 (Too Many Requests)
The actual issue looks like this: while Retrofit is trying to map the response from the Jenkins getJobs call into the JobList class, it finds an attribute _class in the Jenkins response XML which is not present in the JobList Groovy class. Below is how we tracked down the issue:
1) Log in to the AWS Spinnaker instance.
2) The Gate service is exposed at port 8084:
curl http://localhost:8084/v2/builds/Jenkins/jobs
{"failureCause":"retrofit.RetrofitError: 429 Too Many Requests","error":"Too Many Requests","message":"429 Too Many Requests","status":429,"url":"http://localhost:8088/jobs/Jenkins","timestamp":1462793944530}
3) The Igor service is exposed at port 8088:
curl http://localhost:8088/jobs/Jenkins
{"fallbackException":"java.lang.UnsupportedOperationException: No fallback available.","failureType":"COMMAND_EXCEPTION","failureCause":"retrofit.converter.ConversionException: org.simpleframework.xml.core.AttributeException: Attribute '_class' does not have a match in class com.netflix.spinnaker.igor.jenkins.client.model.JobList at line 1","error":"Hystrix Failure","message":"jenkins-Jenkins-getJobs failed and no fallback available.","status":429,"timestamp":1462793896853}
When I check the Igor logs, there are a few exceptions occurring during the getProjects call made by the Jenkins poll:
Caused by: retrofit.converter.ConversionException: org.simpleframework.xml.core.AttributeException: Attribute '_class' does not have a match in class com.netflix.spinnaker.igor.jenkins.client.model.ProjectsList at line 2
at retrofit.converter.SimpleXMLConverter.fromBody(SimpleXMLConverter.java:38)
at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:367)
... 39 common frames omitted
Caused by: org.simpleframework.xml.core.AttributeException: Attribute '_class' does not have a match in class com.netflix.spinnaker.igor.jenkins.client.model.ProjectsList at line 2
4) Connect to Jenkins and get the jobs the same way the Spinnaker code does (https://github.com/spinnaker/igor/blob/master/igor-web/src/main/groovy/com/netflix/spinnaker/igor/jenkins/client/JenkinsClient.groovy):
resp = requests.get('http://jenkinserverip:8080/api/xml?tree=jobs[name,jobs[name,jobs[name,jobs[name,jobs[name,jobs[name,jobs[name,jobs[name,jobs[name,jobs[name]]]]]]]]]]',auth=('admin','password'))
print resp.text
<hudson _class='hudson.model.Hudson'><job _class='hudson.model.FreeStyleProject'><name>Hello Build</name></job><job _class='hudson.model.FreeStyleProject'><name>Hello Poll</name></job></hudson>
So since the Jenkins response contains the _class attribute, Retrofit throws an error at this line: http://grepcode.com/file/repo1.maven.org/maven2/com.squareup.retrofit/retrofit/1.9.0/retrofit/RestAdapter.java#383
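A minimal, self-contained reproduction of that parse failure (the Hudson/Job classes below are simplified stand-ins for igor's models, not the actual Spinnaker classes):

import org.simpleframework.xml.Element;
import org.simpleframework.xml.ElementList;
import org.simpleframework.xml.Root;
import org.simpleframework.xml.core.Persister;

import java.util.List;

@Root(name = "hudson")
class Hudson {
    @ElementList(inline = true)
    List<Job> jobs;
}

@Root(name = "job")
class Job {
    @Element(name = "name")
    String name;
}

public class ReproduceClassAttributeError {
    public static void main(String[] args) throws Exception {
        String xml = "<hudson _class='hudson.model.Hudson'>"
                + "<job _class='hudson.model.FreeStyleProject'><name>Hello Build</name></job>"
                + "</hudson>";

        // Strict parsing (the SimpleXML default, which the Retrofit converter uses here)
        // rejects the undeclared _class attribute, mirroring the AttributeException
        // seen in the igor logs.
        try {
            new Persister().read(Hudson.class, xml, true);
        } catch (Exception e) {
            System.out.println("strict parse failed: " + e);
        }

        // Lenient parsing simply ignores attributes the model does not declare.
        Hudson lenient = new Persister().read(Hudson.class, xml, false);
        System.out.println("lenient parse found " + lenient.jobs.size() + " job(s)");
    }
}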
I wanted to see how we can quickly fix this, as it looks like a version incompatibility with Jenkins.
I'm seeing a similar issue in Spinnaker 1.8.5. I had to reformat the Jenkins URL from myjenkins.server.com:8080 to http://myjenkins.server.com/ and it corrected the issue.
This is a bug around the Jenkins API in later versions. I believe 2.2 is the last compatible version; we run 1.6 internally.

Fiware CEP server stops responding

While developing in Fi-Cloud's CEP I've been having an issue that keeps happening: as I try to develop a definition to perform a task, the CEP server and Authoring Tool stop responding, although ssh is still responsive.
This issue happens as I develop. I use the Authoring Tool to alter the definition bit by bit and then re-upload it to the server through the Authoring Tool's export feature.
To restart the Proton with the new definition each time I alter it, I use Google's Postman with this single operation:
-PUT (url:http://{ip}:8080/ProtonOnWebServerAdmin/resources/instances/ProtonOnWebServer)
header: 'Content-Type' : 'application/json'; body : {"action": "ChangeDefinitions","definitions-url" : "/ProtonOnWebServerAdmin/resources/definitions/Definition_Name"}
At the same time, I'm logged in with three ssh instances: one to monitor the files being created in /opt/tomcat10/sample/ and other things, and the other two to 'tail -f' the log files the definition writes to as events are processed: one log for events received and another for events detected by the EPAgent.
I'm iterating through these procedures over and over as I develop, and eventually the CEP server and the Authoring Tool stop responding.
By "tailing" tomcat's log file (# tail -f /opt/tomcat10/logs/catalina.out) I can see that, under these circumstances, if I attempt a:
-GET (url: http://{ip}:8080/ProtonOnWebServerAdmin/resources/instances/ProtonOnWebServer)
I get no response back and tomcat logs the following response:
11452100 [http-bio-8080-exec-167] ERROR org.apache.wink.server.internal.RequestProcessor - An unhandled exception occurred which will be propagated to the container.
java.lang.OutOfMemoryError: PermGen space
Exception in thread "http-bio-8080-exec-167" java.lang.OutOfMemoryError: PermGen space
Ssh is still responsive and I can look at tomcat's log this way.
To get over this and continue, I exit ssh connections and restart CEP's instance in the Fi-Cloud.
Is the procedure I'm using to re-upload and re-run the definition inappropriate? Should I take a different approach to developing?
When you update a definition that the CEP is already working with, and you want the CEP engine to work with the updated definition, you need to:
Export the definition using the authoring tool export (as you did)
Stop the engine run, using REST PUT
PUT //host:8080/ProtonOnWebServerAdmin/resources/instances/ProtonOnWebServer
{"action":"ChangeState","state":"stop"}
Start the engine, using REST PUT
PUT //host:8080/ProtonOnWebServerAdmin/resources/instances/ProtonOnWebServer
{"action":"ChangeState","state":"start"}
You don't need to activate the "ChangeDefinitions" action, since it is the same definition name that the engine is already working with.
Activating "ChangeDefinitions" action, only influences the next run of the CEP, and has no influence on the current run.
This answers your question about how you should update a CEP definition.
Hope it will solve your issue.

How to stop a streaming pipeline in google cloud dataflow

I have a streaming Dataflow pipeline running that reads from a Pub/Sub subscription.
After a period of time, or maybe after processing a certain amount of data, I want the pipeline to stop by itself. I don't want my Compute Engine instances to keep running indefinitely.
When I cancel the job through the Dataflow console, it is shown as a failed job.
Is there a way to achieve this? Am I missing something, or is that feature missing in the API?
Could you do something like this?
Pipeline pipeline = ...;
... (construct the streaming pipeline) ...

final DataflowPipelineJob job =
    DataflowPipelineRunner.fromOptions(pipelineOptions)
        .run(pipeline);

Thread.sleep(your timeout);
job.cancel();
I was able to drain (cancel a job without losing data) a running streaming job on Dataflow with the REST API.
See my answer
Use the REST Update method, with this body:
{ "requestedState": "JOB_STATE_DRAINING" }