AWS SDK in Java - How to get activities from a worker when multiple executions are ongoing for a state machine - aws-sdk

AWS Step Functions
My problem is how to sendTaskSuccess or sendTaskFailure to an activity that is running under a state machine in AWS.
My actual intent is to notify the specific activities that belong to a particular state machine execution.
I can successfully send a notification to all waiting activities by activity ARN, but my actual need is to notify the specific activity that belongs to a particular state machine execution.
Example: for state machine SM1, two executions are ongoing: SM1E1 and SM1E2. In that case I want to sendTaskSuccess to the activity that belongs to SM1E1.
I used the following code, but it sends the notification to whichever waiting activity receives the token:
GetActivityTaskResult getActivityTaskResult = client.getActivityTask(
        new GetActivityTaskRequest().withActivityArn("arn detail"));
if (getActivityTaskResult.getTaskToken() != null) {
    try {
        JsonNode json = Jackson.jsonNodeOf(getActivityTaskResult.getInput());
        String outputResult = patientRegistrationActivity.setStatus(json.get("patientId").textValue());
        System.out.println("outputResult " + outputResult);
        SendTaskSuccessRequest sendTaskRequest = new SendTaskSuccessRequest()
                .withOutput(outputResult)
                .withTaskToken(getActivityTaskResult.getTaskToken());
        client.sendTaskSuccess(sendTaskRequest);
    } catch (Exception e) {
        client.sendTaskFailure(
                new SendTaskFailureRequest().withTaskToken(getActivityTaskResult.getTaskToken()));
    }
}

As far as I know, you have no control over which task token is returned; you may get one for SM1E1 or SM1E2, and you cannot tell by looking at the task token. GetActivityTask also returns the task "input", so based on that you may be able to tell which execution you are dealing with. But if you get a token you are not interested in, I don't think there is a way to put it back, so you won't be able to get it again with GetActivityTask later. I guess you could store it in a database somewhere for use later.
One idea you can try is the newer callback integration pattern. You can specify the Payload parameter in the state definition to include the task token, like this: token.$: "$$.Task.Token". Then use GetExecutionHistory to find the TaskScheduled event of the execution you are interested in, retrieve the parameters.Payload.token value, and use that with sendTaskSuccess (see the Java sketch after the YAML below).
Here's a snippet of my serverless.yml file that describes the state:
WaitForUserInput: # Wait for the user to do something
  Type: Task
  Resource: arn:aws:states:::lambda:invoke.waitForTaskToken
  Parameters:
    FunctionName:
      Fn::GetAtt: [WaitForUserInputLambdaFunction, Arn]
    Payload:
      token.$: "$$.Task.Token"
      executionArn.$: "$$.Execution.Id"
  Next: DoSomethingElse
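To illustrate the retrieval side, here is a minimal Java sketch of the GetExecutionHistory approach, using the same client and Jackson helper as the question's code (executionArn and outputResult are placeholders):
GetExecutionHistoryResult history = client.getExecutionHistory(
        new GetExecutionHistoryRequest().withExecutionArn(executionArn));

// Walk the history of the execution we care about (e.g. SM1E1) and pick out
// the TaskScheduled event that carries the Payload defined above.
for (HistoryEvent event : history.getEvents()) {
    if ("TaskScheduled".equals(event.getType())) {
        // getParameters() returns the JSON string of the task's resolved
        // Parameters, including our Payload with the task token.
        JsonNode params = Jackson.jsonNodeOf(event.getTaskScheduledEventDetails().getParameters());
        String taskToken = params.get("Payload").get("token").textValue();
        client.sendTaskSuccess(new SendTaskSuccessRequest()
                .withTaskToken(taskToken)
                .withOutput(outputResult));
    }
}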

I did a POC to check, and below is the solution.
If the token has already been consumed by getActivityTaskResult.getTaskToken() and your conditions are not satisfied by the request input, you can use the line below to keep the task alive instead of losing the token: awsStepFunctionClient.sendTaskHeartbeat(new SendTaskHeartbeatRequest().withTaskToken(taskToken))
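A sketch of that polling loop, based on the question's code (wantedPatientId is a hypothetical value identifying the execution you care about):
GetActivityTaskResult task = client.getActivityTask(
        new GetActivityTaskRequest().withActivityArn("arn detail"));

if (task.getTaskToken() != null) {
    JsonNode json = Jackson.jsonNodeOf(task.getInput());
    if (wantedPatientId.equals(json.get("patientId").textValue())) {
        // This token belongs to the execution we want: complete it.
        client.sendTaskSuccess(new SendTaskSuccessRequest()
                .withTaskToken(task.getTaskToken())
                .withOutput(outputResult));
    } else {
        // Not our execution: send a heartbeat so the task does not time out,
        // and keep the token around (e.g. in a map) to complete it later.
        client.sendTaskHeartbeat(new SendTaskHeartbeatRequest()
                .withTaskToken(task.getTaskToken()));
    }
}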

Duplicates on Apache Beam / Dataflow inputs even when using withIdAttribute

I am trying to ingest data from a 3rd party API into a Dataflow pipeline. Since the 3rd party doesn't make webhooks available, I wrote a custom script that constantly polls their endpoint for more data.
The data is refreshed every 15 minutes, but since I don't want to miss any datapoints and I want to consume new data as soon as it is available, my "crawler" runs every minute. The script then sends the data to a PubSub topic. It's easy to see that PubSub will receive about 15 repeated messages for each datapoint from the source.
My first attempt to identify and discard those repeated messages was to add a custom attribute to each PubSub message (eventid), created from a hash of its [ID + updated_time] at source.
const attributes = {
    eventid: Buffer.from(`${item.lastupdate}|${item.segmentid}`).toString('base64'),
    timestamp: item.timestamp.toString()
};
const dataBuffer = Buffer.from(JSON.stringify(item));
publisher.publish(dataBuffer, attributes);
Then I configured Dataflow with a withIdAttribute() (which is the new idLabel(), based on Record IDs).
PCollection<String> input = p
    .apply("ReadFromPubSub", PubsubIO
        .readStrings()
        .fromTopic(String.format("projects/%s/topics/%s", options.getProject(), options.getIncomingDataTopic()))
        .withTimestampAttribute("timestamp")
        .withIdAttribute("eventid"))
    .apply("OutputToBigQuery", ...)
With that implementation, I was expecting that when the script sends the same datapoint a second time, the repeated eventid would be the same and the message discarded. But for some reason, I still see duplicates on the output dataset.
Some questions:
Is there a clever way to ingest the data into Dataflow from that 3rd party API if they don't provide webhooks?
Any ideas on why Dataflow is not discarding the messages in this situation?
I know about the 10-minute deduplication window on Dataflow, but I see duplicated data even on the 2nd insertion (2 minutes in).
Any help will be greatly appreciated!
I think you are on the right track; instead of the hash, I recommend using timestamps. A better way to do this is by using windows. Review this document, which filters data that is outside of the window.
Regarding the additional duplicate data: if you are using pull subscriptions and the acknowledgement deadline is reached before the data is processed, the message will be resent, as per at-least-once delivery. In this case, change the acknowledgement deadline; the default is 10 seconds.
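For illustration, a minimal window-plus-deduplication sketch in Beam, applied between the PubsubIO read and the write to BigQuery (the Distinct transform and the 15-minute window size are assumptions based on the refresh interval, not something from the pipeline above):
PCollection<String> deduped = input
    .apply("FixedWindows", Window.<String>into(FixedWindows.of(Duration.standardMinutes(15))))
    // Distinct drops elements that are equal within the same window, so
    // duplicates that straddle a window boundary will still get through.
    .apply("Dedup", Distinct.<String>create());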

How to schedule Laravel 5 job to get data from external JSON file, and store value in database?

I'm currently working on a project in Laravel, and I want to schedule a job that grabs a value (the price of Bitcoin) from an external API (JSON file) and stores this value in my database every few minutes.
So far, I have created a job using the artisan command: artisan make:job UpdateBitcoinMarketPrice. But I've no idea what to include in the public function handle() inside the Job class that was created.
I have worked out that I can schedule this job regularly from App\Console\Kernel.php with the following function:
protected function schedule(Schedule $schedule)
{
    // $schedule->command('inspire')
    //          ->hourly();
    $schedule->job(new UpdateBitcoinMarketPrice)->everyFiveMinutes();
}
Should I, for example, create a new Model that stores said value, and then create a new object every time this runs?
Should I then fetch the first row of the table whenever I wish to return the value?
Job classes are very simple, normally containing only a handle() method, which is called when the job is processed by the queue. You can use the constructor to inject any parameter or serialize a model so you can use it in your handle method.
So, put simply, you can make the API call in the handle method and store the response in the database, knowing that the API call will be fired as a background job.
Something along the lines of:
public function __construct(User $user)
{
    // In this case Laravel serializes the User model, as an example, so you can use it in your background job.
    // This can be anything that you need in order to make the call.
    $this->user = $user;
}

// Injecting ExternalServiceClass (the API client) and Transformer (to transform the API response) as examples.
public function handle(ExternalServiceClass $service, Transformer $transform)
{
    // Make the call to the API, parse the response,
    // and store the result in the database.
    $response = $service->postRequest($someUri, $someParams);
    $parsedResponse = $transform->serviceResponse($response);
    DatabaseModel::firstOrCreate($parsedResponse);
}
The handle method is called when the job is processed by the queue. Note that you are able to type-hint dependencies on the handle method of the job, like in the example above. The Laravel service container automatically injects these dependencies.
Now, since you are going to run the job everyFiveMinutes(), you have to be careful: by default, scheduled tasks run even if a previous instance of the task is still running.
To prevent this, you may use the withoutOverlapping method:
$schedule->job(new UpdateBitcoinMarketPrice)->everyFiveMinutes()->withoutOverlapping();
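Applied to this question, a minimal sketch of the job's handle() method (the API URL and the BitcoinPrice model are assumptions, and Guzzle is used for the HTTP call):
public function handle()
{
    // Fetch the current price from the external JSON API (hypothetical URL).
    $response = (new \GuzzleHttp\Client())->get('https://api.example.com/v1/bitcoin/price');
    $data = json_decode($response->getBody()->getContents(), true);

    // BitcoinPrice is a hypothetical Eloquent model with a fillable `price` column.
    BitcoinPrice::create(['price' => $data['price']]);
}
To read the value back, something like BitcoinPrice::latest()->first() would return the most recently stored row.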

Postman/Newman junit report customization

I'm using Postman and Newman to perform automated tests, and I export a JUnit report in order to use the results in TFS.
However, when I open my .xml report, failures are indicated as follows:
<failure type="AssertionFailure">
<![CDATA[Failed 1 times.]]>
</failure>
I would like to know if it is possible to customize the "Failed 1 times." information in order to pass more relevant data about the failure (i.e. the JSON body error and description).
Thank you
Alexandre
Well, finally I found out how to proceed (not a clean way, but sufficient for my purpose so far):
I modify the file C:\Users\<myself>\AppData\Roaming\npm\node_modules\newman\lib\reporters\junit\index.js
Request's data and response can be recovered from 'executions' object:
stringExecutions = JSON.stringify(executions); // shows the structure of the 'executions' object
from this I can take general information by json-parsing this element and extracting what I want:
jsonExecutions = JSON.parse(stringExecutions)
jsonExecutions[0].response._details.code // gives me the http return code,
jsonExecutions[0].response._details.name // gives me the status,
jsonExecutions[0].response._details.detail //gives a bit more details
Error data (at test case/testsuite level) can be recovered from the 'err.error' object:
stringData = JSON.stringify(err.error); jsonData = JSON.parse(stringData);
from that I extract the data I need, ie.
jsonData.name // the error type
jsonData.message // the error detail
jsonData.stacktrace // the error stack
By the way, in the original file the stack cannot be displayed, because there is no 'stack' field on err.error (it is named 'stacktrace').
Finally failure data (at test step/testcase level) can be recovered from the 'failures' object:
stringFailure = JSON.stringify(failures); jsonFailure = JSON.parse(stringFailure);
from this I extract:
jsonFailure[0].name // the failure type
jsonFailure[0].stack // the failure stack
For my purpose, I add the response details from jsonExecutions to my testsuite error data, which makes the XML report much more verbose than previously.
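For illustration, a sketch of combining those pieces into a richer failure message (property paths as above; newman's internals vary by version, so the surrounding reporter code is an assumption):
// Inside lib/reporters/junit/index.js, where the <failure> CDATA is built.
var details = jsonExecutions[0].response._details;
var failureMessage = jsonFailure[0].name + ': ' + jsonFailure[0].stack +
    '\nHTTP ' + details.code + ' (' + details.name + '): ' + details.detail;
// Emit failureMessage inside the CDATA block instead of "Failed n times.".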
If there is a cleaner/smarter way to perform this, do not hesitate to tell me; I'll be grateful.
Next step: do it cleanly by creating a custom reporter. :)
Alexandre

JMeter not applying variable to Header Manager

I'm using Apache JMeter 3.2 r1790748 on Mac.
I have a setUp Thread Group making an authentication call. The call works and outputs the tokens correctly. Now I need to pass that token to the HTTP Header Manager for all the calls I'm making.
First of all, here's my token json output:
{
    "access_token": "aaaaaa555555555",
    "token_type": "Access",
    "user_id": "5555"
}
Here's what my HTTP Header manager looks like:
1 value: Authorization : Bearer ${access_token}
My network call:
GET https://my_server.com/some_path
GET data:
[no cookies]
Request Headers:
Connection: close
Authorization: Bearer ${access_token}
Host: my_server.com
User-Agent: Apache-HttpClient/4.5.3 (Java/1.8.0_91)
As you can see, the variable access_token is not being replaced with the value from the setup call.
What I've tried:
BeanShell PostProcessor:
I created this script, and it actually parses and outputs the access_token properly:
import org.apache.jmeter.protocol.http.control.Header;
import net.minidev.json.JSONObject;
import net.minidev.json.parser.JSONParser;
String jsonString = prev.getResponseDataAsString();
log.info("jsonString = " + jsonString);
JSONParser parser = new JSONParser(JSONParser.MODE_JSON_SIMPLE);
JSONObject json = (JSONObject) parser.parse(jsonString);
String access_token = json.getAsString("access_token");
log.info("access_token = " + access_token);
vars.put("access_token", access_token);
JSON Extractor:
Apply to: Main sample and sub-samples
Variable names: access_token
JSON Path expressions: access_token
Match No. (0 for Random): 1
Compute concatenation var (suffix _ALL): unchecked
Default Values: none
Any ideas as to why the header manager is not applying the value of the access_token result?
Thanks!
Since you set the variable in a setUp Thread Group, you cannot use it in other thread groups: thread groups don't share variables, only properties.
So in order to pass the authentication token along, you need to save it as a property:
${__setProperty(access_token, ${access_token})};
In this example I am using the value of the variable named access_token (already set, but only available in the setUp thread group) to set a property with the same name, which will be available across thread groups. Or change the BeanShell post-processor and add:
props.put("access_token", access_token);
And then in the other thread group, you retrieve it using __P or __property function:
${__P(access_token)}
Also keep in mind that the HTTP Header Manager initializes before any threads start, so you can't use variables there for that reason as well. Check this question for instance.
If you still see empty value, I recommend adding Debug Sampler (with both JMeter Properties and JMeter Variables enabled) in both thread groups, and checking where the breakage is (on saving or retrieving).
As per Functions and Variables chapter of the JMeter User Manual
Variables are local to a thread; properties are common to all threads, and need to be referenced using the __P or __property function
So the variable you define in the setUp Thread Group cannot be accessed by:
other threads in the same Thread Group
other threads outside the Thread Group
So my recommendations are:
Switch to JMeter Properties instead of JMeter Variables; JMeter Properties are global to all threads and in fact the whole JVM instance.
Switch to the JSR223 PostProcessor with the Groovy language instead of the Beanshell PostProcessor; JSR223 elements perform much better, and Groovy has built-in JSON support.
So:
The relevant Groovy code for getting the access_token attribute value and storing it in the relevant property would be:
props.put('access_token', new groovy.json.JsonSlurper().parse(prev.getResponseData()).access_token)
You can refer the value in the HTTP Header Manager (or wherever you require) as:
${__P(access_token,)}
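Putting the two recommendations together, a minimal sketch of the JSR223 PostProcessor (Groovy) attached to the authentication sampler in the setUp Thread Group:
import groovy.json.JsonSlurper

// Parse the authentication response body and store the token as a JMeter
// property (not a variable) so other thread groups can read it.
def json = new JsonSlurper().parse(prev.getResponseData())
props.put('access_token', json.access_token)
And the matching HTTP Header Manager entry in the main thread group:
Authorization: Bearer ${__P(access_token,)}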

Exceptions in Yesod

I had made a daemon that used a very primitive form of ipc (telnet and send a String that had certain words in a certain order). I snapped out of it and am now using JSON to pass messages to a Yesod server. However, there were some things I really liked about my design, and I'm not sure what my choices are now.
Here's what I was doing:
buildManager :: Phase -> IO ()
buildManager phase = do
    let buildSeq = findSeq phase
        jid      = JobID $ pack "8"
        config   = MkConfig $ Just jid
    flip C.catch exceptionHandler $
        runReaderT (sequence_ $ buildSeq <*> stages) config
    -- ^^ I would really like to keep the above line of code, or something like it.
    return ()
Each function in buildSeq looked like this:
foo :: Stage -> ReaderT Config IO ()
data Config = MkConfig (Either JobID Product) BaseDir JobMap
JobMap is a TMVar Map that tracks information about current jobs.
So now what I have are Handlers, which all look like this:
foo :: Handler RepJson
foo represents a command for my daemon; each handler may have to process a different JSON object.
What I would like to do is send one JSON object that represents success, and another JSON object that expresses information about some exception.
I would like foo's helper function to be able to return an Either, but I'm not sure how to get that, plus the ability to terminate evaluation of my list of actions, buildSeq.
Here's the only choice I see:
1) Make sure exceptionHandler is in Handler. Put JobMap in the App record. Using getYesod, alter the appropriate value in JobMap to indicate details about the exception, which can then be accessed by foo.
Is there a better way?
What are my other choices?
Edit: For clarity, I will explain the role of Handler RepJson. The server needs some way to accept commands such as build, stop, report. The client needs some way of knowing the results of these commands. I have chosen JSON as the medium with which the server and client communicate. I'm using the Handler type just to manage the JSON in/out and nothing more.
Philosophically speaking, in the Haskell/Yesod world you want to pass the values forward, rather than return them backwards. So instead of having the handlers return a value, have them call forwards to the next step in the process, which may be to generate an exception.
Remember that you can bundle any amount of future actions into a single object, so you can pass a continuation object to your handlers and foos that basically tells them, "After you are done, run this blob of code." That way they can be void and return nothing.
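To make the Either part concrete, a minimal sketch (runBuild is a hypothetical helper, not code from the question):
import qualified Control.Exception as C
import Control.Monad.Reader (ReaderT, runReaderT)

-- Run the build actions in order; the first exception aborts the rest of the
-- sequence and comes back as a Left, so the calling handler can decide which
-- JSON object (success or error) to send to the client.
runBuild :: Config -> [ReaderT Config IO ()] -> IO (Either C.SomeException ())
runBuild config actions = C.try (runReaderT (sequence_ actions) config)
A handler can then pattern match on Left/Right and render the corresponding JSON object, which also gives you the early termination of buildSeq that you wanted.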