How to stop a Flink streaming job from a program - JUnit

I am trying to create a JUnit test for a Flink streaming job which writes data to a Kafka topic and reads data from the same Kafka topic, using FlinkKafkaProducer09 and FlinkKafkaConsumer09 respectively. I am passing test data to the producer:
DataStream<String> stream = env.fromElements("tom", "jerry", "bill");
And checking whether the same data comes back from the consumer:
List<String> expected = Arrays.asList("tom", "jerry", "bill");
List<String> result = resultSink.getResult();
assertEquals(expected, result);
using TestListResultSink.
I am able to see the data coming from the consumer as expected by printing the stream, but I could not get the JUnit test result because the consumer keeps running even after the messages are finished, so the execution never reaches the assertion part.
Is there any way in Flink or FlinkKafkaConsumer09 to stop the process, or to run it for a specific time?

The underlying problem is that streaming programs are usually not finite and run indefinitely.
The best way, at least for the moment, is to insert a special control message into your stream which lets the source terminate properly (it simply stops reading more data by leaving the read loop). That way Flink tells all downstream operators that they can stop after they have consumed all data.
Alternatively, you can throw a special exception in your source (e.g. after some time) such that you can distinguish a "proper" termination from a failure case (by checking the error cause). Throwing an exception in the source will fail the program.
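As a rough illustration of the control-message idea (this is not code from the thread; the class name, the END_OF_TEST marker, and the readNextValue() helper are invented), a custom source could look roughly like this:

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Sketch of a source that leaves its read loop when it sees a control message,
// so the job can terminate cleanly once downstream operators drain their data.
public class FiniteTestSource implements SourceFunction<String> {

    private static final String END_OF_TEST = "END_OF_TEST"; // hypothetical control message
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            String value = readNextValue(); // placeholder: poll Kafka, a queue, etc.
            if (END_OF_TEST.equals(value)) {
                running = false;            // stop reading -> source finishes -> job ends
            } else {
                ctx.collect(value);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private String readNextValue() {
        return END_OF_TEST; // placeholder for the real read logic
    }
}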

In your test you can start the job execution in a separate thread, wait some time to allow for data processing, cancel the thread (this will interrupt the job), and then make the assertions.
CompletableFuture<Void> handle = CompletableFuture.runAsync(() -> {
    try {
        environment.execute(jobName);
    } catch (Exception e) {
        e.printStackTrace();
    }
});

try {
    handle.get(seconds, TimeUnit.SECONDS);
} catch (TimeoutException e) {
    handle.cancel(true); // this will interrupt the job execution thread, cancel and close the job
}

// Make assertions here

Can you not override isEndOfStream within the deserializer to stop fetching from Kafka? If I read correctly, Flink's Kafka09Fetcher has the following code in its run method, which breaks the event loop:
if (deserializer.isEndOfStream(value)) {
    // end of stream signaled
    running = false;
    break;
}
My thought was to use Till Rohrmann's idea of a control message in conjunction with this isEndOfStream method to tell the KafkaConsumer to stop reading.
Any reason that will not work? Or maybe some corner cases I'm overlooking?
https://github.com/apache/flink/blob/07de86559d64f375d4a2df46d320fc0f5791b562/flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka09Fetcher.java#L146
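For reference, a minimal sketch of that approach (not from the original post; the END_OF_TEST marker is an invented convention, and the exact package of DeserializationSchema varies between Flink versions):

import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

// Sketch: a schema whose isEndOfStream() returns true for a chosen control
// message, which makes the Kafka fetcher leave its event loop.
public class StoppableStringSchema implements DeserializationSchema<String> {

    @Override
    public String deserialize(byte[] message) {
        return new String(message, StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        return "END_OF_TEST".equals(nextElement); // stop fetching on the control message
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}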

Following up on Till Rohrmann's suggestion:
You can combine the special exception method and handle it in your unit test if you use an EmbeddedKafka instance, and then read off the EmbeddedKafka topic and assert the consumer values.
I found https://github.com/asmaier/mini-kafka/blob/master/src/test/java/de/am/KafkaProducerIT.java to be extremely useful in this regard.
The only problem is that you will lose the element that triggers the exception, but you can always adjust your test data to account for that.
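To sketch what the test side of the special-exception approach could look like (purely illustrative; JobTerminationException and the test class are invented, and the topology/sink setup is elided):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.junit.Test;

public class KafkaRoundTripTest {

    /** Hypothetical marker exception thrown by the source once the control message arrives. */
    static class JobTerminationException extends RuntimeException { }

    @Test
    public void producesAndConsumesTestData() throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // ... build the producer/consumer topology against the embedded Kafka here,
        // using a source or deserializer that throws JobTerminationException
        // when it sees the control message, plus a sink that records results.

        try {
            env.execute("kafka-round-trip-test");
        } catch (Exception e) {
            // Flink wraps failures, so walk the cause chain for our marker exception.
            boolean terminatedOnPurpose = false;
            for (Throwable t = e; t != null; t = t.getCause()) {
                if (t instanceof JobTerminationException) {
                    terminatedOnPurpose = true;
                    break;
                }
            }
            if (!terminatedOnPurpose) {
                throw e; // a real failure, not the controlled shutdown
            }
        }

        // Read the results off the embedded Kafka topic (or the test sink) and assert here.
    }
}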

Related

Don't let test stop on failure

I'm looking for the best practice for following (simplified) scenario:
@Test
public void someTest() {
    for (String someText : someTexts) {
        Assert.assertTrue(checkForValidity(someText));
    }
}
This test iterates through thousands of texts, and in this case I don't want it to stop at the first failure. I want the errors to be buffered and, if there are any, the test to fail at the end. Does JUnit have something on board for this?
First of all, this is not really the correct way to implement it. JUnit allows parameterizing tests by defining a collection of inputs/outputs with the Parameterized test runner. Doing it this way ensures that each sample becomes a unique test instance, so the test report clearly states which samples passed and which failed.
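For illustration, a minimal Parameterized (JUnit 4) version of the test above; the sample texts and checkForValidity() are placeholders standing in for the asker's data and check:

import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TextValidityTest {

    @Parameters(name = "{index}: {0}")
    public static Collection<Object[]> data() {
        return Arrays.asList(new Object[][] {
            { "first text" }, { "second text" }, { "third text" }
        });
    }

    private final String someText;

    public TextValidityTest(String someText) {
        this.someText = someText;
    }

    @Test
    public void textIsValid() {
        assertTrue(checkForValidity(someText));
    }

    private boolean checkForValidity(String text) {
        return !text.isEmpty(); // placeholder for the real check
    }
}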
If you still insist on doing it your way, you should have a look at AssertJ's soft assertions, which allow "swallowing" individual assertion failures, accumulating them, and only reporting after the test is finished. The AssertJ documentation section on soft assertions uses a nice example and is definitely worth reading.
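And a small sketch of the soft-assertions variant (again, someTexts and checkForValidity() are placeholders):

import java.util.Arrays;
import java.util.List;

import org.assertj.core.api.SoftAssertions;
import org.junit.Test;

public class TextValiditySoftTest {

    private final List<String> someTexts = Arrays.asList("first", "second", "third");

    @Test
    public void allTextsAreValid() {
        SoftAssertions softly = new SoftAssertions();
        for (String someText : someTexts) {
            // Failures are collected instead of aborting the test immediately.
            softly.assertThat(checkForValidity(someText))
                  .as("validity of %s", someText)
                  .isTrue();
        }
        // Reports all collected failures at once, failing the test if any exist.
        softly.assertAll();
    }

    private boolean checkForValidity(String text) {
        return !text.isEmpty(); // placeholder for the real check
    }
}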

Grails 2.4.4: How to reliably rollback in a complex service method

Consider the following service (transactional by default). A player must always have one account. A player without at least one corresponding account is an error state.
class PlayerService {
    def createPlayer() {
        Player p = new Player(name: "Stephen King")
        if (!p.save()) {
            return [code: -1, errors: p.errors]
        }
        Account a = new Account(type: "cash")
        if (!a.save()) {
            // rollback p !
            return [code: -2, errors: a.errors]
        }
        // commit only now!
        return [code: 0, player: p]
    }
}
I have seen this pattern used by experienced Grails developers, and when I tell them that if creation of the account of the player fails for any reason, it won't roll back the player and will leave the DB in an invalid state, they look at me like I am mad, because Grails handles rolling back the player since services are transactional, right?
So then, being a SQL guy, I looked for a way to call rollback in Grails. There isn't one. According to various posts, there are only two ways to force Grails to roll back in a service:
Throw an unchecked exception. You know what this is, right?
Don't use service methods or transactional annotations; use this construct:
DomainObject.withTransaction { status ->
    // stuff
    if (someError) {
        status.setRollbackOnly()
    }
}
1. throw an unchecked exception
1.1 So we must throw runtime exceptions to roll back. This is OK for me (I like exceptions), but it won't gel with the Grails developers we have, who view exceptions as an uncool throwback to Java. It also means we have to change the whole way the app currently uses its service layer.
1.2 If an exception is thrown, you lose the p.errors - you lose the validation detail.
1.3 Our new Grails devs don't know the difference between an unchecked and a checked exception, and don't know how to tell the difference. This is really dangerous.
1.4. use .save(failOnError: true)
I am a big fan of using this, but it's not appropriate everywhere. Sometimes you need to check the reason before going further, not throw an exception. Are the exceptions it can generate always checked, always unchecked, or either? I.e. will failOnError ALWAYS roll back, no matter what the cause? No one I have asked knows the answer to this, which is disturbing; they are relying on blind faith to avoid corrupted/inconsistent DBs.
1.5 What happens if a controller calls service A, which calls service B, then service C? Service A must catch any exception and return a nicely formatted value to the controller. If service C throws an exception that is caught by service A, will service B's transactions be rolled back? This is critical to know in order to build a working application.
UPDATE 1:
Having done some tests, it appears that any runtime exception, even if thrown and caught in some unrelated child call, will cause everything in the parent to roll back. However, it is not easy to know in the parent session that this rollback has happened - you need to make sure that if you catch any exception, you either rethrow, or pass some notice back to the caller to show that it has failed in such a way that everything else will be rolled back.
2. withTransaction
2.1 This seems a bizarre construct. How do I call this, and what do I pass in for the "status" parameter? What is "setRollbackOnly" exactly? Why is it not just called "rollback"? What is the "Only" part? It is tied to a domain object, when your method may want to update several different domain objects.
2.2 Where are you supposed to put this code? In the DomainObject class? In the source folder (i.e. not in a service or controller)? Directly in the controller? (We don't want to duplicate business logic in the controllers.)
3. Ideal situation.
3.1 The general case is we want everything we do in a service method to roll back if anything in that service method can't be saved for any reason, or throws any exception for any reason (checked or unchecked).
3.2 Ideally I would like service methods to "always roll back, unless I explicitly call commit", which is the safest strategy, but I believe this is not possible.
The question is how do I achieve the ideal situation?
Will calling save(failOnError: true) ALWAYS roll back everything, no matter what the reason for failing? This is not perfect, as it is not easy for the caller to know which domain object's save caused the issue.
Or do people define lots of exception classes which subclass RuntimeException, then explicitly catch each of them in the controller to create the appropriate response? This is the old Java way, and our Groovy devs pooh-pooh this approach due to the amount of boilerplate code we would have to write.
What methods do people use to achieve this?
I wouldn't call myself an expert, and this question is over a year old, but I can answer some of these questions, if only for future searchers. I'm just now refactoring some controllers to use services in order to take advantage of transactions.
I have seen this pattern used by experienced Grails developers, and when I tell them that if creation of the account of the player fails for any reason, it won't roll back the player and will leave the DB in an invalid state, they look at me like I am mad, because Grails handles rolling back the player since services are transactional, right?
I'm not seeing in the documentation where it explicitly states that returning from a service method does not roll back the transaction, but I can't imagine that this would be very sane behavior. Still, testing is an easy way to prove it to yourself.
1.2 If an exception is thrown, you lose the p.errors - you lose the validation detail.
Since you're the one throwing the exception, you can throw the errors along with it. For instance:
// in service
if (!email.save()) {
    throw new ValidationException("Couldn't save email ${params.id}", email.errors)
}
When you catch the exception, you reload the instance (because throwing an exception clears the session), put the errors back into the instance, and then pass that to the view as usual:
// in controller
} catch (ValidationException e) {
    def email = Email.read(id)
    email.errors = e.errors
    render view: "edit", model: [emailInstance: email]
}
This is discussed under the heading "Validation Errors and Rollback", down the page from http://grails.github.io/grails-doc/2.4.4/guide/single.html#transactionsRollbackAndTheSession.
1.4. use .save(failOnError: true) I am a big fan of using this, but it's not appropriate everywhere. Sometimes you need to check the reason before going further, not throw an exception. Are the exceptions it can generate always checked, always unchecked, or either? I.e. will failOnError ALWAYS roll back, no matter what the cause? No one I have asked knows the answer to this, which is disturbing; they are relying on blind faith to avoid corrupted/inconsistent DBs.
failOnError will cause save() to throw a ValidationException, so yes, if you're in a transaction and aren't checking that exception, the transaction will be rolled back.
Generally speaking, it seems to be un-"Grailsy" to use failOnError a lot, probably for the reasons you listed (e.g., lack of control). Instead, you check whether save() failed (if (!save()) ...), and take action based on that.
withTransaction
I'm not sure of the point of this, because SpringSource really encourages the use of services for everything. I personally don't like it, either.
If you want to make a particular service non-transactional, and then make one method of it transactional, you can just annotate that one method with @Transactional (unless your developers also dislike annotations because they're too "Java" ;) ).
Note! As soon as you mark a single method with @Transactional, the other methods of the service are no longer transactional by default.
3.1 The general case is we want everything we do in a service method to roll back if anything in that service method can't be saved for any reason, or throws any exception for any reason (checked or unchecked).
I feel like checked exceptions are generally considered not "Groovy" (which also makes them not Grails-y). Not sure about the reason for that.
However, it looks like you can tell your service to roll back on your checked exceptions by listing them in the rollbackFor option of @Transactional.
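As a rough illustration of rollbackFor, here in Java with Spring's org.springframework.transaction.annotation.Transactional (Grails' grails.transaction.Transactional accepts the same rollbackFor attribute); the service, its helper calls, and AccountCreationException are invented:

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class PlayerAccountService {

    // Checked exceptions do not trigger a rollback by default;
    // listing them in rollbackFor changes that.
    @Transactional(rollbackFor = AccountCreationException.class)
    public void createPlayer(String name) throws AccountCreationException {
        savePlayer(name);          // hypothetical persistence call
        createCashAccount(name);   // may throw the checked AccountCreationException
    }

    private void savePlayer(String name) { /* ... */ }

    private void createCashAccount(String name) throws AccountCreationException { /* ... */ }

    /** Hypothetical checked exception used to demonstrate rollbackFor. */
    public static class AccountCreationException extends Exception { }
}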
Or do people define lots of exception classes which subclass RuntimeException, then explicitly catch each of them in the controller to create the appropriate response? This is the old Java way, and our Groovy devs pooh-pooh this approach due to the amount of boilerplate code we would have to write.
The nice thing about Groovy is that you can write your boilerplate once and then call it repeatedly. A pattern I've seen a lot, and am currently using, is something like this:
private void validate(Long id, Closure closure) {
    try {
        closure()
    } catch (ValidationException e) {
        def email = Email.read(id)
        email.errors = e.errors
        render view: "edit", model: [emailInstance: email]
    } catch (OtherException e) {
        def email = Email.read(id)
        flash.error = "${e.message}: ${e.reasons}"
        render view: "show", model: [emailInstance: email]
    } catch (Throwable t) {
        flash.error = "Unexpected error $t: ${t.message}"
        redirect action: "list"
    }
}
And then call it in each controller action like so:
def update(Long id, Long version) {
    withInstance(id, version) { Email emailInstance ->
        validate(emailInstance.id) {
            emailService.update(emailInstance, params)
            flash.message = "Email $id updated at ${new Date()}."
            redirect action: "show", id: emailInstance.id
        }
    }
}
(withInstance is another similar method that DRYs up the check for existence and optimistic locking.)
This approach has downsides. You get the same set of redirects in every action; you probably want to write one set of methods for each controller; and it seems kind of silly to pass a closure into a method and expect the method to know what exceptions the closure will throw. But hey, programming's all about tradeoffs, right?
Anyway, hope that is at least interesting.
In a Grails 2 app, the recommended way would be to use transactionStatus.setRollbackOnly(). If you have a service such as:
import grails.transaction.Transactional

class RoleService {

    @Transactional
    Role save(String authority) {
        Role roleInstance = new Role(authority: authority)
        if (!roleInstance.save()) {
            // log errors here
            transactionStatus.setRollbackOnly()
        }
        roleInstance
    }
}
See: https://github.com/grails/grails-core/issues/9212

Breeze EF6 SaveChanges doesn't propagate exceptions

In the EFContextProvider (EF6) SaveChangesCore method, the exception handling looks like this:
} catch (Exception e) {
    while (e.InnerException != null) {
        e = e.InnerException;
    }
    throw e;
}
This rethrows only the innermost exception and hides the relevant information carried by the outer exceptions.
When the SaveChanges process goes through multiple layers, the exception from the next direct layer is lost and only the last exception in the chain is thrown, which makes it hard for the caller to handle exceptions properly.
Updated Post
As of Breeze 1.4.6, any .NET Exceptions thrown on the server are now available in their original form in the httpResponse.data property of any async breeze result. Breeze will still drill down to extract a "good" error message, but will no longer obscure the initial exception.
Original Post Below -------------------
It's an interesting point. The reason we did this was because most client side apps aren't written to navigate thru the exception chain and we wanted to expose the most 'relevant' error to the client. Most of the apps we looked at just exposed the client "error.message" property directly and with EF errors this was almost always useless.
However, your point is well taken. I think what we need to do is create a new Exception that has a top level message that is the innermost exception message but still expose the entire exception chain for those that want to drill. I've added an internal feature request for this and will try to get it into a near term release ( probably not the next one because we are already in testing for that one).
And thanks for the input.
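The wrapping idea could look roughly like this (sketched in Java rather than the C# Breeze actually uses, purely as an illustration; the SaveException name is invented):

// Illustration only: wrap the original exception so the top-level message is
// the innermost one, but the full chain stays reachable via getCause().
public final class SaveException extends RuntimeException {

    public SaveException(String innermostMessage, Throwable original) {
        super(innermostMessage, original);
    }

    /** Builds the wrapper from any caught exception. */
    public static SaveException from(Throwable caught) {
        Throwable innermost = caught;
        while (innermost.getCause() != null) {
            innermost = innermost.getCause();
        }
        // Keep 'caught' (the outermost exception) as the cause so nothing is lost.
        return new SaveException(innermost.getMessage(), caught);
    }
}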

Should I return null or throw an exception?

I found questions here Should a retrieval method return 'null' or throw an exception when it can't produce the return value? and Should functions return null or an empty object?, but I think my case is quite different.
I'm writing an application that consists of a webservice and a client. The webservice is responsible to access data, and return data to the client. I design my app like this:
// webservice
try
{
    DataTable data = GetSomeData(parameter);
    return data;
}
catch (OopsException ex)
{
    // write some log here
    return null;
}

// client:
DataTable data = CallGetSomeData(parameter);
if (data == null)
{
    MessageBox.Show("Oops Exception!");
    return;
}
Well, there is a rule of not returning null. I don't think I should just rethrow the exception and let the client catch a SoapException. What's your comment? Is there a better approach to solve this problem?
Thank you.
In your case, an exception has already been thrown and handled in some manner in your web service.
Returning null there is a good idea because the client code can know that something errored out in your web service.
In the case of the client, I think the way you have it is good. I don't think there is a reason to throw another exception (even though you aren't in the web service anymore).
I say this, because, technically, nothing has caused an error in your client code. You are just getting bad data from the web service. This is just a matter of handling potentially bad input from an outside source.
Personally, as a rule of thumb, I shy away from throwing exceptions when I get bad data since the client code can't control that.
Just make sure you handle the data == null condition in such a way that it doesn't crash your client code.
In general I try to design my web services in such a way that they return a flag of some sort that indicates whether there was a technical/functional error or not.
Additionally, I try to return a complex object as the result, not just a string, so that I can return things like:
result->Code = "MAINTENANCE"
result->MaintenanceTill = "2010-10-29 14:00:00"
So for a web service that should return a list of dataEntities, I will return something like:
<result>
    <result>
        <Code>OK</Code>
    </result>
    <functionalResult>
        <dataList>
            <dataEntity>A</dataEntity>
        </dataList>
    </functionalResult>
</result>
So every failure that can occur behind my web service is wrapped in an error result.
The only exceptions that developers must care about while calling my web service are the exceptions or errors that can occur before the web service is reached.
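A minimal sketch of such a result wrapper (in Java, with invented names; the answer above describes the pattern, not this exact type):

// Hypothetical result wrapper: a status code plus optional error details and payload,
// so the caller can branch on the flag instead of catching transport exceptions.
public final class ServiceResult<T> {

    public enum Code { OK, MAINTENANCE, ERROR }

    private final Code code;
    private final String message;   // human-readable detail, e.g. maintenance window
    private final T payload;        // the functional result, null on failure

    private ServiceResult(Code code, String message, T payload) {
        this.code = code;
        this.message = message;
        this.payload = payload;
    }

    public static <T> ServiceResult<T> ok(T payload) {
        return new ServiceResult<>(Code.OK, null, payload);
    }

    public static <T> ServiceResult<T> error(Code code, String message) {
        return new ServiceResult<>(code, message, null);
    }

    public Code getCode() { return code; }
    public String getMessage() { return message; }
    public T getPayload() { return payload; }
}

A call could then return ServiceResult.ok(dataEntities) or ServiceResult.error(Code.MAINTENANCE, "until 2010-10-29 14:00"), and the client branches on getCode() instead of catching transport exceptions.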
All the WebServices that I've used return objects, not simple data types. These objects usually contain a bool value named Success that lets you test very quickly whether or not to trust the data returned. In either event, I think any errors thrown should be untrappable (i.e. unintentional) and therefore signify a problem with the service itself.
I think there may be a few factors to consider when making a decision:
what is the idiomatic way to do this in the language you're using (if it weren't a webservice)
how good your soap/webservice library is (does it propagate exceptions or not)
what's the easiest thing for the client to do
I tend to make the client do the easiest, idiomatic thing, within the limitations of the library. If the client lib doesn't take care of automatically restoring serialized exceptions, I would probably wrap it with a lib that did, so I could do the following.
Client:
try:
    # Restore serialized object, rethrow if exception
    return CallGetSomeData(parameter)
except Timeout, e:
    MessageBox.Show("timed out")
except Exception, e:
    MessageBox.Show("Unknown error")
    exit(1)

WebService:
try:
    return GetSomeData(parameter)  # Serialized
except Exception, e:
    return e  # Serialized
Your first problem is "a rule of not returning null". I would strongly suggest reconsidering that.
Returning a SoapException is a possibility, but like hacktick already mentioned, it would be better to return a complex object with a status flag {Success,Fail} with every response from the web service.
I think it all boils down to the question of whether or not your client can use any info as to why no data was returned.
For example, if no data was returned because the (say, SQL) server called in GetSomeData was down, and the client can actually do something with that information (e.g. display an appropriate message), you don't want to hide that information; throwing an error is more informative.
Another example: if the parameter is null and that causes an exception (although you probably should have taken care of that earlier in the code, but you get the idea), you should throw an appropriate (informative) exception.
If the client doesn't care at all why it didn't get any data back, you may return null; it will ignore the error text anyhow and its code will look the same.
If your client and service are running on different machines or different processes, it will be impossible to throw an error from the service and catch it on the client. If you insist on using exceptions, the best you can hope for is some proxy on the client to detect the error condition (either null or some other convention) and re-throw a new exception.
The general practice in exception handling is to throw an exception when the normal flow cannot be completed, for example because of unavailable resources or missing expected input.
In your case, you still need to decide how you want your client-side code to react to a null versus an exception.
How about passing in a delegate to be invoked when anything bad happens? The delegate could throw an exception if that's what the outside code would like, or let the function return null (if the outside code will check for that), or possibly take some other action. Depending upon the information passed to the delegate, it may be able to deal with problem conditions in such a way as to allow processing to continue (e.g. the delegate might set a 'retry' flag the first few times it's called, in case flaky network connections are expected). It may also be possible for a delegate to log information that wouldn't exist by the time an exception could get caught.
PS: It's probably best to pass a custom class to the problem-detected delegate. Doing that will allow future versions of the method to provide additional information to the delegate without breaking any implementations that expect the simpler information.
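A rough Java sketch of that idea (the interface and class names are invented; the suggestion itself is language-agnostic): the caller supplies a handler that decides whether to retry, give up, or throw.

// Hypothetical error-handler callback: the caller decides what a failure means.
public class DataClient {

    /** Information handed to the handler when something goes wrong. */
    public static class Problem {
        public final String description;
        public final int attempt;
        public Problem(String description, int attempt) {
            this.description = description;
            this.attempt = attempt;
        }
    }

    /** Returns true to retry, false to give up (method then returns null); may also throw. */
    public interface ProblemHandler {
        boolean onProblem(Problem problem);
    }

    public String getSomeData(String parameter, ProblemHandler handler) {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return fetchFromService(parameter); // hypothetical remote call
            } catch (RuntimeException e) {
                boolean retry = handler.onProblem(new Problem(e.getMessage(), attempt));
                if (!retry) {
                    return null; // handler chose not to retry and not to throw
                }
            }
        }
    }

    private String fetchFromService(String parameter) {
        throw new RuntimeException("service unavailable"); // placeholder
    }
}

A caller that wants exceptions passes a handler that rethrows; a caller expecting flaky connections can retry a few times before giving up.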
Exceptions are recommended within the same process space. Across processes, success or failure can only be communicated through information in the response.
Since you are the client of your own web service, you can log the exception at the service layer and return null to the client, yet the client should still know whether CallGetSomeData returned null because a) the data is not available, or b) there was a database exception because the table is locked. Hence it's always good to know what caused the error, for easier reporting on the client side. You should have an error code and description as part of your message.
If you are not the one consuming your web service, then you should definitely throw an exception, for the same reasons mentioned above: the client should know what happened, and it's up to them to decide what to do with it.

Designing a class with **Exceptions**

When I design a class I often have trouble deciding whether I should throw an exception or provide two functions, with the second returning an error value. In the case of two functions, how should I name the exception and non-exception methods?
For example, if I wrote a class that decompresses a stream and the stream had errors or was incomplete, I would throw an exception. However, what if the app is trying to recover data from the stream and expects an error? It would want a return value instead. So how should I name the second function?
Or should I not have both an exception method and a non-exception method?
Or should I not have both an exception method and a non-exception method?
That. Unless you really have time to burn maintaining two separate but mostly-identical methods.
If you really need to allow for clients that won't consider errors exceptional, then just indicate them with a return value and be done with it... Otherwise, just write the exception-throwing version and let the odd error-eating client handle the exception.
It depends on the language, but...
In my opinion, two versions of each potentially failing method imposes too high a cognitive burden on the API user, and too high a burden on the API maintainer. My personal preference is for exceptions, since that's fewer parameters to remember the order of.
I believe that you should try to use exceptions even if you're not going to exit the program in some cases. You just need to create a specific exception type for your errors, and catch only those when you need to apply some specific logic. All other exceptions will go to the upper level of your code.
For example, say you've created a function which throws an exception on any error, and you don't want to exit the program if the user specified an incorrect file name. Here is how it can look:
## this is the top level try/catch block
try {
    ## your main code is here
    ...

    ## somewhere deep in your code
    try {
        ## we are trying to open the file specified by the user
    }
    catch (FileNotFoundException) {
        ## we are not going to exit on this error;
        ## let's just show the user an error message
        ## and ask for a different file to open
    }
} catch (Exception) {
    ## catch all exceptions here
    ## the best thing we can do here is save the exception to a log and quit
}
We just need to create a hierarchy of exceptions (if the language allows it):
Exception <-- MoreSpecificException
This approach is used in Java.
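A small Java sketch of that hierarchy (the exception names and the file-handling code are invented for illustration):

// Hypothetical application-specific exception hierarchy: catch the specific
// subtype where you can recover, let everything else bubble up to the top level.
public class ExceptionHierarchyDemo {

    /** Base type for all errors raised by this application. */
    static class AppException extends RuntimeException {
        AppException(String message) { super(message); }
    }

    /** More specific subtype we know how to recover from. */
    static class InputFileNotFoundException extends AppException {
        InputFileNotFoundException(String message) { super(message); }
    }

    public static void main(String[] args) {
        try {
            openUserFile("missing.txt");
        } catch (InputFileNotFoundException e) {
            // Recoverable: tell the user and ask for another file instead of quitting.
            System.err.println("Could not find file: " + e.getMessage());
        } catch (AppException e) {
            // Anything else from the app: log and quit.
            System.err.println("Fatal error: " + e.getMessage());
            System.exit(1);
        }
    }

    private static void openUserFile(String name) {
        throw new InputFileNotFoundException(name); // placeholder for real file handling
    }
}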
Exceptions should normally be used for exceptional conditions, whatever they may be. Being unable to decompress a file is probably an exceptional condition (unless, for example, you're writing a program to scan for badly compressed files). Bad data may or may not be exceptional.
If you've got a class decompressing a stream, then what it should do is decompress the stream, and not try to interpret its contents. Another class should use the first class to get decompressed input, and do the interpretation. That gives you separation of functionality, and good cohesion. Avoid classes where you're tempted to put an "And" in their names: "DecompressAndParseInput" is a bad class name.
Given two classes, there's no particular reason why you have to use the same error-reporting method for both. The decompressor could throw, and the parser could return an error code.
I would only throw an exception from the decompression function if the results were not usable. If they were usable, return the results, and then the function that reads those results can throw an exception instead, e.g.:
try
{
    results = decompress(file); // only throws exceptions on non-usable files
} catch (FileNotFoundException) {
    // file was not usable, can't recover anything, insert nuclear error handling here
}

try
{
    read(results);
} catch (ErrorThatIsRecoverableException) {
    partialRead(results);
}
So read() is the function you call on normal data, and partialRead() is the function that handles trying to recover screwed-up-but-still-usable data. And of course, you don't necessarily need exceptions or separate functions at all; you could do the error handling entirely within the read() function.