Clean up failed maps - exception

My mapper writes some data to local disk and cleans it up when the mapper finishes. However, the cleanup() method won't be called if an error occurs (an exception is thrown).
I can catch exceptions inside my mapper, but I can't handle failures that don't originate in my mapper (e.g. the JobTracker failing over to a standby node).
Is there any way to clean up when the mapper fails?

You can override the run method of the Mapper to wrap the iteration over the input key/values from the context in a try / finally, ensuring that cleanup is always called:
@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    } finally {
        cleanup(context);
    }
}
You'll need to make sure that your cleanup method doesn't contain any logic that tries to output records, or alternatively set a flag in your mapper to denote that an error occurred.
This may not protect against all types of task failure (a JVM crash, for example), for which I don't think you have any other option than to run a follow-up job after the original one whose role is to ensure the resources used are properly cleaned up.

Using the Job class you can delete folders once the job finishes; even if the directories are on the local filesystem, you can use the FileSystem class.
More on filesystems in Hadoop
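A minimal sketch of that cleanup-after-the-job idea, assuming the scratch directory and job name shown here (both are illustrative, not from the question):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class JobWithCleanup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "job-with-cleanup"); // illustrative job name
        // ... configure mapper, input and output paths ...
        try {
            job.waitForCompletion(true);
        } finally {
            // getLocal() returns the local filesystem; FileSystem.get(conf) would return the default one (e.g. HDFS)
            FileSystem localFs = FileSystem.getLocal(conf);
            Path scratch = new Path("/tmp/mapper-scratch"); // illustrative scratch directory
            if (localFs.exists(scratch)) {
                localFs.delete(scratch, true); // recursive delete
            }
        }
    }
}
Note that this runs on the client, so it only reaches directories visible from wherever the job is submitted.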

Related

Exception handling onException() using Apache Camel

I am trying to handle exceptions in my code. Below is the code:
public void configure() throws Exception {
    onException(Exception.class).process(new Processor() {
        public void process(Exchange exchange) throws Exception {
            System.out.println("handling ex");
        }
    }).log("Received body").handled(true).end();

    from("file:src/main/resources?fileName=data.csv")
        .process(new MyTransformRevised1())
        .to("file:src/main/resources/?fileName=emp.xml")
        .split(body().tokenizeXML("equityFeeds", null)).streaming()
        .to("jms:queue:xml.inbound.topic");
}
Now, regarding the line from("file:src/main/resources?fileName=data.csv"): if the file "data.csv" is not present in my resources folder, shouldn't it throw a FileNotFoundException, go into the onException() handler and print "handling ex"? Also, when this code runs, my log statement is not printed to the console either.
Currently it is not going into the onException() handler, and I am failing to understand why. Please kindly help me solve this issue.
As @bedla already commented, you will not get a FileNotFoundException, because from("file:...") creates a file consumer that listens continuously for new files in the directory you configure.
The fileName option acts as a filter. That means your file consumer only processes files named data.csv.
So if you drop a file with this name into the folder, it will be consumed and moved to a subfolder (I think the default name is .camel). Then you can drop another data.csv into the folder and it will be consumed too. If you remove the fileName option, every file you drop into the folder is consumed.
If you don't want a continuous file import from a folder, but want to import a specific file as part of your processing workflow, have a look at the Camel Poll Enrich EIP. There is a simple example there that imports a file.
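A minimal sketch of that Poll Enrich idea (the timer trigger and the 5-second timeout are assumptions; MyTransformRevised1 is the processor from the question):
from("timer:importOnce?repeatCount=1")                              // illustrative one-shot trigger for the workflow
    .pollEnrich("file:src/main/resources?fileName=data.csv", 5000)  // poll data.csv once, give up after 5s
    .process(new MyTransformRevised1())
    .to("file:src/main/resources/?fileName=emp.xml");
Here the route decides when the file is read, instead of the file consumer driving the route; how a timeout (missing file) is handled is then up to the route.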

JUnit Test is not returning the full StackTrace from UUT Class

I have the following JUnit test:
private UUTClass myObject;

@Before
public void setup() {
    myObject = new UUTClass();
}

@After
public void teardown() {
    // cleanup
}

@Test
public void testSomeMethod() {
    myObject.invokeSomeMethod(); // This is throwing NPE from somewhere inside my UUTClass
}
I don't want to use expected = NullPointerException.class; what I want is to log the stack trace from my UUTClass. All I can see is a one-line NullPointerException in my test method.
I am using Log4j to log everything and I have log4j.xml on the search path, which does get picked up by the initializer. I can see my logger messages being printed for other items.
How can I get JUnit to return/propagate the full stack trace of the NPE thrown by my UUTClass? I believe it's probably because the errors are redirected to the error console and perhaps not picked up by log4j (I don't know if I explained that right).
As a workaround I will probably use printStackTrace() for now, but ideally I would like to log it if possible.
My intention is to first understand where in the design I have missed conditions/constraints that resulted in this NPE, and then build more tests that expect certain exceptions.
Regards,
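One possible approach, not taken from this thread: a JUnit TestWatcher rule can hand the full failure to log4j before JUnit reports it. A minimal sketch (class and rule names are illustrative):
import org.apache.log4j.Logger;
import org.junit.Rule;
import org.junit.rules.TestWatcher;
import org.junit.runner.Description;

public class UUTClassTest {

    private static final Logger LOG = Logger.getLogger(UUTClassTest.class);

    @Rule
    public TestWatcher stackTraceLogger = new TestWatcher() {
        @Override
        protected void failed(Throwable e, Description description) {
            // logs the complete stack trace, including nested causes, through log4j
            LOG.error("Test " + description.getMethodName() + " failed", e);
        }
    };

    // ... existing @Before/@After/@Test methods ...
}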

How to stop a flink streaming job from program

I am trying to create a JUnit test for a Flink streaming job that writes data to a Kafka topic and reads data from the same Kafka topic, using FlinkKafkaProducer09 and FlinkKafkaConsumer09 respectively. I pass test data to the producer:
DataStream<String> stream = env.fromElements("tom", "jerry", "bill");
And checking whether the same data is coming from the consumer:
List<String> expected = Arrays.asList("tom", "jerry", "bill");
List<String> result = resultSink.getResult();
assertEquals(expected, result);
using TestListResultSink.
I am able to see the data coming from the consumer as expected by printing the stream, but I could not get the JUnit test result, because the consumer keeps running even after the messages are finished, so the test never reaches the assertion part.
Is there any way in Flink or FlinkKafkaConsumer09 to stop the process or to run it only for a specific time?
The underlying problem is that streaming programs are usually not finite and run indefinitely.
The best way, at least for the moment, is to insert a special control message into your stream which lets the source properly terminate (simply stop reading more data by leaving the reading loop). That way Flink will tell all down-stream operators that they can stop after they have consumed all data.
Alternatively, you can throw a special exception in your source (e.g. after some time) such that you can distinguish a "proper" termination from a failure case (by checking the error cause). Throwing an exception in the source will fail the program.
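As a rough illustration of the control-message idea (this is not code from the answer; the "END" terminator and the element values are made up), a custom source could leave its read loop like this:
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class TerminatingSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        String[] elements = {"tom", "jerry", "bill", "END"};
        for (String element : elements) {
            if (!running || "END".equals(element)) {
                break; // leave the read loop; downstream operators finish after draining their input
            }
            ctx.collect(element);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}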
In your test you can start the job execution in a separate thread, wait some time to allow it to process the data, cancel the thread (this will interrupt the job) and then make the assertions.
CompletableFuture<Void> handle = CompletableFuture.runAsync(() -> {
    try {
        environment.execute(jobName);
    } catch (Exception e) {
        e.printStackTrace();
    }
});

try {
    handle.get(seconds, TimeUnit.SECONDS);
} catch (TimeoutException e) {
    handle.cancel(true); // this will interrupt the job execution thread, cancel and close the job
}

// Make assertions here
Can you not use the isEndOfStream override in the Deserializer to stop fetching from Kafka? If I read it correctly, Flink's Kafka09Fetcher has the following code in its run method, which breaks the event loop:
if (deserializer.isEndOfStream(value)) {
    // end of stream signaled
    running = false;
    break;
}
My thought was to use Till Rohrmann's idea of a control message in conjunction with this isEndOfStream method to tell the KafkaConsumer to stop reading.
Any reason that will not work? Or maybe some corner cases I'm overlooking?
https://github.com/apache/flink/blob/07de86559d64f375d4a2df46d320fc0f5791b562/flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka09Fetcher.java#L146
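A sketch of how that could look (the "END" marker and the class name are assumptions; package names follow the Flink 1.x Kafka 0.9 connector and may differ in other versions):
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

public class StringWithEndMarkerSchema implements DeserializationSchema<String> {

    @Override
    public String deserialize(byte[] message) {
        return new String(message, StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        // returning true makes the Kafka fetcher leave its read loop (see the snippet above)
        return "END".equals(nextElement);
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return BasicTypeInfo.STRING_TYPE_INFO;
    }
}
The schema would then be passed to the consumer, e.g. new FlinkKafkaConsumer09<>("topic", new StringWithEndMarkerSchema(), properties), and publishing the "END" marker lets the consumer finish so the test can reach its assertions.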
Following @TillRohrmann:
You can combine the special-exception approach and handle it in your unit test if you use an EmbeddedKafka instance, then read off the EmbeddedKafka topic and assert on the consumed values.
I found https://github.com/asmaier/mini-kafka/blob/master/src/test/java/de/am/KafkaProducerIT.java to be extremely useful in this regard.
The only problem is that you will lose the element that triggers the exception but you can always adjust your test data to account for that.

How to check ActiveMQ queues in unit test using JUnit Rule with EmbeddedActiveMQBroker

I created an integration test (based on Apache Camel and Blueprint) that sends some messages to an ActiveMQ service on my machine.
Via the admin web interface I can check whether my messages arrived. To decouple from a locally running ActiveMQ, I am now using the EmbeddedActiveMQBroker with a JUnit Rule (following the instructions from here):
@Rule
public EmbeddedActiveMQBroker broker = new EmbeddedActiveMQBroker() {
    @Override
    protected void configure() {
        try {
            this.getBrokerService().addConnector("tcp://localhost:61616");
        } catch (Exception e) {
            // noop, test should fail
        }
    }
};
The test works fine as before.
But: is there a way to check the number of (queued) messages for a given queue? The test sends messages to the queue "q".
Your EmbeddedActiveMQBroker instance wraps an ActiveMQ BrokerService object, which is the real embedded ActiveMQ broker. Because you can reach it through the EmbeddedActiveMQBroker instance, you have access to all the stats maintained by the broker via its AdminView (broker.getBrokerService().getAdminView()).
From there you can get all sorts of useful info like the number of subscriptions, number of Queues, etc. All this data is kept in the broker's JMX management context tree, so standard JMX applies. One easy way to get the number of messages in a Queue is to look it up in the broker's management context using code similar to the following:
// For this example the broker name is assumed to be "localhost";
// brokerService is the embedded broker, i.e. broker.getBrokerService() from the @Rule above
protected QueueViewMBean getProxyToQueue(String name) throws MalformedObjectNameException, JMSException {
    ObjectName queueViewMBeanName = new ObjectName(
        "org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Queue,destinationName=" + name);
    QueueViewMBean proxy = (QueueViewMBean) brokerService.getManagementContext()
        .newProxyInstance(queueViewMBeanName, QueueViewMBean.class, true);
    return proxy;
}
From there you can use the QueueViewMBean to see what's in the Queue:
QueueViewMBean queueView = getProxyToQueue("myQueue");
LOG.info("Number of messages in my Queue:{}", queueView.getQueueSize());
It looks as though the current implementation disables JMX by default, which is unfortunate but can be worked around. You have to give the embedded broker instance a configuration URI, which is either a string containing the connector to add or an xbean configuration file.
One option would be to do something along these lines (note the useJmx=true):
@Rule
public EmbeddedActiveMQBroker broker = new EmbeddedActiveMQBroker("broker:(tcp://0.0.0.0:0)/localhost?useJmx=true&persistent=false");
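With JMX enabled like that, a test method in the same class as the rule could then assert the depth of queue "q" through the MBean proxy. A hedged sketch (the test name and the expected count of 3 are illustrative):
import javax.management.ObjectName;

import org.apache.activemq.broker.jmx.QueueViewMBean;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

@Test
public void messagesEndUpOnQueueQ() throws Exception {
    // ... run the Camel route / send the test messages to queue "q" ...

    // broker name "localhost" matches the configuration URI of the rule above
    ObjectName queueName = new ObjectName(
        "org.apache.activemq:type=Broker,brokerName=localhost,destinationType=Queue,destinationName=q");
    QueueViewMBean proxy = (QueueViewMBean) broker.getBrokerService().getManagementContext()
        .newProxyInstance(queueName, QueueViewMBean.class, true);

    assertEquals(3, proxy.getQueueSize()); // illustrative: three messages were sent above
}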

Need to perform RequestBasedLogging

I have a need to implement request-based logging, driven by a header: log-level-header.
In my code I am using JAX-RS and have implemented a ContainerRequestFilter:
@Override
public void filter(final ContainerRequestContext context) throws IOException {
    String log_level = context.getHeaderString("log-level-header");
    // translate to the actual log level
    Logger root = (Logger) LoggerFactory.getLogger(org.slf4j.Logger.ROOT_LOGGER_NAME);
    root.setLevel(logLevelToSet);
}
I am using Logback and the slf4j API.
The problem is that I am setting the log level on the root logger, which is a singleton, so this ends up modifying the log level across the whole application.
Instead I intend to change the log level for a particular thread (request-based logging). Is that achievable, and how?
Yes, this is achievable via TurboFilters and MDC. The code in MDCFilter should be helpful as well.
The key is to understand MDC.
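A rough sketch of how those pieces could fit together (the MDC key "req.logLevel" and the class name are made up, not part of the answer): the JAX-RS request filter puts the requested level into the MDC, and a custom TurboFilter consults it per log event. Because the MDC is thread-local, this affects only the current request's thread.
// in the ContainerRequestFilter (illustrative MDC key):
//   MDC.put("req.logLevel", context.getHeaderString("log-level-header"));
// ...and remove it again in a ContainerResponseFilter.

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.turbo.TurboFilter;
import ch.qos.logback.core.spi.FilterReply;
import org.slf4j.MDC;
import org.slf4j.Marker;

public class RequestLevelTurboFilter extends TurboFilter {

    @Override
    public FilterReply decide(Marker marker, Logger logger, Level level,
                              String format, Object[] params, Throwable t) {
        String requested = MDC.get("req.logLevel");
        if (requested == null) {
            return FilterReply.NEUTRAL; // no per-request override, fall back to the configured levels
        }
        Level threshold = Level.toLevel(requested, Level.INFO);
        return level.isGreaterOrEqual(threshold) ? FilterReply.ACCEPT : FilterReply.DENY;
    }
}
The filter would be registered in logback.xml with a <turboFilter class="..."/> element pointing at this class.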