Kafka: How to fix TimeoutException error?

I am using Kafka with NiFi, and when I ingest a large file (over 100 MB), I get a TimeoutException. Kafka does not crash.
From what I've read, I need to increase the 'request.timeout.ms' property, which defaults to 30000 ms (30 seconds).
What should this property be set to for bigger files that take longer to ingest? How can I calculate it? I am using Confluent Kafka 5.3.1 in a production environment.
Thank you

With the PublishKafka and ConsumeKafka processors, you can add dynamic properties to pass Kafka configuration to the underlying producer or consumer.
So you can add a property like this:
key : request.timeout.ms
value : the value you want
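A dynamic property ends up as an ordinary key/value pair in the Kafka client configuration. The plain-Java sketch below builds the same pairs with java.util.Properties (no Kafka dependency) and checks one constraint worth knowing when raising request.timeout.ms: for Kafka clients 2.1 and later, the producer documentation says delivery.timeout.ms (default 120000 ms) should be greater than or equal to linger.ms + request.timeout.ms. The class and method names and the 180000/240000 values are illustrative assumptions, not recommendations.

```java
import java.util.Properties;

class ProducerTimeouts {
    // Builds the same key/value pairs a NiFi dynamic property would pass to the producer.
    static Properties timeoutProps(long requestTimeoutMs, long lingerMs, long deliveryTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("request.timeout.ms", Long.toString(requestTimeoutMs));
        props.setProperty("linger.ms", Long.toString(lingerMs));
        props.setProperty("delivery.timeout.ms", Long.toString(deliveryTimeoutMs));
        return props;
    }

    // Kafka 2.1+ producer docs: delivery.timeout.ms should be >= linger.ms + request.timeout.ms.
    static boolean deliveryTimeoutIsConsistent(Properties props) {
        long request = Long.parseLong(props.getProperty("request.timeout.ms"));
        long linger = Long.parseLong(props.getProperty("linger.ms"));
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        // Raising request.timeout.ms to 180 s alone would exceed the 120 s delivery.timeout.ms default.
        Properties tooShort = timeoutProps(180_000, 0, 120_000);
        Properties ok = timeoutProps(180_000, 0, 240_000);
        System.out.println(deliveryTimeoutIsConsistent(tooShort)); // false
        System.out.println(deliveryTimeoutIsConsistent(ok));       // true
    }
}
```

In NiFi this would translate to adding delivery.timeout.ms as a second dynamic property next to request.timeout.ms, assuming the processor's bundled Kafka client is 2.1 or newer.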
You can also configure back pressure on the NiFi connection to protect your environment.
You should also check the NiFi application log (nifi-app.log); maybe your Kafka broker is down.
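On "How can I calculate it?": there is no official formula. One rough heuristic, assuming the file is published as a single message and assuming an effective producer-to-broker throughput (the 5 MB/s below is an invented figure; measure your own), is to take the transfer time of one request and multiply it by a safety factor for broker load and replication delays:

```java
class TimeoutEstimate {
    // Heuristic only: transfer time for one request at an ASSUMED sustained throughput,
    // multiplied by a safety factor. Not a formula from the Kafka documentation.
    static long estimateTimeoutMs(long requestBytes, long bytesPerSecond, double safetyFactor) {
        double transferSeconds = (double) requestBytes / bytesPerSecond;
        return (long) Math.ceil(transferSeconds * 1000 * safetyFactor);
    }

    public static void main(String[] args) {
        long fileBytes = 100L * 1024 * 1024; // ~100 MB, as in the question
        long throughput = 5L * 1024 * 1024;  // ASSUMED 5 MB/s effective throughput
        System.out.println(estimateTimeoutMs(fileBytes, throughput, 2.0)); // 40000
    }
}
```

If NiFi splits the file into many smaller messages, each request is much smaller and the estimate drops accordingly.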

Related

How can I externalize IScheduledExecutorService to run tasks in an external Hazelcast cluster (Hazelcast 5.2) without using UserCodeDeployment?

I am working on externalizing our IScheduledExecutorService so I can run tasks on an external cluster. I am able to write a test and get the Runnable to actually run, but ONLY if I turn on user code deployment. If I change this task at all and run the tests again, I get the following in my external cluster member's logs:
java.lang.IllegalStateException: Class com.mycompany.task.ScheduledTask is already in local cache and has conflicting byte code representation
I want to be able to change the task, redeploy it, and have Hazelcast just handle it. I already do this with our external maps: they can handle different versions of our objects using compact serialization.
Am I stuck using user code deployment for these functional objects? If I need to make a change to it I need to change the class name and redeploy to production. I'm hoping to get this task right the first time and not have to ever do that but I have a way of handling it if I do.
The cluster is already running in production and I'll have to add the following to each member
HZ_USERCODEDEPLOYMENT_ENABLED=true
and the appropriate client code(listed below) to enable this.
What I've done...
Added the following to my local Dockerfile
HZ_USERCODEDEPLOYMENT_ENABLED=true
and also, in the code that creates a Hazelcast client connecting to my external cluster:
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.client.config.ClientUserCodeDeploymentConfig;

ClientConfig clientConfig = new ClientConfig();
ClientUserCodeDeploymentConfig clientUserCodeDeploymentConfig = new ClientUserCodeDeploymentConfig();
// Ship the task class from the client to the members
clientUserCodeDeploymentConfig.addClass("com.mycompany.task.ScheduledTask");
clientUserCodeDeploymentConfig.setEnabled(true);
clientConfig.setUserCodeDeploymentConfig(clientUserCodeDeploymentConfig);
However, if I remove those two pieces, the test fails with the following exception. The member doesn't know about my class at all.
com.hazelcast.nio.serialization.HazelcastSerializationException: java.lang.ClassNotFoundException: com.mycompany.task.ScheduledTask
Side Note:
We are already using compact serialization for several maps, and when I try to configure this Runnable task via compact serialization I get the error below. I don't think that's the right approach either.
[Scheduler: myScheduledExecutorService][Partition: 121][Task: 7afe68d5-3185-475f-b375-5a82a7088de3] Exception occurred during run
java.lang.ClassCastException: class com.hazelcast.internal.serialization.impl.compact.DeserializedGenericRecord cannot be cast to class java.lang.Runnable (com.hazelcast.internal.serialization.impl.compact.DeserializedGenericRecord is in unnamed module of loader 'app'; java.lang.Runnable is in module java.base of loader 'bootstrap')
at com.hazelcast.scheduledexecutor.impl.ScheduledRunnableAdapter.call(ScheduledRunnableAdapter.java:49) ~[hazelcast-5.2.0.jar:5.2.0]
at com.hazelcast.scheduledexecutor.impl.TaskRunner.call(TaskRunner.java:78) ~[hazelcast-5.2.0.jar:5.2.0]
at com.hazelcast.internal.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:64) ~[hazelcast-5.2.0.jar:5.2.0]
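The ClassNotFoundException above follows from how the task travels: without user code deployment, what reaches the member is the task's serialized state, not its bytecode, so the member can only rebuild the task if the class is already on its own classpath. The stdlib-only sketch below shows the same round trip with plain Java serialization. ScheduledTaskSketch and TaskRoundTrip are hypothetical names, and Hazelcast's own serialization layer differs in detail.

```java
import java.io.*;

// Hypothetical stand-in for com.mycompany.task.ScheduledTask from the question.
class ScheduledTaskSketch implements Runnable, Serializable {
    private static final long serialVersionUID = 1L;
    private final String payload; // only this state is written to the stream, not the class bytes

    ScheduledTaskSketch(String payload) {
        this.payload = payload;
    }

    String payload() {
        return payload;
    }

    @Override
    public void run() {
        System.out.println("running with " + payload);
    }
}

class TaskRoundTrip {
    static byte[] serialize(Serializable task) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        return bytes.toByteArray();
    }

    // Throws ClassNotFoundException if the receiving JVM cannot load the task's class.
    static Runnable deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Runnable) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = serialize(new ScheduledTaskSketch("v1"));
        deserialize(wire).run(); // prints "running with v1"
    }
}
```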

MUnit test fails - Cannot process event as “FileConnector” is stopped

I am implementing MUnit tests for a flow that uses the Mule Requester, which picks up a file.
When I run the Java class as a JUnit test, it throws an exception: Cannot perform the operation on the FileConnector as it is stopped.
The expression used in the Mule Requester is:
file://${path}?connector=FileConnector
I have also defined a global file connector.
Please let me know how to resolve this issue.
Thank you.
All connectors and inbound endpoints are disabled by default in MUnit. This prevents a flow from accidentally processing or generating real data. For the same reason, the File Connector is also disabled.
To enable connectors, you need to override a method in your MUnit suite as below:
@Override
protected boolean haveToMockMuleConnectors() {
    // false = start the real connectors instead of mocking them
    return false;
}
For XML-based MUnit tests, connectors are enabled through the MUnit configuration (the munit:config element) instead.
Note: This will enable and start all the connectors used in the Mule configs under test. If you have an SMTP connector, DB connector, MQ connector, etc., they will all be started during the test, so use it with caution.
Check whether the file connector is defined in the files you loaded for MUnit.
<spring:beans>
<spring:import resource="classpath:api.xml"/>
</spring:beans>
You may also try mocking the mule requester.

Play 2 Framework - SSL Certificates from a keystore

I am trying to configure a key store in the Play server.
I was able to do this successfully by defining the command-line parameters -Dhttps.keyStore and -Dhttps.keyStorePassword, as shown below.
... -Dhttps.keyStore="C:/tempKS/myserver.jks" -Dhttps.keyStorePassword="xxxxx" ...
My question is how to define these two properties in application.conf instead of passing them as parameters on the command line.
I tried this in application.conf, but the server didn't pick up those values:
https.keyStore="C:/tempKS/myserver.jks"
https.keyStorePassword="xxxxx"
Take a look at https://github.com/typesafehub/activator-play-tls-example/blob/master/app/https/CustomSSLEngineProvider.scala and make it read the keystore settings from the application's Configuration object.
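The core job of such a provider is to build an SSLContext from a keystore path and password. Below is a stdlib-only sketch of that step, assuming a JKS keystore like the one in the question; in Play, the two arguments would be read from Configuration under keys you define yourself (they are not Play built-ins), and the class name KeyStoreSsl is hypothetical.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

class KeyStoreSsl {
    // Loads a JKS keystore and wraps its keys in a TLS SSLContext (default trust managers).
    static SSLContext fromKeyStore(String path, char[] password) throws Exception {
        KeyStore keyStore = KeyStore.getInstance("JKS");
        try (InputStream in = new FileInputStream(path)) {
            keyStore.load(in, password);
        }
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, password);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(kmf.getKeyManagers(), null, null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        // e.g. args: C:/tempKS/myserver.jks <password>
        SSLContext context = fromKeyStore(args[0], args[1].toCharArray());
        System.out.println(context.getProtocol());
    }
}
```

An SSLEngineProvider would then return context.createSSLEngine() from this context.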

spring batch: Dump a set of queries over a database in parallel to flat files

So my scenario, boiled down to its essence, is as follows:
I have a config file containing a set of SQL queries whose result sets need to be exported as CSV files.
Since some queries may return billions of rows, and because something may interrupt the process (bug, crash, ...), I want to use a framework such as Spring Batch, which gives me restartability and job monitoring.
I am using a file-based H2 database for persisting Spring Batch jobs.
So, here are my questions:
When creating a Job, I need to provide my RowMapper with some initial configuration. So what happens when a job needs to be restarted after, e.g., a crash? Concretely:
Is the state of the RowMapper automatically persisted, so that upon restart Spring Batch tries to restore the object from its database, or
will the RowMapper object from the original Spring Batch XML config file be used, or
do I have to maintain the RowMapper's state using the step's/job's ExecutionContext?
The above question is related to whether there is some magic going on when using the Spring Batch XML configuration, or whether I could just as well create all these beans programmatically:
Since I need to parse my own config format into a Spring Batch job config, I would rather just use Spring Batch's Java classes (beans) and fill them out appropriately than attempt to write out valid XML by hand. However, if my Job crashes, I would create all the beans myself again. Does Spring Batch automagically restore the Job state from its database?
If I really need XML, is there a way to serialize a Spring Batch JobRepository (or one of these objects) as a Spring Batch XML config?
Right now, I configure my Step with the following code, but I am unsure whether this is the proper way to do it:
Is TaskletStep the way to go?
Is the way I create the chunked reader/writer correct, or is there some other object I should use instead?
I would have assumed that the reader and writer would be opened automatically as part of the JobExecution, but if I don't open these resources before running the Job, I get an exception telling me to open them first. Maybe I need to create some other object that manages the resources (JDBC connection and file handle)?
JdbcCursorItemReader<Foobar> itemReader = new JdbcCursorItemReader<Foobar>();
itemReader.setSql(sqlStr);
itemReader.setDataSource(dataSource);
itemReader.setRowMapper(rowMapper);
itemReader.afterPropertiesSet();
ExecutionContext executionContext = new ExecutionContext();
itemReader.open(executionContext);
FlatFileItemWriter<String> itemWriter = new FlatFileItemWriter<String>();
itemWriter.setLineAggregator(new PassThroughLineAggregator<String>());
itemWriter.setResource(outResource);
itemWriter.afterPropertiesSet();
itemWriter.open(executionContext);
int commitInterval = 50000;
CompletionPolicy completionPolicy = new SimpleCompletionPolicy(commitInterval);
RepeatTemplate repeatTemplate = new RepeatTemplate();
repeatTemplate.setCompletionPolicy(completionPolicy);
RepeatOperations repeatOperations = repeatTemplate;
ChunkProvider<Foobar> chunkProvider = new SimpleChunkProvider<Foobar>(itemReader, repeatOperations);
ItemProcessor<Foobar, String> itemProcessor = new ItemProcessor<Foobar, String>() {
    /* Custom implementation */
};
ChunkProcessor<Foobar> chunkProcessor = new SimpleChunkProcessor<Foobar, String>(itemProcessor, itemWriter);
// The tasklet's type parameter must match the reader's item type (Foobar)
Tasklet tasklet = new ChunkOrientedTasklet<Foobar>(chunkProvider, chunkProcessor);
TaskletStep taskletStep = new TaskletStep();
taskletStep.setName(taskletName);
taskletStep.setJobRepository(jobRepository);
taskletStep.setTransactionManager(transactionManager);
taskletStep.setTasklet(tasklet);
taskletStep.afterPropertiesSet();
job.addStep(taskletStep);
Most of your questions are fairly complex, and it is difficult to give a good answer without writing a long paper.
I'm new to Spring Batch, like you, and I found a lot of really useful info (and answers to all your questions) by reading Spring Batch in Action: it's complete, well explained, full of examples, and covers all aspects of the framework (reader/writer/processor, job/tasklet/chunk lifecycle and persistence, transaction/resource management, job flow, integration with other services, partitioning, restarting/retry, failure management, and much more).
Hope this helps.
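On the ExecutionContext question: Spring Batch persists the step and job ExecutionContext (plus the job parameters) in the JobRepository, not arbitrary beans such as a RowMapper. On restart the beans are built again from your configuration, XML or programmatic, and restartable readers restore their position from the ExecutionContext when opened. The stdlib-only sketch below imitates that open/update contract; RestartableListReader and the "reader.position" key are hypothetical, and a plain Map stands in for ExecutionContext.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader following the ItemStream idea: open() restores, update() checkpoints.
// Not Spring Batch's API; the Map plays the role of the persisted ExecutionContext.
class RestartableListReader {
    private static final String POSITION_KEY = "reader.position"; // hypothetical key
    private final List<String> rows;
    private int position;

    RestartableListReader(List<String> rows) {
        this.rows = rows;
    }

    // On a fresh start the context is empty; on restart it holds the last checkpoint.
    void open(Map<String, Object> context) {
        Object saved = context.get(POSITION_KEY);
        position = (saved == null) ? 0 : (Integer) saved;
    }

    // Returns null when there are no more rows, like ItemReader.read().
    String read() {
        return position < rows.size() ? rows.get(position++) : null;
    }

    // Called at each chunk commit so a restart can skip rows already written.
    void update(Map<String, Object> context) {
        context.put(POSITION_KEY, position);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3");
        Map<String, Object> persisted = new HashMap<>(); // would live in the JobRepository

        RestartableListReader firstRun = new RestartableListReader(rows);
        firstRun.open(persisted);
        firstRun.read();
        firstRun.read();
        firstRun.update(persisted); // commit after two rows, then "crash"

        RestartableListReader restart = new RestartableListReader(rows); // rebuilt from config
        restart.open(persisted);
        System.out.println(restart.read()); // row3
    }
}
```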

setting hadoop job configuration programmatically

I am getting an OOM exception (Java heap space) in the reduce child. I read in the documentation that increasing the value of mapred.reduce.child.java.opts to -Xmx512M or more would help. Since I am not the admin, I cannot change that value in mapred-site.xml. I would like to set that value only for my job through the Java program. I tried setting it using the Configuration class as follows, but that didn't work.
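import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;

Configuration config = new Configuration();
// Per-job override of the reduce child JVM options
config.set("mapred.reduce.child.java.opts", "-Xmx512M");
JobConf conf1 = new JobConf(config, this.getClass());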
The version of Hadoop is 1.0.3
What is the proper way of setting the configuration values programmatically?
As @ThomasJungblut and @octo have pointed out, the procedure I described in the question is the right way to do it. The OOM exception still persists, so I will start a new thread instead of continuing here.
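Since the OOM persists, one way to confirm the setting actually reached the reduce JVM is to compare the requested -Xmx with Runtime.getRuntime().maxMemory() logged from inside the reducer. The stdlib-only helper below is a hypothetical sketch, not part of Hadoop; it parses the last -Xmx flag in a java.opts string, which is the one the JVM honors when the flag is repeated.

```java
class HeapCheck {
    // Returns the last -Xmx value in javaOpts in bytes (k/m/g suffixes), or -1 if absent.
    static long parseXmxBytes(String javaOpts) {
        long result = -1;
        for (String token : javaOpts.trim().split("\\s+")) {
            if (!token.startsWith("-Xmx") || token.length() <= 4) {
                continue;
            }
            String value = token.substring(4);
            char unit = Character.toLowerCase(value.charAt(value.length() - 1));
            long multiplier;
            switch (unit) {
                case 'k': multiplier = 1024L; break;
                case 'm': multiplier = 1024L * 1024; break;
                case 'g': multiplier = 1024L * 1024 * 1024; break;
                default: multiplier = 1L; // no suffix means plain bytes
            }
            String digits = (multiplier == 1L) ? value : value.substring(0, value.length() - 1);
            result = Long.parseLong(digits) * multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        long requested = parseXmxBytes("-Xmx512M");
        // Inside a reducer, maxMemory() reports the heap the child JVM actually got
        // (often slightly below -Xmx on HotSpot).
        long actual = Runtime.getRuntime().maxMemory();
        System.out.println("requested=" + requested + " actual=" + actual);
    }
}
```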