Kafka Consumer - How to set fetch.max.bytes higher than the default 50mb? - configuration

I want my consumers to process large batches, so I aim to have the consumer listener "awake", say, on 1800mb of data or every 5min, whichever comes first.
Mine is a kafka-springboot application, the topic has 28 partitions, and this is the configuration I explicitly change:
Parameter
Value I set
Default Value
Why I set it this way
fetch.max.bytes
1801mb
50mb
fetch.min.bytes+1mb
fetch.min.bytes
1800mb
1b
desired batch size
fetch.max.wait.ms
5min
500ms
desired cadence
max.partition.fetch.bytes
1801mb
1mb
unbalanced partitions
request.timeout.ms
5min+1sec
30sec
fetch.max.wait.ms + 1sec
max.poll.records
10000
500
1500 found too low
max.poll.interval.ms
5min+1sec
5min
fetch.max.wait.ms + 1sec
Nevertheless, I produce ~2gb of data to the topic, and I see the consumer-listener (a Batch Listener) is called many times per second -- way more than desired rate.
I logged the serialized-size of the ConsumerRecords<?,?> argument, and found that it is never more than 55mb.
This hints that I was not able to set fetch.max.bytes above the default 50mb.
Any idea how I can troubleshoot this?
Edit:
I found this question: Kafka MSK - a configuration of high fetch.max.wait.ms and fetch.min.bytes is behaving unexpectedly
Is it really impossible as stated?

Finally found the cause.
There is a broker fetch.max.bytes setting, and it defaults to 55mb. I only changed the consumer preferences, unaware of the broker-side limit.
see also
The kafka KIP and the actual commit.

Related

Why paxos in mysql group replication jump prepare phase?

I see such code segment in proposer_task(xcom_base.c)
if(threephase || ep->p->force_delivery){
push_msg_3p(ep->site, ep->p, ep->prepare_msg, ep->msgno, normal);
}else{
push_msg_2p(ep->site, ep->p);
}
the threepahse is int const threephase = 0 and force_delivery == 0 here
push_msg_eq is normal paxos include prepare, accept and learn phase
but push_msg_2p will skip prepare phase and directly send accept request
I want to know why, Thanks a lot.
If you look at the paper Paxos Made Simple page 10 paragraph 3 says:
A newly chosen leader executes phase 1 for infinitely many instances
of the consensus algorithm [...]
Then paragraph 4:
Since failure of the leader and election of a new one should be rare
events, the effective cost of executing a state machine command—that
is, of achieving consensus on the command/value—is the cost of
executing only phase 2 of the consensus algorithm. It can be shown
that phase 2 of the Paxos consensus algorithm has the minimum possible
cost of any algorithm for reaching agreement in the presence of faults.
Hence, the Paxos algorithm is essentially optimal.
This is saying that a leader only issues a prepare during a leader failover. After that it streams accept messages. It then has "optimal messaging" in that the leader only needs one round trip to know a value is chosen (the accept message and its acknowledgment).
In a three node cluster, a leader self-accepts instantaneously, then only needs one accept acknowledgment from a second node to have a majority. It then knows the value is chosen without having to await the response from the 3rd node (which could be down). That is as efficient as you can get. The value is known to be accepted at a second node with strong consistency.
Given that is how paxos works to get maximum efficiency we should expect that mysql xcom has a mode that skips the prepare message phase in steady state.
You can read more about the Paxos Made Simple techniques on my blog here.
You might be interested to know about the latest developments of Paxos where you don't need a majority response for accept messages in the cluster using FPaxos and tricks like the even nodes optimization.

JMeter: Capturing Throughput in Command Line Interface Mode

In Jmeter v2.13, is there a way to capture Throughput via non-GUI/Command Line mode?
I have the jmeter.properties file configured to output via the Summariser and I'm also outputting another [more detailed] .csv results file.
call ..\..\binaries\apache-jmeter-2.13\bin\jmeter -n -t "API Performance.jmx" -l "performanceDetailedResults.csv"
The performanceDetailedResults.csv file provides:
timeStamp
elapsed time
responseCode
responseMessage
threadName
success
failureMessage
bytes sent
grpThreads
allThreads
Latency
However, no amount of tweaking the .properties file or the test itself seems to provide Throuput results like I get via the GUI's Summary Report's Save Table Data button.
All articles, postings, and blogs seem to indicate it wasn't possible without manual manipulation in a spreadsheet. But I'm hoping someone out there has figured out a way to do this with no, or minimal, manual manipulation as the client doesn't want to have to manually calculate the Throughput value each time.
It is calculated by JMeter Listeners so it isn't something you can enable via properties files. Same applies to other metrics which are calculated like:
Average response time
50, 90, 95, and 99 percentiles
Standard Deviation
Basically throughput is calculated as simple as dividing total number of requests by elapsed time.
Throughput is calculated as requests/unit of time. The time is calculated from the start of the first sample to the end of the last sample. This includes any intervals between samples, as it is supposed to represent the load on the server.
The formula is: Throughput = (number of requests) / (total time)
Hopefully it won't be too hard for you.
References:
Glossary #1
Glossary #2
Did you take a look at JMeter-Plugins?
This tool can generate aggregate report through commandline.

Is there any constant interval in Nservicebus' automatic retries

I need the figure out how to manage my retries in Nservicebus.
If there is any exception in my flow, It should retry 10 times every 10 seconds. But when I search in Nservicebus' website (http://docs.particular.net/nservicebus/errors/automatic-retries), there are 2 different retry mechanisms which are First Level Retry(FLR) and Second Level Retry (SLR).
FLR is for transient errors. When you got exception, It will try instantly according to your MaxRetries parameter. This parameter should be 1 for me.
SLR is for errors that persist after FLR, where a small delay is needed between retries. There is a config parameter called "TimeIncrease" defines a delay time between tries. However, Nservicebus do these retries increasingly delay time. When you set this parameter to 10 second. It will try 10.seconds, 30.seconds, 60.seconds and so on.
What do you suggest to me to provide my first request to try every 10 seconds with or without these mechanisms?
I found my answer;
The reply of Particular Software's community(John Simon), You need to apply a custom retry policy, have a look at http://docs.particular.net/nservicebus/errors/automatic-retries#second-level-retries-custom-retry-policy-simple-policy for an example.

Kafka SparkStreaming configuration specify offsets/message list size

I am fairly new to both Kafka and Spark and trying to write a job (either Streaming or batch). I would like to read from Kafka a predefined number of messages (say x), process the collection through workers and then only start working on the next set of x messages. Basically each message in Kafka is 10 KB and I want to put 2 GB worth of messages in a single S3 file.
So is there any way of specifying the number of messages that the receiver fetches?
I have read that I can specify 'from offset' while creating DStream, but this use case is somewhat different. I need to be able to specify both 'from offset' and 'to offset'.
There's no way to set ending offset as the initial parameter (as you can for starting offset), but
you can use createDirectStream (the fourth overloaded version in the listing) which gives you the ability to get the offsets of the current micro batch using HasOffsetRanges (which gives you back OffsetRange).
That means that you'll have to compare values that you get from OffsetRange with your ending offset in every micro batch in order to see where you are and when to stop consuming from Kafka.
I guess you also need to think about the fact that each partition has its sequential offset. I assume it would be easiest if you could go a bit over 2GB, as much as it takes to finish the current micro-batch (could be couple of kB, depending on density of your messages), in order to avoid splitting the last batch on consumed and unconsumed part, which may require you to fiddle with offsets that Spark keeps in order to track what's consumed and what isn't.
Hope this helps.

Ability to limit maximum reducers for a hadoop hive mapred job?

I've tried prepending my query with:
set mapred.running.reduce.limit = 25;
And
set hive.exec.reducers.max = 35;
The last one jailed a job with 530 reducers down to 35... which makes me think it was going to try and shoe horn 530 reducers worth of work into 35.
Now giving
set mapred.tasktracker.reduce.tasks.maximum = 3;
a try to see if that number is some sort of max per node ( previously was 7 on a cluster with 70 potential reducer's ).
Update:
set mapred.tasktracker.reduce.tasks.maximum = 3;
Had no effect, was worth a try though.
Not exactly a solution to the question, but potentially a good compromise.
set hive.exec.reducers.max = 45;
For a super query of doom that has 400+ reducers, this jails the most expensive hive task down to 35 reducers total. My cluster currently only has 10 nodes, each node supporting 7 reducers...so in reality only 70 reducers can run as one time. By jailing the job down to less then 70, I've noticed a slight improvement in speed without any visible changes to the final product. Testing this in production to figure out what exactly is going on here. In the interim it's a good compromise solution.