What does the timeout in nerve signify? - configuration

I'm trying to discover a few services using nerve.
While doing so, I came across the timeout setting described in the nerve docs:
timeout: (optional) maximum time the check can take; defaults to 100ms
However, in the examples provided, the timeout is given as "0.2".
Does this mean the timeout in these examples is "0.2ms"? Is that even a valid timeout value?
Or is 0.2 interpreted as 2 sec?

I went through the nerve code, and it looks like the timeout value from the nerve JSON is simply read and passed directly to the HTTP client as read_timeout, without any additional processing.
As per the Ruby documentation, this value is in seconds.
So 0.2 means 200ms.
I'm assuming the nerve docs were not updated or contain a mistake on this point.
read_timeout[R]
Number of seconds to wait for one block to be read
(via one read(2) call). Any number may be used, including Floats for
fractional seconds. If the HTTP object cannot read data in this many
seconds, it raises a Net::ReadTimeout exception. The default value is
60 seconds.
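To make the unit conversion concrete, here is a minimal Python sketch (nerve itself is Ruby, so this only illustrates the arithmetic). The JSON fragment is an illustrative check entry and timeout_ms is a hypothetical helper; it assumes, as found above, that the configured value is passed through unchanged and interpreted as seconds.

```python
import json

# Illustrative check entry; the field values are assumptions for this example.
CHECK_JSON = '{"type": "http", "uri": "/health", "timeout": 0.2}'

# The documented default of 100ms, expressed in seconds.
DEFAULT_TIMEOUT_SECONDS = 0.1


def timeout_ms(check: dict) -> int:
    """Return the check timeout in milliseconds, treating the raw value as seconds."""
    seconds = check.get("timeout", DEFAULT_TIMEOUT_SECONDS)
    return round(seconds * 1000)


check = json.loads(CHECK_JSON)
print(timeout_ms(check))  # 0.2 seconds corresponds to 200 milliseconds
```

Under that assumption, a value of 2 would mean two full seconds, and 0.2ms would have to be written as 0.0002.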


Server side microsecond timing

Is there an API available for microsecond-accurate timing? Some jitter is acceptable. Something equivalent to performance.now() would be preferable.
In my 5 minutes of research I found the console object, which does log times accurately enough, but there is no easy way to retrieve those logged entries. Additionally, I may call this timing function thousands of times, which would clutter the logs.

How to tell whether NServiceBus is using MaximumConcurrencyLevel?

I'm trying to validate that our company's code works when NServiceBus v4.3 uses the MaximumConcurrencyLevel value set in the config.
The problem is that when I process 12k+ queued entries, I cannot see any difference in timings between the five max concurrency levels I tried. I set it to 1 and I can process the queue in 8m; then I set it to 2 and I get 9m, which seems interesting (I was expecting a bigger change, and in the other direction); but then I set 3, 4, and 5 and the timings stay at around 8m. I was expecting much better throughput.
My question is: how can I verify that NServiceBus is actually using five threads to process entries on the queue?
PS I've tried setting MaximumConcurrencyLevel="1" together with MaximumMessageThroughputPerSecond, and logging Thread.CurrentThread.ManagedThreadId, thinking/hoping I was ONLY going to see one thread ID value, but I'm seeing quite a few different ones, which surprised me. My plan was to see one, then bump the max concurrency level to 5 and hopefully see five different values.
What am I missing? Thank you in advance.
There can be multiple reasons why you don't see faster processing times when increasing the concurrency setting described on the official documentation page: http://docs.particular.net/nservicebus/operations/tuning
You mentioned you're using MaximumMessageThroughputPerSecond, which will negate any performance gains from parallel message processing if a low value has been configured. Try removing this setting if possible.
Maybe you're accessing a resource in your handlers that doesn't support, or isn't optimized for, parallel access.
NServiceBus internally schedules the processing logic on the thread pool. This means that even with a MaximumConcurrencyLevel of 1, you will most likely see a different thread processing each message, since there is no thread affinity. The configuration values still work as expected, though. If your queue contains 5 messages:
- it will process these messages one by one if you configured MaximumConcurrencyLevel to 1
- it will process all messages in parallel if you configured MaximumConcurrencyLevel to 5.
Depending on your handlers it can of course happen that the first message is already processed at the time the fifth message is read from the queue.
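The difference between "how many threads show up in the log" and "how many messages are handled at once" can be sketched in Python. This is not NServiceBus code: the semaphore merely stands in for the concurrency setting, and run_with_limit, handle, and the pool size are hypothetical names and values chosen for illustration. The point is that counting in-flight handlers verifies the limit, whereas counting distinct thread IDs does not.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(limit, messages=20, pool_size=8):
    """Process messages on a shared pool; return (peak concurrency, distinct threads)."""
    gate = threading.Semaphore(limit)  # stands in for the concurrency setting
    lock = threading.Lock()
    in_flight = 0
    peak = 0
    thread_ids = set()

    def handle(_message):
        nonlocal in_flight, peak
        with gate:  # at most `limit` handlers get past this point at once
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
                thread_ids.add(threading.get_ident())
            time.sleep(0.01)  # simulated handler work
            with lock:
                in_flight -= 1

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(handle, range(messages)))
    return peak, len(thread_ids)


peak, distinct_threads = run_with_limit(limit=1)
print(peak, distinct_threads)  # peak stays at 1; several thread IDs may still appear
```

The equivalent check in a handler would be an interlocked in-flight counter logged at entry, rather than the managed thread ID.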

EsRejectedExecutionException in elasticsearch for parallel search

I am sending multiple parallel search requests to Elasticsearch using a single transport client instance in my application.
I got the exception below during the parallel execution. How can I overcome this issue?
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@5f804c60
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:509)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteScan(SearchServiceTransportAction.java:441)
at org.elasticsearch.action.search.type.TransportSearchScanAction$AsyncAction.sendExecuteFirstPhase(TransportSearchScanAction.java:68)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:171)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:153)
at org.elasticsearch.action.search.type.TransportSearchScanAction.doExecute(TransportSearchScanAction.java:52)
at org.elasticsearch.action.search.type.TransportSearchScanAction.doExecute(TransportSearchScanAction.java:42)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:107)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at org.elasticsearch.action.search.TransportSearchAction$TransportHandler.messageReceived(TransportSearchAction.java:124)
at org.elasticsearch.action.search.TransportSearchAction$TransportHandler.messageReceived(TransportSearchAction.java:113)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:212)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Elasticsearch has a thread pool and a queue for search per node.
A thread pool has N workers ready to handle requests. When a request comes in and a worker is free, that worker handles it. By default, the number of workers is derived from the number of cores on that node (the exact formula depends on the Elasticsearch version).
When all workers are busy and more search requests arrive, those requests go to the queue. The size of the queue is also limited. If the queue size is, say, 100 and more parallel requests arrive than it can hold, the excess requests are rejected, as you can see in the error log.
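A toy model of that behaviour, as a Python sketch. This is not Elasticsearch code: SearchPoolModel, RejectedExecution, and the sizes used are hypothetical, and nothing ever finishes in this model, so it only shows what happens to a burst of requests arriving at the same moment.

```python
import queue


class RejectedExecution(Exception):
    """Raised by the model when neither a worker nor a queue slot is free."""


class SearchPoolModel:
    def __init__(self, workers, queue_capacity):
        # queue_capacity must be > 0; queue.Queue treats 0 as "unbounded".
        self.free_workers = workers
        self.pending = queue.Queue(maxsize=queue_capacity)

    def submit(self, request):
        if self.free_workers > 0:
            self.free_workers -= 1  # a free worker picks the request up
            return "running"
        try:
            self.pending.put_nowait(request)  # otherwise it waits in the queue
        except queue.Full:
            raise RejectedExecution(
                f"rejected execution (queue capacity {self.pending.maxsize})"
            ) from None
        return "queued"


def count_outcomes(pool, burst_size):
    """Submit burst_size requests at once and tally what happened to each."""
    outcomes = {"running": 0, "queued": 0, "rejected": 0}
    for request_id in range(burst_size):
        try:
            outcomes[pool.submit(request_id)] += 1
        except RejectedExecution:
            outcomes["rejected"] += 1
    return outcomes


print(count_outcomes(SearchPoolModel(workers=4, queue_capacity=10), burst_size=16))
```

With 4 workers and 10 queue slots, a burst of 16 leaves 2 requests with nowhere to go, which is the situation the stack trace reports for a capacity of 1000.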
Solutions:
- The immediate solution is to increase the size of the search queue. We can also increase the size of the thread pool, but that might badly affect the performance of individual queries, so increasing the queue might be the better idea. Remember, though, that this queue is memory-resident, and increasing the queue size too much can result in out-of-memory issues.
- Increase the number of nodes and replicas. Remember that each node has its own search thread pool and queue. Also, a search can run on a primary shard OR a replica.
Maybe it sounds strange, but you need to lower the number of parallel searches. With that exception, Elasticsearch is telling you that you are overloading it. There are some limits (at the thread count level) set in Elasticsearch and, most of the time, the defaults for these limits are the best option. So, if you are testing your cluster to see how much load it can hold, this is an indicator that some limits have been reached.
Alternatively, if you really want to change the default, you can try increasing the queue size for searches to accommodate the concurrency demands, but keep in mind that the larger the queue size, the more pressure you put on your cluster, which can eventually cause instability.
I saw this same error because I was sending lots of indexing requests to ES in parallel. Since I'm writing a data migration, it was easy enough to make them serial, and that resolved the issue.
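Capping parallelism on the client side can be sketched as follows. This is a generic Python illustration, not the Elasticsearch client API: run_throttled is a hypothetical helper, and search stands for whatever callable issues one request.

```python
from concurrent.futures import ThreadPoolExecutor


def run_throttled(search, queries, max_parallel=4):
    """Run search(query) for every query with at most max_parallel calls in flight.

    Results come back in the same order as the queries.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(search, queries))


def fake_search(query):
    # Placeholder for a real request; it just echoes the query.
    return f"hits for {query}"


print(run_throttled(fake_search, ["a", "b", "c"], max_parallel=2))
```

Setting max_parallel=1 gives the fully serial behaviour described above; raising it gradually while watching for rejections is one way to find what the cluster tolerates.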
I don't know what your node configuration was, but your queue size (1000) is already on the high side. As others have already explained, your search requests are queued in the Elasticsearch thread pool queue. If you are getting rejections even with such a large queue, that is a hint that you need to revisit your query pattern.
As with many other designs, there is no one-size-fits-all solution here. I found this to be a very good post about how this queue works and the different ways to run a performance test to find out what suits your use case best.
HTH!

How to simulate dynamic requests per minute in JMeter where numbers come from file for each minute

After successfully using JMeter to profile our platform's performance I got the request to simulate a 24h load based on minute-by-minute transaction data extracted from the last year's logs.
Given the static nature of thread creation in JMeter, I am wondering whether this is easily achievable. I studied the usual plugins, together with those at jmeter-plugins.org, but I still could not find a straightforward way to do this kind of shaping.
I am considering writing a Groovy script that dynamically feeds a Throughput Shaping Timer, but I am not sure whether this is the proper way to go.
Any suggestions?
UPDATE:
I tried the following combination (as Alon and Dan also suggested):
- One thread group with one looping thread and a 60-second delay timer; every minute this thread reads the number of requests for the next minute from the CSV and passes it to the next thread group (using a Groovy script and global props).
- A second thread group with a fixed number of threads and a Constant Throughput Timer that is updated every minute by the first thread group.
This works partially, but the limitation is that the load per minute is divided among all active threads, so some threads will still be waiting to execute even if the requested load has changed in the meantime.
I think that, for a correct simulation, there should be a way to interrupt and restart all threads that were not executed within the minute.
So, for a concrete example:
I have 100 requests in the first minute and 5000 in the second (it is real data with big variations).
In the first minute, 300 threads are started (this is my maximum number of accepted concurrent connections), but because they execute very fast, they are delayed for more than a minute in order to meet the calculated throughput,
so the 5000 requests for the next minute don't have a chance to be executed because lots of threads are still sleeping.
So I am looking for a way to interrupt sleeping threads when more throughput is needed, probably from Groovy or by modifying some JMeter code.
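The arithmetic behind that example, as a small Python sketch. It assumes a simplified model in which the per-minute target is shared evenly across all active threads and each thread sleeps a full interval between samples; per_thread_interval_seconds is a hypothetical helper, not a JMeter function.

```python
def per_thread_interval_seconds(threads: int, target_per_minute: float) -> float:
    """Seconds between two samples of one thread when all threads share the target."""
    return threads * 60.0 / target_per_minute


# Minute 1: 300 threads sharing 100 requests per minute.
slow = per_thread_interval_seconds(300, 100)
# Minute 2: the same 300 threads sharing 5000 requests per minute.
fast = per_thread_interval_seconds(300, 5000)

print(slow)  # 180.0 seconds, i.e. three minutes between samples per thread
print(fast)  # 3.6 seconds per thread once the new target takes effect
```

Under this model, a thread that went to sleep during the first minute can stay asleep well into the second and third minutes, so the jump to 5000 requests per minute is not reached on time.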
Thanks,
Dikran
You should use JMeter's constant throughput timer for this. In combination with a CSV file that includes all of the values, it should work perfectly.
See these links:
http://jmeter.apache.org/usermanual/component_reference.html#Constant_Throughput_Timer
http://jmeter.apache.org/usermanual/component_reference.html#CSV_Data_Set_Config
Best,
Alon.
Use JSR223 + Groovy for Scripting
You have a lot of options for scripting with JMeter:
- Beanshell
- BSF and all supported languages (JavaScript, Scala, Groovy, Java, ...)
- JSR223 and all supported languages (JavaScript, Scala, Groovy, Java, ...)
Although you can be lazy and choose the language you know, FORGET ABOUT IT.
Use the most efficient option, which is JSR223 + Groovy + Caching (supported since JMeter 2.8 for external scripts, and in the upcoming JMeter 2.9 for embedded scripts as well).
Using Groovy is as simple as adding groovy-VERSION-all.jar to the <JMETER_HOME>/lib folder.
But of course, make sure your script is necessary and efficiently written. DON'T OVERSCRIPT.
View more here: http://blazemeter.com/blog/jmeter-performance-and-tuning-tips

Google Drive SDK - 500: Internal Server error: File uploads successfully most of the time

The Google Drive REST API sometimes returns a 500: Internal Server Error when attempting to upload a file. Most of these errors actually correspond to a successful upload. We retry the upload as per Google's recommendations only to see duplicates later on.
What is the recommended way of handling these errors?
Google's documentation seems to indicate that this is an internal error of theirs, and not a specific error that you can fix. They suggest using exponential backoff, which is basically re-attempting the function at increasing intervals.
For example, the function fails. Wait 2 seconds and try again. If that fails, wait 4 seconds. Then 8 seconds, 16, 32 etc. The bigger gaps mean that you're giving more and more time for the service to right itself. Though depending on your need you may want to cap the time eventually so that it waits a maximum of 10 minutes before stopping.
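A minimal sketch of that schedule in plain Python, without any third-party package. backoff_delays and call_with_backoff are hypothetical helpers, and the 2-second base and 10-minute budget are just the numbers from the example above. For brevity it retries on any exception; real code should retry only on errors that are actually retryable.

```python
import time


def backoff_delays(base=2.0, factor=2.0, max_total=600.0):
    """Yield waits of base, base*factor, ... while their sum stays within max_total seconds."""
    delay, total = base, 0.0
    while total + delay <= max_total:
        yield delay
        total += delay
        delay *= factor


def call_with_backoff(func, sleep=time.sleep, **schedule):
    """Call func until it succeeds, sleeping per backoff_delays between failures."""
    delays = backoff_delays(**schedule)
    while True:
        try:
            return func()
        except Exception:  # assumption: every failure is treated as retryable
            delay = next(delays, None)
            if delay is None:
                raise  # waiting budget exhausted: re-raise the last error
            sleep(delay)


print(list(backoff_delays()))  # waits of 2, 4, 8, ... seconds within a 10-minute budget
```

The sleep parameter is injectable so the schedule can be exercised in tests without actually waiting.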
The retrying package has a very good setup for this. You can import it with from retrying import retry and then use retry as a decorator on any function that should be re-attempted. Here's an example of mine:
from retrying import retry

# Wait 2^n * 1000 ms between attempts, at most 60 s per wait; give up after 10 minutes in total.
@retry(wait_exponential_multiplier=1000, wait_exponential_max=60*1000, stop_max_delay=10*60*1000)
def find_file(name, parent=''):
    ...
To use the decorator, you just need to put @retry before the function declaration. You could use a bare @retry, but there are optional parameters you can pass to adjust how the timing works. I use wait_exponential_multiplier to adjust how the waiting time increases between tries. wait_exponential_max is the maximum time it can spend waiting between attempts. And stop_max_delay is the total time it will spend retrying before it raises the exception. All of these values are in milliseconds.
Standard error handling is described here: https://developers.google.com/drive/handle-errors
However, 500 errors should never happen, so please add log information, and Google can look to debug this issue for you. Thanks.