I have a requirement for a service: I must be able to get up to N instances of it, and if no instances are available, block/wait until one is released and then return the available instance. This is very similar to the pooled lifestyle.
My understanding of the Pooled Lifestyle is:
When first requested, N objects will be created (where N is the max pool size)
As requests for the object are received, the pooled lifestyle will initially return an object from the pool, until all the objects in the pool are "in use"
When all objects are "in use", additional objects (beyond the max pool size) are created.
As objects are released, they are either destroyed (if there are more than the max pool size) or returned to the pool (if there are fewer than the max pool size).
This is similar to the behavior I want, but with a slight difference: do not create objects beyond the max pool size; instead, wait for one of the "in use" objects to be released and then return it.
Any ideas? Can this be done without blocking other container resolutions on a different thread?
You need to implement an IPoolFactory and an IPool and register the factory in the container. Then your pool can do whatever you need, including that blocking behavior.
I'm working with version 0.27.0 of the Orion Context Broker. I'm using the Cygnus generic enabler, and I have set up an MQTT agent that connects external devices to the context broker.
My main concern right now is how to prevent data loss. I set up the context broker and the Cygnus MongoDB databases as replica sets, but that won't ensure that all data will be persisted into the databases. I have seen that Cygnus uses Apache Flume. Looking at its configuration, the re-injection retries can be configured:
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = -1
Is it a good idea to set the retries value to -1? I have read about events being re-injected into the channel forever.
What can be done to ensure that all the data will be persisted?
Is there any functionality in the FIWARE ecosystem oriented to that purpose?
Regarding Cygnus, the TTL is definitely the way of controlling persistence retries after an error. A retry means the data is re-injected into the internal channel connecting the source (which receives Orion notifications) and the sink (which persists the data in the final storage) for future persistence attempts.
Possible values for this TTL are:
TTL = 0: there are no retries, i.e. if the notified data cannot be persisted in the final storage on the first attempt (because of a network failure, a storage error, whatever), then the data is dropped.
TTL > 0: there are as many retries as the configured TTL. Once the TTL is exhausted, the data is dropped.
TTL = -1: infinite retries, i.e. the data is re-injected into the channel forever until it is persisted or the channel gets full.
As mentioned, a TTL of -1 may consume the channel capacity if the final storage never recovers, preventing newly received data from being put into the channel. Nevertheless, if the final storage never recovers, such a drawback does not really matter, right? :)
Thus, we could say the rules for choosing a TTL are:
If you don't want retries, simply configure 0.
If you want retries but you don't mind losing data after a certain number of retries, then configure a positive value.
If you want retries and you don't want to lose data, then configure -1 and a large channel capacity (see the example below), since the final storage may be down for an unknown time.
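For illustration, a minimal fragment of a Cygnus agent configuration that combines an infinite TTL with an enlarged channel; the channel name (mongo-channel) and the capacity values are placeholders you would adapt to your own agent file:
# keep retrying forever until the data is persisted
cygnusagent.sources.http-source.handler.events_ttl = -1
# enlarge the memory channel so retried events do not starve new notifications
cygnusagent.channels.mongo-channel.type = memory
cygnusagent.channels.mongo-channel.capacity = 100000
cygnusagent.channels.mongo-channel.transactionCapacity = 1000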
In any case, the TTL feature is changing during this sprint. The behaviour will be the same, but instead of being applied to single events, it will be applied to batches of events (a batch may consist of a single event, of course). You'll see this change in the next release of Cygnus (0.13.0), which will be available at the end of February 2016 (at the moment of writing this, next week :)). My recommendation is to wait for that release if you want to use the TTL feature intensively.
I am querying Elasticsearch with multiple parallel requests using a single transport client instance in my application.
I got the exception below during parallel execution. How can I overcome this issue?
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23#5f804c60
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at org.elasticsearch.search.action.SearchServiceTransportAction.execute(SearchServiceTransportAction.java:509)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteScan(SearchServiceTransportAction.java:441)
at org.elasticsearch.action.search.type.TransportSearchScanAction$AsyncAction.sendExecuteFirstPhase(TransportSearchScanAction.java:68)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:171)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:153)
at org.elasticsearch.action.search.type.TransportSearchScanAction.doExecute(TransportSearchScanAction.java:52)
at org.elasticsearch.action.search.type.TransportSearchScanAction.doExecute(TransportSearchScanAction.java:42)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:107)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at org.elasticsearch.action.search.TransportSearchAction$TransportHandler.messageReceived(TransportSearchAction.java:124)
at org.elasticsearch.action.search.TransportSearchAction$TransportHandler.messageReceived(TransportSearchAction.java:113)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:212)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:109)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Elasticsearch has a thread pool and a queue for search on each node.
A thread pool has N workers ready to handle requests. When a request comes in and a worker is free, it is handled by that worker. By default, the number of search workers is derived from the number of CPU cores on that node.
When all workers are busy and more search requests arrive, the requests go into the queue. The size of the queue is also limited (1000 in your case, as the error shows). If more parallel requests arrive than the queue can hold, those requests are rejected, as you can see in the error log.
Solutions:
The immediate solution would be to increase the size of the search queue. We could also increase the size of the thread pool, but that might badly affect the performance of individual queries, so increasing the queue is usually the better idea. Remember, though, that this queue is held in memory, and increasing the queue size too much can result in out-of-memory issues (see the elasticsearch.yml sketch after this list).
Increase the number of nodes and replicas. Remember that each node has its own search thread pool and queue, and a search can be served by either a primary shard or a replica.
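If you decide to enlarge the queue, that is typically done per node in elasticsearch.yml. A minimal sketch for the 1.x line of Elasticsearch implied by the stack trace (newer versions rename the setting to thread_pool.search.queue_size); the value 2000 is only an example:
# elasticsearch.yml: enlarge the per-node search queue, keep the thread pool size at its default
threadpool.search.queue_size: 2000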
Maybe it sounds strange, but you need to lower the number of parallel searches. With that exception, Elasticsearch is telling you that you are overloading it. There are some limits (at the thread-count level) set in Elasticsearch, and most of the time the defaults for these limits are the best option. So, if you are testing your cluster to see how much load it can hold, this is an indicator that some limits have been reached.
Alternatively, if you really want to change the default, you can try increasing the queue size for searches to accommodate the concurrency demands, but keep in mind that the larger the queue size, the more pressure you put on your cluster, which, in the end, will cause instability.
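If you go the other way and throttle on the client side, a bounded semaphore around the transport client calls keeps the number of in-flight searches under control. This is only a sketch against the 1.x Java API implied by the stack trace; the index name, query and permit count are illustrative:

import java.util.concurrent.Semaphore;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class ThrottledSearcher {
    private final Client client;
    // allow at most 50 concurrent searches from this application (illustrative value)
    private final Semaphore permits = new Semaphore(50);

    public ThrottledSearcher(Client client) {
        this.client = client;
    }

    public SearchResponse search(String index) throws InterruptedException {
        permits.acquire();               // blocks when 50 searches are already in flight
        try {
            return client.prepareSearch(index)
                         .setQuery(QueryBuilders.matchAllQuery())
                         .execute()
                         .actionGet();   // synchronous call; the permit is held for its duration
        } finally {
            permits.release();           // free the slot for the next caller
        }
    }
}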
I saw this same error because I was sending lots of indexing requests into ES in parallel. Since I was writing a data migration, it was easy enough to make them serial, and that resolved the issue.
I don't know what your node configuration was, but your queue size (1000) is already on the higher side. As others have already explained, your search requests are queued in the Elasticsearch thread-pool queue. If you are still getting rejections with such a large queue, that is a hint that you need to revisit your query pattern.
As with many other designs, there is no one-size-fits-all solution here. I found a very good post about how this queue works and the different ways to run a performance test to find out what suits your use case best.
HTH!
Based on what I know, when threads of a warp access the same address in global memory, the requests get serialized, so it's better to use constant memory. Does this serialization of simultaneous global memory accesses still happen when the GPU is equipped with L1 and L2 cache levels (as in the Fermi and Kepler architectures)? In other words, when the threads of a warp access the same global memory address, do the other 31 threads of the warp benefit from the cache because 1 thread has already requested that address? What happens when the access is a read, and what happens when it is a write?
Simultaneous global accesses to the same address by threads in the same warp in Fermi and Kepler do not get serialized. The warp read has a broadcast mechanism which satisfies all such reads from a single cacheline read with no performance impact. The performance is the same as if it were a fully coalesced read. This is true regardless of cache specifics, for example it is true even if L1 caching is disabled.
The performance of simultaneous writes is not specified (AFAIK) but behaviorally, simultaneous writes always get serialized, and the order is undefined.
EDIT responding to additional questions below:
Even if all threads in the warp write the same value into the same address, does it get serialized? Isn't there a write broadcast mechanism that recognizes such a situation?
There is not a write broadcast mechanism that looks at all the simultaneous writes to see if they are all the same, and then takes some action based on that. The correct answer is that the writes happen in an unspecified order, and the performance characteristics are undefined. Obviously, if all the values being written are the same, you can be assured that the value that ends up in the location will be that value. But if you're asking whether the write activity is collapsed to a single cycle or requires multiple cycles to complete, that actual behavior is undefined (undocumented) and in fact may vary from one architecture to the next (for example, cc1.x may serialize in such a way that all the writes are performed, whereas cc2.x may "serialize" in such a way that one write "wins" and all the others are discarded, not consuming actual cycles). Again, the performance is undocumented/unspecified, but the program-observable behavior is defined.
2. With this broadcast mechanism you explained, the only difference between constant memory broadcast access and global memory broadcast access is that the first one may route the access all the way to global memory, but the latter has dedicated hardware and is faster, right?
__constant__ memory uses the constant cache, which is a dedicated piece of hardware that is available on a per-SM basis and caches a particular section of global memory in a read-only fashion. This HW cache is physically and logically separate from the L1 cache (if it exists and is enabled) and the L2 cache. For Fermi and beyond, both mechanisms support broadcast on read, and for the constant cache this is the preferred access pattern, because the constant cache can only service one read access per cycle (i.e. it does not support a whole cacheline read by a warp). Either mechanism may "hit" in the cache (if present) or "miss" and trigger a global read. On the first read of a given location (or cacheline), neither cache will have the requested data, so it will "miss" and trigger a global memory read to service the access. Thereafter, in either case, subsequent reads will be serviced out of the cache, assuming the relevant data has not been evicted in the interim. For early cc1.x devices, the constant memory cache was pretty valuable, since those early devices did not have an L1 cache. For Fermi and beyond, the principal reason to use the constant cache is that, if identifiable data (i.e. read-only) and access patterns (same address per warp) are available, then using the constant cache will prevent those reads from travelling through L1 and possibly evicting other data. In effect you are increasing the cacheable footprint somewhat, over what the L1 can support alone.
Hi, I didn't get how the counting semaphore works. Please help me understand.
As per my understanding, if we set the count to 3, then the process can use 3 threads to access the resource, so only 3 threads have access to the resource at a time. When one thread leaves, another waiting thread comes in. If my understanding is correct, these 3 threads can still corrupt shared data. Then what is the use of it?
Your observations are correct; typically a resource either needs to be restricted to one thread (e.g. it is being written to), or is safe to use with an unlimited number of threads (e.g. it is read-only). Restricting a resource to be used by, say, 5 threads is rarely useful.
Thus a counting semaphore with count N is most often used to restrict access to a pool of N resources... when the count reaches zero, the next thread has to wait to obtain a resource from the pool.
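A minimal Java sketch of that idea: a Semaphore with 3 permits only limits how many threads are inside the guarded section at once; it does not make the shared data itself thread-safe, which is exactly the point raised in the question:

import java.util.concurrent.Semaphore;

public class LimitedAccess {
    // at most 3 threads may hold a permit at the same time
    private static final Semaphore permits = new Semaphore(3);

    static void useResource(Runnable work) throws InterruptedException {
        permits.acquire();        // a 4th thread blocks here until a permit is released
        try {
            work.run();           // up to 3 threads run this concurrently, so the work
                                  // itself must still be thread-safe
        } finally {
            permits.release();    // wake up one waiting thread, if any
        }
    }
}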
However, I don't commonly find this useful in practice, because simply controlling the number of threads accessing a pool of resources isn't sufficient; you need to manage the resources themselves as well. So I typically end up with a blocking queue containing the managed resources that threads can take from. When a thread is done with a resource, it returns that resource (e.g. an object) to the queue so that a waiting thread can take it.
The queue might internally use a semaphore to control access to the internal buffer, but that is usually encapsulated from the user of the queue.
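As an illustration of that queue-based approach, here is a minimal sketch of a pool of N pre-created resources backed by an ArrayBlockingQueue; take() blocks while all resources are in use, and the generic type and factory are placeholders for whatever you are pooling:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

public class ResourcePool<T> {
    private final BlockingQueue<T> available;

    public ResourcePool(int size, Supplier<T> factory) {
        available = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            available.add(factory.get());   // pre-create exactly 'size' resources
        }
    }

    public T take() throws InterruptedException {
        return available.take();            // blocks until another thread returns a resource
    }

    public void giveBack(T resource) {
        available.offer(resource);          // hand the resource back so a waiting thread can proceed
    }
}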
See also
Wikipedia: Semaphore - Important Observations
What does "object affine" mean? For instance, there are object affine thread pools. While I understand both thread pools and affine transformations in math, I can't think of an association between them.
Are you thinking about affinity? If so, I suggest that what it means with respect to threads is that certain threads will be linked to a certain set of resources, for example a CPU or perhaps a core, and not be switched across to another set of resources. This can allow for certain low-level optimisations, such as maximising L1 cache hits.