RabbitMQ: routing on multiple criteria but hitting only one consumer - message-queue

How can we distribute work via RabbitMQ such that worker pools can subscribe to work messages based on differing (but frequently overlapping) criteria from each other but such that when a message is routed that matches to multiple worker pools, only one worker will pick up the job?
Simplified example:
We have host1 and host2.
Host1 handles jobs of classA and classB; host2 handles jobs of classB and classC.
If we route a job of
classA, only host1 will pick it up; if we route a job of classB,
either host1 or host2 will pick it up (based on their current load /
first available) but never both.
It would seem that we need to use a topic exchange, as our routing criteria is complex and using wildcards gives us the type of flexible matching we want.
However:
If we use the same name for the worker pool queue (say “worker-jobs”) we get the desired work splitting out to arbitrary matching workers, but every worker subscribing to the named queue seems to infect the other workers with each other’s routing criteria as they bind it. I.e. the binding of the routing key seems to be at the central queue name level not on a connection-to-queue basis.
If we use different queue names for each worker pool connection (say “poolA-jobs” and “poolB-jobs”) to the same exchange then we get the desired behavior with the different routing criteria maintained between pools but a job coming in that can match to both poolA and poolB gets routed to both of them (albeit only to one worker in each).
Notes:
I’ve spared you the details of why but suffice to say we have an existing multi-petabyte distributed search application that needs response times < 50ms. We already achieve this with our own custom routing hub but we’d like to replace this with RabbitMQ as its performance is attractive (as is retiring homemade code that overlaps with general purpose community projects) if we can get the sophisticated routing we need.
We use Python
Disco isn’t viable for many reasons, too numerous to go into.
It doesn’t have to be RabbitMQ but the performance needs to be as good. ØMQ looks very interesting and like it might provide both the flexibility and the performance but we’re already using RabbitMQ and after wading through the first half of the colorfully written ØMQ guide I’m still not sure if it will support the routing we need but it does look like we’ll have to pretty much write a broker to do it.
We actually have the luxury of knowing which hosts are capable of serving which jobs, so we can do something like have host1 subscribe to #.host1.# and host2 to #.host2.#. Then when we route a classB message, we can give it a key of host1.host2 to indicate which backends are acceptable for service. This simplifies the routing rules but still doesn’t overcome the problem described.

Related

ZeroMQ Messaging Queues

I am working for a while with ZeroMQ. I read already some whitepapers and a lot from the guide, but one question remained open to me:
Lets say we use PUB-SUB.
Where or on which system are the messaging queues? On the publisher system side, on the subscriber system side or something in between?
Sorry for the maybe stupid question.
This picture is from the Broker vs. Brokerless whitepaper.
Every zeromq socket has both send and recv queues (The limits are set via high water mark).
In the case of a brokerless PUB/SUB the messages could be queued on the sender or the receiver.
Example:
If there is network congestion the sender may queue on its send queue
If the receiver is consuming messages slower than they arrive they will be queued on the receiver recv queue
If the queues reach the high water mark the messages will be lost.
Q : Where or on which system are the messaging queues?
The concept of the Zen-of-Zero, as built-into ZeroMQ uses smart and right enough approches, on a case by case principle.
There cases, where there are no-Queues, as you know them, at all :
for example, the inproc:// transport-class has no Queue, as one may know 'em, as there are just memory-regions, across which the messages are put and read-from. The whole magic thus appears inside the special component - the Context()-instance. There the message-memory-mapping takes place. The inproc:// case is a special case, where no I/O-thread(s) are needed at all, as the whole process is purely memory-mapping based and the Context()-instance manipulates its internal states, so as to emulate both an externally provided abstraction of the Queue-alike behaviour and also the internal "Queue"-management.
Others, similarly, operate localhost-located internal Queue(s) endpoints :
for obvious reason, as the ZeroMQ is a broker-less system, there is no "central"-place and all the functionality is spread across the participating ( coordinated and cooperating ) nodes.
Either one of the Queue-ends reside inside the Context()-instances, one in the one-side's localhost-itself, the other in the "remote"-host located Context()-instance, as was declared and coordinated during building of the ZMTP/RFC-specified socket-archetype .bind()/.connect() construction. Since a positive acknowledgement was made for such a ZMTP-specific-socket, the Queue-abstraction ( implemented in a distributed-system-manner ) holds till such socket does not get .close()-ed or the Context()-instance did not get forcefully or by a chance .term()-ed.
For these reasons, the proper capacity sizings are needed, as the messages may reside inside the Context()-operated memory before it gets transported across the { network | ipc-channel }-mediated connection ( on the .send()-side ) or before being .recv()-ed ( from inside the remote Context()-instance ) by the remote application-code for any further use of the message-payload data ( a Zero-copy message-data management is possible, yet not all use-cases indeed use this mode for avoiding replicated memory-allocations and data-transfer costs ).

Multiple instances of Google API Client?

I have activity A that instantiates GoogleApiClient, connects and starts processing in AsyncTask that may take seconds or minutes.
Meanwhile, user triggers activity B that instantiates it's own GoogleApiClient with a connection.
The question is: Can an app have multiple instances of GoogleApiClient connected and working simultaneously, or should I keep an app singleton with my own semaphores?
It's perfectly fine to keep as many GoogleApiClients as you want around, and there are often good reasons for doing so (separation of fragments, different accounts, etc.). It's also not particularly inefficient. The cost of two clients is less than 1% higher than the cost of one client.
It can be confusing if all of them are trying to resolve errors, so it's probably a good idea to make the Fragment clients all ignore connection failures, and have an Activity or Application level client responsible for resolving issues.
Its possible to have multiple connected GoogleApiClients, just possibly inefficient. You do need to be careful using GoogleApiClient with AsyncTasks that it isn't disconnected if the activity goes away.
Consider managing the GoogleApliClient within a retained fragment. See http://www.androiddesignpatterns.com/2013/04/retaining-objects-across-config-changes.html
The issue is resolving by very common OOP Composition knowledge and Factory design pattern. Saying something about 1%, like #Hounshell below is not engineering approach.

How to force a multihopping topology with xbee zb?

I use some xbee (s2) modules with zb stack for mesh networking evaluation. Therefore a multi hopping environment has to be created. The problem is, that the firmware handles the association for themselves and there is no way deeper into the stack as the api provides. To force the path of the data, without to disturb the routing mechanism, I have tried to measure, I had to put them outside their reach. To get only the next hop in association isn't that easy. I used the least power level of the output, but the distance for the test setup is to large and the rf characteristics of the environment change undetermined.
Therefore my question, has anyone experience with this issue?
Regards, Toby
I don't think it's possible through software and coordinator/routers. You could change the Node Join Time (ATNJ) to force a new router to join through a particular router (disable Node Join on all nodes except one), but that would only affect joining. Once joined to the network, the router will discover that other nodes are within range.
You could possibly do it with sleepy end devices. You can use the ATNJ trick to force an end device to join through a single router, and it will always send its messages to that router. But you won't get that many hops -- end device sends to its parent router, which sends to the target's parent router, which sends to the target end device.
You'll likely need to physically limit the range of the radios to force hopping, as demonstrated in the video you linked of Digi's K-Node test equipment with a network of over 1000 radios. They're putting the radios in RF-shielded boxes and using wired antenna connections with software-controlled attenuators to connect the modules to each other.
If you have XBee modules with the U.fl or RPSMA connector, and don't connect an antenna, it should significantly reduce the range of the module. Otherwise, with a wire whip or integrated PCB antenna, you need to put each radio in some sort of box that attenuates the signal. Perhaps someone else can offer advice on materials that will reduce the signal's range without completely blocking it.
ZigBee nodes try to automatically form an Ad-Hoc network. That is why they join the network with the strongest connection (best network coverage) available on that moment. These modules are designed in such a way, that you do not have to care much about establishing a reliable communication. They will solve networking problems most of the time.
What you want to do, is somehow force a different situation. You want to create a specific topology, in order to get some multi-hopping. That will not be the normal behavior of the nods. But you can still get what you want with some of the AT Commands.
The mentioned command "NJ" should work for you. This command locks joins after a certain time (in seconds). Let us think of a simple ZigBee network with three nodes: one Coordinator, one Router and one End-Device. Switch on the Coordinator with "NJ" set to, let us say, two minutes. Then quickly switch on the Router, so it can associate with the Coordinator within these two minutes. After these two minutes, the Coordinator will be locked and will not accept more joins. At that moment you can start the End-Device, which will have to associate with the Router necessarily. This way, you will see that messages between End-Device and Coordinator go through the Router, as you wanted.
You may get a bigger network applying this idea several times, without needing to play with the module's antennas. You can control the AT Parameters remotely (i.e. from a Computer connected to the Coordinator), so you can use some code to help you initialize the network.

Message queuing solution for millions of topics

I'm thinking about system that will notify multiple consumers about events happening to a population of objects. Every subscriber should be able to subscribe to events happening to zero or more of the objects, multiple subscribers should be able to receive information about events happening to a single object.
I think that some message queuing system will be appropriate in this case but I'm not sure how to handle the fact that I'll have millions of the objects - using separate topic for every of the objects does not sound good [or is it just fine?].
Can you please suggest approach I should should take and maybe even some open source message queuing system that would be reasonable?
Few more details:
there will be thousands of subscribers [meaning not plenty of them],
subscribers will subscribe to tens or hundreds of objects each,
there will be ~5-20 million of the objects,
events themselves dont have to carry any message. just information that that object was changed is enough,
vast majority of objects will never be subscribed to,
events occur at the maximum rate of few hundreds per second,
ideally the server should run under linux, be able to integrate with the rest of the ecosystem via http long-poll [using node js? continuations under jetty?].
Thanks in advance for your feedback and sorry for somewhat vague question!
I can highly recommend RabbitMQ. I have used it in a couple of projects before and from my experience, I think it is very reliable and offers a wide range of configuraions. Basically, RabbitMQ is an open-source ( Mozilla Public License (MPL) ) message broker that implements the Advanced Message Queuing Protocol (AMQP) standard.
As documented on the RabbitMQ web-site:
RabbitMQ can potentially run on any platform that Erlang supports, from embedded systems to multi-core clusters and cloud-based servers.
... meaning that an operating system like Linux is supported.
There is a library for node.js here: https://github.com/squaremo/rabbit.js
It comes with an HTTP based API for management and monitoring of the RabbitMQ server - including a command-line tool and a browser-based user-interface as well - see: http://www.rabbitmq.com/management.html.
In the projects I have been working with, I have communicated with RabbitMQ using C# and two different wrappers, EasyNetQ and Burrow.NET. Both are excellent wrappers for RabbitMQ but I ended up being most fan of Burrow.NET as it is easier and more obvious to work with ( doesn't do a lot of magic under the hood ) and provides good flexibility to inject loggers, serializers, etc.
I have never worked with the amount of amount of objects that you are going to work with - I have worked with thousands ( not millions ). However, no matter how many objects I have been playing around with, RabbitMQ has always worked really stable and has never been the source to errors in the system.
So to sum up - RabbitMQ is simple to use and setup, supports AMQP, can be managed via HTTP and what I like the most - it's rock solid.
Break up the topics to carry specific events for e.g. "Object updated topic" "Object deleted"...So clients need to only have to subscribe to the "finite no:" of event based topics they are interested in.
Inject headers into your messages when you publish them and put intelligence into the clients to use these headers as message selectors. For eg, client knows the list of objects he is interested in - and say you identify the object by an "id" - the id can be the header, and the client will use the "id header" to determine if he is interested in the message.
Depending on whether you want, you may also want to consider ensuring guaranteed delivery to make sure that the client will receive the message even if it goes off-line and comes back later.
The options that I would recommend top of the head are ActiveMQ, RabbitMQ and Redis PUB SUB ( Havent really worked on redis pub-sub, please use your due diligance)
Finally here are some performance benchmarks for RabbitMQ and Redis
Just saw that you only have few 100 messages getting pushed out / sec, this is not a big deal for activemq, I have been using Amq on a system that processes 240 messages per second , and it just works fine. I use a thread pool of workers to asynchronously process the messages though . Look at a framework like akka if you are in the java land, if not stick with nodejs and the cool Eco system around it.
If it has to be open source i'd go for ActiveMQ, and an application server to provide the JMS functionality for topics and it has Ajax Support so you can access them from your client
So, you would use the JMS infrastructure to publish the topics for the objects, and you can create topis as you need them
Besides, by using an java application server you may be able to take advantages from clustering, load balancing and other high availability features (obviously based on the selected product)
Hope that helps!!!
Since your messages are very small might want to consider MQTT, which is designed for small devices, although it works fine on powerful devices as well. Key consideration is the low overhead - basically a 2 byte header for a small message. You probably can't use any simple or open source MQTT server, due to your volume. You probably need a heavy duty dedicated appliance like a MessageSight to handle your volume.
Some more details on your application would certainly help. Also you don't mention security at all. I assume you must have some needs in this area.
Though not sure about your work environment but here are my bits. Can you identify each object with unique ID in your system. If so, you can have a topic per each event type. for e.g. you want to track object deletion event, object updation event and so on. So you can have topic for each event type. These topics would be published with Ids of object whenever corresponding event happened to the object. This will limit the no of topics you needed.
Second part of your problem is different subscribers want to subscribe to different objects. So not all subscribers are interested in knowing events of all objects. This problem statement scoped to message selector(filtering) mechanism provided by messaging framework. So basically you need to seek on what basis a subscriber interested in particular object. Have that basis as a message filtering mechanism. It could be anything: object type, object state etc. So ultimately your system would consists of one topic for each event type with someone publishing event messages : {object-type:object-id} information. Subscribers could subscribe to any topic and with an filtering criteria.
If above solution satisfy, you can use any messaging solution: activeMQ, WMQ, RabbitMQ.

Does RabbitMq do round-robin from the exchange to the queues

I am currently evaluating message queue systems and RabbitMq seems like a good candidate, so I'm digging a little more into it.
To give a little context I'm looking to have something like one exchange load balancing the message publishing to multiple queues. I don't want to replicate the messages, so a fanout exchange is not an option.
Also the reason I'm thinking of having multiple queues vs one queue handling the round-robin w/ the consumers, is that I don't want our single point of failure to be at the queue level.
Sounds like I could add some logic on the publisher side to simulate that behavior by editing the routing key and having the appropriate bindings in place. But that's kind of a passive approach that wouldn't take the pace of the message consumption on each queue into account, potentially leading to fill up one queue if the consumer applications for that queue are dead.
I was looking for a more pro-active way from the exchange entity side, that would decide where to send the next message based on each queue size or something of that nature.
I read about Alice and the available RESTful APIs but that seems kind of a heavy duty solution to implement fast routing decisions.
Anyone knows if round-robin between the exchange the queues is feasible w/ RabbitMQ then? Thanks.
Exchanges are generally stateless in the AMQP model, though there have been some recent experiments in stateful exchanges now that there's both a system for managing RabbitMQ plugins and for providing new experimental exchange types.
There's nothing that does quite what you want, I don't think, though I'm not completely sure I understand the requirement. Aside from the single-point-of-failure point, would having a single queue with workers reading from it solve your problem? If so, then your problem reduces to configuring RabbitMQ in an HA configuration that permits you to use that solution. There are a couple of approaches to doing that: either use HALinux and a shared store to get active/passive HA with quick failover, or set up more than one parallel broker and deduplicate on the client, perhaps using redis or similar to do so.
I suggest asking your question again on the rabbitmq-discuss mailing list, where more people will be able to offer suggestions, and where the discussion can be archived for posterity.
Agree with Tony on the approach.
Here is a 'mashup' of RabbitMQ, Redis that you could use instead of rolling your own -
http://xing.github.com/beetle/
One built in way you can do a form of sharing a form exchange to queues, but not exactly round robin, is Consistent Hashing. rabbitmq_consistent_hash_exchange
How too
https://medium.com/#eranda/rabbitmq-x-consistent-hashing-with-wso2-esb-27479b8d1d21
Paper to explain, it puts queues at a weighted distribution on a circle and then by sending random routing key it will send to the closest queue.
http://www8.org/w8-papers/2a-webserver/caching/paper2.html