Use of messaging like RabbitMQ in web application? - message-queue

I would like to learn what are the scenarios/usecases/ where messaging like RabbitMQ can help consumer web applications.
Are there any specific resources to learn from?
What web applications currently are making use of such messaging schemes and how?

In general, a message bus (such as RabbitMQ, but not limited to) allows for a reliable queue of job processing.
What this means to you in terms of a web application is the ability to scale your app as demand grows and to keep your UI quick and responsive.
Instead of forcing the user to wait while a job is processed they can request a job to be processed (for example, clicking a button on a web page to begin transcoding a video file on your server) which sends a message to your bus, let's the backend service pick it up when it's turn in the queue comes up, and maybe notify the user that work has/will begin. You can then return control to the UI, so the user can continue working with the application.
In this situation, your web interface does zero heavy lifting, instead just giving the user visibility into stages of the process as you see fit (for example, the job could incrementally update database records with the state of process which you can query and display to your user).
I would assume that any web application that experiences any kind of considerable traffic would have this type of infrastructure. While there are downsides (network glitches could potentially disrupt message delivery, more complex infrastructure, etc.) the advantages of scaling your backend become increasingly evident. If you're using cloud services this type of infrastructure makes it trivial to add additional message handlers to process your jobs by subscribing to the job queue and just picking off messages to process.

I just did a Google search and came up with the following:
Reddit.com
Digg.com
Poppen.De
That should get you started, at least.

Related

Which hook to limit the number of messages a user can send per day?

We want to use ejabberd in the context of a web application having fairly unique and business rules, we'd therefore need to have every chat message (not protocol message, but message a user sends to another one) go through our web application first, and then have the web application deliver the message to ejabberd on behalf of the user (if our business rules allow the message to be sent).
The web application is also the one providing the contact lists (called rosters if I understand correctly to ejabberd). We need to be and remain the single source of truth to ease maintenance.
To us, ejabberd value added would be to deliver chat messages in near real-time to clients, and enable cool things such as presence indicators. Web clients will maintain a direct connection to ejabberd through websocket, but this connection will have to be read-only as far as chat messages are concerned, and read-write as far as presence messages are concerned.
The situation is similar with regards to audio and video calls. While this time the call per see will directly be managed by ejabberd to take advantage of built-in STURN, TURN etc... and will not need to go through our web app, we have custom business logic to manage who is able to call who, when, how often etc... (so in order words, we have custom business logic to authorize the call or not and we'd like to keep all the business logic centralized in the web app).
My question is what would be the proper hooks we'd need to look into to achieve what we are after? I spent an hour or so in the documentation, but I couldn't find what I am after so hopefully someone can provide me pointers. In an ideal world, we'd like to expose API endpoints from our web app that ejabberd hooks can hit. However, the first question is: which relevant hooks is ejabberd offering and where are these hooks documented?
Any help would be greatly appreciated, thank you!
When a client sends a packet to ejabberd, it triggers the user_send_packet hook, providing the packet and the state of the client's session process. Several modules use that hook, for example mod_service_log.

Is Google's Webspeech server request-limiting me, and is there a fix?

I've been writing an extension that allows the user to issue voice commands to control their browser, and things were going great until I hit a catastrophic problem. It goes like this:
The speech recognition object is in continuous mode, and whenever the onerror: 'no-speech' or onend events fire, it restarts. This way, the extension is constantly waiting to accept input and reacts whenever a command is issued, even after 5 minutes of silence.
After a few days of of development, today I reached the point where I was testing it in practical use, and I found that after a little while (and with no change to anything on my part), my onend event started firing constantly. As in, looking at the console, I would see 18,000 requests being made in the space of three seconds, all being instantly denied, thus triggering onend and restarting the request.
I'm aware that it would be optimal to wait for sound before sending a request, or to have local speech recognition capabilities without the need for a remote server, but the present API does not allow that.
Are my suspicions correct? Am I getting request limited?
Are my suspicions correct? Am I getting request limited?
Yes
I'm aware that it would be optimal to wait for sound before sending a request, or to have local speech recognition capabilities without the need for a remote server, but the present API does not allow that.
To hide the IP source of your request you can use anonymizer networks like Tor, though it will not be fast.
It's naive to assume Google will spend resources to process all audio being recorded on your system. In your application development it is better to rely on API which provides at least some guarantees. It could be either commercial API or open source implementation like CMUSphinx.
With CMUSphinx, you can also properly implement command keyword detection and increase accuracy by specifying the grammar of the commands.
You could also use a Voice Activity Detection (VAD) algorithm to detect when a user is talking. This can be done by either setting a volume threshold or a frequency threshold (Human speech is usually less than 400hz for example). This way, you won't send useless requests to Google unless those conditions are meant. I would not recommend using Tor as this would significantly increase latency. CMUSphinx is probably the best local system option, but if still want to use a web-based service, I would recommend either using a Voice Activity Detection algorithm or finding a different web-based software.

Implementing a scalable multiroom chat system

I've been looking into sockjs-tornado recently and am working on a chat function for a social networking site. I'm trying to get a feel for common methods used in building scalable multiroom chat functionality. I'll outline a couple of the methods I've thought of and I'd like to get feedback. What methods are used in the real world? What are the advantages and disadvantages to these methods?
Prereqs:
running tornado
using sockjs-tornado lib
sockjs-client lib for js
Everything else is open.
Methods I've considered:
For loop
This seems like the simplest way to go. You create a user class that subscribes to certain room classes. The user sends a message class that contains a room id and the server redirects the message in the loop only to users that have subscribed to that room. This seems to me to be by far the worst because the complexity is obviously at least linear. (Imagine 500 users connected at once to 5 chat rooms each.)
Multi-tasking/multiple server instances
This also seems like a bad idea because you could have 500 server instances running at any time on... different ports? I'm really not sure on the implementation of this method.
Native support
Now granted, a lot of libraries have this built in such as socketio. However that's not an option due to the sole node.js support. (I'm on tornado server.) Socks in particular does not have built in support for multiple "rooms".
Conclusion
I'm looking for resources/case studies, and industry standards. Any help would be appreciated.
I would just use a message queue server like RabbitMQ with a fanout exchange as each "chat room".
You can see an example of using a fanout exchange in Python here.
The Pika AMQP library works with Tornado, too.
The advantage with using a message queueing system is that you can have users connected to different Tornado processes on different servers while still being in the same "room", giving you high availability on the HTTP layer.
RabbitMQ also has HA capabilities (although not the greatest).

In which domains are message oriented middleware like AMQP useful?

What problem do MOM (Message Oriented Middleware) solve? Scalability? Integration?
In which domain are they typically used and in which domains are they typically not used?
For example, say, is Google using such solution for it's main search engine or to power GMail?
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Does it make any sense when you're writing a Webapp where you control the server-side and have an homogenous environment (say tens of Amazon EC2 instances all running Linux + Java JVMs) there and where the clients are, well, Web browsers?
Does it make sense for desktop apps that need to communicate with a server?
Or is it 'only' for big enterprise stuff where you typically have a happy mix of countless of different systems that needs to communicate in a way or another?
I'm a bit confused as to what they're useful for and I think that with example of where they're appropriate and where they're not appropriate I could better understand their use.
This is a great question.
The main uses of messaging are: scaling, offloading work, integration, monitoring, event handling, routing, networking, push, mobility, buffering, queueing, task sharing, alerts, management, logging, batch, data delivery, pubsub, multicast, audit, scheduling, ... and more. Basically: anything where you need data but don't want to make a database request. (Caching is another, longer story).
Another way of looking at this is to notice that many applications used to be built by assuming that users (people) would perform actions that would be fulfilled by executing a transaction on a database (including reads, writes). But today, many actions are not user-initiated. Instead they are application-initiated. For example "tell me when the book that I want to buy is in stock". The best way to solve this class of problems is with messaging of some sort. Whether you call it middleware or web push or real time salad dressing does not matter. It's all messaging.
When you enable applications to initiate or react to events, then it is much easier to scale because your architecture can be based on loosely coupled components. It is also much easier to integrate those components if your messaging is based on a stable, scalable, serviceable tool, preferably using open standard APIs and protocols.
I hope this helps. We try to maintain a list of useful links about messaging here
Please get in touch with questions and comments on any of this, we are dead easy to find.
To address your specific questions:
In which domain are they typically used and in which domains are they typically not used?
Like databases, messaging systems crop up everywhere.
For example, say, is Google using such solution for it's main search engine or to power GMail?
Google uses a lot of home grown technology, but a lot of their open source contributions and known use cases suggest that messaging is (or should be) central to some of the main services.
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Very much so.
An example use case is scaling web page requests. When the user makes a web request, the web server puts it onto a queue for background processing. This means that the web server can keep working while the request is processed. It also means that the web server does not need to know how the request is handled, making system maintenance, upgrade and rollback much simpler because the main parts are 'decoupled'.
So, anyway, the web request gets processed by a back end service, or possibly by many services, eg 'look up book titles', 'draw shopping cart', 'get advertisement', 'check user account'... Finally all the results get put onto another queue, ready for collection and user response by the web server. Typically the system will include a timeout of around 100ms so that any late requests just get thrown away. The user sees anything that got processed in the time interval. This is one reason why some large ecommerce sites have pages that appear to load in stages.
There are many more use cases...
Does it make any sense when you're writing a Webapp where you control the server-side and have an homogenous environment (say tens of Amazon EC2 instances all running Linux + Java JVMs) there and where the clients are, well, Web browsers?
Definitely. If you have an unknown, or unbounded, number of users, server side instances, and application latencies, then it makes sense to use messaging, even if just as a scalable substrate for non-blocking RPC.
Does it make sense for desktop apps that need to communicate with a server?
In lots of cases. One very common case is when the server pushes events to the desktop app, eg game event, tweets, price feeds in finance, system alerts....
Or is it 'only' for big enterprise stuff where you typically have a happy mix of countless of different systems that needs to communicate in a way or another?
Definitely not only for those 'legacy integration' cases but they are important too. At RabbitMQ, the biggest customers we have in terms of pure scale or message volume are cloud providers and big web application providers.
I will answer only one answer, from prior experience - take a look at this middle-ware that is employed by big companies here - middle-ware has one purpose - to glue dis-connected systems (written in disparate languages) together so that they can interact with one another and streamline the business process - Entera as I have had experience with, creates a middle layer in which the unix box using processes written in C, interact with the mainframe system (DB2, COBOL) via a front-end written in PowerBuilder (I am not naming the company!).
From the description I have given, Entera is a middle-ware which hosts a number of things - smooth integration of the flow of data regardless of the endian format, ability for different languages to talk to the middle-ware broker (a broker is a CORBA or DCE like process, that conforms to 'The Open Group) that listens on a particular port) and is specified by an IDL which makes a process appear to be local - if you understand the terminology used in Remoting under Microsoft's .NET Framework, you are not far off the mark! The middle-ware generates stubs which are linked at compile-time and manages the creation of the process, hosting it off a port, multi-threading at run-time, and also, the modern front-ends (such as .NET, Java, PowerBuilder even the unspeakable VB6...ok...VB.NET for the purists out there) can interact by opening a connection to the specified port on a particular IP address, and using the stubs generated, can interact with it directly.
Obviously, from what was described you can see how the legacy systems can have new life breathed into it and thus scalability of the process, the major downside of this is the cost factor which can run into thousdands of dollars. Big companies who uses mainframes as their back-end processing systems for billing/invoicing, who generate a huge revenue can obviously afford such an expensive product - to them it would seem like throwing pennies into a pool of water...because of the use of middle-ware which prolongs the business process, and breathe new life into it, can extend the business by a good number of years into the future without worrying about 'legacy' tag attached to it.
Incidentally, I carried this out as part of my thesis for my BSc. in Information Systems which covered this commercial front-end. There was an open source version of the middle-ware available on sourceforge called FreeDCE, but development efforts have declined or stopped.
Edit:
#cocotwo: That is exactly what middle-ware does as you said it is a plumbing tool...message oriented middle-ware is not really heard of AFAIK because I would imagine, the processes (functions) would need to be called as if they are locally visible within the application domain of the front-end to make it easy to interact with.
Using messages may have its advantages over RPC calls in that the messages are queued in a safe-holding area in the event that a network disconnection occurs - there may be some data caching going on within that aspect to allow the front-end to continue regardless...it would be useful in the instances of 'updating a status of a particular billing/invoice number' - a one-way write-data to the back-end via the middle-ware.
Ok, big companies would have advanced systems infrastructure in that technicians are constantly around the clock to ensure a smooth delivery of data-flow so that would have to be factored in. The company that I worked with had IBM Global Support contract to fulfill in order to ensure a maximum uptime 99% with 6 nine's after the decimal point...with hot-swapping/balanced-clusters/mirroring systems in place...
Whereas with RPC, if the disconnection occurs, the front-end would have to be restarted or would have to handle the disconnection event. It really depends if the message-queueing middle-ware handles each message in real-time and pass back results to the front-end immediately...
This is where each (Message-queueing and RPC related middle-ware) have their strengths and weaknesses...and also the cost mitigation factor such as support, maximum up-time, development efforts and training - that's a biggie here as middle-ware are really proprietary (despite following the 'The Open Group' layout/standards) and complex to setup and to glue the whole thing together via scripts.
Good answers and discussion here. Our consulting team has two preferred "messaging" solutions: RabittMQ and NXTera a high speed RPC middleware, the contemporary version of Entera mentioned above. My partners and I have developed several solutions using RabittMQ, it is the best tool available in that space right now. Additionally, I happen to work for the company that makes NXTera/Entera.
From experience I can clearly say that both of these products meet the need for reliability and low maintenance as discussed above. There are situations where a messaging service, like RabittMQ, is the right choice -- where Publish and subscribe, certified delivery, Queuing or store-and-forward are required.
In other cases, RPC's (remote procedure calls) are the best and fastest solutions for transactional and distributed processing for enterprise or cloud-based applications. When it is right to use an RPC, but SOAP/.NET (yes these are RPC implementations) are too slow, expensive or complex, a lightwieght high speed RPC middleware like NXTera/Entera is the right choice for us.
There is some use case overlap between RPC middleware and message oriented middleware, and where there are you can use either successfully. But both are strong and dependable choices.
The large companies I work with use both RPC and MoM side-by-side. As far as Internet companies, Google (Protocol Buffers) and Facebook (Thrift) show that RPC's have a roll to play in modern web and cloud-based development.

What's the best way to notify a non-web application about a change on a web page?

Let's say I have two applications which have to work together to a certain extent.
A web application (PHP, Ruby on Rails, ...)
A desktop application (Java, C++, ...)
The desktop application has to be notified from the web application and the delay between sending and receiving the notification must be short. (< 10 seconds)
What are possible ways to do this? I can think of polling in a 10 second interval, but that would produce much traffic if many desktop applications have to be notified. On a LAN I'd use an UDP broadcast, but unfortunately that's not possible here...
I appreciate any ideas you could give me.
I think the "best practice" here will depend on the number of desktop clients you expect to serve. If there's just one desktop to be notified, then polling may well be a fine approach -- yes, polling is much more overhead than an event-based notification, but it'll certainly be the easiest solution to implement.
If the overhead of polling is truly unacceptable, then I see two basic alternatives:
Keep a persistent connection open between the desktop and web-server (could be a "comet"-style web request, or a raw socket connection)
Expose a service from within the desktop app, and register the address of the service with the web-server. This way, the web-server can call out to the desktop as needed.
Be warned, though -- both alternatives are chock full of gotchas. A few highlights:
Keeping a connection open can be tricky, since you want your web-servers to be hot-swappable
Calling out to an external service (eg, your desktop) from a web-server is dangerous, because this request could hang. You'd want move this notification onto a separate thread to avoid tying up the webserver.
To mitigate some of the concerns, you might decouple the unreliable desktop from the web-server by introducing an intermediary notification server -- the web-server could post an update somewhere, and the desktop could poll/connect/register there to be notified. To avoid reinventing the wheel here, this could involve some sort of MessageQueue system... This, of course, adds the complexity of needing to maintain the new intermediary.
Again, all of these approaches are probably quite complex, so I'd say polling is probably the best bet.
I can see two ways:
Your desktop application polls the web app
Your web app notifies the desktop application
Your web app could publish an RSS feed, but your desktop app will still have to poll the feed every 10 s.
The traffic need not be huge: if you use an HTTP HEAD request, you'll get a small packet with the date of the last modification (conveniently named Last-Modified).
I don't know exactly what to do to achieve your task but I can suggest to create a windows service at the desktop application PC.
This service checks the web application every interval of time for new changes and if changes occurred it can run the desktop application with notification that there is a change in the web application and in the web application when any change occurrs you can response with acknowledgment
I hope that this may be useful I didn't try it exactly but I am suggesting using like this idea.
A layer of syndication would help to scale out the system.
The desktop app can register itself with a "publisher" service (running on one of several/many machines) This publisher service receives the "notice" from your web app that something has changed, and immediately starts notifying all of its registered subscribers.
The number of publishers you need will increase with the number of users.
Edit: Forgot to mention that the desktop app will need to listen on a socket.