When using WebRTC, is a peer-to-peer architecture redundant to build a video chat service like Skype? - html

We're playing around with WebRTC and trying to understand its benefits.
One reason Skype can serve hundreds of millions of people is because of its decentralized, peer-to-peer architecture, which keeps server costs down.
Does WebRTC allow people to build a video chat application similar to Skype in that the architecture can be decentralized (i.e., video streams are not routed from a broadcaster through a central server to listeners but rather routed directly from broadcaster to listener)?
Or, put another way, does WebRTC allow someone to essentially replicate the benefits of a P2P architecture similar to Skype's?
Or do you still need something similar to Skype's P2P architecture?

Yes, that's basically what WebRTC does. Calls using the getPeerConnection() API don't send voice/video data through a centralized server, but rather use firewall traversal protocols like ICE, STUN and TURN to allow a direct, peer-to-peer connection. However, the initial call setup still requires a server (most likely something running a WebSocket implementation, but it could be anything that you can figure out how to get JavaScript to talk to), so that the two clients can figure out that they're both online, signal that they want to connect, and then figure out how to do it (this is where the ICE/STUN/TURN bit comes in).
However, there's more to Skype's P2P architecture than just passing voice/video data back and forth. The majority of Skype's IP isn't in the codecs or protocols (much of which they licensed from Global IP Solutions, which Google purchased two years ago and then open-sourced, and which forms of the basis of Chrome's WebRTC implementation). Skype's real IP is all in the piece of WebRTC which still depends on a server: figuring out which people are online, and where they are, and how to get a hold of them, and doing that in a massively decentralized fashion. (See here for some rough details.) I think that you could probably use the DataStream portion of the getPeerConnection() API to do that sort of thing, if you were really, really smart - but it would be complicated, and would most likely stomp on a few Skype patents. Unless you want to be really, really huge, you'd probably just want to run your own centralized presence and location servers and handle all that stuff through standard WebSockets.

I should note that Skype's network architecture has changed since it was created; it no longer (from what I hear) uses random users as supernodes to relay data from client 1 to client 2; it didn't scale well and caused rampant variability in results (and annoyed people who had non-firewalled connections and bandwidth).
You definitely can build something SKype-like with WebRTC - and more. :-)

Related

Is there a standard PubSub protocol over WebSocket?

I'm looking for a way to implement basic Publish / Subscribe between applications written in different languages, to exchange events with JSON payloads.
WebSocket seems like the obvious choice for the transport, but you need an (arguably small) layer on top to implement some of the plumbing:
aggreeing on messages representing the pubsub domain "subscribe to a topic", "publish a message"
aggreeing on messages for the infra ("heartbeat", "authentication")
I was expecting to find an obvious standard for this, but there does not seem to be any.
WAMP is often refered to, but in my (short) experience, the implementations of server / clients libraries are not great
STOMP is often refered to, but in my (even shorter) experience, it's even worse
Phoenix Channels are nice, but they're restricted to Phoenix/Elixir world, and not standard (so the messages can be changed at any phoenix version without notice.)
So, is everyone using MQTT/WS (which require another broker components, rather than simple servers ?) Or gRPC ?
Is everyone just re-implementing it from scratch ? (It's one of those things that seems easy enough to do oneselves, but I guess you just end up with an half-baked, poorly-specified, broken version of the thing I'm looking for...)
Or is there something fundamentally broken with the idea of serving streams of data from a server over WS ?
There are two primary classes of WebSocket libraries; those that implement the protocol and leave the rest to the developer, and those that build on top of the protocol with various additional features commonly required by realtime messaging applications, such as restoring lost connections, pub/sub, and channels, authentication, authorization, etc.
The latter variety often requires that their own libraries be used on the client-side, rather than just using the raw WebSocket API provided by the browser. As such, it becomes crucial to make sure you’re happy with how they work and what they’re offering. You may find yourself locked into your chosen solution’s way of doing things once it has been integrated into your architecture, and any issues with reliability, performance, and extensibility may come back to bite you.
ws, faye-websockets, socket.io, μWebSockets and SocketCluster are some good open-source options.
The number of concurrent connections a server can handle is rarely the bottleneck when it comes to server load. Most decent WebSocket servers can support thousands of concurrent connections, but what’s the workload required to process and respond to messages once the WebSocket server process has handled receipt of the actual data?
Typically there will be all kinds of potential concerns, such as reading and writing to and from a database, integration with a game server, allocation and management of resources for each client, and so forth.
As soon as one machine is unable to cope with the workload, you’ll need to start adding additional servers, which means now you’ll need to start thinking about load-balancing, synchronization of messages among clients connected to different servers, generalized access to client state irrespective of connection lifespan or the specific server that the client is connected to – the list goes on and on.
There’s a lot involved when implementing support for the WebSocket protocol, not just in terms of client and server implementation details, but also with respect to support for other transports to ensure robust support for different client environments, as well as broader concerns, such as authentication and authorization, guaranteed message delivery, reliable message ordering, historical message retention, and so forth. A data stream network such as Ably Realtime would be a good option to use in such cases if you'd rather avoid re-inventing the wheel.
There's a nice piece on WebSockets, Pub/Sub, and all issues related to scaling that I'd recommend reading.
Full disclosure: I'm a Developer Advocate for Ably but I hope this genuinely answers your question.

Video and audio stream - server to clients only

Is there a way to stream a video and audio on a website just to the clients, using a camera installed on the server - for instance, like youtube does ?
I've started reading webrtc, but if I use webrtc I should create a stun/turn server and other things, which for one way stream I think is not necessary (this is just my understanding of the things..) because I don't need anything from the clients, literally, neither their video, or audio..
So is there a way to achieve this using html5, streaming just in one direction:
server (camera) -> clients
Is there something about this out there, or should I stick with webrtc ?
I'm going to explain a possible solution for this scenario, there might be others, but I hope mine gives you a rough idea of how you could do it and a start point to explore more about the amazing possibilities of WebRTC. Please let me know if something is not understood.
So, WebRTC is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. Sweet, that is: WebRTC has a quite good browser support (not in every browser though, Safari just started supporting it a month ago with Safari 11). But in this case we want to use WebRTC in the server side. At the end of the day we can still think about peer-to-peer real time communication, where one of our peers is the server.
I don't know if you are familiar with Node.js, but I recommend you to write your Server app with it (<3 Javascript!):
There are a few libraries that wrap WebRTC functionality to be used in the server side, like node-webrtc and node-rtc-peer-connection.
But I recommend you to to take a look at electron-werbrtc, since
the others might be using deprecated methods or be incomplete.
electron-webrtc runs a headless Electron client in the background to
use Chromium's built-in WebRTC implementation. So with it you should
be able to access the Camera in your server and create a stream to be
served to the other peer (the browser).
All above would be the WebRTC related tasks, in this case: streaming video peer(server)-to-peer(browser).
Now, let's talk about signaling process, stun and turn.
Signaling: imagine now a scenario peer-to-peer with 2 browsers, they want to establish a direct connection and stream video and audio between each other. But they don't know each other, like if I don't know your home address, I can't send you a letter. So they need a service that helps them know each other, so they can have the other's IP. This should be done by what is called "a signaling server". If somehow you know the other peer IP, you wouldn't need a signaling server.
STUN/TURN: the scheme above works perfectly in a local area network where each peer has its own IP address and there are no firewalls and routers between them. But otherwise, you can have peers behind a NAT or firewalls, and then your signaling server won't be able to make both peers to discover themselves. If you have peers behind a NAT, you'll need a STUN server, and if you have peers behind firewalls you'll need a TURN server. This is a bit simplified, but I just want you to have the general picture of when you might need STUN/TURN servers.
To better understand Signaling, STUN and TURN, there is a very graphic article that explains them perfectly.
Now, for your scenario:
I think you prob don't need STUN/TURN servers and also you prob don't need to implement the signaling process, because the browsers that are supposed to receive the stream from the server will know that server address, right? So they can establish a WebRTC connection with it.
EDIT: it is likely that you will need to implement some sort of handshake between the server and the clients (browsers), so this will be the signaling process. This is not part of WebRTC and this is why you need to implement it yourself. As I said, it is the way 2 peers can discover each other, but they also exchange information as their local media conditions, like codecs, resolutions they can handle, etc. For your case, your signaling server could be hosted in the same server you use to strea: you can build a small node.js app that runs there and that manages all the signaling process easily, it is not a big deal. I recommend you to read this article, and specially the section "How can I build a signaling service?". In general all WebRTC articles from that site are very helpful.
Does this make sense to you? I think with it you can start digging a little bit more and see if with this is enough or you need to implement more stuff. Hope it helps!

Live video streaming webrtc?

So I want to be able to let one client distribute live video feed to alot of other clients only watching. Is it possible to use WebRTC for this? Or will I basically have to go through a service like ustream or whatever.
It is possible, with a caveat. One client can establish many outgoing P2P connections to every viewer and stream video to them directly. However, for more than a very small handful of viewers this will quickly saturate the origin's bandwidth and possibly CPU. You would not be able to serve many viewers this way; however this would work entirely without middleman.*
* Safe for the WebRTC connection negotiation server.
To be able to serve a larger number of viewers you'll have to use a centralised distribution server. The source would send exactly one video stream to that server, and the server would stream it to anyone who's interested. This still requires that server to have a beefy CPU and a lot of bandwidth; but this is more realistic to scale than the client side.
You may need to spend a lot of money on such a server; have a look at c3.xlarge and better instances at AWS to get an idea. Using an established, cheap infrastructure like ustream may indeed be the more realistic option.

Do HTML WebSockets maintain an open connection for each client? Does this scale?

I am curious if anyone has any information about the scalability of HTML WebSockets. For everything I've read it appears that every client will maintain an open line of communication with the server. I'm just wondering how that scales and how many open WebSocket connections a server can handle. Maybe leaving those connections open isn't a problem in reality, but it feels like it is.
In most ways WebSockets will probably scale better than AJAX/HTML requests. However, that doesn't mean WebSockets is a replacement for all uses of AJAX/HTML.
Each TCP connection in itself consumes very little in terms server resources. Often setting up the connection can be expensive but maintaining an idle connection it is almost free. The first limitation that is usually encountered is the maximum number of file descriptors (sockets consume file descriptors) that can be open simultaneously. This often defaults to 1024 but can easily be configured higher.
Ever tried configuring a web server to support tens of thousands of simultaneous AJAX clients? Change those clients into WebSockets clients and it just might be feasible.
HTTP connections, while they don't create open files or consume port numbers for a long period, are more expensive in just about every other way:
Each HTTP connection carries a lot of baggage that isn't used most of the time: cookies, content type, conetent length, user-agent, server id, date, last-modified, etc. Once a WebSockets connection is established, only the data required by the application needs to be sent back and forth.
Typically, HTTP servers are configured to log the start and completion of every HTTP request taking up disk and CPU time. It will become standard to log the start and completion of WebSockets data, but while the WebSockets connection doing duplex transfer there won't be any additional logging overhead (except by the application/service if it is designed to do so).
Typically, interactive applications that use AJAX either continuously poll or use some sort of long-poll mechanism. WebSockets is a much cleaner (and lower resource) way of doing a more event'd model where the server and client notify each other when they have something to report over the existing connection.
Most of the popular web servers in production have a pool of processes (or threads) for handling HTTP requests. As pressure increases the size of the pool will be increased because each process/thread handles one HTTP request at a time. Each additional process/thread uses more memory and creating new processes/threads is quite a bit more expensive than creating new socket connections (which those process/threads still have to do). Most of the popular WebSockets server frameworks are going the event'd route which tends to scale and perform better.
The primary benefit of WebSockets will be lower latency connections for interactive web applications. It will scale better and consume less server resources than HTTP AJAX/long-poll (assuming the application/server is designed properly), but IMO lower latency is the primary benefit of WebSockets because it will enable new classes of web applications that are not possible with the current overhead and latency of AJAX/long-poll.
Once the WebSockets standard becomes more finalized and has broader support, it will make sense to use it for most new interactive web applications that need to communicate frequently with the server. For existing interactive web applications it will really depend on how well the current AJAX/long-poll model is working. The effort to convert will be non-trivial so in many cases the cost just won't be worth the benefit.
Update:
Useful link: 600k concurrent websocket connections on AWS using Node.js
Just a clarification: the number of client connections that a server can support has nothing to do with ports in this scenario, since the server is [typically] only listening for WS/WSS connections on one single port. I think what the other commenters meant to refer to were file descriptors. You can set the maximum number of file descriptors quite high, but then you have to watch out for socket buffer sizes adding up for each open TCP/IP socket. Here's some additional info: https://serverfault.com/questions/48717/practical-maximum-open-file-descriptors-ulimit-n-for-a-high-volume-system
As for decreased latency via WS vs. HTTP, it's true since there's no more parsing of HTTP headers beyond the initial WS handshake. Plus, as more and more packets are successfully sent, the TCP congestion window widens, effectively reducing the RTT.
Any modern single server is able to server thousands of clients at once. Its HTTP server software has just to be is Event-Driven (IOCP) oriented (we are not in the old Apache one connection = one thread/process equation any more). Even the HTTP server built in Windows (http.sys) is IOCP oriented and very efficient (running in kernel mode). From this point of view, there won't be a lot of difference at scaling between WebSockets and regular HTTP connection. One TCP/IP connection uses a little resource (much less than a thread), and modern OS are optimized for handling a lot of concurrent connections: WebSockets and HTTP are just OSI 7 application layer protocols, inheriting from this TCP/IP specifications.
But, from experiment, I've seen two main problems with WebSockets:
They do not support CDN;
They have potential security issues.
So I would recommend the following, for any project:
Use WebSockets for client notifications only (with a fallback mechanism to long-polling - there are plenty of libraries around);
Use RESTful / JSON for all other data, using a CDN or proxies for cache.
In practice, full WebSockets applications do not scale well. Just use WebSockets for what they were designed to: push notifications from the server to the client.
About the potential problems of using WebSockets:
1. Consider using a CDN
Today (almost 4 years later), web scaling involves using Content Delivery Network (CDN) front ends, not only for static content (html,css,js) but also your (JSON) application data.
Of course, you won't put all your data on your CDN cache, but in practice, a lot of common content won't change often. I suspect that 80% of your REST resources may be cached... Even a one minute (or 30 seconds) CDN expiration timeout may be enough to give your central server a new live, and enhance the application responsiveness a lot, since CDN can be geographically tuned...
To my knowledge, there is no WebSockets support in CDN yet, and I suspect it would never be. WebSockets are statefull, whereas HTTP is stateless, so is much easily cached. In fact, to make WebSockets CDN-friendly, you may need to switch to a stateless RESTful approach... which would not be WebSockets any more.
2. Security issues
WebSockets have potential security issues, especially about DOS attacks. For illustration about new security vulnerabilities , see this set of slides and this webkit ticket.
WebSockets avoid any chance of packet inspection at OSI 7 application layer level, which comes to be pretty standard nowadays, in any business security. In fact, WebSockets makes the transmission obfuscated, so may be a major breach of security leak.
Think of it this way: what is cheaper, keeping an open connection, or opening a new connection for every request (with the negotiation overhead of doing so, remember it's TCP.)
Of course it depends on the application, but for long-term realtime connections (e.g. an AJAX chat) it's far better to keep the connection open.
The max number of connections will be capped by the max number of free ports for the sockets.
No it does not scale, gives tremendous work to intermediate routes switches. Then on the server side the page faults (you have to keep all those descriptors) are reaching high values, and the time to bring a resource into the work area increases. These are mostly JAVA written servers and it might be faster to hold on those gazilions of sockets then to destroy/create one.
When you run such a server on a machine any other process can't move anymore.

In which domains are message oriented middleware like AMQP useful?

What problem do MOM (Message Oriented Middleware) solve? Scalability? Integration?
In which domain are they typically used and in which domains are they typically not used?
For example, say, is Google using such solution for it's main search engine or to power GMail?
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Does it make any sense when you're writing a Webapp where you control the server-side and have an homogenous environment (say tens of Amazon EC2 instances all running Linux + Java JVMs) there and where the clients are, well, Web browsers?
Does it make sense for desktop apps that need to communicate with a server?
Or is it 'only' for big enterprise stuff where you typically have a happy mix of countless of different systems that needs to communicate in a way or another?
I'm a bit confused as to what they're useful for and I think that with example of where they're appropriate and where they're not appropriate I could better understand their use.
This is a great question.
The main uses of messaging are: scaling, offloading work, integration, monitoring, event handling, routing, networking, push, mobility, buffering, queueing, task sharing, alerts, management, logging, batch, data delivery, pubsub, multicast, audit, scheduling, ... and more. Basically: anything where you need data but don't want to make a database request. (Caching is another, longer story).
Another way of looking at this is to notice that many applications used to be built by assuming that users (people) would perform actions that would be fulfilled by executing a transaction on a database (including reads, writes). But today, many actions are not user-initiated. Instead they are application-initiated. For example "tell me when the book that I want to buy is in stock". The best way to solve this class of problems is with messaging of some sort. Whether you call it middleware or web push or real time salad dressing does not matter. It's all messaging.
When you enable applications to initiate or react to events, then it is much easier to scale because your architecture can be based on loosely coupled components. It is also much easier to integrate those components if your messaging is based on a stable, scalable, serviceable tool, preferably using open standard APIs and protocols.
I hope this helps. We try to maintain a list of useful links about messaging here
Please get in touch with questions and comments on any of this, we are dead easy to find.
To address your specific questions:
In which domain are they typically used and in which domains are they typically not used?
Like databases, messaging systems crop up everywhere.
For example, say, is Google using such solution for it's main search engine or to power GMail?
Google uses a lot of home grown technology, but a lot of their open source contributions and known use cases suggest that messaging is (or should be) central to some of the main services.
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Very much so.
An example use case is scaling web page requests. When the user makes a web request, the web server puts it onto a queue for background processing. This means that the web server can keep working while the request is processed. It also means that the web server does not need to know how the request is handled, making system maintenance, upgrade and rollback much simpler because the main parts are 'decoupled'.
So, anyway, the web request gets processed by a back end service, or possibly by many services, eg 'look up book titles', 'draw shopping cart', 'get advertisement', 'check user account'... Finally all the results get put onto another queue, ready for collection and user response by the web server. Typically the system will include a timeout of around 100ms so that any late requests just get thrown away. The user sees anything that got processed in the time interval. This is one reason why some large ecommerce sites have pages that appear to load in stages.
There are many more use cases...
Does it make any sense when you're writing a Webapp where you control the server-side and have an homogenous environment (say tens of Amazon EC2 instances all running Linux + Java JVMs) there and where the clients are, well, Web browsers?
Definitely. If you have an unknown, or unbounded, number of users, server side instances, and application latencies, then it makes sense to use messaging, even if just as a scalable substrate for non-blocking RPC.
Does it make sense for desktop apps that need to communicate with a server?
In lots of cases. One very common case is when the server pushes events to the desktop app, eg game event, tweets, price feeds in finance, system alerts....
Or is it 'only' for big enterprise stuff where you typically have a happy mix of countless of different systems that needs to communicate in a way or another?
Definitely not only for those 'legacy integration' cases but they are important too. At RabbitMQ, the biggest customers we have in terms of pure scale or message volume are cloud providers and big web application providers.
I will answer only one answer, from prior experience - take a look at this middle-ware that is employed by big companies here - middle-ware has one purpose - to glue dis-connected systems (written in disparate languages) together so that they can interact with one another and streamline the business process - Entera as I have had experience with, creates a middle layer in which the unix box using processes written in C, interact with the mainframe system (DB2, COBOL) via a front-end written in PowerBuilder (I am not naming the company!).
From the description I have given, Entera is a middle-ware which hosts a number of things - smooth integration of the flow of data regardless of the endian format, ability for different languages to talk to the middle-ware broker (a broker is a CORBA or DCE like process, that conforms to 'The Open Group) that listens on a particular port) and is specified by an IDL which makes a process appear to be local - if you understand the terminology used in Remoting under Microsoft's .NET Framework, you are not far off the mark! The middle-ware generates stubs which are linked at compile-time and manages the creation of the process, hosting it off a port, multi-threading at run-time, and also, the modern front-ends (such as .NET, Java, PowerBuilder even the unspeakable VB6...ok...VB.NET for the purists out there) can interact by opening a connection to the specified port on a particular IP address, and using the stubs generated, can interact with it directly.
Obviously, from what was described you can see how the legacy systems can have new life breathed into it and thus scalability of the process, the major downside of this is the cost factor which can run into thousdands of dollars. Big companies who uses mainframes as their back-end processing systems for billing/invoicing, who generate a huge revenue can obviously afford such an expensive product - to them it would seem like throwing pennies into a pool of water...because of the use of middle-ware which prolongs the business process, and breathe new life into it, can extend the business by a good number of years into the future without worrying about 'legacy' tag attached to it.
Incidentally, I carried this out as part of my thesis for my BSc. in Information Systems which covered this commercial front-end. There was an open source version of the middle-ware available on sourceforge called FreeDCE, but development efforts have declined or stopped.
Edit:
#cocotwo: That is exactly what middle-ware does as you said it is a plumbing tool...message oriented middle-ware is not really heard of AFAIK because I would imagine, the processes (functions) would need to be called as if they are locally visible within the application domain of the front-end to make it easy to interact with.
Using messages may have its advantages over RPC calls in that the messages are queued in a safe-holding area in the event that a network disconnection occurs - there may be some data caching going on within that aspect to allow the front-end to continue regardless...it would be useful in the instances of 'updating a status of a particular billing/invoice number' - a one-way write-data to the back-end via the middle-ware.
Ok, big companies would have advanced systems infrastructure in that technicians are constantly around the clock to ensure a smooth delivery of data-flow so that would have to be factored in. The company that I worked with had IBM Global Support contract to fulfill in order to ensure a maximum uptime 99% with 6 nine's after the decimal point...with hot-swapping/balanced-clusters/mirroring systems in place...
Whereas with RPC, if the disconnection occurs, the front-end would have to be restarted or would have to handle the disconnection event. It really depends if the message-queueing middle-ware handles each message in real-time and pass back results to the front-end immediately...
This is where each (Message-queueing and RPC related middle-ware) have their strengths and weaknesses...and also the cost mitigation factor such as support, maximum up-time, development efforts and training - that's a biggie here as middle-ware are really proprietary (despite following the 'The Open Group' layout/standards) and complex to setup and to glue the whole thing together via scripts.
Good answers and discussion here. Our consulting team has two preferred "messaging" solutions: RabittMQ and NXTera a high speed RPC middleware, the contemporary version of Entera mentioned above. My partners and I have developed several solutions using RabittMQ, it is the best tool available in that space right now. Additionally, I happen to work for the company that makes NXTera/Entera.
From experience I can clearly say that both of these products meet the need for reliability and low maintenance as discussed above. There are situations where a messaging service, like RabittMQ, is the right choice -- where Publish and subscribe, certified delivery, Queuing or store-and-forward are required.
In other cases, RPC's (remote procedure calls) are the best and fastest solutions for transactional and distributed processing for enterprise or cloud-based applications. When it is right to use an RPC, but SOAP/.NET (yes these are RPC implementations) are too slow, expensive or complex, a lightwieght high speed RPC middleware like NXTera/Entera is the right choice for us.
There is some use case overlap between RPC middleware and message oriented middleware, and where there are you can use either successfully. But both are strong and dependable choices.
The large companies I work with use both RPC and MoM side-by-side. As far as Internet companies, Google (Protocol Buffers) and Facebook (Thrift) show that RPC's have a roll to play in modern web and cloud-based development.